Xen project Mailing List

Re: [Xen-devel] [Patch] x86/HVM: Fix RTC interrupt modelling

To: Tim Deegan <tim@xxxxxxx>

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Tue, 11 Feb 2014 14:52:25 +0000

Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, roger.pau@xxxxxxxxxx

Delivery-date: Tue, 11 Feb 2014 15:09:18 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 11/02/14 14:10, Tim Deegan wrote:
> At 13:59 +0000 on 11 Feb (1392123546), Andrew Cooper wrote:
>> On 11/02/14 13:15, Tim Deegan wrote:
>>> At 12:50 +0000 on 11 Feb (1392119457), Jan Beulich wrote:
>>>>>>> On 11.02.14 at 13:11, Tim Deegan <tim@xxxxxxx> wrote:
>>>>> At 09:15 +0000 on 11 Feb (1392106520), Jan Beulich wrote:
>>>>>>>>> On 10.02.14 at 18:21, Tim Deegan <tim@xxxxxxx> wrote:
>>>>>>> That is the main change of this cset: we go back to driving
>>>>>>> the interrupt from the vpt code and fixing up the RTC state after vpt
>>>>>>> tells us it's injected an interrupt.
>>>>>> And that's what is wrong imo, as it doesn't allow driving PF correctly
>>>>>> when !PIE.
>>>>> Oh, I see -- the current code doesn't turn the vpt off when !PIE. Can
>>>>> you remember why not? Have I forgotten some wrinkle or race here?
>>>> Because an OS could inspect PF without setting PIE.
>>> Ugh. :(
>>>
>>>>>>> Yeah, this has nothing to do with the bug being fixed here. The old
>>>>>>> REG_C read was operating correctly, but on the return-to-guest path:
>>>>>>> - vpt sees another RTC interrupt is due and calls RTC code
>>>>>>> - RTC code sees REG_C clear, sets PF|IRQF and asserts the line
>>>>>>> - vlapic code sees the last interrupt is still in the ISR and does
>>>>>>>   nothing;
>>>>>>> - we return to the guest having set IRQF but not consumed a timer
>>>>>>>   event, so vpt state is the same
>>>>>>> - the guest sees the old REG_C, with PF|IRQF set, and re-reads,
>>>>>>>   waiting for a read of 0.
>>>>>>> - repeat forever.
>>>>>> Which would call for a flag suppressing the setting of PF|IRQF
>>>>>> until the timer event got consumed. Possibly with some safety
>>>>>> belt for this to not get deferred indefinitely (albeit if the interrupt
>>>>>> doesn't get injected for extended periods of time, the guest
>>>>>> would presumably have more severe problems than these flags
>>>>>> not getting updated as expected).
>>>>> That's pretty much what we're doing here -- the pt_intr_post callback
>>>>> sets PF|IRQF when the interrupt is injected.
>>>> Right, except you do this by reverting other stuff rather than
>>>> adding the missing functionality on top.
>>> Absolutely -- because once we went back to having PF set only when the
>>> interrupt was injected, it seemed better to reduce the amount of
>>> special-case plumbing for RTC than to add yet more.
>>>
>>> But for the case of an OS polling for PF with PIE clear, I guess we
>>> might need to keep all the current special cases. Was that a known
>>> observed bug or a theoretical one? I can't see a way of handling
>>> both that case and the w2k3 case.
>>>
>>> Either we always set PF when the tick happens, even if the interrupt
>>> is masked (which breaks w2k3) or we don't set it until we can deliver
>>> the interrupt (which breaks pollers).
>> This doesn't break w2k3. Setting PF when a tick happens (or should
>> happen for !PIE) is the correct thing to do.
>>
>> The bug is that we see an interrupt pending and set PF when we
>> shouldn't
> We _are_ setting PF when the tick happens; it's just that because of
> no-missed-ticks mode the tick happens before w2k3 has finished
> handling the last one. At that point, anything we do breaks w2k3 in
> some way -- either we leave the tick pending until the interrupt is
> actually delivered (which leads to the hang) or we consume the tick
> even though the interrupt will be lost (which causes clock drift).
>
> Tim.

No - we are setting PF on every vmentry, not every tick.

* pt_update_irq() finds the timer pending and decides to inject an
  interrupt. This sets REG_C.PF
* {svm,vmx}_intr_assist() bails early because it can't actually inject
  the interrupt.
* pt_intr_post() doesn't run, which doesn't update the PT state, yet
  because pt_update_irq() thought it was injecting an interrupt, it
  didn't run its faked-up pt_intr_post()
* Next VMentry finds an erroneously pending tick and decides to inject
  an interrupt.

If w2k3 were to repeatedly read REG_C alone without writing to the index
register in-between, it would observe PF being alternately set and clear.

w2k3 would still work perfectly fine if we only set PF when actually
injecting the interrupt, which is why the patch at the root of this
thread fixes the observed hang.

However, Jan's comment about the behaviour of the PF bit when !PIE is
quite correct. Therefore, for !PIE, PF must be updated ahead of time,
but for PIE, it must be updated when the interrupt is actually injected.

~Andrew
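
For illustration only, the vmentry sequence in the bullets above can be sketched as a toy model in C. This is not Xen code: struct rtc_model, both vmentry_* functions, read_reg_c() and reads_until_clear() are hypothetical names, and the model assumes exactly one vmentry before each guest read of REG_C and an in-service interrupt that is never EOI'd. Only the RTC_PF and RTC_IRQF bit values come from the MC146818 register layout. The first policy sets the flags whenever a tick is pending; the second follows the rule in the final paragraph (PF ahead of time for !PIE, PF on actual injection for PIE).

```c
#include <stdbool.h>

/* REG_C flag bits as laid out on the MC146818 RTC. */
#define RTC_PF   0x40  /* periodic interrupt flag */
#define RTC_IRQF 0x80  /* interrupt request flag */

/* Hypothetical, heavily simplified state; not Xen's real structures. */
struct rtc_model {
    unsigned char reg_c;
    bool pie;          /* REG_B.PIE: periodic interrupt enabled */
    bool tick_pending; /* a periodic tick has not been consumed yet */
    bool isr_busy;     /* previous RTC interrupt is still in service */
};

typedef void (*vmentry_fn)(struct rtc_model *);

/* Policy from the bullets: flags are set whenever a tick is pending,
 * even when the interrupt cannot be injected on this vmentry. */
void vmentry_flag_on_every_entry(struct rtc_model *m)
{
    if (!m->tick_pending)
        return;
    m->reg_c |= RTC_PF | RTC_IRQF;
    if (m->isr_busy)
        return;              /* injection fails; tick stays pending */
    m->tick_pending = false; /* injected, so the tick is consumed */
    m->isr_busy = true;
}

/* Policy from the closing paragraph: PF ahead of time for !PIE,
 * PF|IRQF only on actual injection for PIE. */
void vmentry_flag_on_injection(struct rtc_model *m)
{
    if (!m->tick_pending)
        return;
    if (!m->pie) {
        m->reg_c |= RTC_PF;  /* no interrupt, but a poller must see PF */
        m->tick_pending = false;
        return;
    }
    if (m->isr_busy)
        return;              /* cannot inject: leave REG_C and tick alone */
    m->reg_c |= RTC_PF | RTC_IRQF;
    m->tick_pending = false;
    m->isr_busy = true;
}

/* Reading REG_C returns the flags and clears them. */
unsigned char read_reg_c(struct rtc_model *m)
{
    unsigned char v = m->reg_c;
    m->reg_c = 0;
    return v;
}

/* A guest handler that re-reads REG_C until it sees 0, starting with
 * flags set, a tick pending and the last interrupt still in service.
 * Returns the number of reads needed, or -1 if max_reads is reached. */
int reads_until_clear(vmentry_fn vmentry, int max_reads)
{
    struct rtc_model m = { RTC_PF | RTC_IRQF, true, true, true };
    for (int i = 1; i <= max_reads; i++) {
        vmentry(&m);
        if (read_reg_c(&m) == 0)
            return i;
    }
    return -1;
}
```

In this model, reads_until_clear(vmentry_flag_on_every_entry, n) never sees a zero read for any n, which corresponds to the hang, while reads_until_clear(vmentry_flag_on_injection, n) finishes on the second read. With pie cleared, vmentry_flag_on_injection() still sets PF, so a polling guest would observe it.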
