[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [Patch] x86/HVM: Fix RTC interrupt modelling

On 11/02/14 14:10, Tim Deegan wrote:
> At 13:59 +0000 on 11 Feb (1392123546), Andrew Cooper wrote:
>> On 11/02/14 13:15, Tim Deegan wrote:
>>> At 12:50 +0000 on 11 Feb (1392119457), Jan Beulich wrote:
>>>>>>> On 11.02.14 at 13:11, Tim Deegan <tim@xxxxxxx> wrote:
>>>>> At 09:15 +0000 on 11 Feb (1392106520), Jan Beulich wrote:
>>>>>>>>> On 10.02.14 at 18:21, Tim Deegan <tim@xxxxxxx> wrote:
>>>>>>> That is the main change of this cset:  we go back to driving
>>>>>>> the interrupt from the vpt code and fixing up the RTC state after vpt
>>>>>>> tells us it's injected an interrupt.
>>>>>> And that's what is wrong imo, as it doesn't allow driving PF correctly
>>>>>> when !PIE.
>>>>> Oh, I see -- the current code doesn't turn the vpt off when !PIE.  Can
>>>>> you remember why not?  Have I forgotten some wrinkle or race here?
>>>> Because an OS could inspect PF without setting PIE.
>>> Ugh. :( 
>>>>>>> Yeah, this has nothing to do with the bug being fixed here.  The old
>>>>>>> REG_C read was operating correctly, but on the return-to-guest path:
>>>>>>>  - vpt sees another RTC interrupt is due and calls RTC code
>>>>>>>  - RTC code sees REG_C clear, sets PF|IRQF and asserts the line
>>>>>>>  - vlapic code sees the last interrupt is still in the ISR and does
>>>>>>>    nothing;
>>>>>>>  - we return to the guest having set IRQF but not consumed a timer
>>>>>>>    event, so vpt stste is the same
>>>>>>>  - the guest sees the old REG_C, with PF|IRQF set, and re-reads, 
>>>>>>>    waiting for a read of 0.
>>>>>>>  - repeat forever.
>>>>>> Which would call for a flag suppressing the setting of PF|IRQF
>>>>>> until the timer event got consumed. Possibly with some safety
>>>>>> belt for this to not get deferred indefinitely (albeit if the interrupt
>>>>>> doesn't get injected for extended periods of time, the guest
>>>>>> would presumably have more severe problems than these flags
>>>>>> not getting updated as expected).
>>>>> That's pretty much what we're doing here -- the pt_intr_post callback
>>>>> sets PF|IRQF when the interrupt is injected.
>>>> Right, except you do this be reverting other stuff rather than
>>>> adding the missing functionality on top.
>>> Absolutely -- because once we went back to having PF set only when the
>>> interrupt was injected, it seemed better to reduce the amount of
>>> special-case plumbing for RTC than to add yet more.
>>> But for the case of an OS polling for PF with PIE clear, I guess we
>>> might need to keep all the current special cases.  Was that a known
>>> observed bug or a theoretical one?  I can't see a way of handling
>>> both that case and the w2k3 case.
>>> Either we always set PF when the tick happens, even if the interrupt
>>> is masked (which breaks w2k3) or we don't set it until we can deliver
>>> the interrupt (which breaks pollers).
>> This doesn't break w2k3.  Setting PF when a tick happens (or should
>> happen for !PIE) is the correct thing to do.
>> The bug is that we see an interrupt pending and set PF when we
>> shouldn't
> We _are_ setting PF when the tick happens; it's just that because of
> no-missed-ticks mode the tick happens before w2k3 has finished
> handling the last one.  At that point, anything we do breaks w2k3 in
> some way -- either we leave the tick pending until the interrupt is
> actually delivered (which leads to the hang) or we consume the tick
> even though the interrupt will be lost (which causes clock drift).
> Tim.

No - we are setting PF on every vmentry, not every tick.

* pt_update_irq() finds the timer pending and decides to inject an
interrupt.  This sets REG_C.PF
* {svm,vmx}_intr_assist() bails early because it can't actually inject
the interrupt.
* pt_intr_post() doesn't run, which doesn't update the PT state, yet
because pt_update_irq() thought it was injecting an interrupt, it didn't
run its faked-up pt_intr_post()
* Next VMentry finds an erroneously pending tick and decides to inject
an interrupt.

If w2k3 were to repeatedly read REG_C alone without writing to the index
register in-between, it would observe PF being alternately set and clear.

w2k3 would still work perfectly fine if we only set PF when actually
injecting the interrupt, which is why the patch at the root of this
thread fixes the observed hang.  However, Jans comment about the
behaviour of the PF bit when !PIE is quite correct.

Therefore, for !PIE, PF must be updated ahead of time, but for PIE, it
must be updated when the interrupt is actually injected.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.