[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [Patch] x86/HVM: Fix RTC interrupt modelling

>>> On 10.02.14 at 18:21, Tim Deegan <tim@xxxxxxx> wrote:
> At 16:34 +0000 on 10 Feb (1392046444), Jan Beulich wrote:
>> > The underlying problem here comes because the AF and UF bits of RTC 
>> > interrupt
>> > state is modelled by the RTC code, but the PF is modelled by the pt code.  
>> > The
>> > root cause of windows infinite loop is that RTC_PF is being re-set on 
>> > vmentry
>> > before the interrupt logic has worked out that it can't actually inject an 
>> > RTC
>> > interrupt, causing Windows to erroniously read (RTC_PF|RTC_IRQF) when it
>> > should be reading 0.
>> So you're undoing a whole lot of changes done with the goal of
>> getting the overall emulation closer to what real hardware does,
>> just to paper over an issue elsewhere in the code? Not really an
>> approach I'm in favor of.
> My understanding was that the problem is explicitly in the change to
> how RTC code is called from vpt code.
> Originally, the RTC callback was called from pt_intr_post, like other
> vpt sources.  Your rework changed it to be called much earlier, when
> the vpt was considering which time source to choose.  AIUI that was to
> let the RTC code tell the VPT not to inject, if the guest hasn't acked
> the last interrupt, right?

Not only, it was also a necessary pre-adjustment for the !PIE case
to work (see below).

> Since that was changed later to allow a certain number of dead ticks
> before deciding to stop the timer chain, the decision no longer has to
> be made so early -- we can allow one more IRQ to go in and then
> disable it. 
> That is the main change of this cset:  we go back to driving
> the interrupt from the vpt code and fixing up the RTC state after vpt
> tells us it's injected an interrupt.

And that's what is wrong imo, as it doesn't allow driving PF correctly
when !PIE.

>> > -        rtc_update_irq(s);
>> So given the problem description, this would seem to be the most
>> important part at a first glance. But looking more closely, I'm getting
>> the impression that the call to rtc_update_irq() had no effect at all
>> here anyway: The function would always bail on the second if() due
>> to REG_C having got cleared a few lines up.
> Yeah, this has nothing to do with the bug being fixed here.  The old
> REG_C read was operating correctly, but on the return-to-guest path:
>  - vpt sees another RTC interrupt is due and calls RTC code
>  - RTC code sees REG_C clear, sets PF|IRQF and asserts the line
>  - vlapic code sees the last interrupt is still in the ISR and does
>    nothing;
>  - we return to the guest having set IRQF but not consumed a timer
>    event, so vpt stste is the same
>  - the guest sees the old REG_C, with PF|IRQF set, and re-reads, 
>    waiting for a read of 0.
>  - repeat forever.

Which would call for a flag suppressing the setting of PF|IRQF
until the timer event got consumed. Possibly with some safety
belt for this to not get deferred indefinitely (albeit if the interrupt
doesn't get injected for extended periods of time, the guest
would presumably have more severe problems than these flags
not getting updated as expected).


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.