Xen project Mailing List

Re: [Xen-devel] [Patch] x86/HVM: Fix RTC interrupt modelling

From: Tim Deegan <tim@xxxxxxx>

Date: Mon, 10 Feb 2014 18:21:40 +0100

Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>, KeirFraser <keir@xxxxxxx>, roger.pau@xxxxxxxxxx

Delivery-date: Mon, 10 Feb 2014 17:21:58 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

At 16:34 +0000 on 10 Feb (1392046444), Jan Beulich wrote: > > The underlying problem here comes because the AF and UF bits of RTC > > interrupt > > state is modelled by the RTC code, but the PF is modelled by the pt code. > > The > > root cause of windows infinite loop is that RTC_PF is being re-set on > > vmentry > > before the interrupt logic has worked out that it can't actually inject an > > RTC > > interrupt, causing Windows to erroniously read (RTC_PF|RTC_IRQF) when it > > should be reading 0. > > So you're undoing a whole lot of changes done with the goal of > getting the overall emulation closer to what real hardware does, > just to paper over an issue elsewhere in the code? Not really an > approach I'm in favor of. My understanding was that the problem is explicitly in the change to how RTC code is called from vpt code. Originally, the RTC callback was called from pt_intr_post, like other vpt sources. Your rework changed it to be called much earlier, when the vpt was considering which time source to choose. AIUI that was to let the RTC code tell the VPT not to inject, if the guest hasn't acked the last interrupt, right? Since that was changed later to allow a certain number of dead ticks before deciding to stop the timer chain, the decision no longer has to be made so early -- we can allow one more IRQ to go in and then disable it. That is the main change of this cset: we go back to driving the interrupt from the vpt code and fixing up the RTC state after vpt tells us it's injected an interrupt. > > + if ( new_b & new_c & (RTC_PF | RTC_AF | RTC_UF) ) > > + new_c |= RTC_IRQF; > > Without going back to reading the spec, iirc RTC_IRQF is a sticky > bit cleared _only_ be REG_C reads, i.e. you shouldn't clear it > earlier in the function and then conditionally set it here. All those bits are sticky, so if IRQF was set before it will be now; but true that we could drop the mask at the top of the function. (+ again in rtc_irq_event) > > > > - /* IRQ is raised if any source is both raised & enabled */ > > - if ( !(s->hw.cmos_data[RTC_REG_B] & > > - s->hw.cmos_data[RTC_REG_C] & > > - (RTC_PF | RTC_AF | RTC_UF)) ) > > - return; > > + s->hw.cmos_data[RTC_REG_B] = new_b; > > + s->hw.cmos_data[RTC_REG_C] = new_c; > > > > - s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF; > > - if ( rtc_mode_is(s, no_ack) ) > > - hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ); > > - hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ); > > + if ( new_c & RTC_IRQF ) > > + rtc_toggle_irq(s); > > Which then implies that the condition here would also need to > consider the old state of the flag. Sure. > > @@ -647,9 +674,6 @@ static uint32_t rtc_ioport_read(RTCState *s, uint32_t > > addr) > > case RTC_REG_C: > > ret = s->hw.cmos_data[s->hw.cmos_index]; > > s->hw.cmos_data[RTC_REG_C] = 0x00; > > - if ( (ret & RTC_IRQF) && !rtc_mode_is(s, no_ack) ) > > - hvm_isa_irq_deassert(d, RTC_IRQ); > > Why? With RTC_IRQF going from 1 to 0, the interrupt line should > get de-asserted. After this patch the RTC model only sends edges (as mentioned in the description, maybe not clearly enough). I think that might be caused by some confusion on my part. I'll go into it a bit more below. > > - rtc_update_irq(s); > > So given the problem description, this would seem to be the most > important part at a first glance. But looking more closely, I'm getting > the impression that the call to rtc_update_irq() had no effect at all > here anyway: The function would always bail on the second if() due > to REG_C having got cleared a few lines up. Yeah, this has nothing to do with the bug being fixed here. The old REG_C read was operating correctly, but on the return-to-guest path: - vpt sees another RTC interrupt is due and calls RTC code - RTC code sees REG_C clear, sets PF|IRQF and asserts the line - vlapic code sees the last interrupt is still in the ISR and does nothing; - we return to the guest having set IRQF but not consumed a timer event, so vpt stste is the same - the guest sees the old REG_C, with PF|IRQF set, and re-reads, waiting for a read of 0. - repeat forever. AFAICS this is caused by the disconnect where the RTC tries to reinject but doesn't use up a tick. A real RTC doesn't have that problem, because it doesn't have all the scheduling artefacts that the VPT code is intended to work around. So IMO the right fix is to go back to letting the VPT code control IRQ injection for the PF source, which is what this patch does. (I don't consider it to be a revert of any particular csets, BTW -- there have been a lot of other improvements in the RTC as part of that and later series that this leaves in place). The switch to always sending edges is incidental, and I'd be happy to get rid of it. IIRC the reason we ended up with that is that when the vpt code injects an interrupt, we need to set IRQF|PF in REG_C but _not_ assert the line (which would cause a second interrupt). Tracking that disconnect seemed tricky, esp. across save/restore. If you have any idea how we could sort that out, I'd be inclined to go back to a level-triggered model (i.e. following the RTC spec) even in no-ack mode. Cheers, Tim. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.