Xen project Mailing List

Re: [Xen-devel] [Patch] x86/HVM: Fix RTC interrupt modelling

To: "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Tue, 11 Feb 2014 09:28:28 +0000

Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>, KeirFraser <keir@xxxxxxx>, roger.pau@xxxxxxxxxx

Delivery-date: Tue, 11 Feb 2014 09:28:49 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 10.02.14 at 18:13, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: > On 10/02/14 16:34, Jan Beulich wrote: >>>>> On 10.02.14 at 12:17, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: >>> This reverts large amounts of: >>> 9607327abbd3e77bde6cc7b5327f3efd781fc06e >>> "x86/HVM: properly handle RTC periodic timer even when !RTC_PIE" >>> 620d5dad54008e40798c4a0c4322aef274c36fa3 >>> "x86/HVM: assorted RTC emulation adjustments" >>> >>> and by extentsion: >>> f3347f520cb4d8aa4566182b013c6758d80cbe88 >>> "x86/HVM: adjust IRQ (de-)assertion" >>> c2f79c464849e5f796aa9d1d0f26fe356abd1a1a >>> "x86/HVM: fix processing of RTC REG_B writes" >>> 527824f41f5fac9cba3d4441b2e73d3118d98837 >>> "x86/hvm: Centralize and simplify the RTC IRQ logic." >> So what does "by extension" mean here? Are these being >> reverted? > > The logic here was based on the logic being reverted, so the changes > themselves are mostly gone. > >> >>> The current code has a pathological case, tickled by the access pattern of >>> Windows 2003 Server SP2. Occasonally on boot (which I presume is during a >>> time calibration against the RTC Periodic Timer), Windows gets stuck in an >>> infinite loop reading RTC REG_C. This affects 32 and 64 bit guests. >>> >>> In the pathological case, the VM state looks like this: >>> * RTC: 64Hz period, periodic interrupts enabled >>> * RTC_IRQ in IOAPIC as vector 0xd1, edge triggered, not pending >>> * vector 0xd1 set in LAPIC IRR and ISR, TPR at 0xd0 >>> * Reads from REG_C return 'RTC_PF | RTC_IRQF' >>> >>> With an intstrumented Xen, dumping the periodic timers with a guest in this >>> state shows a single timer with pt->irq_issued=1 and pt->pending_intr_nr=2. >>> >>> Windows is presumably waiting for reads of REG_C to drop to 0, and reading >>> REG_C clears the value each time in the emulated RTC. However: >>> >>> * {svm,vmx}_intr_assist() calls pt_update_irq() unconditionally. >>> * pt_update_irq() always finds the RTC as earliest_pt. >>> * rtc_periodic_interrupt() unconditionally sets RTC_PF in no_ack mode. It >>> returns true, indicating that pt_update_irq() should really inject the >>> interrupt. >>> * pt_update_irq() decides that it doesn't need to fake up part of >>> pt_intr_post() because this is a real interrupt. >>> * {svm,vmx}_intr_assist() can't inject the interrupt as it is already >>> pending, so exits early without calling pt_intr_post(). >>> >>> The underlying problem here comes because the AF and UF bits of RTC >>> interrupt >>> state is modelled by the RTC code, but the PF is modelled by the pt code. >>> The >>> root cause of windows infinite loop is that RTC_PF is being re-set on >>> vmentry >>> before the interrupt logic has worked out that it can't actually inject an >>> RTC >>> interrupt, causing Windows to erroniously read (RTC_PF|RTC_IRQF) when it >>> should be reading 0. >> So you're undoing a whole lot of changes done with the goal of >> getting the overall emulation closer to what real hardware does, >> just to paper over an issue elsewhere in the code? Not really an >> approach I'm in favor of. > > I fail to see how the current code is closer to what hardware does than > this proposed patch. The thing is that if any of the changes you revert (fully or partly) was actively wrong, you should have broken up the patch such that it reverts the broken pieces in individual steps. The way I see the situation right now is that you effectively say "individually all those changes have been fine, but collectively they have a problem which we can't really isolate, so let's revert them altogether". The most prominent example of further diverging from actual hardware behavior is the correct driving of PF without PIE, as just explained in another response on this thread. >>> - if ( rtc_mode_is(s, strict) && (s->hw.cmos_data[RTC_REG_C] & RTC_IRQF) >>> ) >>> - return; >>> + if ( new_b & new_c & (RTC_PF | RTC_AF | RTC_UF) ) >>> + new_c |= RTC_IRQF; >> Without going back to reading the spec, iirc RTC_IRQF is a sticky >> bit cleared _only_ be REG_C reads, i.e. you shouldn't clear it >> earlier in the function and then conditionally set it here. > > The main problem is that the MC146818 states that the interrupt is line > level, while the PIIX3 mandates that it is edge triggered, and our ACPI > tables define it to be edge triggered. > > The datasheet states "The IRQF bit in Register C is a '1' whenever the > ÂIRQ pin is being driven low". With edge semantics where the line never > actually stays asserted, this would degrade to being sticky until read. Edge semantics say nothing about the line remaining asserted, i.e. an edge triggered interrupt could have its line remaining asserted for all the time except when a new interrupt is needed, at which point it would get briefly de-asserted and then asserted again. Hence the interrupt assertion requirements of PIIX3 and those of the MC146818 aren't really contradicting with one another. And our code (in no-ack mode) mostly does what I just described: It de-asserts the line briefly before re-asserting it. In "strict" mode the de-assertion happens when IRQF gets cleared. > However, with edge semantics, we need to send new interrupts when > enabling bits in REG_B even if IRQF is already outstanding, Why? > which is why > the logic starts by masking it back out. Furthermore, in no-ack mode, > we expect IRQF to be set (as the guest isn't reading REG_C), but still > wanting interrupts. Which is precisely what rtc_update_irq() does. >> As a round-up: I'm not going to veto this, but I'm also not going to >> be putting my name under it, nor am I going to make another >> attempt to clean up the RTC emulation if this is to go in unchanged. >> I'm personally getting the impression that the root cause of the >> observed problem is still being left in place (and perhaps still not >> being fully understood), and hence this whole change goes in the >> wrong direction, _even_ if it makes the problem it is aiming at >> fixing indeed appear to go away. > > I am not sure I follow you. The root case of the infinite loop is > because Xen was erroniously setting REG_C.PF, because pt_update_irq() > was trying to pre-guess what the interrupt injection logic would do, > (and getting it wrong). So it would be - as also said in the response to Tim - that interaction between vpt and rtc code that needs further tweaking. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.