Xen project Mailing List

Re: [Xen-devel] [Patch] x86/HVM: Fix RTC interrupt modelling

To: "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Mon, 10 Feb 2014 16:34:04 +0000

Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>, KeirFraser <keir@xxxxxxx>, roger.pau@xxxxxxxxxx

Delivery-date: Mon, 10 Feb 2014 16:34:30 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 10.02.14 at 12:17, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote: > This reverts large amounts of: > 9607327abbd3e77bde6cc7b5327f3efd781fc06e > "x86/HVM: properly handle RTC periodic timer even when !RTC_PIE" > 620d5dad54008e40798c4a0c4322aef274c36fa3 > "x86/HVM: assorted RTC emulation adjustments" > > and by extentsion: > f3347f520cb4d8aa4566182b013c6758d80cbe88 > "x86/HVM: adjust IRQ (de-)assertion" > c2f79c464849e5f796aa9d1d0f26fe356abd1a1a > "x86/HVM: fix processing of RTC REG_B writes" > 527824f41f5fac9cba3d4441b2e73d3118d98837 > "x86/hvm: Centralize and simplify the RTC IRQ logic." So what does "by extension" mean here? Are these being reverted? > The current code has a pathological case, tickled by the access pattern of > Windows 2003 Server SP2. Occasonally on boot (which I presume is during a > time calibration against the RTC Periodic Timer), Windows gets stuck in an > infinite loop reading RTC REG_C. This affects 32 and 64 bit guests. > > In the pathological case, the VM state looks like this: > * RTC: 64Hz period, periodic interrupts enabled > * RTC_IRQ in IOAPIC as vector 0xd1, edge triggered, not pending > * vector 0xd1 set in LAPIC IRR and ISR, TPR at 0xd0 > * Reads from REG_C return 'RTC_PF | RTC_IRQF' > > With an intstrumented Xen, dumping the periodic timers with a guest in this > state shows a single timer with pt->irq_issued=1 and pt->pending_intr_nr=2. > > Windows is presumably waiting for reads of REG_C to drop to 0, and reading > REG_C clears the value each time in the emulated RTC. However: > > * {svm,vmx}_intr_assist() calls pt_update_irq() unconditionally. > * pt_update_irq() always finds the RTC as earliest_pt. > * rtc_periodic_interrupt() unconditionally sets RTC_PF in no_ack mode. It > returns true, indicating that pt_update_irq() should really inject the > interrupt. > * pt_update_irq() decides that it doesn't need to fake up part of > pt_intr_post() because this is a real interrupt. > * {svm,vmx}_intr_assist() can't inject the interrupt as it is already > pending, so exits early without calling pt_intr_post(). > > The underlying problem here comes because the AF and UF bits of RTC > interrupt > state is modelled by the RTC code, but the PF is modelled by the pt code. > The > root cause of windows infinite loop is that RTC_PF is being re-set on > vmentry > before the interrupt logic has worked out that it can't actually inject an > RTC > interrupt, causing Windows to erroniously read (RTC_PF|RTC_IRQF) when it > should be reading 0. So you're undoing a whole lot of changes done with the goal of getting the overall emulation closer to what real hardware does, just to paper over an issue elsewhere in the code? Not really an approach I'm in favor of. > * The results from XenRT suggest that the new emulation is better than the > old. "Better" in the sense of the limited set of uses of the virtual hardware by whatever selection of guest OSes is being run there. But very likely not "better" in the sense on matching up with how the respective hardware specification would require it to behave. > --- a/xen/arch/x86/hvm/rtc.c > +++ b/xen/arch/x86/hvm/rtc.c > @@ -59,48 +59,78 @@ static void rtc_set_time(RTCState *s); > static inline int from_bcd(RTCState *s, int a); > static inline int convert_hour(RTCState *s, int hour); > > -static void rtc_update_irq(RTCState *s) > +/* > + * Send an edge on the RTC ISA IRQ line. The RTC spec states that it should > + * be a line level interrupt, but the PIIX3 states that it must be edge > + * triggered. We model the RTC using edge semantics. > + */ > +static void rtc_toggle_irq(RTCState *s) > { > + hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ); > + hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ); > +} > + > +static void rtc_update_regb(RTCState *s, uint8_t new_b) > +{ > + uint8_t new_c = s->hw.cmos_data[RTC_REG_C] & ~RTC_IRQF; > + > ASSERT(spin_is_locked(&s->lock)); > > - if ( rtc_mode_is(s, strict) && (s->hw.cmos_data[RTC_REG_C] & RTC_IRQF) ) > - return; > + if ( new_b & new_c & (RTC_PF | RTC_AF | RTC_UF) ) > + new_c |= RTC_IRQF; Without going back to reading the spec, iirc RTC_IRQF is a sticky bit cleared _only_ be REG_C reads, i.e. you shouldn't clear it earlier in the function and then conditionally set it here. > > - /* IRQ is raised if any source is both raised & enabled */ > - if ( !(s->hw.cmos_data[RTC_REG_B] & > - s->hw.cmos_data[RTC_REG_C] & > - (RTC_PF | RTC_AF | RTC_UF)) ) > - return; > + s->hw.cmos_data[RTC_REG_B] = new_b; > + s->hw.cmos_data[RTC_REG_C] = new_c; > > - s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF; > - if ( rtc_mode_is(s, no_ack) ) > - hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ); > - hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ); > + if ( new_c & RTC_IRQF ) > + rtc_toggle_irq(s); Which then implies that the condition here would also need to consider the old state of the flag. > +static void rtc_irq_event(RTCState *s, uint8_t event) > +{ > + uint8_t b = s->hw.cmos_data[RTC_REG_B]; > + uint8_t old_c = s->hw.cmos_data[RTC_REG_C]; > + uint8_t new_c = old_c & ~RTC_IRQF; > + > + ASSERT(spin_is_locked(&s->lock)); > + > + if ( b & new_c & (RTC_PF | RTC_AF | RTC_UF) ) > + new_c |= RTC_IRQF; Same comment as above. > @@ -647,9 +674,6 @@ static uint32_t rtc_ioport_read(RTCState *s, uint32_t > addr) > case RTC_REG_C: > ret = s->hw.cmos_data[s->hw.cmos_index]; > s->hw.cmos_data[RTC_REG_C] = 0x00; > - if ( (ret & RTC_IRQF) && !rtc_mode_is(s, no_ack) ) > - hvm_isa_irq_deassert(d, RTC_IRQ); Why? With RTC_IRQF going from 1 to 0, the interrupt line should get de-asserted. > - rtc_update_irq(s); So given the problem description, this would seem to be the most important part at a first glance. But looking more closely, I'm getting the impression that the call to rtc_update_irq() had no effect at all here anyway: The function would always bail on the second if() due to REG_C having got cleared a few lines up. > @@ -270,47 +263,12 @@ int pt_update_irq(struct vcpu *v) > earliest_pt->irq_issued = 1; > irq = earliest_pt->irq; > is_lapic = (earliest_pt->source == PTSRC_lapic); > - pt_priv = earliest_pt->priv; > > spin_unlock(&v->arch.hvm_vcpu.tm_lock); > > if ( is_lapic ) > - vlapic_set_irq(vcpu_vlapic(v), irq, 0); > - else if ( irq == RTC_IRQ && pt_priv ) > { > - if ( !rtc_periodic_interrupt(pt_priv) ) > - irq = -1; > - > - pt_lock(earliest_pt); > - > - if ( irq < 0 && earliest_pt->pending_intr_nr ) > - { > - /* > - * RTC periodic timer runs without the corresponding interrupt > - * being enabled - need to mimic enough of pt_intr_post() to keep > - * things going. > - */ > - earliest_pt->pending_intr_nr = 0; > - earliest_pt->irq_issued = 0; > - set_timer(&earliest_pt->timer, earliest_pt->scheduled); > - } > - else if ( irq >= 0 && pt_irq_masked(earliest_pt) ) > - { > - if ( earliest_pt->on_list ) > - { > - /* suspend timer emulation */ > - list_del(&earliest_pt->list); > - earliest_pt->on_list = 0; > - } > - irq = -1; > - } > - > - /* Avoid dropping the lock if we can. */ > - if ( irq < 0 && v == earliest_pt->vcpu ) > - goto rescan_locked; > - pt_unlock(earliest_pt); > - if ( irq < 0 ) > - goto rescan; > + vlapic_set_irq(vcpu_vlapic(v), irq, 0); If you didn't put this single function call in braces, the patch would become more clear, as it would then be exactly the "else if()" branch that got removed by it. As a round-up: I'm not going to veto this, but I'm also not going to be putting my name under it, nor am I going to make another attempt to clean up the RTC emulation if this is to go in unchanged. I'm personally getting the impression that the root cause of the observed problem is still being left in place (and perhaps still not being fully understood), and hence this whole change goes in the wrong direction, _even_ if it makes the problem it is aiming at fixing indeed appear to go away. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.