[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [Patch] x86/HVM: Fix RTC interrupt modelling



>>> On 10.02.14 at 12:17, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> This reverts large amounts of:
>   9607327abbd3e77bde6cc7b5327f3efd781fc06e
>     "x86/HVM: properly handle RTC periodic timer even when !RTC_PIE"
>   620d5dad54008e40798c4a0c4322aef274c36fa3
>     "x86/HVM: assorted RTC emulation adjustments"
> 
> and by extentsion:
>   f3347f520cb4d8aa4566182b013c6758d80cbe88
>     "x86/HVM: adjust IRQ (de-)assertion"
>   c2f79c464849e5f796aa9d1d0f26fe356abd1a1a
>     "x86/HVM: fix processing of RTC REG_B writes"
>   527824f41f5fac9cba3d4441b2e73d3118d98837
>     "x86/hvm: Centralize and simplify the RTC IRQ logic."

So what does "by extension" mean here? Are these being
reverted?

> The current code has a pathological case, tickled by the access pattern of
> Windows 2003 Server SP2.  Occasonally on boot (which I presume is during a
> time calibration against the RTC Periodic Timer), Windows gets stuck in an
> infinite loop reading RTC REG_C.  This affects 32 and 64 bit guests.
> 
> In the pathological case, the VM state looks like this:
>   * RTC: 64Hz period, periodic interrupts enabled
>   * RTC_IRQ in IOAPIC as vector 0xd1, edge triggered, not pending
>   * vector 0xd1 set in LAPIC IRR and ISR, TPR at 0xd0
>   * Reads from REG_C return 'RTC_PF | RTC_IRQF'
> 
> With an intstrumented Xen, dumping the periodic timers with a guest in this
> state shows a single timer with pt->irq_issued=1 and pt->pending_intr_nr=2.
> 
> Windows is presumably waiting for reads of REG_C to drop to 0, and reading
> REG_C clears the value each time in the emulated RTC.  However:
> 
>   * {svm,vmx}_intr_assist() calls pt_update_irq() unconditionally.
>   * pt_update_irq() always finds the RTC as earliest_pt.
>   * rtc_periodic_interrupt() unconditionally sets RTC_PF in no_ack mode.  It
>     returns true, indicating that pt_update_irq() should really inject the
>     interrupt.
>   * pt_update_irq() decides that it doesn't need to fake up part of
>     pt_intr_post() because this is a real interrupt.
>   * {svm,vmx}_intr_assist() can't inject the interrupt as it is already
>     pending, so exits early without calling pt_intr_post().
> 
> The underlying problem here comes because the AF and UF bits of RTC 
> interrupt
> state is modelled by the RTC code, but the PF is modelled by the pt code.  
> The
> root cause of windows infinite loop is that RTC_PF is being re-set on 
> vmentry
> before the interrupt logic has worked out that it can't actually inject an 
> RTC
> interrupt, causing Windows to erroniously read (RTC_PF|RTC_IRQF) when it
> should be reading 0.

So you're undoing a whole lot of changes done with the goal of
getting the overall emulation closer to what real hardware does,
just to paper over an issue elsewhere in the code? Not really an
approach I'm in favor of.

>   * The results from XenRT suggest that the new emulation is better than the
>     old.

"Better" in the sense of the limited set of uses of the virtual hardware
by whatever selection of guest OSes is being run there. But very
likely not "better" in the sense on matching up with how the respective
hardware specification would require it to behave.

> --- a/xen/arch/x86/hvm/rtc.c
> +++ b/xen/arch/x86/hvm/rtc.c
> @@ -59,48 +59,78 @@ static void rtc_set_time(RTCState *s);
>  static inline int from_bcd(RTCState *s, int a);
>  static inline int convert_hour(RTCState *s, int hour);
>  
> -static void rtc_update_irq(RTCState *s)
> +/*
> + * Send an edge on the RTC ISA IRQ line.  The RTC spec states that it should
> + * be a line level interrupt, but the PIIX3 states that it must be edge
> + * triggered.  We model the RTC using edge semantics.
> + */
> +static void rtc_toggle_irq(RTCState *s)
>  {
> +    hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ);
> +    hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ);
> +}
> +
> +static void rtc_update_regb(RTCState *s, uint8_t new_b)
> +{
> +    uint8_t new_c = s->hw.cmos_data[RTC_REG_C] & ~RTC_IRQF;
> +
>      ASSERT(spin_is_locked(&s->lock));
>  
> -    if ( rtc_mode_is(s, strict) && (s->hw.cmos_data[RTC_REG_C] & RTC_IRQF) )
> -        return;
> +    if ( new_b & new_c & (RTC_PF | RTC_AF | RTC_UF) )
> +        new_c |= RTC_IRQF;

Without going back to reading the spec, iirc RTC_IRQF is a sticky
bit cleared _only_ be REG_C reads, i.e. you shouldn't clear it
earlier in the function and then conditionally set it here.

>  
> -    /* IRQ is raised if any source is both raised & enabled */
> -    if ( !(s->hw.cmos_data[RTC_REG_B] &
> -           s->hw.cmos_data[RTC_REG_C] &
> -           (RTC_PF | RTC_AF | RTC_UF)) )
> -        return;
> +    s->hw.cmos_data[RTC_REG_B] = new_b;
> +    s->hw.cmos_data[RTC_REG_C] = new_c;
>  
> -    s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
> -    if ( rtc_mode_is(s, no_ack) )
> -        hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ);
> -    hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ);
> +    if ( new_c & RTC_IRQF )
> +        rtc_toggle_irq(s);

Which then implies that the condition here would also need to
consider the old state of the flag.

> +static void rtc_irq_event(RTCState *s, uint8_t event)
> +{
> +    uint8_t b = s->hw.cmos_data[RTC_REG_B];
> +    uint8_t old_c = s->hw.cmos_data[RTC_REG_C];
> +    uint8_t new_c = old_c & ~RTC_IRQF;
> +
> +    ASSERT(spin_is_locked(&s->lock));
> +
> +    if ( b & new_c & (RTC_PF | RTC_AF | RTC_UF) )
> +        new_c |= RTC_IRQF;

Same comment as above.

> @@ -647,9 +674,6 @@ static uint32_t rtc_ioport_read(RTCState *s, uint32_t 
> addr)
>      case RTC_REG_C:
>          ret = s->hw.cmos_data[s->hw.cmos_index];
>          s->hw.cmos_data[RTC_REG_C] = 0x00;
> -        if ( (ret & RTC_IRQF) && !rtc_mode_is(s, no_ack) )
> -            hvm_isa_irq_deassert(d, RTC_IRQ);

Why? With RTC_IRQF going from 1 to 0, the interrupt line should
get de-asserted.

> -        rtc_update_irq(s);

So given the problem description, this would seem to be the most
important part at a first glance. But looking more closely, I'm getting
the impression that the call to rtc_update_irq() had no effect at all
here anyway: The function would always bail on the second if() due
to REG_C having got cleared a few lines up.

> @@ -270,47 +263,12 @@ int pt_update_irq(struct vcpu *v)
>      earliest_pt->irq_issued = 1;
>      irq = earliest_pt->irq;
>      is_lapic = (earliest_pt->source == PTSRC_lapic);
> -    pt_priv = earliest_pt->priv;
>  
>      spin_unlock(&v->arch.hvm_vcpu.tm_lock);
>  
>      if ( is_lapic )
> -        vlapic_set_irq(vcpu_vlapic(v), irq, 0);
> -    else if ( irq == RTC_IRQ && pt_priv )
>      {
> -        if ( !rtc_periodic_interrupt(pt_priv) )
> -            irq = -1;
> -
> -        pt_lock(earliest_pt);
> -
> -        if ( irq < 0 && earliest_pt->pending_intr_nr )
> -        {
> -            /*
> -             * RTC periodic timer runs without the corresponding interrupt
> -             * being enabled - need to mimic enough of pt_intr_post() to keep
> -             * things going.
> -             */
> -            earliest_pt->pending_intr_nr = 0;
> -            earliest_pt->irq_issued = 0;
> -            set_timer(&earliest_pt->timer, earliest_pt->scheduled);
> -        }
> -        else if ( irq >= 0 && pt_irq_masked(earliest_pt) )
> -        {
> -            if ( earliest_pt->on_list )
> -            {
> -                /* suspend timer emulation */
> -                list_del(&earliest_pt->list);
> -                earliest_pt->on_list = 0;
> -            }
> -            irq = -1;
> -        }
> -
> -        /* Avoid dropping the lock if we can. */
> -        if ( irq < 0 && v == earliest_pt->vcpu )
> -            goto rescan_locked;
> -        pt_unlock(earliest_pt);
> -        if ( irq < 0 )
> -            goto rescan;
> +        vlapic_set_irq(vcpu_vlapic(v), irq, 0);

If you didn't put this single function call in braces, the patch would
become more clear, as it would then be exactly the "else if()"
branch that got removed by it.

As a round-up: I'm not going to veto this, but I'm also not going to
be putting my name under it, nor am I going to make another
attempt to clean up the RTC emulation if this is to go in unchanged.
I'm personally getting the impression that the root cause of the
observed problem is still being left in place (and perhaps still not
being fully understood), and hence this whole change goes in the
wrong direction, _even_ if it makes the problem it is aiming at
fixing indeed appear to go away.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.