
Re: [Xen-devel] [Patch] x86/HVM: Fix RTC interrupt modelling

On 10/02/14 12:17, Andrew Cooper wrote:
> This reverts large amounts of:
>   9607327abbd3e77bde6cc7b5327f3efd781fc06e
>     "x86/HVM: properly handle RTC periodic timer even when !RTC_PIE"
>   620d5dad54008e40798c4a0c4322aef274c36fa3
>     "x86/HVM: assorted RTC emulation adjustments"
> and by extension:
>   f3347f520cb4d8aa4566182b013c6758d80cbe88
>     "x86/HVM: adjust IRQ (de-)assertion"
>   c2f79c464849e5f796aa9d1d0f26fe356abd1a1a
>     "x86/HVM: fix processing of RTC REG_B writes"
>   527824f41f5fac9cba3d4441b2e73d3118d98837
>     "x86/hvm: Centralize and simplify the RTC IRQ logic."
> The current code has a pathological case, tickled by the access pattern of
> Windows 2003 Server SP2.  Occasionally on boot (which I presume is during a
> time calibration against the RTC Periodic Timer), Windows gets stuck in an
> infinite loop reading RTC REG_C.  This affects 32-bit and 64-bit guests.
> In the pathological case, the VM state looks like this:
>   * RTC: 64Hz period, periodic interrupts enabled
>   * RTC_IRQ in IOAPIC as vector 0xd1, edge triggered, not pending
>   * vector 0xd1 set in LAPIC IRR and ISR, TPR at 0xd0
>   * Reads from REG_C return 'RTC_PF | RTC_IRQF'
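As an illustration of why this state leaves the RTC vector undeliverable (not part of the patch): a minimal sketch of the local APIC priority check, assuming the usual x86 rule that a vector is delivered only if its priority class (bits 7:4) is above that of the processor-priority register. The function names are hypothetical and do not correspond to Xen's vlapic code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Processor-priority register derived from the task-priority register (tpr)
 * and the highest in-service vector (isrv).  If the TPR class is at least
 * the in-service class, PPR is the TPR; otherwise it is the in-service
 * class with the low nibble cleared.
 */
static uint8_t toy_lapic_ppr(uint8_t tpr, uint8_t isrv)
{
    if ((tpr & 0xf0) >= (isrv & 0xf0))
        return tpr;
    return isrv & 0xf0;
}

/* A pending vector is deliverable only if its class exceeds the PPR class. */
static bool toy_lapic_can_deliver(uint8_t vector, uint8_t tpr, uint8_t isrv)
{
    return (vector & 0xf0) > (toy_lapic_ppr(tpr, isrv) & 0xf0);
}
```

With vector 0xd1 in service and TPR at 0xd0, toy_lapic_can_deliver(0xd1, 0xd0, 0xd1) is false, so a second 0xd1 sitting in the IRR stays pending until the guest issues an EOI and lowers its TPR.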
> With an instrumented Xen, dumping the periodic timers with a guest in this
> state shows a single timer with pt->irq_issued=1 and pt->pending_intr_nr=2.
> Windows is presumably waiting for reads of REG_C to drop to 0, and reading
> REG_C clears the value each time in the emulated RTC.  However:
>   * {svm,vmx}_intr_assist() calls pt_update_irq() unconditionally.
>   * pt_update_irq() always finds the RTC as earliest_pt.
>   * rtc_periodic_interrupt() unconditionally sets RTC_PF in no_ack mode.  It
>     returns true, indicating that pt_update_irq() should really inject the
>     interrupt.
>   * pt_update_irq() decides that it doesn't need to fake up part of
>     pt_intr_post() because this is a real interrupt.
>   * {svm,vmx}_intr_assist() can't inject the interrupt as it is already
>     pending, so exits early without calling pt_intr_post().
> The underlying problem here arises because the AF and UF bits of the RTC
> interrupt state are modelled by the RTC code, but PF is modelled by the pt
> code.  The root cause of the Windows infinite loop is that RTC_PF is being
> re-set on vmentry before the interrupt logic has worked out that it can't
> actually inject an RTC interrupt, causing Windows to erroneously read
> (RTC_PF|RTC_IRQF) when it should be reading 0.
> This patch reverts the RTC_PF logic handling to its former state, whereby
> rtc_periodic_cb() is called strictly when the periodic timer logic has
> successfully injected a periodic interrupt.  In doing so, it is important that
> the RTC code itself never directly triggers an interrupt for the periodic
> timer (other than the case when setting REG_B.PIE, where the pt code will have
> dropped the interrupt).
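To make the ordering problem concrete, here is a toy model of the sequence described above (not the Xen code). It assumes REG_C is clear-on-read, that the flag values are the standard MC146818 ones (RTC_PF 0x40, RTC_IRQF 0x80), and that the guest polls REG_C while its RTC vector cannot be injected. All toy_* names are hypothetical. The set_pf_early flag selects between latching PF before the injection check and latching it only after a successful injection.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_PF   0x40  /* REG_C periodic interrupt flag */
#define RTC_IRQF 0x80  /* REG_C interrupt request flag */

/* Toy state: emulated REG_C plus a count of periodic ticks awaiting delivery. */
struct toy_rtc {
    uint8_t reg_c;
    int pending;
};

/* Guest read of REG_C: return the current flags, then clear them. */
static uint8_t toy_read_reg_c(struct toy_rtc *rtc)
{
    uint8_t val = rtc->reg_c;
    rtc->reg_c = 0;
    return val;
}

/*
 * One pass through the vmentry interrupt path.
 *   set_pf_early == true : PF is latched before checking for injectability
 *                          (the ordering the patch description blames).
 *   set_pf_early == false: PF is latched only once a tick is really injected
 *                          (the ordering the patch reverts to).
 */
static void toy_vmentry(struct toy_rtc *rtc, bool set_pf_early, bool can_inject)
{
    if (rtc->pending == 0)
        return;
    if (set_pf_early)
        rtc->reg_c |= RTC_PF | RTC_IRQF;
    if (!can_inject)
        return;             /* tick stays pending, nothing is posted */
    rtc->pending--;         /* tick delivered */
    if (!set_pf_early)
        rtc->reg_c |= RTC_PF | RTC_IRQF;
}

/*
 * Guest polls REG_C with its RTC vector blocked.  Returns the number of the
 * read that first sees 0, or -1 if no read within max_reads sees 0.
 */
static int toy_poll_until_clear(bool set_pf_early, int max_reads)
{
    struct toy_rtc rtc = { .reg_c = RTC_PF | RTC_IRQF, .pending = 2 };

    for (int reads = 1; reads <= max_reads; reads++) {
        if (toy_read_reg_c(&rtc) == 0)
            return reads;
        toy_vmentry(&rtc, set_pf_early, false);
    }
    return -1;
}
```

In this model toy_poll_until_clear(true, 1000) never observes 0, while toy_poll_until_clear(false, 1000) sees 0 on the second read, which is the behaviour Windows is presumed to be waiting for.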
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Signed-off-by: Tim Deegan <tim@xxxxxxx>
> CC: Keir Fraser <keir@xxxxxxx>
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> ---
> I still don't know exactly what condition causes Windows to tickle this
> behaviour.  It is seen about 1 or 2 times in 9 tests running a 12 hour VM
> lifecycle test.  Over the weekend, 100 of these tests have passed without a
> single recurrence of the infinite loop.  The change has also passed a Windows
> extended regression test, so it would appear that other versions of Windows
> are still fine with the change.
> Roger: as this caused issues for FreeBSD, would you mind testing it again
> please?

Tested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
On FreeBSD 10.0, 9.2 and 8.4

No apparent regressions AFAICT.
