Re: [Xen-devel] Commit moratorium to staging



On Tue, Oct 31, 2017 at 10:49:35AM +0000, Julien Grall wrote:
> Hi all,
> 
> Master lags 15 days behind staging due to tests failing reliably on some of
> the hardware in osstest (see [1]).
> 
> At the moment a force push is not feasible because the same tests pass on
> different hardware (see [2]).

I've been looking into this, and I'm afraid I don't yet have a cause
for those issues. I'm going to post what I've found so far, in case
someone is able to spot something I'm missing.

Since I assumed this was somehow related to the ACPI PM1A_STS/EN
blocks (which is how the power button event gets notified to the OS),
I've added the following instrumentation to the pmtimer.c code:

diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c
index 435647ff1e..051fc46df8 100644
--- a/xen/arch/x86/hvm/pmtimer.c
+++ b/xen/arch/x86/hvm/pmtimer.c
@@ -61,9 +61,15 @@ static void pmt_update_sci(PMTState *s)
     ASSERT(spin_is_locked(&s->lock));
 
     if ( acpi->pm1a_en & acpi->pm1a_sts & SCI_MASK )
+    {
+        printk("asserting SCI IRQ\n");
         hvm_isa_irq_assert(s->vcpu->domain, SCI_IRQ, NULL);
+    }
     else
+    {
+        printk("de-asserting SCI IRQ\n");
         hvm_isa_irq_deassert(s->vcpu->domain, SCI_IRQ);
+    }
 }
 
 void hvm_acpi_power_button(struct domain *d)
@@ -73,6 +79,7 @@ void hvm_acpi_power_button(struct domain *d)
     if ( !has_vpm(d) )
         return;
 
+    printk("hvm_acpi_power_button for d%d\n", d->domain_id);
     spin_lock(&s->lock);
     d->arch.hvm_domain.acpi.pm1a_sts |= PWRBTN_STS;
     pmt_update_sci(s);
@@ -86,6 +93,7 @@ void hvm_acpi_sleep_button(struct domain *d)
     if ( !has_vpm(d) )
         return;
 
+    printk("hvm_acpi_sleep_button for d%d\n", d->domain_id);
     spin_lock(&s->lock);
     d->arch.hvm_domain.acpi.pm1a_sts |= PWRBTN_STS;
     pmt_update_sci(s);
@@ -170,6 +178,7 @@ static int handle_evt_io(
 
     if ( dir == IOREQ_WRITE )
     {
+        printk("write PM1a addr: %#x val: %#x\n", addr, *val);
         /* Handle this I/O one byte at a time */
         for ( i = bytes, data = *val;
               i > 0;
@@ -197,6 +206,8 @@ static int handle_evt_io(
                          bytes, *val, port);
             }
         }
+        printk("result pm1a_sts: %#x pm1a_en: %#x\n",
+              acpi->pm1a_sts, acpi->pm1a_en);
         /* Fix up the SCI state to match the new register state */
         pmt_update_sci(s);
     }

I then reran the failing test, and this is what I got in the
failure case (i.e. Windows ignoring the power event):

(XEN) hvm_acpi_power_button for d14
(XEN) asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0x1
(XEN) result pm1a_sts: 0x100 pm1a_en: 0x320
(XEN) asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0x100
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ
(XEN) write PM1a addr: 0x2 val: 0x320
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ

Strangely enough, the second time I tried the same command (xl
shutdown -wF ...) on the same guest, it succeeded and Windows shut down
without issues. This is the log in that case:

(XEN) hvm_acpi_power_button for d14
(XEN) asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0x1
(XEN) result pm1a_sts: 0x100 pm1a_en: 0x320
(XEN) asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0x100
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ
(XEN) write PM1a addr: 0x2 val: 0x320
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ
(XEN) write PM1a addr: 0x2 val: 0x320
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0x8000
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ

I have to admit I have no idea why Windows clears the STS power bit
and then completely ignores it on certain occasions.

I'm also afraid I have no idea how to debug Windows in order to know
why this event is acknowledged but ignored.

I've also tried to reproduce the same with a Debian guest, by doing
the same number of save/restores and migrations, and finally issuing
an xl trigger <guest> power, but Debian has always worked fine and
shut down.

Any comments are welcome.

Roger.
