Re: [Xen-devel] Xen4.2 S3 regression?



CPU#1 got stuck in a loop in cpu_init() because it is already marked as ‘initialised’ in the cpu_initialized bitmap. CPU#0 detects that it is stuck and carries on, but the resume code assumes all CPUs have been brought back online and crashes later.
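
To make the failure mode above concrete, here is a minimal, self-contained model of it. This is a sketch under stated assumptions, not the Xen source: every name ending in _model is hypothetical, the bitmap is a plain unsigned long, and the "stuck" CPU is represented by a return value instead of an endless loop. It only shows how a bit left set across suspend could produce the "already initialized" and "Stuck ??" messages followed by error -5 (the value of EIO).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS_MODEL 8  /* hypothetical size, for illustration only */
#define EIO_MODEL     5  /* same value as the -5 reported in the log */

/* Simplified stand-in for the cpu_initialized bitmap (one bit per CPU). */
unsigned long cpu_initialized_model;

/* Secondary CPU's view: true if it proceeds, false if it would spin forever. */
bool cpu_init_model(unsigned int cpu)
{
    assert(cpu < NR_CPUS_MODEL);
    unsigned long bit = 1UL << cpu;

    if (cpu_initialized_model & bit) {
        printf("CPU#%u already initialized!\n", cpu);
        return false;  /* a real CPU would loop here and never come online */
    }
    cpu_initialized_model |= bit;
    return true;
}

/* Models taking a CPU offline; clear_bit selects whether the bit is cleaned up. */
void cpu_down_model(unsigned int cpu, bool clear_bit)
{
    assert(cpu < NR_CPUS_MODEL);
    if (clear_bit)
        cpu_initialized_model &= ~(1UL << cpu);
}

/* Boot CPU's view: 0 on success, -EIO_MODEL if the secondary never shows up. */
int cpu_up_model(unsigned int cpu)
{
    if (!cpu_init_model(cpu)) {
        printf("Stuck ??\n");
        return -EIO_MODEL;
    }
    return 0;
}
```

In this model, calling cpu_up_model(1), then cpu_down_model(1, false), then cpu_up_model(1) again returns -5 on the second bringup, while passing true to cpu_down_model lets the second bringup succeed. Whether the real fix belongs in the offline path or elsewhere is not something this sketch can answer.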

I wonder how long this has been broken. I recall reworking the CPU bringup code a lot early during 4.1.0 development... And I didn’t test S3.

 -- Keir

On 19/09/2012 22:07, "Ben Guthro" <ben@xxxxxxxxxx> wrote:

No hardware debugger just yet, but I've moved to another machine (a Lenovo T400 laptop) and am now seeing the following stack trace when I resume
(this is using the tip of the 4.2-testing tree).

It looks like either the vcpu or the runstate is NULL at this point in the resume process...


(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) CPU#1 already initialized!
(XEN) Stuck ??
(XEN) Error taking CPU1 up: -5
[   38.570054] ACPI: Low-level resume complete
[   38.570054] PM: Restoring platform NVS memory
[   38.570054] Enabling non-boot CPUs ...
(XEN) ----[ Xen-4.2.1-pre  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c480120585>] vcpu_runstate_get+0xe5/0x130
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: 00007d3b7fd17180   rbx: ffff8300bd2fe000   rcx: 0000000000000000
(XEN) rdx: ffff08003fc8bd80   rsi: ffff82c48029fe28   rdi: ffff8300bd2fe000
(XEN) rbp: ffff82c48029fe28   rsp: ffff82c48029fdf8   r8:  0000000000000008
(XEN) r9:  00000000000001c0   r10: ffff82c48021f4a0   r11: 0000000000000282
(XEN) r12: ffff82c4802e8ee0   r13: ffff880039762da0   r14: ffff82c4802d3140
(XEN) r15: fffffffffffffff2   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000139ee4000   cr2: 0000000000000060
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029fdf8:
(XEN)    ffff8300bd2fe000 ffff82c48029ff18 ffff880037481d40 ffff880039762da0
(XEN)    0000000000000001 ffff82c480157df4 0000000000000070 ffff82f6016db300
(XEN)    00000000000b6d98 ffff8301355d8000 0000000000000070 ffff82c4801702ab
(XEN)    ffff88003fc8bd80 0000000000000000 0000000000000020 ffff8300bd2fe000
(XEN)    ffff8301355d8000 ffff880037481d40 ffff880039762da0 0000000000000001
(XEN)    0000000000000003 ffff82c4801058df ffff82c48029ff18 ffff82c48011462e
(XEN)    0000000000000000 0000000000000000 0000000400000004 ffff82c48029ff18
(XEN)    0000000000000010 ffff8300bd6a0000 ffff8800374819a8 ffff8300bd6a0000
(XEN)    ffff880037481d48 0000000000000001 ffff880039762da0 ffff82c480214288
(XEN)    0000000000000003 0000000000000001 ffff880039762da0 0000000000000001
(XEN)    ffff880037481d48 0000000000000001 0000000000000282 ffff880002dc4240
(XEN)    00000000000001c0 00000000000001c0 0000000000000018 ffffffff8100130a
(XEN)    ffff880037481d40 0000000000000001 0000000000000005 0000010000000000
(XEN)    ffffffff8100130a 000000000000e033 0000000000000282 ffff880037481d20
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff8300bd6a0000 0000000000000000
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c480120585>] vcpu_runstate_get+0xe5/0x130
(XEN)    [<ffff82c480157df4>] arch_do_vcpu_op+0x134/0x5d0
(XEN)    [<ffff82c4801702ab>] do_update_descriptor+0x1db/0x220
(XEN)    [<ffff82c4801058df>] do_vcpu_op+0x6f/0x4a0
(XEN)    [<ffff82c48011462e>] do_multicall+0x13e/0x330
(XEN)    [<ffff82c480214288>] syscall_enter+0x88/0x8d
(XEN)    
(XEN) Pagetable walk from 0000000000000060:
(XEN)  L4[0x000] = 00000001004a5067 0000000000038c9d
(XEN)  L3[0x000] = 000000013a703067 0000000000003094
(XEN)  L2[0x000] = 0000000000000000 ffffffffffffffff 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000060
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
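
For readers decoding the fault above, the following sketch shows how the pagetable walk indices and the error code can be derived from the reported values. It relies only on the architectural x86-64 4-level paging layout (9 index bits per level above a 12-bit page offset) and the low three page-fault error code bits; the function and macro names are made up for this example and are not Xen's. A faulting address as small as 0x60 is consistent with the guess that a field is being read through a NULL pointer, but the sketch does not identify which pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Index into the page table at the given level (4 = top, 1 = leaf). */
unsigned int pt_index(uint64_t addr, unsigned int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned int)((addr >> (12 + 9 * (level - 1))) & 0x1ff);
}

/* Low bits of the architectural page-fault error code. */
#define PFEC_PRESENT 0x1u  /* clear: the page was not present */
#define PFEC_WRITE   0x2u  /* clear: the access was a read */
#define PFEC_USER    0x4u  /* clear: the access came from supervisor mode */

/* Print the walk indices and a plain-language reading of the error code. */
void describe_fault(uint64_t addr, unsigned int error_code)
{
    printf("L4[0x%03x] L3[0x%03x] L2[0x%03x] L1[0x%03x]\n",
           pt_index(addr, 4), pt_index(addr, 3),
           pt_index(addr, 2), pt_index(addr, 1));
    printf("%s, %s, %s\n",
           (error_code & PFEC_PRESENT) ? "protection violation" : "page not present",
           (error_code & PFEC_WRITE) ? "write" : "read",
           (error_code & PFEC_USER) ? "user mode" : "supervisor mode");
}
```

Calling describe_fault(0x60, 0) yields index 0 at every level, which agrees with the L4[0x000], L3[0x000] and L2[0x000] lines in the log, and reads error_code=0000 as a supervisor-mode read of a page that is not present. The walk in the log stops at L2 because that entry is zero.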


On Fri, Sep 7, 2012 at 12:06 PM, Ben Guthro <ben@xxxxxxxxxx> wrote:
I'll work on getting a JTAG, ICE, or something else - it is on an
Intel SDP - so it should have the ports for it.

My current suspicion on this is that the hardware registers are not
being programmed the same way as they were in 4.0.x
(Since the "pulsing power button LED" on the laptops, and the behavior
of the Desktop SDP are now similar)

Once again, I don't have a lot of evidence to back this up. However,
if I ifdef out the register writes that actually start the low-level
suspend, in acpi_enter_sleep_state() in xen/arch/x86/acpi/power.c,
the rest of the suspend process completes as though the machine had
suspended and then immediately resumed.

In this case - the system seems to be functioning properly.





Hack to prevent low level S3 attached.



On Fri, Sep 7, 2012 at 8:18 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 07.09.12 at 13:51, Ben Guthro <ben@xxxxxxxxxx> wrote:
>> However, when I run with console=none, the observed behavior is very
>> different.
>> The system seems to go to sleep successfully - but when I press the
>> power button to wake it up - the power comes on - the fans spin up -
>> but the system is unresponsive.
>> No video
>> No network
>> keyboard LEDs (Caps,Numlock) do not light up.
>>
>>
>> Alternate debugging strategies welcome.
>
> I'm afraid other than being lucky to spot something via code
> inspection, the only alternative is an ITP/ICE. Maybe Intel folks
> could help out debugging this if it's reproducible for them.
>
> Jan
>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel