
Re: [Xen-devel] Xen4.2 S3 regression?



On 20/09/2012 07:13, "Keir Fraser" <keir.xen@xxxxxxxxx> wrote:

> CPU#1 got stuck in loop in cpu_init() as it appears to be 'already
> initialised' in cpu_initialized bitmap. CPU#0 detects it is stuck and carries
> on, but the resume code assumes all CPUs are brought back online and crashes
> later.
> 
> I wonder how long this has been broken. I recall reworking the CPU bringup
> code a lot early during 4.1.0 development... And I didn't test S3.
> 
>  -- Keir

However, I did test CPU hotplug a lot, and S3 uses the hotplug logic to take
down and bring up CPUs. So I don't think I can have broken this.

Are you able to hotplug physical CPUs from dom0 using the
tools/misc/xen-hptool utility? If not, at least this might be a friendlier
test method and environment than a full S3.
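
A rough sketch of such a hotplug test loop, to be run from dom0, might look
like the following. It assumes xen-hptool is on PATH and takes the
cpu-offline / cpu-online subcommands followed by a physical CPU number; the
helper names hotplug_commands and hotplug_cycle are made up for illustration,
and refusing CPU 0 is a choice of this sketch (the boot CPU is the one that
stays up across S3).

```python
import subprocess


def hotplug_commands(cpu, tool="xen-hptool"):
    """Return the offline/online command pair for one physical CPU.

    Assumption: `tool` accepts `cpu-offline <n>` and `cpu-online <n>`.
    """
    if cpu <= 0:
        # Sketch-level choice: only exercise non-boot CPUs.
        raise ValueError("pick a non-boot CPU (cpu >= 1)")
    return [
        [tool, "cpu-offline", str(cpu)],
        [tool, "cpu-online", str(cpu)],
    ]


def hotplug_cycle(cpu, rounds=1, run=subprocess.run):
    """Offline then online `cpu` for `rounds` iterations.

    Returns False at the first command that exits non-zero, True otherwise.
    `run` is injectable so the loop can be dry-run without touching hardware.
    """
    for _ in range(rounds):
        for cmd in hotplug_commands(cpu):
            result = run(cmd)
            if result.returncode != 0:
                return False
    return True


# Example (not executed here): hotplug_cycle(1, rounds=10)
```

If the online step fails here the same way as on resume, the problem can be
chased without going through a full suspend each time.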

 -- Keir

> On 19/09/2012 22:07, "Ben Guthro" <ben@xxxxxxxxxx> wrote:
> 
>> No hardware debugger just yet - but I've moved to another machine (Lenovo
>> T400 laptop) - and am now seeing the following stack trace when I resume
>> (this is using the tip of the 4.2-testing tree)
>> 
>> It looks like either the vcpu, or the runstate is NULL, at this point in the
>> resume process...
>> 
>> 
>> (XEN) Finishing wakeup from ACPI S3 state.
>> (XEN) Enabling non-boot CPUs  ...
>> (XEN) CPU#1 already initialized!
>> (XEN) Stuck ??
>> (XEN) Error taking CPU1 up: -5
>> [   38.570054] ACPI: Low-level resume complete
>> [   38.570054] PM: Restoring platform NVS memory
>> [   38.570054] Enabling non-boot CPUs ...
>> (XEN) ----[ Xen-4.2.1-pre  x86_64  debug=n  Tainted:    C ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[<ffff82c480120585>] vcpu_runstate_get+0xe5/0x130
>> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
>> (XEN) rax: 00007d3b7fd17180   rbx: ffff8300bd2fe000   rcx: 0000000000000000
>> (XEN) rdx: ffff08003fc8bd80   rsi: ffff82c48029fe28   rdi: ffff8300bd2fe000
>> (XEN) rbp: ffff82c48029fe28   rsp: ffff82c48029fdf8   r8:  0000000000000008
>> (XEN) r9:  00000000000001c0   r10: ffff82c48021f4a0   r11: 0000000000000282
>> (XEN) r12: ffff82c4802e8ee0   r13: ffff880039762da0   r14: ffff82c4802d3140
>> (XEN) r15: fffffffffffffff2   cr0: 000000008005003b   cr4: 00000000000026f0
>> (XEN) cr3: 0000000139ee4000   cr2: 0000000000000060
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen stack trace from rsp=ffff82c48029fdf8:
>> (XEN)    ffff8300bd2fe000 ffff82c48029ff18 ffff880037481d40 ffff880039762da0
>> (XEN)    0000000000000001 ffff82c480157df4 0000000000000070 ffff82f6016db300
>> (XEN)    00000000000b6d98 ffff8301355d8000 0000000000000070 ffff82c4801702ab
>> (XEN)    ffff88003fc8bd80 0000000000000000 0000000000000020 ffff8300bd2fe000
>> (XEN)    ffff8301355d8000 ffff880037481d40 ffff880039762da0 0000000000000001
>> (XEN)    0000000000000003 ffff82c4801058df ffff82c48029ff18 ffff82c48011462e
>> (XEN)    0000000000000000 0000000000000000 0000000400000004 ffff82c48029ff18
>> (XEN)    0000000000000010 ffff8300bd6a0000 ffff8800374819a8 ffff8300bd6a0000
>> (XEN)    ffff880037481d48 0000000000000001 ffff880039762da0 ffff82c480214288
>> (XEN)    0000000000000003 0000000000000001 ffff880039762da0 0000000000000001
>> (XEN)    ffff880037481d48 0000000000000001 0000000000000282 ffff880002dc4240
>> (XEN)    00000000000001c0 00000000000001c0 0000000000000018 ffffffff8100130a
>> (XEN)    ffff880037481d40 0000000000000001 0000000000000005 0000010000000000
>> (XEN)    ffffffff8100130a 000000000000e033 0000000000000282 ffff880037481d20
>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 ffff8300bd6a0000 0000000000000000
>> (XEN)    0000000000000000
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82c480120585>] vcpu_runstate_get+0xe5/0x130
>> (XEN)    [<ffff82c480157df4>] arch_do_vcpu_op+0x134/0x5d0
>> (XEN)    [<ffff82c4801702ab>] do_update_descriptor+0x1db/0x220
>> (XEN)    [<ffff82c4801058df>] do_vcpu_op+0x6f/0x4a0
>> (XEN)    [<ffff82c48011462e>] do_multicall+0x13e/0x330
>> (XEN)    [<ffff82c480214288>] syscall_enter+0x88/0x8d
>> (XEN)    
>> (XEN) Pagetable walk from 0000000000000060:
>> (XEN)  L4[0x000] = 00000001004a5067 0000000000038c9d
>> (XEN)  L3[0x000] = 000000013a703067 0000000000003094
>> (XEN)  L2[0x000] = 0000000000000000 ffffffffffffffff 
>> (XEN) 
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) FATAL PAGE FAULT
>> (XEN) [error_code=0000]
>> (XEN) Faulting linear address: 0000000000000060
>> (XEN) ****************************************
>> (XEN) 
>> (XEN) Reboot in five seconds...
>> 
>> 
>> On Fri, Sep 7, 2012 at 12:06 PM, Ben Guthro <ben@xxxxxxxxxx> wrote:
>>> I'll work on getting a JTAG, ICE, or something else - it is on an
>>> Intel SDP - so it should have the ports for it.
>>> 
>>> My current suspicion on this is that the hardware registers are not
>>> being programmed the same way as they were in 4.0.x
>>> (Since the "pulsing power button LED" on the laptops, and the behavior
>>> of the Desktop SDP are now similar)
>>> 
>>> Once again - I don't have a lot of evidence to back this up - however,
>>> if I ifdef out the register writes that actually start the low level
>>> suspend - in
>>> xen/arch/x86/acpi/power.c  acpi_enter_sleep_state() - the rest of the
>>> suspend process completes as though the machine suspended, and then
>>> immediately resumed.
>>> 
>>> In this case - the system seems to be functioning properly.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Hack to prevent low level S3 attached.
>>> 
>>> 
>>> 
>>> On Fri, Sep 7, 2012 at 8:18 AM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>>>>> On 07.09.12 at 13:51, Ben Guthro <ben@xxxxxxxxxx> wrote:
>>>>> However, when I run with console=none, the observed behavior is very
>>>>> different.
>>>>> The system seems to go to sleep successfully - but when I press the
>>>>> power button to wake it up - the power comes on - the fans spin up -
>>>>> but the system is unresponsive.
>>>>> No video
>>>>> No network
>>>>> keyboard LEDs (Caps,Numlock) do not light up.
>>>>> 
>>>>> 
>>>>> Alternate debugging strategies welcome.
>>>> 
>>>> I'm afraid other than being lucky to spot something via code
>>>> inspection, the only alternative is an ITP/ICE. Maybe Intel folks
>>>> could help out debugging this if it's reproducible for them.
>>>> 
>>>> Jan
>>>> 
>> 
>> 
>> 
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-devel
> 





 

