Re: S3 resume issue in xstate_init
On Tue, Aug 17, 2021 at 12:14:36PM +0100, Andrew Cooper wrote:
> On 17/08/2021 12:02, Marek Marczykowski-Górecki wrote:
> > On Tue, Aug 17, 2021 at 03:25:21AM +0200, Marek Marczykowski-Górecki wrote:
> >> Hi,
> >>
> >> I've got another S3 issue:
> >>
> >> (XEN) Preparing system for ACPI S3 state.
> >> (XEN) Disabling non-boot CPUs ...
> >> (XEN) Broke affinity for IRQ1, new: ffff
> >> (XEN) Broke affinity for IRQ16, new: ffff
> >> (XEN) Broke affinity for IRQ9, new: ffff
> >> (XEN) Broke affinity for IRQ139, new: ffff
> >> (XEN) Broke affinity for IRQ8, new: ffff
> >> (XEN) Broke affinity for IRQ14, new: ffff
> >> (XEN) Broke affinity for IRQ20, new: ffff
> >> (XEN) Broke affinity for IRQ137, new: ffff
> >> (XEN) Broke affinity for IRQ138, new: ffff
> >> (XEN) Entering ACPI S3 state.
> >> (XEN) mce_intel.c:773: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
> >> (XEN) CPU0 CMCI LVT vector (0xf1) already installed
> >> (XEN) Finishing wakeup from ACPI S3 state.
> >> (XEN) microcode: CPU0 updated from revision 0xca to 0xea, date = 2021-01-05
> >> (XEN) xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f
> >> (XEN) Enabling non-boot CPUs ...
> >> (XEN) xstate: size: 0x440 (uncompressed 0x240) and states: 0x1f
> >> (XEN) Xen BUG at xstate.c:673
> >> (XEN) ----[ Xen-4.16-unstable x86_64 debug=y Not tainted ]----
> >> (XEN) CPU: 1
> >> (XEN) RIP: e008:[<ffff82d040350ee4>] xstate_init+0x24b/0x2ff
> >> (XEN) RFLAGS: 0000000000010087 CONTEXT: hypervisor
> >> (XEN) rax: 0000000000000240 rbx: 000000000000001f rcx: 0000000000000440
> >> (XEN) rdx: 0000000000000001 rsi: 000000000000000a rdi: 000000000000001f
> >> (XEN) rbp: ffff83025dc9fd38 rsp: ffff83025dc9fd20 r8: 0000000000000001
> >> (XEN) r9: ffff83025dc9fc88 r10: 0000000000000001 r11: 0000000000000001
> >> (XEN) r12: ffff83025dc9fd80 r13: 000000000000001f r14: 0000000000000001
> >> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000003526e0
> >> (XEN) cr3: 0000000049656000 cr2: 0000000000000000
> >> (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000
> >> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> >> (XEN) Xen code around <ffff82d040350ee4> (xstate_init+0x24b/0x2ff):
> >> (XEN) ff e9 a2 00 00 00 0f 0b <0f> 0b 89 f8 89 f1 0f a2 89 f2 4c 8b 0d cb b4 0f
> >> (XEN) Xen stack trace from rsp=ffff83025dc9fd20:
> >> (XEN) 0000000000000240 ffff83025dc9fd80 0000000000000001 ffff83025dc9fd70
> >> (XEN) ffff82d04027e7a1 000000004035a7f1 7ffafbbf01100800 00000000bfebfbff
> >> (XEN) 0000000000000001 00000000000000c8 ffff83025dc9feb8 ffff82d0402e43ce
> >> (XEN) 000000160a9e0106 bfebfbff80000008 2c1008007ffaf3bf 0000000f00000121
> >> (XEN) 00000000029c6fbf 0000000000000100 000000009c002e00 02afcd7f00000000
> >> (XEN) 756e654700000000 6c65746e49656e69 65746e4904b21920 726f43202952286c
> >> (XEN) 376920294d542865 432048303537382d 322e322040205550 000000007a484730
> >> (XEN) ffff830000000000 ffff83025dc9fe18 00002400402e8e0b 000000085dc9fe30
> >> (XEN) 00000002402e9f21 0000000000000001 ffffffff00000000 ffff82d0402e0040
> >> (XEN) 00000000003526e0 ffff83025dc9fe68 ffff82d04027bd15 0000000000000001
> >> (XEN) ffff8302590a0000 0000000000000000 00000000000000c8 0000000000000001
> >> (XEN) 0000000000000001 ffff83025dc9feb8 ffff82d0402e32b7 0000000000000001
> >> (XEN) 0000000000000001 00000000000000c8 0000000000000001 ffff83025dc9fee8
> >> (XEN) ffff82d04030e401 0000000000000001 0000000000000000 0000000000000000
> >> (XEN) 0000000000000000 0000000000000000 ffff82d040200122 0800002000000002
> >> (XEN) 0100000400010000 0000002000000000 2000000000100000 0000001000000000
> >> (XEN) 2000000000000000 0000000029000000 0000008000000000 00110000a0000000
> >> (XEN) 8000000080000000 4000000000000008 0000100000000000 0200000040000080
> >> (XEN) 0004000000000000 0000010000000002 0400002030000000 0000000060000000
> >> (XEN) 0400001000010000 0000000010000000 0000004010000000 0000000000000000
> >> (XEN) Xen call trace:
> >> (XEN) [<ffff82d040350ee4>] R xstate_init+0x24b/0x2ff
> >> (XEN) [<ffff82d04027e7a1>] F identify_cpu+0x318/0x4af
> >> (XEN) [<ffff82d0402e43ce>] F recheck_cpu_features+0x1f/0x72
> >> (XEN) [<ffff82d04030e401>] F start_secondary+0x255/0x38a
> >> (XEN) [<ffff82d040200122>] F __high_start+0x82/0x91
> >> (XEN)
> >> (XEN)
> >> (XEN) ****************************************
> >> (XEN) Panic on CPU 1:
> >> (XEN) Xen BUG at xstate.c:673
> >> (XEN) ****************************************
> >> (XEN)
> >> (XEN) Reboot in five seconds...
> >>
> >> This is with added debug patch:
> >>
> >> diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
> >> index 6aaf9a2f1546..7873a21b356a 100644
> >> --- a/xen/arch/x86/xstate.c
> >> +++ b/xen/arch/x86/xstate.c
> >> @@ -668,6 +668,8 @@ void xstate_init(struct cpuinfo_x86 *c)
> >>      else
> >>      {
> >>          BUG_ON(xfeature_mask != feature_mask);
> >> +        printk("xstate: size: %#x (uncompressed %#x) and states: %#"PRIx64"\n",
> >> +               xsave_cntxt_size, hw_uncompressed_size(feature_mask), feature_mask);
> >>          BUG_ON(xsave_cntxt_size != hw_uncompressed_size(feature_mask));
> >>      }
> >>
> >>
> >> As can be seen above - the xsave size differs between BSP and other
> >> CPU(s) - likely because of (not) loaded ucode update there.
> >> I guess it's a matter of moving ucode loading somewhere else, right?
> >
> > Few more data points:
> >
> > 1. The CPU is i7-8750H (family 6, model 158, stepping 10).
> > 2. I do have "smt=off" on the Xen cmdline, if that matters.
>
> As a datapoint, it would be interesting to confirm what the behaviour is
> with SMT enabled.
>
> I'd expect it not to make a difference, because smt=off is a purely Xen
> construct and doesn't change the hardware configuration.

Uhm, changing to smt=on actually _did_ change it. Now it doesn't crash!

Let me add the CPU number to the above printk - is smp_processor_id() the
thing I want?

With that, I get:
https://gist.github.com/marmarek/ae604a1e5cf49639a1eec9e220c037ca

Note that at boot all CPUs report 0x440 (they are only parked later).
Maybe the resume path for the parked CPUs is missing some step?

> > I've tried the same without letting Xen load the ucode update (so,
> > staying at 0xca) and got the same effect. So, I think it isn't about
> > ucode...
>
> Any chance of a full boot log?

No problem, see above :)

> This is bizarre. Looking through start_secondary(), we've got an
> ordering error between updating microcode and checking for dropped
> features, but again I don't think this would be relevant here.
>
> I suspect this is going to take some more custom debugging logic.

Hints welcome ;) I can easily test any Xen patches there.

PS I'm pretty happy with my Xen debug setup there - I boot via PXE, which
allows me to quickly iterate (build and test with just one reboot, not
two), and then collect the crash message via kexec (sadly this laptop
refuses to boot with anything non-SSD plugged into the M.2 slot :/).
One thing that could use improvement is extracting console messages from
the memory dump - `crash` doesn't work for me, and with `gdb` I get a
`conring` of all zeroes (likely an invalid address?). So, I'm using
`strings` :/
I should make some writeup about this setup ;)

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment:
signature.asc
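
A side note on the two sizes printed in the log (0x440 on the BSP, 0x240 on CPU 1 after resume): both can be reproduced with a small table-driven calculation of the standard (uncompressed) XSAVE area size. The sketch below is not Xen code. As far as I understand, hw_uncompressed_size() queries the hardware instead of using a table. The function name uncompressed_size() and the component table are assumptions made for illustration, and the offsets and sizes are the values typically reported by CPUID leaf 0xD on Intel parts for AVX and MPX, hard-coded here rather than read from the CPU.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * ASSUMPTION: offsets/sizes below are hard-coded typical Intel values for
 * XSAVE components 2..4. Real code must read them from CPUID leaf 0xD,
 * sub-leaf i (EAX = size, EBX = offset).
 */
struct xstate_component {
    unsigned int offset; /* byte offset in the standard-format area */
    unsigned int size;   /* size of the component in bytes */
};

static const struct xstate_component components[] = {
    [2] = { 0x240, 0x100 }, /* AVX (upper halves of YMM registers) */
    [3] = { 0x3c0, 0x040 }, /* MPX BNDREGS */
    [4] = { 0x400, 0x040 }, /* MPX BNDCSR */
};

/* 512-byte legacy FXSAVE region plus the 64-byte XSAVE header. */
#define XSAVE_LEGACY_AND_HEADER 0x240u

/*
 * Hypothetical helper: size of the standard-format XSAVE area for a given
 * state mask, i.e. the highest end offset among the enabled components.
 * Bits 0 and 1 (x87, SSE) live in the legacy region and add nothing.
 */
static unsigned int uncompressed_size(uint64_t mask)
{
    unsigned int size = XSAVE_LEGACY_AND_HEADER;
    size_t i;

    for (i = 2; i < sizeof(components) / sizeof(components[0]); i++) {
        unsigned int end;

        if (!(mask & ((uint64_t)1 << i)))
            continue;

        end = components[i].offset + components[i].size;
        if (end > size)
            size = end;
    }

    return size;
}
```

Under this table, uncompressed_size(0x1f) gives 0x440 and uncompressed_size(0x03) gives 0x240. In other words, the value seen on CPU 1 is exactly the legacy region plus the header, which is what the calculation yields when only x87 and SSE are counted. That is an arithmetic observation only; it does not say why the hardware reported that value for states 0x1f.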