Re: [Xen-devel] [PATCH 0/3] x86: S3 resume adjustments
On 15/04/18 16:52, Simon Gaiser wrote:
> Andrew Cooper:
>> On 14/04/18 06:49, Simon Gaiser wrote:
>>> Jan Beulich:
>>>> 1: correct ordering of operations during S3 resume
>>>> 2: suppress BTI mitigations around S3 suspend/resume
>>>> 3: check feature flags after resume
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>>
>>>> Simon, could you give this a try please?
>>> Backported to 4.8 it works fine with the two fixes I sent earlier.
>>>
>>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>>> panics about a double fault somewhere after it starts to enable the
>>> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
>>> could test the patches anyway. With them it gets again to the point
>>> where it double faults. So the patches are most likely fine.
>>>
>>> I haven't really looked at the cause of the double fault yet.
>> Do you at least have the crash log from the attempt?
> Sure, it's a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on a
> Debian sid:
I can't find that object. I presume this isn't an upstream tree?
>
> (XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0,
> BCAST, CMCI
> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs ...
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from
> 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from
> 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from
> 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from
> 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from
> 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from
> 0x00000000fee00c00 to 0x00000000fee00800
Bad dom0. It shouldn't be playing with APIC_BASE at all, but I guess
this means I can't fix the hypervisor behaviour to throw #GP back at a
PV guest.
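For reference, the values in those WRMSR lines can be decoded with a
small helper. This is only a sketch, assuming the architectural layout
of IA32_APIC_BASE (MSR 0x1b: bit 8 BSP, bit 10 x2APIC enable, bit 11
global enable). The macro, enum and function names are made up for
illustration and are not Xen identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_APIC_BASE bits; names are local to this sketch. */
#define APIC_BASE_BSP    (1ULL << 8)   /* bootstrap processor flag */
#define APIC_BASE_EXTD   (1ULL << 10)  /* x2APIC mode enable */
#define APIC_BASE_ENABLE (1ULL << 11)  /* APIC global enable */

enum apic_mode { APIC_DISABLED, APIC_XAPIC, APIC_X2APIC, APIC_INVALID };

/* Classify the APIC mode selected by an IA32_APIC_BASE value. */
static enum apic_mode apic_mode_of(uint64_t apic_base)
{
    bool en = (apic_base & APIC_BASE_ENABLE) != 0;
    bool extd = (apic_base & APIC_BASE_EXTD) != 0;

    if (!en)
        return extd ? APIC_INVALID : APIC_DISABLED; /* EXTD without EN is reserved */
    return extd ? APIC_X2APIC : APIC_XAPIC;
}
```

Read this way, each vCPU starts in x2APIC mode (0xfee00c00) and dom0
asks first for the APIC to be disabled (0xfee00000) and then for xAPIC
mode (0xfee00800).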
> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7
Can you disassemble the binary and find out where this is? On current
staging, handle_exception+0x9c is in the middle of
SPEC_CTRL_ENTRY_FROM_INTR but this might not be the case for you.
> (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
> (XEN) rax: ffffc90040cd4068 rbx: 0000000000000000 rcx: 000000000000000a
> (XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: 0000000000000000
> (XEN) rbp: 000036ffbf32bf77 rsp: ffffc90040cd4000 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: ffffc90040cd7fff
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426e0
> (XEN) cr3: 000000022200a000 cr2: ffffc90040cd3ff8
> (XEN) fsb: 0000000000000000 gsb: ffff88021e6c0000 gss: 0000000000000000
> (XEN) ds: 002b es: 002b fs: 8a00 gs: 0010 ss: e010 cs: e008
> (XEN) Current stack base ffffc90040cd0000 differs from expected
> ffff8300cec88000
> (XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000,
> sp=ffffc90040cd4000, tss.rsp0=ffff8300cec8ffa0
Given the %rsp and %cr2 values, it looks like we had a bad %rsp pointing
into a region which isn't mapped, tried to push a value, got #PF, tried
to invoke the #PF exception handler, which faulted again, and escalated
to #DF, which followed the TSS and moved back to reality.
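The arithmetic behind that reading can be checked against the numbers in
the log. This is a rough sketch rather than anything from the Xen tree:
the 32 KiB (8 page) stack size is an assumption inferred from the two
stack bases printed above, and the helper names are invented for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 8-page (32 KiB) aligned per-CPU stacks, inferred from the log. */
#define STACK_SIZE (8ULL << 12)

/* Round a stack pointer down to the base of its stack block. */
static uint64_t stack_base(uint64_t sp)
{
    return sp & ~(STACK_SIZE - 1);
}

/* A 64-bit push stores at rsp - 8, so a faulting push reports that address in %cr2. */
static bool cr2_matches_push(uint64_t rsp, uint64_t cr2)
{
    return cr2 == rsp - 8;
}

/* Is sp inside the half-open range [lo, hi)? */
static bool in_range(uint64_t sp, uint64_t lo, uint64_t hi)
{
    return sp >= lo && sp < hi;
}
```

With rsp=ffffc90040cd4000 and cr2=ffffc90040cd3ff8, cr2 is exactly
rsp - 8, which is what a faulting push would leave behind. The same rsp
rounds down to ffffc90040cd0000, tss.rsp0 rounds down to the expected
ffff8300cec88000, and rsp lies outside the reported valid range
ffffc90040cd6000-ffffc90040cd8000.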
The only way to come in with stack pointers other than TSS.RSP0 is via
syscall and sysenter. SYSENTER_ESP should be identical to TSS.RSP0,
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -257,6 +257,13 @@ void do_double_fault(struct cpu_user_regs *regs)
     _show_registers(regs, crs, CTXT_hypervisor, NULL);
     show_stack_overflow(cpu, regs);
+    {
+        uint64_t val;
+
+        rdmsrl(MSR_IA32_SYSENTER_ESP, val);
+        printk("*** SYSENTER_ESP: %p\n", _p(val));
+    }
+
     panic("DOUBLE FAULT -- system shutdown");
 }
so this bit of debugging should help track things down. If not, then
we've probably got an issue (re)writing the syscall trampolines.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel