[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Yet another S3 issue in Xen 4.14
On Fri, Oct 02, 2020 at 04:42:50PM +0100, Andrew Cooper wrote: > On 02/10/2020 16:39, Jan Beulich wrote: > > On 02.10.2020 17:08, Marek Marczykowski-Górecki wrote: > >> I've done another bisect on the commit broken up in separate changes > >> (https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/dbg-s3) > >> and the bad part seems to be this: > >> > >> From dbdb32f8c265295d6af7cd4cd0aa12b6d04a0430 Mon Sep 17 00:00:00 2001 > >> From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > >> Date: Fri, 2 Oct 2020 15:40:22 +0100 > >> Subject: [PATCH 1/1] CR4 > > Interesting - I was wild guessing so yesterday, but couldn't come > > up with even a vague reason why this would be. I think you could > > further split it up: > > > >> --- a/xen/arch/x86/acpi/power.c > >> +++ b/xen/arch/x86/acpi/power.c > >> @@ -195,7 +195,6 @@ static int enter_state(u32 state) > >> unsigned long flags; > >> int error; > >> struct cpu_info *ci; > >> - unsigned long cr4; > >> > >> if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) ) > >> return -EINVAL; > >> @@ -270,15 +269,15 @@ static int enter_state(u32 state) > >> > >> system_state = SYS_STATE_resume; > >> > >> - /* Restore CR4 and EFER from cached values. */ > >> - cr4 = read_cr4(); > >> - write_cr4(cr4 & ~X86_CR4_MCE); > >> + /* Restore EFER from cached value. */ > >> write_efer(read_efer()); > > This one should be possible to leave in place despite ... > > > >> device_power_up(SAVED_ALL); > >> > >> mcheck_init(&boot_cpu_data, false); > >> - write_cr4(cr4); > >> + > >> + /* Restore CR4 from cached value, now MCE is set up. */ > >> + write_cr4(read_cr4()); > > ... this change. > > > > Further, while I can't see how the set_in_cr4() in mcheck_init() > > could badly interact with the CR4 writes here, another option > > might be to suppress it when system_state == SYS_STATE_resume > > && c == &boot_cpu_data (or !bsp && c == &boot_cpu_data). > > > >> --- a/xen/arch/x86/acpi/suspend.c > >> +++ b/xen/arch/x86/acpi/suspend.c > >> @@ -23,7 +23,4 @@ void save_rest_processor_state(void) > >> void restore_rest_processor_state(void) > >> { > >> load_system_tables(); > >> - > >> - /* Restore full CR4 (inc MCE) now that the IDT is in place. */ > >> - write_cr4(mmu_cr4_features); > >> } > > This one should be possible to leave in place despite the other > > changes. > > We're continuing to debug in private. mmu_cr4_features and read_cr4() > are equivalent (as expected), but very different from MINIMAL_CR4 which > is what the trampoline configures, so I think we're suffering a > CR4-related #UD/#GP somewhere in device_power_up() or mcheck_init(). > > Its not INVPCID. Trying others. Some update: It's OSFXSR + probably other flags, the crash happens in enter_state()->device_power_up()->time_resume()->efi_get_time() This also explains the difference between legacy / UEFI boot. Disabling efi_get_time() or setting CR4 earlier solves _this_ issue, but applied on top of stable-4.14 still doesn't work. Looks like there is yet another S3 breakage in between. I'm bisecting it further... -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? Attachment:
signature.asc
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |