Re: [Xen-devel] Resend: Linux 4.11-rc7: kernel BUG at drivers/xen/events/events_base.c:1221
On 25/04/17 13:38, Juergen Gross wrote:
> On 25/04/17 13:28, Sander Eikelenboom wrote:
>> On 25/04/17 13:00, Juergen Gross wrote:
>>> On 25/04/17 12:33, Sander Eikelenboom wrote:
>>>> On 25/04/17 09:01, Juergen Gross wrote:
>>>>> On 25/04/17 08:57, Sander Eikelenboom wrote:
>>>>>> On 25/04/17 08:42, Juergen Gross wrote:
>>>>>>> On 25/04/17 08:35, Sander Eikelenboom wrote:
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] d0v0 Unhandled invalid opcode fault/trap [#6, ec=ffffffff]
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] domain_crash_sync called from entry.S: fault at ffff82d080358f70 entry.o#create_bounce_frame+0x145/0x154
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] Domain 0 (vcpu#0) crashed on cpu#0:
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] ----[ Xen-4.9-unstable x86_64 debug=y Not tainted ]----
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] CPU: 0
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] RIP: e033:[<ffffffff8255a485>]
>>>>>>>
>>>>>>> Can you please tell us symbol+offset for RIP?
>>>>>>>
>>>>>>> Juergen
>>>>>>>
>>>>>>
>>>>>> Sure:
>>>>>> # addr2line -e vmlinux-4.11.0-rc8-20170424-linus-doflr-xennext-boris+ ffffffff8255a485
>>>>>> linux-linus/arch/x86/xen/enlighten_pv.c:288
>>>>>>
>>>>>> Which is:
>>>>>> static bool __init xen_check_xsave(void)
>>>>>> {
>>>>>>     unsigned int err, eax, edx;
>>>>>>
>>>>>>     /*
>>>>>>      * Xen 4.0 and older accidentally leaked the host XSAVE flag into guest
>>>>>>      * view, despite not being able to support guests using the
>>>>>>      * functionality. Probe for the actual availability of XSAVE by seeing
>>>>>>      * whether xgetbv executes successfully or raises #UD.
>>>>>>      */
>>>>>> HERE --> asm volatile("1: .byte 0x0f,0x01,0xd0\n\t" /* xgetbv */
>>>>>>                  "xor %[err], %[err]\n"
>>>>>>                  "2:\n\t"
>>>>>>                  ".pushsection .fixup,\"ax\"\n\t"
>>>>>>                  "3: movl $1,%[err]\n\t"
>>>>>>                  "jmp 2b\n\t"
>>>>>>                  ".popsection\n\t"
>>>>>>                  _ASM_EXTABLE(1b, 3b)
>>>>>>                  : [err] "=r" (err), "=a" (eax), "=d" (edx)
>>>>>>                  : "c" (0));
>>>>>>
>>>>>>     return err == 0;
>>>>>
>>>>> I hoped so. :-)
>>>>>
>>>>> I posted a patch to repair this some minutes ago. Would you mind to try
>>>>> it? See:
>>>>>
>>>>> https://lists.xen.org/archives/html/xen-devel/2017-04/msg02925.html
>>>>>
>>>>>
>>>>> Juergen
>>>>
>>>> Hmm next up seems to be a hanging dom0 kernel somewhat later during boot,
>>>> with not too many clues.
>>>> (any output of xen debug-keys that could be of interest ?)
>>>>
>>> ...
>>>> [ 0.000000] ACPI: Early table checksum verification disabled
>>>> [ 0.000000] ACPI: RSDP 0x00000000000FB100 000014 (v00 ACPIAM)
>>>> [ 0.000000] ACPI: RSDT 0x00000000AFF90000 000048 (v01 MSI OEMSLIC 20100913 MSFT 00000097)
>>>> [ 0.000000] ACPI: FACP 0x00000000AFF90200 000084 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>>> [ 0.000000] ACPI: DSDT 0x00000000AFF905E0 009427 (v01 A7640 A7640100 00000100 INTL 20051117)
>>>> [ 0.000000] ACPI: FACS 0x00000000AFF9E000 000040
>>>> [ 0.000000] ACPI: APIC 0x00000000AFF90390 000088 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>>> [ 0.000000] ACPI: MCFG 0x00000000AFF90420 00003C (v01 7640MS OEMMCFG 20100913 MSFT 00000097)
>>>> [ 0.000000] ACPI: SLIC 0x00000000AFF90460 000176 (v01 MSI OEMSLIC 20100913 MSFT 00000097)
>>>> [ 0.000000] ACPI: OEMB 0x00000000AFF9E040 000072 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>>> [ 0.000000] ACPI: SRAT 0x00000000AFF9A5E0 000108 (v03 AMD FAM_F_10 00000002 AMD 00000001)
>>>> [ 0.000000] ACPI: HPET 0x00000000AFF9A6F0 000038 (v01 7640MS OEMHPET 20100913 MSFT 00000097)
>>>> [ 0.000000] ACPI: IVRS 0x00000000AFF9A730 000110 (v01 AMD RD890S 00202031 AMD 00000000)
>>>> [ 0.000000] ACPI: SSDT 0x00000000AFF9A840 000DA4 (v01 A M I POWERNOW 00000001 AMD 00000001)
>>>> [ 0.000000] ACPI: Local APIC address 0xfee00000
>>>> [ 0.000000] Setting APIC ro
>>>
>>> Hmm, this seems to be only the first part of a message.
>>>
>>> Could you try debug-key "0" (probably multiple times) and have a look
>>> where dom0 vcpu 0 is spending its time?
>>>
>>>
>>> Juergen
>>>
>>
>> Here you are:
>>
>> [ 0.000000] ACPI: Early table checksum verification disabled
>> [ 0.000000] ACPI: RSDP 0x00000000000FB100 000014 (v00 ACPIAM)
>> [ 0.000000] ACPI: RSDT 0x00000000AFF90000 000048 (v01 MSI OEMSLIC 20100913 MSFT 00000097)
>> [ 0.000000] ACPI: FACP 0x00000000AFF90200 000084 (v01 7640MS A7640100 20100913 MSFT 00000097)
>> [ 0.000000] ACPI: DSDT 0x00000000AFF905E0 009427 (v01 A7640 A7640100 00000100 INTL 20051117)
>> [ 0.000000] ACPI: FACS 0x00000000AFF9E000 000040
>> [ 0.000000] ACPI: APIC 0x00000000AFF90390 000088 (v01 7640MS A7640100 20100913 MSFT 00000097)
>> [ 0.000000] ACPI: MCFG 0x00000000AFF90420 00003C (v01 7640MS OEMMCFG 20100913 MSFT 00000097)
>> [ 0.000000] ACPI: SLIC 0x00000000AFF90460 000176 (v01 MSI OEMSLIC 20100913 MSFT 00000097)
>> [ 0.000000] ACPI: OEMB 0x00000000AFF9E040 000072 (v01 7640MS A7640100 20100913 MSFT 00000097)
>> [ 0.000000] ACPI: SRAT 0x00000000AFF9A5E0 000108 (v03 AMD FAM_F_10 00000002 AMD 00000001)
>> [ 0.000000] ACPI: HPET 0x00000000AFF9A6F0 000038 (v01 7640MS OEMHPET 20100913 MSFT 00000097)
>> [ 0.000000] ACPI: IVRS 0x00000000AFF9A730 000110 (v01 AMD RD890S 00202031 AMD 00000000)
>> [ 0.000000] ACPI: SSDT 0x00000000AFF9A840 000DA4 (v01 A M I POWERNOW 00000001 AMD 00000001)
>> [ 0.000000] ACPI: Local APIC address 0xfee00000
>> [ 0.000000] Setting AP(XEN) [2017-04-25 11:11:35.568] *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0)
>> (XEN) [2017-04-25 11:11:37.000] '0' pressed -> dumping Dom0's registers
>> (XEN) [2017-04-25 11:11:37.001] *** Dumping Dom0 vcpu#0 state: ***
>> (XEN) [2017-04-25 11:11:37.001] RIP: e033:[<ffffffff81cccde9>]
>> (XEN) [2017-04-25 11:11:45.422] RIP: e033:[<ffffffff81cccde9>]
>> (XEN) [2017-04-25 11:11:56.132] RIP: e033:[<ffffffff81ccc740>]
>> (XEN) [2017-04-25 11:12:02.474] RIP: e033:[<ffffffff81cccde9>]
>> (XEN) [2017-04-25 11:12:06.224] RIP: e033:[<ffffffff81ccc740>]

ffffffff81cccde9 arch/x86/entry/entry_64.o:?
ffffffff81ccc740> arch/x86/entry/entry_64.S:1007

#ifdef CONFIG_XEN
idtentry xen_debug do_debug has_error_code=0
idtentry xen_int3 do_int3 has_error_code=0
HERE --> idtentry xen_stack_segment do_stack_segment has_error_code=1
#endif

> Thanks. If you just could translate those again to symbol+offset?
> And the suspicious addresses on the stack as well:
>
> ffffffff81023bfd

arch/x86/kernel/process_64.c:485

    if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
        /*
         * AMD CPUs have a misfeature: SYSRET sets the SS selector but
         * does not update the cached descriptor. As a result, if we
         * do SYSRET while SS is NULL, we'll end up in user mode with
         * SS apparently equal to __USER_DS but actually unusable.
         *
         * The straightforward workaround would be to fix it up just
         * before SYSRET, but that would slow down the system call
         * fast paths. Instead, we ensure that SS is never NULL in
         * system call context. We do this by replacing NULL SS
         * selectors at every context switch. SYSCALL sets up a valid
         * SS, so the only way to get NULL is to re-enter the kernel
         * from CPL 3 through an interrupt. Since that can't happen
         * in the same task as a running syscall, we are guaranteed to
         * context switch between every interrupt vector entry and a
         * subsequent SYSRET.
         *
         * We read SS first because SS reads are much faster than
         * writes. Out of caution, we force SS to __KERNEL_DS even if
         * it previously had a different non-NULL value.
         */
        unsigned short ss_sel;
        savesegment(ss, ss_sel);
        if (ss_sel != __KERNEL_DS)
HERE -->    loadsegment(ss, __KERNEL_DS);
    }

> ffffffff81cc4620

init/main.c:956

static int __ref kernel_init(void *unused)
HERE --> {
    int ret;

    kernel_init_freeable();
    /* need to finish all async __init code before freeing the memory */
    async_synchronize_full();

> ffffffff81ccb060

arch/x86/entry/entry_64.S:412

/*
 * A newly forked process directly context switches into this address.
 *
 * rax: prev task we switched from
 * rbx: kernel thread func (NULL for user thread)
 * r12: kernel thread arg
 */
ENTRY(ret_from_fork)
HERE --> FRAME_BEGIN /* help unwinder find end of stack */
    movq %rax, %rdi
    call schedule_tail /* rdi: 'prev' task parameter */

>
>
> Juergen
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel