Re: Linux 6.13-rc3 many different panics in Xen PV dom0
On Thu, Jan 02, 2025 at 01:24:21PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 02, 2025 at 12:30:10PM +0100, Juergen Gross wrote:
> > On 02.01.25 11:20, Jürgen Groß wrote:
> > > On 19.12.24 17:14, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > >
> > > > It crashes on boot like below, most of the times. But sometimes (rarely)
> > > > it manages to stay alive. Below I'm pasting few of the crashes that look
> > > > distinctly different, if you follow the links, you can find more of
> > > > them. IMHO it looks like some memory corruption bug somewhere. I tested
> > > > also Linux 6.13-rc2 before, and it had very similar issue.
> > >
> > > ...
> > >
> > > >
> > > > Full log:
> > > > https://openqa.qubes-os.org/tests/122879/logfile?filename=serial0.txt
> > >
> > > I can reproduce a crash with 6.13-rc5 PV dom0.
> > >
> > > What is really interesting in the logs: most crashes seem to happen right
> > > after a module being loaded (in my reproducer it was right after loading
> > > the first module).
> > >
> > > I need to go through the 6.13 commits, but I think I remember having seen
> > > a patch optimizing module loading by using large pages for addressing the
> > > loaded modules. Maybe the case of no large pages being available isn't
> > > handled properly.
> >
> > Seems I was right.
> >
> > For me the following diff fixes the issue. Marek, can you please confirm
> > it fixes your crashes, too?
>
> Thanks for looking into it!
> Will do, I've pushed it to
> https://github.com/QubesOS/qubes-linux-kernel/pull/662, CI will build it
> and then I'll post it to openQA.

It is much better! Tests are still running, but I already see that many
are green.

There is one issue (likely unrelated to this change): sys-usb (HVM domU
with USB controllers passed through) crashes, but only on a system with a
Raptor Lake CPU (others, including ADL and MTL, look fine):

[ 75.770849] Bluetooth: Core ver 2.22
[ 75.770866] Oops: general protection fault, probably for non-canonical address 0xc9d2315bc82c3bbd: 0000 [#1] PREEMPT SMP NOPTI
[ 75.770880] CPU: 0 UID: 0 PID: 923 Comm: (udev-worker) Not tainted 6.13.0-0.rc5.2.qubes.1.fc41.x86_64 #1
[ 75.770890] Hardware name: Xen HVM domU, BIOS 4.19.0 01/02/2025
[ 75.770897] RIP: 0010:msft_monitor_device_del+0x93/0x170 [bluetooth]
[ 75.770924] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 65 21 <26> 2b 8b ad 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 75.770943] RSP: 0000:ffffad644108fa40 EFLAGS: 00010246
[ 75.770950] RAX: ffff93da8a149600 RBX: c9d2315bc82c3810 RCX: 0000000100000000
[ 75.770958] RDX: 0000000000000001 RSI: ffff93da905e9180 RDI: ffff93da81404598
[ 75.770967] RBP: ffffad644108fa58 R08: 0000000000000064 R09: 00000000000012ab
[ 75.770975] R10: ffff93da81207000 R11: 0000000000000286 R12: ffffad644108fb00
[ 75.770983] R13: ffffad644108fa68 R14: ffff93da9089b840 R15: ffff93da8c265100
[ 75.770991] FS: 000078fa4cec4bc0(0000) GS:ffff93da97000000(0000) knlGS:0000000000000000
[ 75.771000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 75.771007] CR2: 000074fa64aadc08 CR3: 00000000105d2006 CR4: 0000000000770ef0
[ 75.771016] PKRU: 55555554
[ 75.771019] Call Trace:
[ 75.771024] <TASK>
[ 75.771028] ? show_trace_log_lvl+0x1b0/0x2f0
[ 75.771036] ? show_trace_log_lvl+0x1b0/0x2f0
[ 75.771042] ? do_one_initcall+0x58/0x310
[ 75.771048] ? __die_body.cold+0x8/0x12
[ 75.771053] ? die_addr+0x3c/0x60
[ 75.771059] ? exc_general_protection+0x17d/0x400
[ 75.771066] ? asm_exc_general_protection+0x26/0x30
[ 75.771074] ? msft_monitor_device_del+0x93/0x170 [bluetooth]
[ 75.771095] ? bt_init+0x54/0x1d0 [bluetooth]
[ 75.771114] ? __pfx_bt_init+0x10/0x10 [bluetooth]
[ 75.771131] ? do_one_initcall+0x58/0x310
[ 75.771137] ? do_init_module+0x90/0x250
[ 75.771142] ? init_module_from_file+0x86/0xc0
[ 75.771149] ? idempotent_init_module+0x115/0x310
[ 75.771156] ? __x64_sys_finit_module+0x65/0xc0
[ 75.771163] ? do_syscall_64+0x82/0x160
[ 75.771168] ? backing_file_read_iter+0x156/0x1f0
[ 75.771176] ? ovl_read_iter+0x94/0xa0 [overlay]
[ 75.771189] ? __pfx_ovl_file_accessed+0x10/0x10 [overlay]
[ 75.771199] ? rseq_get_rseq_cs+0x1d/0x220
[ 75.771205] ? rseq_ip_fixup+0x8d/0x1d0
[ 75.771210] ? __seccomp_filter+0x303/0x520
[ 75.771216] ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
[ 75.771224] ? syscall_exit_to_user_mode+0x10/0x210
[ 75.771231] ? do_syscall_64+0x8e/0x160
[ 75.771236] ? do_sys_openat2+0x9c/0xe0
[ 75.771241] ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
[ 75.771249] ? syscall_exit_to_user_mode+0x10/0x210
[ 75.771255] ? do_syscall_64+0x8e/0x160
[ 75.771260] ? do_user_addr_fault+0x1ec/0x7b0
[ 75.771267] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 75.771274] </TASK>
[ 75.771277] Modules linked in: bluetooth(+) rfkill snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 nf_reject_ipv4 nft_reject intel_rapl_msr intel_rapl_common nft_ct intel_uncore_frequency_common intel_pmc_core intel_vsec joydev nft_masq pmt_telemetry pmt_class nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni xhci_pci polyval_generic ghash_clmulni_intel xhci_hcd sha512_ssse3 sha256_ssse3 nf_tables sha1_ssse3 ehci_pci mei_me ehci_hcd pcspkr mei ata_generic pata_acpi i2c_piix4 i2c_smbus serio_raw xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn loop fuse nfnetlink overlay xen_blkfront
[ 75.771370] ---[ end trace 0000000000000000 ]---
[ 75.771376] RIP: 0010:msft_monitor_device_del+0x93/0x170 [bluetooth]
[ 75.771397] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 65 21 <26> 2b 8b ad 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 75.771416] RSP: 0000:ffffad644108fa40 EFLAGS: 00010246
[ 75.771422] RAX: ffff93da8a149600 RBX: c9d2315bc82c3810 RCX: 0000000100000000
[ 75.771431] RDX: 0000000000000001 RSI: ffff93da905e9180 RDI: ffff93da81404598
[ 75.771439] RBP: ffffad644108fa58 R08: 0000000000000064 R09: 00000000000012ab
[ 75.771446] R10: ffff93da81207000 R11: 0000000000000286 R12: ffffad644108fb00
[ 75.771454] R13: ffffad644108fa68 R14: ffff93da9089b840 R15: ffff93da8c265100
[ 75.771463] FS: 000078fa4cec4bc0(0000) GS:ffff93da97000000(0000) knlGS:0000000000000000
[ 75.771471] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 75.771477] CR2: 000074fa64aadc08 CR3: 00000000105d2006 CR4: 0000000000770ef0
[ 75.771485] PKRU: 55555554
[ 75.771488] Kernel panic - not syncing: Fatal exception
[ 75.771519] Kernel Offset: 0x3b800000 from 0xffffffff80200000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Full log inside https://openqa.qubes-os.org/tests/124736/file/usbvm-var_log.tar.gz
(log/xen/console/guest-sys-usb.log)

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment:
signature.asc