Re: Linux 6.13-rc3 many different panics in Xen PV dom0
On 02.01.25 19:54, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 02, 2025 at 01:24:21PM +0100, Marek Marczykowski-Górecki wrote:
>> On Thu, Jan 02, 2025 at 12:30:10PM +0100, Juergen Gross wrote:
>>> On 02.01.25 11:20, Jürgen Groß wrote:
>>>> On 19.12.24 17:14, Marek Marczykowski-Górecki wrote:
>>>>> Hi,
>>>>>
>>>>> It crashes on boot like below, most of the time. But sometimes (rarely)
>>>>> it manages to stay alive. Below I'm pasting a few of the crashes that
>>>>> look distinctly different; if you follow the links, you can find more
>>>>> of them. IMHO it looks like some memory corruption bug somewhere. I
>>>>> also tested Linux 6.13-rc2 before, and it had a very similar issue.
>>>>>
>>>>> ...
>>>>>
>>>>> Full log:
>>>>> https://openqa.qubes-os.org/tests/122879/logfile?filename=serial0.txt
>>>>
>>>> I can reproduce a crash with 6.13-rc5 PV dom0.
>>>>
>>>> What is really interesting in the logs: most crashes seem to happen
>>>> right after a module is loaded (in my reproducer it was right after
>>>> loading the first module).
>>>>
>>>> I need to go through the 6.13 commits, but I think I remember having
>>>> seen a patch optimizing module loading by using large pages for
>>>> addressing the loaded modules. Maybe the case of no large pages being
>>>> available isn't handled properly.
>>>
>>> Seems I was right.
>>>
>>> For me the following diff fixes the issue. Marek, can you please
>>> confirm it fixes your crashes, too?
>>
>> Thanks for looking into it!
>> Will do, I've pushed it to
>> https://github.com/QubesOS/qubes-linux-kernel/pull/662, CI will build
>> it and then I'll post it to openQA.
>
> It is much better! Tests are still running, but I already see that many
> are green.

So are you fine with me adding your "Tested-by:"?

> There is one issue (likely unrelated to this change): sys-usb (HVM domU
> with USB controllers passed through) crashes on a system with a Raptor
> Lake CPU (only that one; others, including ADL and MTL, look fine):
>
> [ 75.770849] Bluetooth: Core ver 2.22
> [ 75.770866] Oops: general protection fault, probably for non-canonical address 0xc9d2315bc82c3bbd: 0000 [#1] PREEMPT SMP NOPTI
> [ 75.770880] CPU: 0 UID: 0 PID: 923 Comm: (udev-worker) Not tainted 6.13.0-0.rc5.2.qubes.1.fc41.x86_64 #1
> [ 75.770890] Hardware name: Xen HVM domU, BIOS 4.19.0 01/02/2025
> [ 75.770897] RIP: 0010:msft_monitor_device_del+0x93/0x170 [bluetooth]
> [ 75.770924] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 65 21 <26> 2b 8b ad 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

This code is looking suspicious. Large areas of binary 0 in a normal
function? And the code itself is nonsense, as it is using a memory access
via ES:, which doesn't make any sense in a 64-bit kernel.


Juergen

> [ 75.770943] RSP: 0000:ffffad644108fa40 EFLAGS: 00010246
> [ 75.770950] RAX: ffff93da8a149600 RBX: c9d2315bc82c3810 RCX: 0000000100000000
> [ 75.770958] RDX: 0000000000000001 RSI: ffff93da905e9180 RDI: ffff93da81404598
> [ 75.770967] RBP: ffffad644108fa58 R08: 0000000000000064 R09: 00000000000012ab
> [ 75.770975] R10: ffff93da81207000 R11: 0000000000000286 R12: ffffad644108fb00
> [ 75.770983] R13: ffffad644108fa68 R14: ffff93da9089b840 R15: ffff93da8c265100
> [ 75.770991] FS: 000078fa4cec4bc0(0000) GS:ffff93da97000000(0000) knlGS:0000000000000000
> [ 75.771000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 75.771007] CR2: 000074fa64aadc08 CR3: 00000000105d2006 CR4: 0000000000770ef0
> [ 75.771016] PKRU: 55555554
> [ 75.771019] Call Trace:
> [ 75.771024] <TASK>
> [ 75.771028] ? show_trace_log_lvl+0x1b0/0x2f0
> [ 75.771036] ? show_trace_log_lvl+0x1b0/0x2f0
> [ 75.771042] ? do_one_initcall+0x58/0x310
> [ 75.771048] ? __die_body.cold+0x8/0x12
> [ 75.771053] ? die_addr+0x3c/0x60
> [ 75.771059] ? exc_general_protection+0x17d/0x400
> [ 75.771066] ? asm_exc_general_protection+0x26/0x30
> [ 75.771074] ? msft_monitor_device_del+0x93/0x170 [bluetooth]
> [ 75.771095] ? bt_init+0x54/0x1d0 [bluetooth]
> [ 75.771114] ? __pfx_bt_init+0x10/0x10 [bluetooth]
> [ 75.771131] ? do_one_initcall+0x58/0x310
> [ 75.771137] ? do_init_module+0x90/0x250
> [ 75.771142] ? init_module_from_file+0x86/0xc0
> [ 75.771149] ? idempotent_init_module+0x115/0x310
> [ 75.771156] ? __x64_sys_finit_module+0x65/0xc0
> [ 75.771163] ? do_syscall_64+0x82/0x160
> [ 75.771168] ? backing_file_read_iter+0x156/0x1f0
> [ 75.771176] ? ovl_read_iter+0x94/0xa0 [overlay]
> [ 75.771189] ? __pfx_ovl_file_accessed+0x10/0x10 [overlay]
> [ 75.771199] ? rseq_get_rseq_cs+0x1d/0x220
> [ 75.771205] ? rseq_ip_fixup+0x8d/0x1d0
> [ 75.771210] ? __seccomp_filter+0x303/0x520
> [ 75.771216] ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
> [ 75.771224] ? syscall_exit_to_user_mode+0x10/0x210
> [ 75.771231] ? do_syscall_64+0x8e/0x160
> [ 75.771236] ? do_sys_openat2+0x9c/0xe0
> [ 75.771241] ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
> [ 75.771249] ? syscall_exit_to_user_mode+0x10/0x210
> [ 75.771255] ? do_syscall_64+0x8e/0x160
> [ 75.771260] ? do_user_addr_fault+0x1ec/0x7b0
> [ 75.771267] ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 75.771274] </TASK>
> [ 75.771277] Modules linked in: bluetooth(+) rfkill snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 nf_reject_ipv4 nft_reject intel_rapl_msr intel_rapl_common nft_ct intel_uncore_frequency_common intel_pmc_core intel_vsec joydev nft_masq pmt_telemetry pmt_class nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni xhci_pci polyval_generic ghash_clmulni_intel xhci_hcd sha512_ssse3 sha256_ssse3 nf_tables sha1_ssse3 ehci_pci mei_me ehci_hcd pcspkr mei ata_generic pata_acpi i2c_piix4 i2c_smbus serio_raw xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn loop fuse nfnetlink overlay xen_blkfront
> [ 75.771370] ---[ end trace 0000000000000000 ]---
> [ 75.771376] RIP: 0010:msft_monitor_device_del+0x93/0x170 [bluetooth]
> [ 75.771397] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 65 21 <26> 2b 8b ad 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 75.771416] RSP: 0000:ffffad644108fa40 EFLAGS: 00010246
> [ 75.771422] RAX: ffff93da8a149600 RBX: c9d2315bc82c3810 RCX: 0000000100000000
> [ 75.771431] RDX: 0000000000000001 RSI: ffff93da905e9180 RDI: ffff93da81404598
> [ 75.771439] RBP: ffffad644108fa58 R08: 0000000000000064 R09: 00000000000012ab
> [ 75.771446] R10: ffff93da81207000 R11: 0000000000000286 R12: ffffad644108fb00
> [ 75.771454] R13: ffffad644108fa68 R14: ffff93da9089b840 R15: ffff93da8c265100
> [ 75.771463] FS: 000078fa4cec4bc0(0000) GS:ffff93da97000000(0000) knlGS:0000000000000000
> [ 75.771471] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 75.771477] CR2: 000074fa64aadc08 CR3: 00000000105d2006 CR4: 0000000000770ef0
> [ 75.771485] PKRU: 55555554
> [ 75.771488] Kernel panic - not syncing: Fatal exception
> [ 75.771519] Kernel Offset: 0x3b800000 from 0xffffffff80200000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
> Full log inside
> https://openqa.qubes-os.org/tests/124736/file/usbvm-var_log.tar.gz
> (log/xen/console/guest-sys-usb.log)

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Attachment: OpenPGP_signature.asc
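
The observation about the "Code:" line in the oops above (mostly zero bytes, with an ES: override at RIP) can be checked mechanically. The following is only a rough sketch, not a replacement for a real disassembler: the byte string is retyped compactly from the log, the register values are copied from the dump, the decode of the bytes at RIP is done by hand, and all helper names (`parse_code_line`, `is_canonical`) are made up for this illustration. It also assumes 48-bit virtual addresses when testing for a canonical address.

```python
# "Code:" bytes from the oops, written compactly: 39 zero bytes, "d0 65 21",
# the byte at RIP in <..>, "2b 8b ad 03", then 17 zero bytes.
CODE_LINE = "Code: " + "00 " * 39 + "d0 65 21 <26> 2b 8b ad 03 " + "00 " * 16 + "00"

RBX = 0xC9D2315BC82C3810         # RBX from the register dump
FAULT_ADDR = 0xC9D2315BC82C3BBD  # address named in the GP fault message

# Legacy x86 segment-override prefix bytes.
SEGMENT_PREFIXES = {0x26: "ES", 0x2E: "CS", 0x36: "SS",
                    0x3E: "DS", 0x64: "FS", 0x65: "GS"}


def parse_code_line(line):
    """Return (all_bytes, index_of_byte_at_RIP) for an oops 'Code:' line."""
    tokens = line.split(":", 1)[1].split()
    rip_index = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    data = bytes(int(t.strip("<>"), 16) for t in tokens)
    return data, rip_index


def is_canonical(addr):
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1


data, rip = parse_code_line(CODE_LINE)
zero_share = data.count(0) / len(data)
prefix = SEGMENT_PREFIXES.get(data[rip])

# Hand decode of the bytes at RIP (an assumption, not disassembler output):
# 26 = ES prefix, 2b = SUB r32, r/m32, ModRM 8b = mod 10 (disp32), reg ECX,
# rm RBX, followed by a 32-bit little-endian displacement.
disp32 = int.from_bytes(data[rip + 3:rip + 7], "little")
effective = (RBX + disp32) & 0xFFFFFFFFFFFFFFFF

print(f"{len(data)} bytes dumped, {zero_share:.0%} of them zero")
print(f"byte at RIP: {data[rip]:#04x} (segment prefix: {prefix})")
print(f"RBX + {disp32:#x} = {effective:#x}")
print(f"matches reported address: {effective == FAULT_ADDR}")
print(f"canonical: {is_canonical(effective)}")
```

Under these assumptions the bytes at RIP read as an ES-prefixed `sub` with a memory operand at RBX + 0x3ad, and that sum equals the non-canonical address named in the fault message. This is consistent with the CPU executing whatever happened to be in that memory rather than the real body of `msft_monitor_device_del`, but it does not by itself say how the memory ended up in that state.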