Re: Linux 6.13-rc3 many different panics in Xen PV dom0



On Thu, Jan 02, 2025 at 01:24:21PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 02, 2025 at 12:30:10PM +0100, Juergen Gross wrote:
> > On 02.01.25 11:20, Jürgen Groß wrote:
> > > On 19.12.24 17:14, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > > 
> > > > It crashes on boot like below, most of the time. But sometimes (rarely)
> > > > it manages to stay alive. Below I'm pasting a few of the crashes that look
> > > > distinctly different; if you follow the links, you can find more of
> > > > them. IMHO it looks like some memory corruption bug somewhere. I also
> > > > tested Linux 6.13-rc2 before, and it had a very similar issue.
> > > 
> > > ...
> > > 
> > > > 
> > > > Full log:
> > > > https://openqa.qubes-os.org/tests/122879/logfile?filename=serial0.txt
> > > 
> > > I can reproduce a crash with 6.13-rc5 PV dom0.
> > > 
> > > What is really interesting in the logs: most crashes seem to happen right
> > > after a module being loaded (in my reproducer it was right after loading
> > > the first module).
> > > 
> > > I need to go through the 6.13 commits, but I think I remember having seen
> > > a patch optimizing module loading by using large pages for addressing the
> > > loaded modules. Maybe the case of no large pages being available isn't
> > > handled properly.
> > 
> > Seems I was right.
> > 
> > For me the following diff fixes the issue. Marek, can you please confirm
> > it fixes your crashes, too?
> 
> Thanks for looking into it!
> Will do, I've pushed it to
> https://github.com/QubesOS/qubes-linux-kernel/pull/662, CI will build it
> and then I'll post it to openQA.

It is much better!

Tests are still running, but I already see that many are green. There is
one issue, likely unrelated to this change: sys-usb (an HVM domU with USB
controllers passed through) crashes, but only on a system with a Raptor
Lake CPU. Other systems, including ADL and MTL, look fine:

[   75.770849] Bluetooth: Core ver 2.22
[   75.770866] Oops: general protection fault, probably for non-canonical address 0xc9d2315bc82c3bbd: 0000 [#1] PREEMPT SMP NOPTI
[   75.770880] CPU: 0 UID: 0 PID: 923 Comm: (udev-worker) Not tainted 6.13.0-0.rc5.2.qubes.1.fc41.x86_64 #1
[   75.770890] Hardware name: Xen HVM domU, BIOS 4.19.0 01/02/2025
[   75.770897] RIP: 0010:msft_monitor_device_del+0x93/0x170 [bluetooth]
[   75.770924] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 65 21 <26> 2b 8b ad 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   75.770943] RSP: 0000:ffffad644108fa40 EFLAGS: 00010246
[   75.770950] RAX: ffff93da8a149600 RBX: c9d2315bc82c3810 RCX: 0000000100000000
[   75.770958] RDX: 0000000000000001 RSI: ffff93da905e9180 RDI: ffff93da81404598
[   75.770967] RBP: ffffad644108fa58 R08: 0000000000000064 R09: 00000000000012ab
[   75.770975] R10: ffff93da81207000 R11: 0000000000000286 R12: ffffad644108fb00
[   75.770983] R13: ffffad644108fa68 R14: ffff93da9089b840 R15: ffff93da8c265100
[   75.770991] FS:  000078fa4cec4bc0(0000) GS:ffff93da97000000(0000) knlGS:0000000000000000
[   75.771000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   75.771007] CR2: 000074fa64aadc08 CR3: 00000000105d2006 CR4: 0000000000770ef0
[   75.771016] PKRU: 55555554
[   75.771019] Call Trace:
[   75.771024]  <TASK>
[   75.771028]  ? show_trace_log_lvl+0x1b0/0x2f0
[   75.771036]  ? show_trace_log_lvl+0x1b0/0x2f0
[   75.771042]  ? do_one_initcall+0x58/0x310
[   75.771048]  ? __die_body.cold+0x8/0x12
[   75.771053]  ? die_addr+0x3c/0x60
[   75.771059]  ? exc_general_protection+0x17d/0x400
[   75.771066]  ? asm_exc_general_protection+0x26/0x30
[   75.771074]  ? msft_monitor_device_del+0x93/0x170 [bluetooth]
[   75.771095]  ? bt_init+0x54/0x1d0 [bluetooth]
[   75.771114]  ? __pfx_bt_init+0x10/0x10 [bluetooth]
[   75.771131]  ? do_one_initcall+0x58/0x310
[   75.771137]  ? do_init_module+0x90/0x250
[   75.771142]  ? init_module_from_file+0x86/0xc0
[   75.771149]  ? idempotent_init_module+0x115/0x310
[   75.771156]  ? __x64_sys_finit_module+0x65/0xc0
[   75.771163]  ? do_syscall_64+0x82/0x160
[   75.771168]  ? backing_file_read_iter+0x156/0x1f0
[   75.771176]  ? ovl_read_iter+0x94/0xa0 [overlay]
[   75.771189]  ? __pfx_ovl_file_accessed+0x10/0x10 [overlay]
[   75.771199]  ? rseq_get_rseq_cs+0x1d/0x220
[   75.771205]  ? rseq_ip_fixup+0x8d/0x1d0
[   75.771210]  ? __seccomp_filter+0x303/0x520
[   75.771216]  ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
[   75.771224]  ? syscall_exit_to_user_mode+0x10/0x210
[   75.771231]  ? do_syscall_64+0x8e/0x160
[   75.771236]  ? do_sys_openat2+0x9c/0xe0
[   75.771241]  ? syscall_exit_to_user_mode_prepare+0x15e/0x1a0
[   75.771249]  ? syscall_exit_to_user_mode+0x10/0x210
[   75.771255]  ? do_syscall_64+0x8e/0x160
[   75.771260]  ? do_user_addr_fault+0x1ec/0x7b0
[   75.771267]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   75.771274]  </TASK>
[   75.771277] Modules linked in: bluetooth(+) rfkill snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore nft_reject_ipv6 nf_reject_ipv6 nft_reject_ipv4 nf_reject_ipv4 nft_reject intel_rapl_msr intel_rapl_common nft_ct intel_uncore_frequency_common intel_pmc_core intel_vsec joydev nft_masq pmt_telemetry pmt_class nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni xhci_pci polyval_generic ghash_clmulni_intel xhci_hcd sha512_ssse3 sha256_ssse3 nf_tables sha1_ssse3 ehci_pci mei_me ehci_hcd pcspkr mei ata_generic pata_acpi i2c_piix4 i2c_smbus serio_raw xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn loop fuse nfnetlink overlay xen_blkfront
[   75.771370] ---[ end trace 0000000000000000 ]---
[   75.771376] RIP: 0010:msft_monitor_device_del+0x93/0x170 [bluetooth]
[   75.771397] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0 65 21 <26> 2b 8b ad 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   75.771416] RSP: 0000:ffffad644108fa40 EFLAGS: 00010246
[   75.771422] RAX: ffff93da8a149600 RBX: c9d2315bc82c3810 RCX: 0000000100000000
[   75.771431] RDX: 0000000000000001 RSI: ffff93da905e9180 RDI: ffff93da81404598
[   75.771439] RBP: ffffad644108fa58 R08: 0000000000000064 R09: 00000000000012ab
[   75.771446] R10: ffff93da81207000 R11: 0000000000000286 R12: ffffad644108fb00
[   75.771454] R13: ffffad644108fa68 R14: ffff93da9089b840 R15: ffff93da8c265100
[   75.771463] FS:  000078fa4cec4bc0(0000) GS:ffff93da97000000(0000) knlGS:0000000000000000
[   75.771471] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   75.771477] CR2: 000074fa64aadc08 CR3: 00000000105d2006 CR4: 0000000000770ef0
[   75.771485] PKRU: 55555554
[   75.771488] Kernel panic - not syncing: Fatal exception
[   75.771519] Kernel Offset: 0x3b800000 from 0xffffffff80200000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Full log inside
https://openqa.qubes-os.org/tests/124736/file/usbvm-var_log.tar.gz
(log/xen/console/guest-sys-usb.log)
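
For anyone reading the oops above: "non-canonical" means the upper bits
of the faulting address are neither all zeros nor all ones. Below is a
minimal sketch of that check, applied to values copied from the register
dump. The helper name is_canonical and the 48-bit default are my own
assumptions for illustration (CR4 in the dump has bit 12, LA57, clear,
which suggests 4-level paging). The last line only shows that the
faulting address is RBX plus a small offset; whether RBX really is the
base of the faulting access is an assumption, not something the log
states.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Canonical means bits 63..(va_bits - 1) are all zeros or all ones.
    va_bits=48 assumes 4-level paging; use 57 for 5-level paging.
    """
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - (va_bits - 1))) - 1
    return upper in (0, all_ones)


fault_addr = 0xC9D2315BC82C3BBD  # from the "Oops:" line
rbx = 0xC9D2315BC82C3810         # RBX in the register dump
rsp = 0xFFFFAD644108FA40         # RSP, an ordinary kernel address

print(is_canonical(fault_addr))  # False
print(is_canonical(rbx))         # False
print(is_canonical(rsp))         # True
print(hex(fault_addr - rbx))     # 0x3ad
```

A non-canonical value in a register that would normally hold a kernel
pointer fits the general picture of corrupted data, but this check alone
does not say where the corruption comes from.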

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
