
Re: NULL pointer dereference in cpufreq_update_limits(?) under Xen PV dom0 - regression in 6.13



On Thu, Mar 27, 2025 at 11:14 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 27.03.2025 01:51, Marek Marczykowski-Górecki wrote:
> > Hi,
> >
> > I've got a report[1] that 6.13.6 crashes as listed below. It worked fine in
> > 6.12.11. We've tried a few simple things to narrow the problem down, but
> > without much success.
> >
> > This is running in Xen 4.17.5, PV dom0, which probably is relevant here.
> > This is running on AMD Ryzen 9 7950X3D, with ASRock X670E Taichi
> > motherboard.
> > There are a few more details in the original report (link below).
> >
> > The kernel package (including its config saved into /boot) is here:
> > https://yum.qubes-os.org/r4.2/current/host/fc37/rpm/kernel-latest-6.13.6-1.qubes.fc37.x86_64.rpm
> > https://yum.qubes-os.org/r4.2/current/host/fc37/rpm/kernel-latest-modules-6.13.6-1.qubes.fc37.x86_64.rpm
> >
> > The crash message:
> > [    9.367048] BUG: kernel NULL pointer dereference, address: 0000000000000070
> > [    9.368251] #PF: supervisor read access in kernel mode
> > [    9.369273] #PF: error_code(0x0000) - not-present page
> > [    9.370346] PGD 0 P4D 0
> > [    9.371222] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [    9.372114] CPU: 0 UID: 0 PID: 128 Comm: kworker/0:2 Not tainted 6.13.6-1.qubes.fc37.x86_64 #1
> > [    9.373184] Hardware name: ASRock X670E Taichi/X670E Taichi, BIOS 3.20 02/21/2025
> > [    9.374183] Workqueue: kacpi_notify acpi_os_execute_deferred
> > [    9.375124] RIP: e030:cpufreq_update_limits+0x10/0x30
> > [    9.375840] Code: 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 05 98 e4 21 02 <48> 8b 40 70 48 85 c0 74 06 e9 a2 36 38 00 cc e9 ec fe ff ff 66 66
> > [    9.377009] RSP: e02b:ffffc9004058be28 EFLAGS: 00010246
> > [    9.377667] RAX: 0000000000000000 RBX: ffff888005bf4800 RCX: ffff88805d635fa8
> > [    9.378415] RDX: ffff888005bf4800 RSI: 0000000000000085 RDI: 0000000000000000
> > [    9.379127] RBP: ffff888005cd7800 R08: 0000000000000000 R09: 8080808080808080
> > [    9.379887] R10: ffff88800391abc0 R11: fefefefefefefeff R12: ffff888004e8aa00
> > [    9.380669] R13: ffff88805d635f80 R14: ffff888004e8aa15 R15: ffff8880059baf00
> > [    9.381514] FS:  0000000000000000(0000) GS:ffff88805d600000(0000) knlGS:0000000000000000
> > [    9.382345] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    9.383045] CR2: 0000000000000070 CR3: 000000000202c000 CR4: 0000000000050660
> > [    9.383786] Call Trace:
> > [    9.384335]  <TASK>
> > [    9.384886]  ? __die+0x23/0x70
> > [    9.385456]  ? page_fault_oops+0x95/0x190
> > [    9.386036]  ? exc_page_fault+0x76/0x190
> > [    9.386636]  ? asm_exc_page_fault+0x26/0x30
> > [    9.387215]  ? cpufreq_update_limits+0x10/0x30
> > [    9.387805]  acpi_processor_notify.part.0+0x79/0x150
> > [    9.388402]  acpi_ev_notify_dispatch+0x4b/0x80
> > [    9.389013]  acpi_os_execute_deferred+0x1a/0x30
> > [    9.389610]  process_one_work+0x186/0x3b0
> > [    9.390205]  worker_thread+0x251/0x360
> > [    9.390765]  ? srso_alias_return_thunk+0x5/0xfbef5
> > [    9.391376]  ? __pfx_worker_thread+0x10/0x10
> > [    9.391957]  kthread+0xd2/0x100
> > [    9.392493]  ? __pfx_kthread+0x10/0x10
> > [    9.393043]  ret_from_fork+0x34/0x50
> > [    9.393575]  ? __pfx_kthread+0x10/0x10
> > [    9.394090]  ret_from_fork_asm+0x1a/0x30
> > [    9.394621]  </TASK>
> > [    9.395106] Modules linked in: gpio_generic amd_3d_vcache acpi_pad(-) loop fuse xenfs dm_thin_pool dm_persistent_data dm_bio_prison amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul drm_exec crc32_pclmul gpu_sched crc32c_intel drm_suballoc_helper polyval_clmulni drm_panel_backlight_quirks polyval_generic drm_buddy ghash_clmulni_intel sha512_ssse3 drm_display_helper sha256_ssse3 sha1_ssse3 xhci_pci cec nvme sp5100_tco xhci_hcd nvme_core nvme_auth video wmi xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput dm_multipath
> > [    9.398698] CR2: 0000000000000070
> > [    9.399266] ---[ end trace 0000000000000000 ]---
> > [    9.399880] RIP: e030:cpufreq_update_limits+0x10/0x30
> > [    9.400528] Code: 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 05 98 e4 21 02 <48> 8b 40 70 48 85 c0 74 06 e9 a2 36 38 00 cc e9 ec fe ff ff 66 66
> > [    9.401673] RSP: e02b:ffffc9004058be28 EFLAGS: 00010246
> > [    9.402316] RAX: 0000000000000000 RBX: ffff888005bf4800 RCX: ffff88805d635fa8
> > [    9.403060] RDX: ffff888005bf4800 RSI: 0000000000000085 RDI: 0000000000000000
> > [    9.403819] RBP: ffff888005cd7800 R08: 0000000000000000 R09: 8080808080808080
> > [    9.404581] R10: ffff88800391abc0 R11: fefefefefefefeff R12: ffff888004e8aa00
> > [    9.405332] R13: ffff88805d635f80 R14: ffff888004e8aa15 R15: ffff8880059baf00
> > [    9.406063] FS:  0000000000000000(0000) GS:ffff88805d600000(0000) knlGS:0000000000000000
> > [    9.406830] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    9.407561] CR2: 0000000000000070 CR3: 000000000202c000 CR4: 0000000000050660
> > [    9.408318] Kernel panic - not syncing: Fatal exception
> > [    9.409022] Kernel Offset: disabled
> > (XEN) Hardware Dom0 crashed: 'noreboot' set - not rebooting.
> >
> > Looking at the call trace, it's likely related to ACPI, and Xen too, so
> > I'm adding relevant lists too.
> >
> > Any ideas?
> >
> > #regzbot introduced: v6.12.11..v6.13.6
>
> That code looks to have been introduced for 6.9, so I wonder if so far you
> merely were lucky not to have observed any "highest perf changed"
> notification. See 9c4a13a08a9b ("ACPI: cpufreq: Add highest perf change
> notification"), which imo merely adds a 2nd path to a pre-existing problem:
> cpufreq_update_limits() assumes that cpufreq_driver is non-NULL, and only
> checks cpufreq_driver->update_limits. But of course the assumption there may
> be legitimate, and it's logic elsewhere which is or has become flawed.

cpufreq_update_limits() needs to ensure that the driver is there.

The attached patch should address this issue. Marek, please verify.

Attachment: cpufreq-update-limits-fix.patch
Description: Text Data
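
For readers without the attachment, here is a minimal user-space sketch of the
kind of guard being discussed. It is not the attached patch, whose contents are
not reproduced in this message. The oops is consistent with the analysis above:
RAX is 0 and CR2 is 0x70, i.e. a read at offset 0x70 from a NULL pointer. All
names below (driver_sketch, registered_driver, update_limits_sketch,
fallback_update_policy, sketch_selftest) are hypothetical stand-ins, not kernel
symbols, and the fallback is only assumed to mirror what the kernel does when a
driver has no update_limits callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's driver structure. */
struct driver_sketch {
	void (*update_limits)(unsigned int cpu);
};

/* Stays NULL until a driver registers; assumed to be the state at crash time. */
static struct driver_sketch *registered_driver;

/* Counters so the effect of each call can be observed. */
static int driver_calls;
static int fallback_calls;

static void driver_update_limits(unsigned int cpu)
{
	(void)cpu;
	driver_calls++;
}

/* Assumed generic path for drivers that provide no callback. */
static void fallback_update_policy(unsigned int cpu)
{
	(void)cpu;
	fallback_calls++;
}

/* Check the driver pointer first, and only then look at its callback. */
static void update_limits_sketch(unsigned int cpu)
{
	struct driver_sketch *drv = registered_driver;

	if (!drv)
		return;		/* no driver registered: nothing to update */

	if (drv->update_limits)
		drv->update_limits(cpu);
	else
		fallback_update_policy(cpu);
}

/* Exercises the three cases: no driver, callback present, callback absent. */
static void sketch_selftest(void)
{
	struct driver_sketch with_cb = { .update_limits = driver_update_limits };
	struct driver_sketch without_cb = { .update_limits = NULL };

	driver_calls = 0;
	fallback_calls = 0;

	registered_driver = NULL;
	update_limits_sketch(0);	/* previously the crashing case */
	assert(driver_calls == 0 && fallback_calls == 0);

	registered_driver = &with_cb;
	update_limits_sketch(0);
	assert(driver_calls == 1 && fallback_calls == 0);

	registered_driver = &without_cb;
	update_limits_sketch(0);
	assert(driver_calls == 1 && fallback_calls == 1);
}
```

Whether an early return is the right behaviour, or whether the notification
should never arrive without a registered driver in the first place, is the open
question raised in the quoted reply.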