
Re: kernel BUG around vmap/vfree - xen_enter_lazy_mmu()/xen_leave_lazy_mmu() - Linux 7.0-rc1


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Jürgen Groß <jgross@xxxxxxxx>
  • Date: Thu, 26 Feb 2026 14:41:12 +0100
  • Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
  • Delivery-date: Thu, 26 Feb 2026 13:41:21 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 26.02.26 14:27, Andrew Cooper wrote:
> On 26/02/2026 1:17 pm, Marek Marczykowski-Górecki wrote:
>> Hi,
>>
>> When testing Linux 7.0-rc1 in a PV dom0, I sometimes hit the following panic:
>>
>> [  436.849614] ------------[ cut here ]------------
>> [  436.849669] kernel BUG at arch/x86/include/asm/xen/hypervisor.h:78!
>> [  436.849693] Oops: invalid opcode: 0000 [#1] SMP NOPTI
>> [  436.849710] CPU: 3 UID: 0 PID: 4021 Comm: kworker/u25:1 Not tainted 7.0.0-0.rc1.1.qubes.1001.fc41.x86_64 #1 PREEMPT(full)
>> [  436.849729] Hardware name: Star Labs StarBook/StarBook, BIOS 8.97 10/03/2023
>> [  436.849743] Workqueue: i915_flip intel_atomic_commit_work [i915]
>> [  436.850226] RIP: e030:xen_enter_lazy_mmu+0x24/0x30
>> [  436.850245] Code: 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 65 8b 05 b8 e5 02 03 85 c0 75 10 65 c7 05 a9 e5 02 03 01 00 00 00 c3 cc cc cc cc <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90
>> [  436.850270] RSP: e02b:ffffc90045727a68 EFLAGS: 00010202
>> [  436.850283] RAX: 0000000000000001 RBX: ffff8881042fa6d0 RCX: 000fffffffe00000
>> [  436.850296] RDX: 0000000000000001 RSI: ffff88810a5a2980 RDI: 0000000000000000
>> [  436.850308] RBP: ffffc90049eda000 R08: ffffc90049edc000 R09: ffffc90049edc000
>> [  436.850320] R10: ffffc90049edc000 R11: ffffc90049edbfff R12: ffffc90049edc000
>> [  436.850332] R13: ffffc90045727bb0 R14: ffffc90045727b28 R15: 800000000000006b
>> [  436.850356] FS:  0000000000000000(0000) GS:ffff888201e6e000(0000) knlGS:0000000000000000
>> [  436.850371] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  436.850383] CR2: 00006543dbade250 CR3: 0000000115ef1000 CR4: 0000000000050660
>> [  436.850401] Call Trace:
>> [  436.850410]  <TASK>
>> [  436.850420]  vmap_pages_pud_range+0x47c/0x530
>> [  436.850439]  vmap_small_pages_range_noflush+0x1f1/0x2b0
>> [  436.850451]  ? __get_vm_area_node+0x10a/0x170
>> [  436.850465]  vmap+0x79/0xd0
>> [  436.850476]  i915_gem_object_map_page+0x13b/0x210 [i915]
>> [  436.850812]  i915_gem_object_pin_map+0x1e2/0x210 [i915]
>> [  436.851123]  i915_gem_object_pin_map_unlocked+0x2d/0xa0 [i915]
>> [  436.851424]  intel_dsb_buffer_create+0xed/0x1a0 [i915]
>> [  436.851778]  intel_dsb_prepare+0xca/0x1a0 [i915]
>> [  436.852110]  intel_atomic_dsb_finish+0x92/0x350 [i915]
>> [  436.852456]  intel_atomic_commit_tail+0x326/0xd40 [i915]
>> [  436.852769]  process_one_work+0x18d/0x380
>> [  436.852779]  worker_thread+0x196/0x300
>> [  436.852787]  ? __pfx_worker_thread+0x10/0x10
>> [  436.852796]  kthread+0xe3/0x120
>> [  436.852805]  ? __pfx_kthread+0x10/0x10
>> [  436.852815]  ret_from_fork+0x19e/0x260
>> [  436.852824]  ? __pfx_kthread+0x10/0x10
>> [  436.852832]  ret_from_fork_asm+0x1a/0x30
>> [  436.852842]  </TASK>
>> [  436.852847] Modules linked in: snd_seq_dummy snd_hrtimer
>> snd_hda_codec_intelhdmi snd_hda_codec_hdmi snd_hda_codec_alc269
>> snd_hda_codec_realtek_lib snd_hda_scodec_component snd_hda_codec_generic
>> snd_hda_intel snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl
>> snd_sof_intel_hda_generic soundwire_intel snd_sof_intel_hda_sdw_bpt
>> snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink
>> snd_sof_intel_hda soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp snd_sof
>> snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks
>> soundwire_generic_allocation snd_soc_sdw_utils snd_soc_acpi crc8 intel_rapl_msr
>> soundwire_bus intel_rapl_common snd_soc_sdca snd_soc_avs snd_soc_hda_codec
>> snd_hda_ext_core snd_hda_codec vfat intel_uncore_frequency_common fat
>> snd_hda_core snd_intel_dspcfg snd_intel_sdw_acpi snd_hwdep intel_powerclamp
>> snd_soc_core iwlwifi snd_compress spi_nor iTCO_wdt ac97_bus intel_pmc_bxt
>> ee1004 mtd snd_pcm_dmaengine snd_seq cfg80211 snd_seq_device pcspkr
>> spi_intel_pci snd_pcm rfkill spi_intel snd_timer snd
>> [  436.852939]  i2c_i801 soundcore i2c_smbus idma64 intel_pmc_core
>> pmt_telemetry pmt_discovery pmt_class intel_hid intel_pmc_ssram_telemetry
>> intel_scu_pltdrv sparse_keymap joydev loop fuse xenfs nfnetlink vsock_loopback
>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock zram vmw_vmci
>> lz4hc_compress lz4_compress dm_thin_pool dm_persistent_data dm_bio_prison
>> dm_crypt xe drm_ttm_helper drm_suballoc_helper gpu_sched drm_gpuvm drm_exec
>> drm_gpusvm_helper i915 i2c_algo_bit drm_buddy hid_multitouch i2c_hid_acpi
>> ghash_clmulni_intel video nvme wmi ttm i2c_hid nvme_core nvme_keyring
>> drm_display_helper nvme_auth xhci_pci pinctrl_tigerlake thunderbolt hkdf cec
>> xhci_hcd intel_vsec serio_raw xen_acpi_processor xen_privcmd xen_pciback
>> xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc
>> scsi_dh_alua uinput i2c_dev
>> [  436.853183] ---[ end trace 0000000000000000 ]---
>>
>> or this:
>>
>> [  548.736884] ------------[ cut here ]------------
>> [  548.736907] kernel BUG at arch/x86/include/asm/xen/hypervisor.h:85!
>> [  548.736923] Oops: invalid opcode: 0000 [#1] SMP NOPTI
>> [  548.736935] CPU: 0 UID: 0 PID: 206 Comm: kworker/0:2 Not tainted 7.0.0-0.rc1.1.qubes.1001.fc41.x86_64 #1 PREEMPT(full)
>> [  548.736949] Hardware name: LENOVO 2347A45/2347A45, BIOS CBET4000 Nitrokey-v0.2.0-2608-ga649597 01/01/1970
>> [  548.736962] Workqueue: events delayed_vfree_work
>> [  548.736976] RIP: e030:xen_leave_lazy_mmu+0x44/0x50
>> [  548.736989] Code: 02 03 83 f8 01 75 23 65 c7 05 6c e4 02 03 00 00 00 00 65 ff 0d 7d b8 02 03 74 05 c3 cc cc cc cc e8 61 5d fd ff c3 cc cc cc cc <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90
>> [  548.737010] RSP: e02b:ffffc90040607cf0 EFLAGS: 00010297
>> [  548.737018] RAX: 0000000000000000 RBX: ffff888164a70408 RCX: 0000000000000000
>> [  548.737029] RDX: 0000000000000000 RSI: 000ffffffffff000 RDI: ffff8881069c0000
>> [  548.737039] RBP: ffffc90049681000 R08: ffffc90049681000 R09: 0000000000000027
>> [  548.737050] R10: 0000000000000027 R11: fefefefefefefeff R12: ffffc90049681000
>> [  548.737060] R13: ffff8881002fd258 R14: 0000000000000000 R15: ffffc90040607dac
>> [  548.737079] FS:  0000000000000000(0000) GS:ffff8881f88ee000(0000) knlGS:0000000000000000
>> [  548.737090] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  548.737099] CR2: 000055576c2e6058 CR3: 000000010d47b000 CR4: 0000000000050660
>> [  548.737115] Call Trace:
>> [  548.737123]  <TASK>
>> [  548.737128]  vunmap_pmd_range.isra.0+0x1f1/0x2e0
>> [  548.737142]  vunmap_p4d_range+0x17d/0x290
>> [  548.737151]  __vunmap_range_noflush+0x182/0x1d0
>> [  548.737161]  ? _raw_spin_unlock+0xe/0x30
>> [  548.737171]  remove_vm_area+0x40/0x70
>> [  548.737180]  vfree.part.0+0x1b/0x290
>> [  548.737189]  delayed_vfree_work+0x35/0x50
>> [  548.737198]  process_one_work+0x18d/0x380
>> [  548.737207]  worker_thread+0x196/0x300
>> [  548.737215]  ? __pfx_worker_thread+0x10/0x10
>> [  548.737224]  kthread+0xe3/0x120
>> [  548.737233]  ? __pfx_kthread+0x10/0x10
>> [  548.737242]  ret_from_fork+0x19e/0x260
>> [  548.737250]  ? __pfx_kthread+0x10/0x10
>> [  548.737258]  ret_from_fork_asm+0x1a/0x30
>> [  548.737269]  </TASK>
>> [  548.737274] Modules linked in: vfat fat snd_seq_dummy snd_hrtimer ath9k
>> ath9k_common snd_hda_codec_intelhdmi snd_hda_codec_hdmi ath9k_hw
>> snd_hda_codec_alc269 snd_hda_codec_realtek_lib snd_hda_scodec_component
>> snd_hda_codec_generic snd_hda_intel snd_hda_codec mac80211 snd_hda_core
>> snd_intel_dspcfg snd_intel_sdw_acpi snd_hwdep ath snd_seq snd_seq_device
>> snd_ctl_led cfg80211 snd_pcm at24 thinkpad_acpi intel_rapl_msr i2c_i801
>> snd_timer sparse_keymap iTCO_wdt intel_rapl_common platform_profile
>> intel_powerclamp intel_pmc_bxt pcspkr i2c_smbus rfkill libarc4 snd soundcore
>> mei_me e1000e mei joydev lpc_ich loop fuse xenfs nfnetlink vsock_loopback
>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock zram vmw_vmci
>> lz4hc_compress lz4_compress dm_thin_pool dm_persistent_data dm_bio_prison
>> dm_crypt i915 i2c_algo_bit drm_buddy ghash_clmulni_intel ttm sdhci_pci
>> drm_display_helper sdhci_uhs2 sdhci video xhci_pci cqhci wmi cec xhci_hcd
>> ehci_pci mmc_core ehci_hcd serio_raw xen_acpi_processor xen_privcmd xen_pciback
>> [  548.737348]  xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac
>> scsi_dh_emc scsi_dh_alua uinput i2c_dev
>> [  548.737469] ---[ end trace 0000000000000000 ]---
>>
>> I don't see a clear pattern in when this happens. One was during host
>> suspend, but the other was during a "normal" test run (starting/stopping
>> domUs and running stuff around them). Note also that one of those is Intel
>> and the other AMD, so it isn't really hardware specific.
>>
>> Slightly more details with links (especially serial0.txt in the logs
>> tab) at
>> https://github.com/QubesOS/qubes-linux-kernel/pull/662#issuecomment-3963326188
>>
>> Any idea?
>
> That looks like the issue Juergen fixed with:
>
> https://lore.kernel.org/xen-devel/20260220123715.834848-1-jgross@xxxxxxxx/

No, it doesn't. That fix is already in rc1, and the crash it addressed happened
quite early during boot (before any secondary CPUs were brought up).

I guess this problem is related to the lazy_mmu_state series [1].


Juergen

[1]: https://lore.kernel.org/lkml/20251215150323.2218608-1-kevin.brodsky@xxxxxxx/

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature
