
Re: kernel NULL pointer dereference in gntdev_mmap -> mmu_interval_notifier_remove



On 18.04.21 16:44, Marek Marczykowski-Górecki wrote:
Hi,

I recently got a crash like the one below. I'm not sure what exactly
triggers it (besides grant table mapping, as seen in the call trace), and
I don't have a reliable reproducer. It happened once in about 30
startups.

The previous version I tested was 5.10.25 and it didn't happen there, but
since the reproduction rate is low, that could be just luck...

[ 1053.550389] BUG: kernel NULL pointer dereference, address: 00000000000003b0
[ 1053.557844] #PF: supervisor read access in kernel mode
[ 1053.557847] #PF: error_code(0x0000) - not-present page
[ 1053.557851] PGD 0 P4D 0
[ 1053.557858] Oops: 0000 [#1] SMP NOPTI
[ 1053.557863] CPU: 1 PID: 8806 Comm: Xorg Tainted: G        W         5.10.28-1.fc32.qubes.x86_64 #1
[ 1053.557865] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 1053.557876] RIP: e030:mmu_interval_notifier_remove+0x2e/0x190
[ 1053.557879] Code: 00 41 55 41 54 55 48 89 fd 53 48 83 ec 30 4c 8b 67 38 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 c7 04 24 00 00 00 00 <49> 8b 9c 24 b0 03 00 00 48 c7 44 24 08 00 00 00 00 48 c7 44 24 10
[ 1053.557881] RSP: e02b:ffffc90041617d18 EFLAGS: 00010246
[ 1053.557883] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1053.557884] RDX: 0000000000000001 RSI: ffffffff81c3e9a0 RDI: ffff88812588b700
[ 1053.557885] RBP: ffff88812588b700 R08: 7fffffffffffffff R09: 0000000000000000
[ 1053.557886] R10: ffff8881088d4708 R11: ffff888108aa6180 R12: 0000000000000000
[ 1053.557887] R13: 00000000fffffffc R14: ffff888106a3ec00 R15: ffff888106a3ec10
[ 1053.557913] FS:  0000716f7f9a3a40(0000) GS:ffff888140300000(0000) knlGS:0000000000000000
[ 1053.557915] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1053.557916] CR2: 00000000000003b0 CR3: 0000000105cf4000 CR4: 0000000000000660
[ 1053.557919] Call Trace:
[ 1053.557944]  gntdev_mmap+0x275/0x2f9 [xen_gntdev]
[ 1053.557950]  mmap_region+0x47e/0x720
[ 1053.557953]  do_mmap+0x438/0x540
[ 1053.557959]  ? security_mmap_file+0x81/0xd0
[ 1053.557963]  vm_mmap_pgoff+0xdf/0x130
[ 1053.557967]  ksys_mmap_pgoff+0x1d6/0x240
[ 1053.557973]  do_syscall_64+0x33/0x40
[ 1053.557977]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1053.557981] RIP: 0033:0x716f7fe8c2e6
[ 1053.557985] Code: 01 00 66 90 f3 0f 1e fa 41 f7 c1 ff 0f 00 00 75 2b 55 48 89 fd 53 89 cb 48 85 ff 74 37 41 89 da 48 89 ef b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 62 5b 5d c3 0f 1f 80 00 00 00 00 48 8b 05 79
[ 1053.557986] RSP: 002b:00007ffcb4ef35c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
[ 1053.557988] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000716f7fe8c2e6
[ 1053.557989] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
[ 1053.557990] RBP: 0000000000000000 R08: 0000000000000009 R09: 0000000000000000
[ 1053.557991] R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffcb4ef35e0
[ 1053.557992] R13: 0000000000000001 R14: 0000000000000009 R15: 0000000000000001
[ 1053.557995] Modules linked in: loop nf_tables nfnetlink vfat fat xfs snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation ppdev snd_soc_core snd_compress snd_pcm_dmaengine soundwire_cadence joydev snd_hda_codec snd_hda_core ac97_bus snd_hwdep snd_seq snd_seq_device snd_pcm edac_mce_amd snd_timer pcspkr snd soundcore e1000e i2c_piix4 parport_pc parport xenfs fuse ip_tables dm_crypt bochs_drm drm_vram_helper drm_kms_helper cec drm_ttm_helper ttm serio_raw drm virtio_scsi virtio_console ehci_pci ehci_hcd ata_generic pata_acpi floppy qemu_fw_cfg xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
[ 1053.558040] CR2: 00000000000003b0
[ 1053.558135] ---[ end trace 3c5c2ca63aac717a ]---
[ 1054.277085] snd_hda_intel 0000:00:03.0: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
[ 1054.927022] RIP: e030:mmu_interval_notifier_remove+0x2e/0x190
[ 1054.929170] Code: 00 41 55 41 54 55 48 89 fd 53 48 83 ec 30 4c 8b 67 38 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 c7 04 24 00 00 00 00 <49> 8b 9c 24 b0 03 00 00 48 c7 44 24 08 00 00 00 00 48 c7 44 24 10
[ 1054.937800] RSP: e02b:ffffc90041617d18 EFLAGS: 00010246
[ 1054.947281] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 1054.949535] RDX: 0000000000000001 RSI: ffffffff81c3e9a0 RDI: ffff88812588b700
[ 1054.973016] RBP: ffff88812588b700 R08: 7fffffffffffffff R09: 0000000000000000
[ 1054.976678] R10: ffff8881088d4708 R11: ffff888108aa6180 R12: 0000000000000000
[ 1054.978850] R13: 00000000fffffffc R14: ffff888106a3ec00 R15: ffff888106a3ec10
[ 1054.980751] FS:  0000716f7f9a3a40(0000) GS:ffff888140300000(0000) knlGS:0000000000000000
[ 1054.982878] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1054.984509] CR2: 00000000000003b0 CR3: 0000000105cf4000 CR4: 0000000000000660
[ 1054.990508] Kernel panic - not syncing: Fatal exception
[ 1054.991967] Kernel Offset: disabled
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.

Looking at the surrounding code, the faulting instruction is a read from
0x3b0(%r12), where %r12 was loaded from 0x38(%rdi):

ffffffff812f5930 <mmu_interval_notifier_remove>:
ffffffff812f5930:       e8 8b 09 d7 ff          callq  ffffffff810662c0 <__fentry__>
ffffffff812f5935:       41 55                   push   %r13
ffffffff812f5937:       41 54                   push   %r12
ffffffff812f5939:       55                      push   %rbp
ffffffff812f593a:       48 89 fd                mov    %rdi,%rbp
ffffffff812f593d:       53                      push   %rbx
ffffffff812f593e:       48 83 ec 30             sub    $0x30,%rsp
ffffffff812f5942:       4c 8b 67 38             mov    0x38(%rdi),%r12
ffffffff812f5946:       65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
ffffffff812f594d:       00 00
ffffffff812f594f:       48 89 44 24 28          mov    %rax,0x28(%rsp)
ffffffff812f5954:       31 c0                   xor    %eax,%eax
ffffffff812f5956:       48 c7 04 24 00 00 00    movq   $0x0,(%rsp)
ffffffff812f595d:       00
ffffffff812f595e:       49 8b 9c 24 b0 03 00    mov    0x3b0(%r12),%rbx
ffffffff812f5965:       00

If my calculation is right, it means map->notifier->mm is NULL.


Could you try the attached patch?


Juergen

Attachment: 0001-xen-gntdev-fix-gntdev_mmap-error-exit-path.patch
Description: Text Data



 

