[Xen-devel] Kernel Oops trying to share memory with gntalloc
Hello Xen Developers,

I am experiencing crashes when attempting to acquire a large number of grant table references through gntalloc. I tried this on a 3.2 kernel and on the most recent stable (3.6.6) kernel with the same result (Xen 4.1.1).

A simple program, run from Dom0, which tries to create a shared buffer of about 16MB or bigger with a DomU gives the kernel Oops included below. After the crash, I am not able to kill the process (it stays in uninterruptible D state), and the xen-gntalloc module cannot be removed (resource temporarily unavailable). A hard reboot is then required, because a normal reboot does not complete and other programs randomly block and never return.

[ 260.681539] main[4448]: segfault at 0 ip 00007f0c13f7e03b sp 00007fff36608e88 error 6 in libc-2.13.so[7f0c13ef1000+199000]
[ 292.498807] BUG: unable to handle kernel paging request at ffffc90206047fc8
[ 292.499020] IP: [<ffffffff813b6a93>] gnttab_query_foreign_access_v2+0x13/0x20
[ 292.499180] PGD fa6446067 PUD 0
[ 292.499365] Oops: 0000 [#1] SMP
[ 292.499550] Modules linked in: xen_gntalloc xt_physdev xen_netback xen_blkback ib_cm ib_sa ib_uverbs ib_umad mdio ib_mthca ib_mad ib_core ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc nfsv4 nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc xenfs xen_privcmd xen_evtchn usb_storage xen_gntdev w83793 sp5100_tco dm_multipath hwmon_vid [last unloaded: xen_gntalloc]
[ 292.502287] CPU 3
[ 292.502346] Pid: 4454, comm: main Tainted: G W 3.6.6 #3 Supermicro H8DGU/H8DGU
[ 292.502554] RIP: e030:[<ffffffff813b6a93>] [<ffffffff813b6a93>] gnttab_query_foreign_access_v2+0x13/0x20
[ 292.502740] RSP: e02b:ffff880f8acf3db0 EFLAGS: 00010286
[ 292.502835] RAX: ffffc90006048000 RBX: ffff880e80a74a40 RCX: 0000000000002b4a
[ 292.502935] RDX: 0000000000000000 RSI: ffff880e80a749c0 RDI: 00000000ffffffe4
[ 292.503039] RBP: ffff880f8acf3db8 R08: 0000000000016960 R09: ffffea003a029d00
[ 292.503138] R10: ffffffffa0206182 R11: 0000000000000002 R12: ffffffffffffffe4
[ 292.503238] R13: 0000000000001ff8 R14: 0000000000000000 R15: ffff880f9c2e8000
[ 292.503357] FS: 00007f97bc59eb40(0000) GS:ffff880faa860000(0000) knlGS:0000000000000000
[ 292.503487] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 292.503582] CR2: ffffc90206047fc8 CR3: 0000000e869d5000 CR4: 0000000000000660
[ 292.503682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 292.503781] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 292.503885] Process main (pid: 4454, threadinfo ffff880f8acf2000, task ffff880f8c405b40)
[ 292.504014] Stack:
[ 292.504101] ffffffff813b6ab3 ffff880f8acf3df8 ffffffffa0206195 ffff880f8acf3df8
[ 292.504428] ffffffff813b7e70 ffff880e80a74a40 0000000000001ff8 ffff880e80a74a40
[ 292.504752] ffff880e80a74a40 ffff880f8acf3ea8 ffffffffa02069c8 00000000015717f0
[ 292.505074] Call Trace:
[ 292.505161] [<ffffffff813b6ab3>] ? gnttab_query_foreign_access+0x13/0x20
[ 292.505264] [<ffffffffa0206195>] __del_gref+0xf5/0x140 [xen_gntalloc]
[ 292.505364] [<ffffffff813b7e70>] ? gnttab_grant_foreign_access+0x30/0x70
[ 292.505464] [<ffffffffa02069c8>] gntalloc_ioctl+0x478/0x590 [xen_gntalloc]
[ 292.505568] [<ffffffff8118a5df>] do_vfs_ioctl+0x8f/0x4f0
[ 292.505671] [<ffffffff8116a2af>] ? kmem_cache_free+0x2f/0x110
[ 292.505770] [<ffffffff811833d3>] ? putname+0x33/0x50
[ 292.505864] [<ffffffff8118aad1>] sys_ioctl+0x91/0xa0
[ 292.505960] [<ffffffff8166a2e9>] system_call_fastpath+0x16/0x1b
[ 292.506055] Code: 90 48 8b 05 88 81 b4 00 89 ff 5d 0f b7 04 f8 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66 90 48 8b 05 78 81 b4 00 89 ff 5d <0f> b7 04 78 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66
[ 292.509405] RIP [<ffffffff813b6a93>] gnttab_query_foreign_access_v2+0x13/0x20
[ 292.509586] RSP <ffff880f8acf3db0>
[ 292.509673] CR2: ffffc90206047fc8
[ 292.509777] ---[ end trace d8a8651f22939589 ]---

The disassembly of the code looks something like this (function begin/end markers added by me; if I did things right, the fault occurs in the movzwl instruction at <+43>):

0x00000000006010a0 <+0>: nop
0x00000000006010a1 <+1>: mov 0xb48188(%rip),%rax # 0x1149230
0x00000000006010a8 <+8>: mov %edi,%edi
0x00000000006010aa <+10>: pop %rbp
0x00000000006010ab <+11>: movzwl (%rax,%rdi,8),%eax
0x00000000006010af <+15>: and $0x18,%eax
0x00000000006010b2 <+18>: retq
0x00000000006010b3 <+19>: nopl 0x0(%rax,%rax,1)
------ begin gnttab_query_foreign_access
0x00000000006010b8 <+24>: push %rbp
0x00000000006010b9 <+25>: mov %rsp,%rbp
0x00000000006010bc <+28>: data32 data32 data32 xchg %ax,%ax
0x00000000006010c1 <+33>: mov 0xb48178(%rip),%rax # 0x1149240
0x00000000006010c8 <+40>: mov %edi,%edi
0x00000000006010ca <+42>: pop %rbp
0x00000000006010cb <+43>: movzwl (%rax,%rdi,2),%eax
0x00000000006010cf <+47>: and $0x18,%eax
0x00000000006010d2 <+50>: retq
------ end gnttab_query_foreign_access
0x00000000006010d3 <+51>: nopl 0x0(%rax,%rax,1)
0x00000000006010d8 <+56>: push %rbp
0x00000000006010d9 <+57>: mov %rsp,%rbp
0x00000000006010dc <+60>: data16
0x00000000006010dd <+61>: data16
0x00000000006010de <+62>: data16
0x00000000006010df <+63>: data16
0x00000000006010e0 <+64>: add %al,(%rax)

I tried locating the bug, but could not. Something is odd: for the call trace to be as it is, it seems to me that gntalloc_ioctl_alloc had to call add_grefs. add_grefs calls gnttab_grant_foreign_access and then __del_gref, which in turn calls gnttab_query_foreign_access, where the fault occurs.

However, this did not seem logical to me. The only way for add_grefs to call __del_gref would be for gnttab_grant_foreign_access to fail (gref->gref_id < 0), and in that case __del_gref should not call gnttab_query_foreign_access, because that call is only made if gref->gref_id > 0. In spite of this, I see no other way execution would result in this call trace (first gnttab_grant_foreign_access, then __del_gref, and finally gnttab_query_foreign_access). I must have missed something.

In case it is needed, my memory allocation code which interfaces with gntalloc looks like this:

http://pastie.org/5351941

When loading xen-gntalloc, I am specifying the limit parameter with a value higher than 1024 and higher than the number of pages I request.

This crash does not happen when I request a small number of pages, but as soon as I specify a large number (about 4096 or more), the crash happens immediately. Regardless of the size, the crash happens eventually when enough grant references have been requested. Even if I am not deallocating things right (I think I am doing it right, though), this should probably not happen.

I would appreciate any help in fixing this bug.

Thank you in advance,
Pablo
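
As a cross-check on the register dump, the faulting address can be recomputed from the values printed in the Oops. The sketch below (Python, purely illustrative; the helper names are made up for this example) applies the addressing mode of the movzwl at <+43>, base RAX plus index RDI scaled by 2, and reinterprets the low 32 bits of RDI as a signed integer. That the index is an error code being used as a grant reference is only an interpretation of the numbers, and the last line assumes, without having checked the driver source, that gref_id is held in an unsigned 32-bit field.

```python
import errno

# Values copied from the Oops register dump.
RAX = 0xFFFFC90006048000  # base of the table being indexed
RDI = 0x00000000FFFFFFE4  # index, after `mov %edi,%edi` zero-extends it
CR2 = 0xFFFFC90206047FC8  # faulting address reported by the kernel

MASK64 = (1 << 64) - 1


def effective_address(base, index, scale):
    """Address used by `movzwl (base,index,scale)`, wrapped to 64 bits."""
    return (base + index * scale) & MASK64


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


fault = effective_address(RAX, RDI, 2)
ref = as_signed32(RDI)

# Does the recomputed address match the one the kernel reported?
print(hex(fault), fault == CR2)

# The index as a signed number, and the errno name for its magnitude.
print(ref, errno.errorcode.get(-ref))

# Assumption (not verified against the driver): if the ID lives in an
# unsigned 32-bit field, a stored negative error still compares > 0.
print((ref & 0xFFFFFFFF) > 0)
```

The recomputed address equals CR2, and the index reads as -28, which is -ENOSPC on Linux; R12 holds the same value sign-extended to 64 bits. If this reading is right, gnttab_grant_foreign_access did fail, and the gref_id > 0 check in __del_gref did not filter the error value out.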