
[Xen-devel] Kernel Oops trying to share memory with gntalloc



Hello Xen Developers,

I am experiencing crashes when attempting to acquire a large number of
grant table references through gntalloc.
I tried this on a 3.2 kernel and on the most recent stable kernel
(3.6.6), with the same result (Xen 4.1.1).
A simple program, run from Dom0, that tries to create a shared buffer
of about 16MB or more with a DomU triggers the kernel Oops included
below.
After the crash, I am not able to kill the process (it stays in
uninterruptible D state), and the xen-gntalloc module cannot be
removed ("Resource temporarily unavailable"). A hard reboot is
required as well, because the system will not reboot normally, and
other programs randomly block and never return.

[  260.681539] main[4448]: segfault at 0 ip 00007f0c13f7e03b sp 00007fff36608e88 error 6 in libc-2.13.so[7f0c13ef1000+199000]
[  292.498807] BUG: unable to handle kernel paging request at ffffc90206047fc8
[  292.499020] IP: [<ffffffff813b6a93>] gnttab_query_foreign_access_v2+0x13/0x20
[  292.499180] PGD fa6446067 PUD 0
[  292.499365] Oops: 0000 [#1] SMP
[  292.499550] Modules linked in: xen_gntalloc xt_physdev xen_netback xen_blkback ib_cm ib_sa ib_uverbs ib_umad mdio ib_mthca ib_mad ib_core ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc nfsv4 nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc xenfs xen_privcmd xen_evtchn usb_storage xen_gntdev w83793 sp5100_tco dm_multipath hwmon_vid [last unloaded: xen_gntalloc]
[  292.502287] CPU 3
[  292.502346] Pid: 4454, comm: main Tainted: G        W    3.6.6 #3 Supermicro H8DGU/H8DGU
[  292.502554] RIP: e030:[<ffffffff813b6a93>]  [<ffffffff813b6a93>] gnttab_query_foreign_access_v2+0x13/0x20
[  292.502740] RSP: e02b:ffff880f8acf3db0  EFLAGS: 00010286
[  292.502835] RAX: ffffc90006048000 RBX: ffff880e80a74a40 RCX: 0000000000002b4a
[  292.502935] RDX: 0000000000000000 RSI: ffff880e80a749c0 RDI: 00000000ffffffe4
[  292.503039] RBP: ffff880f8acf3db8 R08: 0000000000016960 R09: ffffea003a029d00
[  292.503138] R10: ffffffffa0206182 R11: 0000000000000002 R12: ffffffffffffffe4
[  292.503238] R13: 0000000000001ff8 R14: 0000000000000000 R15: ffff880f9c2e8000
[  292.503357] FS:  00007f97bc59eb40(0000) GS:ffff880faa860000(0000) knlGS:0000000000000000
[  292.503487] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  292.503582] CR2: ffffc90206047fc8 CR3: 0000000e869d5000 CR4: 0000000000000660
[  292.503682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  292.503781] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  292.503885] Process main (pid: 4454, threadinfo ffff880f8acf2000, task ffff880f8c405b40)
[  292.504014] Stack:
[  292.504101]  ffffffff813b6ab3 ffff880f8acf3df8 ffffffffa0206195 ffff880f8acf3df8
[  292.504428]  ffffffff813b7e70 ffff880e80a74a40 0000000000001ff8 ffff880e80a74a40
[  292.504752]  ffff880e80a74a40 ffff880f8acf3ea8 ffffffffa02069c8 00000000015717f0
[  292.505074] Call Trace:
[  292.505161]  [<ffffffff813b6ab3>] ? gnttab_query_foreign_access+0x13/0x20
[  292.505264]  [<ffffffffa0206195>] __del_gref+0xf5/0x140 [xen_gntalloc]
[  292.505364]  [<ffffffff813b7e70>] ? gnttab_grant_foreign_access+0x30/0x70
[  292.505464]  [<ffffffffa02069c8>] gntalloc_ioctl+0x478/0x590 [xen_gntalloc]
[  292.505568]  [<ffffffff8118a5df>] do_vfs_ioctl+0x8f/0x4f0
[  292.505671]  [<ffffffff8116a2af>] ? kmem_cache_free+0x2f/0x110
[  292.505770]  [<ffffffff811833d3>] ? putname+0x33/0x50
[  292.505864]  [<ffffffff8118aad1>] sys_ioctl+0x91/0xa0
[  292.505960]  [<ffffffff8166a2e9>] system_call_fastpath+0x16/0x1b
[  292.506055] Code: 90 48 8b 05 88 81 b4 00 89 ff 5d 0f b7 04 f8 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66 90 48 8b 05 78 81 b4 00 89 ff 5d <0f> b7 04 78 83 e0 18 c3 0f 1f 44 00 00 55 48 89 e5 66 66 66 66
[  292.509405] RIP  [<ffffffff813b6a93>] gnttab_query_foreign_access_v2+0x13/0x20
[  292.509586]  RSP <ffff880f8acf3db0>
[  292.509673] CR2: ffffc90206047fc8
[  292.509777] ---[ end trace d8a8651f22939589 ]---


The disassembly of the code looks something like this (the function
begin/end markers were added by me). If I did things right, the fault
occurs in the movzwl instruction at <+43>:

   0x00000000006010a0 <+0>: nop
   0x00000000006010a1 <+1>: mov    0xb48188(%rip),%rax        # 0x1149230
   0x00000000006010a8 <+8>: mov    %edi,%edi
   0x00000000006010aa <+10>: pop    %rbp
   0x00000000006010ab <+11>: movzwl (%rax,%rdi,8),%eax
   0x00000000006010af <+15>: and    $0x18,%eax
   0x00000000006010b2 <+18>: retq
   0x00000000006010b3 <+19>: nopl   0x0(%rax,%rax,1)
------ begin gnttab_query_foreign_access_v2
   0x00000000006010b8 <+24>: push   %rbp
   0x00000000006010b9 <+25>: mov    %rsp,%rbp
   0x00000000006010bc <+28>: data32 data32 data32 xchg %ax,%ax
   0x00000000006010c1 <+33>: mov    0xb48178(%rip),%rax        # 0x1149240
   0x00000000006010c8 <+40>: mov    %edi,%edi
   0x00000000006010ca <+42>: pop    %rbp
   0x00000000006010cb <+43>: movzwl (%rax,%rdi,2),%eax
   0x00000000006010cf <+47>: and    $0x18,%eax
   0x00000000006010d2 <+50>: retq
------ end gnttab_query_foreign_access_v2
   0x00000000006010d3 <+51>: nopl   0x0(%rax,%rax,1)
   0x00000000006010d8 <+56>: push   %rbp
   0x00000000006010d9 <+57>: mov    %rsp,%rbp
   0x00000000006010dc <+60>: data16
   0x00000000006010dd <+61>: data16
   0x00000000006010de <+62>: data16
   0x00000000006010df <+63>: data16
   0x00000000006010e0 <+64>: add    %al,(%rax)
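As a cross-check on the register dump, the faulting address can be
recomputed by hand. The C sketch below takes RAX, RDI and CR2 verbatim
from the Oops and assumes the faulting instruction is the
movzwl (%rax,%rdi,2) at <+43> (a 2-byte element indexed by the
zero-extended EDI). movzwl_effective_address is just an illustrative
helper name.

```c
#include <stdint.h>

/* Register values copied verbatim from the Oops above. */
#define OOPS_RAX 0xffffc90006048000ULL /* table base loaded from a global */
#define OOPS_RDI 0x00000000ffffffe4ULL /* index after "mov %edi,%edi" */
#define OOPS_CR2 0xffffc90206047fc8ULL /* faulting address */

/* Hypothetical helper: effective address of movzwl (%rax,%rdi,2),
 * i.e. base + index * 2, with 64-bit wraparound like the CPU. */
static uint64_t movzwl_effective_address(uint64_t rax, uint64_t rdi)
{
    return rax + rdi * 2;
}

/* Returns 1 if the recomputed address equals CR2, 0 otherwise. */
static int oops_address_matches(void)
{
    return movzwl_effective_address(OOPS_RAX, OOPS_RDI) == OOPS_CR2;
}
```

oops_address_matches() returns 1: 0xffffc90006048000 + 2 * 0xffffffe4
= 0xffffc90206047fc8, which is exactly CR2. So the index handed to the
lookup seems to have been 0xffffffe4, far beyond any plausible grant
reference, rather than the base pointer being bad.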


I tried to locate the bug, but could not. Something is odd: for the
call trace to look the way it does, it seems to me that
gntalloc_ioctl_alloc had to call add_grefs. add_grefs calls
gnttab_grant_foreign_access and then __del_gref, which in turn calls
gnttab_query_foreign_access, which is where the fault seems to occur
(in gnttab_query_foreign_access_v2, according to RIP).
However, this does not seem logical to me, because the only way for
add_grefs to call __del_gref would be for gnttab_grant_foreign_access
to fail (gref->gref_id < 0), and in that case __del_gref should not
call gnttab_query_foreign_access, since that call is only made when
gref->gref_id is greater than 0.
In spite of this, I see no other way execution could produce this call
trace (first gnttab_grant_foreign_access, then __del_gref, and finally
gnttab_query_foreign_access). I must have missed something.
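One possibility I have not been able to rule out (an assumption, not
something I have confirmed in the driver source): if gref_id is held in
an unsigned 32-bit type such as grant_ref_t instead of a signed int, a
negative error code stored into it would still pass a "greater than 0"
test. The standalone C sketch below only models that situation;
struct fake_gref, failed_gref and the two guard functions are
hypothetical stand-ins, not gntalloc code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-grant bookkeeping,
 * assuming gref_id is unsigned (grant_ref_t is uint32_t in the
 * Xen interface headers). */
struct fake_gref {
    uint32_t gref_id;
};

/* Models a failed grant call whose negative errno is stored
 * into the unsigned field (wraps modulo 2^32). */
static struct fake_gref failed_gref(int err)
{
    struct fake_gref g;
    g.gref_id = (uint32_t)-err;
    return g;
}

/* "Only query if gref_id > 0" on the unsigned field:
 * any non-zero value passes, including a wrapped errno. */
static int guard_unsigned(const struct fake_gref *g)
{
    return g->gref_id > 0;
}

/* Same guard, but reinterpreting the field as signed first. */
static int guard_signed(const struct fake_gref *g)
{
    return (int32_t)g->gref_id > 0;
}
```

With ENOSPC = 28 (its value on Linux), failed_gref(ENOSPC) holds
0xffffffe4, which is the value in RDI in the Oops (R12 holds the
sign-extended ffffffffffffffe4). The unsigned guard lets it through;
the signed one does not.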

In case it is needed, my memory allocation code that interfaces with
gntalloc looks like this: http://pastie.org/5351941
When loading xen-gntalloc, I set the limit parameter to a value higher
than 1024 and higher than the number of pages I request.
The crash does not happen when I request a small number of pages, but
as soon as I request a large number (about 4096 or more), it happens
immediately. Regardless of the size, the crash eventually happens once
enough grant references have been requested. Even if I am not
deallocating things correctly (though I think I am), this should
probably not happen.
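For reference, here is the size arithmetic behind those numbers as a
small C sketch. It assumes 4 KiB pages and a request layout mirroring
struct ioctl_gntalloc_alloc_gref from the kernel's xen/gntalloc.h
header (domid, flags, count, index, then one uint32_t grant ID per
page); the local struct and helper names are my own, and the layout
should be checked against the installed header.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u

/* Local mirror of the gntalloc allocation request layout (assumed to
 * match struct ioctl_gntalloc_alloc_gref; verify against your headers). */
struct alloc_gref_req {
    uint16_t domid;       /* IN: domain to be granted access */
    uint16_t flags;       /* IN: e.g. writable */
    uint32_t count;       /* IN: number of pages */
    uint64_t index;       /* OUT: offset for a later mmap() */
    uint32_t gref_ids[1]; /* OUT: one grant reference per page */
};

/* Pages needed to cover 'bytes', rounding up to whole pages. */
static size_t pages_for_bytes(size_t bytes)
{
    return (bytes + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K;
}

/* Bytes needed for a request carrying 'count' grant IDs:
 * the fixed header plus one uint32_t per page. */
static size_t alloc_req_size(uint32_t count)
{
    return offsetof(struct alloc_gref_req, gref_ids)
         + (size_t)count * sizeof(uint32_t);
}
```

A 16 MiB buffer works out to 4096 pages, which matches the point at
which the crash starts, and the request for it is 16 + 4096 * 4 = 16400
bytes, i.e. one 32-bit grant reference per page.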

I would appreciate any help in fixing this bug.

Thank you in advance,

Pablo

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
