Re: [Xen-devel] netback BUG_ON when using copy_skb=1
Hi Jan,

In my test, the grant table copy error can crash the VM. The stack is as follows:

kernel BUG at /linux/driver/redhat6.2/xen-vnif/xen-netfront.c:372!
Pid: 2658, comm: iperf Not tainted 2.6.32-220.el6.x86_64 #1 Xen HVM domU
RIP: 0010:[<ffffffffa01166ca>] [<ffffffffa01166ca>] xennet_tx_buf_gc+0x18a/0x1f0 [xen_netfront]
RSP: 0018:ffff880004403df8 EFLAGS: 00010096
RAX: 0000000000000049 RBX: ffff8800821986e0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000046
RBP: ffff880004403e48 R08: ffffffff81c00690 R09: 0000000000000080
R10: 0000000000013030 R11: 0000000000000000 R12: 000000000000003b
R13: 000000000000023d R14: 0000000000000011 R15: 0000000000000011
FS: 00007fd8fd97e700(0000) GS:ffff880004400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000030270aab70 CR3: 0000000080cf4000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process iperf (pid: 2658, threadinfo ffff8800813ba000, task ffff880080d0eb00)
Stack:
 ffff880082198020 ffff880082198f90 ffff88007f8d00c0 0000003f04415fc0
<0> ffff880004403e28 ffff880082198768 ffff880082198020 ffff8800821986e0
<0> 0000000000000282 0000000000000100 ffff880004403e78 ffffffffa0117d4c
Call Trace:
 <IRQ>
 [<ffffffffa0117d4c>] xennet_interrupt+0x4c/0xb0 [xen_netfront]
 [<ffffffff810d94f0>] handle_IRQ_event+0x60/0x170
 [<ffffffff8109b8a3>] ? ktime_get+0x63/0xe0
 [<ffffffff810dbc2e>] handle_edge_irq+0xde/0x180
 [<ffffffff812fe809>] __xen_evtchn_do_upcall+0x1b9/0x1f0
 [<ffffffff812fedbf>] xen_evtchn_do_upcall+0x2f/0x50
 [<ffffffff8100c373>] xen_hvm_callback_vector+0x13/0x20

The BUG code in xennet_tx_buf_gc() in xen-netfront.c is:

    if (unlikely(gnttab_query_foreign_access(
            np->grant_tx_ref[id]) != 0)) {
            printk(KERN_ALERT "xennet_tx_buf_gc: warning "
                   "-- grant still in use by backend "
                   "domain.\n");
            BUG();

My guess at the cause is as follows:
1) Xen: _set_status(), called from __gnttab_copy() and
   __acquire_grant_for_copy() in the hypercall path, fails, and access to
   the grant ref is never ended. So the GTF_reading bit cannot be cleared.
2) Netfront: the module hits BUG() when its check finds that the
   GTF_reading bit is still set.

Regards,
Jerry

On 2013/10/17 16:00, Jan Beulich wrote:
>>>> On 17.10.13 at 09:41, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>> But there may be still concurrency problems in my test.
>> If the page replacing in copy_pending_req() was done after
>> netif_get_page_ext() in netbk_gop_frag(), copy_gop->flags is wrongly marked
>> with GNTCOPY_source_gref.
>> Here the memory of that page in skb has been replaced with Dom0 local
>> memory, so the later HYPERVISOR_multicall() with GNTTABOP_copy in
>> netbk_rx_actions() will get errors.
>> The messages is shown as:
>>
>> (XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)
>>
>> Would you like to share some opinions?
>
> At a first glance that seems possible, but the question is - does it
> cause any problems other than the quoted message to be issued
> (and the problematic packet getting re-transmitted)? I'm asking
> mainly because fixing this would appear to imply adding locking to
> these paths - with the risk of adversely affecting performance.
>
> Jan
>
>
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
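
The two-step guess in the message above (an in-use bit that is never cleared, followed by a frontend check that trips on it) can be modelled in a few lines of stand-alone C. This is only a toy sketch, not kernel or hypervisor code: every `toy_*` name is hypothetical, the `release_ok` parameter merely stands in for the suspected failure, and nothing here confirms that Xen actually behaves this way. The flag values are the ones defined in Xen's public grant table header, and the query mirrors the version 1 grant table check for GTF_reading or GTF_writing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as defined in Xen's public grant_table.h. */
#define GTF_permit_access 1u
#define GTF_reading       (1u << 3)
#define GTF_writing       (1u << 4)

/* Hypothetical stand-in for one shared grant table entry. */
struct toy_grant_entry {
    uint16_t flags;
};

/* Non-zero while the entry is marked as in use by the remote domain. */
static unsigned toy_query_foreign_access(const struct toy_grant_entry *e)
{
    return e->flags & (GTF_reading | GTF_writing);
}

/*
 * Model of a backend-side grant copy: mark the entry as being read,
 * then clear the mark again only if the release step succeeds.
 * release_ok == false models the failure suspected in step 1.
 */
static void toy_backend_copy(struct toy_grant_entry *e, bool release_ok)
{
    e->flags |= GTF_reading;
    /* ... the data copy itself would happen here ... */
    if (release_ok)
        e->flags &= (uint16_t)~GTF_reading;
}

/* Model of the frontend check from step 2: true means BUG() would fire. */
static bool toy_tx_buf_gc_would_bug(const struct toy_grant_entry *e)
{
    return toy_query_foreign_access(e) != 0;
}

/* Walk through both outcomes on a freshly granted entry. */
static void toy_demo(void)
{
    struct toy_grant_entry ok = { GTF_permit_access };
    struct toy_grant_entry stuck = { GTF_permit_access };

    toy_backend_copy(&ok, true);      /* release succeeds */
    assert(!toy_tx_buf_gc_would_bug(&ok));

    toy_backend_copy(&stuck, false);  /* release fails, bit stays set */
    assert(toy_tx_buf_gc_would_bug(&stuck));
}
```

Calling `toy_demo()` exercises both paths. In the model the frontend has no way to tell a leaked GTF_reading bit from a backend that is still legitimately reading, which is why the check treats any set bit as fatal.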