Re: [Xen-devel] netback BUG_ON when using copy_skb=1
On 2013/10/17 20:11, Jan Beulich wrote:
>>>> On 17.10.13 at 12:26, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>> Hi Jan,
>
> please don't top post.
>
>> In my test, the grant table copy error may cause that VM crash.
>> The stack is as follows:
>> kernel BUG at /linux/driver/redhat6.2/xen-vnif/xen-netfront.c:372!
>> ...
>> The BUG code in xen-netfront.c xennet_tx_buf_gc() is:
>>     if (unlikely(gnttab_query_foreign_access(
>>         np->grant_tx_ref[id]) != 0)) {
>>         printk(KERN_ALERT "xennet_tx_buf_gc: warning "
>>                "-- grant still in use by backend "
>>                "domain.\n");
>>         BUG();
>>
>> In my guess the reason may be as follows:
>> 1) XEN: The function _set_status() called in hypercall __gnttab_copy() and
>> __acquire_grant_for_copy() is executed failed and the grant ref is not ended.
>> So GTF_reading bit cannot be cleared.
>> 2) Netfront: this module invokes a BUG when it checks the GTF_reading bit is
>> still set.
>
> If that was the case, this would be a hypervisor bug: a grant copy
> operation is supposed to hold the grant active only for as long as
> the copy operation takes. You'll in particular notice that
> __acquire_grant_for_copy() in its error path clears GTF_reading
> (and GTF_writing, as appropriate) again. You'd likely need to
> instrument the code to demonstrate (via a couple of extra log
> messages) what you think is not working properly here.

I have verified that GTF_reading and GTF_writing are indeed cleared after
__gnttab_copy() returns.
So the question is where GTF_reading gets set.
Is the hypervisor performing a grant copy operation while the VM's netfront
is calling xennet_tx_buf_gc()?

Any ideas?

>
> Jan
>
>> On 2013/10/17 16:00, Jan Beulich wrote:
>>>>>> On 17.10.13 at 09:41, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>>>> But there may be still concurrency problems in my test.
>>>> If the page replacing in copy_pending_req() was done after
>>>> netif_get_page_ext() in netbk_gop_frag(), copy_gop->flags is wrongly
>>>> marked with GNTCOPY_source_gref.
>>>> Here the memory of that page in skb has been replaced with Dom0 local
>>>> memory, so the later HYPERVISOR_multicall() with GNTTABOP_copy in
>>>> netbk_rx_actions() will get errors.
>>>> The messages is shown as:
>>>>
>>>> (XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)
>>>>
>>>> Would you like to share some opinions?
>>>
>>> At a first glance that seems possible, but the question is - does it
>>> cause any problems other than the quoted message to be issued
>>> (and the problematic packet getting re-transmitted)? I'm asking
>>> mainly because fixing this would appear to imply adding locking to
>>> these paths - with the risk of adversely affecting performance.
>>>
>>> Jan
>>>
>
> .
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
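
To make the question above concrete, here is a minimal sketch of the window being asked about. It is a toy model only, not Xen or netfront code: every name prefixed with toy_ is hypothetical, the flag values are illustrative, and the real hypervisor updates the shared entry atomically, which this single-threaded model does not attempt to reproduce. It only shows that a copy which clears its "reading" bit on both the success and the error path still leaves an interval during which a frontend-side check would see the grant as in use.

```c
#include <assert.h>
#include <stdint.h>

/* Toy flag bits. The names echo GTF_reading/GTF_writing from the thread,
 * but the values and the struct are assumptions made for illustration. */
#define TOY_GTF_READING (1u << 3)
#define TOY_GTF_WRITING (1u << 4)

struct toy_grant {
    uint16_t flags;   /* stands in for the shared grant entry flags */
};

/* Backend/hypervisor side: pin the grant for the duration of a copy. */
static void toy_copy_begin(struct toy_grant *g)
{
    g->flags = (uint16_t)(g->flags | TOY_GTF_READING);
}

/* Backend/hypervisor side: unpin the grant once the copy is over. */
static void toy_copy_end(struct toy_grant *g)
{
    assert(g->flags & TOY_GTF_READING);   /* must have been pinned */
    g->flags = (uint16_t)(g->flags & ~TOY_GTF_READING);
}

/* A whole copy: the bit is cleared whether or not the copy fails,
 * mirroring the behaviour Jan describes for the error path. */
static int toy_grant_copy(struct toy_grant *g, int simulate_failure)
{
    toy_copy_begin(g);
    int rc = simulate_failure ? -1 : 0;   /* pretend to copy data */
    toy_copy_end(g);
    return rc;
}

/* Frontend side: non-zero means "still in use by the backend", which is
 * the condition that leads to BUG() in the quoted xennet_tx_buf_gc(). */
static int toy_query_foreign_access(const struct toy_grant *g)
{
    return (g->flags & (TOY_GTF_READING | TOY_GTF_WRITING)) != 0;
}
```

In this model, calling toy_query_foreign_access() after toy_grant_copy() returns always yields 0, even for a failed copy, while calling it between toy_copy_begin() and toy_copy_end() yields 1. Whether the real frontend can actually reach its check inside that interval is exactly the open question in the thread; the sketch does not answer it.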