Re: [Xen-devel] netback BUG_ON when using copy_skb=1



On 2013/10/17 20:11, Jan Beulich wrote:
>>>> On 17.10.13 at 12:26, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>> Hi Jan,
> 
> please don't top post.
> 
>> In my test, the grant table copy error may cause the VM to crash.
>> The stack is as follows:
>> kernel BUG at /linux/driver/redhat6.2/xen-vnif/xen-netfront.c:372!
>> ...
>> The BUG code in xen-netfront.c xennet_tx_buf_gc() is:
>>                      if (unlikely(gnttab_query_foreign_access(
>>                              np->grant_tx_ref[id]) != 0)) {
>>                              printk(KERN_ALERT "xennet_tx_buf_gc: warning "
>>                                     "-- grant still in use by backend "
>>                                     "domain.\n");
>>                              BUG();
>>
>> My guess at the cause is as follows:
>> 1) Xen: _set_status(), called in the __gnttab_copy() hypercall path and in 
>> __acquire_grant_for_copy(), fails, and use of the grant ref is not ended,
>>         so the GTF_reading bit cannot be cleared.
>> 2) Netfront: the module hits the BUG when it finds the GTF_reading bit 
>> still set.
> 
> If that was the case, this would be a hypervisor bug: a grant copy
> operation is supposed to hold the grant active only for as long as
> the copy operation takes. You'll in particular notice that
> __acquire_grant_for_copy() in its error path clears GTF_reading
> (and GTF_writing, as appropriate) again. You'd likely need to
> instrument the code to demonstrate (via a couple of extra log
> messages) what you think is not working properly here.

I have confirmed that GTF_reading and GTF_writing are cleared after
__gnttab_copy() returns.
So the question is where GTF_reading gets set.
Could the hypervisor be performing a grant copy operation at the same time
as the VM's netfront is calling xennet_tx_buf_gc()?
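
To make the window I am asking about concrete, here is a minimal,
single-threaded sketch. It is only a model under my own assumptions, not
the hypervisor or netfront code: struct grant_entry_sim and the sim_*
helpers are hypothetical names, and only the GTF_reading / GTF_writing bit
values are taken from Xen's public grant_table.h. It shows that a query
made between the acquire and the release of a copy returns non-zero, which
is the condition that leads to the BUG() in xennet_tx_buf_gc().

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as in Xen's public grant_table.h. */
#define GTF_reading (1U << 3)
#define GTF_writing (1U << 4)

/* Hypothetical, simplified stand-in for one shared grant entry. */
struct grant_entry_sim {
    uint16_t flags;
};

/* Models the hypervisor marking the grant busy for the duration of a copy. */
static void sim_acquire_for_copy(struct grant_entry_sim *e, int readonly)
{
    e->flags |= GTF_reading;
    if (!readonly)
        e->flags |= GTF_writing;
}

/* Models the release at the end of the copy (or on its error path). */
static void sim_release_after_copy(struct grant_entry_sim *e)
{
    assert(e->flags & GTF_reading);   /* must have been acquired first */
    e->flags &= (uint16_t)~(GTF_reading | GTF_writing);
}

/* Models the frontend's check: non-zero means "still in use by backend". */
static int sim_query_foreign_access(const struct grant_entry_sim *e)
{
    return e->flags & (GTF_reading | GTF_writing);
}
```

In this model, calling sim_query_foreign_access() after
sim_acquire_for_copy() but before sim_release_after_copy() returns a
non-zero value, and returns 0 again once the release has run. Whether the
real code paths can interleave this way is exactly what I am unsure about.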

Any ideas?
> 
> Jan
> 
>> On 2013/10/17 16:00, Jan Beulich wrote:
>>>>>> On 17.10.13 at 09:41, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>>>> But there may still be concurrency problems in my test.
>>>> If the page replacement in copy_pending_req() is done after 
>>>> netif_get_page_ext() in netbk_gop_frag(), copy_gop->flags is wrongly 
>>>> marked with GNTCOPY_source_gref.
>>>> By then the memory of that page in the skb has been replaced with Dom0 
>>>> local memory, so the later HYPERVISOR_multicall() with GNTTABOP_copy in 
>>>> netbk_rx_actions() will get errors.
>>>> The message is shown as:
>>>>
>>>> (XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)
>>>>
>>>> Would you like to share some opinions?
>>>
>>> At a first glance that seems possible, but the question is - does it
>>> cause any problems other than the quoted message to be issued
>>> (and the problematic packet getting re-transmitted)? I'm asking
>>> mainly because fixing this would appear to imply adding locking to
>>> these paths - with the risk of adversely affecting performance.
>>>
>>> Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel