Re: [Xen-devel] netback BUG_ON when using copy_skb=1
On 2013/10/17 20:11, Jan Beulich wrote:
>>>> On 17.10.13 at 12:26, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>> Hi Jan,
>
> please don't top post.
>
>> In my test, the grant table copy error may cause the VM to crash.
>> The stack is as follows:
>> kernel BUG at /linux/driver/redhat6.2/xen-vnif/xen-netfront.c:372!
>> ...
>> The BUG code in xen-netfront.c xennet_tx_buf_gc() is:
>> if (unlikely(gnttab_query_foreign_access(
>>         np->grant_tx_ref[id]) != 0)) {
>>     printk(KERN_ALERT "xennet_tx_buf_gc: warning "
>>            "-- grant still in use by backend "
>>            "domain.\n");
>>     BUG();
>> }
>>
>> My guess at the cause is as follows:
>> 1) Xen: _set_status(), called in the __gnttab_copy() hypercall path and in
>> __acquire_grant_for_copy(), fails, and the grant ref is never released,
>> so the GTF_reading bit cannot be cleared.
>> 2) Netfront: the module hits the BUG when it finds that the GTF_reading
>> bit is still set.
>
> If that was the case, this would be a hypervisor bug: a grant copy
> operation is supposed to hold the grant active only for as long as
> the copy operation takes. You'll in particular notice that
> __acquire_grant_for_copy() in its error path clears GTF_reading
> (and GTF_writing, as appropriate) again. You'd likely need to
> instrument the code to demonstrate (via a couple of extra log
> messages) what you think is not working properly here.
I have confirmed that GTF_reading and GTF_writing are both cleared after
__gnttab_copy() returns.
So the question is where GTF_reading gets set.
Could the hypervisor be performing a grant copy operation at the moment the
VM's netfront calls xennet_tx_buf_gc()?
Any ideas?
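
To make the question concrete, here is a minimal user-space sketch in C. It
is not Xen or netfront code. The GTF_* bit values follow Xen's public
grant_table.h, and the mask in model_query_foreign_access() mirrors the test
gnttab_query_foreign_access() applies to v1 grants; struct model_grant_entry
and every model_* function are hypothetical names made up for illustration.
It only shows that a check made between the start and the end of a copy sees
the grant as busy, even though the flag is correctly cleared afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as in Xen's public grant_table.h. */
#define GTF_permit_access (1U << 0)
#define GTF_readonly      (1U << 2)
#define GTF_reading       (1U << 3)
#define GTF_writing       (1U << 4)

/* Hypothetical stand-in for the flags word of one v1 grant entry. */
struct model_grant_entry {
    uint16_t flags;
};

/* Model: the grant is marked in use for the duration of a copy. */
static void model_copy_begin(struct model_grant_entry *e, bool readonly)
{
    e->flags = (uint16_t)(e->flags | GTF_reading);
    if (!readonly)
        e->flags = (uint16_t)(e->flags | GTF_writing);
}

/* Model: the in-use bits are dropped when the copy completes or fails. */
static void model_copy_end(struct model_grant_entry *e)
{
    e->flags = (uint16_t)(e->flags & ~(GTF_reading | GTF_writing));
}

/* Same mask that gnttab_query_foreign_access() uses for v1 grants. */
static unsigned model_query_foreign_access(const struct model_grant_entry *e)
{
    return e->flags & (GTF_reading | GTF_writing);
}

/*
 * Assumed interleaving: the frontend checks the grant while a copy is
 * still in flight. Returns what the frontend observes at that moment.
 */
static unsigned model_check_during_copy(void)
{
    struct model_grant_entry e = { GTF_permit_access | GTF_readonly };
    unsigned seen;

    assert(model_query_foreign_access(&e) == 0);  /* idle before the copy */
    model_copy_begin(&e, true);
    seen = model_query_foreign_access(&e);        /* frontend check here */
    model_copy_end(&e);
    assert(model_query_foreign_access(&e) == 0);  /* cleared after the copy */
    return seen;
}
```

In this model the mid-copy check returns a non-zero value, which is the
condition that leads to BUG() in xennet_tx_buf_gc(). Whether such an
interleaving can actually occur for a tx slot netfront believes is complete
is exactly the open question.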
>
> Jan
>
>> On 2013/10/17 16:00, Jan Beulich wrote:
>>>>>> On 17.10.13 at 09:41, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>>>> But there may still be concurrency problems in my test.
>>>> If the page replacement in copy_pending_req() happens after
>>>> netif_get_page_ext() in netbk_gop_frag(), copy_gop->flags is wrongly
>>>> marked with GNTCOPY_source_gref.
>>>> By then the skb page has been replaced with Dom0 local memory, so
>>>> the later HYPERVISOR_multicall() with GNTTABOP_copy in
>>>> netbk_rx_actions() fails.
>>>> The message is:
>>>>
>>>> (XEN) grant_table.c:305:d0 Bad flags (0) or dom (0). (expected dom 0)
>>>>
>>>> Would you like to share some opinions?
>>>
>>> At a first glance that seems possible, but the question is - does it
>>> cause any problems other than the quoted message to be issued
>>> (and the problematic packet getting re-transmitted)? I'm asking
>>> mainly because fixing this would appear to imply adding locking to
>>> these paths - with the risk of adversely affecting performance.
>>>
>>> Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel