
Re: [Xen-devel] RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb



On Thu, Jun 19, 2014 at 03:35:11PM +0100, David Vrabel wrote:
> On 19/06/14 15:12, Wei Liu wrote:
> > On Wed, Jun 18, 2014 at 06:48:31PM +0200, Philipp Hahn wrote:
> > [...]
> >>
> >> (gdb) list *(xen_netbk_rx_action+0x18b)
> >> 0xffffffffa04287dc is in xen_netbk_rx_action
> > >> (/var/build/temp/tmp.hW3dNilayw/pbuilder/linux-3.10.11/drivers/net/xen-netback/netback.c:611).
> >> 606                     meta->gso_size = skb_shinfo(skb)->gso_size;
> >> 607             else
> >> 608                     meta->gso_size = 0;
> >> 609
> >> 610             meta->size = 0;
> >> 611             meta->id = req->id;
> >> 612             npo->copy_off = 0;
> >> 613             npo->copy_gref = req->gref;
> >> 614
> >> 615             data = skb->data;
> >>
> >>
> >> After more debugging today I think something like this happens:
> >>
> >> 1. The VM is receiving packets through bonding + bridge + netback +
> >> netfront.
> >>
> >> 2. For some unknown reason at least one packet remains in the rx queue
> >> and is not delivered to the domU immediately by netback.
> >>
> >> 3. The VM finishes shutting down.
> >>
> >> 4. The shared ring between dom0 and domU is freed.
> >>
> >> 5. then xen-netback continues processing the pending requests and tries
> >> to put the packet into the now already released shared ring.
> >>
> >>
> > >> From reading the attached disassembly I guess that
> >>  AX = &meta
> > >>  CX = &rx->sring
> >>  DX =~ rx.req_cons
> >>  CR2 = &req->id
> >> where
> > >>  CX + DX * sizeof(union of struct xen_netif_rx_{request,response}) = CR2,
> > >>  with that sizeof being 8
> >>
> >>
> >> Any additional ideas or insight is appreciated.
> >>
> > 
> > I think your analysis makes sense. Netback does have its own internal
> > queue and the kthread can certainly be scheduled away. There doesn't seem
> > to be a synchronisation point between a vif getting disconnected and the
> > internal queue getting processed. I attach a quick hack. If it works to
> > a degree then we can try to work out a proper fix.
> 
> The kthread_stop() in xenvif_disconnect() waits for the kthread to exit
> so I don't see how Philipp's analysis can be right.
> 

He's using a 3.10 kernel. There, one kthread serves many vifs, so the
kthread won't stop when a single vif is disconnected.
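
To make the point concrete, here is a rough user-space model of the
situation. It is only a sketch and not the netback code: every name in it
(model_vif, model_worker, model_rx_action and so on) is made up for
illustration, and model_disconnect_flush() is just one hypothetical way of
adding a synchronisation point, not the attached hack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical stand-in for a vif and its shared RX ring. */
struct model_vif {
	bool ring_mapped;	/* false once the shared ring is gone */
	int delivered;		/* packets copied into the ring */
};

/* Hypothetical stand-in for the one worker that serves many vifs. */
struct model_worker {
	struct model_vif *queue[MAX_PENDING];	/* one entry per queued packet */
	size_t len;
	int stale_hits;		/* packets whose ring was already gone */
};

/* Queue one packet for vif v on the shared worker. */
static bool model_queue_packet(struct model_worker *w, struct model_vif *v)
{
	assert(w != NULL && v != NULL);
	if (w->len == MAX_PENDING)
		return false;
	w->queue[w->len++] = v;
	return true;
}

/* Disconnect with no synchronisation: the ring goes, queued packets stay. */
static void model_disconnect_unsafe(struct model_vif *v)
{
	v->ring_mapped = false;
}

/* Disconnect that first drops this vif's packets from the shared queue. */
static void model_disconnect_flush(struct model_worker *w, struct model_vif *v)
{
	size_t i, kept = 0;

	for (i = 0; i < w->len; i++)
		if (w->queue[i] != v)
			w->queue[kept++] = w->queue[i];
	w->len = kept;
	v->ring_mapped = false;
}

/* One pass of the shared worker over everything that is queued. */
static void model_rx_action(struct model_worker *w)
{
	size_t i;

	for (i = 0; i < w->len; i++) {
		struct model_vif *v = w->queue[i];

		if (!v->ring_mapped) {
			/* The real driver would touch the freed ring here. */
			w->stale_hits++;
			continue;
		}
		v->delivered++;
	}
	w->len = 0;
}
```

Queue a packet for each of two vifs, disconnect one of them with
model_disconnect_unsafe(), then run model_rx_action(): the worker keeps
running for the other vif and still finds the stale packet. With
model_disconnect_flush() the stale packet is gone before the worker sees it.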

Wei.

> David
