[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Netback vif reference count mismatching in latest 3.11 kernels



> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-
> bounces@xxxxxxxxxxxxx] On Behalf Of Wei Liu
> Sent: 27 November 2013 14:41
> To: Tomasz Wroblewski
> Cc: xen-devel; Paul Durrant; Wei Liu; David Vrabel
> Subject: Re: [Xen-devel] Netback vif reference count mismatching in latest
> 3.11 kernels
> 
> On Wed, Nov 27, 2013 at 03:07:09PM +0100, Tomasz Wroblewski wrote:
> > Hi,
> >
> > After update of our network backend vm kernel to 3.11.9 I'm seeing
> > trouble with netback vif close which seem related to the recent
> > changes which separated vif disconnect and free; It seems that now
> > multiple disconnect/connect cycles can happen without freeing and
> > reallocing the netdev in the processes, which confuses the vif
> > refcount.
> >
> > vif refcount is initialized to 1 in xenvif_alloc. Then first
> > xenvif_disconnect brings it back to 0, instead of 1 which would seem
> > more reasonable (since its initialized to 1 in xenvif_alloc i would
> > expect it to not be dropped to 0 until xenvif_free). Second
> > xenvif_disconnect brings it to -1 and hangs. For us (xenclient XT)
> > this happens when we hibernate linux guest, since linux hibernate is
> > a complex beast which transitions the drivers to between
> > close/connected states multiple times (i.e. first it suspends/closes
> > the drivers to take memory snapshot, then resumes/reconnects the
> > drivers to the actual writing of hibernate image to disk, then
> > finally it closes them again to shutdown the system)
> >
> 
> Can you illustrate a graph of the whole process? I'm not very clear of
> the whole cycle.
> 
> There's a xenvif_get in xenvif_connect, which increases refcnt by 1,
> that should corresponds to the atomic_dec in xenvif_disconnect, right?
> 

ISTR this happening with some frontends that did not drive the state model 
entirely correctly. Commit ea732dff5cfa10789007bf4a5b935388a0bb2a8f (Handle 
backend state transitions in a more robust way ) should fix this.

  Paul

> > I've hacked the attached patch which fixes it (for us), is the approach 
> > taken
> there correct/upstreamable/reasonable? It does the following
> >
> >     * reset tx_irq to 0 after unbinding the irqs on disconnect -
> > because xenvif_connect tests for it being 0 and will not reconnect
> > if it's not reset
> >     * reacquire one reference to vif in disconnect(). This is because the
> reference
> >       vif should be 1, as initialized in xenvif_alloc(), until the vif is 
> > freed.
> Otherwise multiple disconne
> >       and cause a hang. I imagine alternate way of fixing this could be to 
> > use
> "0" as the default
> >       refcnt in xenvif_alloc()
> >
> 
> You mean the numbers of connect's and disconnect's don't match? Even
> after you reset tx_irq to 0?
> 
> Wei.
> 
> > I believe we didn't experience this issue on previous kernel because
> > vif disconnect was also freeing the vif and netdev, hence it was not
> > possible to get xenvif_connect/xenvif_disconnect called multiple
> > times between vif alloc/free.
> >
> >
> >
> >
> >
> >
> 
> > diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
> netback/interface.c
> > index 68d5102..ccb46c4 100644
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -420,6 +420,8 @@ void xenvif_disconnect(struct xenvif *vif)
> >
> >     atomic_dec(&vif->refcnt);
> >     wait_event(vif->waiting_to_free, atomic_read(&vif->refcnt) == 0);
> > +   /* reacquire reference since it should be 1 until freed */
> > +   xenvif_get(vif);
> >
> >     if (vif->tx_irq) {
> >             if (vif->tx_irq == vif->rx_irq)
> > @@ -428,6 +430,7 @@ void xenvif_disconnect(struct xenvif *vif)
> >                     unbind_from_irqhandler(vif->tx_irq, vif);
> >                     unbind_from_irqhandler(vif->rx_irq, vif);
> >             }
> > +           vif->tx_irq = 0;
> >     }
> >
> >     xen_netbk_unmap_frontend_rings(vif);
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.