[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Netback vif reference count mismatching in latest 3.11 kernels



On Wed, Nov 27, 2013 at 05:21:56PM +0100, Tomasz Wroblewski wrote:
> On 11/27/2013 03:41 PM, Wei Liu wrote:
> >On Wed, Nov 27, 2013 at 03:07:09PM +0100, Tomasz Wroblewski wrote:
> >>Hi,
> >>
> >>After update of our network backend vm kernel to 3.11.9 I'm seeing
> >>trouble with netback vif close which seem related to the recent
> >>changes which separated vif disconnect and free; It seems that now
> >>multiple disconnect/connect cycles can happen without freeing and
> >>reallocing the netdev in the processes, which confuses the vif
> >>refcount.
> >>
> >>vif refcount is initialized to 1 in xenvif_alloc. Then first
> >>xenvif_disconnect brings it back to 0, instead of 1 which would seem
> >>more reasonable (since its initialized to 1 in xenvif_alloc i would
> >>expect it to not be dropped to 0 until xenvif_free). Second
> >>xenvif_disconnect brings it to -1 and hangs. For us (xenclient XT)
> >>this happens when we hibernate linux guest, since linux hibernate is
> >>a complex beast which transitions the drivers to between
> >>close/connected states multiple times (i.e. first it suspends/closes
> >>the drivers to take memory snapshot, then resumes/reconnects the
> >>drivers to the actual writing of hibernate image to disk, then
> >>finally it closes them again to shutdown the system)
> >>
> >
> >Can you illustrate a graph of the whole process? I'm not very clear of
> >the whole cycle.
> >
> >There's a xenvif_get in xenvif_connect, which increases refcnt by 1,
> >that should corresponds to the atomic_dec in xenvif_disconnect, right?
> >
> 
> Not if I'm reading the code right. I think the atomic_dec in
> xenvif_disconnect corresponds to atomic_set(..., 1) in xenvif_alloc,
> and xenvif_get() in xenvif_connect corresponds to xenvif_put in
> xenvif_carrier_off() (which is called at top of xenvif_disconnect).
> 
> Therefore xenvif_disconnect() decrements the refcnt twice whilst
> xenvif_connect() increments it once, which results in negative
> refcnt after one cycle. The graph I'm seeing looks like this:
> 

OK, I get what you mean. The separation of vif disconnect and free is
indeed causing problem.

I think the right fix would be to make things symmetric again.

Does the following *untested* patch work for you? If so I will submit a
proper patch to netdev.

Wei.

---8<---
diff --git a/drivers/net/xen-netback/interface.c 
b/drivers/net/xen-netback/interface.c
index d28324a..342d4e5 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -418,9 +418,6 @@ void xenvif_disconnect(struct xenvif *vif)
        if (netif_carrier_ok(vif->dev))
                xenvif_carrier_off(vif);
 
-       atomic_dec(&vif->refcnt);
-       wait_event(vif->waiting_to_free, atomic_read(&vif->refcnt) == 0);
-
        if (vif->tx_irq) {
                if (vif->tx_irq == vif->rx_irq)
                        unbind_from_irqhandler(vif->tx_irq, vif);
@@ -428,6 +425,7 @@ void xenvif_disconnect(struct xenvif *vif)
                        unbind_from_irqhandler(vif->tx_irq, vif);
                        unbind_from_irqhandler(vif->rx_irq, vif);
                }
+               vif->tx_irq = 0;
        }
 
        xen_netbk_unmap_frontend_rings(vif);
@@ -435,6 +433,9 @@ void xenvif_disconnect(struct xenvif *vif)
 
 void xenvif_free(struct xenvif *vif)
 {
+       atomic_dec(&vif->refcnt);
+       wait_event(vif->waiting_to_free, atomic_read(&vif->refcnt) == 0);
+
        unregister_netdev(vif->dev);
 
        free_netdev(vif->dev);

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.