[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] netback Oops then xenwatch stuck in D state



On Thu, 2013-02-14 at 09:11 +0000, Jan Beulich wrote:
> >>> On 13.02.13 at 21:17, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
> > On Wed, Feb 13, 2013 at 07:20:32PM +0000, David Vrabel wrote:
> >> On 13/02/13 18:37, Wei Liu wrote:
> >> > A slightly upgraded version of the *UNTESTED* patch.
> >> > 
> >> > 
> >> > Wei.
> >> > 
> >> > ----8<----
> >> > commit df4c929d034cec7043fbd96ba89833eb639c336e
> >> > Author: Wei Liu <wei.liu2@xxxxxxxxxx>
> >> > Date:   Wed Feb 13 18:17:01 2013 +0000
> >> > 
> >> >     netback: fix netbk_count_requests
> >> >     
> >> >     There are two paths in the original code, a) test against 
> >> > work_to_do, 
> > b) test
> >> >     against first->size, could return 0 even when error happens.
> >> >     
> >> >     Simply return -1 in error paths should work. Modify all error paths 
> >> > to 
> > return
> >> >     -1 to be consistent.
> >> 
> >> You also need to remove the netbk_tx_err() after checking the result of
> >> netbk_count_requests().  Otherwise you will have a double xenvif_put(),
> >> which will screw up ref counting.
> > 
> > I just realized that we were talking about different code path when I
> > walked home.
> > 
> > The path you mentioned is correct. As excution flow should never reach
> > there if there is error in netbk_count_requests.
> > 
> > The path I'm not sure is that in the netbk_fatal_tx_err, it calls
> > xenvif_carrier_off which calls xenvif_put, and then it calls xenvif_put,
> > which is likely to mess up the refcount.
> 
> I first thought so too when looking at this again yesterday, but
> then convinced myself that this double put _here_ is correct. And
> Ian's patch specifically removed to call to netbk_tx_err() after the
> caller netbk_count_requests() recognized the error, knowing that
> the latter function now itself issues a call to netbk_fatal_tx_err()
> (whether that wouldn't better have replaced the caller's call to
> netbk_tx_err() is a different question).
> 

I'm not convinced. netbk_tx_err only has one xenvif_put, however
netbk_fatal_tx_err has two puts.

If this is a bug, and, if my previous patch fixes Christopher's OOPS, he
will hit this bug soon when shutting down DomU.


Wei.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.