[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] netback Oops then xenwatch stuck in D state
>>> On 14.02.13 at 12:20, Wei Liu <wei.liu2@xxxxxxxxxx> wrote: > On Thu, 2013-02-14 at 09:11 +0000, Jan Beulich wrote: >> >>> On 13.02.13 at 21:17, Wei Liu <wei.liu2@xxxxxxxxxx> wrote: >> > On Wed, Feb 13, 2013 at 07:20:32PM +0000, David Vrabel wrote: >> >> On 13/02/13 18:37, Wei Liu wrote: >> >> > A slightly upgraded version of the *UNTESTED* patch. >> >> > >> >> > >> >> > Wei. >> >> > >> >> > ----8<---- >> >> > commit df4c929d034cec7043fbd96ba89833eb639c336e >> >> > Author: Wei Liu <wei.liu2@xxxxxxxxxx> >> >> > Date: Wed Feb 13 18:17:01 2013 +0000 >> >> > >> >> > netback: fix netbk_count_requests >> >> > >> >> > There are two paths in the original code, a) test against >> >> > work_to_do, >> > b) test >> >> > against first->size, could return 0 even when error happens. >> >> > >> >> > Simply return -1 in error paths should work. Modify all error paths >> >> > to > >> > return >> >> > -1 to be consistent. >> >> >> >> You also need to remove the netbk_tx_err() after checking the result of >> >> netbk_count_requests(). Otherwise you will have a double xenvif_put(), >> >> which will screw up ref counting. >> > >> > I just realized that we were talking about different code path when I >> > walked home. >> > >> > The path you mentioned is correct. As excution flow should never reach >> > there if there is error in netbk_count_requests. >> > >> > The path I'm not sure is that in the netbk_fatal_tx_err, it calls >> > xenvif_carrier_off which calls xenvif_put, and then it calls xenvif_put, >> > which is likely to mess up the refcount. >> >> I first thought so too when looking at this again yesterday, but >> then convinced myself that this double put _here_ is correct. And >> Ian's patch specifically removed to call to netbk_tx_err() after the >> caller netbk_count_requests() recognized the error, knowing that >> the latter function now itself issues a call to netbk_fatal_tx_err() >> (whether that wouldn't better have replaced the caller's call to >> netbk_tx_err() is a different question). > > I'm not convinced. netbk_tx_err only has one xenvif_put, however > netbk_fatal_tx_err has two puts. Sure - one for the current transfer, and one for the carrier being on. That second one would otherwise be put when the interface gets brought down "normally". > If this is a bug, and, if my previous patch fixes Christopher's OOPS, he > will hit this bug soon when shutting down DomU. I don't think this patch will fix his problems, which - as described yesterday - I'm relatively certain result from the harsh action netbk_fatal_tx_err() does. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |