
Re: [Xen-devel] [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying

At 10:50 +0100 on 14 May (1400061034), George Dunlap wrote:
> On Wed, May 14, 2014 at 10:48 AM, George Dunlap
> <George.Dunlap@xxxxxxxxxxxxx> wrote:
> > On Wed, May 14, 2014 at 10:16 AM, George Dunlap
> > <George.Dunlap@xxxxxxxxxxxxx> wrote:
> >> On Mon, May 12, 2014 at 12:49 PM, Juergen Gross
> >> <juergen.gross@xxxxxxxxxxxxxx> wrote:
> >>> When a cpupool is destroyed just after the last domain has been stopped the
> >>> domain might already be removed from the domain list without being removed
> >>> from the cpupool.
> >>> It is easy to detect this situation and to return EAGAIN in this case which
> >>> is already handled in libxc by doing a retry.
> >>
> >> OK, I hate to be picky over two lines, but it still seems to me like
> >> this is papering over issues instead of dealing with them properly.
> >> The real problem here is that "for_each_domain_in_cpupool()" doesn't
> >> actually go over every domain in the cpupool.  Instead of making it so
> >> that it actually does, you're compensating for that fact in an ad-hoc
> >> fashion.
> >>
> >> Now as it happens, it looks like all the other current uses of
> >> for_each_domain_in_cpupool() work just fine if there are domains in
> >> the pool it doesn't see, as long as they're about to disappear.  But
> >> we've already seen a bug caused because of a situation where "don't
> >> see domains that are about to disappear" *does* actually cause a
> >> problem; working around it is just setting a trap for future
> >> developers to fall into.  (And who knows, there may already be a bug
> >> we haven't discovered in the other invocations of
> >> for_each_domain_in_cpupool()).
> >
> > Really this seems like a race in our rcu implementation wrt the domain
> > list.  It seems like ideally, if you grab the domlist_read_lock, you
> > should either get the domain on the list, or the domain off the list
> > *and* complete_domain_destroy() completed...

I don't think that's something that can be done with RCU.  The
guarantee you get as a reader is that if you _do_ see the domain on
the list, complete_domain_destroy() _hasn't_ been called (and in
particular the domain struct hasn't been freed).

To guarantee that any domain you _don't_ see _has_ been destroyed would
need a full mutex that the caller of complete_domain_destroy() could
hold to exclude you.
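
To make the difference concrete, here is a rough toy sketch.  It is not
Xen code: every name in it (toy_domain, toy_lock, toy_unlink and so on)
is made up for illustration, and a pthread mutex stands in for the real
locking.  The reader takes the same lock only to keep the toy simple; a
real RCU reader would not.  The two-phase path models the RCU shape,
where a domain can be off the list but not yet torn down.  The
single-critical-section path models the "full mutex" shape, where a
reader that misses the domain knows teardown has finished.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in Xen. */
struct toy_domain {
    int id;
    bool destroyed;   /* stands in for complete_domain_destroy() having run */
    struct toy_domain *next;
};

static struct toy_domain *toy_list;
static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Remove d from the list if present.  Caller must hold toy_lock. */
static void toy_unlink_locked(struct toy_domain *d)
{
    for (struct toy_domain **pp = &toy_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == d) {
            *pp = d->next;
            d->next = NULL;
            return;
        }
    }
}

void toy_insert(struct toy_domain *d)
{
    pthread_mutex_lock(&toy_lock);
    d->destroyed = false;
    d->next = toy_list;
    toy_list = d;
    pthread_mutex_unlock(&toy_lock);
}

/* RCU-like shape, phase 1: the domain leaves the list... */
void toy_unlink(struct toy_domain *d)
{
    pthread_mutex_lock(&toy_lock);
    toy_unlink_locked(d);
    pthread_mutex_unlock(&toy_lock);
}

/* ...phase 2: teardown runs at some later point (after a grace period). */
void toy_complete_destroy(struct toy_domain *d)
{
    d->destroyed = true;
}

/* Full-mutex shape: unlink and teardown form one critical section. */
void toy_destroy_excl(struct toy_domain *d)
{
    pthread_mutex_lock(&toy_lock);
    toy_unlink_locked(d);
    d->destroyed = true;
    pthread_mutex_unlock(&toy_lock);
}

/* Reader: true if a domain with this id is currently on the list. */
bool toy_on_list(int id)
{
    bool found = false;

    pthread_mutex_lock(&toy_lock);
    for (struct toy_domain *d = toy_list; d != NULL; d = d->next) {
        if (d->id == id) {
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&toy_lock);
    return found;
}
```

Between toy_unlink() and toy_complete_destroy() a reader gets "not on
the list" for a domain that still exists, which is the window the
cpupool code tripped over.  With toy_destroy_excl() that window is
closed, at the cost of readers contending on the lock.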

