Re: [Xen-devel] [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying

On Wed, May 14, 2014 at 10:16 AM, George Dunlap wrote:
> On Mon, May 12, 2014 at 12:49 PM, Juergen Gross wrote:
>> When a cpupool is destroyed just after the last domain has been stopped, the
>> domain might already have been removed from the domain list without having
>> been removed from the cpupool.
>> This situation is easy to detect, and in this case we can return EAGAIN,
>> which libxc already handles by retrying.
> OK, I hate to be picky over two lines, but it still seems to me like
> this is papering over issues instead of dealing with them properly.
> The real problem here is that "for_each_domain_in_cpupool()" doesn't
> actually go over every domain in the cpupool.  Instead of making it so
> that it actually does, you're compensating for that fact in an ad-hoc
> fashion.
> Now, as it happens, it looks like all the other current uses of
> for_each_domain_in_cpupool() work just fine if there are domains in
> the pool it doesn't see, as long as they're about to disappear.  But
> we've already seen a bug caused by a situation where "don't
> see domains that are about to disappear" *does* actually cause a
> problem; working around it is just setting a trap for future
> developers to fall into.  (And who knows, there may already be a bug
> we haven't discovered in the other invocations of
> for_each_domain_in_cpupool().)
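For anyone following along, here is a toy, single-threaded C model of the window being described. It is a sketch under assumptions, not Xen code: `struct pool`, `struct dom`, `visible_domains()`, `pool_destroy()` and `finish_destroy()` are hypothetical stand-ins for the cpupool, the filtered list walk done by for_each_domain_in_cpupool(), cpupool destruction, and complete_domain_destroy(). It shows the state where a domain is already off the list but still counted in its pool, and how the patch's approach maps that state to EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; these are NOT Xen's real structures. */
struct pool {
    int n_dom;          /* domains still accounted to this pool */
    bool destroyed;
};

struct dom {
    struct pool *pool;  /* set to NULL once teardown has finished */
    bool on_list;       /* false once unlinked from the domain list
                           (kept in the toy list so tests can flip it) */
    struct dom *next;
};

/* Analogue of for_each_domain_in_cpupool(): only domains still on
 * the list are seen, so a dying domain is silently skipped. */
static int visible_domains(const struct dom *list, const struct pool *p)
{
    int n = 0;
    for (const struct dom *d = list; d != NULL; d = d->next)
        if (d->on_list && d->pool == p)
            n++;
    return n;
}

/* Workaround in the spirit of the patch: the pool still counts
 * domains that the walk cannot see, so they must be dying; ask the
 * caller to retry instead of destroying the pool. */
static int pool_destroy(const struct dom *list, struct pool *p)
{
    if (visible_domains(list, p) > 0)
        return -EBUSY;   /* live domains remain: refuse */
    if (p->n_dom > 0)
        return -EAGAIN;  /* only dying domains remain: retry later */
    p->destroyed = true;
    return 0;
}

/* Analogue of the deferred complete_domain_destroy(): only at this
 * point does the domain stop counting against its pool. */
static void finish_destroy(struct dom *d)
{
    assert(d->pool != NULL && d->pool->n_dom > 0);
    d->pool->n_dom--;
    d->pool = NULL;
}
```

With one pool and one domain, pool_destroy() returns -EBUSY while the domain is on the list, -EAGAIN once it has been unlinked but not yet torn down, and 0 after finish_destroy() has run, so a caller that loops on -EAGAIN eventually succeeds. The model also makes George's objection visible: visible_domains() still undercounts, and pool_destroy() merely compensates for it.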

Really, this looks like a race in our RCU implementation with respect to
the domain list.  Ideally, if you take domlist_read_lock, you should
see either the domain still on the list, or the domain off the list
*and* complete_domain_destroy() already completed...


