[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying



On 05/14/2014 11:15 AM, Jan Beulich wrote:
On 14.05.14 at 11:56, <juergen.gross@xxxxxxxxxxxxxx> wrote:
On 14.05.2014 11:16, George Dunlap wrote:
On Mon, May 12, 2014 at 12:49 PM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:
When a cpupool is destroyed just after the last domain has been stopped the
domain might already be removed from the domain list without being removed
from the cpupool.
It is easy to detect this situation and to return EAGAIN in this case which
is already handled in libxc by doing a retry.
OK, I hate to be picky over two lines, but it still seems to me like
this is papering over issues instead of dealing with them properly.
The real problem here is that "for_each_domain_in_cpupool()" doesn't
actually go over every domain in the cpupool.  Instead of making it so
that it actually does, you're compensating for that fact in an ad-hoc
fashion.

Now as it happens, it looks like all the other current uses of
for_each_domain_in_cpupool() work just fine if there are domains in
the pool it doesn't see, as long as they're about to disappear.  But
we've already seen a bug caused because of a situation where "don't
see domains that are about to disappear" *does* actually cause a
problem; working around it is just setting a trap for future
developers to fall into.  (And who knows, there may already be a bug
we haven't discovered in the other invocations of
for_each_domain_in_cpupool()).
This isn't unique to for_each_domain_in_cpupool(). It is a problem for all
uses of for_each_domain() which are related to resources freed only in
complete_domain_destroy().
Of which there shouldn't be that many, if any at all.

What prevents cpupool_rm_domain() getting moved from
complete_domain_destroy() to domain_destroy(), before the
domain gets taken off the list? I actually assume that there are
more things here that may not really need deferring until the
last possible moment...

Well the domain needs to be taken out of the scheduler for one, which is done by sched_destroy_domain(). That's where I stopped the "dependency tracking", as it seemed somewhat likely that somewhere there would be things that depended on having a valid scheduler. (That may not be correct, but it would take a careful reading to convince yourself it was true.)

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.