[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] move domain to cpupool0 before destroying it

>>> On 19.05.14 at 16:57, <dunlapg@xxxxxxxxx> wrote:
> On Thu, May 15, 2014 at 5:59 AM, Juergen Gross
> <juergen.gross@xxxxxxxxxxxxxx> wrote:
>> Currently when a domain is destroyed it is removed from the domain_list
>> before all of it's resources, including the cpupool membership, are freed.
>> This can lead to a situation where the domain is still member of a cpupool
>> without for_each_domain_in_cpupool() (or even for_each_domain()) being
>> able to find it any more. This in turn can result in rejection of removing
>> the last cpu from a cpupool, because there seems to be still a domain in
>> the cpupool, even if it can't be found by scanning through all domains.
>> This situation can be avoided by moving the domain to be destroyed to
>> cpupool0 first and then remove it from this cpupool BEFORE deleting it from
>> the domain_list. As cpupool0 is always active and a domain without any 
> cpupool
>> membership is implicitly regarded as belonging to cpupool0, this poses no
>> problem.
> I'm a bit unclear why we're doing *both* a sched_move_domain(), *and*
> moving the "cpupool_rm_domain()".
> The sched_move_domain() only happens in domain_kill(), which is only
> initiated (AFAICT) by hypercall: does that mean if a VM dies for some
> other reason (i.e., crashes), that you may still have the same race?
> If not, then just this change alone should be sufficent.  If it does,
> then this change is redundant.

No, a crashed domain is merely being reported as crashed to the
tool stack. It's the tool stack to then actually invoke the killing of
it (or else e.g. "on_crash=preserve" would be rather hard to handle).

> Moving the cpupool_rm_domain() will change things so that there is now
> a period of time where the VM is not being listed as being in
> cpupool0's pool, but may still be in that pool's scheduler's list of
> domains.  Is that OK?  If it is OK, it seems like that change alone
> should be sufficient.

Moving this earlier was a requirement to avoid the race that the
earlier (much different) patch tried to address. Also I think the
patch's description already addresses that question (see the last
sentence of the quoted original mail contents above).

> I've been trying to trace through the twisty little passages of domain
> destruction, and I'm still not quite sure: would it be OK if we just
> called sched_move_domain() in domain_destroy() instead of calling
> cpupool_rm_domain()?

No, it would not, because then again we wouldn't be able to deal
with potential failure, needing re-invocation of the function.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.