Re: [Xen-devel] Xen crashing when killing a domain with no VCPUs allocated
On 21/07/14 11:33, George Dunlap wrote:
> On 07/18/2014 09:26 PM, Julien Grall wrote:
>>
>> On 18/07/14 17:39, Ian Campbell wrote:
>>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>>>> Hi all,
>>>>
>>>> I've been playing with the function alloc_vcpu on ARM, and I hit one
>>>> case where this function can fail.
>>>>
>>>> During domain creation, the toolstack will call DOMCTL_max_vcpus,
>>>> which may fail, for instance because alloc_vcpu didn't succeed. In
>>>> this case, the toolstack will call DOMCTL_domaindestroy. And I got
>>>> the below stack trace.
>>>>
>>>> It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by
>>>> returning an error in vcpu_initialize.
>>>>
>>>> I'm not sure how to correctly fix it.
>>> I think a simple check at the head of the function would be ok.
>>>
>>> Alternatively perhaps in sched_move_domain, which could either detect
>>> this or could detect a domain in pool0 being moved to pool0 and short
>>> circuit.
>> I was thinking about the small fix below. If it's fine for everyone, I
>> can send a patch next week.
>>
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index e9eb0bc..c44d047 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>      }
>>      /* Do we have vcpus already? If not, no need to update node-affinity */
>> -    if ( d->vcpu )
>> +    if ( d->vcpu && d->vcpu[0] != NULL )
>>          domain_update_node_affinity(d);
>
> So is the problem that we're allocating the vcpu array area, but not
> putting any vcpus in it?

The problem (as I recall) was that domain_create() got midway through and
alloc_vcpu(0) failed with -ENOMEM. Following that failure, the toolstack
called domain_destroy().

Having d->vcpu properly allocated but containing only NULL pointers is a
valid state to be in, especially on error or teardown paths.
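To make that state concrete, here is a minimal, self-contained sketch in plain C. It is not Xen code: struct vcpu and struct domain are cut-down stand-ins, and domain_alloc_vcpu_array() and domain_has_vcpus() are hypothetical helper names invented for this illustration. It models a domain whose vcpu pointer array has been allocated but never populated, and a check that tells that case apart from a domain with a usable vcpu 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Xen's struct vcpu and struct domain; only
 * the fields needed for this illustration are modelled. */
struct vcpu {
    int vcpu_id;
};

struct domain {
    struct vcpu **vcpu;      /* array of max_vcpus pointers, or NULL */
    unsigned int max_vcpus;
};

/* Allocate the pointer array only. Every slot starts out NULL, which is
 * the state left behind if the very first vcpu allocation then fails.
 * Hypothetical helper, not a Xen function. */
static int domain_alloc_vcpu_array(struct domain *d, unsigned int max_vcpus)
{
    assert(d != NULL && max_vcpus > 0);

    d->vcpu = calloc(max_vcpus, sizeof(*d->vcpu));
    if (d->vcpu == NULL)
        return -1;

    d->max_vcpus = max_vcpus;
    return 0;
}

/* Returns nonzero only if the array exists AND vcpu 0 has been created.
 * Checking d->vcpu alone cannot distinguish "array allocated, no vcpus"
 * from "vcpus present". Hypothetical helper, not a Xen function. */
static int domain_has_vcpus(const struct domain *d)
{
    return d->vcpu != NULL && d->vcpu[0] != NULL;
}
```

With these stand-ins, a domain straight after domain_alloc_vcpu_array() has a non-NULL d->vcpu yet domain_has_vcpus() still reports false, which is the case the proposed `d->vcpu[0] != NULL` test is meant to catch.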
>
> Overall it seems like those checks for the existence of cpus should be
> moved into domain_update_node_affinity(). The ASSERT() there I think
> is just a sanity check to make sure we're not getting a ridiculous
> result out of our calculation; but of course if there actually are no
> vcpus, it's not ridiculous at all.
>
> One solution might be to change the ASSERT to
> ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]). Then
> we could probably even remove the d->vcpu conditional when calling it.

If you were going along this line, the pointer checks are substantially
less expensive than cpumask_empty(), so the ||'s should be reordered.

However, I am not convinced that it is necessarily the best solution,
given my previous observation.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
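P.S. A rough sketch of the reordering, for illustration only. This is standalone C, not the Xen source: cpumask_t here is a simplified single-word stand-in, cpumask_empty() is a mock that counts its calls, and dom_cpumask_sane() and check_dom_cpumask() are hypothetical names. The point is only that C's || evaluates left to right and stops at the first true operand, so putting the pointer checks first means the mask scan is skipped whenever the domain has no vcpus.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the Xen structures; hypothetical. */
struct vcpu {
    int vcpu_id;
};

struct domain {
    struct vcpu **vcpu;
};

/* Simplified single-word mask. Xen's real cpumask_t is a bitmap that
 * can span several words, which is why scanning it costs more than a
 * pointer comparison. */
typedef struct {
    unsigned long bits;
} cpumask_t;

/* Counts how often the mock below runs, so the short-circuit behaviour
 * can be observed. */
static unsigned int cpumask_empty_calls;

/* Mock with the same name as Xen's helper; not the real implementation. */
static int cpumask_empty(const cpumask_t *m)
{
    cpumask_empty_calls++;
    return m->bits == 0;
}

/* Cheap pointer checks first: || short-circuits, so cpumask_empty() is
 * only evaluated when the domain really has a vcpu 0. */
static int dom_cpumask_sane(const struct domain *d, const cpumask_t *m)
{
    return !d->vcpu || !d->vcpu[0] || !cpumask_empty(m);
}

/* How the reordered condition might sit inside an assertion. */
static void check_dom_cpumask(const struct domain *d, const cpumask_t *m)
{
    assert(dom_cpumask_sane(d, m));
}
```

The truth value is the same as in the ordering quoted above; only the evaluation order changes, so an empty mask is still flagged for any domain that has vcpus.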