Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen,

What is supposed to happen if a domain is in cpupool0, and then all of the cpus are taken out of cpupool0? Is that possible? It looks like there's code in cpupool.c:cpupool_unassign_cpu() which will move all VMs in a cpupool to cpupool0 before removing the last cpu. But what happens if cpupool0 is the pool that has become empty? It seems like that breaks a lot of the assumptions; e.g., sched_move_domain() seems to assume that the pool we're moving a VM to actually has cpus. (A small stand-alone model of this corner case is sketched at the end of this message.)

While we're at it, what's with the "(cpu != cpupool_moving_cpu)" check in the first half of cpupool_unassign_cpu()? Under what conditions are you anticipating cpupool_unassign_cpu() being called a second time before the first call completes? If you have to abort the move because schedule_cpu_switch() failed, wouldn't it be better just to roll the whole transaction back, rather than leaving it hanging in the middle?

Hmm, and why does RMCPU call cpupool_get_by_id() with exact==0? What could possibly be the use of grabbing a random cpupool and then trying to remove the specified cpu from it? (A second sketch at the end of this message illustrates the problem.)

Andre, you might think about folding the attached patch into your debug patch.

 -George

On Mon, Feb 7, 2011 at 1:32 PM, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:
> On 02/07/11 13:38, Andre Przywara wrote:
>>
>> Juergen,
>>
>> as promised, some more debug data. This is from c/s 22858 with Stephan's
>> debug patch (attached).
>> We get the following dump when the hypervisor crashes; note that the
>> first lock is different from the second and subsequent ones:
>>
>> (XEN) sched_credit.c, 572: prv: ffff831836df2970 &prv->lock: ffff831836df2970 prv->weight: 256 sdom->active_vcpu_count: 3 sdom->weight: 256
>> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 768 sdom->active_vcpu_count: 4 sdom->weight: 256
>> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 1024 sdom->active_vcpu_count: 5 sdom->weight: 256
>> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 1280 sdom->active_vcpu_count: 6 sdom->weight: 256
>>
>> ....
>>
>> Hope that gives you an idea. I attach the whole log for your reference.
>
> Hmm, could it be your log wasn't created with the attached patch? I'm
> missing Dom-Id and VCPU from the printk() above, which would be
> interesting (at least I hope so)...
> Additionally, printing the local pcpu number would help, too.
> And could you add a printk for the new prv address in csched_init()?
>
> It would be nice if you could enable cpupool diag output. Please use the
> attached patch (it includes the previous patch for executing the cpu move
> on the cpu to be moved, plus some diag printk corrections).
>
>
> Juergen
>
> --
> Juergen Gross                  Principal Developer Operating Systems
> TSP ES&S SWE OS6               Telephone: +49 (0) 89 3222 2967
> Fujitsu Technology Solutions   e-mail: juergen.gross@xxxxxxxxxxxxxx
> Domagkstr. 28                  Internet: ts.fujitsu.com
> D-80807 Muenchen               Company details: ts.fujitsu.com/imprint.html
Attachment: cpupools-bug-on-move-to-self.diff
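
To make the first question above concrete, here is a minimal stand-alone model of the corner case. It is not Xen code; the pool fields and the evacuation rule are simplified assumptions based only on the behaviour described in the message, namely that removing the last cpu from a pool pushes its domains into cpupool0.

/* Minimal stand-alone model (not Xen code) of the corner case discussed
 * above.  All names and fields are illustrative; the point is only that
 * evacuating domains "to cpupool0" is meaningless when cpupool0 itself is
 * the pool losing its last cpu. */
#include <stdio.h>

struct pool {
    int id;
    int n_cpus;      /* cpus currently assigned to the pool */
    int n_domains;   /* domains currently in the pool */
};

static struct pool cpupool0 = { .id = 0, .n_cpus = 1, .n_domains = 2 };

/* Remove one cpu from pool p; if it is the last one, move p's domains
 * to cpupool0 first, which is the rule the real code is described as
 * following. */
static int unassign_cpu(struct pool *p)
{
    if ( p->n_cpus == 1 && p->n_domains > 0 )
    {
        if ( p == &cpupool0 )
        {
            /* This is the case the question is about: the "destination"
             * is the very pool being emptied.  The model refuses it; the
             * real code, as described above, does not seem to check. */
            printf("pool%d: cannot evacuate %d domain(s) into itself\n",
                   p->id, p->n_domains);
            return -1;
        }
        cpupool0.n_domains += p->n_domains;
        p->n_domains = 0;
    }
    p->n_cpus--;
    return 0;
}

int main(void)
{
    /* Take the last cpu out of cpupool0 while it still holds domains. */
    if ( unassign_cpu(&cpupool0) != 0 )
        printf("refused: cpupool0 would be left with domains but no cpus\n");
    return 0;
}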
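And a second sketch for the exact==0 question. This is not Xen's cpupool_get_by_id(); the "first pool whose id is not below the requested one" behaviour modelled here is only an assumption, used to show how a non-exact lookup in a remove-cpu path could act on a pool the caller never named.

/* Stand-alone illustration (not Xen code) of why a non-exact pool lookup
 * looks suspicious in a remove-cpu path.  The lookup rule modelled here
 * (return the first pool whose id is >= the requested one) is an
 * assumption, not a quote of the real code. */
#include <stdio.h>
#include <stddef.h>

struct pool { int id; };

/* Pool 1 has been destroyed; only pools 0, 2 and 5 remain. */
static struct pool pools[] = { { .id = 0 }, { .id = 2 }, { .id = 5 } };

static struct pool *get_by_id(int id, int exact)
{
    for ( size_t i = 0; i < sizeof(pools) / sizeof(pools[0]); i++ )
        if ( pools[i].id == id || (!exact && pools[i].id > id) )
            return &pools[i];
    return NULL;
}

int main(void)
{
    /* The caller asks to remove a cpu from pool 1 ... */
    struct pool *exact_hit = get_by_id(1, 1);
    struct pool *fuzzy_hit = get_by_id(1, 0);

    printf("exact lookup of pool 1: %s\n", exact_hit ? "found" : "not found");
    /* ... and with exact==0 it quietly gets a different pool instead. */
    if ( fuzzy_hit )
        printf("non-exact lookup of pool 1 returned pool %d\n", fuzzy_hit->id);
    return 0;
}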