Re: [Xen-devel] [xen-unstable test] 6374: regressions - FAIL
>>> On 14.03.11 at 11:02, Tim Deegan <Tim.Deegan@xxxxxxxxxx> wrote:
> At 17:51 +0000 on 11 Mar (1299865912), Ian Jackson wrote:
>> Mar 11 13:46:58.154777 (XEN) Xen call trace:
>> Mar 11 13:46:58.154798 (XEN)    [<ffff82c480100140>] __bitmap_empty+0x0/0x7f
>> Mar 11 13:46:58.163767 (XEN)    [<ffff82c480119582>] csched_cpu_pick+0xe/0x10
>> Mar 11 13:46:58.163802 (XEN)    [<ffff82c480122c8d>] vcpu_migrate+0xfb/0x230
>> Mar 11 13:46:58.178768 (XEN)    [<ffff82c480122e24>] context_saved+0x62/0x7b
>> Mar 11 13:46:58.178799 (XEN)    [<ffff82c480157f17>] context_switch+0xd98/0xdca
>> Mar 11 13:46:58.183766 (XEN)    [<ffff82c4801226b4>] schedule+0x5fc/0x624
>> Mar 11 13:46:58.183795 (XEN)    [<ffff82c480123837>] __do_softirq+0x88/0x99
>> Mar 11 13:46:58.198784 (XEN)    [<ffff82c4801238b2>] do_softirq+0x6a/0x7a
>
> I think this hang comes because although this code:
>
>     cpu = cycle_cpu(CSCHED_PCPU(nxt)->idle_bias, nxt_idlers);
>     if ( commit )
>         CSCHED_PCPU(nxt)->idle_bias = cpu;
>     cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu));
>
> removes the new cpu and its siblings from cpus, cpu isn't guaranteed to
> have been in cpus in the first place, and none of its siblings are
> either since nxt might not be its sibling.

I had originally spent quite a while to verify that the loop this is in
can't be infinite (i.e. there's going to be always at least one bit
removed from "cpus"), and did so again during the last half hour or so.
I'm certain (hardened also by the CPU masks we see on the stack) that
it's not this function itself that's looping infinitely, but rather its
caller (see my other reply sent just a few minutes ago).
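To illustrate the termination argument ("always at least one bit removed
from cpus"), here is a minimal toy model in C. It is a sketch under stated
assumptions, not the Xen code: masks are plain 64-bit words, and
sibling_map(), cycle(), pick_step() and drain() are hypothetical stand-ins.
It assumes that every CPU belongs to its own sibling set, that nxt is taken
from cpus, and that nxt_idlers is restricted to cpus before the pick (that
restriction is not visible in the lines quoted above). Only the branch
quoted above is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: bit i set means CPU i is in the mask. */
typedef uint64_t mask_t;

#define NR_CPUS 8u

/* Hypothetical topology: CPUs 2k and 2k+1 are hyperthread siblings,
 * and every CPU is a member of its own sibling set (assumption). */
static mask_t sibling_map(unsigned int cpu)
{
    return (mask_t)3 << (cpu & ~1u);
}

/* Simplified stand-in for cycle_cpu(): first set bit after 'start',
 * wrapping around. 'mask' must have a bit set below NR_CPUS. */
static unsigned int cycle(unsigned int start, mask_t mask)
{
    for (unsigned int i = 1; i <= NR_CPUS; i++) {
        unsigned int c = (start + i) % NR_CPUS;
        if (mask & ((mask_t)1 << c))
            return c;
    }
    return NR_CPUS; /* not reached for a valid non-empty mask */
}

/* One pass modelled on the quoted lines; returns the reduced mask. */
static mask_t pick_step(mask_t cpus, mask_t idlers, unsigned int nxt,
                        unsigned int *bias)
{
    /* ASSUMPTION: candidates are restricted to cpus before the pick. */
    mask_t nxt_idlers = idlers & sibling_map(nxt) & cpus;
    assert(nxt_idlers != 0);          /* contains at least nxt itself */

    unsigned int cpu = cycle(*bias, nxt_idlers);
    *bias = cpu;

    mask_t reduced = cpus & ~sibling_map(cpu);
    assert(reduced != cpus);          /* cpu was in cpus, so a bit went away */
    return reduced;
}

/* Run the loop until cpus is empty and count the iterations. */
static unsigned int drain(mask_t cpus, mask_t idlers)
{
    unsigned int bias = 0, steps = 0;

    cpus &= idlers & (((mask_t)1 << NR_CPUS) - 1);
    while (cpus != 0) {
        unsigned int nxt = cycle(0, cpus);   /* nxt is always in cpus */
        cpus = pick_step(cpus, idlers, nxt, &bias);
        steps++;
    }
    return steps;
}
```

In this model drain(0xFF, 0xFF) returns 4: each pass clears one sibling
pair, so the loop cannot run more than NR_CPUS times. If the picked cpu
were not guaranteed to be in cpus, the second assert is exactly what could
fail.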
> Possible fix:
>
> diff -r b9a5d116102d xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Thu Mar 10 13:06:52 2011 +0000
> +++ b/xen/common/sched_credit.c Mon Mar 14 09:25:07 2011 +0000
> @@ -533,7 +533,7 @@ _csched_cpu_pick(const struct scheduler
>              cpu = cycle_cpu(CSCHED_PCPU(nxt)->idle_bias, nxt_idlers);
>              if ( commit )
>                 CSCHED_PCPU(nxt)->idle_bias = cpu;
> -            cpus_andnot(cpus, cpus, per_cpu(cpu_sibling_map, cpu));
> +            cpus_andnot(cpus, cpus, nxt_idlers);
>          }
>          else
>          {
>
> which guarantees that nxt will be removed from cpus, though I suspect
> this means that we might not pick the best HT pair in a particular core.
> Scheduler code is twisty and hurts my brain so I'd like George's
> opinion before checking anything in.

No - that was precisely done the opposite direction to get better
symmetry of load across all CPUs. With what you propose, idle_bias
would become meaningless.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel