Re: [Xen-devel] PV-shim 4.13 assertion failures during vcpu_wake()
On Tue, Oct 22, 2019 at 01:01:09PM +0200, Jürgen Groß wrote:
> On 22.10.19 12:52, Roger Pau Monné wrote:
> > On Tue, Oct 22, 2019 at 11:27:41AM +0200, Jürgen Groß wrote:
> > > On 21.10.19 11:51, Sergey Dyasli wrote:
> > > > Hello,
> > > >
> > > > While testing pv-shim from a snapshot of staging 4.13 branch (with
> > > > core-scheduling patches applied), some sort of scheduling issues were
> > > > uncovered which usually lead to a guest lockup (sometimes with soft
> > > > lockup messages from Linux kernel).
> > > >
> > > > This happens more frequently on SandyBridge CPUs. After enabling
> > > > CONFIG_DEBUG in pv-shim, the following assertions failed:
> > > >
> > > > Null scheduler:
> > > >
> > > > Assertion 'lock ==
> > > > get_sched_res(i->res->master_cpu)->schedule_lock' failed at
> > > > ...are/xen-dir/xen-root/xen/include/xen/sched-if.h:278
> > > > (full crash log: https://paste.debian.net/1108861/ )
> > > >
> > > > Credit1 scheduler:
> > > >
> > > > Assertion 'cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu'
> > > > failed at sched_credit.c:383
> > > > (full crash log: https://paste.debian.net/1108862/ )
> > > >
> > > > I'm currently investigating those, but would appreciate any help or
> > > > suggestions.
> > >
> > > And now a more sane patch to try.
> > >
> > >
> > > Juergen
> > >
> > > From 205b7622b84bc678f8a0d6ac121dff14439fe331 Mon Sep 17 00:00:00 2001
> > > From: Juergen Gross <jgross@xxxxxxxx>
> > > To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> > > Cc: Jan Beulich <jbeulich@xxxxxxxx>
> > > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> > > Cc: Wei Liu <wl@xxxxxxx>
> > > Cc: "Roger Pau Monné" <roger.pau@xxxxxxxxxx>
> > > Date: Tue, 22 Oct 2019 11:14:08 +0200
> > > Subject: [PATCH] xen/pvhsim: fix cpu onlining
> > >
> > > Since commit 8d3c326f6756d1 ("xen: let vcpu_create() select processor")
> > > the initial processor for all pv-shim vcpus will be 0, as no other cpus
> > > are online when the vcpus are created. Before that commit the vcpus
> > > would have processors set not being online yet, which worked just by
> > > chance.

So all vCPUs for the shim have their hard affinity set to pCPU#0, if I
understand it correctly.

From my reading of sched_setup_dom0_vcpus it seems like in the shim
case all sched units are pinned to their id, which would imply sched
units != 0 are not pinned to CPU#0?

Or maybe there's only one sched unit that contains all the shim vCPUs?

> > > When the pv-shim vcpu becomes active it will have a hard affinity
> > > not matching its initial processor assignment leading to failing
> > > ASSERT()s or other problems depending on the selected scheduler.
> >
> > I'm slightly lost here, who has set this hard affinity on the pvshim
> > vCPUs?
>
> That is done in sched_setup_dom0_vcpus().
>
> >
> > > Fix that by redoing the affinity setting after onlining the cpu but
> > > before taking the vcpu up.
> >
> > The change seems fine to me, but I don't understand why the lack of
> > this can cause asserts to trigger, as reported by Sergey. I also
> > wonder why a change to pin vCPU#0 to pCPU#0 is not required, because
> > pv_shim_cpu_up is only used for APs.
>
> When vcpu 0 is being created pcpu 0 is online already. So the affinity
> set in sched_setup_dom0_vcpus() is fine in that case.

IIRC all shim vCPUs were pinned to their identity pCPU at creation,
and there was no need to do this pinning when the vCPU is brought
online. I guess this is no longer possible.

What is not clear to me is why having all vCPUs pinned to pCPU#0 can
lead to assertions triggering. Such a scenario is not desirable, but it
shouldn't lead to crashes.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
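To make the Credit1 assertion quoted above more concrete, here is a small toy model in C. It is only a sketch under stated assumptions: the toy_* names, the 8-CPU limit and the single-word mask are all hypothetical and are not Xen code. It assumes that cpumask_cycle(n, mask) returns the next CPU set in mask after n, wrapping around to the start, so that 'cpumask_cycle(cpu, mask) == cpu' can only hold when cpu is the sole CPU in mask. Under that assumption, a vCPU whose assigned processor is 0 while its hard affinity names only pCPU 1 would fail the check, which seems to be the kind of mismatch the patch description refers to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: one bit per pCPU. This is NOT Xen's cpumask_t. */
#define TOY_NR_CPUS 8u
typedef unsigned int toy_cpumask;

/*
 * Return the next CPU set in mask strictly after n, wrapping around so
 * that n itself is considered last. Returns TOY_NR_CPUS for an empty mask.
 * Assumption: this mirrors what cpumask_cycle() does in the hypervisor.
 */
static unsigned int toy_cpumask_cycle(unsigned int n, toy_cpumask mask)
{
    assert(n < TOY_NR_CPUS);

    for (unsigned int step = 1; step <= TOY_NR_CPUS; step++) {
        unsigned int cpu = (n + step) % TOY_NR_CPUS;

        if (mask & (1u << cpu))
            return cpu;
    }

    return TOY_NR_CPUS;
}

/*
 * Toy version of the quoted condition: true only when cpu is the single
 * CPU present in hard_affinity.
 */
static bool toy_pinned_here(unsigned int cpu, toy_cpumask hard_affinity)
{
    return toy_cpumask_cycle(cpu, hard_affinity) == cpu;
}
```

In this model toy_pinned_here(1, 1u << 1) holds, while toy_pinned_here(0, 1u << 1) does not, because cycling from CPU 0 lands on CPU 1. Whether this is the exact path hit in the crash logs is not established in the thread.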