Re: [Xen-devel] dom0less + sched=null => broken in staging
Hi,

On 8/13/19 7:43 PM, Julien Grall wrote:
>
> On 8/13/19 6:34 PM, Dario Faggioli wrote:
>> On Tue, 2019-08-13 at 17:52 +0100, Julien Grall wrote:
>>> Hi Dario,
>>>
>> Hello!
>>
>>> On 8/13/19 4:27 PM, Dario Faggioli wrote:
>>>> On Fri, 2019-08-09 at 11:30 -0700, Stefano Stabellini wrote:
>>>>>
>>>> In my (x86 and "dom0full") testbox, this seems to come from
>>>> domain_unpause_by_systemcontroller(dom0) called by
>>>> xen/arch/x86/setup.c:init_done(), at the very end of __start_xen().
>>>>
>>>> I don't know if domain construction in an ARM dom0less system works
>>>> similarly, though. What we want is someone calling either
>>>> vcpu_wake() or vcpu_unpause(), after having cleared _VPF_down from
>>>> pause_flags.
>>>
>>> Looking at create_domUs() there is a call to
>>> domain_unpause_by_systemcontroller() for each domU.
>>>
>> Yes, I saw that. And I've seen the one done on dom0, at the end of
>> xen/arch/arm/setup.c:start_xen(), as well.
>>
>> Also, both construct_dom0() (still from start_xen()) and
>> construct_domU() (called from create_domUs()) call construct_domain(),
>> which does clear_bit(_VPF_down), setting the domain to online.
>>
>> So, unless the flag gets cleared again, or something else happens that
>> makes the vCPU(s) fail the vcpu_runnable() check in
>> domain_unpause()->vcpu_wake(), I don't see why the wakeup that lets
>> the null scheduler start scheduling the vCPU doesn't happen... as it
>> instead does on x86 or !dom0less ARM (because, as far as I've
>> understood, it's only dom0less that doesn't work; is this correct?)
>
> Yes, I quickly tried to use the NULL scheduler with just dom0 and it
> boots.
>
> Interestingly, I can't see the log:
>
> (XEN) Freed 328kB init memory.
>
> This is called as part of init_done() before CPU0 goes into the idle
> loop.
>
> Adding more debug, it is getting stuck when calling
> domain_unpause_by_systemcontroller() for dom0, specifically in
> vcpu_wake() on dom0v0.
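[For readers less familiar with the null scheduler's wake path, here is a minimal, self-contained C model of the kind of loop that can get stuck here. All names (`npc`, `pick_cpu`, `try_wake`, `cpu_really_busy`) are simplified stand-ins for illustration, not actual Xen code; the retry count is capped so the sketch terminates where the real loop would spin.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (NOT Xen code; names simplified): each pCPU runs at most
 * one vCPU, tracked in npc[].  pick_cpu() treats npc[c] == NULL as
 * "free".  If the bookkeeping for CPU0 is never filled in (e.g. the
 * assignment only happens from a softirq that CPU0 cannot yet service
 * during boot), pick_cpu() keeps returning CPU0 even though it is
 * actually occupied, and the wake loop retries forever. */

#define NR_CPUS 2

struct vcpu { const char *name; };

static struct vcpu *npc[NR_CPUS];      /* vCPU believed to be on each pCPU */
static bool cpu_really_busy[NR_CPUS];  /* ground truth the stale npc[] hides */

static int pick_cpu(void)
{
    for (int c = 0; c < NR_CPUS; c++)
        if (npc[c] == NULL)            /* looks free according to npc[]... */
            return c;
    return -1;
}

/* Returns -1 on successful assignment, or the retry cap if it "spun". */
static int try_wake(struct vcpu *v, int max_retries)
{
    for (int i = 0; i < max_retries; i++) {
        int c = pick_cpu();
        if (c >= 0 && !cpu_really_busy[c]) {
            npc[c] = v;                /* assignment succeeds */
            return -1;
        }
        /* pick_cpu() chose a CPU that is in fact occupied: retry. */
    }
    return max_retries;
}
```

With `npc[0] == NULL` but CPU0 actually running another vCPU, every retry picks CPU0 again, which is the shape of the hang described in this thread.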
>
> The loop to assign a pCPU in null_vcpu_wake() is turning into an
> infinite loop. Indeed, the loop is trying to pick CPU0 for dom0v0,
> which is already used by dom1v0. So the problem is in pick_cpu() or
> the data used by it.
>
> It feels to me this is an affinity problem. Note that I didn't request
> to pin dom0 vCPUs.

I did a bit more digging. As I pointed out before, pick_cpu() is
returning pCPU0. This is because per_cpu(npc, 0) == NULL.

per_cpu(npc, 0) will be set by vcpu_assign(). AFAIU, the function is
called during scheduling. As CPU0 is not able to serve softirqs until it
finishes initializing, per_cpu(npc, 0) will still be NULL when trying to
wake dom0v0.

My knowledge of the scheduler is pretty limited, so I will leave it to
Dario and George to suggest a fix :).

On a side note, I have tried to hack the Dom0 vCPU allocation a bit to
see if I can help you reproduce it on x86. But I stumbled across another
error while bringing up d0v1:

(XEN) Assertion 'lock == per_cpu(schedule_data, v->processor).schedule_lock' failed at /home/julieng/works/xen/xen/include/xen/sched-if.h:108
(XEN) ----[ Xen-4.13-unstable arm64 debug=y Not tainted ]----
(XEN) CPU: 0
[...]
(XEN) Xen call trace:
(XEN)    [<00000000002251b8>] vcpu_wake+0x550/0x554 (PC)
(XEN)    [<0000000000224da4>] vcpu_wake+0x13c/0x554 (LR)
(XEN)    [<0000000000261624>] vpsci.c#do_common_cpu_on+0x134/0x1c4
(XEN)    [<0000000000261a04>] do_vpsci_0_2_call+0x294/0x3d0
(XEN)    [<00000000002612c0>] vsmc.c#vsmccc_handle_call+0x3a0/0x4b0
(XEN)    [<0000000000261484>] do_trap_hvc_smccc+0x28/0x4c
(XEN)    [<0000000000257efc>] do_trap_guest_sync+0x508/0x5d8
(XEN)    [<000000000026542c>] entry.o#guest_sync_slowpath+0x9c/0xcc
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'lock == per_cpu(schedule_data, v->processor).schedule_lock' failed at /home/julieng/works/xen/xen/include/xen/sched-if.h:108
(XEN) ****************************************

I only tried to assign all the vCPUs to pCPU 0 with the following code:

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 4c8404155a..ce92e3841f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2004,7 +2004,7 @@ static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
     for ( i = 1, cpu = 0; i < d->max_vcpus; i++ )
     {
         cpu = cpumask_cycle(cpu, &cpu_online_map);
-        if ( vcpu_create(d, i, cpu) == NULL )
+        if ( vcpu_create(d, i, 0) == NULL )
         {
             printk("Failed to allocate dom0 vcpu %d on pcpu %d\n", i, cpu);
             break;

I am not entirely sure whether the problem is related.
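[For context, the failing assertion guards a standard re-check pattern: the per-vCPU scheduler lock is looked up via `v->processor`, acquired, and then re-validated, because the vCPU may migrate between the lookup and the acquisition. Below is a simplified, non-Xen sketch of that invariant; all names are stand-ins and the actual spinlock acquisition is elided.]

```c
#include <assert.h>
#include <stddef.h>

/* Simplified (non-Xen) sketch: each pCPU carries a schedule_lock
 * pointer, and a vCPU must be locked via the lock of the pCPU it is
 * *currently* on.  The lock is looked up from v->processor and then
 * re-checked after acquisition; the assertion in the crash above fires
 * when that re-check pattern is violated. */

#define NR_CPUS 2

struct schedule_data { int *schedule_lock; };

static int lock_a, lock_b;
static struct schedule_data schedule_data[NR_CPUS] = {
    { &lock_a }, { &lock_b },
};

struct vcpu { int processor; };

/* Returns the lock that is valid for v at the moment of return. */
static int *vcpu_schedule_lock(struct vcpu *v)
{
    for (;;) {
        int *lock = schedule_data[v->processor].schedule_lock;
        /* (real code acquires the spinlock here) */
        if (lock == schedule_data[v->processor].schedule_lock)
            return lock;   /* still the right lock: invariant holds */
        /* vCPU migrated while we were acquiring: drop and retry */
    }
}
```

The crash therefore suggests that d0v1's `v->processor`, or the per-CPU lock pointer it maps to, changed in a way the wake path did not expect.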
Anyway, I have written the following patch to reproduce on Arm without
dom0less:

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 4c8404155a..20246ae475 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2004,7 +2004,7 @@ static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
     for ( i = 1, cpu = 0; i < d->max_vcpus; i++ )
     {
         cpu = cpumask_cycle(cpu, &cpu_online_map);
-        if ( vcpu_create(d, i, cpu) == NULL )
+        if ( vcpu_create(d, i, 0) == NULL )
         {
             printk("Failed to allocate dom0 vcpu %d on pcpu %d\n", i, cpu);
             break;
@@ -2019,6 +2019,10 @@ static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
     v->is_initialised = 1;
     clear_bit(_VPF_down, &v->pause_flags);

+    v = d->vcpu[1];
+    v->is_initialised = 1;
+    clear_bit(_VPF_down, &v->pause_flags);
+
     return 0;
 }

This could easily be adapted for x86 so you can reproduce it easily :).

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel