Re: Null scheduler and vwfi native problem
On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote:
> On 1/25/21 5:11 PM, Dario Faggioli wrote:
> > On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
> > > Hi Anders,
> > >
> > > On 22/01/2021 08:06, Anders Törnqvist wrote:
> > > > On 1/22/21 12:35 AM, Dario Faggioli wrote:
> > > > > On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
> > > > - booting with "sched=null vwfi=native" but not doing the IRQ
> > > > passthrough that you mentioned above
> > > > "xl destroy" gives
> > > > (XEN) End of domain_destroy function
> > > >
> > > > Then a "xl create" says nothing but the domain has not started
> > > > correct.
> > > > "xl list" look like this for the domain:
> > > > mydomu                 2   512     1     ------       0.0
> > > This is odd. I would have expected ``xl create`` to fail if
> > > something went wrong with the domain creation.
> > >
> > So, Anders, would it be possible to issue a:
> >
> > # xl debug-keys r
> > # xl dmesg
> >
> > And send it to us ?
> >
> > Ideally, you'd do it:
> >  - with Julien's patch (the one he sent the other day, and that
> >    you have already given a try to) applied
> >  - while you are in the state above, i.e., after having tried to
> >    destroy a domain and failing
> >  - and maybe again after having tried to start a new domain
> Here are some logs.
>
Great, thanks a lot!

> The system is booted as before with the patch and the domu config
> does not have the IRQs.
>
Ok.

> # xl list
> Name            ID   Mem VCPUs      State   Time(s)
> Domain-0         0  3000     5     r-----     820.1
> mydomu           1   511     1     r-----     157.0
>
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=191793008000
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN) cpus_free =
> (XEN) Domain info:
> (XEN) Domain: 0
> (XEN) 1: [0.0] pcpu=0
> (XEN) 2: [0.1] pcpu=1
> (XEN) 3: [0.2] pcpu=2
> (XEN) 4: [0.3] pcpu=3
> (XEN) 5: [0.4] pcpu=4
> (XEN) Domain: 1
> (XEN) 6: [1.0] pcpu=5
> (XEN) Waitqueue:
>
So far, so good.
All vCPUs are running on their assigned pCPU, and there is no vCPU
wanting to run but not having a pCPU where to do so.

> (XEN) Command line: console=dtuart dtuart=/serial@5a060000
> dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin
> sched=null vwfi=native
>
Oh, just as a side note (and most likely unrelated to the problem we're
discussing), you should be able to get rid of dom0_vcpus_pin.

The NULL scheduler will do something similar to what that option itself
does anyway. And with the benefit that, if you want, you can actually
change which pCPUs dom0's vCPUs are pinned to. While, if you use
dom0_vcpus_pin, you can't.

So using it has only downsides (and that's true in general, if you ask
me, but particularly so if using NULL).

> # xl destroy mydomu
> (XEN) End of domain_destroy function
>
> # xl list
> Name            ID   Mem VCPUs      State   Time(s)
> Domain-0         0  3000     5     r-----    1057.9
>
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=223871439875
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN) cpus_free =
> (XEN) Domain info:
> (XEN) Domain: 0
> (XEN) 1: [0.0] pcpu=0
> (XEN) 2: [0.1] pcpu=1
> (XEN) 3: [0.2] pcpu=2
> (XEN) 4: [0.3] pcpu=3
> (XEN) 5: [0.4] pcpu=4
> (XEN) Domain: 1
> (XEN) 6: [1.0] pcpu=5
>
Right. And from the fact that: 1) we only see the "End of
domain_destroy function" line in the logs, and 2) we see that the vCPU
is still listed here, we have our confirmation (like there was the
need for it :-/) that domain destruction is done only partially.
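
For anyone wanting to repeat the check made by eye above (a vCPU still
listed in the scheduler dump for a domain that "xl list" no longer
shows), here is a minimal sketch in Python. It is not Xen or xl code.
The function names are hypothetical, and the line formats are assumed
to match the output quoted in this thread ("[dom.vcpu] pcpu=N" in the
dump, domain ID in the second column of "xl list").

```python
import re

# Matches vCPU lines such as "(XEN) 6: [1.0] pcpu=5" or "[2.0] pcpu=-1".
# Assumption: the format is the one seen in the dumps quoted in this thread.
VCPU_RE = re.compile(r"\[(\d+)\.(\d+)\]\s+pcpu=(-?\d+)")


def vcpus_in_sched_dump(dump_text):
    """Return {(domid, vcpuid): pcpu} for every vCPU line in a scheduler dump."""
    vcpus = {}
    for line in dump_text.splitlines():
        m = VCPU_RE.search(line)
        if m:
            domid, vcpuid, pcpu = (int(g) for g in m.groups())
            vcpus[(domid, vcpuid)] = pcpu
    return vcpus


def domids_in_xl_list(list_text):
    """Return the set of domain IDs found in the second column of `xl list`."""
    domids = set()
    for line in list_text.splitlines():
        fields = line.split()
        # The header line has "ID" in that column, so it is skipped here.
        if len(fields) >= 2 and fields[1].isdigit():
            domids.add(int(fields[1]))
    return domids


def stale_vcpus(dump_text, list_text):
    """vCPUs the scheduler still lists for domains that `xl list` does not show."""
    live = domids_in_xl_list(list_text)
    return {key: pcpu
            for key, pcpu in vcpus_in_sched_dump(dump_text).items()
            if key[0] not in live}


# Sample input copied from the logs taken after "xl destroy mydomu".
XL_LIST = """\
Name ID Mem VCPUs State Time(s)
Domain-0 0 3000 5 r----- 1057.9
"""

SCHED_DUMP = """\
(XEN) Domain info:
(XEN) Domain: 0
(XEN) 1: [0.0] pcpu=0
(XEN) 2: [0.1] pcpu=1
(XEN) 3: [0.2] pcpu=2
(XEN) 4: [0.3] pcpu=3
(XEN) 5: [0.4] pcpu=4
(XEN) Domain: 1
(XEN) 6: [1.0] pcpu=5
"""

print(stale_vcpus(SCHED_DUMP, XL_LIST))  # {(1, 0): 5}
```

On the post-destroy logs this reports d1v0 as still holding pCPU 5,
which is the leftover described above. The parsing is deliberately
naive (for instance, it assumes domain names contain no spaces).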
> # xl create mydomu.cfg
> Parsing config from mydomu.cfg
> (XEN) Power on resource 215
>
> # xl list
> Name            ID   Mem VCPUs      State   Time(s)
> Domain-0         0  3000     5     r-----    1152.1
> mydomu           2   512     1     ------       0.0
>
> # xl debug-keys r
> (XEN) sched_smt_power_savings: disabled
> (XEN) NOW=241210530250
> (XEN) Online Cpus: 0-5
> (XEN) Cpupool 0:
> (XEN) Cpus: 0-5
> (XEN) Scheduler: null Scheduler (null)
> (XEN) cpus_free =
> (XEN) Domain info:
> (XEN) Domain: 0
> (XEN) 1: [0.0] pcpu=0
> (XEN) 2: [0.1] pcpu=1
> (XEN) 3: [0.2] pcpu=2
> (XEN) 4: [0.3] pcpu=3
> (XEN) 5: [0.4] pcpu=4
> (XEN) Domain: 1
> (XEN) 6: [1.0] pcpu=5
> (XEN) Domain: 2
> (XEN) 7: [2.0] pcpu=-1
> (XEN) Waitqueue: d2v0
>
Yep, so, as we were suspecting, domain 1 was not destroyed properly.
Specifically, we did not get to the point where the vCPU is deallocated
and the pCPU to which such vCPU has been assigned by the NULL scheduler
is released.

This means that the new vCPU (i.e., d2v0) has, from the point of view
of the NULL scheduler, no pCPU where to run. And it's therefore parked
in the waitqueue.

There should be a warning about that, which I don't see... but perhaps
I'm just misremembering.

Anyway, cool, this makes things even more clear. Thanks again for
letting us see these logs.

-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

Attachment: signature.asc