Re: Null scheduler and vwfi native problem
On 1/26/21 11:31 PM, Dario Faggioli wrote:
> On Tue, 2021-01-26 at 18:03 +0100, Anders Törnqvist wrote:
>> On 1/25/21 5:11 PM, Dario Faggioli wrote:
>>> On Fri, 2021-01-22 at 14:26 +0000, Julien Grall wrote:
>>>> Hi Anders,
>>>>
>>>> On 22/01/2021 08:06, Anders Törnqvist wrote:
>>>>> On 1/22/21 12:35 AM, Dario Faggioli wrote:
>>>>>> On Thu, 2021-01-21 at 19:40 +0000, Julien Grall wrote:
>>>>>
>>>>> - booting with "sched=null vwfi=native" but not doing the IRQ
>>>>>   passthrough that you mentioned above, "xl destroy" gives
>>>>>
>>>>>     (XEN) End of domain_destroy function
>>>>>
>>>>>   Then an "xl create" says nothing, but the domain has not started
>>>>>   correctly. "xl list" looks like this for the domain:
>>>>>
>>>>>     mydomu    2   512     1     ------       0.0
>>>>
>>>> This is odd. I would have expected ``xl create`` to fail if
>>>> something went wrong with the domain creation.
>>>
>>> So, Anders, would it be possible to issue a:
>>>
>>>   # xl debug-keys r
>>>   # xl dmesg
>>>
>>> And send it to us? Ideally, you'd do it:
>>>  - with Julien's patch (the one he sent the other day, and that you
>>>    have already given a try to) applied
>>>  - while you are in the state above, i.e., after having tried to
>>>    destroy a domain and failing
>>>  - and maybe again after having tried to start a new domain
>>
>> Here are some logs.
>
> Great, thanks a lot!
>
>> The system is booted as before with the patch, and the domu config
>> does not have the IRQs.
>
> Ok.
>
>> # xl list
>> Name            ID   Mem VCPUs      State   Time(s)
>> Domain-0         0  3000     5     r-----     820.1
>> mydomu           1   511     1     r-----     157.0
>>
>> # xl debug-keys r
>> (XEN) sched_smt_power_savings: disabled
>> (XEN) NOW=191793008000
>> (XEN) Online Cpus: 0-5
>> (XEN) Cpupool 0:
>> (XEN) Cpus: 0-5
>> (XEN) Scheduler: null Scheduler (null)
>> (XEN) cpus_free =
>> (XEN) Domain info:
>> (XEN) Domain: 0
>> (XEN)  1: [0.0] pcpu=0
>> (XEN)  2: [0.1] pcpu=1
>> (XEN)  3: [0.2] pcpu=2
>> (XEN)  4: [0.3] pcpu=3
>> (XEN)  5: [0.4] pcpu=4
>> (XEN) Domain: 1
>> (XEN)  6: [1.0] pcpu=5
>> (XEN) Waitqueue:
>
> So far, so good. All vCPUs are running on their assigned pCPU, and
> there is no vCPU wanting to run but not having a pCPU where to do so.
>
>> (XEN) Command line: console=dtuart dtuart=/serial@5a060000
>> dom0_mem=3000M dom0_max_vcpus=5 hmp-unsafe=true dom0_vcpus_pin
>> sched=null vwfi=native
>
> Oh, just as a side note (and most likely unrelated to the problem
> we're discussing), you should be able to get rid of dom0_vcpus_pin.
> The NULL scheduler will do something similar to what that option
> itself does anyway, with the benefit that, if you want, you can
> actually change to which pCPUs the dom0 vCPUs are pinned; if you use
> dom0_vcpus_pin, you can't. So using it has only downsides (and that's
> true in general, if you ask me, but particularly so if using NULL).

Thanks for the feedback.

I removed dom0_vcpus_pin. And, as you said, it seems to be unrelated
to the problem we're discussing: the system still behaves the same.

With dom0_vcpus_pin removed, xl vcpu-list looks like this:

Name        ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0     0     0     0   r--      29.4  all / all
Domain-0     0     1     1   r--      28.7  all / all
Domain-0     0     2     2   r--      28.7  all / all
Domain-0     0     3     3   r--      28.6  all / all
Domain-0     0     4     4   r--      28.6  all / all
mydomu       1     0     5   r--      21.6  5 / all

From this listing (with "all" as hard affinity for dom0) one might
read it as if dom0 were not pinned with hard affinity to any specific
pCPUs at all, while mydomu is pinned to pCPU 5.

Will dom0_max_vcpus=5 in this case guarantee that dom0 only runs on
pCPUs 0-4, so that mydomu will always have pCPU 5 to itself?

What if I would like mydomu to be the only domain that uses pCPU 2?
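For what it's worth, a minimal sketch of how that could be arranged
with the standard xl tools (assuming the pCPU numbering from the
listing above, and a guest config file named mydomu.cfg; not something
tested in this thread). Keep dom0's vCPUs off pCPU 2 by giving them a
hard affinity mask that excludes it:

    # xl vcpu-pin Domain-0 all 0-1,3-5

and give mydomu a hard affinity of pCPU 2 only, either at runtime:

    # xl vcpu-pin mydomu all 2

or persistently in mydomu.cfg, so it takes effect from creation:

    cpus = "2"

A stricter alternative would be a separate cpupool for the guest
(xl cpupool-cpu-remove / xl cpupool-create), which takes the pCPU out
of Pool-0 entirely instead of relying on affinities.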
>> # xl destroy mydomu
>> (XEN) End of domain_destroy function
>>
>> # xl list
>> Name            ID   Mem VCPUs      State   Time(s)
>> Domain-0         0  3000     5     r-----    1057.9
>>
>> # xl debug-keys r
>> (XEN) sched_smt_power_savings: disabled
>> (XEN) NOW=223871439875
>> (XEN) Online Cpus: 0-5
>> (XEN) Cpupool 0:
>> (XEN) Cpus: 0-5
>> (XEN) Scheduler: null Scheduler (null)
>> (XEN) cpus_free =
>> (XEN) Domain info:
>> (XEN) Domain: 0
>> (XEN)  1: [0.0] pcpu=0
>> (XEN)  2: [0.1] pcpu=1
>> (XEN)  3: [0.2] pcpu=2
>> (XEN)  4: [0.3] pcpu=3
>> (XEN)  5: [0.4] pcpu=4
>> (XEN) Domain: 1
>> (XEN)  6: [1.0] pcpu=5
>
> Right. And from the fact that:
>  1) we only see the "End of domain_destroy function" line in the
>     logs, and
>  2) we see that the vCPU is still listed here,
> we have our confirmation (as if there was any need for it :-/) that
> domain destruction is done only partially.

Yes, it looks like that.

>> # xl create mydomu.cfg
>> Parsing config from mydomu.cfg
>> (XEN) Power on resource 215
>>
>> # xl list
>> Name            ID   Mem VCPUs      State   Time(s)
>> Domain-0         0  3000     5     r-----    1152.1
>> mydomu           2   512     1     ------       0.0
>>
>> # xl debug-keys r
>> (XEN) sched_smt_power_savings: disabled
>> (XEN) NOW=241210530250
>> (XEN) Online Cpus: 0-5
>> (XEN) Cpupool 0:
>> (XEN) Cpus: 0-5
>> (XEN) Scheduler: null Scheduler (null)
>> (XEN) cpus_free =
>> (XEN) Domain info:
>> (XEN) Domain: 0
>> (XEN)  1: [0.0] pcpu=0
>> (XEN)  2: [0.1] pcpu=1
>> (XEN)  3: [0.2] pcpu=2
>> (XEN)  4: [0.3] pcpu=3
>> (XEN)  5: [0.4] pcpu=4
>> (XEN) Domain: 1
>> (XEN)  6: [1.0] pcpu=5
>> (XEN) Domain: 2
>> (XEN)  7: [2.0] pcpu=-1
>> (XEN) Waitqueue: d2v0
>
> Yep, so, as we were suspecting, domain 1 was not destroyed properly.
> Specifically, we did not get to the point where the vCPU is
> deallocated and the pCPU to which that vCPU had been assigned by the
> NULL scheduler is released.
>
> This means that the new vCPU (i.e., d2v0) has, from the point of view
> of the NULL scheduler, no pCPU on which to run, and it is therefore
> parked in the waitqueue. There should be a warning about that, which
> I don't see... but perhaps I'm just misremembering.
>
> Anyway, cool, this makes things even more clear. Thanks again for
> letting us see these logs.

Thanks for the attention to this :-)

Any ideas for how to solve it?
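For reference, a sketch of how the waitqueue would be expected to
drain when destruction works correctly (a hypothetical session:
"otherguest.cfg" is a made-up name, and the outcomes are inferred from
the debug-keys listings above rather than captured on this box):

    # xl create otherguest.cfg    (all 6 pCPUs busy: 5 for dom0, 1 for mydomu)
    # xl debug-keys r             (the new vCPU shows pcpu=-1 and sits in "Waitqueue:")
    # xl destroy mydomu           (a clean destroy releases pCPU 5)
    # xl debug-keys r             (the parked vCPU should now show pcpu=5)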