Re: [Xen-devel] [PATCH 1/2] xen: credit2: avoid vCPUs to ever reach lower credits than idle
> On Mar 12, 2020, at 1:55 PM, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx> wrote:
>
> On 12/03/2020 13:44, Dario Faggioli wrote:
>> There have been report of stalls of guest vCPUs, when Credit2 was used.
>> It seemed like these vCPUs were not getting scheduled for very long
>> time, even under light load conditions (e.g., during dom0 boot).
>>
>> Investigations led to the discovery that --although rarely-- it can
>> happen that a vCPU manages to run for very long timeslices. In Credit2,
>> this means that, when runtime accounting happens, the vCPU will lose a
>> large quantity of credits. This in turn may lead to the vCPU having less
>> credits than the idle vCPUs (-2^30). At this point, the scheduler will
>> pick the idle vCPU, instead of the ready to run vCPU, for a few
>> "epochs", which often times is enough for the guest kernel to think the
>> vCPU is not responding and crashing.
>>
>> An example of this situation is shown here. In fact, we can see d0v1
>> sitting in the runqueue while all the CPUs are idle, as it has
>> -1254238270 credits, which is smaller than -2^30 = −1073741824:
>>
>> (XEN) Runqueue 0:
>> (XEN) ncpus = 28
>> (XEN) cpus = 0-27
>> (XEN) max_weight = 256
>> (XEN) pick_bias = 22
>> (XEN) instload = 1
>> (XEN) aveload = 293391 (~111%)
>> (XEN) idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
>> (XEN) tickled: 00,00000000,00000000,00000000,00000000,00000000,00000000
>> (XEN) fully idle cores:
>> 00,00000000,00000000,00000000,00000000,00000000,0fffffff
>> [...]
>> (XEN) Runqueue 0:
>> (XEN) CPU[00] runq=0, sibling=00,..., core=00,...
>> (XEN) CPU[01] runq=0, sibling=00,..., core=00,...
>> [...]
>> (XEN) CPU[26] runq=0, sibling=00,..., core=00,...
>> (XEN) CPU[27] runq=0, sibling=00,..., core=00,...
>> (XEN) RUNQ:
>> (XEN) 0: [0.1] flags=0 cpu=5 credit=-1254238270 [w=256] load=262144
>> (~100%)
>>
>> We certainly don't want, under any circumstance, this to happen.
>> Therefore, let's use INT_MIN for the credits of the idle vCPU, in
>> Credit2, to be sure that no vCPU can get below that value.
>>
>> NOTE: investigations have been done about _how_ it is possible for a
>> vCPU to execute for so long that its credits becomes so low. While still
>> not completely clear, there are evidence that:
>> - it only happens very rarely
>> - it appears to be both machine and workload specific
>> - it does not look to be a Credit2 (e.g., as it happens when running
>>   with Credit1 as well) issue, or a scheduler issue
>
> On what basis?
>
> Everything reported to xen-devel appears to suggests it is a credit2
> problem. It doesn't manifest on versions of Xen before credit2 became
> the default, and switching back to credit1 appears to mitigate the problem.
>
> Certainly as far as XenServer is concerned, we haven't seen symptoms
> like this in a decade of running credit1.

One reason could be that the symptoms are different. On credit1, credits and “priority” are separated; it’s not possible in credit1 for a vcpu to end up with a lower priority than the idle domain, and no matter how low the credits become, a vcpu will always end up with some “peers” at the same priority level, meaning it always has a chance at some cpu.

What Dario is saying (if I understand him correctly) is that the *proximate* cause (allowing a vcpu to have an effective priority of less than idle) is certainly credit2-only; but the *deeper* cause (vcpus racking up massive amounts of negative credit) is not.

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
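
The proximate cause discussed above can be illustrated with a toy model. This is only a sketch, not Xen's Credit2 code: `pick_next`, `OLD_IDLE_CREDIT`, and the tuple-based runqueue are hypothetical names made up for illustration, and the real scheduler weighs much more than a single credit comparison. The only values taken from the thread are the idle credit of -2^30, the proposed INT_MIN, and d0v1's credit of -1254238270. INT_MIN is assumed to be -2^31, i.e. a 32-bit int.

```python
# Toy model of "pick the runnable vCPU with the most credit, else idle".
# All names here are hypothetical; this is not how Xen implements Credit2.

INT_MIN = -2**31          # assumed 32-bit int minimum (proposed idle credit)
OLD_IDLE_CREDIT = -2**30  # idle vCPU credit before the patch, per the thread


def pick_next(runq, idle_credit):
    """Return the name of the vCPU with the highest credit.

    runq is a list of (name, credit) pairs for runnable vCPUs.
    The idle vCPU is always a candidate, with credit idle_credit,
    and wins unless some runnable vCPU has strictly more credit.
    """
    best_name, best_credit = "idle", idle_credit
    for name, credit in runq:
        if credit > best_credit:
            best_name, best_credit = name, credit
    return best_name


# d0v1's credit from the runqueue dump quoted in the thread.
runq = [("d0v1", -1254238270)]

print(pick_next(runq, OLD_IDLE_CREDIT))  # idle wins: -1254238270 < -2**30
print(pick_next(runq, INT_MIN))          # d0v1 wins: -1254238270 > -2**31
```

In this model, lowering the idle credit to INT_MIN removes the starvation because a 32-bit credit value cannot go below it. It does nothing about the deeper question of why a vCPU accumulates that much negative credit in the first place.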