
Re: [Xen-devel] [PATCH 1/2] xen: credit2: avoid vCPUs to ever reach lower credits than idle

On 12/03/2020 13:44, Dario Faggioli wrote:
> There have been reports of guest vCPUs stalling when Credit2 was in use.
> It seemed that these vCPUs were not getting scheduled for a very long
> time, even under light load conditions (e.g., during dom0 boot).
> Investigations led to the discovery that --although rarely-- it can
> happen that a vCPU manages to run for very long timeslices. In Credit2,
> this means that, when runtime accounting happens, the vCPU will lose a
> large quantity of credits. This in turn may lead to the vCPU having fewer
> credits than the idle vCPUs (-2^30). At this point, the scheduler will
> pick the idle vCPU, instead of the ready to run vCPU, for a few
> "epochs", which is often enough for the guest kernel to conclude that the
> vCPU is not responding and crash.
> An example of this situation is shown here. In fact, we can see d0v1
> sitting in the runqueue while all the CPUs are idle, as it has
> -1254238270 credits, which is smaller than -2^30 = -1073741824:
>     (XEN) Runqueue 0:
>     (XEN)   ncpus              = 28
>     (XEN)   cpus               = 0-27
>     (XEN)   max_weight         = 256
>     (XEN)   pick_bias          = 22
>     (XEN)   instload           = 1
>     (XEN)   aveload            = 293391 (~111%)
>     (XEN)   idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
>     (XEN)   tickled: 00,00000000,00000000,00000000,00000000,00000000,00000000
>     (XEN)   fully idle cores: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
>     [...]
>     (XEN) Runqueue 0:
>     (XEN) CPU[00] runq=0, sibling=00,..., core=00,...
>     (XEN) CPU[01] runq=0, sibling=00,..., core=00,...
>     [...]
>     (XEN) CPU[26] runq=0, sibling=00,..., core=00,...
>     (XEN) CPU[27] runq=0, sibling=00,..., core=00,...
>     (XEN) RUNQ:
>     (XEN)     0: [0.1] flags=0 cpu=5 credit=-1254238270 [w=256] load=262144 (~100%)
> We certainly don't want this to happen under any circumstances.
> Therefore, let's use INT_MIN for the credits of the idle vCPU, in
> Credit2, to be sure that no vCPU can get below that value.
> NOTE: investigations have been done about _how_ it is possible for a
> vCPU to execute for so long that its credits become so low. While this is
> still not completely clear, there is evidence that:
> - it only happens very rarely
> - it appears to be both machine and workload specific
> - it does not look to be a Credit2 issue (e.g., it happens when running
>   with Credit1 as well), or a scheduler issue

On what basis?

Everything reported to xen-devel appears to suggest it is a credit2
problem.  It doesn't manifest on versions of Xen before credit2 became
the default, and switching back to credit1 appears to mitigate the problem.

Certainly as far as XenServer is concerned, we haven't seen symptoms
like this in a decade of running credit1.
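
For reference, the arithmetic in the quoted description can be checked with
a small standalone sketch. This is not Xen code: beats_idle() and the two
macro names are hypothetical, and the strict "greater than" comparison is a
simplifying assumption about how a ready vCPU is preferred over idle. Only
the numeric values (-2^30, INT_MIN, and the credit of d0v1) come from the
description above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Idle vCPU credit before the patch, per the description: -2^30. */
#define IDLE_CREDIT_OLD (-(1 << 30))
/* Idle vCPU credit proposed by the patch: INT_MIN. */
#define IDLE_CREDIT_NEW INT_MIN

/* Credit of d0v1 as printed in the runqueue dump. */
static const int d0v1_credit = -1254238270;

/*
 * Hypothetical helper (assumption, not a Xen function): a ready vCPU is
 * picked over the idle vCPU only if it has strictly more credit.
 */
static bool beats_idle(int credit, int idle_credit)
{
    return credit > idle_credit;
}

/* Self-check of the two cases discussed in the patch description. */
static void check_example(void)
{
    /* -1254238270 < -1073741824: idle wins, so d0v1 stays in the runqueue. */
    assert(!beats_idle(d0v1_credit, IDLE_CREDIT_OLD));
    /* With idle at INT_MIN, the same credit value is above idle. */
    assert(beats_idle(d0v1_credit, IDLE_CREDIT_NEW));
}
```

Under these assumptions, the sketch only illustrates why the proposed change
stops idle from being preferred; it says nothing about why the vCPU ran long
enough to lose that many credits in the first place.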

