
Re: [Xen-devel] CPU Lockup bug with the credit2 scheduler

[Adding George, as scheduler maintainer, and Juergen, as he commented
 later in this thread]

[Adding xen-users back, as the thread originated from there... sorry 
 for cross-posting]

On Mon, 2020-02-17 at 11:58 -0800, Sarah Newman wrote:
> If there are no merged or proposed fixes soon, it may be worth
> considering making the credit scheduler the default again until
> problems with the credit2 scheduler are resolved.
Just as a heads-up: I finally (thanks to Jim Fehlig) found a
machine where I could reproduce (something like) this.

I've been able to do some analysis of the situation. Basically, on the
server I'm using, I do not see stalls severe enough to make the NMI
watchdog fire, but I do see some preliminary signs of them during
boot.

I checked what was happening in Xen at that point in time (with the 'r'
debug key, which dumps the scheduler's data structures), and I found
that there is a vCPU more or less stuck in a runqueue. That is, the
vCPU is in there, i.e., it is ready to run *but* not running, despite
there being plenty of idle pCPUs that could run it.

The reason it is not being picked up is that its credits are lower
than those of the idle vCPU.
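To make the failure mode concrete, here is a minimal sketch of the comparison described above. It is a hypothetical, simplified model, not Xen's actual credit2 code: the names `vcpu_model`, `runq_model`, `pick_next` and the `IDLE_CREDIT` value are assumptions made up for illustration. It only shows why a runnable vCPU whose credit has fallen to or below the idle vCPU's credit stays in the runqueue while pCPUs sit idle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, NOT Xen's real data structures. */
#define IDLE_CREDIT (-1000) /* assumed credit value of the idle vCPU */

struct vcpu_model {
    int id;
    int credit;
};

/* Runqueue assumed to be kept sorted by credit, highest first. */
struct runq_model {
    struct vcpu_model *vcpus;
    size_t len;
};

/*
 * Decide what an idle pCPU runs next. Returns the chosen vCPU, or NULL,
 * meaning "keep running the idle vCPU".
 */
static const struct vcpu_model *pick_next(const struct runq_model *rq)
{
    assert(rq != NULL);
    if (rq->len == 0)
        return NULL; /* nothing runnable: stay idle */

    const struct vcpu_model *head = &rq->vcpus[0];

    /* A runnable vCPU wins only if it has more credit than the idle vCPU. */
    if (head->credit > IDLE_CREDIT)
        return head;

    /* Otherwise it is left waiting in the runqueue, even on an idle pCPU. */
    return NULL;
}
```

In this model, a vCPU with credit 500 is picked, while one with credit -2000 is never picked by an idle pCPU, which matches the "ready to run but not running" symptom seen in the 'r' dump.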

I have a theory about how it got in such a situation and, if I'm right,
a draft of an idea of how to fix this.

We're using this bug, which Glen kindly created, to track this issue:


But of course I'll keep upstream MLs updated as well.

Stay tuned. :-)
Dario Faggioli, Ph.D
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


Xen-devel mailing list


