
Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]





On 09/28/2017 12:51 AM, Julien Grall wrote:
Hi Dario,

On 09/26/2017 09:51 PM, Dario Faggioli wrote:
On Tue, 2017-09-26 at 18:28 +0100, Julien Grall wrote:
On 09/26/2017 08:33 AM, Dario Faggioli wrote:

Here's the logs:
http://logs.test-lab.xenproject.org/osstest/logs/113816/test-armhf-armhf-xl-rtds/info.html

It does not seem to be similar: in the credit2 case the kernel is
stuck at very early boot.
Here it seems to be running (there are grants set up).

Yes, I agree, it's not totally similar.

This seems to be confirmed by the guest console log: I can see the
prompt. Interestingly, when the guest job fails, it has been waiting
for a long time for the disk and hvc0, although it does not time out.

Ah, I see what you mean, I found it in the guest console log.

I am actually quite surprised that we start a 4-vCPU guest on a
2-pCPU platform. The total number of vCPUs is 6 (2 DOM0 + 4 DOMU).
The processors are not the greatest for testing. So I was wondering
whether we end up with too many vCPUs running on the platform, making
the test unreliable?

Well, doing that, with this scheduler, is certainly *not* the best
recipe for determinism and reliability.

In fact, RTDS is a non-work conserving scheduler. This means that (with
default parameters) each vCPU gets at most 40% CPU time, even if there
are idle cycles.

With 6 vCPUs, there's a total demand of 240% of CPU time, and with 2
pCPUs, there's at most 200% available, which means we're in overload
(well, at least that's the case if/when all the vCPUs try to execute
for their guaranteed 40%).

Things *should really not* explode (as in, Xen crashes) if that
happens; actually, from a scheduler perspective, it should really not
be too big of a deal (especially if the overload is transient, like I
guess it should be in this case). However, it's entirely possible that
some specific vCPUs failing to be scheduled for a certain amount of
time causes something _inside_ the guest to time out, or get stuck or
wedged, which may be what happens here.
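
The overload arithmetic above can be checked with a few lines of
Python. This is only a sketch: the function name is made up for
illustration, and the 10 ms period / 4 ms budget are assumed from the
"(10000000, 4000000)" pairs in the scheduler dump further down, read
as (period, budget) in nanoseconds, which matches the 40% default.

```python
def rtds_demand(n_vcpus, period_ns, budget_ns, n_pcpus):
    """Return (demand, capacity), both as multiples of one pCPU.

    Hypothetical helper: it only reproduces the back-of-the-envelope
    calculation from the thread, not anything Xen does internally.
    """
    per_vcpu_share = budget_ns / period_ns   # 4 ms / 10 ms = 0.4
    demand = n_vcpus * per_vcpu_share        # worst case: every vCPU uses its full budget
    capacity = float(n_pcpus)                # each pCPU offers 100%
    return demand, capacity


# 2 Dom0 vCPUs + 4 DomU vCPUs on a 2-pCPU board (values from the thread).
demand, capacity = rtds_demand(n_vcpus=6, period_ns=10_000_000,
                               budget_ns=4_000_000, n_pcpus=2)
overloaded = demand > capacity
print(f"demand={demand:.0%} capacity={capacity:.0%} overloaded={overloaded}")
```

The overload only materialises when all six vCPUs want their full
budget at the same time, which is why it may well be transient.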

Looking at the log, I don't see any crash of Xen, and it seems to
be responsive.

I forgot to add that I don't see any timeout on the guest console,
but I can notice a slowdown (waiting for some PV device).


I don't know much about the scheduler and how to interpret the logs:

Sep 25 22:43:21.495119 (XEN) Domain info:
Sep 25 22:43:21.503073 (XEN)    domain: 0
Sep 25 22:43:21.503100 (XEN) [    0.0 ] cpu 0, (10000000, 4000000), cur_b=3895333 cur_d=1611120000000 last_start=1611116505875
Sep 25 22:43:21.511080 (XEN)             onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
Sep 25 22:43:21.519082 (XEN) [    0.1 ] cpu 1, (10000000, 4000000), cur_b=3946375 cur_d=1611130000000 last_start=1611126446583
Sep 25 22:43:21.527023 (XEN)             onQ=0 runnable=1 flags=0 effective hard_affinity=0-1
Sep 25 22:43:21.535063 (XEN)    domain: 5
Sep 25 22:43:21.535089 (XEN) [    5.0 ] cpu 0, (10000000, 4000000), cur_b=3953875 cur_d=1611120000000 last_start=1611110106041
Sep 25 22:43:21.543073 (XEN)             onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
Sep 25 22:43:21.551078 (XEN) [    5.1 ] cpu 1, (10000000, 4000000), cur_b=3938167 cur_d=1611140000000 last_start=1611130169791
Sep 25 22:43:21.559063 (XEN)             onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
Sep 25 22:43:21.559096 (XEN) [    5.2 ] cpu 1, (10000000, 4000000), cur_b=3952500 cur_d=1611140000000 last_start=1611130107958
Sep 25 22:43:21.575067 (XEN)             onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
Sep 25 22:43:21.575101 (XEN) [    5.3 ] cpu 0, (10000000, 4000000), cur_b=3951875 cur_d=1611120000000 last_start=1611110154166
Sep 25 22:43:21.583196 (XEN)             onQ=0 runnable=0 flags=0 effective hard_affinity=0-1
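
As a rough aid for reading a dump like the one above, the per-vCPU
lines can be pulled apart with a small Python script. This is a sketch
under assumptions: the regular expression is written against the lines
quoted here rather than taken from the Xen source, the pair in
parentheses is assumed to be (period, budget) in nanoseconds, and the
names VCPU_RE and parse_vcpus are made up for illustration.

```python
import re

# Two entries copied from the dump above; the second is left wrapped
# the way the mail client wrapped it, to show the pattern tolerates that.
SAMPLE = """\
Sep 25 22:43:21.503100 (XEN) [    0.0 ] cpu 0, (10000000, 4000000), cur_b=3895333 cur_d=1611120000000 last_start=1611116505875
Sep 25 22:43:21.551078 (XEN) [    5.1 ] cpu 1, (10000000, 4000000),
cur_b=3938167 cur_d=1611140000000 last_start=1611130169791
"""

# \s+ also matches newlines, so wrapped and unwrapped lines both parse.
VCPU_RE = re.compile(
    r"\[\s*(?P<dom>\d+)\.(?P<vcpu>\d+)\s*\]\s+cpu\s+(?P<cpu>\d+),\s+"
    r"\((?P<period>\d+),\s*(?P<budget>\d+)\),\s+"
    r"cur_b=(?P<cur_b>\d+)\s+cur_d=(?P<cur_d>\d+)\s+"
    r"last_start=(?P<last_start>\d+)"
)


def parse_vcpus(text):
    """Return one dict of integer fields per vCPU entry found in text."""
    return [{k: int(v) for k, v in m.groupdict().items()}
            for m in VCPU_RE.finditer(text)]


vcpus = parse_vcpus(SAMPLE)
for v in vcpus:
    share = v["budget"] / v["period"]  # assumed budget/period, i.e. the CPU cap
    print(f"d{v['dom']}v{v['vcpu']} on cpu {v['cpu']}: cap {share:.0%}, "
          f"cur_b={v['cur_b']} ns")
```

Under that reading, every vCPU in the dump has the same 10 ms period
and 4 ms budget, i.e. the default 40% cap discussed earlier in the
thread.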

Also, it seems to fail fairly reliably, so it might be possible
to set up a reproducer.

Cheers,


--
Julien Grall

