
Re: [Xen-devel] xen/arm: Domain not fully destroyed when using credit2



Hi Dario,

On 24/01/17 12:53, Dario Faggioli wrote:
On Tue, 2017-01-24 at 10:50 +0000, Julien Grall wrote:
On 24/01/2017 08:20, Jan Beulich wrote:
On 23.01.17 at 20:42, <julien.grall@xxxxxxx> wrote:
The function domain_destroy will set up the RCU callback
(complete_domain_destroy) by calling call_rcu. call_rcu will add the
callback to the RCU list and may then send an IPI (see
force_quiescent_state) if the threshold is reached. This IPI is there
to make sure all CPUs are quiescent before calling the callbacks
(e.g. complete_domain_destroy). In my case, the threshold has not been
reached and therefore an IPI is not sent.

But wait - isn't it the nature of RCU that it may take arbitrary time
until the actual call(s) happen(s)?

Today this arbitrary time could be infinite if an idle pCPU does not
receive an interrupt. So some of the domain's resources will never be
freed.

If I am power-cycling a domain in a loop, after some time the
toolstack will fail to allocate memory because of exhausted resources.
The previous instance of the domain was not yet fully destroyed (e.g.
complete_domain_destroy was not called).

Do you have a script and/or some more info for letting me try to
reproduce it (e.g., you say some of the vCPUs are pinned, which ones?
etc)?

That was mentioned in my first e-mail :). My configuration is:
        - ARM platform with 6 cores
        - staging Xen with credit2 enabled by default
        - DOM0 using 2 pinned vCPUs
        - Guest using 2 vCPUs (not pinned)

The script is really simple:

for i in `seq 1 10`; do
        sudo xl create ~/works/guest/guest.cfg;
        sudo xl destroy guest;
done


I'm a bit curious about why you're saying this is being exposed by
using Credit2.

It is exposed by Credit2 because, compared to Credit1, the scheduler generates no interrupt traffic. On ARM with Credit2, interrupt traffic is reduced to none for an idle pCPU.

In fact:
 1) I've power-cycled quite a few domains in these last months, while
    under Credit2, and I don't think I have encountered it on x86;

AFAIU, an IPI is often the only way to broadcast some instructions on x86. So compared to ARM, you likely have higher interrupt traffic.

Also, the problem is not obvious to spot unless you look at the free memory (via xl info) before and after. Another option is to print a message in both domain_destroy and complete_domain_destroy.

You will spot the first message directly. The latter may never be printed.
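
For the free-memory check, a minimal sketch could look like the following. It assumes that xl info prints a line of the form "free_memory : <MiB>"; the helper names free_mem and report_delta are made up for illustration and are not part of xl.

```shell
#!/bin/sh
# Hypothetical helpers, not part of xl.

# Print the free_memory value found in `xl info` style output on stdin.
free_mem() {
    awk -F: '/^free_memory/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Print both samples and how much free memory changed between them.
report_delta() {
    before=$1
    after=$2
    echo "free_memory: before=${before} after=${after} delta=$((after - before))"
}

# Intended use on a Xen host (left commented out so the helpers can be
# sourced without touching any domain):
#   before=$(sudo xl info | free_mem)
#   sudo xl create ~/works/guest/guest.cfg
#   sudo xl destroy guest
#   after=$(sudo xl info | free_mem)
#   report_delta "$before" "$after"
```

A negative delta that persists after the destroy suggests that complete_domain_destroy has not run yet for the previous instance.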

 2) I see how it may be related to Credit2 being more deterministic
    and not trying to schedule stuff around pseudo-randomly like
    Credit1 does... but I'd like to try investigating a bit more.

I am able to reliably reproduce it on a Juno-r2.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

