[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] Revert "domctl: improve locking during domain destruction"

On 24.03.2020 19:39, Julien Grall wrote:
> On 24/03/2020 16:13, Jan Beulich wrote:
>> On 24.03.2020 16:21, Hongyan Xia wrote:
>>> From: Hongyan Xia <hongyxia@xxxxxxxxxx>
>>> In contrast,
>>> after dropping that commit, parallel domain destructions will just fail
>>> to take the domctl lock, creating a hypercall continuation and backing
>>> off immediately, allowing the thread that holds the lock to destroy a
>>> domain much more quickly and allowing backed-off threads to process
>>> events and irqs.
>>> On a 144-core server with 4TiB of memory, destroying 32 guests (each
>>> with 4 vcpus and 122GiB memory) simultaneously takes:
>>> before the revert: 29 minutes
>>> after the revert: 6 minutes
>> This wants comparing against numbers demonstrating the bad effects of
>> the global domctl lock. Iirc they were quite a bit higher than 6 min,
>> perhaps depending on guest properties.
> Your original commit message doesn't contain any clue in which
> cases the domctl lock was an issue. So please provide information
> on the setups you think it will make it worse.

I did never observe the issue myself - let's see whether one of the SUSE
people possibly involved in this back then recall (or have further
pointers; Jim, Charles?), or whether any of the (partly former) Citrix
folks do. My vague recollection is that the issue was the tool stack as
a whole stalling for far too long in particular when destroying very
large guests. One important aspect not discussed in the commit message
at all is that holding the domctl lock block basically _all_ tool stack
operations (including e.g. creation of new guests), whereas the new
issue attempted to be addressed is limited to just domain cleanup.




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.