Re: [PATCH] x86: make "dom0_nodes=" work with credit2
On 08.04.2022 13:20, Dario Faggioli wrote:
> On Thu, 2022-04-07 at 15:27 +0200, Jan Beulich wrote:
>> ---
>> The Fixes: tag isn't very precise - it's rather the commit exposing
>> the
>> issue by default. I haven't been able to identify the actual commit
>> which did introduce the problem; it may well be that it has always
>> been
>> there since the introduction of credit2.
>>
>> Credit2 moving the vCPU-s off of their initially assigned ones right
>> away of course renders sched_select_initial_cpu()'s use of
>> cpumask_cycle()
>> pretty useless.
>>
> Mmm... you mean that Credit2 is moving the vCPUs off their assigned
> ones _right_now_, or that it will, with this patch?
Right now. On a 4-node (6 cores each) system with "dom0_nodes=0" I've
observed the 6 vCPU-s to reliably be put on CPUs 13, 14, etc. The
patch is actually undoing that questionable movement.
Since I have a large number of other patches in place (none of which
I would expect to have such an effect), it's worth noting that Olaf
has meanwhile confirmed that the change helps for him as well.
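For context on the numbers: assuming a simple linear CPU-to-node
numbering (which need not match how firmware actually enumerates
things), node 0 of that 4-node / 6-cores-each box covers CPUs 0-5, so
CPUs 13 and 14 sit on node 2, well outside what "dom0_nodes=0" asks
for. A trivial standalone sketch of that assumed mapping:

#include <stdio.h>

/* Hypothetical linear CPU->node mapping for the 4-node, 6-cores-per-node
 * example above; real topologies may enumerate CPUs differently. */
int main(void)
{
    const unsigned int cores_per_node = 6;
    const unsigned int cpus[] = { 0, 5, 13, 14 };

    for ( unsigned int i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++ )
        printf("CPU %2u -> node %u\n", cpus[i], cpus[i] / cores_per_node);

    return 0;
}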
> If you mean the former, I'm not sure it is. In fact, when
> sched_select_initial_cpu() is called for dom0, dom0's node affinity is
> just all nodes, isn't it? So, then we can pick any CPU in the cpupool,
> and we use cycle to try to spread the vCPUs kind of evenly.
The CPU mask used in the function is 0x3f for the example given above.
I did not check which of the constituent parts of the calculation have
this effect. But the result is that all vCPU-s will be put on CPU 0
first, as cpumask_cycle(..., 13) for a mask of 0x3f produces 0.
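To illustrate the wrap-around that produces CPU 0 here, a minimal
standalone model (not the Xen implementation; the toy helpers below
just mimic "next set bit in the mask, else wrap to the first one" on
a plain unsigned long, and their names and argument order are this
sketch's own):

#include <stdio.h>

#define NR_CPUS 24   /* the 4-node, 6-cores-each example box */

static unsigned int mask_first(unsigned long mask)
{
    for ( unsigned int cpu = 0; cpu < NR_CPUS; cpu++ )
        if ( mask & (1UL << cpu) )
            return cpu;
    return NR_CPUS;
}

static unsigned int mask_next(unsigned int n, unsigned long mask)
{
    for ( unsigned int cpu = n + 1; cpu < NR_CPUS; cpu++ )
        if ( mask & (1UL << cpu) )
            return cpu;
    return NR_CPUS;                      /* nothing set above n */
}

static unsigned int mask_cycle(unsigned int n, unsigned long mask)
{
    unsigned int cpu = mask_next(n, mask);

    return cpu < NR_CPUS ? cpu : mask_first(mask);
}

int main(void)
{
    unsigned long dom0_cpus = 0x3f;      /* CPUs 0-5 only */

    /* Cycling onwards from CPU 13, which lies outside the mask,
     * wraps around and yields CPU 0. */
    printf("cycle from 13 -> %u\n", mask_cycle(13, dom0_cpus));
    return 0;
}

So whichever CPU a vCPU currently sits on outside the mask, the next
pick lands on CPU 0 again.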
> If you mean after this patch, well, sure, but that's normal. Again,
> when we pick the initial CPU, we still don't know that the vCPUs have
> an affinity. Or, actually, we know that in sched_setup_dom0_vcpus(),
> but there's no direct way to tell it to sched_init_vcpu() (and hence
> sched_select_initial_cpu()).
>
> That's because, by default, affinity is just "all cpus", when we create
> the vCPUs, and we change that later if an affinity was already
> specified (in the config file, or via the dom0_nodes parameter).
But that's what I'm talking about a little further down, where you
reply that you don't think using the narrower set would hide the
issue.
> Something that *maybe* we can try, since we're handling dom0 vCPUs
> specially anyway, is to directly set dom0's node affinity to the nodes
> of the CPUs we have in dom0_cpus, before calling vcpu_create() (in
> sched_setup_dom0_vcpus(), of course).
>
> This should make sched_select_initial_cpu() pick one of the "correct"
> CPUs in the first place. But I don't know if it's worth it, nor whether
> we'll still need this patch anyway (I have to check more thoroughly).
As per above, sched_select_initial_cpu() behaves as I would expect;
it's credit2 which subsequently overrides that decision.
>> But I guess that's still useful for other schedulers.
>> I wonder though whether sched_init_vcpu() shouldn't use the CPU map
>> calculated by sched_select_initial_cpu() for its call to
>> sched_set_affinity() in the non-pinned case, rather than setting
>> "all".
>>
> If we do that, and there's no affinity configured for the guest, or no
> "dom0_nodes=", when will we reset the affinity to all, which what it
> should be in such a case?
Why "reset"? When no restrictions are in place, afaict
sched_select_initial_cpu() will calculate a mask of "all".
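Spelled out as a rough, purely illustrative model (not the actual
sched_select_initial_cpu() code; masks are plain unsigned longs and
the node layout is the linear one from the example above): take the
CPUs of all nodes in the domain's node affinity, intersect with the
cpupool's CPUs, and fall back to the whole pool if that intersection
is empty. With no restrictions the result is "all"; with
"dom0_nodes=0" on the box above it is 0x3f:

#include <stdio.h>

#define NR_NODES        4
#define CORES_PER_NODE  6

static unsigned long node_to_cpumask(unsigned int node)
{
    return ((1UL << CORES_PER_NODE) - 1) << (node * CORES_PER_NODE);
}

static unsigned long initial_cpu_candidates(unsigned long node_affinity,
                                            unsigned long pool_cpus)
{
    unsigned long cpus = 0;

    for ( unsigned int node = 0; node < NR_NODES; node++ )
        if ( node_affinity & (1UL << node) )
            cpus |= node_to_cpumask(node);

    cpus &= pool_cpus;

    /* Fall back to the whole pool if the intersection came out empty. */
    return cpus ? cpus : pool_cpus;
}

int main(void)
{
    unsigned long all_nodes = 0xf;
    unsigned long all_cpus = (1UL << (NR_NODES * CORES_PER_NODE)) - 1;

    /* No restrictions: the candidate mask is simply "all". */
    printf("unrestricted: %#lx\n", initial_cpu_candidates(all_nodes, all_cpus));

    /* dom0_nodes=0: the candidate mask is 0x3f, i.e. CPUs 0-5. */
    printf("dom0_nodes=0: %#lx\n", initial_cpu_candidates(0x1, all_cpus));

    return 0;
}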
> Also, if I'm right in my reasoning above, when we come from
> sched_setup_dom0_vcpus(), the mask calculated by
> sched_select_initial_cpu() is basically cpupool0's cpus_valid, so this
> wouldn't really change anything for the problem we're trying to solve
> here.
>
>> (I guess doing so might mask the issue at hand, but I think the
>> change
>> here would still be applicable at least from an abstract pov.)
>>
> I don't think it would mask it, but I do think that, yes, the change
> you're making would still be applicable.
>
> And, about it, one thing...
>
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -3403,9 +3403,15 @@ void __init sched_setup_dom0_vcpus(struc
>>      {
>>          for_each_sched_unit ( d, unit )
>>          {
>> +            spinlock_t *lock = unit_schedule_lock_irq(unit);
>> +
>>              if ( !opt_dom0_vcpus_pin && !dom0_affinity_relaxed )
>>                  sched_set_affinity(unit, &dom0_cpus, NULL);
>>              sched_set_affinity(unit, NULL, &dom0_cpus);
>> +
>> +            sched_unit_migrate_start(unit);
>> +            unit_schedule_unlock_irq(lock, unit);
>> +            sched_unit_migrate_finish(unit);
>>          }
>>      }
>>
> ... The result of this (also considering the call to
> domain_update_node_affinity()) ends up looking very similar to what
> we'd get if, instead of calling sched_set_affinity(), we called
> vcpu_set_affinity().
Funny you should say this - this is what I had done first. It works,
but it is less efficient than the approach above, first and foremost
when there are multiple vCPU-s per unit.
> I'm therefore wondering if we should try to just do that... But I'm not
> sure, mostly because that would mean calling
> domain_update_node_affinity() for all dom0's vCPUs, which is clearly
> less efficient than just calling it once at the end.
Indeed, that's the other reason why I changed to the approach
above.
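Just to put numbers on it, a toy cost model of the two variants (the
counters merely stand in for the per-vCPU affinity work respectively
the calls to domain_update_node_affinity() discussed above; the "2
vCPUs per unit" granularity is a hypothetical example, e.g. core
scheduling):

#include <stdio.h>

struct cost {
    unsigned int affinity_ops;           /* affinity set + migration */
    unsigned int node_affinity_updates;  /* domain_update_node_affinity() */
};

static struct cost per_vcpu_approach(unsigned int nr_vcpus)
{
    /* vcpu_set_affinity() per vCPU: each call also refreshes the
     * domain's node affinity. */
    return (struct cost){ nr_vcpus, nr_vcpus };
}

static struct cost per_unit_approach(unsigned int nr_vcpus,
                                     unsigned int vcpus_per_unit)
{
    /* The patch: one locked affinity update + migration per unit, and a
     * single node affinity update at the very end. */
    return (struct cost){ nr_vcpus / vcpus_per_unit, 1 };
}

int main(void)
{
    unsigned int vcpus = 6;   /* dom0 on the example box above */
    unsigned int gran = 2;    /* hypothetical: 2 vCPUs per unit */

    struct cost a = per_vcpu_approach(vcpus);
    struct cost b = per_unit_approach(vcpus, gran);

    printf("per-vCPU: %u affinity ops, %u node-affinity updates\n",
           a.affinity_ops, a.node_affinity_updates);
    printf("per-unit: %u affinity ops, %u node-affinity update(s)\n",
           b.affinity_ops, b.node_affinity_updates);
    return 0;
}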
> So I'm thinking that we can indeed do it like this, and add a comment.
A comment to what effect?
Jan