Re: [PATCH] x86: make "dom0_nodes=" work with credit2
On 08.04.2022 13:20, Dario Faggioli wrote:
> On Thu, 2022-04-07 at 15:27 +0200, Jan Beulich wrote:
>> ---
>> The Fixes: tag isn't very precise - it's rather the commit exposing
>> the
>> issue by default. I haven't been able to identify the actual commit
>> which did introduce the problem; it may well be that it has always
>> been
>> there since the introduction of credit2.
>>
>> Credit2 moving the vCPU-s off of their initially assigned ones right
>> away of course renders sched_select_initial_cpu()'s use of
>> cpumask_cycle()
>> pretty useless.
>>
> Mmm... you mean that Credit2 is moving the vCPUs off their assigned
> ones _right_now_, or that it will, with this patch?
Right now. On a 4-node (6 cores each) system with "dom0_nodes=0" I've
observed the 6 vCPU-s to reliably be put on CPUs 13, 14, etc. The
patch is actually undoing that questionable movement.
Since I have a large number of other patches in place (none of which
I would expect to have such an effect), it's worth noting that Olaf
has meanwhile confirmed that the change helps for him as well.
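For context on the numbers: assuming a simple linear CPU-to-node
numbering (which need not match how firmware actually enumerates
things), node 0 of that 4-node / 6-cores-each box covers CPUs 0-5, so
CPUs 13 and 14 sit on node 2, well outside what "dom0_nodes=0" asks
for. A trivial standalone sketch of that assumed mapping:

#include <stdio.h>

/* Hypothetical linear CPU->node mapping for the 4-node, 6-cores-per-node
 * example above; real topologies may enumerate CPUs differently. */
int main(void)
{
    const unsigned int cores_per_node = 6;
    const unsigned int cpus[] = { 0, 5, 13, 14 };

    for ( unsigned int i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++ )
        printf("CPU %2u -> node %u\n", cpus[i], cpus[i] / cores_per_node);

    return 0;
}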
> If you mean the former, I'm not sure it is. In fact, when
> sched_select_initial_cpu() is called for dom0, dom0's node affinity is
> just all nodes, isn't it? So, then we can pick any CPU in the cpupool,
> and we use cycle to try to spread the vCPUs kind of evenly.
The CPU mask used in the function is 0x3f for the example given above.
I did not check which of the constituent parts of the calculation have
this effect. But the result is that all vCPU-s will be put on CPU 0
first, as cpumask_cycle(..., 13) for a mask of 0x3f produces 0.
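To illustrate the wrap-around that produces CPU 0 here, a minimal
standalone model (not the Xen implementation; the toy helpers below
just mimic "next set bit in the mask, else wrap to the first one" on
a plain unsigned long, and their names and argument order are this
sketch's own):

#include <stdio.h>

#define NR_CPUS 24   /* the 4-node, 6-cores-each example box */

static unsigned int mask_first(unsigned long mask)
{
    for ( unsigned int cpu = 0; cpu < NR_CPUS; cpu++ )
        if ( mask & (1UL << cpu) )
            return cpu;
    return NR_CPUS;
}

static unsigned int mask_next(unsigned int n, unsigned long mask)
{
    for ( unsigned int cpu = n + 1; cpu < NR_CPUS; cpu++ )
        if ( mask & (1UL << cpu) )
            return cpu;
    return NR_CPUS;                      /* nothing set above n */
}

static unsigned int mask_cycle(unsigned int n, unsigned long mask)
{
    unsigned int cpu = mask_next(n, mask);

    return cpu < NR_CPUS ? cpu : mask_first(mask);
}

int main(void)
{
    unsigned long dom0_cpus = 0x3f;      /* CPUs 0-5 only */

    /* Cycling onwards from CPU 13, which lies outside the mask,
     * wraps around and yields CPU 0. */
    printf("cycle from 13 -> %u\n", mask_cycle(13, dom0_cpus));
    return 0;
}

So whichever CPU a vCPU currently sits on outside the mask, the next
pick lands on CPU 0 again.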
> If you mean after this patch, well, sure, but that's normal. Again,
> when we pick the initial CPU, we still don't know that the vCPUs have
> an affinity. Or, actually, we know that in sched_setup_dom0_vcpus(),
> but there's no direct way to tell it to sched_init_vcpu() (and hence
> sched_select_initial_cpu()).
>
> That's because, by default, affinity is just "all cpus", when we create
> the vCPUs, and we change that later if an affinity was already
> specified (in the config file, or via the dom0_nodes parameter).
But that's what I'm talking about a little further down, where you
reply that you don't think using the narrower set would hide the
issue.
> Something that *maybe* we can try, since we're handling dom0 vCPUs
> specially anyway, is to directly set dom0's node affinity to the nodes
> of the CPUs we have in dom0_cpus, before calling vcpu_create() (in
> sched_setup_dom0_vcpus(), of course).
>
> This should make sched_select_initial_cpu() pick one of the "correct"
> CPUs in the first place. But I don't know if it's worth it, nor whether
> we'll still need this patch anyway (I have to check more thoroughly).
As per above, sched_select_initial_cpu() behaves as I would expect;
it's credit2 which subsequently overrides that decision.
>> But I guess that's still useful for other schedulers.
>> I wonder though whether sched_init_vcpu() shouldn't use the CPU map
>> calculated by sched_select_initial_cpu() for its call to
>> sched_set_affinity() in the non-pinned case, rather than setting
>> "all".
>>
> If we do that, and there's no affinity configured for the guest, or no
> "dom0_nodes=", when will we reset the affinity to all, which what it
> should be in such a case?
Why "reset"? When no restrictions are in place, afaict
sched_select_initial_cpu() will calculate a mask of "all".
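Spelled out as a rough, purely illustrative model (not the actual
sched_select_initial_cpu() code; masks are plain unsigned longs and
the node layout is the linear one from the example above): take the
CPUs of all nodes in the domain's node affinity, intersect with the
cpupool's CPUs, and fall back to the whole pool if that intersection
is empty. With no restrictions the result is "all"; with
"dom0_nodes=0" on the box above it is 0x3f:

#include <stdio.h>

#define NR_NODES        4
#define CORES_PER_NODE  6

static unsigned long node_to_cpumask(unsigned int node)
{
    return ((1UL << CORES_PER_NODE) - 1) << (node * CORES_PER_NODE);
}

static unsigned long initial_cpu_candidates(unsigned long node_affinity,
                                            unsigned long pool_cpus)
{
    unsigned long cpus = 0;

    for ( unsigned int node = 0; node < NR_NODES; node++ )
        if ( node_affinity & (1UL << node) )
            cpus |= node_to_cpumask(node);

    cpus &= pool_cpus;

    /* Fall back to the whole pool if the intersection came out empty. */
    return cpus ? cpus : pool_cpus;
}

int main(void)
{
    unsigned long all_nodes = 0xf;
    unsigned long all_cpus = (1UL << (NR_NODES * CORES_PER_NODE)) - 1;

    /* No restrictions: the candidate mask is simply "all". */
    printf("unrestricted: %#lx\n", initial_cpu_candidates(all_nodes, all_cpus));

    /* dom0_nodes=0: the candidate mask is 0x3f, i.e. CPUs 0-5. */
    printf("dom0_nodes=0: %#lx\n", initial_cpu_candidates(0x1, all_cpus));

    return 0;
}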
> Also, if I'm right in my reasoning above, when we come from
> sched_setup_dom0_vcpus(), the mask calculated by
> sched_select_initial_cpu() is basically cpupool0's cpus_valid, so this
> wouldn't really change anything for the problem we're trying to solve
> here.
>
>> (I guess doing so might mask the issue at hand, but I think the
>> change
>> here would still be applicable at least from an abstract pov.)
>>
> I don't think it would mask it, but I do think that, yes, the change
> you're making would still be applicable.
>
> And, about it, one thing...
>
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>> @@ -3403,9 +3403,15 @@ void __init sched_setup_dom0_vcpus(struc
>>      {
>>          for_each_sched_unit ( d, unit )
>>          {
>> +            spinlock_t *lock = unit_schedule_lock_irq(unit);
>> +
>>              if ( !opt_dom0_vcpus_pin && !dom0_affinity_relaxed )
>>                  sched_set_affinity(unit, &dom0_cpus, NULL);
>>              sched_set_affinity(unit, NULL, &dom0_cpus);
>> +
>> +            sched_unit_migrate_start(unit);
>> +            unit_schedule_unlock_irq(lock, unit);
>> +            sched_unit_migrate_finish(unit);
>>          }
>>      }
>>
> ... The result of this (also considering the call to
> domain_update_node_affinity()) ends up looking very similar to what
> we'd get if, instead of calling sched_set_affinity(), we called
> vcpu_set_affinity().
Funny you should say this - this is what I had done first. It works,
but it is less efficient than the approach above, first and foremost
when there are multiple vCPU-s per unit.
> I'm therefore wondering if we should try to just do that... But I'm not
> sure, mostly because that would mean calling
> domain_update_node_affinity() for all dom0's vCPUs, which is clearly
> less efficient than just calling it once at the end.
Indeed, that's the other reason why I changed to the approach
above.
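Just to put numbers on it, a toy cost model of the two variants (the
counters merely stand in for the per-vCPU affinity work respectively
the calls to domain_update_node_affinity() discussed above; the "2
vCPUs per unit" granularity is a hypothetical example, e.g. core
scheduling):

#include <stdio.h>

struct cost {
    unsigned int affinity_ops;           /* affinity set + migration */
    unsigned int node_affinity_updates;  /* domain_update_node_affinity() */
};

static struct cost per_vcpu_approach(unsigned int nr_vcpus)
{
    /* vcpu_set_affinity() per vCPU: each call also refreshes the
     * domain's node affinity. */
    return (struct cost){ nr_vcpus, nr_vcpus };
}

static struct cost per_unit_approach(unsigned int nr_vcpus,
                                     unsigned int vcpus_per_unit)
{
    /* The patch: one locked affinity update + migration per unit, and a
     * single node affinity update at the very end. */
    return (struct cost){ nr_vcpus / vcpus_per_unit, 1 };
}

int main(void)
{
    unsigned int vcpus = 6;   /* dom0 on the example box above */
    unsigned int gran = 2;    /* hypothetical: 2 vCPUs per unit */

    struct cost a = per_vcpu_approach(vcpus);
    struct cost b = per_unit_approach(vcpus, gran);

    printf("per-vCPU: %u affinity ops, %u node-affinity updates\n",
           a.affinity_ops, a.node_affinity_updates);
    printf("per-unit: %u affinity ops, %u node-affinity update(s)\n",
           b.affinity_ops, b.node_affinity_updates);
    return 0;
}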
> So I'm thinking that we can indeed do it like this, and add a comment.
A comment to what effect?
Jan