Re: [PATCH v3 3/3] xen/sched: fix cpu hotplug
On 01/09/2022 07:11, Juergen Gross wrote:
> On 01.09.22 00:52, Andrew Cooper wrote:
>> On 16/08/2022 11:13, Juergen Gross wrote:
>>> Cpu cpu unplugging is calling schedule_cpu_rm() via stop_machine_run()
>>
>> Cpu cpu.
>>
>>> with interrupts disabled, thus any memory allocation or freeing must
>>> be avoided.
>>>
>>> Since commit 5047cd1d5dea ("xen/common: Use enhanced
>>> ASSERT_ALLOC_CONTEXT in xmalloc()") this restriction is being enforced
>>> via an assertion, which will now fail.
>>>
>>> Before that commit cpu unplugging in normal configurations was working
>>> just by chance as only the cpu performing schedule_cpu_rm() was doing
>>> active work. With core scheduling enabled, however, failures could
>>> result from memory allocations not being properly propagated to other
>>> cpus' TLBs.
>>
>> This isn't accurate, is it? The problem with initiating a TLB flush
>> with IRQs disabled is that you can deadlock against a remote CPU which
>> is waiting for you to enable IRQs first to take a TLB flush IPI.
>
> As long as only one cpu is trying to allocate/free memory during the
> stop_machine_run() action the deadlock won't happen.
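
To spell out the shape of the hazard I meant: a remote flush is
typically "send an IPI, then spin until every target CPU has acked
from its interrupt handler".  A sketch only - send_IPI_mask() /
FLUSH_VECTOR here are illustrative placeholders, not our actual
flush_area_mask():

static void remote_tlb_flush(cpumask_t *targets)
{
    /* Ask each target CPU to flush; they ack from their IRQ handler. */
    send_IPI_mask(targets, FLUSH_VECTOR);

    /* Spin until every target has acknowledged. */
    while ( !cpumask_empty(targets) )
        cpu_relax();
}

If a flush like this were initiated while every CPU is sat in
stop_machine_run() with IRQs disabled, no target could ever take the
IPI, and the initiator would spin forever.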
>
>> How does a memory allocation out of the xenheap result in a TLB flush?
>> Even with split heaps, you're only potentially allocating into a new
>> slot which was unused...
>
> Yeah, you are right. The main problem would occur only when a virtual
> address is changed to point at another physical address, which should be
> quite unlikely.
>
> I can drop that paragraph, as it doesn't really help.
Yeah, I think that would be best.
>>
>>> diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
>>> index 58e082eb4c..2506861e4f 100644
>>> --- a/xen/common/sched/cpupool.c
>>> +++ b/xen/common/sched/cpupool.c
>>> @@ -411,22 +411,28 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c)
>>>  }
>>>  
>>>  /* Update affinities of all domains in a cpupool. */
>>> -static void cpupool_update_node_affinity(const struct cpupool *c)
>>> +static void cpupool_update_node_affinity(const struct cpupool *c,
>>> +                                         struct affinity_masks *masks)
>>>  {
>>> -    struct affinity_masks masks;
>>> +    struct affinity_masks local_masks;
>>>      struct domain *d;
>>>  
>>> -    if ( !update_node_aff_alloc(&masks) )
>>> -        return;
>>> +    if ( !masks )
>>> +    {
>>> +        if ( !update_node_aff_alloc(&local_masks) )
>>> +            return;
>>> +        masks = &local_masks;
>>> +    }
>>>  
>>>      rcu_read_lock(&domlist_read_lock);
>>>  
>>>      for_each_domain_in_cpupool(d, c)
>>> -        domain_update_node_aff(d, &masks);
>>> +        domain_update_node_aff(d, masks);
>>>  
>>>      rcu_read_unlock(&domlist_read_lock);
>>>  
>>> -    update_node_aff_free(&masks);
>>> +    if ( masks == &local_masks )
>>> +        update_node_aff_free(masks);
>>>  }
>>>  
>>>  /*
>>> /*
>>
>> Why do we need this at all? domain_update_node_aff() already knows what
>> to do when passed NULL, so this seems like an awfully complicated no-op.
>
> You do realize that update_node_aff_free() will do something in case
> masks was initially NULL?
By "this", I meant the entire hunk, not just the final if().
What is wrong with passing the (possibly NULL) masks pointer straight
down into domain_update_node_aff() and removing all the memory
allocation in this function?
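
i.e. something like this (sketch only):

static void cpupool_update_node_affinity(const struct cpupool *c)
{
    struct domain *d;

    rcu_read_lock(&domlist_read_lock);

    for_each_domain_in_cpupool(d, c)
        domain_update_node_aff(d, NULL);  /* allocates/frees internally */

    rcu_read_unlock(&domlist_read_lock);
}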
But I've also answered that by looking more closely.  It's about not
repeating the memory allocations/freeing for each domain in the pool.
Which does make sense as this is already a slow path, but the
conditionally allocated masks make this far from simple.
>
>>
>>> @@ -1008,10 +1016,21 @@ static int cf_check cpu_callback(
>>>  {
>>>      unsigned int cpu = (unsigned long)hcpu;
>>>      int rc = 0;
>>> +    static struct cpu_rm_data *mem;
>>>  
>>>      switch ( action )
>>>      {
>>>      case CPU_DOWN_FAILED:
>>> +        if ( system_state <= SYS_STATE_active )
>>> +        {
>>> +            if ( mem )
>>> +            {
>>
>> So, this does compile (and indeed I've tested the result), but I can't
>> see how it should.
>>
>> mem is guaranteed to be uninitialised at this point, and ...
>
> ... it is defined as "static", so it is clearly NULL initially.
Oh, so it is. That is hiding particularly well in plain sight.
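
For anyone else tripping over this: objects with static storage
duration are zero-initialised in C, so mem really does start out NULL.
A standalone demonstration:

#include <stdio.h>

static int *mem;  /* static storage duration => implicitly NULL */

int main(void)
{
    printf("%s\n", mem ? "non-NULL" : "NULL");  /* prints "NULL" */
    return 0;
}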
Can it please be this:
@@ -1014,9 +1014,10 @@ void cf_check dump_runq(unsigned char key)
 static int cf_check cpu_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
+    static struct cpu_rm_data *mem; /* Protected by cpu_add_remove_lock */
+
     unsigned int cpu = (unsigned long)hcpu;
     int rc = 0;
-    static struct cpu_rm_data *mem;
 
     switch ( action )
     {
We already split static and non-static variables like this elsewhere for
clarity, and identifying the lock which protects the data is useful for
anyone coming to this in the future.
~Andrew
P.S. as an observation, this isn't safe for parallel AP booting, but I
guarantee that this isn't the only example which would want fixing if we
wanted to get parallel booting working.
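
To illustrate the hazard: a single static pointer can only track one
in-flight hotplug operation, so two notifier chains running at once
would clobber it.  A hypothetical interleaving (the alloc/free helper
names below are placeholders, not the patch's actual ones):

/*
 *   CPU A: CPU_DOWN_PREPARE -> mem = alloc_rm_data(cpuA);
 *   CPU B: CPU_DOWN_PREPARE -> mem = alloc_rm_data(cpuB);  A's allocation leaked
 *   CPU A: CPU_DEAD         -> free_rm_data(mem);          frees B's data
 *   CPU B: CPU_DEAD         -> free_rm_data(mem);          double free
 */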