
Re: [PATCH v2 5/7] x86/irq: deal with old_cpu_mask for interrupts in movement in fixup_irqs()


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 12 Jun 2024 11:04:26 +0200
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 12 Jun 2024 09:04:44 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 12.06.2024 10:47, Roger Pau Monné wrote:
> On Tue, Jun 11, 2024 at 02:45:09PM +0200, Jan Beulich wrote:
>> On 10.06.2024 16:20, Roger Pau Monne wrote:
>>> Given the current logic it's possible for ->arch.old_cpu_mask to get out of
>>> sync: if a CPU set in old_cpu_mask is offlined and then onlined
>>> again without old_cpu_mask having been updated the data in the mask will no
>>> longer be accurate, as when brought back online the CPU will no longer have
>>> old_vector configured to handle the old interrupt source.
>>>
>>> If there's an interrupt movement in progress, and the to be offlined CPU (which
>>> is the call context) is in the old_cpu_mask clear it and update the mask, so it
>>> doesn't contain stale data.
>>
>> This imo is too __cpu_disable()-centric. In the code you cover the
>> smp_send_stop() case afaict, where it's all _other_ CPUs which are being
>> offlined. As we're not meaning to bring CPUs online again in that case,
>> dealing with the situation likely isn't needed. Yet the description should
>> imo at least make clear that the case was considered.
> 
> What about adding the following paragraph:

Sounds good, just maybe one small adjustment:

> Note that when the system is going down fixup_irqs() will be called by
> smp_send_stop() from CPU 0 with a mask with only CPU 0 on it,
> effectively asking to move all interrupts to the current caller (CPU
> 0) which is the only CPU online.  In that case we don't care to

"... the only CPU to remain online."

> migrate interrupts that are in the process of being moved, as it's
> likely we won't be able to move all interrupts to CPU 0 due to vector
> shortage anyway.
> 
>>
>>> @@ -2589,6 +2589,28 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>>>                                 affinity);
>>>          }
>>>  
>>> +        if ( desc->arch.move_in_progress &&
>>> +             !cpumask_test_cpu(cpu, &cpu_online_map) &&
>>
>> This part of the condition is, afaict, what covers (excludes) the
>> smp_send_stop() case. Might be nice to have a brief comment here, thus
>> also clarifying ...
> 
> Would you be fine with:
> 
>         if ( desc->arch.move_in_progress &&
>              /*
>               * Only attempt to migrate if the current CPU is going
>               * offline, otherwise the whole system is going down and
>               * leaving stale interrupts is fine.
>               */
>              !cpumask_test_cpu(cpu, &cpu_online_map) &&
>              cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )

Sure, this is even more verbose (i.e. better) than I was after.

Jan
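
[Editorial sketch, not part of the original message.] The condition agreed
on above can be modelled in isolation to show how it separates the two
callers. This is a toy illustration only: `toy_cpumask_t`,
`struct toy_irq_desc`, `toy_mask_test_cpu()` and `toy_fixup_old_mask()` are
hypothetical stand-ins, not Xen's cpumask_t, irq_desc or any real Xen
helper, and the function only clears the stale bit; whatever else the
actual patch does after the check is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a CPU mask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask_t;

/* Hypothetical, reduced view of the per-IRQ state discussed above. */
struct toy_irq_desc {
    bool move_in_progress;       /* an interrupt movement is pending */
    toy_cpumask_t old_cpu_mask;  /* CPUs still holding the old vector */
};

static bool toy_mask_test_cpu(unsigned int cpu, toy_cpumask_t mask)
{
    assert(cpu < 64);            /* keep the shift well defined */
    return (mask >> cpu) & 1;
}

/*
 * Mirrors the three-part condition from the quoted hunk.  'cpu' is the
 * calling CPU.  Returns true if the caller's stale bit was cleared.
 */
static bool toy_fixup_old_mask(struct toy_irq_desc *desc, unsigned int cpu,
                               toy_cpumask_t online_map)
{
    if ( desc->move_in_progress &&
         /*
          * Caller already removed from the online map: it is the CPU
          * going offline.  If the caller is still online, the whole
          * system is going down and stale data is left alone.
          */
         !toy_mask_test_cpu(cpu, online_map) &&
         toy_mask_test_cpu(cpu, desc->old_cpu_mask) )
    {
        desc->old_cpu_mask &= ~((toy_cpumask_t)1 << cpu);
        return true;
    }

    return false;
}
```

With CPUs 1 and 2 in the old mask, a call made from CPU 1 after it has left
the online map clears bit 1. A call made from CPU 0 while CPU 0 is still
online (the smp_send_stop() shape described above) leaves the mask
untouched.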



 

