Xen project Mailing List

Re: [PATCH v3 3/3] x86/irq: forward pending interrupts to new destination in fixup_irqs()

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Wed, 19 Jun 2024 09:24:41 +0200

Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Wed, 19 Jun 2024 07:25:02 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 19.06.2024 09:05, Roger Pau Monné wrote: > On Tue, Jun 18, 2024 at 06:30:22PM +0200, Jan Beulich wrote: >> On 18.06.2024 16:50, Roger Pau Monné wrote: >>> On Tue, Jun 18, 2024 at 04:34:50PM +0200, Jan Beulich wrote: >>>> On 18.06.2024 13:30, Roger Pau Monné wrote: >>>>> On Mon, Jun 17, 2024 at 03:41:12PM +0200, Jan Beulich wrote: >>>>>> On 13.06.2024 18:56, Roger Pau Monne wrote: >>>>>>> @@ -2686,11 +2705,27 @@ void fixup_irqs(const cpumask_t *mask, bool >>>>>>> verbose) >>>>>>> if ( desc->handler->disable ) >>>>>>> desc->handler->disable(desc); >>>>>>> >>>>>>> + /* >>>>>>> + * If the current CPU is going offline and is (one of) the >>>>>>> target(s) of >>>>>>> + * the interrupt, signal to check whether there are any >>>>>>> pending vectors >>>>>>> + * to be handled in the local APIC after the interrupt has >>>>>>> been moved. >>>>>>> + */ >>>>>>> + if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, >>>>>>> desc->arch.cpu_mask) ) >>>>>>> + check_irr = true; >>>>>>> + >>>>>>> if ( desc->handler->set_affinity ) >>>>>>> desc->handler->set_affinity(desc, affinity); >>>>>>> else if ( !(warned++) ) >>>>>>> set_affinity = false; >>>>>>> >>>>>>> + if ( check_irr && apic_irr_read(vector) ) >>>>>>> + /* >>>>>>> + * Forward pending interrupt to the new destination, this >>>>>>> CPU is >>>>>>> + * going offline and otherwise the interrupt would be lost. >>>>>>> + */ >>>>>>> + send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)), >>>>>>> + desc->arch.vector); >>>>>> >>>>>> Hmm, IRR may become set right after the IRR read (unlike in the other >>>>>> cases, >>>>>> where new IRQs ought to be surfacing only at the new destination). >>>>>> Doesn't >>>>>> this want moving ... >>>>>> >>>>>>> if ( desc->handler->enable ) >>>>>>> desc->handler->enable(desc); >>>>>> >>>>>> ... past the actual affinity change? >>>>> >>>>> Hm, but the ->enable() hook is just unmasking the interrupt, the >>>>> actual affinity change is done in ->set_affinity(), and hence after >>>>> the call to ->set_affinity() no further interrupts should be delivered >>>>> to the CPU regardless of whether the source is masked? >>>>> >>>>> Or is it possible for the device/interrupt controller to not switch to >>>>> use the new destination until the interrupt is unmasked, and hence >>>>> could have pending masked interrupts still using the old destination? >>>>> IIRC For MSI-X it's required that the device updates the destination >>>>> target once the entry is unmasked. >>>> >>>> That's all not relevant here, I think. IRR gets set when an interrupt is >>>> signaled, no matter whether it's masked. >>> >>> I'm kind of lost here, what does signaling mean in this context? >>> >>> I would expect the interrupt vector to not get set in IRR if the MSI-X >>> entry is masked, as at that point the state of the address/data fields >>> might not be consistent (that's the whole point of masking it right?) >>> >>>> It's its handling which the >>>> masking would prevent, i.e. the "moving" of the set bit from IRR to ISR. >>> >>> My understanding was that the masking would prevent the message write to >>> the APIC from happening, and hence no vector should get set in IRR. >> >> Hmm, yes, looks like I was confused. The masking is at the source side >> (IO-APIC RTE, MSI-X entry, or - if supported - in the MSI capability). >> So the sole case to worry about is MSI without mask-bit support then. > > Yeah, and for MSI without masking bit support we don't care doing the > IRR check before or after the ->enable() hook, as that's a no-op in > that case. The write to the MSI address/data fields has already been > done, and hence the issue would be exclusively with draining any > in-flight writes to the APIC doorbell (what you mention below). Except that both here ... >>>> Plus we run with IRQs off here anyway if I'm not mistaken, so no >>>> interrupt can be delivered to the local CPU. IOW whatever IRR bits it >>>> has set (including ones becoming set between the IRR read and the actual >>>> vector change), those would never be serviced. Hence the reading of the >>>> bit ought to occur after the vector change: It's only then that we know >>>> the IRR bit corresponding to the old vector can't become set anymore. >>> >>> Right, and the vector change happens in ->set_affinity(), not >>> ->enable(). See for example set_msi_affinity() and the >>> write_msi_msg(), that's where the vector gets changed. >>> >>>> And even then we're assuming that no interrupt signals might still be >>>> "on their way" from the IO-APIC or a posted MSI message write by a >>>> device to the LAPIC (I have no idea how to properly fence that, or >>>> whether there are guarantees for this to never occur). >>> >>> Yeah, those I expect would be completed in the window between the >>> write of the new vector/destination and the reading of IRR. >> >> Except we have no idea on the latencies. > > There isn't much else we can do? Even the current case where we add > the 1ms window at the end of the shuffling could still suffer from > this issue because we don't know the latencies. IOW: I don't think > this is any worse than the current approach. ... and here, the later we read IRR, the better the chances we don't miss anything. Even the no-op ->enable() isn't a no-op execution-wise. In fact it (quite pointlessly[1]) is an indirect call to irq_enable_none(). I'm actually inclined to suggest that we try to even further delay the IRR read, certainly past the cpumask_copy(), maybe even past the spin_unlock() (latching CPU and vector into local variables, along with the latching of ->affinity that's already there). Jan [1] While back when that was written the main goal probably was to avoid conditionals on what may be deemed fast paths, I wonder whether nowadays the main goal wouldn't be to avoid indirect calls when we (pretty) easily can.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.