Re: [PATCH v3 3/3] x86/irq: forward pending interrupts to new destination in fixup_irqs()
On 19.06.2024 09:05, Roger Pau Monné wrote:
> On Tue, Jun 18, 2024 at 06:30:22PM +0200, Jan Beulich wrote:
>> On 18.06.2024 16:50, Roger Pau Monné wrote:
>>> On Tue, Jun 18, 2024 at 04:34:50PM +0200, Jan Beulich wrote:
>>>> On 18.06.2024 13:30, Roger Pau Monné wrote:
>>>>> On Mon, Jun 17, 2024 at 03:41:12PM +0200, Jan Beulich wrote:
>>>>>> On 13.06.2024 18:56, Roger Pau Monne wrote:
>>>>>>> @@ -2686,11 +2705,27 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
>>>>>>>          if ( desc->handler->disable )
>>>>>>>              desc->handler->disable(desc);
>>>>>>>
>>>>>>> +        /*
>>>>>>> +         * If the current CPU is going offline and is (one of) the target(s) of
>>>>>>> +         * the interrupt, signal to check whether there are any pending vectors
>>>>>>> +         * to be handled in the local APIC after the interrupt has been moved.
>>>>>>> +         */
>>>>>>> +        if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
>>>>>>> +            check_irr = true;
>>>>>>> +
>>>>>>>          if ( desc->handler->set_affinity )
>>>>>>>              desc->handler->set_affinity(desc, affinity);
>>>>>>>          else if ( !(warned++) )
>>>>>>>              set_affinity = false;
>>>>>>>
>>>>>>> +        if ( check_irr && apic_irr_read(vector) )
>>>>>>> +            /*
>>>>>>> +             * Forward pending interrupt to the new destination, this CPU is
>>>>>>> +             * going offline and otherwise the interrupt would be lost.
>>>>>>> +             */
>>>>>>> +            send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
>>>>>>> +                          desc->arch.vector);
>>>>>>
>>>>>> Hmm, IRR may become set right after the IRR read (unlike in the other
>>>>>> cases, where new IRQs ought to be surfacing only at the new
>>>>>> destination). Doesn't this want moving ...
>>>>>>
>>>>>>>          if ( desc->handler->enable )
>>>>>>>              desc->handler->enable(desc);
>>>>>>
>>>>>> ... past the actual affinity change?
>>>>>
>>>>> Hm, but the ->enable() hook is just unmasking the interrupt; the
>>>>> actual affinity change is done in ->set_affinity(), and hence after
>>>>> the call to ->set_affinity() no further interrupts should be
>>>>> delivered to the CPU, regardless of whether the source is masked?
>>>>>
>>>>> Or is it possible for the device/interrupt controller to not switch
>>>>> to using the new destination until the interrupt is unmasked, and
>>>>> hence there could be pending masked interrupts still using the old
>>>>> destination?  IIRC for MSI-X it's required that the device updates
>>>>> the destination target once the entry is unmasked.
>>>>
>>>> That's all not relevant here, I think. IRR gets set when an interrupt
>>>> is signaled, no matter whether it's masked.
>>>
>>> I'm kind of lost here - what does signaling mean in this context?
>>>
>>> I would expect the interrupt vector to not get set in IRR if the
>>> MSI-X entry is masked, as at that point the state of the address/data
>>> fields might not be consistent (that's the whole point of masking it,
>>> right?)
>>>
>>>> It's its handling which the masking would prevent, i.e. the "moving"
>>>> of the set bit from IRR to ISR.
>>>
>>> My understanding was that the masking would prevent the message write
>>> to the APIC from happening, and hence no vector should get set in IRR.
>>
>> Hmm, yes, looks like I was confused. The masking is at the source side
>> (IO-APIC RTE, MSI-X entry, or - if supported - in the MSI capability).
>> So the sole case to worry about is MSI without mask-bit support then.
>
> Yeah, and for MSI without masking bit support we don't care about doing
> the IRR check before or after the ->enable() hook, as that's a no-op in
> that case.  The write to the MSI address/data fields has already been
> done, and hence the issue would be exclusively with draining any
> in-flight writes to the APIC doorbell (what you mention below).
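For reference, the IRR check both sides are talking about amounts to a
single-bit read from the local APIC's Interrupt Request Register. Below
is a minimal sketch of such a test, assuming the architectural IRR
layout of eight consecutive 32-bit registers (one per 32 vectors) at
16-byte stride; apic_read() and APIC_IRR follow Xen's naming, while the
helper itself is purely illustrative, not the in-tree apic_irr_read().

/*
 * Illustrative sketch (not the in-tree apic_irr_read()): test whether
 * `vector' is pending in the local APIC's IRR.  Architecturally the
 * IRR is 8 consecutive 32-bit registers, one per 32 vectors, spaced
 * 16 bytes apart in the APIC register space.
 */
static bool vector_pending_in_irr(unsigned int vector)
{
    uint32_t reg = apic_read(APIC_IRR + (vector / 32) * 0x10);

    return reg & (1U << (vector % 32));
}

Such a test is inherently racy against message writes still on their
way to the LAPIC, which is exactly why the placement of the read
matters in the rest of the exchange.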
Except that both here ...

>>>> Plus we run with IRQs off here anyway if I'm not mistaken, so no
>>>> interrupt can be delivered to the local CPU. IOW whatever IRR bits
>>>> it has set (including ones becoming set between the IRR read and
>>>> the actual vector change), those would never be serviced. Hence the
>>>> reading of the bit ought to occur after the vector change: It's
>>>> only then that we know the IRR bit corresponding to the old vector
>>>> can't become set anymore.
>>>
>>> Right, and the vector change happens in ->set_affinity(), not
>>> ->enable().  See for example set_msi_affinity() and write_msi_msg();
>>> that's where the vector gets changed.
>>>
>>>> And even then we're assuming that no interrupt signals might still
>>>> be "on their way" from the IO-APIC or a posted MSI message write by
>>>> a device to the LAPIC (I have no idea how to properly fence that,
>>>> or whether there are guarantees for this to never occur).
>>>
>>> Yeah, those I expect would be completed in the window between the
>>> write of the new vector/destination and the reading of IRR.
>>
>> Except we have no idea about the latencies.
>
> There isn't much else we can do?  Even the current case where we add
> the 1ms window at the end of the shuffling could still suffer from
> this issue, because we don't know the latencies.  IOW: I don't think
> this is any worse than the current approach.

... and here, the later we read IRR, the better the chances that we
don't miss anything.  Even the no-op ->enable() isn't a no-op
execution-wise: in fact it (quite pointlessly [1]) is an indirect call
to irq_enable_none().  I'm actually inclined to suggest that we try to
delay the IRR read even further, certainly past the cpumask_copy(),
maybe even past the spin_unlock() (latching CPU and vector into local
variables, along with the latching of ->affinity that's already there).

Jan

[1] While back when that was written the main goal probably was to
avoid conditionals on what may be deemed fast paths, I wonder whether
nowadays the main goal wouldn't be to avoid indirect calls when we
(pretty) easily can.
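To make the suggested reordering concrete, here is a rough sketch of
how the tail of the per-IRQ handling in fixup_irqs() could look with
the IRR read pushed past the unlock.  apic_irr_read(), send_IPI_mask()
and the cpumask accessors are as used by the patch; the locals, the
reduced locking shape, and the omission of the disable/enable hooks
are simplifications for illustration only.

/*
 * Hypothetical restructuring along the lines suggested above: latch
 * everything needed while still holding desc->lock, and only sample
 * IRR afterwards, i.e. as late as possible after the vector and
 * destination have been switched.
 */
unsigned int old_vector = 0, new_vector = 0, dst = 0;
bool check_irr = false;

spin_lock(&desc->lock);

/*
 * The going-offline CPU is (one of) the target(s) of the IRQ:
 * remember the current vector so IRR can be inspected after the move.
 */
if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
{
    check_irr = true;
    old_vector = desc->arch.vector;
}

if ( desc->handler->set_affinity )
    desc->handler->set_affinity(desc, affinity);

if ( check_irr )
{
    /* Latch the post-move vector and a destination before unlocking. */
    new_vector = desc->arch.vector;
    dst = cpumask_any(desc->arch.cpu_mask);
}

spin_unlock(&desc->lock);

/*
 * With IRQs off on this (parked) CPU, a bit that became set in IRR
 * after the vector change would never be serviced locally.  Reading
 * IRR only here, as late as possible, makes the window for missing
 * such a bit as small as it can be made.
 */
if ( check_irr && apic_irr_read(old_vector) )
    send_IPI_mask(cpumask_of(dst), new_vector);

The shape above reflects the constraint driving the whole thread:
nothing can drain the parked CPU's IRR locally, so the only available
lever is to shrink the window between the vector change and the IRR
sample.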
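As for footnote [1], the trade-off in isolation: the NULL guard in
fixup_irqs() already exists, but because a no-op stub is installed the
hook is non-NULL and the indirect call happens anyway.  A sketch of the
two states - the .enable member, struct hw_interrupt_type, and
irq_enable_none() are Xen's names, the rest is illustrative:

/*
 * Today: for MSI without mask-bit support, ->enable points at the
 * no-op irq_enable_none(), i.e. is non-NULL, so the existing guard
 * still results in an indirect call that does nothing:
 */
if ( desc->handler->enable )
    desc->handler->enable(desc);    /* lands in irq_enable_none() */

/*
 * Direction footnote [1] hints at: don't install a stub at all
 * (.enable = NULL in the respective struct hw_interrupt_type
 * instance), so the very same guard now skips the call - an ordinary
 * conditional branch instead of an indirect call.
 */

With retpoline-style mitigations in place, an indirect call is
considerably more expensive than a predicted conditional branch, which
is presumably what makes the NULL-hook variant attractive nowadays.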