Xen project Mailing List

Re: [PATCH v2 6/7] x86/irq: handle moving interrupts in _assign_irq_vector()

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Thu, 13 Jun 2024 14:55:31 +0200

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Thu, 13 Jun 2024 12:55:39 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Jun 13, 2024 at 01:36:55PM +0200, Jan Beulich wrote: > On 13.06.2024 13:31, Roger Pau Monné wrote: > > On Thu, Jun 13, 2024 at 10:38:35AM +0200, Jan Beulich wrote: > >> On 12.06.2024 17:36, Roger Pau Monné wrote: > >>> On Wed, Jun 12, 2024 at 03:42:58PM +0200, Jan Beulich wrote: > >>>> On 12.06.2024 12:39, Roger Pau Monné wrote: > >>>>> On Tue, Jun 11, 2024 at 03:18:32PM +0200, Jan Beulich wrote: > >>>>>> On 10.06.2024 16:20, Roger Pau Monne wrote: > >>>>>>> Currently there's logic in fixup_irqs() that attempts to prevent > >>>>>>> _assign_irq_vector() from failing, as fixup_irqs() is required to > >>>>>>> evacuate all > >>>>>>> interrupts from the CPUs not present in the input mask. The current > >>>>>>> logic in > >>>>>>> fixup_irqs() is incomplete, as it doesn't deal with interrupts that > >>>>>>> have > >>>>>>> move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field. > >>>>>>> > >>>>>>> Instead of attempting to fixup the interrupt descriptor in > >>>>>>> fixup_irqs() so that > >>>>>>> _assign_irq_vector() cannot fail, introduce logic in > >>>>>>> _assign_irq_vector() > >>>>>>> to deal with interrupts that have either > >>>>>>> move_{in_progress,cleanup_count} set > >>>>>>> and no remaining online CPUs in ->arch.cpu_mask. > >>>>>>> > >>>>>>> If _assign_irq_vector() is requested to move an interrupt in the state > >>>>>>> described above, first attempt to see if ->arch.old_cpu_mask contains > >>>>>>> any valid > >>>>>>> CPUs that could be used as fallback, and if that's the case do move > >>>>>>> the > >>>>>>> interrupt back to the previous destination. Note this is easier > >>>>>>> because the > >>>>>>> vector hasn't been released yet, so there's no need to allocate and > >>>>>>> setup a new > >>>>>>> vector on the destination. > >>>>>>> > >>>>>>> Due to the logic in fixup_irqs() that clears offline CPUs from > >>>>>>> ->arch.old_cpu_mask (and releases the old vector if the mask becomes > >>>>>>> empty) it > >>>>>>> shouldn't be possible to get into _assign_irq_vector() with > >>>>>>> ->arch.move_{in_progress,cleanup_count} set but no online CPUs in > >>>>>>> ->arch.old_cpu_mask. > >>>>>>> > >>>>>>> However if ->arch.move_{in_progress,cleanup_count} is set and the > >>>>>>> interrupt has > >>>>>>> also changed affinity, it's possible the members of > >>>>>>> ->arch.old_cpu_mask are no > >>>>>>> longer part of the affinity set, > >>>>>> > >>>>>> I'm having trouble relating this (->arch.old_cpu_mask related) to ... > >>>>>> > >>>>>>> move the interrupt to a different CPU part of > >>>>>>> the provided mask > >>>>>> > >>>>>> ... this (->arch.cpu_mask related). > >>>>> > >>>>> No, the "provided mask" here is the "mask" parameter, not > >>>>> ->arch.cpu_mask. > >>>> > >>>> Oh, so this describes the case of "hitting" the comment at the very > >>>> bottom of > >>>> the first hunk then? (I probably was misreading this because I was > >>>> expecting > >>>> it to describe a code change, rather than the case where original > >>>> behavior > >>>> needs retaining. IOW - all fine here then.) > >>>> > >>>>>>> and keep the current ->arch.old_{cpu_mask,vector} for the > >>>>>>> pending interrupt movement to be completed. > >>>>>> > >>>>>> Right, that's to clean up state from before the initial move. What > >>>>>> isn't > >>>>>> clear to me is what's to happen with the state of the intermediate > >>>>>> placement. Description and code changes leave me with the impression > >>>>>> that > >>>>>> it's okay to simply abandon, without any cleanup, yet I can't quite > >>>>>> figure > >>>>>> why that would be an okay thing to do. > >>>>> > >>>>> There isn't much we can do with the intermediate placement, as the CPU > >>>>> is going offline. However we can drain any pending interrupts from > >>>>> IRR after the new destination has been set, since setting the > >>>>> destination is done from the CPU that's the current target of the > >>>>> interrupts. So we can ensure the draining is done strictly after the > >>>>> target has been switched, hence ensuring no further interrupts from > >>>>> this source will be delivered to the current CPU. > >>>> > >>>> Hmm, I'm afraid I still don't follow: I'm specifically in trouble with > >>>> the ... > >>>> > >>>>>>> --- a/xen/arch/x86/irq.c > >>>>>>> +++ b/xen/arch/x86/irq.c > >>>>>>> @@ -544,7 +544,53 @@ static int _assign_irq_vector(struct irq_desc > >>>>>>> *desc, const cpumask_t *mask) > >>>>>>> } > >>>>>>> > >>>>>>> if ( desc->arch.move_in_progress || > >>>>>>> desc->arch.move_cleanup_count ) > >>>>>>> - return -EAGAIN; > >>>>>>> + { > >>>>>>> + /* > >>>>>>> + * If the current destination is online refuse to shuffle. > >>>>>>> Retry after > >>>>>>> + * the in-progress movement has finished. > >>>>>>> + */ > >>>>>>> + if ( cpumask_intersects(desc->arch.cpu_mask, > >>>>>>> &cpu_online_map) ) > >>>>>>> + return -EAGAIN; > >>>>>>> + > >>>>>>> + /* > >>>>>>> + * Due to the logic in fixup_irqs() that clears offlined > >>>>>>> CPUs from > >>>>>>> + * ->arch.old_cpu_mask it shouldn't be possible to get here > >>>>>>> with > >>>>>>> + * ->arch.move_{in_progress,cleanup_count} set and no online > >>>>>>> CPUs in > >>>>>>> + * ->arch.old_cpu_mask. > >>>>>>> + */ > >>>>>>> + ASSERT(valid_irq_vector(desc->arch.old_vector)); > >>>>>>> + ASSERT(cpumask_intersects(desc->arch.old_cpu_mask, > >>>>>>> &cpu_online_map)); > >>>>>>> + > >>>>>>> + if ( cpumask_intersects(desc->arch.old_cpu_mask, mask) ) > >>>>>>> + { > >>>>>>> + /* > >>>>>>> + * Fallback to the old destination if moving is in > >>>>>>> progress and the > >>>>>>> + * current destination is to be offlined. This is only > >>>>>>> possible if > >>>>>>> + * the CPUs in old_cpu_mask intersect with the affinity > >>>>>>> mask passed > >>>>>>> + * in the 'mask' parameter. > >>>>>>> + */ > >>>>>>> + desc->arch.vector = desc->arch.old_vector; > >>>>>>> + cpumask_and(desc->arch.cpu_mask, > >>>>>>> desc->arch.old_cpu_mask, mask); > >>>> > >>>> ... replacing of vector (and associated mask), without any further > >>>> accounting. > >>> > >>> It's quite likely I'm missing something here, but what further > >>> accounting you would like to do? > >>> > >>> The current target of the interrupt (->arch.cpu_mask previous to > >>> cpumask_and()) is all going offline, so any attempt to set it in > >>> ->arch.old_cpu_mask would just result in a stale (offline) CPU getting > >>> set in ->arch.old_cpu_mask, which previous patches attempted to > >>> solve. > >>> > >>> Maybe by "further accounting" you meant something else not related to > >>> ->arch.old_{cpu_mask,vector}? > >> > >> Indeed. What I'm thinking of is what normally release_old_vec() would > >> do (of which only desc->arch.used_vectors updating would appear to be > >> relevant, seeing the CPU's going offline). The other one I was thinking > >> of, updating vector_irq[], likely is also unnecessary, again because > >> that's per-CPU data of a CPU going down. > > > > I think updating vector_irq[] should be explicitly avoided, as doing > > so would prevent us from correctly draining any pending interrupts > > because the vector -> irq mapping would be broken when the interrupt > > enable window at the bottom of fixup_irqs() is reached. > > > > For used_vectors: we might clean it, I'm a bit worried however that at > > some point we insert a check in do_IRQ() path that ensures the > > vector_irq[] is inline with desc->arch.used_vectors, which would fail > > for interrupts drained at the bottom of fixup_irqs(). Let me attempt > > to clean the currently used vector from ->arch.used_vectors. > > Just to clarify: It may well be that for draining the bit can't be cleared > right here. But it then still needs clearing _somewhere_, or else we > chance ending up with inconsistent state (triggering e.g. an assertion > later on) or the leaking of vectors. My problem here was that I also > couldn't locate any such "somewhere", and commentary also didn't point me > anywhere. You are correct, there's no such place where the cleanup would happen. I'm afraid the only option I see to correctly deal with this is to do the cleanup of the old destination in _assign_irq_vector(), and then do the pending interrupt draining from IRR like I had proposed in patch 7/7, thus removing the interrupt enable window at the bottom of fixup_irqs(). Let me know if that seems sensible. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.