
Re: [PATCH for-4.19] xen/x86: limit interrupt movement done by fixup_irqs()



On Thu, May 16, 2024 at 05:00:54PM +0200, Jan Beulich wrote:
> On 16.05.2024 15:22, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -2527,7 +2527,7 @@ static int __init cf_check setup_dump_irqs(void)
> >  }
> >  __initcall(setup_dump_irqs);
> >  
> > -/* Reset irq affinities to match the given CPU mask. */
> > +/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
> >  void fixup_irqs(const cpumask_t *mask, bool verbose)
> >  {
> 
> Evacuating is one purpose. Updating affinity, if need be, is another. I've
> been wondering more than once though whether it is actually correct /
> necessary for ->affinity to be updated by the function. As it stands you
> don't remove the respective code, though.

Yeah, I didn't want to get into updating ->affinity in this patch, so I
decided to leave that as-is.

Note however that if we shuffle the interrupt around, shouldn't we also
update ->affinity, so that the new destination is part of it?

Otherwise we could end up with the interrupt assigned to CPU(s) that
are not part of the ->affinity mask.  Maybe that's OK; TBH I'm not
sure I understand the purpose of the ->affinity mask, which is why I've
decided to leave it alone in this patch.

> 
> > @@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
> >                  release_old_vec(desc);
> >          }
> >  
> > -        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
> > +        /*
> > +         * Avoid shuffling the interrupt around if it's assigned to a CPU set
> > +         * that's all covered by the requested affinity mask.
> > +         */
> > +        cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
> > +        if ( !desc->action || cpumask_subset(affinity, mask) )
> >          {
> >              spin_unlock(&desc->lock);
> >              continue;
> 
> First my understanding of how the two CPU sets are used: ->affinity is
> merely a representation of where the IRQ is permitted to be handled.
> ->arch.cpu_mask describes all CPUs where the assigned vector is valid
> (and would thus need cleaning up when a new vector is assigned). Neither
> of the two needs to be a strict subset of the other.

Oh, so it's allowed to have the interrupt target a CPU
(->arch.cpu_mask) that's not set in the affinity mask?

> 
> (It's not really clear whether ->arch.cpu_mask is [supposed to be] a
> subset of cpu_online_map.)

Not according to the description in arch_irq_desc:

        /*
         * Except for high priority interrupts @cpu_mask may have bits set for
         * offline CPUs.  Consumers need to be careful to mask this down to
         * online ones as necessary.  There is supposed to always be a non-
         * empty intersection with cpu_online_map.
         */

So, according to the comment, ->arch.cpu_mask is not necessarily a
subset of cpu_online_map.

Note this is only possible when using logical destination mode, so
removing that would turn the destination field into an unsigned int
that would point to a single CPU that must be present in
cpu_online_map.

> If that understanding of mine is correct, going from just ->arch.cpu_mask
> doesn't look quite right to me, as that may include CPUs not in ->affinity.
> As in: Things could be further limited, by also ANDing in ->affinity.

Hm, my expectation would be that ->arch.cpu_mask is a subset of
->affinity, but even if it's not, what we care about in fixup_irqs() is
which CPUs the interrupt targets, as we need to move the interrupt if
the target set is not covered by the input mask.  I don't think
->affinity should be taken into account for that decision; it should be
based exclusively on which CPU(s) the interrupt targets
(->arch.cpu_mask).

> At the same time your(?) and my variant suffer from cpumask_subset()
> possibly returning true despite the CPU the IRQ is presently being
> handled on being the one that we're in the process of bringing down.

No, I'm not sure I see how that can happen.  The CPU we are bringing
down will always be clear from the input CPU mask, and hence
cpumask_subset(->arch.cpu_mask, mask) will only return true if all the
set CPUs in ->arch.cpu_mask are also set in mask.  IOW: none of the
possible target destinations is a CPU to be removed.

> In
> that case we absolutely cannot skip the move. (I'd like to note that
> there are only two possible input values of "mask" for the function. The
> case I think you've been looking into is when it's &cpu_online_map.

Well, it's cpu_online_map which already has the CPU to be offlined
cleared.  See the call to cpumask_clear_cpu() ahead of calling
fixup_irqs().

> In
> which case cpumask_subset() is going to always return true with your
> change in place, if I'm not mistaken. That seems to make your change
> questionable. Yet with that I guess I'm overlooking something.)

I might be wrong, but I think you are missing that the to-be-offlined
CPU has been removed from cpu_online_map by the time it gets passed
to fixup_irqs().

> Plus there remains the question of whether updating ->affinity can indeed
> be skipped in more cases than it is right now (prior to you patch), or
> whether up front we may want to get rid of that updating (in line, I think,
> with earlier changes we did elsewhere) altogether.

I'm not sure about ->affinity TBH, that's why I've opted to leave it
alone for the time being.  It doesn't seem like a very helpful field
right now.

I would expect it to be mostly static, and populated with the set of
CPUs that are closer (NUMA-wise) to the device the interrupt belongs
to, but I'm not seeing any of this.  It seems to be set arbitrarily to
the CPU that binds the interrupt.

Thanks, Roger.
