Re: [PATCH for-4.19] xen/x86: limit interrupt movement done by fixup_irqs()

 On Thu, May 16, 2024 at 05:00:54PM +0200, Jan Beulich wrote:
> On 16.05.2024 15:22, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -2527,7 +2527,7 @@ static int __init cf_check setup_dump_irqs(void)
> >  }
> >  __initcall(setup_dump_irqs);
> >  
> > -/* Reset irq affinities to match the given CPU mask. */
> > +/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
> >  void fixup_irqs(const cpumask_t *mask, bool verbose)
> >  {
> 
> Evacuating is one purpose. Updating affinity, if need be, is another. I've
> been wondering more than once though whether it is actually correct /
> necessary for ->affinity to be updated by the function. As it stands you
> don't remove the respective code, though.
Yeah, I didn't want to get into updating ->affinity in this patch, so
I decided to leave that as-is.

Note however that if we shuffle the interrupt around, shouldn't we also
update ->affinity, so that the new destination is part of ->affinity?
Otherwise we could end up with the interrupt assigned to CPU(s) that
are not part of the ->affinity mask.  Maybe that's OK; TBH I'm not
sure I understand the purpose of the ->affinity mask, which is why I've
decided to leave it alone in this patch.
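As a quick sketch of what I mean (illustrative only, not something this
patch does; the fields are the real ones, the placement is hypothetical):

    /*
     * Hypothetical, for illustration only: after the interrupt has been
     * moved, make sure ->affinity still covers the new destination.
     */
    if ( !cpumask_intersects(desc->affinity, desc->arch.cpu_mask) )
        cpumask_copy(desc->affinity, desc->arch.cpu_mask);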
> 
> > @@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
> >                  release_old_vec(desc);
> >          }
> >  
> > -        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
> > +        /*
> > +         * Avoid shuffling the interrupt around if it's assigned to a CPU set
> > +         * that's all covered by the requested affinity mask.
> > +         */
> > +        cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
> > +        if ( !desc->action || cpumask_subset(affinity, mask) )
> >          {
> >              spin_unlock(&desc->lock);
> >              continue;
> 
> First my understanding of how the two CPU sets are used: ->affinity is
> merely a representation of where the IRQ is permitted to be handled.
> ->arch.cpu_mask describes all CPUs where the assigned vector is valid
> (and would thus need cleaning up when a new vector is assigned). Neither
> of the two needs to be a strict subset of the other.
Oh, so it's allowed to have the interrupt target a CPU
(->arch.cpu_mask) that's not set in the affinity mask?
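Just to make sure I follow, a minimal sketch of that relationship as I
understand it (illustrative only; 'scratch' stands for any scratch
cpumask, e.g. this_cpu(scratch_cpumask)):

    /*
     * ->affinity:      where the IRQ is permitted to be handled.
     * ->arch.cpu_mask: where the currently assigned vector is valid.
     * Neither needs to be a subset of the other, so code wanting
     * "permitted AND currently targeted" has to intersect them.
     */
    cpumask_and(scratch, desc->affinity, desc->arch.cpu_mask);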
> 
> (It's not really clear whether ->arch.cpu_mask is [supposed to be] a
> subset of cpu_online_map.)
Not according to the description in arch_irq_desc:
        /*
         * Except for high priority interrupts @cpu_mask may have bits set for
         * offline CPUs.  Consumers need to be careful to mask this down to
         * online ones as necessary.  There is supposed to always be a non-
         * empty intersection with cpu_online_map.
         */
So, according to the comment, ->arch.cpu_mask need not be a strict
subset of cpu_online_map.

Note this is only possible when using logical destination mode; removing
that would turn the destination field into an unsigned int pointing to a
single CPU, which would have to be present in cpu_online_map.
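Which I guess is why the hunk above masks it down first; the idiom the
comment asks consumers to follow would be (the same cpumask_and() as in
the hunk, repeated here just for clarity):

    /*
     * ->arch.cpu_mask may have bits set for offline CPUs, so mask it
     * down to online ones before reasoning about where the interrupt
     * can currently be delivered.
     */
    cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);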
> If that understanding of mine is correct, going from just ->arch.cpu_mask
> doesn't look quite right to me, as that may include CPUs not in ->affinity.
> As in: Things could be further limited, by also ANDing in ->affinity.
Hm, my expectation would be that ->arch.cpu_mask is a subset of
->affinity, but even if it's not, what we care about in fixup_irqs() is
which CPUs the interrupt targets, as we need to move the interrupt if
the target set is not covered by the input mask.  I don't think
->affinity should be taken into account for that decision; it should be
made exclusively based on which CPU(s) the interrupt targets
(->arch.cpu_mask).
> At the same time your(?) and my variant suffer from cpumask_subset()
> possibly returning true despite the CPU the IRQ is presently being
> handled on being the one that we're in the process of bringing down.
Hm, I'm not sure I see how that can happen.  The CPU we are bringing
down will always be cleared from the input CPU mask, and hence
cpumask_subset(->arch.cpu_mask, mask) will only return true if all the
CPUs set in ->arch.cpu_mask are also set in mask.  IOW: none of the
possible target destinations is a CPU to be removed.
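To make it concrete with made-up CPU numbers (CPU 3 being the one going
offline, so bit 3 is already clear in mask):

    /*
     * Made-up example values, for illustration only:
     *   ->arch.cpu_mask = {1,2}: cpumask_subset({1,2}, mask) is true,
     *       no target is going away, the interrupt can be left alone.
     *   ->arch.cpu_mask = {2,3}: cpumask_subset({2,3}, mask) is false,
     *       CPU 3 is a target and is going offline, so the interrupt
     *       must be moved.
     */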
> In
> that case we absolutely cannot skip the move. (I'd like to note that
> there are only two possible input values of "mask" for the function. The
> case I think you've been looking into is when it's &cpu_online_map.
Well, it's cpu_online_map which already has the CPU to be offlined
cleared.  See the call to cpumask_clear_cpu() ahead of calling
fixup_irqs().
> In
> which case cpumask_subset() is going to always return true with your
> change in place, if I'm not mistaken. That seems to make your change
> questionable. Yet with that I guess I'm overlooking something.)
I might be wrong, but I think you are missing that the to-be-offlined
CPU has already been removed from cpu_online_map by the time it gets
passed to fixup_irqs().
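Roughly the ordering on the CPU-offline path, as a simplified sketch
from memory (the exact code lives around __cpu_disable()):

    /*
     * Simplified sketch, not verbatim: the CPU going down is removed
     * from cpu_online_map first, so the mask that fixup_irqs() receives
     * already has that CPU cleared.
     */
    cpumask_clear_cpu(cpu, &cpu_online_map);
    ...
    fixup_irqs(&cpu_online_map, true);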
> Plus there remains the question of whether updating ->affinity can indeed
> be skipped in more cases than it is right now (prior to you patch), or
> whether up front we may want to get rid of that updating (in line, I think,
> with earlier changes we did elsewhere) altogether.
I'm not sure about ->affinity, TBH; that's why I've opted to leave it
alone for the time being.  It doesn't seem like a very helpful field
right now.

I would expect it to be mostly static, and populated with the set of
CPUs that are closer (NUMA-wise) to the device the interrupt belongs
to, but I'm not seeing any of this.  It seems to be set arbitrarily to
the CPU that binds the interrupt.
Thanks, Roger.