
Re: [PATCH for-4.19] xen/x86: limit interrupt movement done by fixup_irqs()



On Thu, May 16, 2024 at 05:00:54PM +0200, Jan Beulich wrote:
> On 16.05.2024 15:22, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/irq.c
> > +++ b/xen/arch/x86/irq.c
> > @@ -2527,7 +2527,7 @@ static int __init cf_check setup_dump_irqs(void)
> >  }
> >  __initcall(setup_dump_irqs);
> >  
> > -/* Reset irq affinities to match the given CPU mask. */
> > +/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
> >  void fixup_irqs(const cpumask_t *mask, bool verbose)
> >  {
> 
> Evacuating is one purpose. Updating affinity, if need be, is another. I've
> been wondering more than once though whether it is actually correct /
> necessary for ->affinity to be updated by the function. As it stands you
> don't remove the respective code, though.

Yeah, I didn't want to get into updating ->affinity in this patch, so I
decided to leave that as-is.

Note however that if we shuffle the interrupt around, shouldn't we also
update ->affinity, so that the new destination is part of it?

Otherwise we could end up with the interrupt assigned to CPU(s) that
are not part of the ->affinity mask.  Maybe that's OK; TBH I'm not
sure I understand the purpose of the ->affinity mask, which is why I've
decided to leave it alone in this patch.

> 
> > @@ -2576,7 +2576,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
> >                  release_old_vec(desc);
> >          }
> >  
> > -        if ( !desc->action || cpumask_subset(desc->affinity, mask) )
> > +        /*
> > +         * Avoid shuffling the interrupt around if it's assigned to a CPU set
> > +         * that's all covered by the requested affinity mask.
> > +         */
> > +        cpumask_and(affinity, desc->arch.cpu_mask, &cpu_online_map);
> > +        if ( !desc->action || cpumask_subset(affinity, mask) )
> >          {
> >              spin_unlock(&desc->lock);
> >              continue;
> 
> First my understanding of how the two CPU sets are used: ->affinity is
> merely a representation of where the IRQ is permitted to be handled.
> ->arch.cpu_mask describes all CPUs where the assigned vector is valid
> (and would thus need cleaning up when a new vector is assigned). Neither
> of the two needs to be a strict subset of the other.

Oh, so it's allowed to have the interrupt target a CPU
(->arch.cpu_mask) that's not set in the affinity mask?

> 
> (It's not really clear whether ->arch.cpu_mask is [supposed to be] a
> subset of cpu_online_map.)

Not according to the description in arch_irq_desc:

        /*
         * Except for high priority interrupts @cpu_mask may have bits set for
         * offline CPUs.  Consumers need to be careful to mask this down to
         * online ones as necessary.  There is supposed to always be a non-
         * empty intersection with cpu_online_map.
         */

So, according to the comment, ->arch.cpu_mask is not necessarily a
subset of cpu_online_map.

Note this is only possible when using logical destination mode, so
removing that would turn the destination field into an unsigned int
that would point to a single CPU that must be present in
cpu_online_map.

> If that understanding of mine is correct, going from just ->arch.cpu_mask
> doesn't look quite right to me, as that may include CPUs not in ->affinity.
> As in: Things could be further limited, by also ANDing in ->affinity.

Hm, my expectation would be that ->arch.cpu_mask is a subset of
->affinity, but even if it's not, what we care about in fixup_irqs() is
which CPUs the interrupt targets, as we need to move the interrupt if
the target set is not covered by the input mask.  I don't think
->affinity should be taken into account for that decision; it should be
based exclusively on which CPU(s) the interrupt targets
(->arch.cpu_mask).

> At the same time your(?) and my variant suffer from cpumask_subset()
> possibly returning true despite the CPU the IRQ is presently being
> handled on being the one that we're in the process of bringing down.

No, I'm not sure I see how that can happen.  The CPU we are bringing
down will always be clear from the input CPU mask, and hence
cpumask_subset(->arch.cpu_mask, mask) will only return true if all the
set CPUs in ->arch.cpu_mask are also set in mask.  IOW: none of the
possible target destinations is a CPU to be removed.

> In
> that case we absolutely cannot skip the move. (I'd like to note that
> there are only two possible input values of "mask" for the function. The
> case I think you've been looking into is when it's &cpu_online_map.

Well, it's cpu_online_map which already has the CPU to be offlined
cleared.  See the call to cpumask_clear_cpu() ahead of calling
fixup_irqs().

> In
> which case cpumask_subset() is going to always return true with your
> change in place, if I'm not mistaken. That seems to make your change
> questionable. Yet with that I guess I'm overlooking something.)

I might be wrong, but I think you are missing that the to-be-offlined
CPU has been removed from cpu_online_map by the time it gets passed
to fixup_irqs().

> Plus there remains the question of whether updating ->affinity can indeed
> be skipped in more cases than it is right now (prior to you patch), or
> whether up front we may want to get rid of that updating (in line, I think,
> with earlier changes we did elsewhere) altogether.

I'm not sure about ->affinity TBH, that's why I've opted to leave it
alone for the time being.  It doesn't seem like a very helpful field
right now.

I would expect it to be mostly static, and populated with the set of
CPUs that are closer (NUMA-wise) to the device the interrupt belongs
to, but I'm not seeing any of this.  It seems to be set arbitrarily to
the CPU that binds the interrupt.

Thanks, Roger.
