
Re: [PATCH for-4.21 03/10] x86/HPET: use single, global, low-priority vector for broadcast IRQ


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 16 Oct 2025 18:27:09 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • Delivery-date: Thu, 16 Oct 2025 16:27:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Oct 16, 2025 at 09:32:04AM +0200, Jan Beulich wrote:
> Using dynamically allocated / maintained vectors has several downsides:
> - possible nesting of IRQs due to the effects of IRQ migration,
> - reduction of vectors available for devices,
> - IRQs not moving as intended if there's shortage of vectors,
> - higher runtime overhead.
> 
> As the vector also doesn't need to be of any particular priority (first and
> foremost it really shouldn't be of higher or the same priority as the timer
> IRQ, as that raises TIMER_SOFTIRQ anyway), avoid "ordinary" vectors altogether
> and use a vector from the 0x10...0x1f exception vector space. Exception vs
> interrupt can easily be distinguished by checking for the presence of an
> error code.
> 
> Fixes: 996576b965cc ("xen: allow up to 16383 cpus")
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> This is an alternative proposal to
> https://lists.xen.org/archives/html/xen-devel/2014-03/msg00399.html.
> 
> The Fixes: tag indicates where the problem got significantly worse; in
> principle it was there already before (crashing at perhaps 6 or 7 levels
> of nested IRQs).
> 
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -9,17 +9,19 @@
>  #include <xen/timer.h>
>  #include <xen/smp.h>
>  #include <xen/softirq.h>
> +#include <xen/cpuidle.h>
>  #include <xen/irq.h>
>  #include <xen/numa.h>
>  #include <xen/param.h>
>  #include <xen/sched.h>
>  
>  #include <asm/apic.h>
> -#include <asm/fixmap.h>
>  #include <asm/div64.h>
> +#include <asm/fixmap.h>
> +#include <asm/genapic.h>
>  #include <asm/hpet.h>
> +#include <asm/irq-vectors.h>
>  #include <asm/msi.h>
> -#include <xen/cpuidle.h>
>  
>  #define MAX_DELTA_NS MILLISECS(10*1000)
>  #define MIN_DELTA_NS MICROSECS(20)
> @@ -307,15 +309,13 @@ static void cf_check hpet_msi_set_affini
>      struct hpet_event_channel *ch = desc->action->dev_id;
>      struct msi_msg msg = ch->msi.msg;
>  
> -    msg.dest32 = set_desc_affinity(desc, mask);
> -    if ( msg.dest32 == BAD_APICID )
> -        return;
> +    /* This really is only for dump_irqs(). */
> +    cpumask_copy(desc->arch.cpu_mask, mask);

If you no longer call set_desc_affinity(), could you adjust the second
parameter of hpet_msi_set_affinity() to be unsigned int cpu instead of
a cpumask?

And here just clear desc->arch.cpu_mask and set only the passed CPU in
it (rough sketch below).

>  
> -    msg.data &= ~MSI_DATA_VECTOR_MASK;
> -    msg.data |= MSI_DATA_VECTOR(desc->arch.vector);
> +    msg.dest32 = cpu_mask_to_apicid(mask);

And here you can just use cpu_physical_id().
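
Something along these lines, maybe (a rough, untested sketch; it
assumes the .set_affinity hook prototype can be changed accordingly, or
the function detached from the generic hook):

static void cf_check hpet_msi_set_affinity(struct irq_desc *desc,
                                           unsigned int cpu)
{
    struct hpet_event_channel *ch = desc->action->dev_id;
    struct msi_msg msg = ch->msi.msg;

    /* This really is only for dump_irqs(). */
    cpumask_clear(desc->arch.cpu_mask);
    cpumask_set_cpu(cpu, desc->arch.cpu_mask);

    /* Physical destination mode: APIC ID of the single target CPU. */
    msg.dest32 = cpu_physical_id(cpu);
    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
    if ( msg.dest32 != ch->msi.msg.dest32 )
        hpet_msi_write(ch, &msg);
}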

>      msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
>      msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
> -    if ( msg.data != ch->msi.msg.data || msg.dest32 != ch->msi.msg.dest32 )
> +    if ( msg.dest32 != ch->msi.msg.dest32 )
>          hpet_msi_write(ch, &msg);

A further note here, which ties to my comment on the previous patch
about losing the interrupt during the masked window.  If the vector is
the same across all CPUs, we no longer need to update the MSI data
field, just the address one, which can be done atomically.  We also get
signaling from the IOMMU side about whether the MSI fields need
(re)writing at all.

That way we could avoid the masking, and the possible loss of
interrupts.
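
I.e. the update could then shrink to a single dword write, along the
lines of (untested; the interrupt remapping case would need similar
treatment):

    /*
     * Vector, and hence the data dword, is fixed; only the destination
     * in the address dword changes, and that's a single 32-bit MMIO
     * write.
     */
    if ( msg.address_lo != ch->msi.msg.address_lo )
    {
        hpet_write32(msg.address_lo, HPET_Tn_ROUTE(ch->idx) + 4);
        ch->msi.msg.address_lo = msg.address_lo;
    }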

>  }
>  
> @@ -328,7 +328,7 @@ static hw_irq_controller hpet_msi_type =
>      .shutdown   = hpet_msi_shutdown,
>      .enable      = hpet_msi_unmask,
>      .disable    = hpet_msi_mask,
> -    .ack        = ack_nonmaskable_msi_irq,
> +    .ack        = irq_actor_none,
>      .end        = end_nonmaskable_irq,
>      .set_affinity   = hpet_msi_set_affinity,
>  };
> @@ -347,6 +347,12 @@ static int __init hpet_setup_msi_irq(str
>      u32 cfg = hpet_read32(HPET_Tn_CFG(ch->idx));
>      irq_desc_t *desc = irq_to_desc(ch->msi.irq);
>  
> +    clear_irq_vector(ch->msi.irq);
> +    ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, &cpu_online_map);
> +    if ( ret )
> +        return ret;
> +    cpumask_setall(desc->affinity);
> +
>      if ( iommu_intremap != iommu_intremap_off )
>      {
>          ch->msi.hpet_id = hpet_blockid;
> @@ -457,7 +463,7 @@ static struct hpet_event_channel *hpet_g
>      /*
>      * Try the least recently used channel first.  It may still have its IRQ's
>      * affinity set to the desired CPU.  This way we also limit having multiple
> -     * of our IRQs raised on the same CPU, in possibly a nested manner.
> +     * of our IRQs raised on the same CPU.
>       */
>      ch = per_cpu(lru_channel, cpu);
>      if ( ch && !test_and_set_bit(HPET_EVT_USED_BIT, &ch->flags) )
> @@ -497,6 +503,7 @@ static void set_channel_irq_affinity(str
>      spin_lock(&desc->lock);
>      hpet_msi_mask(desc);
>      hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
> +    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;

I would set the vector table ahead of setting the affinity, in case we
can drop the mask calls around this block of code.

I also wonder whether you really need the bind_irq_vector() call at
all, if the affinity is set manually afterwards and the vector table
plus desc->arch.cpu_mask are also set here?
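
I.e. maybe simply (assuming the mask/unmask pair indeed turns out to be
unnecessary):

    spin_lock(&desc->lock);
    /* Make the vector -> IRQ mapping visible before retargeting the MSI. */
    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
    hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
    spin_unlock(&desc->lock);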

>      hpet_msi_unmask(desc);
>      spin_unlock(&desc->lock);
>  
> --- a/xen/arch/x86/include/asm/irq-vectors.h
> +++ b/xen/arch/x86/include/asm/irq-vectors.h
> @@ -18,6 +18,15 @@
>  /* IRQ0 (timer) is statically allocated but must be high priority. */
>  #define IRQ0_VECTOR             0xf0
>  
> +/*
> + * Low-priority (for now statically allocated) vectors, sharing entry
> + * points with exceptions in the 0x10 ... 0x1f range, as long as the
> + * respective exception has an error code.
> + */
> +#define FIRST_LOPRIORITY_VECTOR 0x10
> +#define HPET_BROADCAST_VECTOR   X86_EXC_AC
> +#define LAST_LOPRIORITY_VECTOR  0x1f

I wonder whether it wouldn't be clearer to simply reserve a dedicated
vector when the HPET broadcast is used, instead of hijacking the #AC
one.  That's one vector fewer for devices, but arguably, now that we
unconditionally use physical destination mode, our pool of vectors has
expanded considerably.

> +
>  /* Legacy PIC uses vectors 0x20-0x2f. */
>  #define FIRST_LEGACY_VECTOR     FIRST_DYNAMIC_VECTOR
>  #define LAST_LEGACY_VECTOR      (FIRST_LEGACY_VECTOR + 0xf)
> @@ -40,7 +49,7 @@
>  /* There's no IRQ2 at the PIC. */
>  #define IRQ_MOVE_CLEANUP_VECTOR (FIRST_LEGACY_VECTOR + 2)
>  
> -#define FIRST_IRQ_VECTOR        FIRST_DYNAMIC_VECTOR
> +#define FIRST_IRQ_VECTOR        FIRST_LOPRIORITY_VECTOR
>  #define LAST_IRQ_VECTOR         LAST_HIPRIORITY_VECTOR
>  
>  #endif /* _ASM_IRQ_VECTORS_H */
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -755,8 +755,9 @@ void setup_vector_irq(unsigned int cpu)
>          if ( !irq_desc_initialized(desc) )
>              continue;
>          vector = irq_to_vector(irq);
> -        if ( vector >= FIRST_HIPRIORITY_VECTOR &&
> -             vector <= LAST_HIPRIORITY_VECTOR )
> +        if ( vector <= (vector >= FIRST_HIPRIORITY_VECTOR
> +                        ? LAST_HIPRIORITY_VECTOR
> +                        : LAST_LOPRIORITY_VECTOR) )
>              cpumask_set_cpu(cpu, desc->arch.cpu_mask);

I think this is wrong.  The low-priority vector used by the HPET will
only ever target a single CPU at a time, and hence adding extra CPUs to
that mask as part of AP bringup is not correct.
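
IOW I think the original check wants to stay as it was, so that only
the truly global high-priority vectors get the extra CPU added:

        if ( vector >= FIRST_HIPRIORITY_VECTOR &&
             vector <= LAST_HIPRIORITY_VECTOR )
            cpumask_set_cpu(cpu, desc->arch.cpu_mask);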

Thanks, Roger.
