
Re: [PATCH for-4.21 03/10] x86/HPET: use single, global, low-priority vector for broadcast IRQ


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Thu, 16 Oct 2025 18:27:09 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • Delivery-date: Thu, 16 Oct 2025 16:27:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Thu, Oct 16, 2025 at 09:32:04AM +0200, Jan Beulich wrote:
> Using dynamically allocated / maintained vectors has several downsides:
> - possible nesting of IRQs due to the effects of IRQ migration,
> - reduction of vectors available for devices,
> - IRQs not moving as intended if there's shortage of vectors,
> - higher runtime overhead.
> 
> As the vector also doesn't need to be of any particular priority (first and
> foremost it really shouldn't be of higher or the same priority as the timer
> IRQ, as that raises TIMER_SOFTIRQ anyway), avoid "ordinary" vectors altogether
> and use a vector from the 0x10...0x1f exception vector space. Exception vs
> interrupt can easily be distinguished by checking for the presence of an
> error code.
> 
> Fixes: 996576b965cc ("xen: allow up to 16383 cpus")
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> This is an alternative proposal to
> https://lists.xen.org/archives/html/xen-devel/2014-03/msg00399.html.
> 
> The Fixes: tag indicates where the problem got significantly worse; in
> principle it was there already before (crashing at perhaps 6 or 7 levels
> of nested IRQs).
> 
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -9,17 +9,19 @@
>  #include <xen/timer.h>
>  #include <xen/smp.h>
>  #include <xen/softirq.h>
> +#include <xen/cpuidle.h>
>  #include <xen/irq.h>
>  #include <xen/numa.h>
>  #include <xen/param.h>
>  #include <xen/sched.h>
>  
>  #include <asm/apic.h>
> -#include <asm/fixmap.h>
>  #include <asm/div64.h>
> +#include <asm/fixmap.h>
> +#include <asm/genapic.h>
>  #include <asm/hpet.h>
> +#include <asm/irq-vectors.h>
>  #include <asm/msi.h>
> -#include <xen/cpuidle.h>
>  
>  #define MAX_DELTA_NS MILLISECS(10*1000)
>  #define MIN_DELTA_NS MICROSECS(20)
> @@ -307,15 +309,13 @@ static void cf_check hpet_msi_set_affini
>      struct hpet_event_channel *ch = desc->action->dev_id;
>      struct msi_msg msg = ch->msi.msg;
>  
> -    msg.dest32 = set_desc_affinity(desc, mask);
> -    if ( msg.dest32 == BAD_APICID )
> -        return;
> +    /* This really is only for dump_irqs(). */
> +    cpumask_copy(desc->arch.cpu_mask, mask);

If you no longer call set_desc_affinity(), could you adjust the second
parameter of hpet_msi_set_affinity() to be unsigned int cpu instead of
a cpumask?

And here just clear desc->arch.cpu_mask and set only the passed CPU in
it (rough sketch below).

>  
> -    msg.data &= ~MSI_DATA_VECTOR_MASK;
> -    msg.data |= MSI_DATA_VECTOR(desc->arch.vector);
> +    msg.dest32 = cpu_mask_to_apicid(mask);

And here you can just use cpu_physical_id().
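
Something along these lines, maybe (a rough, untested sketch; it
assumes the .set_affinity hook prototype can be changed accordingly, or
the function detached from the generic hook):

static void cf_check hpet_msi_set_affinity(struct irq_desc *desc,
                                           unsigned int cpu)
{
    struct hpet_event_channel *ch = desc->action->dev_id;
    struct msi_msg msg = ch->msi.msg;

    /* This really is only for dump_irqs(). */
    cpumask_clear(desc->arch.cpu_mask);
    cpumask_set_cpu(cpu, desc->arch.cpu_mask);

    /* Physical destination mode: APIC ID of the single target CPU. */
    msg.dest32 = cpu_physical_id(cpu);
    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
    if ( msg.dest32 != ch->msi.msg.dest32 )
        hpet_msi_write(ch, &msg);
}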

>      msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
>      msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
> -    if ( msg.data != ch->msi.msg.data || msg.dest32 != ch->msi.msg.dest32 )
> +    if ( msg.dest32 != ch->msi.msg.dest32 )
>          hpet_msi_write(ch, &msg);

A further note here, which ties to my comment on the previous patch
about losing the interrupt during the masked window.  If the vector is
the same across all CPUs, we no longer need to update the MSI data
field, just the address one, which can be done atomically.  We also get
signaling from the IOMMU side about whether the MSI fields need
(re)writing at all.

That way we could avoid the masking, and the possible loss of
interrupts.
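
I.e. the update could then shrink to a single dword write, along the
lines of (untested; the interrupt remapping case would need similar
treatment):

    /*
     * Vector, and hence the data dword, is fixed; only the destination
     * in the address dword changes, and that's a single 32-bit MMIO
     * write.
     */
    if ( msg.address_lo != ch->msi.msg.address_lo )
    {
        hpet_write32(msg.address_lo, HPET_Tn_ROUTE(ch->idx) + 4);
        ch->msi.msg.address_lo = msg.address_lo;
    }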

>  }
>  
> @@ -328,7 +328,7 @@ static hw_irq_controller hpet_msi_type =
>      .shutdown   = hpet_msi_shutdown,
>      .enable      = hpet_msi_unmask,
>      .disable    = hpet_msi_mask,
> -    .ack        = ack_nonmaskable_msi_irq,
> +    .ack        = irq_actor_none,
>      .end        = end_nonmaskable_irq,
>      .set_affinity   = hpet_msi_set_affinity,
>  };
> @@ -347,6 +347,12 @@ static int __init hpet_setup_msi_irq(str
>      u32 cfg = hpet_read32(HPET_Tn_CFG(ch->idx));
>      irq_desc_t *desc = irq_to_desc(ch->msi.irq);
>  
> +    clear_irq_vector(ch->msi.irq);
> +    ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, &cpu_online_map);
> +    if ( ret )
> +        return ret;
> +    cpumask_setall(desc->affinity);
> +
>      if ( iommu_intremap != iommu_intremap_off )
>      {
>          ch->msi.hpet_id = hpet_blockid;
> @@ -457,7 +463,7 @@ static struct hpet_event_channel *hpet_g
>      /*
>      * Try the least recently used channel first.  It may still have its IRQ's
>      * affinity set to the desired CPU.  This way we also limit having multiple
> -     * of our IRQs raised on the same CPU, in possibly a nested manner.
> +     * of our IRQs raised on the same CPU.
>       */
>      ch = per_cpu(lru_channel, cpu);
>      if ( ch && !test_and_set_bit(HPET_EVT_USED_BIT, &ch->flags) )
> @@ -497,6 +503,7 @@ static void set_channel_irq_affinity(str
>      spin_lock(&desc->lock);
>      hpet_msi_mask(desc);
>      hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
> +    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;

I would set the vector table ahead of setting the affinity, in case we
can drop the mask calls around this block of code.

I also wonder whether you really need the bind_irq_vector() call at
all, if the affinity is set manually afterwards and the vector table
plus desc->arch.cpu_mask are also set here?
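
I.e. maybe simply (assuming the mask/unmask pair indeed turns out to be
unnecessary):

    spin_lock(&desc->lock);
    /* Make the vector -> IRQ mapping visible before retargeting the MSI. */
    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
    hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
    spin_unlock(&desc->lock);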

>      hpet_msi_unmask(desc);
>      spin_unlock(&desc->lock);
>  
> --- a/xen/arch/x86/include/asm/irq-vectors.h
> +++ b/xen/arch/x86/include/asm/irq-vectors.h
> @@ -18,6 +18,15 @@
>  /* IRQ0 (timer) is statically allocated but must be high priority. */
>  #define IRQ0_VECTOR             0xf0
>  
> +/*
> + * Low-priority (for now statically allocated) vectors, sharing entry
> + * points with exceptions in the 0x10 ... 0x1f range, as long as the
> + * respective exception has an error code.
> + */
> +#define FIRST_LOPRIORITY_VECTOR 0x10
> +#define HPET_BROADCAST_VECTOR   X86_EXC_AC
> +#define LAST_LOPRIORITY_VECTOR  0x1f

I wonder whether it wouldn't be clearer to simply reserve a dedicated
vector when the HPET broadcast is used, instead of hijacking the #AC
one.  That's one vector fewer for devices, but arguably, now that we
unconditionally use physical destination mode, our pool of vectors has
expanded considerably.

> +
>  /* Legacy PIC uses vectors 0x20-0x2f. */
>  #define FIRST_LEGACY_VECTOR     FIRST_DYNAMIC_VECTOR
>  #define LAST_LEGACY_VECTOR      (FIRST_LEGACY_VECTOR + 0xf)
> @@ -40,7 +49,7 @@
>  /* There's no IRQ2 at the PIC. */
>  #define IRQ_MOVE_CLEANUP_VECTOR (FIRST_LEGACY_VECTOR + 2)
>  
> -#define FIRST_IRQ_VECTOR        FIRST_DYNAMIC_VECTOR
> +#define FIRST_IRQ_VECTOR        FIRST_LOPRIORITY_VECTOR
>  #define LAST_IRQ_VECTOR         LAST_HIPRIORITY_VECTOR
>  
>  #endif /* _ASM_IRQ_VECTORS_H */
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -755,8 +755,9 @@ void setup_vector_irq(unsigned int cpu)
>          if ( !irq_desc_initialized(desc) )
>              continue;
>          vector = irq_to_vector(irq);
> -        if ( vector >= FIRST_HIPRIORITY_VECTOR &&
> -             vector <= LAST_HIPRIORITY_VECTOR )
> +        if ( vector <= (vector >= FIRST_HIPRIORITY_VECTOR
> +                        ? LAST_HIPRIORITY_VECTOR
> +                        : LAST_LOPRIORITY_VECTOR) )
>              cpumask_set_cpu(cpu, desc->arch.cpu_mask);

I think this is wrong.  The low-priority vector used by the HPET will
only ever target a single CPU at a time, and hence adding extra CPUs to
that mask as part of AP bringup is not correct.
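
IOW I think the original check wants to stay as it was, so that only
the truly global high-priority vectors get the extra CPU added:

        if ( vector >= FIRST_HIPRIORITY_VECTOR &&
             vector <= LAST_HIPRIORITY_VECTOR )
            cpumask_set_cpu(cpu, desc->arch.cpu_mask);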

Thanks, Roger.
