
Re: [PATCH for-4.21 03/10] x86/HPET: use single, global, low-priority vector for broadcast IRQ


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Fri, 17 Oct 2025 09:15:08 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • Delivery-date: Fri, 17 Oct 2025 07:15:20 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 16.10.2025 18:27, Roger Pau Monné wrote:
> On Thu, Oct 16, 2025 at 09:32:04AM +0200, Jan Beulich wrote:
>> @@ -307,15 +309,13 @@ static void cf_check hpet_msi_set_affini
>>      struct hpet_event_channel *ch = desc->action->dev_id;
>>      struct msi_msg msg = ch->msi.msg;
>>  
>> -    msg.dest32 = set_desc_affinity(desc, mask);
>> -    if ( msg.dest32 == BAD_APICID )
>> -        return;
>> +    /* This really is only for dump_irqs(). */
>> +    cpumask_copy(desc->arch.cpu_mask, mask);
> 
> If you no longer call set_desc_affinity(), could you adjust the second
> parameter of hpet_msi_set_affinity() to be unsigned int cpu instead of
> a cpumask?

Looks like I could, yes. But then we need to split the function, as it's
also used as the .set_affinity hook.

> And here just clear desc->arch.cpu_mask and set the passed CPU.

Which would still best be a cpumask_copy(), just with cpumask_of(cpu)
as input.

>> -    msg.data &= ~MSI_DATA_VECTOR_MASK;
>> -    msg.data |= MSI_DATA_VECTOR(desc->arch.vector);
>> +    msg.dest32 = cpu_mask_to_apicid(mask);
> 
> And here you can just use cpu_physical_id().

Right. All of which (up to here; but see below) is perhaps better done as
a separate, follow-on cleanup change.
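
Such a split might then look along these lines (an untested sketch only;
the helper's name is made up, while the body is just the remaining logic
with your two suggestions applied):

static void hpet_msi_set_cpu(struct irq_desc *desc, unsigned int cpu)
{
    struct hpet_event_channel *ch = desc->action->dev_id;
    struct msi_msg msg = ch->msi.msg;

    /* This really is only for dump_irqs(). */
    cpumask_copy(desc->arch.cpu_mask, cpumask_of(cpu));

    msg.dest32 = cpu_physical_id(cpu);
    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);

    if ( msg.dest32 != ch->msi.msg.dest32 )
        hpet_msi_write(ch, &msg);
}

/* Retained merely to serve as the .set_affinity hook. */
static void cf_check hpet_msi_set_affinity(struct irq_desc *desc,
                                           const cpumask_t *mask)
{
    hpet_msi_set_cpu(desc, cpumask_any(mask));
}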

>>      msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
>>      msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
>> -    if ( msg.data != ch->msi.msg.data || msg.dest32 != ch->msi.msg.dest32 )
>> +    if ( msg.dest32 != ch->msi.msg.dest32 )
>>          hpet_msi_write(ch, &msg);
> 
> A further note here, which ties to my comment on the previous patch
> about losing the interrupt during the masked window.  If the vector
> is the same across all CPUs, we no longer need to update the MSI data
> field, just the address one, which can be done atomically.  We also
> have signaling from the IOMMU whether the MSI fields need writing.

Hmm, yes, we can leverage that, as long as we're willing to make assumptions
here about what exactly iommu_update_ire_from_msi() does: We'd then rely on
not just the original (untranslated) msg->data not changing, but also the
translated one. That looks to hold for both Intel and AMD, but it's still
something we should be sure we want to make the code depend upon. (I'm
intending to at least add an assertion to that effect.)

> We can avoid the masking, and the possible drop of interrupts.

Hmm, right. There's nothing wrong with the caller relying on the write
being atomic now. (Really, continuing to use hpet_msi_write() wouldn't
be a problem, as re-writing the low half of HPET_Tn_ROUTE() with the
same value is going to be benign. Unless of course that write was the
source of the extra IRQs I'm seeing.)
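
For reference, the FSB route register pairs the MSI data (low 32 bits)
with the address (high 32 bits). Hence an address-only update is a single
32-bit MMIO write, e.g. using the same accessor hpet_msi_write() uses:

/* Untested: update only the address half of HPET_Tn_ROUTE(). */
hpet_write32(msg.address_lo, HPET_Tn_ROUTE(ch->idx) + 4);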

Taken together with what you said further up, having
set_channel_irq_affinity() no longer use hpet_msi_set_affinity() as-is
to ...

>> @@ -328,7 +328,7 @@ static hw_irq_controller hpet_msi_type =
>>      .shutdown   = hpet_msi_shutdown,
>>      .enable     = hpet_msi_unmask,
>>      .disable    = hpet_msi_mask,
>> -    .ack        = ack_nonmaskable_msi_irq,
>> +    .ack        = irq_actor_none,
>>      .end        = end_nonmaskable_irq,
>>      .set_affinity   = hpet_msi_set_affinity,

... satisfy the use here would then probably be desirable right away.
The little bit that's left of hpet_msi_set_affinity() would then be
open-coded in set_channel_irq_affinity().
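
Roughly like this, that is (untested, and leaving the IRE side of things
out of the picture):

static void set_channel_irq_affinity(struct hpet_event_channel *ch)
{
    struct irq_desc *desc = irq_to_desc(ch->msi.irq);
    struct msi_msg msg = ch->msi.msg;

    spin_lock(&desc->lock);

    /* This really is only for dump_irqs(). */
    cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));

    msg.dest32 = cpu_physical_id(ch->cpu);
    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
    /* Vector (data) unchanged -> address-only write, no masking needed. */
    if ( msg.dest32 != ch->msi.msg.dest32 )
        hpet_msi_write(ch, &msg);

    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;

    spin_unlock(&desc->lock);
}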

Getting rid of the masking would (hopefully) also get rid of the stray
IRQs that I'm observing, assuming my guess as to their origin is correct.

>> @@ -497,6 +503,7 @@ static void set_channel_irq_affinity(str
>>      spin_lock(&desc->lock);
>>      hpet_msi_mask(desc);
>>      hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
>> +    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
> 
> I would set the vector table ahead of setting the affinity, in case we
> can drop the mask calls around this block of code.

Isn't there a problematic window either way round? I can make the change,
but I don't see it addressing anything: The new comparator value will be
written later anyway, and interrupts up to that point aren't of any
interest, i.e. it doesn't matter which of the CPUs gets to handle them.

> I also wonder, do you really need the bind_irq_vector() if you
> manually set the affinity afterwards, and the vector table plus
> desc->arch.cpu_mask are also set here?

At the very least I'd then also need to open-code the setting of
desc->arch.vector and desc->arch.used. Possibly also the setting of the
bit in desc->arch.used_vectors. And strictly speaking also the
trace_irq_mask() invocation.
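
I.e. something along the lines of (untested, modeled on what
_bind_irq_vector() does, and ignoring that trace_irq_mask() is static to
irq.c right now):

desc->arch.vector = HPET_BROADCAST_VECTOR;
desc->arch.used = IRQ_USED;
if ( desc->arch.used_vectors )
    set_bit(desc->arch.vector, desc->arch.used_vectors);
per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
trace_irq_mask(TRC_HW_IRQ_BIND_VECTOR, ch->msi.irq, HPET_BROADCAST_VECTOR,
               cpumask_of(ch->cpu));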

>> --- a/xen/arch/x86/include/asm/irq-vectors.h
>> +++ b/xen/arch/x86/include/asm/irq-vectors.h
>> @@ -18,6 +18,15 @@
>>  /* IRQ0 (timer) is statically allocated but must be high priority. */
>>  #define IRQ0_VECTOR             0xf0
>>  
>> +/*
>> + * Low-priority (for now statically allocated) vectors, sharing entry
>> + * points with exceptions in the 0x10 ... 0x1f range, as long as the
>> + * respective exception has an error code.
>> + */
>> +#define FIRST_LOPRIORITY_VECTOR 0x10
>> +#define HPET_BROADCAST_VECTOR   X86_EXC_AC
>> +#define LAST_LOPRIORITY_VECTOR  0x1f
> 
> I wonder if it won't be clearer to simply reserve a vector if the HPET
> is used, instead of hijacking the AC one.  It's one vector less, but
> arguably now that we unconditionally use physical destination mode our
> pool of vectors has expanded considerably.

Well, I'd really like to avoid consuming an otherwise usable vector, if
at all possible (as per Andrew's FRED plans, that won't be possible there
anymore).

>> --- a/xen/arch/x86/irq.c
>> +++ b/xen/arch/x86/irq.c
>> @@ -755,8 +755,9 @@ void setup_vector_irq(unsigned int cpu)
>>          if ( !irq_desc_initialized(desc) )
>>              continue;
>>          vector = irq_to_vector(irq);
>> -        if ( vector >= FIRST_HIPRIORITY_VECTOR &&
>> -             vector <= LAST_HIPRIORITY_VECTOR )
>> +        if ( vector <= (vector >= FIRST_HIPRIORITY_VECTOR
>> +                        ? LAST_HIPRIORITY_VECTOR
>> +                        : LAST_LOPRIORITY_VECTOR) )
>>              cpumask_set_cpu(cpu, desc->arch.cpu_mask);
> 
> I think this is wrong.  The low priority vector used by the HPET will
> only target a single CPU at a time, and hence adding extra CPUs to
> that mask as part of AP bringup is not correct.

I'm not sure about "wrong". It's not strictly needed for the HPET one, I
expect, but it's what would generally be necessary. For the HPET one,
hpet_msi_set_affinity() replaces the value anyway. (I can add a sentence
to this effect to the description, if that helps.)
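
FWIW, given that no IRQ can have a vector below FIRST_LOPRIORITY_VECTOR,
the new conditional is merely a compact form of

    if ( (vector >= FIRST_HIPRIORITY_VECTOR &&
          vector <= LAST_HIPRIORITY_VECTOR) ||
         (vector >= FIRST_LOPRIORITY_VECTOR &&
          vector <= LAST_LOPRIORITY_VECTOR) )
        cpumask_set_cpu(cpu, desc->arch.cpu_mask);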

Jan