[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1


  • To: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 14 Jan 2026 16:56:02 +0100
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Alistair Francis <alistair.francis@xxxxxxx>, Connor Davis <connojdavis@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 14 Jan 2026 15:56:07 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.01.2026 16:39, Oleksii Kurochko wrote:
> On 1/13/26 2:54 PM, Jan Beulich wrote:
>> On 13.01.2026 13:51, Oleksii Kurochko wrote:
>>> On 1/7/26 5:28 PM, Jan Beulich wrote:
>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>> --- a/xen/arch/riscv/include/asm/domain.h
>>>>> +++ b/xen/arch/riscv/include/asm/domain.h
>>>>> @@ -85,6 +85,22 @@ struct arch_vcpu
>>>>>        register_t vstval;
>>>>>        register_t vsatp;
>>>>>        register_t vsepc;
>>>>> +
>>>>> +    /*
>>>>> +     * VCPU interrupts
>>>>> +     *
>>>>> +     * We have a lockless approach for tracking pending VCPU interrupts
>>>>> +     * implemented using atomic bitops. The irqs_pending bitmap represent
>>>>> +     * pending interrupts whereas irqs_pending_mask represent bits 
>>>>> changed
>>>>> +     * in irqs_pending.
>>>> And hence a set immediately followed by an unset is then indistinguishable
>>>> from just an unset (or the other way around).
>>> I think it is distinguishable with the combination of irqs_pending_mask.
>> No. The set mask bit tells you that there was a change. But irqs_pending[]
>> records only the most recent set / clear.
>>
>>>>    This may not be a problem, but
>>>> if it isn't, I think this needs explaining. Much like it is unclear why the
>>>> "changed" state needs tracking in the first place.
>>> It is needed to track which bits are changed, irqs_pending only represents
>>> the current state of pending interrupts.CPU might want to react to changes
>>> rather than the absolute state.
>>>
>>> Example:
>>>    - If CPU 0 sets an interrupt, CPU 1 needs to notice “something changed”
>>>      to inject it into the VCPU.
>>>    - If CPU 0 sets and then clears the bit before CPU 1 reads it,
>>>      irqs_pending alone shows 0, the transition is lost.
>> The fact there was any number of transitions is recorded in _mask[], yes,
>> but "the transition" was still lost if we consider the "set" in your
>> example in isolation. And it's not quite clear to me what's interesting
>> about a 0 -> 0 transition. (On x86, such a lost 0 -> 1 transition, i.e.
>> one followed directly by a 1 -> 0 one, would result in a "spurious
>> interrupt": There would be an indication that there was a lost interrupt
>> without there being a way to know which one it was.)
> 
> IIUC, in this reply you are talking about when the contents written to the
> irq_pending and irqs_pending_mask bitmaps are flushed to the hardware
> registers.
> 
> Originally, I understood your question to be about the case where
> vcpu_set_interrupt() is called and then vcpu_unset_interrupt() is called.

I was actually asking in more abstract terms. And I was assuming there
would be pretty direct ways for the guest to have vcpu_{,un}set_interrupt()
invoked. Looks like ...

> I am trying to understand whether such a scenario is possible.
> 
> Let’s take the vtimer as an example. vcpu_set_interrupt(t->v, IRQ_VS_TIMER)
> is not called again until vcpu_unset_interrupt(t->v, IRQ_VS_TIMER) and
> set_timer() are called in vtimer_set_timer().
> 
> The opposite situation is not possible: it cannot happen that
> vcpu_set_interrupt(t->v, IRQ_VS_TIMER) is called and then immediately
> vcpu_unset_interrupt(t->v, IRQ_VS_TIMER) is called, because
> vcpu_unset_interrupt() and set_timer() are only invoked when the guest
> has handled the timer interrupt and requested a new one.
> 
> So if no interrupt flush is happening, the vcpu_set_interrupt() →
> vcpu_unset_interrupt() sequence will only update the irq_pending and
> irqs_pending_mask bitmaps, without touching the hardware registers,
> so no spurious interrupt will occur. And if an interrupt flush does
> happen, it is not possible to have a 1 -> 0 transition due to the call
> sequence I mentioned in the last two paragraphs above.

... that wasn't a correct assumption. (Partly attributed to the patch
series leaving out a number of relevant things, which makes it hard to
guess what else is left out.)

>>> By maintaining irqs_pending_mask, you can detect “this bit changed
>>> recently,” even if the final state is 0.
>>>
>>> Also, having irqs_pending_mask allows to flush interrupts without lock:
>>> if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
>>> {
>>> mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
>>> val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;
>>>
>>> *hvip &= ~mask;
>>> *hvip |= val;
>>> }
>>> Without it I assume that we should have spinlcok around access to 
>>> irqs_pending.
>> Ah yes, this would indeed be a benefit. Just that it's not quite clear to
>> me:
>>
>>      *hvip |= xchg(&v->arch.irqs_pending[0], 0UL);
>>
>> wouldn't require a lock either
> 
> Because vCPU's hvip (which is stored on the stack) can't be changed 
> concurrently
> and it's almost the one place in the code where vCPU->hvip is changed. Another
> place it is save_csrs() during context switch but it can't be called in 
> parallel
> with the vcpu_sync_interrupts() (look below).
> 
>> . What may be confusing me is that you put
>> things as if it was normal to see 1 -> 0 transitions from (virtual)
>> hardware, when I (with my x86 background) would expect 1 -> 0 transitions
>> to only occur due to software actions (End Of Interrupt), unless - see
>> above - something malfunctioned and an interrupt was lost. That (the 1 ->
>> 0 transitions) could be (guest) writes to SVIP, for example.
>>
>> Talking of which - do you really mean HVIP in the code you provided, not
>> VSVIP? So far I my understanding was that HVIP would be recording the
>> interrupts the hypervisor itself has pending (and needs to service).
> 
> HVIP is correct to use here, HVIP is used to indicate virtual interrupts
> intended for VS-mode. And I think you confused HVIP with the HIP register
> which supplements the standard supervisor-level SIP register to indicate
> pending virtual supervisor (VS-level) interrupts and hypervisor-specific
> interrupts.
> 
> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
> to SVIP, for example." then the correspondent HVIP (and HIP as usually
> they are aliasis of HVIP) bits will be updated. And that is why we need
> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
> +void vcpu_sync_interrupts(struct vcpu *v)
> +{
> +    unsigned long hvip;
> +
> +    /* Read current HVIP and VSIE CSRs */
> +    v->arch.vsie = csr_read(CSR_VSIE);
> +
> +    /* Sync-up HVIP.VSSIP bit changes does by Guest */
> +    hvip = csr_read(CSR_HVIP);
> +    if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
> +    {
> +        if ( hvip & BIT(IRQ_VS_SOFT, UL) )
> +        {
> +            if ( !test_and_set_bit(IRQ_VS_SOFT,
> +                                   &v->arch.irqs_pending_mask) )
> +                set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
> +        }
> +        else
> +        {
> +            if ( !test_and_set_bit(IRQ_VS_SOFT,
> +                                   &v->arch.irqs_pending_mask) )
> +                clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
> +        }
> +    }

I fear I don't understand this at all. Why would the guest having set a
pending bit not result in the IRQ to be marked pending? You can't know
whether that guest write happened before or after you last touched
.irqs_pending{,mask}[]? Yet that pair of bit arrays is supposed to be
tracking the most recent update (according to how I understood earlier
explanations of yours).

As an aside - the !test_and_set_bit() can be pulled out, to the outermost
if().

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.