|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
On 14.01.2026 16:39, Oleksii Kurochko wrote:
> On 1/13/26 2:54 PM, Jan Beulich wrote:
>> On 13.01.2026 13:51, Oleksii Kurochko wrote:
>>> On 1/7/26 5:28 PM, Jan Beulich wrote:
>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>> --- a/xen/arch/riscv/include/asm/domain.h
>>>>> +++ b/xen/arch/riscv/include/asm/domain.h
>>>>> @@ -85,6 +85,22 @@ struct arch_vcpu
>>>>> register_t vstval;
>>>>> register_t vsatp;
>>>>> register_t vsepc;
>>>>> +
>>>>> + /*
>>>>> + * VCPU interrupts
>>>>> + *
>>>>> + * We have a lockless approach for tracking pending VCPU interrupts
>>>>> + * implemented using atomic bitops. The irqs_pending bitmap represent
>>>>> + * pending interrupts whereas irqs_pending_mask represent bits
>>>>> changed
>>>>> + * in irqs_pending.
>>>> And hence a set immediately followed by an unset is then indistinguishable
>>>> from just an unset (or the other way around).
>>> I think it is distinguishable with the combination of irqs_pending_mask.
>> No. The set mask bit tells you that there was a change. But irqs_pending[]
>> records only the most recent set / clear.
>>
>>>> This may not be a problem, but
>>>> if it isn't, I think this needs explaining. Much like it is unclear why the
>>>> "changed" state needs tracking in the first place.
>>> It is needed to track which bits are changed, irqs_pending only represents
>>> the current state of pending interrupts.CPU might want to react to changes
>>> rather than the absolute state.
>>>
>>> Example:
>>> - If CPU 0 sets an interrupt, CPU 1 needs to notice “something changed”
>>> to inject it into the VCPU.
>>> - If CPU 0 sets and then clears the bit before CPU 1 reads it,
>>> irqs_pending alone shows 0, the transition is lost.
>> The fact there was any number of transitions is recorded in _mask[], yes,
>> but "the transition" was still lost if we consider the "set" in your
>> example in isolation. And it's not quite clear to me what's interesting
>> about a 0 -> 0 transition. (On x86, such a lost 0 -> 1 transition, i.e.
>> one followed directly by a 1 -> 0 one, would result in a "spurious
>> interrupt": There would be an indication that there was a lost interrupt
>> without there being a way to know which one it was.)
>
> IIUC, in this reply you are talking about when the contents written to the
> irq_pending and irqs_pending_mask bitmaps are flushed to the hardware
> registers.
>
> Originally, I understood your question to be about the case where
> vcpu_set_interrupt() is called and then vcpu_unset_interrupt() is called.
I was actually asking in more abstract terms. And I was assuming there
would be pretty direct ways for the guest to have vcpu_{,un}set_interrupt()
invoked. Looks like ...
> I am trying to understand whether such a scenario is possible.
>
> Let’s take the vtimer as an example. vcpu_set_interrupt(t->v, IRQ_VS_TIMER)
> is not called again until vcpu_unset_interrupt(t->v, IRQ_VS_TIMER) and
> set_timer() are called in vtimer_set_timer().
>
> The opposite situation is not possible: it cannot happen that
> vcpu_set_interrupt(t->v, IRQ_VS_TIMER) is called and then immediately
> vcpu_unset_interrupt(t->v, IRQ_VS_TIMER) is called, because
> vcpu_unset_interrupt() and set_timer() are only invoked when the guest
> has handled the timer interrupt and requested a new one.
>
> So if no interrupt flush is happening, the vcpu_set_interrupt() →
> vcpu_unset_interrupt() sequence will only update the irq_pending and
> irqs_pending_mask bitmaps, without touching the hardware registers,
> so no spurious interrupt will occur. And if an interrupt flush does
> happen, it is not possible to have a 1 -> 0 transition due to the call
> sequence I mentioned in the last two paragraphs above.
... that wasn't a correct assumption. (Partly attributed to the patch
series leaving out a number of relevant things, which makes it hard to
guess what else is left out.)
>>> By maintaining irqs_pending_mask, you can detect “this bit changed
>>> recently,” even if the final state is 0.
>>>
>>> Also, having irqs_pending_mask allows to flush interrupts without lock:
>>> if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
>>> {
>>> mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
>>> val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;
>>>
>>> *hvip &= ~mask;
>>> *hvip |= val;
>>> }
>>> Without it I assume that we should have spinlcok around access to
>>> irqs_pending.
>> Ah yes, this would indeed be a benefit. Just that it's not quite clear to
>> me:
>>
>> *hvip |= xchg(&v->arch.irqs_pending[0], 0UL);
>>
>> wouldn't require a lock either
>
> Because vCPU's hvip (which is stored on the stack) can't be changed
> concurrently
> and it's almost the one place in the code where vCPU->hvip is changed. Another
> place it is save_csrs() during context switch but it can't be called in
> parallel
> with the vcpu_sync_interrupts() (look below).
>
>> . What may be confusing me is that you put
>> things as if it was normal to see 1 -> 0 transitions from (virtual)
>> hardware, when I (with my x86 background) would expect 1 -> 0 transitions
>> to only occur due to software actions (End Of Interrupt), unless - see
>> above - something malfunctioned and an interrupt was lost. That (the 1 ->
>> 0 transitions) could be (guest) writes to SVIP, for example.
>>
>> Talking of which - do you really mean HVIP in the code you provided, not
>> VSVIP? So far I my understanding was that HVIP would be recording the
>> interrupts the hypervisor itself has pending (and needs to service).
>
> HVIP is correct to use here, HVIP is used to indicate virtual interrupts
> intended for VS-mode. And I think you confused HVIP with the HIP register
> which supplements the standard supervisor-level SIP register to indicate
> pending virtual supervisor (VS-level) interrupts and hypervisor-specific
> interrupts.
>
> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
> to SVIP, for example." then the correspondent HVIP (and HIP as usually
> they are aliasis of HVIP) bits will be updated. And that is why we need
> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
> +void vcpu_sync_interrupts(struct vcpu *v)
> +{
> + unsigned long hvip;
> +
> + /* Read current HVIP and VSIE CSRs */
> + v->arch.vsie = csr_read(CSR_VSIE);
> +
> + /* Sync-up HVIP.VSSIP bit changes does by Guest */
> + hvip = csr_read(CSR_HVIP);
> + if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
> + {
> + if ( hvip & BIT(IRQ_VS_SOFT, UL) )
> + {
> + if ( !test_and_set_bit(IRQ_VS_SOFT,
> + &v->arch.irqs_pending_mask) )
> + set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
> + }
> + else
> + {
> + if ( !test_and_set_bit(IRQ_VS_SOFT,
> + &v->arch.irqs_pending_mask) )
> + clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
> + }
> + }
I fear I don't understand this at all. Why would the guest having set a
pending bit not result in the IRQ to be marked pending? You can't know
whether that guest write happened before or after you last touched
.irqs_pending{,mask}[]? Yet that pair of bit arrays is supposed to be
tracking the most recent update (according to how I understood earlier
explanations of yours).
As an aside - the !test_and_set_bit() can be pulled out, to the outermost
if().
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |