|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v5 03/14] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
On 20.02.2026 17:18, Oleksii Kurochko wrote:
> Based on Linux kernel v6.16.0.
> Note that smp_wmb() is used instead of smp_mb__before_atomic() as what
> we want to guarantee that if a bit in irqs_pending_mask is obversable
> that the correspondent bit in irqs_pending is observable too.
>
> Add lockless tracking of pending vCPU interrupts using atomic bitops.
> Two bitmaps are introduced:
> - irqs_pending — interrupts currently pending for the vCPU
> - irqs_pending_mask — bits that have changed in irqs_pending
>
> The design follows a multi-producer, single-consumer model, where the
> consumer is the vCPU itself. Producers may set bits in
> irqs_pending_mask without a lock. Clearing bits in irqs_pending_mask is
> performed only by the consumer via xchg(). The consumer must not write
> to irqs_pending and must not act on bits that are not set in the mask.
> Otherwise, extra synchronization should be provided.
>
> On RISC-V interrupts are not injected via guest registers, so pending
> interrupts must be recorded in irqs_pending (using the new
> vcpu_{un}set_interrupt() helpers) and flushed to the guest by updating
> HVIP before returning control to the guest. The consumer side is
> implemented in a follow-up patch.
>
> A barrier between updating irqs_pending and setting the corresponding
> mask bit in vcpu_set_interrupt()/vcpu_unset_interrupt() guarantees
> that if the consumer observes a mask bit set, the corresponding pending
> bit is also visible. This prevents missed interrupts during the flush.
>
> It is possible that a guest could have pending bit in the hardware
> register without being marked pending in irq_pending bitmap as:
> According to the RISC-V ISA specification:
> Bits hip.VSSIP and hie.VSSIE are the interrupt-pending and
> interrupt-enable bits for VS-level software interrupts. VSSIP in hip
> is an alias (writable) of the same bit in hvip.
> Additionally:
> When bit 2 of hideleg is zero, vsip.SSIP and vsie.SSIE are read-only
> zeros. Else, vsip.SSIP and vsie.SSIE are aliases of hip.VSSIP and
> hie.VSSIE.
> This means the guest may modify vsip.SSIP, which implicitly updates
> hip.VSSIP and the bit being written with 1 would also trigger an interrupt
> as according to the RISC-V spec:
> These conditions for an interrupt trap to occur must be evaluated in a
> bounded amount of time from when an interrupt becomes, or ceases to be,
> pending in sip, and must also be evaluated immediately following the
> execution of an SRET instruction or an explicit write to a CSR on which
> these interrupt trap conditions expressly depend (including sip, sie and
> sstatus).
> What means that IRQ_VS_SOFT must be synchronized separately, what is done
> in vcpu_sync_interrupts(). Note, also, that IRQ_PMU_OVF would want to be
> synced for the similar reason as IRQ_VS_SOFT, but isn't sync-ed now as
> PMU isn't supported now.
>
> For the remaining VS-level interrupt types (IRQ_VS_TIMER and
> IRQ_VS_EXT), the specification states they cannot be modified by the guest
> and are read-only because of:
> Bits hip.VSEIP and hie.VSEIE are the interrupt-pending and interrupt-enable
> bits for VS-level external interrupts. VSEIP is read-only in hip, and is
> the logical-OR of these interrupt sources:
> • bit VSEIP of hvip;
> • the bit of hgeip selected by hstatus.VGEIN; and
> • any other platform-specific external interrupt signal directed to
> VS-level.
> Bits hip.VSTIP and hie.VSTIE are the interrupt-pending and interrupt-enable
> bits for VS-level timer interrupts. VSTIP is read-only in hip, and is the
> logical-OR of hvip.VSTIP and any other platform-specific timer interrupt
> signal directed to VS-level.
> and
> When bit 10 of hideleg is zero, vsip.SEIP and vsie.SEIE are read-only zeros.
> Else, vsip.SEIP and vsie.SEIE are aliases of hip.VSEIP and hie.VSEIE.
>
> When bit 6 of hideleg is zero, vsip.STIP and vsie.STIE are read-only zeros.
> Else, vsip.STIP and vsie.STIE are aliases of hip.VSTIP and hie.VSTIE.
> and also,
> Bits sip.SEIP and sie.SEIE are the interrupt-pending and interrupt-enable
> bits for supervisor-level external interrupts. If implemented, SEIP is
> read-only in sip, and is set and cleared by the execution environment,
> typically through a platform-specific interrupt controller.
>
> Bits sip.STIP and sie.STIE are the interrupt-pending and interrupt-enable
> bits for supervisor-level timer interrupts. If implemented, STIP is
> read-only in sip, and is set and cleared by the execution environment
> Thus, for these interrupt types, it is sufficient to use vcpu_set_interrupt()
> and vcpu_unset_interrupt(), and flush them during the call of
> vcpu_flush_interrupts() (which is introduced in follow up patch).
>
> vcpu_sync_interrupts(), which is called just before entering the VM,
> slightly bends the rule that the irqs_pending bit must be written
> first, followed by updating the corresponding bit in irqs_pending_mask.
> However, it still respects the core guarantee that the producer never
> clears the mask and only writes to irqs_pending if it is the one that
> flipped the corresponding mask bit from 0 to 1.
> Moreover, since the consumer won't run concurrently because
> vcpu_sync_interrupts() and the consumer path are going to be invoked
> sequentially immediately before VM entry, it is safe to slightly relax
> this ordering rule in vcpu_sync_interrupts().
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
> ---
> Changes in v5:
> - Update the commit message().
> - Rename c to curr.
> - Update vcpu_set_interrupt() to use test_and_set_bit() for irqs_pending
> bitmask too.
> - Move #ifdef CONFIG_RISCV_32 above the comment in vcpu_sync_interrupts().
> ---
> Changes in v4:
> - Update the commit message.
> - Update the comments in vcpu_(un)set_interrupt() and add the the comment
> above smp_wmb() barrier.
> - call vcpu_kick() only if the pending_mask bit going from 0 to 1.
> - Code style fixes.
> - Update defintion of RISCV_VCPU_NR_IRQS to cover potential RV128 case and
> the case if AIA isn't used.
> - latch current into a local variable in check_for_pcpu_work().
> ---
> Changes in v3:
> - Use smp_wb() instead of smp_mb__before_atomic().
> - Add explanation of the change above in the commit message.
> - Move vcpu_sync_interrupts() here to producers side.
> - Introduce check_for_pcpu_work() to be clear from where
> vcpu_sync_interrupts()
> is called.
> ---
> Changes in V2:
> - Move the patch before an introduction of vtimer.
> - Drop bitmap_zero() of irqs_pending and irqs_pending_mask bitmaps as
> vcpu structure starts out all zeros.
> - Drop const for irq argument of vcpu_{un}set_interrupt().
> - Drop check "irq < IRQ_LOCAL_MAX" in vcpu_{un}set_interrupt() as it
> could lead to overrun of irqs_pending and irqs_pending_mask bitmaps.
> - Drop IRQ_LOCAL_MAX as there is no usage for it now.
> ---
> xen/arch/riscv/domain.c | 75 +++++++++++++++++++++++++++++
> xen/arch/riscv/include/asm/domain.h | 23 +++++++++
> xen/arch/riscv/traps.c | 4 ++
> 3 files changed, 102 insertions(+)
>
> diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
> index c9494db0fbe7..335c41f3e5a1 100644
> --- a/xen/arch/riscv/domain.c
> +++ b/xen/arch/riscv/domain.c
> @@ -6,6 +6,7 @@
> #include <xen/sched.h>
> #include <xen/vmap.h>
>
> +#include <asm/bitops.h>
> #include <asm/cpufeature.h>
> #include <asm/csr.h>
> #include <asm/riscv_encoding.h>
> @@ -140,6 +141,80 @@ void arch_vcpu_destroy(struct vcpu *v)
> vfree((void *)&v->arch.cpu_info[1] - STACK_SIZE);
> }
>
> +int vcpu_set_interrupt(struct vcpu *v, unsigned int irq)
> +{
> + bool kick_vcpu;
> +
> + /* We only allow VS-mode software, timer, and external interrupts */
> + if ( irq != IRQ_VS_SOFT &&
> + irq != IRQ_VS_TIMER &&
> + irq != IRQ_VS_EXT )
> + return -EINVAL;
> +
> + kick_vcpu = !test_and_set_bit(irq, v->arch.irqs_pending);
> +
> + /*
> + * The counterpart of this barrier is the one encoded implicitly in
> xchg()
> + * which is used in consumer part (vcpu_flush_interrupts()).
> + */
> + smp_wmb();
> +
> + kick_vcpu |= !test_and_set_bit(irq, v->arch.irqs_pending_mask);
> +
> + if ( kick_vcpu )
> + vcpu_kick(v);
> +
> + return 0;
> +}
> +
> +int vcpu_unset_interrupt(struct vcpu *v, unsigned int irq)
> +{
> + /* We only allow VS-mode software, timer, external interrupts */
> + if ( irq != IRQ_VS_SOFT &&
> + irq != IRQ_VS_TIMER &&
> + irq != IRQ_VS_EXT )
> + return -EINVAL;
> +
> + clear_bit(irq, v->arch.irqs_pending);
> + /*
> + * The counterpart of this barrier is the one encoded implicitly in
> xchg()
> + * which is used in consumer part (vcpu_flush_interrupts()).
> + */
> + smp_wmb();
> + set_bit(irq, v->arch.irqs_pending_mask);
> +
> + return 0;
> +}
> +
> +void vcpu_sync_interrupts(struct vcpu *v)
> +{
> + unsigned long hvip;
> +
> + /* Read current HVIP and VSIE CSRs */
> + v->arch.vsie = csr_read(CSR_VSIE);
What is this needed for, btw? There's no consumer of the field, and the register
can change right after re-entering the guest.
> --- a/xen/arch/riscv/include/asm/domain.h
> +++ b/xen/arch/riscv/include/asm/domain.h
> @@ -54,8 +54,26 @@ struct arch_vcpu {
> register_t hideleg;
> register_t henvcfg;
> register_t hstateen0;
> + register_t hvip;
>
> register_t vsatp;
> + register_t vsie;
Somewhat dependent upon the question above, might hvip and vsie also better
sit close together?
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |