
[xen master] xen/riscv: introduce tracking of pending vCPU interrupts, part 1



commit c537959a5cb0178b0a34333892a8c0b47b69274f
Author:     Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
AuthorDate: Tue Mar 10 09:25:16 2026 +0100
Commit:     Jan Beulich <jbeulich@xxxxxxxx>
CommitDate: Tue Mar 10 09:25:16 2026 +0100

    xen/riscv: introduce tracking of pending vCPU interrupts, part 1
    
    Based on Linux kernel v6.16.0.
    Note that smp_wmb() is used instead of smp_mb__before_atomic(), as what
    we want to guarantee is that if a bit in irqs_pending_mask is observable,
    then the corresponding bit in irqs_pending is observable too.
    
    Add lockless tracking of pending vCPU interrupts using atomic bitops.
    Two bitmaps are introduced:
     - irqs_pending: interrupts currently pending for the vCPU
     - irqs_pending_mask: bits that have changed in irqs_pending
    
    The design follows a multi-producer, single-consumer model, where the
    consumer is the vCPU itself. Producers may set bits in
    irqs_pending_mask without a lock. Clearing bits in irqs_pending_mask is
    performed only by the consumer via xchg(). The consumer must not write
    to irqs_pending and must not act on bits that are not set in the mask.
    Otherwise, extra synchronization should be provided.
    
    On RISC-V interrupts are not injected via guest registers, so pending
    interrupts must be recorded in irqs_pending (using the new
    vcpu_{un}set_interrupt() helpers) and flushed to the guest by updating
    HVIP before returning control to the guest. The consumer side is
    implemented in a follow-up patch.
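    
    The protocol above can be sketched in plain, self-contained C. This is
    an illustrative model only: it uses C11 atomics in place of Xen's
    bitops/xchg()/smp_wmb(), a single unsigned long instead of
    DECLARE_BITMAP(), and a flush_interrupts() stand-in for the
    vcpu_flush_interrupts() consumer that the follow-up patch introduces:
    
    ```c
    #include <assert.h>
    #include <stdatomic.h>
    
    #define IRQ_VS_SOFT 2
    
    static _Atomic unsigned long irqs_pending;      /* pending interrupts */
    static _Atomic unsigned long irqs_pending_mask; /* bits changed in irqs_pending */
    
    /* Producer side (mirrors vcpu_set_interrupt()): pending first, mask second. */
    static void set_interrupt(unsigned int irq)
    {
        atomic_fetch_or_explicit(&irqs_pending, 1UL << irq,
                                 memory_order_relaxed);
        /* Release ordering plays the role of smp_wmb() before the mask update. */
        atomic_fetch_or_explicit(&irqs_pending_mask, 1UL << irq,
                                 memory_order_release);
    }
    
    /* Consumer side (the vCPU itself): take ownership of all changed bits. */
    static unsigned long flush_interrupts(void)
    {
        /* The exchange clears the mask and pairs with the producers' barrier. */
        unsigned long mask = atomic_exchange_explicit(&irqs_pending_mask, 0,
                                                      memory_order_acq_rel);
        /* Act only on bits set in the mask; this value would feed HVIP. */
        return atomic_load_explicit(&irqs_pending, memory_order_relaxed) & mask;
    }
    
    int main(void)
    {
        set_interrupt(IRQ_VS_SOFT);
        assert(flush_interrupts() == (1UL << IRQ_VS_SOFT));
        /* No new producer activity, so a second flush sees no changed bits. */
        assert(flush_interrupts() == 0);
        return 0;
    }
    ```
    
    The key property the sketch demonstrates: once the consumer observes a
    mask bit via the exchange, the matching irqs_pending bit is guaranteed
    to be visible, so no interrupt can be missed during the flush.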
    
    A barrier between updating irqs_pending and setting the corresponding
    mask bit in vcpu_set_interrupt()/vcpu_unset_interrupt() guarantees
    that if the consumer observes a mask bit set, the corresponding pending
    bit is also visible. This prevents missed interrupts during the flush.
    
    It is possible for a guest to have a pending bit set in the hardware
    register without it being marked pending in the irqs_pending bitmap:
      According to the RISC-V ISA specification:
        Bits hip.VSSIP and hie.VSSIE are the interrupt-pending and
        interrupt-enable  bits for VS-level software interrupts. VSSIP in hip
        is an alias (writable) of the same bit in hvip.
      Additionally:
        When bit 2 of hideleg is zero, vsip.SSIP and vsie.SSIE are read-only
        zeros. Else, vsip.SSIP and vsie.SSIE are aliases of hip.VSSIP and
        hie.VSSIE.
    This means the guest may modify vsip.SSIP, which implicitly updates
    hip.VSSIP, and writing the bit with 1 would also trigger an interrupt,
    as the RISC-V spec states:
      These conditions for an interrupt trap to occur must be evaluated in a
      bounded amount of time from when an interrupt becomes, or ceases to be,
      pending in sip, and must also be evaluated immediately following the
      execution of an SRET instruction or an explicit write to a CSR on which
      these interrupt trap conditions expressly depend (including sip, sie and
      sstatus).
    This means that IRQ_VS_SOFT must be synchronized separately, which is done
    in vcpu_sync_interrupts(). Note also that IRQ_PMU_OVF would need to be
    synced for a similar reason, but isn't synced for now, as PMU isn't
    supported yet.
    
    For the remaining VS-level interrupt types (IRQ_VS_TIMER and
    IRQ_VS_EXT), the specification states that they cannot be modified by the
    guest, as they are read-only:
      Bits hip.VSEIP and hie.VSEIE are the interrupt-pending and
      interrupt-enable bits for VS-level external interrupts. VSEIP is
      read-only in hip, and is the logical-OR of these interrupt sources:
        • bit VSEIP of hvip;
        • the bit of hgeip selected by hstatus.VGEIN; and
        • any other platform-specific external interrupt signal directed to
          VS-level.
      Bits hip.VSTIP and hie.VSTIE are the interrupt-pending and
      interrupt-enable bits for VS-level timer interrupts. VSTIP is read-only
      in hip, and is the logical-OR of hvip.VSTIP and any other
      platform-specific timer interrupt signal directed to VS-level.
    and
      When bit 10 of hideleg is zero, vsip.SEIP and vsie.SEIE are read-only
      zeros. Else, vsip.SEIP and vsie.SEIE are aliases of hip.VSEIP and
      hie.VSEIE.
    
      When bit 6 of hideleg is zero, vsip.STIP and vsie.STIE are read-only
      zeros. Else, vsip.STIP and vsie.STIE are aliases of hip.VSTIP and
      hie.VSTIE.
    and also,
      Bits sip.SEIP and sie.SEIE are the interrupt-pending and interrupt-enable
      bits for supervisor-level external interrupts. If implemented, SEIP is
      read-only in sip, and is set and cleared by the execution environment,
      typically through a platform-specific interrupt controller.
    
      Bits sip.STIP and sie.STIE are the interrupt-pending and interrupt-enable
      bits for supervisor-level timer interrupts. If implemented, STIP is
      read-only in sip, and is set and cleared by the execution environment
    Thus, for these interrupt types, it is sufficient to use
    vcpu_set_interrupt() and vcpu_unset_interrupt(), and to flush them in
    vcpu_flush_interrupts() (which is introduced in a follow-up patch).
    
    vcpu_sync_interrupts(), which is called just before entering the VM,
    slightly bends the rule that the irqs_pending bit must be written
    first, followed by updating the corresponding bit in irqs_pending_mask.
    However, it still respects the core guarantee that the producer never
    clears the mask and only writes to irqs_pending if it is the one that
    flipped the corresponding mask bit from 0 to 1.
    Moreover, since vcpu_sync_interrupts() and the consumer path are invoked
    sequentially immediately before VM entry, the consumer cannot run
    concurrently, so it is safe to slightly relax this ordering rule in
    vcpu_sync_interrupts().
    
    Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
    Acked-by: Jan Beulich <jbeulich@xxxxxxxx>
---
 xen/arch/riscv/domain.c             | 63 +++++++++++++++++++++++++++++++++++++
 xen/arch/riscv/include/asm/domain.h | 22 +++++++++++++
 xen/arch/riscv/traps.c              |  4 +++
 3 files changed, 89 insertions(+)

diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
index 14cc85093b..694a03ab74 100644
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -6,6 +6,7 @@
 #include <xen/sched.h>
 #include <xen/vmap.h>
 
+#include <asm/bitops.h>
 #include <asm/cpufeature.h>
 #include <asm/csr.h>
 #include <asm/riscv_encoding.h>
@@ -168,6 +169,68 @@ void arch_vcpu_destroy(struct vcpu *v)
     vfree((void *)&v->arch.cpu_info[1] - STACK_SIZE);
 }
 
+int vcpu_set_interrupt(struct vcpu *v, unsigned int irq)
+{
+    bool kick_vcpu;
+
+    /* We only allow VS-mode software, timer, and external interrupts */
+    if ( irq != IRQ_VS_SOFT &&
+         irq != IRQ_VS_TIMER &&
+         irq != IRQ_VS_EXT )
+        return -EINVAL;
+
+    kick_vcpu = !test_and_set_bit(irq, v->arch.irqs_pending);
+
+    /*
+     * The counterpart of this barrier is the one encoded implicitly in xchg()
+     * which is used in consumer part (vcpu_flush_interrupts()).
+     */
+    smp_wmb();
+
+    kick_vcpu |= !test_and_set_bit(irq, v->arch.irqs_pending_mask);
+
+    if ( kick_vcpu )
+        vcpu_kick(v);
+
+    return 0;
+}
+
+int vcpu_unset_interrupt(struct vcpu *v, unsigned int irq)
+{
+    /* We only allow VS-mode software, timer, external interrupts */
+    if ( irq != IRQ_VS_SOFT &&
+         irq != IRQ_VS_TIMER &&
+         irq != IRQ_VS_EXT )
+        return -EINVAL;
+
+    clear_bit(irq, v->arch.irqs_pending);
+    /*
+     * The counterpart of this barrier is the one encoded implicitly in xchg()
+     * which is used in consumer part (vcpu_flush_interrupts()).
+     */
+    smp_wmb();
+    set_bit(irq, v->arch.irqs_pending_mask);
+
+    return 0;
+}
+
+void vcpu_sync_interrupts(struct vcpu *curr)
+{
+    unsigned long hvip = csr_read(CSR_HVIP);
+
+    ASSERT(curr == current);
+
+    /* Sync-up HVIP.VSSIP bit changes done by Guest */
+    if ( ((curr->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL)) &&
+         !test_and_set_bit(IRQ_VS_SOFT, &curr->arch.irqs_pending_mask) )
+    {
+        if ( hvip & BIT(IRQ_VS_SOFT, UL) )
+            set_bit(IRQ_VS_SOFT, &curr->arch.irqs_pending);
+        else
+            clear_bit(IRQ_VS_SOFT, &curr->arch.irqs_pending);
+    }
+}
+
 static void __init __maybe_unused build_assertions(void)
 {
     /*
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index 17be792afe..1ecfe18c85 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -54,8 +54,25 @@ struct arch_vcpu {
     register_t hideleg;
     register_t henvcfg;
     register_t hstateen0;
+    register_t hvip;
 
     register_t vsatp;
+
+    /*
+     * VCPU interrupts
+     *
+     * We have a lockless approach for tracking pending VCPU interrupts
+     * implemented using atomic bitops. The irqs_pending bitmap represent
+     * pending interrupts whereas irqs_pending_mask represent bits changed
+     * in irqs_pending. Our approach is modeled around multiple producer
+     * and single consumer problem where the consumer is the VCPU itself.
+     *
+     * DECLARE_BITMAP() is needed here to support 64 vCPU local interrupts
+     * on RV32 host.
+     */
+#define RISCV_VCPU_NR_IRQS MAX(BITS_PER_LONG, 64)
+    DECLARE_BITMAP(irqs_pending, RISCV_VCPU_NR_IRQS);
+    DECLARE_BITMAP(irqs_pending_mask, RISCV_VCPU_NR_IRQS);
 };
 
 struct paging_domain {
@@ -94,6 +111,11 @@ static inline void update_guest_memory_policy(struct vcpu *v,
 
 static inline void arch_vcpu_block(struct vcpu *v) {}
 
+int vcpu_set_interrupt(struct vcpu *v, unsigned int irq);
+int vcpu_unset_interrupt(struct vcpu *v, unsigned int irq);
+
+void vcpu_sync_interrupts(struct vcpu *curr);
+
 #endif /* ASM__RISCV__DOMAIN_H */
 
 /*
diff --git a/xen/arch/riscv/traps.c b/xen/arch/riscv/traps.c
index 9fca941526..551f886e3a 100644
--- a/xen/arch/riscv/traps.c
+++ b/xen/arch/riscv/traps.c
@@ -171,6 +171,10 @@ static void do_unexpected_trap(const struct cpu_user_regs *regs)
 
 static void check_for_pcpu_work(void)
 {
+    struct vcpu *curr = current;
+
+    vcpu_sync_interrupts(curr);
+
     p2m_handle_vmenter();
 }
 
--
generated by git-patchbot for /home/xen/git/xen.git#master



 

