[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 21/26] xen/riscv: implement virtual APLIC MMIO emulation





On 6/15/26 5:13 PM, Jan Beulich wrote:
On 08.05.2026 16:43, Oleksii Kurochko wrote:
Guests running under Xen program interrupt routing by writing to APLIC
MMIO registers. Xen must intercept these accesses to enforce interrupt
isolation between domains and to translate guest routing intent into the
underlying physical MSI topology.

Writes are gated by the domain's authorised interrupt bitmap so that a
guest cannot affect interrupts it does not own. TARGET register writes
additionally require translation of the hart and IMSIC guest-file
indices from virtual to physical, as the APLIC uses these fields
directly to compute the MSI delivery address.

Delegation (APLIC_SOURCECFG_D) is not yet supported.

Co-developed-by: Romain Caritey <Romain.Caritey@xxxxxxxxxxxxx>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
---
Changes in v2:
  - Merge the following patches into one:
     xen/riscv: add vaplic access check:
       - Add check that address is properly aligned.
       - Check vaplic range intead of APLIC one.
       - Return bool from vaplic_is_access instead of int.
     xen/riscv: emulate guest writes to virtual APLIC MMIO
       - Drop CALC_REG_VALUE.
       - Use unsigned int instead of uin32_t for offset.
       - s/.../subtracting in the comment.
       - start one line comments from the upper case.
       - Check the value before being written to sourcecfg register.
       - 'unsigned int' for loop index.
       - Omit unneessary braces.
       - s/vaplic_update_target/aplic_msi_target_gen.
       - Use IMSIC_MMIO_PAGE_SHIFT instead of 12 in aplic_msi_target_gen().
       - Drop explicit usage of APLIC register in store function.
       - Drop APLIC_REG_{GET,SET} macros and introudce APLIC specific funtcions.
       - Ignore write to SOURCECFG_BASE when value is out-of-range.
       - Drop ASSERT(!target_vcpu) inside handler of targer register setting,
         just avoid such writings + debug message.
       - domain_crash() instead of panic() in the case of default case.
       - Drop ASSERT() in APLIC_SOURCE_CFG_BASE case and use domain_crash()
         instead.
     xen/riscv: emulate guest reads from virtual APLIC MMIO:
       - s/regval_to_irqn/regindx_to_irqn.
       - pass to to_vaplic() a domain instead of vintc.
       - add check that load access is aligned.
       - instead of panic() just crash a domain().
       - use 'unsigned int' for local variable offset.
       - Return 0 in the case APLIC_CLRIE_BASE ...APLIC_CLRIE_LAST reading to
         follow AIA spec.
       - Drop explicit usage of physical APLIC registers.
---
  xen/arch/riscv/aplic.c              |  25 +++
  xen/arch/riscv/include/asm/aplic.h  |   9 +
  xen/arch/riscv/include/asm/intc.h   |  10 +-
  xen/arch/riscv/include/asm/vaplic.h |   3 +
  xen/arch/riscv/vaplic.c             | 289 +++++++++++++++++++++++++++-
  5 files changed, 333 insertions(+), 3 deletions(-)

diff --git a/xen/arch/riscv/aplic.c b/xen/arch/riscv/aplic.c
index 1c8fd0145eb2..1976733dfbaa 100644
--- a/xen/arch/riscv/aplic.c
+++ b/xen/arch/riscv/aplic.c
@@ -40,6 +40,31 @@ static struct intc_info __ro_after_init aplic_info = {
      .hw_version = INTC_APLIC,
  };
+uint32_t aplic_hw_read_reg(unsigned int offset, uint32_t mask)
+{
+    unsigned long flags;
+    uint32_t val;
+
+    ASSERT(offset < aplic.size);

Further assert suitable alignment of "offset"?

Sure, I will update ASSERT(...) to:
  ASSERT((offset < aplic.size) && IS_ALIGNED(offset, sizeof(uint32_t)));


+    spin_lock_irqsave(&aplic.lock, flags);
+    val = readl((void __iomem *)((uintptr_t)aplic.regs + offset)) & mask;

Easier as

     val = readl((volatile void __iomem *)aplic.regs + offset) & mask;

? (Note that like const, volatile also shouldn't be cast away.)

Is arithmetic on void * pointers correct from the C standard's point of view?

It works with GCC (see https://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html), but I can't find anything that guarantees the same behavior for other compilers.

I'm okay with the suggested change if it's acceptable for Xen to rely on GCC's void * pointer arithmetic extension.


--- a/xen/arch/riscv/include/asm/aplic.h
+++ b/xen/arch/riscv/include/asm/aplic.h
@@ -15,6 +15,8 @@
#include <asm/imsic.h> +#define APLIC_NUM_REGS 32

What's this? It isn't used afaics, and it also doesn't match up with ...

It is rudement and isn't used anymore. It should be just dropped.


@@ -24,6 +26,8 @@
  #define APLIC_DOMAINCFG_IE      BIT(8, U)
  #define APLIC_DOMAINCFG_DM      BIT(2, U)
+#define APLIC_SOURCECFG_D BIT(10, U)
+
  #define APLIC_SOURCECFG_SM_INACTIVE     0x0
  #define APLIC_SOURCECFG_SM_DETACH       0x1
  #define APLIC_SOURCECFG_SM_EDGE_RISE    0x4
@@ -71,6 +75,8 @@
  #define APLIC_SIZE(nr_cpus)     (APLIC_MIN_SIZE + \
                                   APLIC_SIZE_ALIGN(APLIC_IDC_SIZE * (nr_cpus)))
+#define APLIC_SETCLR_OFFSET_MASK ((32 * sizeof(uint32_t)) - 1)
+
  struct aplic_regs {
      uint32_t domaincfg;         /* 0x0000 */
      uint32_t sourcecfg[1023];   /* 0x0004 */
@@ -114,4 +120,7 @@ struct aplic_regs {
      uint32_t target[1023];      /* 0x3008 */
  };

... this struct holding all the (far more than 32) registers.

As to APLIC_SETCLR_OFFSET_MASK: Any reason it has the low 2 bits set?

Agree, with the current defintion of AIA spec. low 2 bits are always 0 so APLIC_SETCLR_OFFSET_MASK should be updated correspondingly ...


The
literal 32 in there also looks rather arbitrary. It would be helpful if
there was a connection to those 32-s in struct aplic_regs that it actually
matches up with.

t could be updated to make the relationship clearer:

#define APLIC_SETCLR_OFFSET_MASK (sizeof_field(struct aplic_regs, setip) - 1)

Using setip is fine here, as all SET* and CLR* register groups consist of 32 registers and therefore have identical sizes. Adding a brief comment above APLIC_SETCLR_OFFSET_MASK to explain this could improve readability:

/*
* Using setip is fine here, as all SET* and CLR* register groups consist of 32
 * registers and therefore have identical sizes.
 */
#define APLIC_SETCLR_OFFSET_MASK (sizeof_field(struct aplic_regs, setip) - 1)

Also, BUILD_BUG_ON() could be added to aplic_preinit() to verify what is mentioned in the comment if it makes sense.

...:

#define APLIC_SETCLR_OFFSET_MASK
    (sizeof_field(struct aplic_regs, setip) - sizeof(uint32_t))


--- a/xen/arch/riscv/include/asm/vaplic.h
+++ b/xen/arch/riscv/include/asm/vaplic.h
@@ -26,6 +26,9 @@ struct vaplic_regs {
  struct vaplic {
      struct vintc vintc;
      struct vaplic_regs regs;
+
+    paddr_t regs_start;
+    paddr_t regs_size;

Can regs_size really go beyond 4G?

Good question and it depends on an amount of vCPUs:

#define APLIC_MIN_SIZE          0x4000
#define APLIC_SIZE_ALIGN(x)     ROUNDUP(x, APLIC_MIN_SIZE)

#define APLIC_SIZE(nr_cpus)     (APLIC_MIN_SIZE + \
APLIC_SIZE_ALIGN(APLIC_IDC_SIZE * (nr_cpus)))

paddr_t aplic_size = APLIC_SIZE(d->max_vcpus);

With the current limitation of 128 vCPUs max (IIRC) it won't beyond 4G.


--- a/xen/arch/riscv/vaplic.c
+++ b/xen/arch/riscv/vaplic.c
@@ -26,6 +26,283 @@
#define FDT_VAPLIC_INT_CELLS 2 +#define AUTH_IRQ_BIT(d, irqn) ( \
+    ((irqn) <= (d)->arch.vintc->irq_nums) && \
+    test_bit(irqn, (d)->arch.vintc->allocated_irqs) )
+
+#define regindx_to_irqn(reg_val) ((reg_val) / sizeof(uint32_t))
+
+static inline uint32_t generate_auth_mask(const struct domain *d,
+                                          unsigned int irqsn)
+{
+    if ( irqsn >= DIV_ROUND_UP(d->arch.vintc->irq_nums,
+                               sizeof(uint32_t) * BITS_PER_BYTE) )

Why the rounding up? Isn't ->irqs_num the proper upper bound?

Probably irqsn isn't a correct name here as it looks like "IRQ source number" (an IRQ number), but regindx_to_irqn(offset & MASK) actually computes offset / 4 - a word index into the used_irqs bitmap. Word 0 covers IRQs 0–31, word 1 covers IRQs 32–63, etc.
Given that, the bounds check:
  irqsn >= DIV_ROUND_UP(irq_nums, sizeof(uint32_t) * BITS_PER_BYTE)
is "word index >= number of 32-bit words in the bitmap" which is correct. The DIV_ROUND_UP converts the IRQ count into a word count to compare against the word index.

So the real issue is naming. There are two options to resolve it:
- rename to reflect reality; call it word_idx (not irqsn), and rename regindx_to_irqn to something like regoffset_to_word_idx. The DIV_ROUND_UP check then reads clearly.

- Change the API to take an actual IRQ number, pass irqsn * 32 (the first IRQ in the word) and check irqsn >= irq_nums directly, computing the word index inside generate_auth_mask. This aligns with how AUTH_IRQ_BIT works.

Which one option do you prefer?


+    {
+        dprintk(XENLOG_DEBUG, "incorrect irqsn(%d) is passed\n", irqsn);

Once again: %u please with an unsigned int argument.

+        return 0U;
+    }
+
+    return *((uint32_t *)d->arch.vintc->allocated_irqs + irqsn);

Such casts would better be avoided

It was just a convenient way to find necessary word.

Would it be better to have the following:
return (uint32_t)(d->arch.vintc->used_irqs[(irqsn * 32) / BITS_PER_LONG] >> ((irqsn * 32) % BITS_PER_LONG));



+static int vaplic_emulate_load(const struct vcpu *vcpu,

v please for struct vcpu * variables (in the common case, of course there can
be exceptions).

+                               const unsigned long addr, uint32_t *out)
+{
+    const struct domain *d = vcpu->domain;
+    const struct vaplic *vaplic = to_vaplic(d);
+    const unsigned int offset = addr & APLIC_REG_OFFSET_MASK;
+    uint32_t auth_mask;
+    unsigned int i;
+
+    switch ( offset )
+    {
+    case APLIC_DOMAINCFG:
+        *out = vaplic->regs.domaincfg;
+
+        return 0;
+
+    case APLIC_SETIPNUM:
+    case APLIC_SETIPNUM_LE:
+    case APLIC_CLRIPNUM:
+    case APLIC_SETIENUM:
+    case APLIC_CLRIENUM:
+    case APLIC_CLRIE_BASE ... APLIC_CLRIE_LAST:

For ranges like this APLIC_REG_OFFSET_MASK having the low two bits clear
(or there being some other mechanism to ensure only properly aligned
offsets are handled) would help. Mis-aligned accesses shouldn't be
handled ...

+        /*
+         * Based on the RISC-V AIA spec a read of these registers
+         * always returns zero
+         */
+        *out = 0;
+
+        return 0;

... like this.

There is such check in vaplic_is_access() below. I will add a comment above vaplic_emulate_load() that offset is properly aligned because of the check in vaplic_is_access(). The similar comment I'll add for vaplic_emulate_store().


+    case APLIC_SETIP_BASE ... APLIC_SETIP_LAST:
+    case APLIC_CLRIP_BASE ... APLIC_CLRIP_LAST:
+    case APLIC_SETIE_BASE ... APLIC_SETIE_LAST:
+        i = regindx_to_irqn(offset & APLIC_SETCLR_OFFSET_MASK);
+        auth_mask = generate_auth_mask(d, i);
+
+        break;
+
+    case APLIC_TARGET_BASE ... APLIC_TARGET_LAST:
+        /*
+         * As target registers start for 1:

s/for/from/ ?

I'll apply that.


+         *  0x3000 genmsi
+         *  0x3004 target[1]
+         *  0x3008 target[2]
+         *   ...
+         *  0x3FFC target[1023]
+         * It is necessary to calculate an interrupt number by substracting
+         * of APLIC_GENMSI instead of APLIC_TARGET_BASE.

Stray "of"? Or did you mean "subtraction"? (Also check other similar comments.)

Agree, 'of' should be dropped here.


+    default:
+        gdprintk(XENLOG_WARNING, "Unhandled APLIC read at offset %#x\n",
+                 offset);
+
+        domain_crash(vcpu->domain);
+
+        return -EINVAL;
+    }
+
+    *out = aplic_hw_read_reg(offset, auth_mask);

You blindly assume a 32-bit access here (and also in the write counterpart).
How do you end up knowing?

he APLIC spec requires all register accesses to be 32-bit wide.

Also, I have the following at the caller side (yes, it can't be understand from the current patch):

    /* Fault address should be aligned to length of MMIO */
    if ( fault_addr & (len - 1) )
        return -EIO;

    if ( vintc->ops->is_access(vcpu, fault_addr) )
    {
        /* PLIC/APLIC access are always on 32bit */
        ASSERT( len == 4 );
        rc = vintc->ops->emulate_store(vcpu, fault_addr, data32);
        if ( rc < 0 )
            return rc;
    }

Probably it would be better addi a size parameter to both callbacks:

int (*emulate_load)(const struct vcpu *vcpu, unsigned long addr,
                    unsigned int size, uint32_t *out);
int (*emulate_store)(const struct vcpu *vcpu, unsigned long addr,
                     unsigned int size, uint32_t in);
Then vaplic_emulate_load can validate upfront:


if ( size != sizeof(uint32_t) )
{
    gdprintk(XENLOG_WARNING, "APLIC: unsupported access size %u\n", size);
    domain_crash(vcpu->domain);
    return -EINVAL;
}

or maybe it would be better instead of domain_crash() just ignore such loads:

if ( size != sizeof(uint32_t) )
{
    gdprintk(XENLOG_WARNING,
             "APLIC: unsupported %u-byte access at offset %#x, ignoring\n",
             size, offset);
    *out = 0;   /* for loads */
    return 0;
}

I want to add size argument as I think that I want to introduce common MMIO handling instead of is_access() to avoid growing of if/else in the snippet above


+static int cf_check vaplic_emulate_store(const struct vcpu *vcpu,
+                                         unsigned long addr, uint32_t value)

Why cf_check here but not for vaplic_emulate_load()?

It should be added for both.


+{
+    int rc = -EINVAL;
+    const struct domain *d = vcpu->domain;
+    unsigned int offset = addr & APLIC_REG_OFFSET_MASK;
+
+    switch ( offset )
+    {
+    case APLIC_SETIP_BASE ... APLIC_SETIP_LAST:
+    case APLIC_CLRIP_BASE ... APLIC_CLRIP_LAST:
+    case APLIC_SETIE_BASE ... APLIC_SETIE_LAST:
+    case APLIC_CLRIE_BASE ... APLIC_CLRIE_LAST:
+    {
+        unsigned int irqn = regindx_to_irqn(offset & APLIC_SETCLR_OFFSET_MASK);
+        value &= generate_auth_mask(d, irqn);
+
+        break;
+    }
+
+    case APLIC_SOURCECFG_BASE ... APLIC_SOURCECFG_LAST:
+        if ( value & APLIC_SOURCECFG_D )
+        {
+            rc = -EOPNOTSUPP;
+
+            dprintk(XENLOG_ERR, "APLIC_SOURCECFG_D isn't supported\n");
+
+            goto fail;
+        }
+
+        /*
+         * As sourcecfg register starts from 1:
+         *   0x0000 domaincfg
+         *   0x0004 sourcecfg[1]
+         *   0x0008 sourcecfg[2]
+         *    ...
+         *   0x0FFC sourcecfg[1023]
+         * It is necessary to calculate an interrupt number by subtracting
+         * of APLIC_DOMAINCFG instead of APLIC_SOURCECFG_BASE.
+         */
+        if ( !AUTH_IRQ_BIT(d, regindx_to_irqn(offset - APLIC_DOMAINCFG)) )
+            /* Interrupt not enabled, ignore it */
+            return 0;
+
+        if ( value > APLIC_SOURCECFG_SM_LEVEL_LOW )
+        {
+            gdprintk(XENLOG_ERR,
+                     "value(%u) is incorrect for sourcecfg register\n", value);
+
+            return 0;
+        }
+
+        break;
+
+    case APLIC_TARGET_BASE ... APLIC_TARGET_LAST:
+    {
+        struct vcpu *target_vcpu = NULL;
+
+        /*
+         * Look at vaplic_emulate_load() for explanation why
+         * APLIC_GENMSI is subtracted.
+         */
+        if ( !AUTH_IRQ_BIT(d, regindx_to_irqn(offset - APLIC_GENMSI)) )
+            /* Interrupt not enabled, ignore it */
+            return 0;
+
+        for ( unsigned int i = 0; i < vcpu->domain->max_vcpus; i++ )
+        {
+            struct vcpu *v = vcpu->domain->vcpu[i];
+
+            if ( v->vcpu_id == (value >> APLIC_TARGET_HART_IDX_SHIFT) )
+            {
+                target_vcpu = v;
+                break;
+            }
+        }

Why is a loop needed here? vCPU-s are numbered sequentially.

Agree, it isn't needed. The following will be enough:

        unsigned int hart_idx = value >> APLIC_TARGET_HART_IDX_SHIFT;

        if ( hart_idx < vcpu->domain->max_vcpus )
            target_vcpu = vcpu->domain->vcpu[hart_idx];


+static bool cf_check vaplic_is_access(const struct vcpu *vcpu,
+                                      unsigned long addr)
+{
+    const struct vaplic *vaplic = to_vaplic(vcpu->domain);
+    paddr_t start = vaplic->regs_start;
+    paddr_t end = vaplic->regs_start + vaplic->regs_size;
+
+    if ( addr & 0x3 )

Nit: Does the 0x here add any value?

Agree, there is no any sense.


+    {
+        dprintk(XENLOG_DEBUG,
+                "APLIC MMIO address should be properly aligned\n");
+
+        return false;
+    }

Ah, okay - here is the alignment check. Mind me asking for a (documenting)
assertion then in the actual read and write handlers?

Sure, I will add the comment above or add a correspondent ASSERT().


+    /* check if it is an APLIC access */

Nit: Style.

I will update that.

Thanks.

~ Oleksii



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.