
[PATCH v4 8/9] hvm/ioreq: Negotiate extended destination ID support per ioreq server



Add a per-server capability flag in XEN_DMOP_create_ioreq_server to
signal extended destination ID support. Repurpose the first byte of the
existing pad[3] as a flags field, and define
XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID (bit 0) for a server to signal it will
use XEN_DMOP_bind_pt_msi_irq for all passthrough MSI bindings.

Track the flag in a new ext_dest_id field of struct ioreq_server.
hvm_ext_dest_id_enabled() returns true only if at least one ioreq server
is registered and all registered servers have opted in. A single server
without the flag is sufficient to suppress the feature.

Lock the feature at domain creation time:
arch_domain_creation_finished() computes the levelled result into struct
hvm_domain.ext_dest_id using OR to preserve any value previously
restored from an HVM save record. After creation_finished,
arch_ioreq_server_create_check() rejects new servers that lack
XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID if the feature was already advertised
to the guest.

Persist the locked state in a new HVM_SAVE_TYPE(EXT_DEST_ID) record so
that migration preserves the guest-visible CPUID bit independently of
when the device model re-registers its ioreq servers on the destination
host.

On restore, ioapic_check() uses d->arch.hvm.ext_dest_id (restored from
the EXT_DEST_ID record) rather than the per-server dynamic check, since
the DM has not yet re-registered its servers at that point.

Update xendevicemodel_create_ioreq_server() in libxendevicemodel to
accept the new flags parameter, remove
xendevicemodel_enable_ext_dest_id(), and fix the
xc_hvm_create_ioreq_server() compat wrapper to pass zero flags.

Signed-off-by: Julian Vetter <julian.vetter@xxxxxxxxxx>
---
Changes in v4:
- As suggested by Roger, replaced XEN_DMOP_enable_ext_dest_id (v3 patch
  6), a separate DM op the device model had to call before starting
  vCPUs, with a flags byte repurposed from the existing pad[3] field of
  xen_dm_op_create_ioreq_server
- New XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID flag (bit 0) lets each ioreq
  server signal support at registration time
- As suggested by Roger, level the feature across all ioreq servers.
  XEN_HVM_CPUID_EXT_DEST_ID is only advertised when every server
  registered before arch_domain_creation_finished() sets the flag. A
  single server without the flag suppresses the feature for the whole
  domain
- Lock the levelled result at domain creation time and enforce it for
  servers registered afterwards, preventing a late opt-out from breaking
  guests that already see the feature in CPUID
- Persist the locked flag via HVM_SAVE_TYPE(EXT_DEST_ID) so that live
  migration preserves the guest-visible CPUID bit independently of when
  the device model registers its ioreq servers on the destination host
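
For reviewers, the levelling rule implemented by hvm_ext_dest_id_enabled()
can be illustrated with a standalone sketch. The struct and
MAX_NR_IOREQ_SERVERS below are simplified stand-ins for illustration, not
the real Xen definitions:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NR_IOREQ_SERVERS 8

/* Simplified stand-in for struct ioreq_server; only the opt-in bit. */
struct ioreq_server {
    bool ext_dest_id;
};

/*
 * Model of the levelling rule: the feature is enabled only when at
 * least one server is registered AND every registered server has opted
 * in. An empty table or a single hold-out disables it.
 */
static bool ext_dest_id_enabled(struct ioreq_server *servers[MAX_NR_IOREQ_SERVERS])
{
    bool found = false;

    for ( unsigned int i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
    {
        if ( !servers[i] )
            continue;          /* unused slot */
        if ( !servers[i]->ext_dest_id )
            return false;      /* one hold-out suppresses the feature */
        found = true;
    }

    return found;              /* false if no server registered */
}
```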
---
 tools/include/xendevicemodel.h          |  3 +-
 tools/libs/ctrl/xc_devicemodel_compat.c |  2 +-
 tools/libs/devicemodel/core.c           |  3 +-
 xen/arch/arm/ioreq.c                    |  5 ++++
 xen/arch/x86/domain.c                   | 10 +++++++
 xen/arch/x86/hvm/ioreq.c                | 37 +++++++++++++++++++++++++
 xen/arch/x86/hvm/vioapic.c              | 20 +++++++++++++
 xen/arch/x86/include/asm/hvm/domain.h   |  9 ++++++
 xen/common/ioreq.c                      | 13 +++++++--
 xen/drivers/passthrough/x86/hvm.c       | 13 +++++++++
 xen/include/public/arch-x86/hvm/save.h  | 17 +++++++++++-
 xen/include/public/hvm/dm_op.h          | 16 +++++++++--
 xen/include/xen/ioreq.h                 | 27 ++++++++++++++++++
 13 files changed, 165 insertions(+), 10 deletions(-)

diff --git a/tools/include/xendevicemodel.h b/tools/include/xendevicemodel.h
index f15b35fa33..bc6764bd58 100644
--- a/tools/include/xendevicemodel.h
+++ b/tools/include/xendevicemodel.h
@@ -44,12 +44,13 @@ int xendevicemodel_close(xendevicemodel_handle *dmod);
  * @parm domid the domain id to be serviced
  * @parm handle_bufioreq how should the IOREQ Server handle buffered
  *                       requests (HVM_IOREQSRV_BUFIOREQ_*)?
+ * @parm flags bitmask of XEN_DMOP_IOREQ_SERVER_* capability flags.
  * @parm id pointer to an ioservid_t to receive the IOREQ Server id.
  * @return 0 on success, -1 on failure.
  */
 int xendevicemodel_create_ioreq_server(
     xendevicemodel_handle *dmod, domid_t domid, int handle_bufioreq,
-    ioservid_t *id);
+    uint8_t flags, ioservid_t *id);
 
 /**
  * This function retrieves the necessary information to allow an
diff --git a/tools/libs/ctrl/xc_devicemodel_compat.c b/tools/libs/ctrl/xc_devicemodel_compat.c
index a46011cd17..91366e250c 100644
--- a/tools/libs/ctrl/xc_devicemodel_compat.c
+++ b/tools/libs/ctrl/xc_devicemodel_compat.c
@@ -11,7 +11,7 @@ int xc_hvm_create_ioreq_server(
     ioservid_t *id)
 {
     return xendevicemodel_create_ioreq_server(xch->dmod, domid,
-                                              handle_bufioreq, id);
+                                              handle_bufioreq, 0, id);
 }
 
 int xc_hvm_get_ioreq_server_info(
diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index adf2c41a96..49b9bf8a13 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -167,7 +167,7 @@ static int xendevicemodel_op(
 
 int xendevicemodel_create_ioreq_server(
     xendevicemodel_handle *dmod, domid_t domid, int handle_bufioreq,
-    ioservid_t *id)
+    uint8_t flags, ioservid_t *id)
 {
     struct xen_dm_op op;
     struct xen_dm_op_create_ioreq_server *data;
@@ -179,6 +179,7 @@ int xendevicemodel_create_ioreq_server(
     data = &op.u.create_ioreq_server;
 
     data->handle_bufioreq = handle_bufioreq;
+    data->flags = flags;
 
     rc = xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
     if (rc)
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index b4211f0159..d45228717a 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -201,6 +201,11 @@ void arch_ioreq_domain_init(struct domain *d)
 {
 }
 
+int arch_ioreq_server_create_check(const struct domain *d, uint8_t flags)
+{
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 1d458f1372..68ff315460 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -25,6 +25,7 @@
 #include <xen/init.h>
 #include <xen/iocap.h>
 #include <xen/iommu.h>
+#include <xen/ioreq.h>
 #include <xen/irq.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -1106,7 +1107,16 @@ int arch_domain_soft_reset(struct domain *d)
 void arch_domain_creation_finished(struct domain *d)
 {
     if ( is_hvm_domain(d) )
+    {
+        /*
+         * Lock the extended destination ID state. OR preserves any value
+         * already restored from an HVM save record (migration path). For a
+         * fresh domain, ext_dest_id starts false and the dynamic check
+         * supplies the levelled result across all registered ioreq servers.
+         */
+        d->arch.hvm.ext_dest_id |= hvm_ext_dest_id_enabled(d);
         hvm_domain_creation_finished(d);
+    }
 }
 
 #ifdef CONFIG_COMPAT
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index a5fa97e149..894a63c522 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -19,6 +19,7 @@
 
 #include <asm/hvm/emulate.h>
 #include <asm/hvm/hvm.h>
+#include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/msr.h>
 
@@ -325,6 +326,42 @@ void arch_ioreq_domain_init(struct domain *d)
     register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 }
 
+int arch_ioreq_server_create_check(const struct domain *d, uint8_t flags)
+{
+    if ( !is_hvm_domain(d) || !d->creation_finished )
+        return 0;
+
+    if ( d->arch.hvm.ext_dest_id &&
+         !(flags & XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID) )
+        return -EPERM;
+
+    return 0;
+}
+
+static int cf_check ext_dest_id_save(struct vcpu *v, hvm_domain_context_t *h)
+{
+    struct hvm_hw_ext_dest_id s = {
+        .enabled = v->domain->arch.hvm.ext_dest_id,
+    };
+
+    return hvm_save_entry(EXT_DEST_ID, 0, h, &s);
+}
+
+static int cf_check ext_dest_id_load(struct domain *d, hvm_domain_context_t *h)
+{
+    struct hvm_hw_ext_dest_id s;
+
+    if ( hvm_load_entry(EXT_DEST_ID, h, &s) )
+        return -EINVAL;
+
+    d->arch.hvm.ext_dest_id = s.enabled;
+
+    return 0;
+}
+
+HVM_REGISTER_SAVE_RESTORE(EXT_DEST_ID, ext_dest_id_save, NULL,
+                          ext_dest_id_load, 1, HVMSR_PER_DOM);
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 527cc770b7..7d037a53e1 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -24,6 +24,7 @@
  *  Ported to xen by using virtual IRQ line.
  */
 
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/mm.h>
 #include <xen/xmalloc.h>
@@ -597,6 +598,7 @@ int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
 static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h)
 {
     const HVM_SAVE_TYPE(IOAPIC) *s;
+    unsigned int i;
 
     if ( !has_vioapic(d) )
         return -ENODEV;
@@ -617,6 +619,24 @@ static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h
     if ( s->ioregsel > VIOAPIC_REG_RTE0 + (ARRAY_SIZE(s->redirtbl) - 1) * 2 + 1 )
         return -EINVAL;
 
+    /*
+     * If any RTE uses extended destination ID bits, the EXT_DEST_ID save
+     * record must have been loaded first (restoring d->arch.hvm.ext_dest_id).
+     * The ioreq server re-registration by the DM happens later, so use the
+     * domain-level locked flag rather than the per-server dynamic check.
+     */
+    for ( i = 0; i < ARRAY_SIZE(s->redirtbl); i++ )
+    {
+        if ( s->redirtbl[i].fields.ext_dest_id && !d->arch.hvm.ext_dest_id )
+        {
+            printk(XENLOG_G_ERR "HVM restore: %pd IO-APIC RTE %u has "
+                                "extended destination ID bits set but "
+                                "EXT_DEST_ID is not enabled\n",
+                                d, i);
+            return -EINVAL;
+        }
+    }
+
     return 0;
 }
 
diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
index abf9bc448d..895b2e12ba 100644
--- a/xen/arch/x86/include/asm/hvm/domain.h
+++ b/xen/arch/x86/include/asm/hvm/domain.h
@@ -102,6 +102,15 @@ struct hvm_domain {
 
     bool                   is_s3_suspended;
 
+    /*
+     * True when XEN_HVM_CPUID_EXT_DEST_ID was advertised to the guest. Locked
+     * at domain creation time once every registered ioreq server has opted in
+     * via XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID. Persisted in HVM save/restore so
+     * migration preserves the guest-visible state independently of when the
+     * device model re-registers its ioreq servers on the destination host.
+     */
+    bool                   ext_dest_id;
+
     /* Compatibility setting for a bug in x2APIC LDR */
     bool bug_x2apic_ldr_vcpu_id;
 
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index f5fd30ce12..56a7eb8282 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -641,7 +641,7 @@ static void ioreq_server_deinit(struct ioreq_server *s)
 }
 
 static int ioreq_server_create(struct domain *d, int bufioreq_handling,
-                               ioservid_t *id)
+                               uint8_t flags, ioservid_t *id)
 {
     struct ioreq_server *s;
     unsigned int i;
@@ -683,6 +683,8 @@ static int ioreq_server_create(struct domain *d, int bufioreq_handling,
         goto fail;
     }
 
+    s->ext_dest_id = flags & XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID;
+
     if ( id )
         *id = i;
 
@@ -1350,11 +1352,16 @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op)
         *const_op = false;
 
         rc = -EINVAL;
-        if ( data->pad[0] || data->pad[1] || data->pad[2] )
+        if ( data->flags & ~XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID ||
+             data->pad[0] || data->pad[1] )
+            break;
+
+        rc = arch_ioreq_server_create_check(d, data->flags);
+        if ( rc )
             break;
 
         rc = ioreq_server_create(d, data->handle_bufioreq,
-                                 &data->id);
+                                 data->flags, &data->id);
         break;
     }
 
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index 6fb4f8b7dc..f7f7c02076 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -21,6 +21,7 @@
 #include <xen/event.h>
 #include <xen/iommu.h>
 #include <xen/cpu.h>
+#include <xen/ioreq.h>
 #include <xen/irq.h>
 #include <asm/hvm/irq.h>
 #include <asm/io_apic.h>
@@ -455,6 +456,18 @@ int pt_irq_create_bind(
         uint64_t msi_addr;
         uint32_t msi_data;
 
+        /*
+         * Refuse the old MSI bind path when extended destination IDs are
+         * in use. The caller must use XEN_DMOP_bind_pt_msi_irq instead,
+         * which passes the raw MSI address so Xen can decode the extended
+         * bits. This old path only carries an 8-bit destination ID and
+         * would silently misroute interrupts to vCPUs with APIC IDs > 255.
+         */
+        if ( hvm_ext_dest_id_enabled(d) )
+        {
+            return -EPERM;
+        }
+
         msi_addr = MSI_ADDR_HEADER |
                    MASK_INSR(MASK_EXTR(gflags, XEN_DOMCTL_VMSI_X86_DEST_ID_MASK),
                              MSI_ADDR_DEST_ID_MASK) |
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 483097d940..dd70ce18c6 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -627,12 +627,27 @@ struct hvm_msr {
 
 #define CPU_MSR_CODE  20
 
+/*
+ * HVM_SAVE_TYPE(EXT_DEST_ID): domain-level extended MSI destination ID state.
+ *
+ * Records whether the extended destination ID feature was enabled for this
+ * domain at the time guest vCPUs were started. This allows migration to
+ * preserve the setting across hosts without relying on the device model to
+ * re-register its ioreq servers before the guest's first CPUID query.
+ */
+struct hvm_hw_ext_dest_id {
+    uint8_t enabled;
+    uint8_t pad[7];
+};
+
+DECLARE_HVM_SAVE_TYPE(EXT_DEST_ID, 21, struct hvm_hw_ext_dest_id);
+
 /* Range 22 - 34 (inclusive) reserved for Amazon */
 
 /*
  * Largest type-code in use
  */
-#define HVM_SAVE_CODE_MAX 20
+#define HVM_SAVE_CODE_MAX 21
 
 #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
 
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index 43571b7713..73f33b3c46 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -39,18 +39,28 @@ typedef uint16_t ioservid_t;
  * XEN_DMOP_create_ioreq_server: Instantiate a new IOREQ Server for a
  *                               secondary emulator.
  *
- * The <id> handed back is unique for target domain. The valur of
+ * The <id> handed back is unique for target domain. The value of
  * <handle_bufioreq> should be one of HVM_IOREQSRV_BUFIOREQ_* defined in
- * hvm_op.h. If the value is HVM_IOREQSRV_BUFIOREQ_OFF then  the buffered
+ * hvm_op.h. If the value is HVM_IOREQSRV_BUFIOREQ_OFF then the buffered
  * ioreq ring will not be allocated and hence all emulation requests to
  * this server will be synchronous.
+ *
+ * If <flags> contains XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID, the server will
+ * use XEN_DMOP_bind_pt_msi_irq for all passthrough MSI bindings, passing
+ * raw MSI address/data fields so Xen can decode extended destination ID
+ * bits. Once any server sets this flag, Xen will advertise
+ * XEN_HVM_CPUID_EXT_DEST_ID to the guest. Must be set before the guest
+ * vCPUs are started.
  */
 #define XEN_DMOP_create_ioreq_server 1
 
 struct xen_dm_op_create_ioreq_server {
     /* IN - should server handle buffered ioreqs */
     uint8_t handle_bufioreq;
-    uint8_t pad[3];
+    /* IN - server capability flags */
+    uint8_t flags;
+#define XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID (1u << 0)
+    uint8_t pad[2];
     /* OUT - server id */
     ioservid_t id;
 };
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index e86f0869fa..ec78b63942 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -54,9 +54,35 @@ struct ioreq_server {
     evtchn_port_t          bufioreq_evtchn;
     struct rangeset        *range[NR_IO_RANGE_TYPES];
     bool                   enabled;
+    bool                   ext_dest_id;
     uint8_t                bufioreq_handling;
 };
 
+/*
+ * Return true if every registered ioreq server has opted in to extended
+ * destination IDs (XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID) and at least one
+ * server exists. A single server without the flag is enough to suppress
+ * XEN_HVM_CPUID_EXT_DEST_ID, preventing misrouted interrupts.
+ */
+static inline bool hvm_ext_dest_id_enabled(const struct domain *d)
+{
+    unsigned int i;
+    bool found = false;
+
+    for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
+    {
+        const struct ioreq_server *s = d->ioreq_server.server[i];
+
+        if ( !s )
+            continue;
+        if ( !s->ext_dest_id )
+            return false;
+        found = true;
+    }
+
+    return found;
+}
+
 static inline paddr_t ioreq_mmio_first_byte(const ioreq_t *p)
 {
     return unlikely(p->df) ?
@@ -137,6 +163,7 @@ bool arch_ioreq_server_destroy_all(struct domain *d);
 bool arch_ioreq_server_get_type_addr(const struct domain *d, const ioreq_t *p,
                                      uint8_t *type, uint64_t *addr);
 void arch_ioreq_domain_init(struct domain *d);
+int arch_ioreq_server_create_check(const struct domain *d, uint8_t flags);
 
 #endif /* __XEN_IOREQ_H__ */
 
-- 
2.53.0



--
Julian Vetter | Vates Hypervisor & Kernel Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech

 

