Re: [PATCH v4 8/9] hvm/ioreq: Negotiate extended destination ID support per ioreq server
On 27/04/2026 at 15:57, Julian Vetter wrote:
> Add a per-server capability flag in XEN_DMOP_create_ioreq_server to
> signal extended destination ID support. Repurpose the first byte of the
> existing pad[3] as a flags field, and define
> XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID (bit 0) for a server to signal it will
> use XEN_DMOP_bind_pt_msi_irq for all passthrough MSI bindings.
>
> Track the flag in struct ioreq_server ext_dest_id.
> hvm_ext_dest_id_enabled() returns true only if all registered ioreq
> servers have opted in and at least one server is present. A single
> server without the flag is sufficient to suppress the feature.
>
> Lock the feature at domain creation time:
> arch_domain_creation_finished() computes the levelled result into struct
> hvm_domain.ext_dest_id using OR to preserve any value previously
> restored from an HVM save record. After creation_finished,
> arch_ioreq_server_create_check() rejects new servers that lack
> XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID if the feature was already advertised
> to the guest.
>
> Persist the locked state in a new HVM_SAVE_TYPE(EXT_DEST_ID) record so
> that migration preserves the guest-visible CPUID bit independently of
> when the device model re-registers its ioreq servers on the destination
> host.
>
> On restore, ioapic_check() uses d->arch.hvm.ext_dest_id (restored from
> the EXT_DEST_ID record) rather than the per-server dynamic check, since
> the DM has not yet re-registered its servers at that point.
>
> Update xendevicemodel_create_ioreq_server() in libxendevicemodel to
> accept the new flags parameter, remove
> xendevicemodel_enable_ext_dest_id(), and fix the
> xc_hvm_create_ioreq_server() compat wrapper to pass zero flags.
>
> Signed-off-by: Julian Vetter <julian.vetter@xxxxxxxxxx>
This has already been discussed to some extent, but AFAIU extended
destination ID is only meaningful when guest APIC IDs cannot be
represented with the "non-extended" model, which in practice only
happens when the guest has more than 128 vCPUs.
I don't think we need to check for device model support unless the guest
can have more than 128 vCPUs, in which case it becomes mandatory
(unless some form of interrupt remapping is implemented).
So I would rather check if domain->max_vcpus is more than 128 and
require device models to implement support for extended destination ID
in these cases.
In effect, that would mean extended destination ID is only exposed to
guests with domain->max_vcpus > 128.
Overall, what I propose would be to keep the new
XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID flag, and if d->max_vcpus > 128, we
require the device model to support XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID.
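Roughly what I have in mind, as a standalone sketch (not actual Xen code: the struct, the 128 threshold taken from the discussion above, and the error value are all simplified stand-ins):

```c
#include <stdint.h>

/* Stand-in for the flag defined in xen/include/public/hvm/dm_op.h. */
#define XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID (1u << 0)

/* Minimal stand-in for struct domain; only the field used here. */
struct domain {
    unsigned int max_vcpus;
};

/*
 * Proposed policy: only require (and only expose) extended destination
 * ID when guest APIC IDs cannot fit the legacy 8-bit destination field,
 * i.e. when the guest has more than 128 vCPUs. Small guests need no
 * device model opt-in at all.
 */
static int ioreq_server_check_ext_dest_id(const struct domain *d,
                                          uint8_t flags)
{
    if ( d->max_vcpus > 128 &&
         !(flags & XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID) )
        return -1; /* would be -EOPNOTSUPP or similar in real code */

    return 0;
}
```

That way the flag keeps its meaning as a capability declaration, but the levelling/locking machinery across servers becomes unnecessary.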
> ---
> Changes in v4:
> - As suggested by Roger, replaced XEN_DMOP_enable_ext_dest_id (v3 patch
> 6), a separate DM op the device model had to call before starting
> vCPUs, with a flags byte repurposed from the existing pad[3] field of
> xen_dm_op_create_ioreq_server
> - New XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID flag (bit 0) lets each ioreq
> server signal support at registration time
> - As suggested by Roger level the feature across all ioreq servers.
> XEN_HVM_CPUID_EXT_DEST_ID is only advertised when every server
> registered before arch_domain_creation_finished() sets the flag. A
> single server without the flag suppresses the feature for the whole
> domain!
> - Lock the levelled result at domain creation time and enforce it for
> servers registered afterwards, preventing a late opt-out from breaking
> guests that already see the feature in CPUID
> - Persist the locked flag via HVM_SAVE_TYPE(EXT_DEST_ID) so that live
> migration preserves the guest-visible CPUID bit independently of when
> the device model registers its ioreq servers on the destination host
> ---
> tools/include/xendevicemodel.h | 3 +-
> tools/libs/ctrl/xc_devicemodel_compat.c | 2 +-
> tools/libs/devicemodel/core.c | 3 +-
> xen/arch/arm/ioreq.c | 5 ++++
> xen/arch/x86/domain.c | 10 +++++++
> xen/arch/x86/hvm/ioreq.c | 37 +++++++++++++++++++++++++
> xen/arch/x86/hvm/vioapic.c | 20 +++++++++++++
> xen/arch/x86/include/asm/hvm/domain.h | 9 ++++++
> xen/common/ioreq.c | 13 +++++++--
> xen/drivers/passthrough/x86/hvm.c | 13 +++++++++
> xen/include/public/arch-x86/hvm/save.h | 17 +++++++++++-
> xen/include/public/hvm/dm_op.h | 16 +++++++++--
> xen/include/xen/ioreq.h | 27 ++++++++++++++++++
> 13 files changed, 165 insertions(+), 10 deletions(-)
>
> diff --git a/tools/include/xendevicemodel.h b/tools/include/xendevicemodel.h
> index f15b35fa33..bc6764bd58 100644
> --- a/tools/include/xendevicemodel.h
> +++ b/tools/include/xendevicemodel.h
> @@ -44,12 +44,13 @@ int xendevicemodel_close(xendevicemodel_handle *dmod);
> * @parm domid the domain id to be serviced
> * @parm handle_bufioreq how should the IOREQ Server handle buffered
> * requests (HVM_IOREQSRV_BUFIOREQ_*)?
> + * @parm flags bitmask of XEN_DMOP_IOREQ_SERVER_* capability flags.
> * @parm id pointer to an ioservid_t to receive the IOREQ Server id.
> * @return 0 on success, -1 on failure.
> */
> int xendevicemodel_create_ioreq_server(
> xendevicemodel_handle *dmod, domid_t domid, int handle_bufioreq,
> - ioservid_t *id);
> + uint8_t flags, ioservid_t *id);
>
> /**
> * This function retrieves the necessary information to allow an
> diff --git a/tools/libs/ctrl/xc_devicemodel_compat.c b/tools/libs/ctrl/xc_devicemodel_compat.c
> index a46011cd17..91366e250c 100644
> --- a/tools/libs/ctrl/xc_devicemodel_compat.c
> +++ b/tools/libs/ctrl/xc_devicemodel_compat.c
> @@ -11,7 +11,7 @@ int xc_hvm_create_ioreq_server(
> ioservid_t *id)
> {
> return xendevicemodel_create_ioreq_server(xch->dmod, domid,
> - handle_bufioreq, id);
> + handle_bufioreq, 0, id);
> }
>
> int xc_hvm_get_ioreq_server_info(
> diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
> index adf2c41a96..49b9bf8a13 100644
> --- a/tools/libs/devicemodel/core.c
> +++ b/tools/libs/devicemodel/core.c
> @@ -167,7 +167,7 @@ static int xendevicemodel_op(
>
> int xendevicemodel_create_ioreq_server(
> xendevicemodel_handle *dmod, domid_t domid, int handle_bufioreq,
> - ioservid_t *id)
> + uint8_t flags, ioservid_t *id)
> {
> struct xen_dm_op op;
> struct xen_dm_op_create_ioreq_server *data;
> @@ -179,6 +179,7 @@ int xendevicemodel_create_ioreq_server(
> data = &op.u.create_ioreq_server;
>
> data->handle_bufioreq = handle_bufioreq;
> + data->flags = flags;
>
> rc = xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> if (rc)
> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> index b4211f0159..d45228717a 100644
> --- a/xen/arch/arm/ioreq.c
> +++ b/xen/arch/arm/ioreq.c
> @@ -201,6 +201,11 @@ void arch_ioreq_domain_init(struct domain *d)
> {
> }
>
> +int arch_ioreq_server_create_check(const struct domain *d, uint8_t flags)
> +{
> + return 0;
> +}
> +
> /*
> * Local variables:
> * mode: C
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 1d458f1372..68ff315460 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -25,6 +25,7 @@
> #include <xen/init.h>
> #include <xen/iocap.h>
> #include <xen/iommu.h>
> +#include <xen/ioreq.h>
> #include <xen/irq.h>
> #include <xen/kernel.h>
> #include <xen/lib.h>
> @@ -1106,7 +1107,16 @@ int arch_domain_soft_reset(struct domain *d)
> void arch_domain_creation_finished(struct domain *d)
> {
> if ( is_hvm_domain(d) )
> + {
> + /*
> + * Lock the extended destination ID state. OR preserves any value
> + * already restored from an HVM save record (migration path). For a
> + * fresh domain, ext_dest_id starts false and the dynamic check
> + * supplies the levelled result across all registered ioreq servers.
> + */
> + d->arch.hvm.ext_dest_id |= hvm_ext_dest_id_enabled(d);
> hvm_domain_creation_finished(d);
> + }
> }
>
> #ifdef CONFIG_COMPAT
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index a5fa97e149..894a63c522 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -19,6 +19,7 @@
>
> #include <asm/hvm/emulate.h>
> #include <asm/hvm/hvm.h>
> +#include <asm/hvm/support.h>
> #include <asm/hvm/vmx/vmx.h>
> #include <asm/msr.h>
>
> @@ -325,6 +326,42 @@ void arch_ioreq_domain_init(struct domain *d)
> register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
> }
>
> +int arch_ioreq_server_create_check(const struct domain *d, uint8_t flags)
> +{
> + if ( !is_hvm_domain(d) || !d->creation_finished )
> + return 0;
> +
> + if ( d->arch.hvm.ext_dest_id &&
> + !(flags & XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID) )
> + return -EPERM;
> +
> + return 0;
> +}
> +
> +static int cf_check ext_dest_id_save(struct vcpu *v, hvm_domain_context_t *h)
> +{
> + struct hvm_hw_ext_dest_id s = {
> + .enabled = v->domain->arch.hvm.ext_dest_id,
> + };
> +
> + return hvm_save_entry(EXT_DEST_ID, 0, h, &s);
> +}
> +
> +static int cf_check ext_dest_id_load(struct domain *d, hvm_domain_context_t *h)
> +{
> + struct hvm_hw_ext_dest_id s;
> +
> + if ( hvm_load_entry(EXT_DEST_ID, h, &s) )
> + return -EINVAL;
> +
> + d->arch.hvm.ext_dest_id = s.enabled;
> +
> + return 0;
> +}
> +
> +HVM_REGISTER_SAVE_RESTORE(EXT_DEST_ID, ext_dest_id_save, NULL,
> + ext_dest_id_load, 1, HVMSR_PER_DOM);
> +
> /*
> * Local variables:
> * mode: C
> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
> index 527cc770b7..7d037a53e1 100644
> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -24,6 +24,7 @@
> * Ported to xen by using virtual IRQ line.
> */
>
> +#include <xen/ioreq.h>
> #include <xen/types.h>
> #include <xen/mm.h>
> #include <xen/xmalloc.h>
> @@ -597,6 +598,7 @@ int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
> static int cf_check ioapic_check(const struct domain *d,
> hvm_domain_context_t *h)
> {
> const HVM_SAVE_TYPE(IOAPIC) *s;
> + unsigned int i;
>
> if ( !has_vioapic(d) )
> return -ENODEV;
> @@ -617,6 +619,24 @@ static int cf_check ioapic_check(const struct domain *d, hvm_domain_context_t *h
> if ( s->ioregsel > VIOAPIC_REG_RTE0 + (ARRAY_SIZE(s->redirtbl) - 1) * 2 + 1 )
> return -EINVAL;
>
> + /*
> + * If any RTE uses extended destination ID bits, the EXT_DEST_ID save
> + * record must have been loaded first (restoring d->arch.hvm.ext_dest_id).
> + * The ioreq server re-registration by the DM happens later, so use the
> + * domain-level locked flag rather than the per-server dynamic check.
> + */
> + for ( i = 0; i < ARRAY_SIZE(s->redirtbl); i++ )
> + {
> + if ( s->redirtbl[i].fields.ext_dest_id && !d->arch.hvm.ext_dest_id )
> + {
> + printk(XENLOG_G_ERR "HVM restore: %pd IO-APIC RTE %u has "
> + "extended destination ID bits set but "
> + "EXT_DEST_ID is not enabled\n",
> + d, i);
> + return -EINVAL;
> + }
> + }
> +
> return 0;
> }
>
> diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
> index abf9bc448d..895b2e12ba 100644
> --- a/xen/arch/x86/include/asm/hvm/domain.h
> +++ b/xen/arch/x86/include/asm/hvm/domain.h
> @@ -102,6 +102,15 @@ struct hvm_domain {
>
> bool is_s3_suspended;
>
> + /*
> + * True when XEN_HVM_CPUID_EXT_DEST_ID was advertised to the guest. Locked
> + * at domain creation time once every registered ioreq server has opted in
> + * via XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID. Persisted in HVM save/restore so
> + * migration preserves the guest-visible state independently of when the
> + * device model re-registers its ioreq servers on the destination host.
> + */
> + bool ext_dest_id;
> +
> /* Compatibility setting for a bug in x2APIC LDR */
> bool bug_x2apic_ldr_vcpu_id;
>
> diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
> index f5fd30ce12..56a7eb8282 100644
> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -641,7 +641,7 @@ static void ioreq_server_deinit(struct ioreq_server *s)
> }
>
> static int ioreq_server_create(struct domain *d, int bufioreq_handling,
> - ioservid_t *id)
> + uint8_t flags, ioservid_t *id)
> {
> struct ioreq_server *s;
> unsigned int i;
> @@ -683,6 +683,8 @@ static int ioreq_server_create(struct domain *d, int bufioreq_handling,
> goto fail;
> }
>
> + s->ext_dest_id = flags & XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID;
> +
> if ( id )
> *id = i;
>
> @@ -1350,11 +1352,16 @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op)
> *const_op = false;
>
> rc = -EINVAL;
> - if ( data->pad[0] || data->pad[1] || data->pad[2] )
> + if ( data->flags & ~XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID ||
> + data->pad[0] || data->pad[1] )
> + break;
> +
> + rc = arch_ioreq_server_create_check(d, data->flags);
> + if ( rc )
> break;
>
> rc = ioreq_server_create(d, data->handle_bufioreq,
> - &data->id);
> + data->flags, &data->id);
> break;
> }
>
> diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
> index 6fb4f8b7dc..f7f7c02076 100644
> --- a/xen/drivers/passthrough/x86/hvm.c
> +++ b/xen/drivers/passthrough/x86/hvm.c
> @@ -21,6 +21,7 @@
> #include <xen/event.h>
> #include <xen/iommu.h>
> #include <xen/cpu.h>
> +#include <xen/ioreq.h>
> #include <xen/irq.h>
> #include <asm/hvm/irq.h>
> #include <asm/io_apic.h>
> @@ -455,6 +456,18 @@ int pt_irq_create_bind(
> uint64_t msi_addr;
> uint32_t msi_data;
>
> + /*
> + * Refuse the old MSI bind path when extended destination IDs are
> + * in use. The caller must use XEN_DMOP_bind_pt_msi_irq instead,
> + * which passes the raw MSI address so Xen can decode the extended
> + * bits. This old path only carries an 8-bit destination ID and
> + * would silently misroute interrupts to vCPUs with APIC IDs > 255.
> + */
> + if ( hvm_ext_dest_id_enabled(d) )
> + {
> + return -EPERM;
> + }
> +
> msi_addr = MSI_ADDR_HEADER |
> MASK_INSR(MASK_EXTR(gflags,
> XEN_DOMCTL_VMSI_X86_DEST_ID_MASK),
> MSI_ADDR_DEST_ID_MASK) |
> diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
> index 483097d940..dd70ce18c6 100644
> --- a/xen/include/public/arch-x86/hvm/save.h
> +++ b/xen/include/public/arch-x86/hvm/save.h
> @@ -627,12 +627,27 @@ struct hvm_msr {
>
> #define CPU_MSR_CODE 20
>
> +/*
> + * HVM_SAVE_TYPE(EXT_DEST_ID): domain-level extended MSI destination ID state.
> + *
> + * Records whether the extended destination ID feature was enabled for this
> + * domain at the time guest vCPUs were started. This allows migration to
> + * preserve the setting across hosts without relying on the device model to
> + * re-register its ioreq servers before the guest's first CPUID query.
> + */
> +struct hvm_hw_ext_dest_id {
> + uint8_t enabled;
> + uint8_t pad[7];
> +};
> +
> +DECLARE_HVM_SAVE_TYPE(EXT_DEST_ID, 21, struct hvm_hw_ext_dest_id);
> +
> /* Range 22 - 34 (inclusive) reserved for Amazon */
>
> /*
> * Largest type-code in use
> */
> -#define HVM_SAVE_CODE_MAX 20
> +#define HVM_SAVE_CODE_MAX 21
>
> #endif /* __XEN_PUBLIC_HVM_SAVE_X86_H__ */
>
> diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
> index 43571b7713..73f33b3c46 100644
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -39,18 +39,28 @@ typedef uint16_t ioservid_t;
> * XEN_DMOP_create_ioreq_server: Instantiate a new IOREQ Server for a
> * secondary emulator.
> *
> - * The <id> handed back is unique for target domain. The valur of
> + * The <id> handed back is unique for target domain. The value of
> * <handle_bufioreq> should be one of HVM_IOREQSRV_BUFIOREQ_* defined in
> - * hvm_op.h. If the value is HVM_IOREQSRV_BUFIOREQ_OFF then the buffered
> + * hvm_op.h. If the value is HVM_IOREQSRV_BUFIOREQ_OFF then the buffered
> * ioreq ring will not be allocated and hence all emulation requests to
> * this server will be synchronous.
> + *
> + * If <flags> contains XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID, the server will
> + * use XEN_DMOP_bind_pt_msi_irq for all passthrough MSI bindings, passing
> + * raw MSI address/data fields so Xen can decode extended destination ID
> + * bits. Once any server sets this flag, Xen will advertise
> + * XEN_HVM_CPUID_EXT_DEST_ID to the guest. Must be set before the guest
> + * vCPUs are started.
> */
> #define XEN_DMOP_create_ioreq_server 1
>
> struct xen_dm_op_create_ioreq_server {
> /* IN - should server handle buffered ioreqs */
> uint8_t handle_bufioreq;
> - uint8_t pad[3];
> + /* IN - server capability flags */
> + uint8_t flags;
> +#define XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID (1u << 0)
> + uint8_t pad[2];
> /* OUT - server id */
> ioservid_t id;
> };
> diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
> index e86f0869fa..ec78b63942 100644
> --- a/xen/include/xen/ioreq.h
> +++ b/xen/include/xen/ioreq.h
> @@ -54,9 +54,35 @@ struct ioreq_server {
> evtchn_port_t bufioreq_evtchn;
> struct rangeset *range[NR_IO_RANGE_TYPES];
> bool enabled;
> + bool ext_dest_id;
> uint8_t bufioreq_handling;
> };
>
> +/*
> + * Return true if every registered ioreq server has opted in to extended
> + * destination IDs (XEN_DMOP_IOREQ_SERVER_EXT_DEST_ID) and at least one
> + * server exists. A single server without the flag is enough to suppress
> + * XEN_HVM_CPUID_EXT_DEST_ID, preventing misrouted interrupts.
> + */
> +static inline bool hvm_ext_dest_id_enabled(const struct domain *d)
> +{
> + unsigned int i;
> + bool found = false;
> +
> + for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
> + {
> + const struct ioreq_server *s = d->ioreq_server.server[i];
> +
> + if ( !s )
> + continue;
> + if ( !s->ext_dest_id )
> + return false;
> + found = true;
> + }
> +
> + return found;
> +}
> +
> static inline paddr_t ioreq_mmio_first_byte(const ioreq_t *p)
> {
> return unlikely(p->df) ?
> @@ -137,6 +163,7 @@ bool arch_ioreq_server_destroy_all(struct domain *d);
> bool arch_ioreq_server_get_type_addr(const struct domain *d, const ioreq_t *p,
> uint8_t *type, uint64_t *addr);
> void arch_ioreq_domain_init(struct domain *d);
> +int arch_ioreq_server_create_check(const struct domain *d, uint8_t flags);
>
> #endif /* __XEN_IOREQ_H__ */
>
Teddy
--
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech