|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/pci: Correct ECS handling with CF8/CFC emulation
On Mon, Apr 03, 2023 at 11:16:52AM +0100, Andrew Cooper wrote:
> On 03/04/2023 9:57 am, Roger Pau Monné wrote:
> > On Fri, Mar 31, 2023 at 06:57:19PM +0100, Andrew Cooper wrote:
> >> This started by noticing the dubious Fam17h check in
> >> arch_ioreq_server_get_type_addr(), and then realising that checking the
> >> host
> >> CF8_EXT setting is utterly bogus for the guest <-> qemu emulation path.
> >>
> >> What should be consulted here is the guest's MSR_AMD64_NB_CFG setting, but
> >> a)
> >> there isn't one, and b) the vestigial remnants of the cross-vendor
> >> migration
> >> logic cause MSR_AMD64_NB_CFG to be unconditionally read-as-zero, making the
> >> CF8_EXT path unused by any suitably-written OS in the first place.
> >>
> >> MSR_AMD64_NB_CFG really has been removed on Fam17h (it's now just a
> >> read-zero,
> >> write-discard stub), and the ECS extension is unconditionally active,
> >> meaning
> >> it is not correct for Xen to ignore the ExtRegNo field on newer AMD CPUs.
> >>
> >> It turns out that Xen did even have this behaviour in 4.5 and earlier, with
> >> this problematic CF8_EXT checking being added in 4.6. Therefore, revert
> >> back
> >> to Xen's older behaviour - it is objectively less wrong than the current
> >> logic.
> >>
> >> While fixing this, get rid of hvm_pci_decode_addr() - it is more
> >> complicated
> >> to follow (and to call) than using the CF8* macros in the calling context.
> >> Rename CF8_ADDR() to CF8_REG() to better describe what it does, and write a
> >> comment explaining all about CF8/CFC accesses.
> >>
> >> There's one rare case when CF8_EXT is visible to guests, and that is for a
> >> pinned hwdom. Right now, we permit such a dom0 to modify the CF8_EXT bit,
> >> but
> >> this seems like a very unwise idea. Leave a TODO for people to consider.
> > One weirdness I've noticed is that for vPCI we decode the accesses
> > taking the extended CF8 bit after this change, but then if the access
> > is relayed to the hardware using vpci_{read,write}_hw it will be
> > forwarded to the hardware using pci_conf_{read,write}<size> which
> > doesn't have support for CF8_EXT. So if the underlying hardware
> > doesn't have MMCFG support and the reg is > 255 it will be dropped.
>
> It is important to stress that this change does not influence whether
> the guest issues ECS accesses or not. All it does is change Xen's
> handling of such accesses.
>
> Previously vPCI blindly ignored ECS accesses, so the vPCI layer
> effectively truncated them to BCS accesses.
>
> Now, from your analysis, when MMCFG isn't active, Xen's PCI layer will
> effectively terminate ECS accesses with default behaviour, even on
> systems where IO ECS is available.
>
> So we've changed one valid behaviour for a different valid behaviour.
Given vPCI is currently limited to PVH dom0 I doubt there are many
systems capable of running PVH dom0 but not having MMCFG support.
Best way to fix would be to implement ECS accesses in
pci_conf_{read,write}<size>() if the register is > 255 and there's no
MMCFG.
>
> (Quick tangent... Our PCI handling is currently very dumb.
> pci_mmcfg_read() returns its value by pointer but the callers never
> check. Swapping it to return by value would improve code gen quite a
> lot. Also, when MMCFG is active we still pass BCS accesses to IO ports.)
I wonder if it's really preferred to access registers below 255 using
the IO ports, as Linux seems to do the same (prefer IO port access if
possible).
> So I think we do want to improve Xen's general behaviour too, but this
> difference here doesn't concern me.
>
> >
> >> Fixes: e0fbf3bf9871 ("x86/AMD: correct certain Fam17 checks")
> >> Fixes: 2d67a7a4d37a ("x86: synchronize PCI config space access decoding")
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >> ---
> >> CC: Jan Beulich <JBeulich@xxxxxxxx>
> >> CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> >> CC: Wei Liu <wl@xxxxxxx>
> >>
> >> Whoever reviewed those two patches originally was clearly a fool...
> >> ---
> >> xen/arch/x86/hvm/io.c | 24 ++++++------------------
> >> xen/arch/x86/hvm/ioreq.c | 19 ++-----------------
> >> xen/arch/x86/include/asm/hvm/io.h | 4 ----
> >> xen/arch/x86/include/asm/pci.h | 26 ++++++++++++++++++++++++--
> >> xen/arch/x86/pv/emul-priv-op.c | 19 ++++++-------------
> >> 5 files changed, 38 insertions(+), 54 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> >> index 5ae209d3b6b3..b0d3c236e985 100644
> >> --- a/xen/arch/x86/hvm/io.c
> >> +++ b/xen/arch/x86/hvm/io.c
> >> @@ -248,20 +248,6 @@ void register_g2m_portio_handler(struct domain *d)
> >> handler->ops = &g2m_portio_ops;
> >> }
> >>
> >> -unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
> >> - pci_sbdf_t *sbdf)
> >> -{
> >> - ASSERT(CF8_ENABLED(cf8));
> >> -
> >> - sbdf->bdf = CF8_BDF(cf8);
> >> - sbdf->seg = 0;
> >> - /*
> >> - * NB: the lower 2 bits of the register address are fetched from the
> >> - * offset into the 0xcfc register when reading/writing to it.
> >> - */
> >> - return CF8_ADDR_LO(cf8) | (addr & 3);
> >> -}
> >> -
> >> /* vPCI config space IO ports handlers (0xcf8/0xcfc). */
> >> static bool cf_check vpci_portio_accept(
> >> const struct hvm_io_handler *handler, const ioreq_t *p)
> >> @@ -275,7 +261,7 @@ static int cf_check vpci_portio_read(
> >> {
> >> const struct domain *d = current->domain;
> >> unsigned int reg;
> >> - pci_sbdf_t sbdf;
> >> + pci_sbdf_t sbdf = {};
> >> uint32_t cf8;
> >>
> >> *data = ~(uint64_t)0;
> >> @@ -292,7 +278,8 @@ static int cf_check vpci_portio_read(
> >> if ( !CF8_ENABLED(cf8) )
> >> return X86EMUL_UNHANDLEABLE;
> >>
> >> - reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
> >> + sbdf.bdf = CF8_BDF(cf8);
> >> + reg = CF8_REG(cf8) | (addr & 3);
> >>
> >> if ( !vpci_access_allowed(reg, size) )
> >> return X86EMUL_OKAY;
> >> @@ -308,7 +295,7 @@ static int cf_check vpci_portio_write(
> >> {
> >> struct domain *d = current->domain;
> >> unsigned int reg;
> >> - pci_sbdf_t sbdf;
> >> + pci_sbdf_t sbdf = {};
> >> uint32_t cf8;
> >>
> >> if ( addr == 0xcf8 )
> >> @@ -323,7 +310,8 @@ static int cf_check vpci_portio_write(
> >> if ( !CF8_ENABLED(cf8) )
> >> return X86EMUL_UNHANDLEABLE;
> >>
> >> - reg = hvm_pci_decode_addr(cf8, addr, &sbdf);
> >> + sbdf.bdf = CF8_BDF(cf8);
> >> + reg = CF8_REG(cf8) | (addr & 3);
> >>
> >> if ( !vpci_access_allowed(reg, size) )
> >> return X86EMUL_OKAY;
> >> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> >> index 0bdcca1e1a5f..325a9d118e52 100644
> >> --- a/xen/arch/x86/hvm/ioreq.c
> >> +++ b/xen/arch/x86/hvm/ioreq.c
> >> @@ -285,27 +285,12 @@ bool arch_ioreq_server_get_type_addr(const struct
> >> domain *d,
> >> (p->addr & ~3) == 0xcfc &&
> >> CF8_ENABLED(cf8) )
> >> {
> >> - unsigned int x86_fam, reg;
> >> - pci_sbdf_t sbdf;
> >> -
> >> - reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf);
> >> + pci_sbdf_t sbdf = { .bdf = CF8_BDF(cf8) };
> >> + unsigned int reg = CF8_REG(cf8) | (p->addr & 3);
> >>
> >> /* PCI config data cycle */
> >> *type = XEN_DMOP_IO_RANGE_PCI;
> >> *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
> >> - /* AMD extended configuration space access? */
> >> - if ( CF8_ADDR_HI(cf8) &&
> >> - d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
> >> - (x86_fam = get_cpu_family(
> >> - d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 &&
> >> - x86_fam < 0x17 )
> >> - {
> >> - uint64_t msr_val;
> >> -
> >> - if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) &&
> >> - (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
> >> - *addr |= CF8_ADDR_HI(cf8);
> >> - }
> >> }
> >> else
> >> {
> >> diff --git a/xen/arch/x86/include/asm/hvm/io.h
> >> b/xen/arch/x86/include/asm/hvm/io.h
> >> index 54e0161b492c..3f3fb6403ccb 100644
> >> --- a/xen/arch/x86/include/asm/hvm/io.h
> >> +++ b/xen/arch/x86/include/asm/hvm/io.h
> >> @@ -144,10 +144,6 @@ void stdvga_deinit(struct domain *d);
> >>
> >> extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
> >>
> >> -/* Decode a PCI port IO access into a bus/slot/func/reg. */
> >> -unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
> >> - pci_sbdf_t *sbdf);
> >> -
> >> /*
> >> * HVM port IO handler that performs forwarding of guest IO ports into
> >> machine
> >> * IO ports.
> >> diff --git a/xen/arch/x86/include/asm/pci.h
> >> b/xen/arch/x86/include/asm/pci.h
> >> index f4a58c8acf13..3b814f4ebacf 100644
> >> --- a/xen/arch/x86/include/asm/pci.h
> >> +++ b/xen/arch/x86/include/asm/pci.h
> >> @@ -3,10 +3,32 @@
> >>
> >> #include <xen/mm.h>
> >>
> >> +/*
> >> + * PCI config space accesses with CF8/CFC:
> >> + *
> >> + * 1) Write {Enable | BDF | Reg} to CF8 to set an address
> >> + * 2) Read or write CF{C..F} to access the register
> >> + *
> >> + * For sub-dword register accesses, the bottom two register address bits
> >> come
> >> + * from the CF{C..F} address, not from CF8.
> >> + *
> >> + * AMD have extention to this protocol to access PCIe Extend Config Space
> >> by
> >> + * storing the 4 extra register address bits in the penultimate nibble of
> >> CF8.
> >> + * This extention:
> >> + * - Is unconditionally active on Fam17h and later
> >> + * - Has model specific enablement on Fam10h thru Fam16h
> >> + * - Has reserved behaviour in all other cases, including other vendors
> >> + *
> >> + * For simplicity and because we are permitted to, given "reserved", Xen
> >> + * always treats ECS as active when emulating guest PCI config space
> >> accesses.
> >> + */
> >> #define CF8_BDF(cf8) ( ((cf8) & 0x00ffff00) >> 8)
> >> -#define CF8_ADDR_LO(cf8) ( (cf8) & 0x000000fc)
> >> -#define CF8_ADDR_HI(cf8) ( ((cf8) & 0x0f000000) >> 16)
> >> #define CF8_ENABLED(cf8) (!!((cf8) & 0x80000000))
> >> +#define CF8_REG(cf8) \
> >> + ({ \
> >> + unsigned int _c = cf8; \
> >> + ((_c & 0x0f000000) >> 16) | (_c & 0xfc); \
> >> + })
> > What happens on Intel when the bit is set, is it just ignored?
>
> "the bit" => the ECS nibble, or the CF8_EXT bit?
The ECS nibble.
>
> The ECS nibble is ignored on Intel AFAICT, while the CF8_EXT bit is in a
> very AMD-only MSR, so won't exist on Intel.
>
> > We only allow such accesses for dom0 anyway.
>
> And guests running on AMD hardware where CF8_EXT is active on the
> northbridge of the core we are instantaneously scheduled on.
>
> >>
> >> #define IS_SNB_GFX(id) (id == 0x01068086 || id == 0x01168086 \
> >> || id == 0x01268086 || id == 0x01028086 \
> >> diff --git a/xen/arch/x86/pv/emul-priv-op.c
> >> b/xen/arch/x86/pv/emul-priv-op.c
> >> index 5da00e24e4ff..008367195c78 100644
> >> --- a/xen/arch/x86/pv/emul-priv-op.c
> >> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >> @@ -245,19 +245,7 @@ static bool pci_cfg_ok(struct domain *currd, unsigned
> >> int start,
> >> if ( ro_map && test_bit(machine_bdf, ro_map) )
> >> return false;
> >> }
> >> - start |= CF8_ADDR_LO(currd->arch.pci_cf8);
> >> - /* AMD extended configuration space access? */
> >> - if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
> >> - boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
> >> - boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 < 0x17 )
> >> - {
> >> - uint64_t msr_val;
> >> -
> >> - if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
> >> - return false;
> >> - if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
> >> - start |= CF8_ADDR_HI(currd->arch.pci_cf8);
> >> - }
> >> + start |= CF8_REG(currd->arch.pci_cf8);
> >>
> >> return !write ?
> >> xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
> >> @@ -1104,6 +1092,11 @@ static int cf_check write_msr(
> >> if ( !is_hwdom_pinned_vcpu(curr) )
> >> return X86EMUL_OKAY;
> >> if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) ||
> >> + /*
> >> + * TODO: this is broken. What happens when dom0 is pinned
> >> but
> >> + * can't see the full system? CF8_EXT probably ought to be a
> >> + * Xen-owned setting, and made symmetric across the system.
> >> + */
> > I would assume CF8_EXT would be symmetric across the system, specially
> > given that it seems to be phased out and only used in older AMD
> > families that where all symmetric?
>
> The CF8_EXT bit has been phased out. The IO ECS functionality still exists.
>
> But yes, the more I think about letting dom0 play with this, the more I
> think it is a fundamentally broken idea... I bet it was from the very
> early AMD Fam10h days where dom0 knew how to turn it on, and Xen was
> trying to pretend it didn't have to touch any PCI devices.
It seems to me Xen should set CF8_EXT on all threads (when available)
and expose it to dom0, so that accesses using pci_{conf,write}_read()
work as expected?
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |