[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/pci: Correct ECS handling with CF8/CFC emulation
On Fri, Mar 31, 2023 at 06:57:19PM +0100, Andrew Cooper wrote: > This started by noticing the dubious Fam17h check in > arch_ioreq_server_get_type_addr(), and then realising that checking the host > CF8_EXT setting is utterly bogus for the guest <-> qemu emulation path. > > What should be consulted here is the guest's MSR_AMD64_NB_CFG setting, but a) > there isn't one, and b) the vestigial remnants of the cross-vendor migration > logic cause MSR_AMD64_NB_CFG to be unconditionally read-as-zero, making the > CF8_EXT path unused by any suitably-written OS in the first place. > > MSR_AMD64_NB_CFG really has been removed on Fam17h (it's now just a read-zero, > write-discard stub), and the ECS extension is unconditionally active, meaning > it is not correct for Xen to ignore the ExtRegNo field on newer AMD CPUs. > > It turns out that Xen did even have this behaviour in 4.5 and earlier, with > this problematic CF8_EXT checking being added in 4.6. Therefore, revert back > to Xen's older behaviour - it is objectively less wrong than the current > logic. > > While fixing this, get rid of hvm_pci_decode_addr() - it is more complicated > to follow (and to call) than using the CF8* macros in the calling context. > Rename CF8_ADDR() to CF8_REG() to better describe what it does, and write a > comment explaining all about CF8/CFC accesses. > > There's one rare case when CF8_EXT is visible to guests, and that is for a > pinned hwdom. Right now, we permit such a dom0 to modify the CF8_EXT bit, but > this seems like a very unwise idea. Leave a TODO for people to consider. One weirdness I've noticed is that for vPCI we decode the accesses taking the extended CF8 bit after this change, but then if the access is relayed to the hardware using vpci_{read,write}_hw it will be forwarded to the hardware using pci_conf_{read,write}<size> which doesn't have support for CF8_EXT. So if the underlying hardware doesn't have MMCFG support and the reg is > 255 it will be dropped. > Fixes: e0fbf3bf9871 ("x86/AMD: correct certain Fam17 checks") > Fixes: 2d67a7a4d37a ("x86: synchronize PCI config space access decoding") > Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > --- > CC: Jan Beulich <JBeulich@xxxxxxxx> > CC: Roger Pau Monné <roger.pau@xxxxxxxxxx> > CC: Wei Liu <wl@xxxxxxx> > > Whoever reviewed those two patches originally was clearly a fool... > --- > xen/arch/x86/hvm/io.c | 24 ++++++------------------ > xen/arch/x86/hvm/ioreq.c | 19 ++----------------- > xen/arch/x86/include/asm/hvm/io.h | 4 ---- > xen/arch/x86/include/asm/pci.h | 26 ++++++++++++++++++++++++-- > xen/arch/x86/pv/emul-priv-op.c | 19 ++++++------------- > 5 files changed, 38 insertions(+), 54 deletions(-) > > diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c > index 5ae209d3b6b3..b0d3c236e985 100644 > --- a/xen/arch/x86/hvm/io.c > +++ b/xen/arch/x86/hvm/io.c > @@ -248,20 +248,6 @@ void register_g2m_portio_handler(struct domain *d) > handler->ops = &g2m_portio_ops; > } > > -unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr, > - pci_sbdf_t *sbdf) > -{ > - ASSERT(CF8_ENABLED(cf8)); > - > - sbdf->bdf = CF8_BDF(cf8); > - sbdf->seg = 0; > - /* > - * NB: the lower 2 bits of the register address are fetched from the > - * offset into the 0xcfc register when reading/writing to it. > - */ > - return CF8_ADDR_LO(cf8) | (addr & 3); > -} > - > /* vPCI config space IO ports handlers (0xcf8/0xcfc). */ > static bool cf_check vpci_portio_accept( > const struct hvm_io_handler *handler, const ioreq_t *p) > @@ -275,7 +261,7 @@ static int cf_check vpci_portio_read( > { > const struct domain *d = current->domain; > unsigned int reg; > - pci_sbdf_t sbdf; > + pci_sbdf_t sbdf = {}; > uint32_t cf8; > > *data = ~(uint64_t)0; > @@ -292,7 +278,8 @@ static int cf_check vpci_portio_read( > if ( !CF8_ENABLED(cf8) ) > return X86EMUL_UNHANDLEABLE; > > - reg = hvm_pci_decode_addr(cf8, addr, &sbdf); > + sbdf.bdf = CF8_BDF(cf8); > + reg = CF8_REG(cf8) | (addr & 3); > > if ( !vpci_access_allowed(reg, size) ) > return X86EMUL_OKAY; > @@ -308,7 +295,7 @@ static int cf_check vpci_portio_write( > { > struct domain *d = current->domain; > unsigned int reg; > - pci_sbdf_t sbdf; > + pci_sbdf_t sbdf = {}; > uint32_t cf8; > > if ( addr == 0xcf8 ) > @@ -323,7 +310,8 @@ static int cf_check vpci_portio_write( > if ( !CF8_ENABLED(cf8) ) > return X86EMUL_UNHANDLEABLE; > > - reg = hvm_pci_decode_addr(cf8, addr, &sbdf); > + sbdf.bdf = CF8_BDF(cf8); > + reg = CF8_REG(cf8) | (addr & 3); > > if ( !vpci_access_allowed(reg, size) ) > return X86EMUL_OKAY; > diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c > index 0bdcca1e1a5f..325a9d118e52 100644 > --- a/xen/arch/x86/hvm/ioreq.c > +++ b/xen/arch/x86/hvm/ioreq.c > @@ -285,27 +285,12 @@ bool arch_ioreq_server_get_type_addr(const struct > domain *d, > (p->addr & ~3) == 0xcfc && > CF8_ENABLED(cf8) ) > { > - unsigned int x86_fam, reg; > - pci_sbdf_t sbdf; > - > - reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf); > + pci_sbdf_t sbdf = { .bdf = CF8_BDF(cf8) }; > + unsigned int reg = CF8_REG(cf8) | (p->addr & 3); > > /* PCI config data cycle */ > *type = XEN_DMOP_IO_RANGE_PCI; > *addr = ((uint64_t)sbdf.sbdf << 32) | reg; > - /* AMD extended configuration space access? */ > - if ( CF8_ADDR_HI(cf8) && > - d->arch.cpuid->x86_vendor == X86_VENDOR_AMD && > - (x86_fam = get_cpu_family( > - d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 && > - x86_fam < 0x17 ) > - { > - uint64_t msr_val; > - > - if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) && > - (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) ) > - *addr |= CF8_ADDR_HI(cf8); > - } > } > else > { > diff --git a/xen/arch/x86/include/asm/hvm/io.h > b/xen/arch/x86/include/asm/hvm/io.h > index 54e0161b492c..3f3fb6403ccb 100644 > --- a/xen/arch/x86/include/asm/hvm/io.h > +++ b/xen/arch/x86/include/asm/hvm/io.h > @@ -144,10 +144,6 @@ void stdvga_deinit(struct domain *d); > > extern void hvm_dpci_msi_eoi(struct domain *d, int vector); > > -/* Decode a PCI port IO access into a bus/slot/func/reg. */ > -unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr, > - pci_sbdf_t *sbdf); > - > /* > * HVM port IO handler that performs forwarding of guest IO ports into > machine > * IO ports. > diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h > index f4a58c8acf13..3b814f4ebacf 100644 > --- a/xen/arch/x86/include/asm/pci.h > +++ b/xen/arch/x86/include/asm/pci.h > @@ -3,10 +3,32 @@ > > #include <xen/mm.h> > > +/* > + * PCI config space accesses with CF8/CFC: > + * > + * 1) Write {Enable | BDF | Reg} to CF8 to set an address > + * 2) Read or write CF{C..F} to access the register > + * > + * For sub-dword register accesses, the bottom two register address bits come > + * from the CF{C..F} address, not from CF8. > + * > + * AMD have extention to this protocol to access PCIe Extend Config Space by > + * storing the 4 extra register address bits in the penultimate nibble of > CF8. > + * This extention: > + * - Is unconditionally active on Fam17h and later > + * - Has model specific enablement on Fam10h thru Fam16h > + * - Has reserved behaviour in all other cases, including other vendors > + * > + * For simplicity and because we are permitted to, given "reserved", Xen > + * always treats ECS as active when emulating guest PCI config space > accesses. > + */ > #define CF8_BDF(cf8) ( ((cf8) & 0x00ffff00) >> 8) > -#define CF8_ADDR_LO(cf8) ( (cf8) & 0x000000fc) > -#define CF8_ADDR_HI(cf8) ( ((cf8) & 0x0f000000) >> 16) > #define CF8_ENABLED(cf8) (!!((cf8) & 0x80000000)) > +#define CF8_REG(cf8) \ > + ({ \ > + unsigned int _c = cf8; \ > + ((_c & 0x0f000000) >> 16) | (_c & 0xfc); \ > + }) What happens on Intel when the bit is set, is it just ignored? We only allow such accesses for dom0 anyway. > > #define IS_SNB_GFX(id) (id == 0x01068086 || id == 0x01168086 \ > || id == 0x01268086 || id == 0x01028086 \ > diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c > index 5da00e24e4ff..008367195c78 100644 > --- a/xen/arch/x86/pv/emul-priv-op.c > +++ b/xen/arch/x86/pv/emul-priv-op.c > @@ -245,19 +245,7 @@ static bool pci_cfg_ok(struct domain *currd, unsigned > int start, > if ( ro_map && test_bit(machine_bdf, ro_map) ) > return false; > } > - start |= CF8_ADDR_LO(currd->arch.pci_cf8); > - /* AMD extended configuration space access? */ > - if ( CF8_ADDR_HI(currd->arch.pci_cf8) && > - boot_cpu_data.x86_vendor == X86_VENDOR_AMD && > - boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 < 0x17 ) > - { > - uint64_t msr_val; > - > - if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) ) > - return false; > - if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) ) > - start |= CF8_ADDR_HI(currd->arch.pci_cf8); > - } > + start |= CF8_REG(currd->arch.pci_cf8); > > return !write ? > xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf, > @@ -1104,6 +1092,11 @@ static int cf_check write_msr( > if ( !is_hwdom_pinned_vcpu(curr) ) > return X86EMUL_OKAY; > if ( (rdmsr_safe(MSR_AMD64_NB_CFG, temp) != 0) || > + /* > + * TODO: this is broken. What happens when dom0 is pinned but > + * can't see the full system? CF8_EXT probably ought to be a > + * Xen-owned setting, and made symmetric across the system. > + */ I would assume CF8_EXT would be symmetric across the system, specially given that it seems to be phased out and only used in older AMD families that where all symmetric? Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |