[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v2] x86/PCI: Prefer MMIO over PIO on all hypervisor
On 29.09.22 07:36, Ajay Kaher wrote: On 13/09/22, 7:05 PM, "Vitaly Kuznetsov" <vkuznets@xxxxxxxxxx> wrote:Thanks Vitaly for your response. 1. we have multiple objects of struct pci_raw_ops, 2. adding 'priority' field to struct pci_raw_ops doesn't seems to be appropriate as need to take decision which object of struct pci_raw_ops has to be used, not something with-in struct pci_raw_ops.I'm not sure I follow, you have two instances of 'struct pci_raw_ops' which are called 'raw_pci_ops' and 'raw_pci_ext_ops'. What if you do something like (completely untested): diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h index 70533fdcbf02..fb8270fa6c78 100644 --- a/arch/x86/include/asm/pci_x86.h +++ b/arch/x86/include/asm/pci_x86.h @@ -116,6 +116,7 @@ extern void (*pcibios_disable_irq)(struct pci_dev *dev); extern bool mp_should_keep_irq(struct device *dev); struct pci_raw_ops { + int rating; int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn, int reg, int len, u32 *val); int (*write)(unsigned int domain, unsigned int bus, unsigned int devfn, diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c index ddb798603201..e9965fd11576 100644 --- a/arch/x86/pci/common.c +++ b/arch/x86/pci/common.c @@ -40,7 +40,8 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops; int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn, int reg, int len, u32 *val) { - if (domain == 0 && reg < 256 && raw_pci_ops) + if (domain == 0 && reg < 256 && raw_pci_ops && + (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= raw_pci_ops->rating)) return raw_pci_ops->read(domain, bus, devfn, reg, len, val); if (raw_pci_ext_ops) return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val); @@ -50,7 +51,8 @@ int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn, int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn, int reg, int len, u32 val) { - if (domain == 0 && reg < 256 && raw_pci_ops) + if (domain == 0 && reg < 256 && raw_pci_ops && + (!raw_pci_ext_ops || raw_pci_ext_ops->rating <= raw_pci_ops->rating)) return raw_pci_ops->write(domain, bus, devfn, reg, len, val); if (raw_pci_ext_ops) return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val); and then somewhere in Vmware hypervisor initialization code (arch/x86/kernel/cpu/vmware.c) you do raw_pci_ext_ops->rating = 100;Thanks Vitaly, for your review and helping us to improve the code. I was working to make changes as you suggested, but before sending v3 would like to discuss on following: If we add rating with-in struct pci_raw_ops then we can't have pci_mmcfg as const, and following change is must in arch/x86/pci/mmconfig_64.c: -const struct pci_raw_ops pci_mmcfg = { +struct pci_raw_ops pci_mmcfg = { .read = pci_mmcfg_read, .write = pci_mmcfg_write, }; So to avoid this change, is it fine to have global bool prefer_raw_pci_ext_ops? And raw_pci_read() will have following change: - if (domain == 0 && reg < 256 && raw_pci_ops) + if (domain == 0 && reg < 256 && raw_pci_ops && + (!prefer_raw_pci_ext_ops || !raw_pci_ext_ops)why wouldn't it work? (diclaimer: completely untested, raw_pci_ops/raw_pci_ext_ops initialization has to be checked so 'rating' is not garbage).It's a generic solution for all hypervisor (sorry for earlier wrong Subject), not specific to VMware. Further looking for feedback if it's impacting to any hypervisor.That's the tricky part. We can check modern hypervisor versions, but what about all other versions in existence? How can we know that there's no QEMU/Hyper-V/... version out there where MMIO path is broken? I'd suggest we limit the change to Vmware hypervisor, other hypervisors may use the same mechanism (like the one above) later (but the person suggesting the patch is always responsible for the research why it is safe to do so).Ok, as of now we will make this change specific to VMware hypervisor. Is there a way we can make it an ACPI property in MCFG to have the environment self-describe the fact that it's safe to do ECAM access for config space access over legacy PIO? That way we don't need to patch guests every time a hypervisor decides that it's safe to prefer ECAM. Also, Michael (CC'ed) mentioned that according to spec, your PCIe host bridge with PCI_COMMAND.MEMORY=0 would stop responding to its ECAM window. Given that most ARM systems have no PIO fallback path, we want to make sure we never hit that condition. Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |