
Re: [PATCH v2] x86/PCI: Prefer MMIO over PIO on VMware hypervisor



Ajay Kaher <akaher@xxxxxxxxxx> writes:

> During boot there are many PCI config reads; these can be performed
> either using Port I/O instructions (PIO) or memory-mapped I/O (MMIO).
>
> PIO is less efficient than MMIO: it requires twice as many PCI accesses,
> and PIO instructions are serializing. As a result, MMIO should be
> preferred over PIO when possible.
>
> Virtual Machine test result using VMware hypervisor.
> 100,000 reads using raw_pci_read() took:
> PIO: 12.809 seconds
> MMIO: 8.517 seconds (~33.5% faster than PIO)
>
> Currently, when these reads are performed by a virtual machine, they all
> cause a VM-exit, and therefore each one of them induces a considerable
> overhead.
>
> This overhead can be further reduced by mapping the MMIO region of the
> virtual machine to a memory area that holds the values the "emulated
> hardware" is supposed to return. The memory region is mapped "read-only"
> in the NPT/EPT, so reads from these regions are treated as regular memory
> reads. Writes are still trapped and emulated by the hypervisor.
>
> Virtual Machine test result with the above changes in VMware hypervisor.
> 100,000 reads using raw_pci_read() took:
> PIO: 12.809 seconds
> MMIO: 0.010 seconds
>
> This helps to reduce virtual machine PCI scan and initialization time by
> ~65%. In our case it dropped from ~55 ms to ~18 ms.
>
> MMIO is also faster than PIO on bare-metal systems, but due to some bugs
> with legacy hardware and the smaller gains on bare-metal, it seems prudent
> not to change bare-metal behavior.

Out of curiosity, are we sure MMIO *always* works for hypervisors other
than VMware? Various Hyper-V versions can probably be tested (were
they?), but with KVM it's much harder, as PCI is emulated in the VMM and
there's certainly more than one VMM in existence...

>
> Signed-off-by: Ajay Kaher <akaher@xxxxxxxxxx>
> ---
> v1 -> v2:
> Limit changes to apply only to VMs [Matthew W.]
> ---
>  arch/x86/pci/common.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index ddb7986..1e5a8f7 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -20,6 +20,7 @@
>  #include <asm/pci_x86.h>
>  #include <asm/setup.h>
>  #include <asm/irqdomain.h>
> +#include <asm/hypervisor.h>
>  
>  unsigned int pci_probe = PCI_PROBE_BIOS | PCI_PROBE_CONF1 | PCI_PROBE_CONF2 |
>                               PCI_PROBE_MMCONF;
> @@ -57,14 +58,58 @@ int raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
>       return -EINVAL;
>  }
>  
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +static int vm_raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
> +                                             int reg, int len, u32 *val)
> +{
> +     if (raw_pci_ext_ops)
> +             return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
> +     if (domain == 0 && reg < 256 && raw_pci_ops)
> +             return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
> +     return -EINVAL;
> +}
> +
> +static int vm_raw_pci_write(unsigned int domain, unsigned int bus, unsigned int devfn,
> +                                             int reg, int len, u32 val)
> +{
> +     if (raw_pci_ext_ops)
> +		return raw_pci_ext_ops->write(domain, bus, devfn, reg, len, val);
> +     if (domain == 0 && reg < 256 && raw_pci_ops)
> +             return raw_pci_ops->write(domain, bus, devfn, reg, len, val);
> +     return -EINVAL;
> +}

These look exactly like raw_pci_read()/raw_pci_write() but with inverted
priority. We could have added a parameter, but to be more flexible, I'd
suggest adding a 'priority' field to 'struct pci_raw_ops' and making
raw_pci_read()/raw_pci_write() check it before deciding which backend to
use first. To be on the safe side, you can leave raw_pci_ops' priority
higher than raw_pci_ext_ops' by default and only tweak it in
arch/x86/kernel/cpu/vmware.c.

> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
>  static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *value)
>  {
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +     /*
> +      * MMIO is faster than PIO, but due to some bugs with legacy
> +      * hardware, it seems prudent to prefer MMIO for VMs and PIO
> +      * for bare-metal.
> +      */
> +     if (!hypervisor_is_type(X86_HYPER_NATIVE))
> +             return vm_raw_pci_read(pci_domain_nr(bus), bus->number,
> +                                      devfn, where, size, value);
> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
>       return raw_pci_read(pci_domain_nr(bus), bus->number,
>                                devfn, where, size, value);
>  }
>  
>  static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 value)
>  {
> +#ifdef CONFIG_HYPERVISOR_GUEST
> +     /*
> +      * MMIO is faster than PIO, but due to some bugs with legacy
> +      * hardware, it seems prudent to prefer MMIO for VMs and PIO
> +      * for bare-metal.
> +      */
> +     if (!hypervisor_is_type(X86_HYPER_NATIVE))
> +             return vm_raw_pci_write(pci_domain_nr(bus), bus->number,
> +                                       devfn, where, size, value);
> +#endif /* CONFIG_HYPERVISOR_GUEST */
> +
>       return raw_pci_write(pci_domain_nr(bus), bus->number,
>                                 devfn, where, size, value);
>  }

-- 
Vitaly
