[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute



On Fri, Jun 06, 2025 at 01:00:19PM +0000, Tu Dinh wrote:
> Hi Roger,
> 
> On 05/06/2025 18:20, Roger Pau Monne wrote:
> > The Xen PCI device (vendor ID 0x5853) exposed to x86 HVM guests doesn't
> > have the functionality of a traditional PCI device.  The exposed MMIO BAR
> > is used by some guests (including Linux) as a safe place to map foreign
> > memory, including the grant table itself.
> >
> > Traditionally BARs from devices have the uncacheable (UC) cache attribute
> > from the MTRR, to ensure correct functionality of such devices.  hvmloader
> > mimics this behavior and sets the MTRR attributes of both the low and high
> > PCI MMIO windows (where BARs of PCI devices reside) as UC in MTRR.
> >
> > This however causes performance issues for users of the Xen PCI device BAR,
> > as for the purposes of mapping remote memory there's no need to use the UC
> > attribute.  On Intel systems this is worked around by using iPAT, that
> > allows the hypervisor to force the effective cache attribute of a p2m entry
> > regardless of the guest PAT value.  AMD however doesn't have an equivalent
> > of iPAT, and guest PAT values are always considered.
> >
> > Linux commit:
> >
> > 41925b105e34 xen: replace xen_remap() with memremap()
> >
> > Attempted to mitigate this by forcing mappings of the grant-table to use
> > the write-back (WB) cache attribute.  However Linux memremap() takes MTRRs
> > into account to calculate which PAT type to use, and seeing the MTRR cache
> > attribute for the region being UC the PAT also ends up as UC, regardless of
> > the caller having requested WB.
> >
> > As a workaround to allow current Linux to map the grant-table as WB using
> > memremap() introduce an xl.cfg option (xenpci_bar_uc=0) that can be used to
> > select whether the Xen PCI device BAR will have the UC attribute in MTRR.
> > Such workaround in hvmloader should also be paired with a fix for Linux so
> > it attempts to change the MTRR of the Xen PCI device BAR to WB by itself.
> >
> > Overall, the long term solution would be to provide the guest with a safe
> > range in the guest physical address space where mappings to foreign pages
> > can be created.
> >
> > Some vif throughput performance figures provided by Anthoine from a 8
> > vCPUs, 4GB of RAM HVM guest(s) running on AMD hardware:
> >
> > Without this patch:
> > vm -> dom0: 1.1Gb/s
> > vm -> vm:   5.0Gb/s
> >
> > With the patch:
> > vm -> dom0: 4.5Gb/s
> > vm -> vm:   7.0Gb/s
> >
> > Reported-by: Anthoine Bourgeois <anthoine.bourgeois@xxxxxxxxxx>
> > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > ---
> > Changes since v2:
> >   - Add default value in xl.cfg.
> >   - List xenstore path in the pandoc file.
> >   - Adjust comment in hvmloader.
> >   - Fix commit message MIO -> MMIO.
> >
> > Changes since v1:
> >   - Leave the xenpci BAR as UC by default.
> >   - Introduce an option to not set it as UC.
> > ---
> >   docs/man/xl.cfg.5.pod.in                |  8 ++++
> >   docs/misc/xenstore-paths.pandoc         |  5 +++
> >   tools/firmware/hvmloader/config.h       |  2 +-
> >   tools/firmware/hvmloader/pci.c          | 49 ++++++++++++++++++++++++-
> >   tools/firmware/hvmloader/util.c         |  2 +-
> >   tools/include/libxl.h                   |  9 +++++
> >   tools/libs/light/libxl_create.c         |  1 +
> >   tools/libs/light/libxl_dom.c            |  9 +++++
> >   tools/libs/light/libxl_types.idl        |  1 +
> >   tools/xl/xl_parse.c                     |  2 +
> >   xen/include/public/hvm/hvm_xs_strings.h |  2 +
> >   11 files changed, 86 insertions(+), 4 deletions(-)
> >
> > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > index c388899306c2..ddbff6fffc16 100644
> > --- a/docs/man/xl.cfg.5.pod.in
> > +++ b/docs/man/xl.cfg.5.pod.in
> > @@ -2351,6 +2351,14 @@ Windows 
> > L<https://xenproject.org/windows-pv-drivers/>.
> >   Setting B<xen_platform_pci=0> with the default device_model "qemu-xen"
> >   requires at least QEMU 1.6.
> >
> > +
> > +=item B<xenpci_bar_uc=BOOLEAN>
> > +
> > +B<x86 only:> Select whether the memory BAR of the Xen PCI device should 
> > have
> > +uncacheable (UC) cache attribute set in MTRR.
> > +
> > +Default is B<true>.
> > +
> >   =item B<viridian=[ "GROUP", "GROUP", ...]> or B<viridian=BOOLEAN>
> >
> >   The groups of Microsoft Hyper-V (AKA viridian) compatible enlightenments
> > diff --git a/docs/misc/xenstore-paths.pandoc 
> > b/docs/misc/xenstore-paths.pandoc
> > index 01a340fafcbe..073bed91eec1 100644
> > --- a/docs/misc/xenstore-paths.pandoc
> > +++ b/docs/misc/xenstore-paths.pandoc
> > @@ -234,6 +234,11 @@ These xenstore values are used to override some of the 
> > default string
> >   values in the SMBIOS table constructed in hvmloader. See the SMBIOS
> >   table specification at http://www.dmtf.org/standards/smbios/
> >
> > +#### ~/hvmloader/pci/xenpci-bar-uc = ("1"|"0") [HVM,INTERNAL]
> > +
> > +Select whether the Xen PCI device MMIO BAR will have the uncacheable cache
> > +attribute set in the MTRRs by hvmloader.
> > +
> >   #### ~/bios-strings/oem-* = STRING [HVM,INTERNAL]
> >
> >   1 to 99 OEM strings can be set in xenstore using values of the form
> > diff --git a/tools/firmware/hvmloader/config.h 
> > b/tools/firmware/hvmloader/config.h
> > index 6e1da137d779..c159db30eea9 100644
> > --- a/tools/firmware/hvmloader/config.h
> > +++ b/tools/firmware/hvmloader/config.h
> > @@ -58,7 +58,7 @@ extern uint32_t *cpu_to_apicid;
> >   #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
> >
> >   extern uint32_t pci_mem_start;
> > -extern const uint32_t pci_mem_end;
> > +extern uint32_t pci_mem_end;
> >   extern uint64_t pci_hi_mem_start, pci_hi_mem_end;
> >
> >   extern bool acpi_enabled;
> > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> > index cc67b18c0361..747f6cfb6794 100644
> > --- a/tools/firmware/hvmloader/pci.c
> > +++ b/tools/firmware/hvmloader/pci.c
> > @@ -30,7 +30,7 @@
> >   #include <xen/hvm/e820.h>
> >
> >   uint32_t pci_mem_start = HVM_BELOW_4G_MMIO_START;
> > -const uint32_t pci_mem_end = RESERVED_MEMBASE;
> > +uint32_t pci_mem_end = RESERVED_MEMBASE;
> >   uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
> >
> >   /*
> > @@ -116,6 +116,8 @@ void pci_setup(void)
> >        * experience the memory relocation bug described below.
> >        */
> >       bool allow_memory_relocate = 1;
> > +    /* Select the MTRR cache attribute of the xenpci device BAR. */
> > +    bool xenpci_bar_uc = false;
> 
> Since this is meant to be a workaround, I wonder if it makes more sense
> to flip the setting (`xenpci_bar_wb`) and make it 0 by default?

I originally didn't want to go that route, because while it's true
that the default MTRR type is set to WB, and so any memory not covered
by a MTRR range will default to that memory type I got the impression
this was inferring too much.

Overall my intention would be for inverting the default long term, and
libxl setting build_info->u.hvm.xenpci_bar_uc = false by default,
which then makes all the naming nicer IMO.

> It also
> simplifies the logic for both hvmloader and the consumer (no need for
> double negatives).

I don't think there are double negatives?  That would happen if the
variable was named xenpci_bar_no_uc or similar?

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.