Xen project Mailing List

Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: Alexey G <x1917x@xxxxxxxxx>

Date: Tue, 25 Jul 2017 04:34:58 +1000

Cc: "Zhang, Xiong Y" <xiong.y.zhang@xxxxxxxxx>, Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Mon, 24 Jul 2017 18:35:14 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Mon, 24 Jul 2017 18:01:39 +0100
Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:

> On 24/07/17 17:42, Alexey G wrote:
> > Hi,
> >
> > On Mon, 24 Jul 2017 10:53:16 +0100
> > Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> wrote:
> >>> [Zhang, Xiong Y] Thanks for your suggestion.
> >>> Indeed, if I set mmio_hole >= 4G - RMRR_Base, this could fix my issue.
> >>> For this I still have two questions, could you help me?
> >>> 1) If hvmloader does low memory relocation, hvmloader and qemu will
> >>> see a different guest memory layout. So qemu ram may overlap with
> >>> mmio, does xen have a plan to fix this?
> >> hvmloader doesn't do memory relocation - this ability is turned off by
> >> default. The reason for the issue is that libxl initially sets the size
> >> of lower MMIO hole (based on the RMRR regions present and their size)
> >> and doesn't communicate it to QEMU using 'max-ram-below-4g' argument.
> >>
> >> When you set 'mmio_hole' size parameter you basically force libxl to
> >> pass this argument to QEMU.
> >>
> >> That means the proper fix would be to make libxl pass this argument
> >> to QEMU in case there are RMRR regions present.
> > I tend to disagree a bit.
> > What we lack actually is some way to perform a 'dynamical' physmem
> > relocation, when a guest domain is running already. Right now it works
> > only in the 'static' way - i.e. if memory layout was known for both
> > QEMU and hvmloader before starting a guest domain and with no means of
> > arbitrarily changing this layout at runtime when hvmloader runs.
> >
> > But, the problem is that overall MMIO hole(s) requirements are not known
> > exactly at the time the HVM domain is being created. Some PCI devices
> > will be emulated, some will be merely passed through and yet there will
> > be some RMRR ranges. libxl can't know all this stuff - some comes from
> > the host, some comes from DM. So actual MMIO requirements are known to
> > hvmloader at the PCI bus enumeration time.
> >
> > libxl can be taught to retrieve all missing info from QEMU, but this way
> > will require performing all the grunt work of PCI BARs allocation in
> > libxl itself - in order to calculate the real MMIO hole(s) size, one
> > needs to take into account all PCI BARs sizes and their alignment
> > requirements diversity + existing gaps due to RMRR ranges... basically,
> > libxl will need to do most of hvmloader/pci.c's job.
> >
> > My 2kop opinion here is that we don't need to move all PCI BAR
> > allocation to libxl, or invent some new QMP-interfaces, or introduce
> > new hypercalls or else. A simple and somewhat good solution would be to
> > implement this missing hvmloader <-> QEMU interface in the same manner
> > as it is done in real hardware.
> >
> > When we move some part of guest memory in 4GB range to address space
> > above 4GB via XENMEM_add_to_physmap, we basically perform what chipset's
> > 'remap' (aka reclaim) does. So we can implement this interface between
> > hvmloader and QEMU via providing custom emulation for MCH's
> > remap/TOLUD/TOUUD stuff in QEMU if xen_enabled().
> >
> > In this way hvmloader will calculate MMIO hole sizes as usual, relocate
> > some guest RAM above 4GB base and communicate this information to QEMU
> > via emulated host bridge registers -- so then QEMU will sync its memory
> > layout info to actual physmap's.
>
> Qemu isn't the only entity which needs to know. There is currently an
> attack surface via Xen by virtue of the fact that any hole in the p2m
> gets emulated and forwarded to qemu. (Two problems caused by this are a
> qemu segfault and qemu infinite loop.)
>
> The solution is to have Xen know which gfn ranges are supposed to be
> MMIO, and terminate the access early if the guest frame falls outside of
> the MMIO range.
>
> Doing this by working it out statically at domain creation time is far
> more simple for all components involved.

Well, I'm not acquainted with these issues, but this looks like a
slightly different problem, I think.

We can possibly provide fine-grained access to the MMIO hole space by
tracking all passthru MMIO ranges and IOREQ ranges and restricting
accesses/forwarding for all currently 'unassigned' parts of the MMIO
holes; this will yield the smallest possible attack surface. This
approach works with the existing ioreq/p2m_mmio_direct ranges concept,
while a one-time restriction of accesses to parts of the MMIO hole at
domain creation time requires introducing a new view of the guest's
MMIO hole(s) in Xen. And not all MMIO-related information will be
available statically.

There are two things to consider: hotplugging PT PCI devices (some of
them may have large Mem BARs) and the guest's attempts to change BAR
values to some arbitrary base, e.g. in a (large) high MMIO hole. If the
guest sees a large MMIO hole described in DSDT, it should be able to use
any part of it to relocate PCI BARs. On the other hand, if we limit the
MMIO hole size in DSDT to some (barely required) minimum, we will have
problems when hotplugging PT devices -- the guest OS might see no space
to assign their resources. So, some MMIO freedom and space are assumed.
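
To make the fine-grained idea a bit more concrete, here is a rough
Python sketch. It is not Xen code: MmioHole, Owner and should_forward
are names invented for this illustration, and it assumes a single hole
whose assigned ranges are kept in a plain list. It only models the
decision itself, i.e. whether an access is passed through, forwarded to
the device model, or terminated early.

```python
from dataclasses import dataclass, field
from enum import Enum


class Owner(Enum):
    """Who, if anyone, is responsible for an address (hypothetical)."""
    PASSTHROUGH = "passthrough"  # e.g. backed by a p2m_mmio_direct mapping
    IOREQ = "ioreq"              # range claimed by a device model
    UNASSIGNED = "unassigned"    # inside the hole, but nobody claims it
    OUTSIDE = "outside"          # not inside the MMIO hole at all


@dataclass
class MmioHole:
    start: int                   # first address of the hole
    end: int                     # one past the last address of the hole
    ranges: list = field(default_factory=list)  # (start, end, Owner)

    def assign(self, start, end, owner):
        """Track a new range, e.g. after hotplug or a BAR rewrite."""
        if not (self.start <= start < end <= self.end):
            raise ValueError("range lies outside the hole")
        for s, e, _ in self.ranges:
            if start < e and s < end:
                raise ValueError("range overlaps an existing assignment")
        self.ranges.append((start, end, owner))

    def release(self, start, end):
        """Stop tracking a range, e.g. when the guest moves a BAR away."""
        self.ranges = [r for r in self.ranges if (r[0], r[1]) != (start, end)]

    def classify(self, addr):
        """Return which kind of owner, if any, covers addr."""
        if not (self.start <= addr < self.end):
            return Owner.OUTSIDE
        for s, e, owner in self.ranges:
            if s <= addr < e:
                return owner
        return Owner.UNASSIGNED


def should_forward(hole, addr):
    """Only accesses that hit an IOREQ range reach the device model."""
    return hole.classify(addr) is Owner.IOREQ
```

In this model a hotplugged PT device or a guest rewriting a BAR is just
a release() followed by an assign() somewhere else in the hole, so the
hole itself can stay large while the forwarded part stays minimal.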
