Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu

On Mon, 24 Jul 2017 21:39:08 +0100
Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> wrote:
> > But, the problem is that overall MMIO hole(s) requirements are not known
> > exactly at the time the HVM domain being created. Some PCI devices will
> > be emulated, some will be merely passed through and yet there will be
> > some RMRR ranges. libxl can't know all this stuff - some comes from the
> > host, some comes from DM. So actual MMIO requirements are known to
> > hvmloader at the PCI bus enumeration time.
> >   
> IMO hvmloader shouldn't really be allowed to relocate memory under any
> conditions. As Andrew said it's much easier to provision the hole
> statically in libxl during domain construction process and it doesn't
> really compromise any functionality. Having one more entity responsible
> for guest memory layout only makes things more convoluted.

If moving most of hvmloader's tasks to libxl is a planned feature at Citrix,
please let it be discussed on xen-devel first, as it may affect many
people... and not all of them might be happy. :)

(tons of IMO and TLDR ahead, be warned)

Moving PCI BAR allocation from the guest side to libxl is a controversial
step. It may in fact be the architecturally wrong way. There are
properties and areas of responsibility. Among the primary responsibilities
of the guest's firmware are PCI BAR allocation and MMIO hole sizing. That
is the guest's territory. The guest relocates PCI BARs (and the BIOS is not
the only one able to do this), and the guest firmware relocates the MMIO
hole base for them. On a real system, tasks like PCI BAR allocation,
remapping part of RAM above 4G etc. are done by the system BIOS. In our
case some of the SeaBIOS/OVMF responsibilities were offloaded to hvmloader,
such as PCI BAR allocation, sizing the MMIO hole(s) for them and generating
ACPI tables. And that's OK, as hvmloader can be considered merely a
'supplemental' firmware that performs some of the SeaBIOS/OVMF tasks before
passing control to them. This solution has at least some architectural
logic and doesn't look bad.

On the other hand, moving the PCI hole calculation to libxl just to let
Xen/libxl know what the MMIO hole size is might be a bad idea.
Aside from some code duplication, straying too far from the real HW paths,
or breaking existing (or future) interfaces, this might have some other
negative consequences. E.g. who will initialize the guest's ACPI tables if
only libxl knows the memory layout? Some new interface between libxl
and hvmloader, just to let the latter know what values to write to the ACPI
tables being created? Or will libxl initialize the guest's ACPI tables as
well (another guest-internal task)? Similar concerns apply to the
guest's final E820 construction.

Another thing is that handling ioreq/PT MMIO ranges is to some extent a
property of the device model (at least for now). Right now it is QEMU that
traps PCI BAR accesses and tells Xen how to handle specific ranges of MMIO
space. If QEMU already tells Xen which ranges should be passed through or
trapped, it can tell Xen the current overall MMIO limits as well... or
handle these limits itself: if an MMIO hole range check is all that is
required to avoid misuse of MMIO space, this check can easily be
implemented in QEMU, provided that QEMU knows what the memory/MMIO layout
is. There is a lot of implementation freedom as to where to place
restrictions and checks, Xen or QEMU. Strictly speaking, the MMIO hole
itself can be considered a property of the emulated machine and may have
implementation differences between emulated chipsets. For example, the real
i440 NB has no notion of a high MMIO hole at all.
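
To make the "range check" idea concrete, here is a minimal sketch of what
such a check amounts to. It is not QEMU or Xen code: the Range type, the
in_low_mmio_hole() helper and the 4G upper bound are assumptions made up
for illustration only.

```python
from dataclasses import dataclass

FOUR_GIB = 1 << 32  # assumed upper bound of the low MMIO hole


@dataclass(frozen=True)
class Range:
    """Hypothetical description of an MMIO range (e.g. one PCI BAR)."""
    start: int  # first byte of the range
    size: int   # length in bytes

    @property
    def end(self) -> int:
        return self.start + self.size  # exclusive end


def in_low_mmio_hole(r: Range, hole_base: int,
                     hole_end: int = FOUR_GIB) -> bool:
    """True only if r lies entirely inside [hole_base, hole_end)."""
    return r.size > 0 and hole_base <= r.start and r.end <= hole_end
```

Whoever owns the check (Xen or QEMU) only needs to know hole_base, which
is exactly the piece of layout information discussed above.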

We already have a sort of interface between hvmloader and QEMU:
hvmloader has to do basic initialization of some of the emulated chipset's
registers (and this depends on the machine). Providing additional handling
for a few other registers (TOM/TOLUD/etc.) will cost almost nothing, and
the purpose of these registers will actually match their usage in real HW.
This way we can use an existing, available interface and not stray too far
from the real HW ways.
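
As a rough illustration of the arithmetic behind such registers, the
sketch below derives the top of low RAM (the kind of value a TOLUD-style
register reports on real HW) and the amount of RAM remapped above 4G from
a chosen low MMIO hole size. The split_ram() helper and its field names
are hypothetical; no real chipset register layout is modeled here.

```python
from typing import NamedTuple

FOUR_GIB = 1 << 32


class RamSplit(NamedTuple):
    low_ram_top: int  # end of RAM below 4G (TOLUD-like value)
    high_ram: int     # bytes of RAM remapped above 4G


def split_ram(ram_bytes: int, hole_size: int) -> RamSplit:
    """Split guest RAM around a low MMIO hole of hole_size bytes."""
    hole_base = FOUR_GIB - hole_size      # hole occupies [hole_base, 4G)
    low_ram_top = min(ram_bytes, hole_base)
    return RamSplit(low_ram_top, ram_bytes - low_ram_top)
```

For example, 8G of RAM with a 1G hole gives low_ram_top = 0xC0000000 and
5G of RAM above 4G; growing the hole only moves more RAM upwards.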

I want to try this approach in the Q35 bring-up patches for Xen that I'm
currently working on. I'll send these patches as an RFC and will be glad to
receive some constructive criticism.

> > libxl can be taught to retrieve all missing info from QEMU, but this way
> > will require to perform all grunt work of PCI BARs allocation in libxl
> > itself - in order to calculate the real MMIO hole(s) size, one needs to
> > take into account all PCI BARs sizes and their alignment requirements
> > diversity + existing gaps due to RMRR ranges... basically, libxl will
> > need to do most of hvmloader/pci.c's job.
> >   
> The algorithm implemented in hvmloader for that is not complicated and
> can be moved to libxl easily. What we can do is to provision a hole big
> enough to include all the initially assigned PCI devices. We can also
> account for emulated MMIO regions if necessary. But, to be honest, it
> doesn't really matter since if there is no enough space in lower MMIO
> hole for some BARs - they can be easily relocated to upper MMIO
> hole by hvmloader or the guest itself (dynamically).
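
For illustration only, the sizing work referred to in the quoted text can
be sketched as follows. This is not the hvmloader/pci.c code: place_bars()
is a hypothetical helper, the largest-first order is an assumption of this
sketch, BAR sizes are assumed to be powers of two, and the reserved list
stands in for RMRR ranges.

```python
from typing import List, Optional, Tuple

FOUR_GIB = 1 << 32


def align_up(addr: int, align: int) -> int:
    """Round addr up to a multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def place_bars(bar_sizes: List[int], hole_base: int,
               reserved: List[Tuple[int, int]],
               hole_end: int = FOUR_GIB) -> Optional[List[Tuple[int, int]]]:
    """Place naturally aligned BARs in [hole_base, hole_end).

    reserved holds (start, size) ranges that must be skipped.
    Returns (address, size) pairs, or None if the hole is too small.
    """
    placed = []
    cursor = hole_base
    for size in sorted(bar_sizes, reverse=True):  # largest first
        addr = align_up(cursor, size)
        moved = True
        while moved:  # step over every reserved range we collide with
            moved = False
            for r_start, r_size in reserved:
                if addr < r_start + r_size and r_start < addr + size:
                    addr = align_up(r_start + r_size, size)
                    moved = True
        if addr + size > hole_end:
            return None  # does not fit below hole_end
        placed.append((addr, size))
        cursor = addr + size
    return placed
```

Even in this toy form, the result depends on every BAR size, on alignment
and on the reserved gaps, so the hole size cannot be derived without all
of that information being available in one place.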
