
Re: [Xen-devel] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks



On Mon, 2016-02-01 at 13:49 +0100, Gerd Hoffmann wrote:
> > > Maybe we should define the interface as "guest writes 0xfc to pick
> > > address, qemu takes care to place opregion there".  That gives us the
> > > freedom to change the qemu implementation (either copy host opregion or
> > > map the host opregion) without breaking things.
>
> > Ok, so seabios allocates two pages, writes the base address of those
> > pages to 0xfc and looks to see whether the signature appears at that
> > address due to qemu mapping.  It verifies the size and does a
> > free/realloc if not the right size.
>
> I think seabios first needs to reserve something big enough for a
> temporary mapping, to check signature + size, otherwise the opregion
> might scratch data structures beyond opregion in case it happens to be
> larger than 8k.
>
> How likely is it that the opregion size ever changes?  Should we better
> be prepared to handle it?  Or would it be ok to have a ...
>
>    if (opregion_size > 8k)
>       panic();
>
> ... style sanity check?
>

The patch below is what I'm working with now.  It assumes that the
opregion is 8K, maps it, verifies it, and re-allocs if it's a different
size.  Maybe it is safer to abort if it is over 8K, but we're not actually
clobbering anything with the mapping, we're just temporarily mapping
over it.  So if there's not another thread of execution that could be
accessing something there and we're not stepping on our own stack or
data, it doesn't seem like there's a problem.

diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index c31c2fa..4f3251e 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -257,6 +257,52 @@ static void ich9_smbus_setup(struct pci_device *dev, void *
     pci_config_writeb(bdf, ICH9_SMB_HOSTC, ICH9_SMB_HOSTC_HST_EN);
 }
 
+static void intel_igd_opregion_setup(struct pci_device *dev, void *arg)
+{
+    u16 bdf = dev->bdf;
+    u32 orig;
+    void *opregion;
+    int size = 8;
+
+    if (!CONFIG_QEMU)
+        return;
+
+    orig = pci_config_readl(bdf, 0xFC);
+
+realloc:
+    opregion = malloc_high(size * 1024);
+    if (!opregion) {
+        warn_noalloc();
+        return;
+    }
+
+    /*
+     * QEMU maps the OpRegion into system memory at the address written here,
+     * this overlaps our malloc, which marks the range e820 reserved.
+     */
+    pci_config_writel(bdf, 0xFC, cpu_to_le32((u32)opregion));
+
+    if (memcmp(opregion, "IntelGraphicsMem", 16)) {
+        pci_config_writel(bdf, 0xFC, orig);
+        free(opregion);
+        return; /* the opregion didn't magically appear, not supported */
+    }
+
+    if (size == le32_to_cpu(*(u32 *)(opregion + 16))) {
+        dprintf(1, "Intel IGD OpRegion enabled on %02x:%02x.%x\n",
+                pci_bdf_to_bus(bdf), pci_bdf_to_dev(bdf), pci_bdf_to_fn(bdf));
+        return; /* success! */
+    }
+
+    pci_config_writel(bdf, 0xFC, orig);
+    free(opregion);
+
+    if (size == 8) { /* try once more with a new size */
+        size = le32_to_cpu(*(u32 *)(opregion + 16));
+        goto realloc;
+    }
+}
+
 static const struct pci_device_id pci_device_tbl[] = {
     /* PIIX3/PIIX4 PCI to ISA bridge */
     PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_0,
@@ -290,6 +336,10 @@ static const struct pci_device_id pci_device_tbl[] = {
     PCI_DEVICE_CLASS(PCI_VENDOR_ID_APPLE, 0x0017, 0xff00, apple_macio_setup),
     PCI_DEVICE_CLASS(PCI_VENDOR_ID_APPLE, 0x0022, 0xff00, apple_macio_setup),
 
+    /* Intel IGD OpRegion setup */
+    PCI_DEVICE_CLASS(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA,
+                     intel_igd_opregion_setup),
+
     PCI_DEVICE_END,
 };
 
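
The probe above depends on two header fields: a 16-byte signature and,
immediately after it, a 32-bit little-endian size that the patch treats as
a count of KB.  Since the mapping goes away once 0xFC is rewritten, the
size is presumably only readable while the mapping is still in place.  A
minimal standalone sketch of that header check follows; the helper name
opregion_size_kb and the byte-wise decode are assumptions for
illustration, not SeaBIOS code:

```c
#include <stdint.h>
#include <string.h>

#define OPREGION_SIGNATURE "IntelGraphicsMem"
#define OPREGION_SIG_LEN   16

/*
 * Hypothetical helper: return the size in KB advertised by an OpRegion
 * header, or 0 if the signature is absent.  Must be called while the
 * OpRegion is still visible at 'base'.
 */
static uint32_t opregion_size_kb(const void *base)
{
    const uint8_t *p = base;

    if (memcmp(p, OPREGION_SIGNATURE, OPREGION_SIG_LEN) != 0)
        return 0;

    /* The size field follows the signature and is little-endian. */
    return (uint32_t)p[16] | ((uint32_t)p[17] << 8) |
           ((uint32_t)p[18] << 16) | ((uint32_t)p[19] << 24);
}
```

A caller could save the returned value before restoring the register and
freeing the buffer, then use it for the second allocation.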

> > If the graphics signature does not
> > appear, free those pages and assume no opregion support.
>
> Yes.
>
> > If we later
> > decide to use a copy, we'd need to disable the 0xfc automagic mapping
> > and probably pass the data via fw_cfg.  Sound right?
>
> I'd have qemu copy the data on 0xfc write then, so things continue to
> work without updating seabios.  So, the firmware has to allocate space,
> reserve it etc.,  and programming the 0xfc register.  Qemu has to make
> sure the opregion appears at the address written by the firmware, by
> whatever method it prefers.

Ah, so here is where we'd clobber data in firmware.  I currently do
this in vfio's pci config write in QEMU:

        orig = pci_get_long(pdev->config + IGD_OPREGION);
        pci_default_write_config(pdev, addr, val, len);
        cur = pci_get_long(pdev->config + IGD_OPREGION);

        if (cur != orig) {
            if (orig) {
                memory_region_del_subregion(get_system_memory(),
                                            vdev->igd_opregion->mem);
            }

            if (cur) {
                memory_region_add_subregion(get_system_memory(),
                                            cur, vdev->igd_opregion->mem);
            }
        }

This means that fw can write 0x0 back to the ASL storage register and
the mapping goes away, no firmware data is overwritten and the overlap
was temporary.  If we copy it into the firmware provided buffer with
firmware not knowing the actual size then yes, we've just clobbered
something and it can't be recovered.  I'll post my patches and we can
hash out whether there's a better approach over something a little more
concrete.  I can see the opregion gets exposed and the guest driver does
use it.  I'm not entirely sure what functionality it's adding though
since a cursory test of booting an FC23 live iso image seems to
initialize the display correctly with or without the opregion.
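
The difference between mapping and copying can be shown with a toy model
in plain C, where an array stands in for guest RAM.  Every name here
(guest_ram, map_opregion, copy_opregion, read_guest) is invented for
illustration; this is not QEMU or SeaBIOS code, only a sketch of why an
overlay is recoverable and a blind copy is not:

```c
#include <stdint.h>
#include <string.h>

#define KB       1024u
#define RAM_SIZE (32 * KB)

static uint8_t guest_ram[RAM_SIZE];   /* stand-in for guest memory */
static const uint8_t *overlay;        /* stand-in for the mapped OpRegion */
static uint32_t overlay_base, overlay_len;

/* Overlay approach: reads in the window see the OpRegion, RAM is untouched. */
static void map_opregion(uint32_t base, const uint8_t *data, uint32_t len)
{
    overlay = data;
    overlay_base = base;
    overlay_len = len;
}

/* Analogous to firmware writing 0 back to the register. */
static void unmap_opregion(void)
{
    overlay = NULL;
}

/* Copy approach: the OpRegion bytes replace whatever RAM held there. */
static void copy_opregion(uint32_t base, const uint8_t *data, uint32_t len)
{
    memcpy(guest_ram + base, data, len);
}

/* Guest-visible read: the overlay, if present, hides the RAM beneath it. */
static uint8_t read_guest(uint32_t addr)
{
    if (overlay && addr >= overlay_base && addr - overlay_base < overlay_len)
        return overlay[addr - overlay_base];
    return guest_ram[addr];
}
```

With a 16K OpRegion placed over an 8K buffer, a byte stored just past the
buffer reads back intact after unmap_opregion(), but is lost for good
after copy_opregion().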

> > > lpc bridge is no problem, only pci id fields are copied over and
> > > unprivileged access is allowed for them.
> > >
> > > Copying the gfx registers of the host bridge is a problem indeed.
>
> > I would argue that both are really a problem, libvirt wants to put QEMU
> > in a container that prevents access to any host system files other than
> > those explicitly allowed.  Therefore libvirt needs to grant the process
> > access to the lpc sysfs config file even though it only needs user
> > visible register values.
>
> Yes, correct.  We want svirt be as strict as possible.

So it might be a good idea to expose these through vfio.  What about
stolen memory?  I noted the IOMMU faults that I get when assigning IGD,
the bulk of it seems to be to the memory reserved as stolen for the GPU.
I can avoid those by clearing the guest view of the BDSM register, but I
think then we're just leaving stolen memory unused, which seems rather
wasteful.  Trying to identity map that stolen memory into the VM so that
we don't need to reconfigure the GPU seems problematic, but if vfio
exposed it as another region, we could do the same trick of mapping into
the VM address space.  The size of stolen memory is quite variable, so
we couldn't just assume a size.  We'd also need to know how to
reconfigure (and restore) the GPU for a new location, the BDSM register
just reports it.  Thanks,

Alex


