
Re: [Xen-devel] [Qemu-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support



On Wed, 2014-07-02 at 18:12 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 02, 2014 at 04:50:15PM +0200, Paolo Bonzini wrote:
> > On 02/07/2014 16:00, Konrad Rzeszutek Wilk wrote:
> > >With this long thread I lost a bit of context about the challenges
> > >that exist. But let me try summarizing it here - which will hopefully
> > >get some consensus.
> > >
> > >1). Fix IGD hardware to not use Southbridge magic addresses.
> > >    We can moan and moan but I doubt it is going to change.
> > 
> > There are two problems:
> > 
> > - Northbridge (i.e. MCH i.e. PCI host bridge) configuration space addresses
> > 
> > - Southbridge (i.e. PCH i.e. ISA bridge) vendor/device ID; some versions of
> > the driver identify it by class, some versions identify it by slot (1f.0).
> > 
> > To solve the first, make a new machine type, PIIX4-based, and pass through
> > the registers you need.  The patch must document _exactly_ why the registers
> > are safe to pass.  If they are not reserved on PIIX4, the patch must
> > document what the same offsets mean on PIIX4, and why it's sensible to
> > assume that firmware for virtual machine will not read/write them.  Bonus
> > point for also documenting the same for Q35.
> > 
> > Regarding the second, fixing IGD hardware to not rely on chipset magic is a
> > no-go, I agree.  I disagree that it's a no-go to define a "backdoor" that
> > lets a hypervisor pass the right information to the driver without hacking
> > the chipset device model.
> > 
> > The hardware folks would have to give us a place for a pair of registers
> > (something like data/address), and a bit somewhere else that would be always
> > 0 on hardware and always 1 if the hypervisor is implementing the pair of
> > registers.  This is similar to CPUID, which has the HYPERVISOR bit +
> > hypervisor-defined leaves at 0x40000000.
> > 
> > The data/address pair could be in a BAR, in configuration space, in the low
> > VGA ports at 0x3c0-0x3df, wherever.  The hypervisor bit can be in the same

This all looks like wishful thinking to me.  Just look through the i915
driver: the hardware seems to change arbitrarily between chips, and I
expect the drivers have a hard enough time supporting real hardware.
What I would like to see, though, is a concise document/comment from
Intel listing, per device generation, which registers, opregions, and
gtt mappings must be mirrored to the guest and which need write access
through to the host.  The dependency on MCH/PCH IDs is only part of the
story.  Things like opregions and gtt mappings may require identity
mapping to the host and therefore require reserved memory regions and
guest access.  In order to provide that access, we need to know exactly
what we're providing access to and whether it compromises the host
isolation.

I do want to note that we should not add any dependency on VGA space if
we do go down the path of a paravirt interface.  VGA routing is a
nightmare, and for a VFIO path forward I think we'll want to rely on
legacy-free UEFI drivers.
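
For illustration only, a guest-side probe for the "one bit plus an
address/data pair" idea might look roughly like the sketch below, with
the pair placed in an MMIO BAR rather than in VGA space.  Every offset
and name in it (HV_FLAG_OFFSET, HV_ADDR_OFFSET, HV_DATA_OFFSET,
KEY_PCH_DEVICE_ID, FakeBar) is hypothetical; no such registers have been
allocated by the hardware folks, and the device ID is just an example
value.

```python
# Hypothetical layout: none of these offsets exist in real IGD hardware.
HV_FLAG_OFFSET = 0x00   # reserved dword; bit 0 must read 0 on bare metal
HV_ADDR_OFFSET = 0x04   # write an index here ...
HV_DATA_OFFSET = 0x08   # ... then read the selected value here
HV_FLAG_BIT = 1 << 0

KEY_PCH_DEVICE_ID = 0   # hypothetical index: device ID of the host PCH


class FakeBar:
    """Stand-in for an MMIO BAR so the sketch runs without hardware."""

    def __init__(self, hypervisor_values=None):
        self.values = hypervisor_values  # None models bare metal
        self.index = 0

    def read32(self, offset):
        if self.values is None:
            return 0  # reserved space reads as zero on real hardware
        if offset == HV_FLAG_OFFSET:
            return HV_FLAG_BIT
        if offset == HV_DATA_OFFSET:
            return self.values.get(self.index, 0)
        return 0

    def write32(self, offset, value):
        if self.values is not None and offset == HV_ADDR_OFFSET:
            self.index = value


def query_backdoor(bar, key):
    """Return the hypervisor-provided value for key, or None on bare metal."""
    if not bar.read32(HV_FLAG_OFFSET) & HV_FLAG_BIT:
        return None  # no hypervisor: fall back to probing the real chipset
    bar.write32(HV_ADDR_OFFSET, key)
    return bar.read32(HV_DATA_OFFSET)


bare_metal = FakeBar()
guest = FakeBar({KEY_PCH_DEVICE_ID: 0x8C44})  # example value only
print(query_backdoor(bare_metal, KEY_PCH_DEVICE_ID))
print(hex(query_backdoor(guest, KEY_PCH_DEVICE_ID)))
```

The address and data registers together take 8 bytes, which matches the
amount of reserved space the proposal asks for.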

> > place or somewhere else---again, whatever is convenient for the hardware
> > folks.  We just need *one bit* that is known-zero on all hardware, and 8
> > bytes in a reserved area.  I don't think it's too hard to find this space,
> > and I really, really would like Intel to follow up on a paravirtualized
> > backdoor.
> > 
> > That said, we have the problem of existing guests, so I agree something else
> > is needed.
> > 
> > >     a) Two bridges - one 'passthrough' and the legacy ISA bridge
> > >        that QEMU emulates. Both Linux and Windows are OK with
> > >        two bridges (even though it is pretty weird).
> > 
> > This is pretty much the only solution for existing Linux guests that look up
> > the southbridge by class.
> > 
> > The proposed solution here is to define a new "pci stub" device in QEMU that
> > lets you define a do-nothing device with your desired vendor ID, device ID,
> > class and optionally subsystem IDs.
> > 
> > The new machine type (the one that instantiates the special
> > IGD-passthrough-enabled northbridge) can then instantiate this stub device
> > at 1f.0 with the desired vendor ID, device ID and class ID.
> > 
> > If we cannot get the paravirtualized backdoor, it would also make sense to:
> > 
> > - have drivers standardize on a single way to probe the southbridge
> > 
> > - make this be neither by class (because the firmware wants to distinguish
> > the actual ISA bridge from the stub, and it can do so by looking up the
> > class), nor by slot (because this conflicts with the Q35 chipset model that
> > has the southbridge at 1f.0).
> > 
> > mst's proposal was to probe by subsystem id.  I'm not sure I understood the
> > details exactly, but I trust him. :)  However, in case it wasn't clear I
> > think a paravirtualized backdoor would still be better.
> 
> This was a paravirtualized idea actually.
> Since ISA bridge is just needed for type
> identification, stick this info in subsystem device id.
> guest could do
>       if subsystem vendor id == xen then
>               get type from subsystem device id
> 
> does not address pci host registers.
> 
> 
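
As a rough sketch of the subsystem-ID check above: the subsystem vendor
ID and subsystem ID sit at offsets 0x2C and 0x2E of a standard type-0
config header, and 0x5853 is the PCI vendor ID Linux uses for Xen.  What
the subsystem device ID would encode is not settled in this thread; the
sketch assumes it simply carries a PCH type value, and the function
names and the IDs in the synthetic header are made up for the example.

```python
import struct

PCI_SUBSYSTEM_VENDOR_ID = 0x2C  # standard type-0 header offsets
PCI_SUBSYSTEM_ID = 0x2E
PCI_VENDOR_ID_XEN = 0x5853


def pch_type_from_subsystem(config):
    """Return the PCH type held in the subsystem device ID, or None.

    `config` is at least the first 64 bytes of the bridge's config space.
    """
    sub_vendor, sub_device = struct.unpack_from(
        "<HH", config, PCI_SUBSYSTEM_VENDOR_ID)
    if sub_vendor != PCI_VENDOR_ID_XEN:
        return None  # not a Xen-provided bridge; use the existing probe
    return sub_device


def read_config(bdf="0000:00:1f.0"):
    """Read config space from sysfs on a Linux guest (not called below)."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(64)


# Synthetic header with example IDs, so the sketch runs without a device.
header = bytearray(64)
struct.pack_into("<HH", header, 0x00, 0x8086, 0x7000)
struct.pack_into("<HH", header, PCI_SUBSYSTEM_VENDOR_ID,
                 PCI_VENDOR_ID_XEN, 0x8C44)
print(hex(pch_type_from_subsystem(bytes(header))))
```

As noted, this only covers PCH identification; it does nothing for the
PCI host bridge registers.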
> > >     b) One bridge - the one that QEMU emulates - and let's emulate
> > >        more of the registers (by emulate, I mean that for some of
> > >        them we get the data from the real hardware).
> > >
> > >           b1). We can't use the legacy because the registers are
> > >                above 256 (is that correct? Did I miss something?)
> > 
> > As I understand it, mst brought up Q35 because the northbridge configuration
> > space layout might be more similar to what the driver expects than for
> > PIIX4.  But I don't think anyone really said whether this is true or false.
> > 
> > I think Q35 is absolutely not a requirement for IGD passthrough, especially
> > until this statement is either proved or disproved.
> 
> 
> I kept saying MCH/PCH, people keep hearing Q35.
> 
> There is actually c:
> 
> 
> c) the ideal solution, clearly superior to the above set of hacks (a)
> and (b), is to actually emulate the chipsets that include the necessary
> cards.
> 
> For example, if you want to support Lynxpoint cards,
> emulate Z87.
> 
> Q35 codebase could be a good starting point in that work.

This would only guarantee that QEMU never supports the latest hardware.
Also, is it really sufficient to have a virtual implementation of the
host chipset or do we also still need to populate registers in that
virtual chipset with host values?  At least we'd know that the
passthrough registers don't conflict, but that may be the only problem
it solves.
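
To make the question concrete, populating a virtual chipset with host
values while keeping the host isolated could be modelled as below: a
small set of offsets reads back a snapshot of the host value, and guest
writes to those offsets are dropped rather than forwarded.  This is a
sketch, not QEMU code; the class name is invented and the offsets 0x50
and 0x51 are placeholders, not a vetted list of safe registers.

```python
class OverlayConfigSpace:
    """Emulated config space with selected offsets mirrored from the host."""

    def __init__(self, emulated, host, mirrored_offsets):
        self.emulated = bytearray(emulated)  # what the virtual chipset defines
        self.host = bytes(host)              # snapshot of host config space
        self.mirrored = frozenset(mirrored_offsets)

    def read8(self, offset):
        if offset in self.mirrored:
            return self.host[offset]         # host value, read-only
        return self.emulated[offset]

    def write8(self, offset, value):
        if offset in self.mirrored:
            return                           # never forwarded to the host
        self.emulated[offset] = value & 0xFF


# Placeholder data: an all-zero emulated space and a fake host snapshot.
host_snapshot = bytes(range(256))
cfg = OverlayConfigSpace(bytes(256), host_snapshot, {0x50, 0x51})
cfg.write8(0x50, 0xFF)          # dropped: mirrored offsets are read-only
cfg.write8(0x10, 0xAB)          # lands in the emulated space only
print(hex(cfg.read8(0x50)), hex(cfg.read8(0x10)))
```

Anything that needs the write in the first branch to actually reach the
host is exactly the case that has to be justified register by register.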

> > >4). Code does a bit of sysfs that could use some refactoring with
> > >    the KVM code.
> > >    Problem: More time needed to do the code restructuring.
> > 
> > FWIW, I don't really care about code sharing with KVM.  That's a separate
> > problem and it's not necessary to bring it up and make waters even more
> > muddy.
> > 
> > Paolo
> 
> I agree here.
> I would like to see everything passed to these fake devices through
> properties though.
> xen could populate these properties from sysfs, kvm
> could later do its own thing whatever it will be.
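
A minimal sketch of the "populate properties from sysfs" step, assuming
a Linux host: the vendor, device, class, subsystem_vendor and
subsystem_device attribute files are what sysfs exposes per PCI device.
The "pci-stub" device name and its property names are hypothetical (no
such QEMU device exists yet), and the sample IDs are made up so the
sketch can run against a fake directory.

```python
import os
import tempfile

SYSFS_ATTRS = ("vendor", "device", "class",
               "subsystem_vendor", "subsystem_device")


def read_pci_ids(device_dir):
    """Read the ID attributes Linux exposes for one PCI device."""
    ids = {}
    for attr in SYSFS_ATTRS:
        with open(os.path.join(device_dir, attr)) as f:
            ids[attr] = int(f.read().strip(), 16)
    return ids


def stub_device_arg(ids, addr="1f.0"):
    """Format a -device argument for the hypothetical stub device."""
    props = ",".join(f"{attr}={ids[attr]:#x}" for attr in SYSFS_ATTRS)
    return f"pci-stub,addr={addr},{props}"


# Demo against a fake sysfs directory; on a real host device_dir would be
# /sys/bus/pci/devices/0000:00:1f.0.
with tempfile.TemporaryDirectory() as fake_dir:
    sample = {"vendor": "0x8086", "device": "0x8c44", "class": "0x060100",
              "subsystem_vendor": "0x8086", "subsystem_device": "0x7270"}
    for attr, text in sample.items():
        with open(os.path.join(fake_dir, attr), "w") as f:
            f.write(text + "\n")
    print(stub_device_arg(read_pci_ids(fake_dir)))
```

Only IDs are read here; nothing in this step needs write access to the
host device.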

I expect that a VFIO solution would also want to populate from sysfs.  I
think we'll need a new "device specific" VFIO region to provide access
to non-PCI resources of IGD, but I don't want that to include random
registers on other devices.  It would really throw a wrench in the
isolation of a guest if IGD passthrough were to require write access to
registers or config space on the MCH/PCH though.  Thanks,

Alex



