Re: [Xen-devel] [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API



On Wed, 2015-07-29 at 10:17 +0200, Paolo Bonzini wrote:
> 
> On 29/07/2015 02:47, Andy Lutomirski wrote:
> > > > If new kernels ignore the IOMMU for devices that don't set the flag
> > > > and there are physical devices that already exist and don't set the
> > > > flag, then those devices won't work reliably on most modern
> > > > non-virtual platforms, PPC included.
> > >
> > > Are there many virtio physical devices out there ? We are talking about
> > > a virtio flag right ? Or have you been considering something else ?
> >
> > Yes, virtio flag.  I dislike having a virtio flag at all, but so far
> > no one has come up with any better ideas.  If there was a reliable,
> > cross-platform mechanism for per-device PCI bus properties, I'd be all
> > for using that instead.
> 
> No, a virtio flag doesn't make sense.

It wouldn't if we were creating virtio from scratch.

However, we have to be realistic here: we are contending with existing
practices and implementations. The fact is that qemu *does* bypass any iommu
and has been doing so for a long time, *and* the guest drivers as
written today *also* bypass all DMA mapping mechanisms and just
pass everything across.

So if it's a bug, it's a bug on both sides of the fence. We are no
longer in "bug fixing" territory here, it's a fundamental change of ABI.
The ABI might not be what was intended (but that's arguable, see below),
but it is that way.

Arguably it was even known and considered a *feature* by some (including
myself) at the time. It somewhat improved performance on archs where
every page would otherwise have to be mapped/unmapped in the guest iommu. In
fact, it also makes vhost a lot easier.

So I disagree, it's de-facto a feature (even if unintended) of the
existing virtio implementations and changing that would be a major
interface change, and thus should be exposed as such.
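
As a rough illustration of what "exposed as such" could look like (a
minimal sketch, not code from any patch in this thread): the feature bit
name and position below are hypothetical, as are the enum and the helper.
The only point it shows is that the existing bypass behaviour stays the
default unless both the device and the driver opt in to the new one.

```c
#include <stdint.h>

/* Hypothetical feature bit: name and bit position are illustrative only. */
#define HYP_VIRTIO_F_IOMMU_AWARE (1ULL << 0)

enum hyp_dma_mode {
	HYP_DMA_BYPASS,	/* existing ABI: addresses are passed across unmapped */
	HYP_DMA_MAPPED,	/* new ABI: buffers go through the DMA mapping layer */
};

/*
 * Both sides must offer the bit; otherwise the existing behaviour is
 * preserved, so old guests and old devices keep working unchanged.
 */
static enum hyp_dma_mode hyp_choose_dma_mode(uint64_t device_features,
					     uint64_t driver_features)
{
	uint64_t negotiated = device_features & driver_features;

	if (negotiated & HYP_VIRTIO_F_IOMMU_AWARE)
		return HYP_DMA_MAPPED;
	return HYP_DMA_BYPASS;
}
```

With this shape, an existing guest driver that never offers the bit always
ends up in the bypass mode, whatever the device advertises.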

> Blindly using system memory is a bug in QEMU; it has to be fixed to use
> the right address space, and then whatever the system provides to
> describe "the right address space" can be used (like the DMAR table on x86).

Except that it's not so easy.

For example, on PPC PAPR guests, there is no such thing as a "no IOMMU"
space, the concept doesn't exist. So we have at least three things to
deal with:

 - Existing guests: we must preserve the existing behaviour for
backward compatibility.

 - vhost is made more complex because it now needs to be informed of the
guest iommu updates.

 - New guests whose "new driver" knows how to map and unmap
would take a performance loss unless some mechanism to create a "no
iommu" space exists, which for us would need to be added. Either that or
we rely on DDW, which is a way for a guest to create a permanent mapping
of its entire address space in an IOMMU, but that incurs a significant
waste of host kernel memory.

> On PPC I suppose you could use the host bridge's device tree?  If you
> need a hook, you can add a

No, because we can mix and match virtio and other devices on the same
host bridge. Unless we put a property on the host bridge that only
applies to its virtio children.
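
A minimal sketch of that last idea, assuming a per-bridge property that is
honoured only for virtio children. The structs, the property field and the
helper are hypothetical stand-ins, not device tree or PCI core APIs; the
only real value is the 0x1af4 vendor ID that virtio PCI devices use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vendor ID used by virtio PCI devices. */
#define PCI_VENDOR_ID_REDHAT_QUMRANET 0x1af4

/* Hypothetical, simplified stand-in for a host bridge node. */
struct hyp_host_bridge {
	bool virtio_bypass_prop;	/* hypothetical bridge-level property */
};

/* Hypothetical, simplified stand-in for a device behind that bridge. */
struct hyp_pci_dev {
	uint16_t vendor;
	const struct hyp_host_bridge *bridge;
};

/* The bridge property is scoped to virtio children; others never bypass. */
static bool hyp_dev_bypasses_iommu(const struct hyp_pci_dev *dev)
{
	if (dev->vendor != PCI_VENDOR_ID_REDHAT_QUMRANET)
		return false;
	return dev->bridge && dev->bridge->virtio_bypass_prop;
}
```

A non-virtio device on the same bridge is unaffected by the property, which
is what allows the two kinds of device to be mixed.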

>       bool virtio_should_bypass_iommu(void)
>       {
>               /* lookup something in the device tree?!? */
>       }
>       EXPORT_SYMBOL_GPL(virtio_should_bypass_iommu);
> 
> in some pseries.c file, and in the driver:
> 
>       static bool virtio_bypass_iommu(void)
>       {
>               bool (*fn)(void);
>       
>               fn = symbol_get(virtio_should_bypass_iommu);
>               return fn && fn();
>       }
> 
> Awful, but that's what this thing is.

Ben.

> Paolo


