
Re: Using Restricted DMA for virtio-pci



On Sun, 2025-03-30 at 09:42 -0400, Michael S. Tsirkin wrote:
> On Fri, Mar 28, 2025 at 05:40:41PM +0000, David Woodhouse wrote:
> > On Fri, 2025-03-21 at 18:42 +0000, David Woodhouse wrote:
> > > > 
> > > > I don't mind as such (though I don't understand completely), but since
> > > > this is changing the device anyway, I am a bit confused why you can't
> > > > just set the VIRTIO_F_ACCESS_PLATFORM feature bit?  This forces DMA API
> > > > which will DTRT for you, will it not?
> > > 
> > > That would be necessary but not sufficient. ...
> 
> could you explain pls?

There was more to that in the previous email, which I elided for this
follow-up.

https://lore.kernel.org/all/d1382a6ee959f22dc5f6628d8648af77f4702418.camel@xxxxxxxxxxxxx/

> > My first cut at a proposed spec change looks something like this. I'll
> > post it to the virtio-comment list once I've done some corporate
> > bureaucracy and when the list stops sending me python tracebacks in
> > response to my subscribe request.
> 
> the linux foundation one does this? maybe poke at the admins.
> 
> > In the meantime I'll hack up some QEMU and guest Linux driver support
> > to match.
> > 
> > diff --git a/content.tex b/content.tex
> > index c17ffa6..1e6e1d6 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -773,6 +773,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  Currently these device-independent feature bits are defined:
> >  
> >  \begin{description}
> > +  \item[VIRTIO_F_SWIOTLB (27)] This feature indicates that the device
> > +  provides a memory region which is to be used for bounce buffering,
> > +  rather than permitting direct memory access to system memory.
> >    \item[VIRTIO_F_INDIRECT_DESC (28)] Negotiating this feature indicates
> >    that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
> >    flag set, as described in \ref{sec:Basic Facilities of a Virtio
> > @@ -885,6 +888,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  VIRTIO_F_ACCESS_PLATFORM is not offered, then a driver MUST pass only physical
> >  addresses to the device.
> >  
> > +A driver SHOULD accept VIRTIO_F_SWIOTLB if it is offered, and it MUST
> > +then pass only addresses within the Software IOTLB bounce buffer to the
> > +device.
> > +
> >  A driver SHOULD accept VIRTIO_F_RING_PACKED if it is offered.
> >  
> >  A driver SHOULD accept VIRTIO_F_ORDER_PLATFORM if it is offered.
> > @@ -921,6 +928,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  A device MAY fail to operate further if VIRTIO_F_ACCESS_PLATFORM is not
> >  accepted.
> >  
> > +A device MUST NOT offer VIRTIO_F_SWIOTLB if its transport does not
> > +provide a Software IOTLB bounce buffer.
> > +A device MAY fail to operate further if VIRTIO_F_SWIOTLB is not accepted.
> > +
> >  If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
> >  buffers in the same order in which they have been available.
> >  
> > diff --git a/transport-pci.tex b/transport-pci.tex
> > index a5c6719..23e0d57 100644
> > --- a/transport-pci.tex
> > +++ b/transport-pci.tex
> > @@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> >  \item ISR Status
> >  \item Device-specific configuration (optional)
> >  \item PCI configuration access
> > +\item SWIOTLB bounce buffer
> >  \end{itemize}
> >  
> >  Each structure can be mapped by a Base Address register (BAR) belonging to
> > @@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
> >  #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
> >  /* Vendor-specific data */
> >  #define VIRTIO_PCI_CAP_VENDOR_CFG        9
> > +/* Software IOTLB bounce buffer */
> > +#define VIRTIO_PCI_CAP_SWIOTLB           10
> >  \end{lstlisting}
> >  
> >          Any other value is reserved for future use.
> > @@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
> >  The driver MUST qualify the \field{vendor_id} before
> >  interpreting or writing into the Vendor data capability.
> >  
> > +\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +The optional Software IOTLB bounce buffer capability allows the
> > +device to provide a memory region which can be used by the driver
> > +for bounce buffering. This allows a device on the PCI
> > +transport to operate without DMA access to system memory addresses.
> > +
> > +The Software IOTLB region is referenced by the
> > +VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
> > +range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
> > +feature, if negotiated.
> 
> 
> why not? an optimization?
> A mix of swiotlb and system memory might be very challenging from POV
> of ordering.

Conceptually, these addresses are *on* the PCI device. If the device is
accessing addresses which are local to it, they aren't subject to IOMMU
translation/filtering because they never even make it to the PCI bus as
memory transactions.
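To make that concrete, here is a minimal device-model sketch in C. It assumes the bounce buffer appears on the bus as a single window [base, base + len) described by VIRTIO_PCI_CAP_SWIOTLB; the struct, enum, and function names are hypothetical and are not part of the proposed spec text. An access that falls entirely inside the window is served from on-board memory and never becomes an upstream memory transaction, so there is nothing for an IOMMU to translate or filter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the bounce-buffer window the device
 * exposes through VIRTIO_PCI_CAP_SWIOTLB (bus address and length). */
struct swiotlb_window {
    uint64_t base;
    uint64_t len;
};

enum dma_route {
    ROUTE_LOCAL,      /* served from on-board memory; never reaches the bus */
    ROUTE_SYSTEM_DMA, /* ordinary upstream DMA, subject to IOMMU translation */
    ROUTE_INVALID     /* straddles the window boundary */
};

/* Decide how a device model would service an access of `size` bytes at
 * bus address `addr`. Compares offsets rather than end addresses so
 * that addr + size cannot overflow. */
static enum dma_route classify_access(const struct swiotlb_window *w,
                                      uint64_t addr, uint64_t size)
{
    bool starts_inside = addr >= w->base && addr - w->base < w->len;

    if (starts_inside) {
        /* Local only if the whole access fits inside the window. */
        if (size <= w->len - (addr - w->base))
            return ROUTE_LOCAL;
        return ROUTE_INVALID;
    }
    /* Starts below the window: reject if it runs into it. */
    if (addr < w->base && size > w->base - addr)
        return ROUTE_INVALID;
    return ROUTE_SYSTEM_DMA;
}
```

Rejecting accesses that straddle the boundary is a design choice of this sketch; a real device could equally split them.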

> 
> > +
> > +\devicenormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +Devices which present the Software IOTLB bounce buffer capability
> > +SHOULD also offer the VIRTIO_F_SWIOTLB feature.
> > +
> > +\drivernormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +The driver SHOULD use the offered buffer in preference to passing system
> > +memory addresses to the device.
> 
> Even if not using VIRTIO_F_SWIOTLB? Is that really necessary?

That part isn't strictly necessary, but I think it makes sense for
cases where SWIOTLB support is an *optimisation* rather than a hard
requirement.

Why might it be an "optimisation"? Well... if we're thinking of a model
like pKVM where the VMM can't just arbitrarily access guest memory,
using the SWIOTLB is a simple way to avoid that (by using the on-board
memory instead, which *can* be shared with the VMM).
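A minimal sketch of the driver side of that model, assuming the guest has mapped the device's bounce region and knows its bus address from the capability. The `bounce_pool` type and both helpers are hypothetical; a real Linux driver would go through the DMA API and the kernel's SWIOTLB allocator rather than a bump allocator that never frees. What it illustrates is that the device only ever sees addresses inside the pool, while the rest of guest memory never needs to be shared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy pool over the device-provided bounce region. */
struct bounce_pool {
    uint8_t  *va;       /* driver's mapping of the region */
    uint64_t  bus_base; /* bus address of the region, per the capability */
    size_t    size;     /* region size in bytes */
    size_t    used;     /* bump-allocator high-water mark (never freed) */
};

/* Copy an outgoing buffer into the pool and report the bus address to
 * put in the descriptor. Returns false if the pool is exhausted. */
static bool bounce_map_to_device(struct bounce_pool *p, const void *buf,
                                 size_t len, uint64_t *bus_addr)
{
    if (len > p->size - p->used)
        return false;
    memcpy(p->va + p->used, buf, len);
    *bus_addr = p->bus_base + p->used;
    p->used += len;
    return true;
}

/* For device-writable buffers: copy the result back out of the pool
 * once the device has returned the descriptor as used. */
static void bounce_sync_from_device(const struct bounce_pool *p,
                                    uint64_t bus_addr, void *buf, size_t len)
{
    memcpy(buf, p->va + (bus_addr - p->bus_base), len);
}
```

The cost is one copy per buffer in each direction, which is the price of never handing system memory addresses to the device.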

But if we want to go to extra lengths to support unenlightened guests,
an implementation might choose to just *disable* the memory protection
if the guest doesn't negotiate VIRTIO_F_SWIOTLB, instead of breaking
that guest.
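That choice could be expressed as a per-device policy in the VMM, evaluated once the driver has written its accepted features. This is a hypothetical sketch: the policy and mode names are invented for illustration, 27 is the bit number proposed in the patch, and how "disabling the memory protection" actually happens depends entirely on the hypervisor.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_SWIOTLB 27 /* bit number proposed in the patch */

/* Hypothetical per-device policy knob in the VMM. */
enum unenlightened_policy {
    POLICY_FAIL,              /* spec: device "MAY fail to operate further" */
    POLICY_DISABLE_PROTECTION /* let system-memory DMA through instead */
};

enum device_mode {
    MODE_BOUNCE_ONLY,   /* only bounce-buffer addresses are accepted */
    MODE_SYSTEM_MEMORY, /* protection lifted; system-memory DMA allowed */
    MODE_FAILED         /* device refuses to proceed */
};

static bool has_feature(uint64_t features, unsigned bit)
{
    return (features >> bit) & 1;
}

/* Called once the driver has written its accepted feature set. */
static enum device_mode select_mode(uint64_t accepted,
                                    enum unenlightened_policy policy)
{
    if (has_feature(accepted, VIRTIO_F_SWIOTLB))
        return MODE_BOUNCE_ONLY;
    return policy == POLICY_DISABLE_PROTECTION ? MODE_SYSTEM_MEMORY
                                               : MODE_FAILED;
}
```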

Or it might have a complicated emulation/snooping of virtqueues in the
trusted part of the hypervisor so that it knows which addresses the
guest has truly *asked* the VMM to access. (And yes, of course that's
what an IOMMU is for, but when have you seen hardware companies design
a two-stage IOMMU which supports actual PCI passthrough *and* get it
right for the hypervisor to 'snoop' on the stage-1 page tables to
support emulated devices too?)

Ultimately I think it was natural to advertise the location of the
buffer with the VIRTIO_PCI_CAP_SWIOTLB capability and then to have the
separate VIRTIO_F_SWIOTLB for negotiation... leaving the obvious
question of what a device should do if it sees one but *not* the other.
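For the discovery half, a driver would walk the vendor-specific capabilities in PCI config space looking for cfg_type 10 (VIRTIO_PCI_CAP_SWIOTLB, as proposed). This sketch works on a 256-byte config-space snapshot and assumes the new capability reuses the existing struct virtio_pci_cap layout (bar at byte 4, le32 offset at byte 8, le32 length at byte 12). The patch does not spell that out, so treat the layout as an assumption, as are the helper names; a large region might instead need the 64-bit virtio_pci_cap64 form.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_CAPABILITY_LIST    0x34 /* standard header: first capability */
#define PCI_CAP_ID_VNDR        0x09 /* vendor-specific capability ID */
#define VIRTIO_PCI_CAP_SWIOTLB 10   /* cfg_type proposed in the patch */

struct swiotlb_cap_info {
    uint8_t  bar;
    uint32_t offset; /* offset within the BAR */
    uint32_t length; /* region length in bytes */
};

static uint32_t le32_at(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Walk the capability list in a config-space snapshot and return the
 * first virtio capability whose cfg_type is VIRTIO_PCI_CAP_SWIOTLB. */
static bool find_swiotlb_cap(const uint8_t cfg[256],
                             struct swiotlb_cap_info *out)
{
    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    int budget = 48; /* guard against malformed, looping lists */

    while (pos && budget--) {
        if (pos > 256 - 16) /* capability would run off the end */
            break;
        const uint8_t *cap = &cfg[pos];
        if (cap[0] == PCI_CAP_ID_VNDR && cap[2] >= 16 &&
            cap[3] == VIRTIO_PCI_CAP_SWIOTLB) {
            out->bar = cap[4];
            out->offset = le32_at(cap + 8);
            out->length = le32_at(cap + 12);
            return true;
        }
        pos = cap[1] & 0xfc; /* cap_next */
    }
    return false;
}
```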

Obviously you can't have VIRTIO_F_SWIOTLB *without* there actually
being a buffer advertised with VIRTIO_PCI_CAP_SWIOTLB (or its
equivalent for other transports). But the converse seemed reasonable as
a *hint* even if the use of the SWIOTLB isn't mandatory.
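The four combinations of capability and feature bit can be summarised from the driver's point of view. A hypothetical sketch, again assuming bit 27 as proposed; for brevity it passes all other offered bits through unchanged, where a real driver would mask them against what it actually supports.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_SWIOTLB 27 /* bit number proposed in the patch */

enum swiotlb_use {
    SWIOTLB_NONE,      /* neither present: ordinary DMA */
    SWIOTLB_PREFERRED, /* capability only: a hint, use it if possible */
    SWIOTLB_MANDATORY, /* feature accepted: only bounce-buffer addresses */
    SWIOTLB_BROKEN     /* feature without a buffer: device broke a MUST NOT */
};

/* Driver-side decision; *accept receives the feature bits to write back. */
static enum swiotlb_use negotiate_swiotlb(uint64_t offered, bool have_cap,
                                          uint64_t *accept)
{
    bool offered_f = (offered >> VIRTIO_F_SWIOTLB) & 1;
    *accept = offered & ~(1ULL << VIRTIO_F_SWIOTLB);

    if (offered_f && !have_cap)
        return SWIOTLB_BROKEN;
    if (offered_f) {
        *accept |= 1ULL << VIRTIO_F_SWIOTLB; /* "SHOULD accept" */
        return SWIOTLB_MANDATORY;
    }
    return have_cap ? SWIOTLB_PREFERRED : SWIOTLB_NONE;
}
```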
