Re: Using Restricted DMA for virtio-pci
On Sun, 2025-03-30 at 09:42 -0400, Michael S. Tsirkin wrote:
> On Fri, Mar 28, 2025 at 05:40:41PM +0000, David Woodhouse wrote:
> > On Fri, 2025-03-21 at 18:42 +0000, David Woodhouse wrote:
> > > >
> > > > I don't mind as such (though I don't understand completely), but since
> > > > this is changing the device anyway, I am a bit confused why you can't
> > > > just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces DMA API
> > > > which will DTRT for you, will it not?
> > >
> > > That would be necessary but not sufficient. ...
>
> could you explain pls?

There was more to that in the previous email which I elided for this
followup.

https://lore.kernel.org/all/d1382a6ee959f22dc5f6628d8648af77f4702418.camel@xxxxxxxxxxxxx/

> > My first cut at a proposed spec change looks something like this. I'll
> > post it to the virtio-comment list once I've done some corporate
> > bureaucracy and when the list stops sending me python tracebacks in
> > response to my subscribe request.
>
> the linux foundation one does this? maybe poke at the admins.
>
> > In the meantime I'll hack up some QEMU and guest Linux driver support
> > to match.
> >
> > diff --git a/content.tex b/content.tex
> > index c17ffa6..1e6e1d6 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -773,6 +773,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved
> > Feature Bits}
> > Currently these device-independent feature bits are defined:
> >
> > \begin{description}
> > + \item[VIRTIO_F_SWIOTLB (27)] This feature indicates that the device
> > + provides a memory region which is to be used for bounce buffering,
> > + rather than permitting direct memory access to system memory.
> > \item[VIRTIO_F_INDIRECT_DESC (28)] Negotiating this feature indicates
> > that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
> > flag set, as described in \ref{sec:Basic Facilities of a Virtio
> > @@ -885,6 +888,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved
> > Feature Bits}
> > VIRTIO_F_ACCESS_PLATFORM is not offered, then a driver MUST pass only
> > physical
> > addresses to the device.
> >
> > +A driver SHOULD accept VIRTIO_F_SWIOTLB if it is offered, and it MUST
> > +then pass only addresses within the Software IOTLB bounce buffer to the
> > +device.
> > +
> > A driver SHOULD accept VIRTIO_F_RING_PACKED if it is offered.
> >
> > A driver SHOULD accept VIRTIO_F_ORDER_PLATFORM if it is offered.
> > @@ -921,6 +928,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved
> > Feature Bits}
> > A device MAY fail to operate further if VIRTIO_F_ACCESS_PLATFORM is not
> > accepted.
> >
> > +A device MUST NOT offer VIRTIO_F_SWIOTLB if its transport does not
> > +provide a Software IOTLB bounce buffer.
> > +A device MAY fail to operate further if VIRTIO_F_SWIOTLB is not accepted.
> > +
> > If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
> > buffers in the same order in which they have been available.
> >
> > diff --git a/transport-pci.tex b/transport-pci.tex
> > index a5c6719..23e0d57 100644
> > --- a/transport-pci.tex
> > +++ b/transport-pci.tex
> > @@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI
> > Capabilities}\label{sec:Virtio Transport Option
> > \item ISR Status
> > \item Device-specific configuration (optional)
> > \item PCI configuration access
> > +\item SWIOTLB bounce buffer
> > \end{itemize}
> >
> > Each structure can be mapped by a Base Address register (BAR) belonging to
> > @@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI
> > Capabilities}\label{sec:Virtio Transport Option
> > #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
> > /* Vendor-specific data */
> > #define VIRTIO_PCI_CAP_VENDOR_CFG 9
> > +/* Software IOTLB bounce buffer */
> > +#define VIRTIO_PCI_CAP_SWIOTLB 10
> > \end{lstlisting}
> >
> > Any other value is reserved for future use.
> > @@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
> > The driver MUST qualify the \field{vendor_id} before
> > interpreting or writing into the Vendor data capability.
> >
> > +\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +The optional Software IOTLB bounce buffer capability allows the
> > +device to provide a memory region which can be used by the driver
> > +driver for bounce buffering. This allows a device on the PCI
> > +transport to operate without DMA access to system memory addresses.
> > +
> > +The Software IOTLB region is referenced by the
> > +VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
> > +range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
> > +capability, if negotiated.
>
>
> why not? an optimization?
> A mix of swiotlb and system memory might be very challenging from POV
> of ordering.

Conceptually, these addresses are *on* the PCI device.
If the device is accessing addresses which are local to it, they aren't
subject to IOMMU translation/filtering because they never even make it
to the PCI bus as memory transactions.

>
> > +
> > +\devicenormative{\paragraph}{Software IOTLB bounce buffer
> > capability}{Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +Devices which present the Software IOTLB bounce buffer capability
> > +SHOULD also offer the VIRTIO_F_SWIOTLB feature.
> > +
> > +\drivernormative{\paragraph}{Software IOTLB bounce buffer
> > capability}{Virtio
> > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
> > +Software IOTLB bounce buffer capability}
> > +
> > +The driver SHOULD use the offered buffer in preference to passing system
> > +memory addresses to the device.
>
> Even if not using VIRTIO_F_SWIOTLB? Is that really necessary?

That part isn't strictly necessary, but I think it makes sense, for
cases where the SWIOTLB support is an *optimisation* even if it isn't
strictly necessary.

Why might it be an "optimisation"? Well... if we're thinking of a model
like pKVM where the VMM can't just arbitrarily access guest memory,
using the SWIOTLB is a simple way to avoid that (by using the on-board
memory instead, which *can* be shared with the VMM).

But if we want to go to extra lengths to support unenlightened guests,
an implementation might choose to just *disable* the memory protection
if the guest doesn't negotiate VIRTIO_F_SWIOTLB, instead of breaking
that guest.

Or it might have a complicated emulation/snooping of virtqueues in the
trusted part of the hypervisor so that it knows which addresses the
guest has truly *asked* the VMM to access.
(And yes, of course that's what an IOMMU is for, but when have you seen
hardware companies design a two-stage IOMMU which supports actual PCI
passthrough *and* get it right for the hypervisor to 'snoop' on the
stage1 page tables to support emulated devices too....)

Ultimately I think it was natural to advertise the location of the
buffer with the VIRTIO_PCI_CAP_SWIOTLB capability and then to have the
separate VIRTIO_F_SWIOTLB for negotiation... leaving the obvious
question of what a device should do if it sees one but *not* the other.

Obviously you can't have VIRTIO_F_SWIOTLB *without* there actually
being a buffer advertised with VIRTIO_PCI_CAP_SWIOTLB (or its
equivalent for other transports). But the converse seemed reasonable as
a *hint* even if the use of the SWIOTLB isn't mandatory.
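
The relationship between the capability and the feature bit described
above can be sketched in C. This is an illustration only, not part of
the proposed spec text or of any existing driver: the names
dma_policy, pick_dma_policy and in_bounce_buffer are hypothetical, and
only the bit number 27 comes from the quoted diff. It assumes the
driver already knows whether a VIRTIO_PCI_CAP_SWIOTLB capability was
found and which features were negotiated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number as given in the proposed spec change. */
#define VIRTIO_F_SWIOTLB 27

/* Hypothetical classification of the four capability/feature cases. */
enum dma_policy {
    DMA_DIRECT,          /* no buffer advertised, feature not negotiated */
    DMA_BOUNCE_HINT,     /* buffer advertised only: use it as a preference */
    DMA_BOUNCE_REQUIRED, /* feature negotiated: only buffer addresses allowed */
    DMA_INVALID          /* feature negotiated without any buffer advertised */
};

/* Decide how the driver should treat DMA for this device. */
static enum dma_policy pick_dma_policy(bool cap_present,
                                       uint64_t negotiated_features)
{
    bool negotiated = (negotiated_features >> VIRTIO_F_SWIOTLB) & 1;

    if (negotiated)
        return cap_present ? DMA_BOUNCE_REQUIRED : DMA_INVALID;
    return cap_present ? DMA_BOUNCE_HINT : DMA_DIRECT;
}

/*
 * Check that [addr, addr + len) lies entirely inside the bounce buffer
 * [base, base + size). Written to avoid overflow in addr + len.
 */
static bool in_bounce_buffer(uint64_t base, uint64_t size,
                             uint64_t addr, uint64_t len)
{
    if (addr < base)
        return false;

    uint64_t offset = addr - base;

    return offset <= size && len <= size - offset;
}
```

Under DMA_BOUNCE_REQUIRED a driver would be expected to reject any
descriptor address for which in_bounce_buffer() returns false; under
DMA_BOUNCE_HINT the same check would only steer a preference.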