Re: Using Restricted DMA for virtio-pci
On 30 March 2025 17:59:13 BST, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>On Sun, Mar 30, 2025 at 04:07:56PM +0100, David Woodhouse wrote:
>> On Sun, 2025-03-30 at 09:42 -0400, Michael S. Tsirkin wrote:
>> > On Fri, Mar 28, 2025 at 05:40:41PM +0000, David Woodhouse wrote:
>> > > On Fri, 2025-03-21 at 18:42 +0000, David Woodhouse wrote:
>> > > > >
>> > > > > I don't mind as such (though I don't understand completely), but since
>> > > > > this is changing the device anyway, I am a bit confused why you can't
>> > > > > just set the VIRTIO_F_ACCESS_PLATFORM feature bit? This forces DMA API
>> > > > > which will DTRT for you, will it not?
>> > > >
>> > > > That would be necessary but not sufficient. ...
>> >
>> > could you explain pls?
>>
>> There was more to that in the previous email which I elided for this
>> followup.
>>
>> https://lore.kernel.org/all/d1382a6ee959f22dc5f6628d8648af77f4702418.camel@xxxxxxxxxxxxx/
>>
>> > > My first cut at a proposed spec change looks something like this. I'll
>> > > post it to the virtio-comment list once I've done some corporate
>> > > bureaucracy and when the list stops sending me python tracebacks in
>> > > response to my subscribe request.
>> >
>> > the linux foundation one does this? maybe poke at the admins.
>> >
>> > > In the meantime I'll hack up some QEMU and guest Linux driver support
>> > > to match.
>> > >
>> > > diff --git a/content.tex b/content.tex
>> > > index c17ffa6..1e6e1d6 100644
>> > > --- a/content.tex
>> > > +++ b/content.tex
>> > > @@ -773,6 +773,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>> > > Currently these device-independent feature bits are defined:
>> > >
>> > > \begin{description}
>> > > + \item[VIRTIO_F_SWIOTLB (27)] This feature indicates that the device
>> > > + provides a memory region which is to be used for bounce buffering,
>> > > + rather than permitting direct memory access to system memory.
>> > > \item[VIRTIO_F_INDIRECT_DESC (28)] Negotiating this feature indicates
>> > > that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
>> > > flag set, as described in \ref{sec:Basic Facilities of a Virtio
>> > > @@ -885,6 +888,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>> > > VIRTIO_F_ACCESS_PLATFORM is not offered, then a driver MUST pass only physical
>> > > addresses to the device.
>> > >
>> > > +A driver SHOULD accept VIRTIO_F_SWIOTLB if it is offered, and it MUST
>> > > +then pass only addresses within the Software IOTLB bounce buffer to the
>> > > +device.
>> > > +
>> > > A driver SHOULD accept VIRTIO_F_RING_PACKED if it is offered.
>> > >
>> > > A driver SHOULD accept VIRTIO_F_ORDER_PLATFORM if it is offered.
>> > > @@ -921,6 +928,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>> > > A device MAY fail to operate further if VIRTIO_F_ACCESS_PLATFORM is not accepted.
>> > >
>> > > +A device MUST NOT offer VIRTIO_F_SWIOTLB if its transport does not
>> > > +provide a Software IOTLB bounce buffer.
>> > > +A device MAY fail to operate further if VIRTIO_F_SWIOTLB is not accepted.
>> > > +
>> > > If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
>> > > buffers in the same order in which they have been available.
>> > >
>> > > diff --git a/transport-pci.tex b/transport-pci.tex
>> > > index a5c6719..23e0d57 100644
>> > > --- a/transport-pci.tex
>> > > +++ b/transport-pci.tex
>> > > @@ -129,6 +129,7 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
>> > > \item ISR Status
>> > > \item Device-specific configuration (optional)
>> > > \item PCI configuration access
>> > > +\item SWIOTLB bounce buffer
>> > > \end{itemize}
>> > >
>> > > Each structure can be mapped by a Base Address register (BAR) belonging to
>> > > @@ -188,6 +189,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
>> > > #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
>> > > /* Vendor-specific data */
>> > > #define VIRTIO_PCI_CAP_VENDOR_CFG 9
>> > > +/* Software IOTLB bounce buffer */
>> > > +#define VIRTIO_PCI_CAP_SWIOTLB 10
>> > > \end{lstlisting}
>> > >
>> > > Any other value is reserved for future use.
>> > > @@ -744,6 +747,36 @@ \subsubsection{Vendor data capability}\label{sec:Virtio
>> > > The driver MUST qualify the \field{vendor_id} before
>> > > interpreting or writing into the Vendor data capability.
>> > >
>> > > +\subsubsection{Software IOTLB bounce buffer capability}\label{sec:Virtio
>> > > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
>> > > +Software IOTLB bounce buffer capability}
>> > > +
>> > > +The optional Software IOTLB bounce buffer capability allows the
>> > > +device to provide a memory region which can be used by the driver
>> > > +driver for bounce buffering. This allows a device on the PCI
>> > > +transport to operate without DMA access to system memory addresses.
>> > > +
>> > > +The Software IOTLB region is referenced by the
>> > > +VIRTIO_PCI_CAP_SWIOTLB capability. Bus addresses within the referenced
>> > > +range are not subject to the requirements of the VIRTIO_F_ORDER_PLATFORM
>> > > +capability, if negotiated.
>> >
>> >
>> > why not? an optimization?
>> > A mix of swiotlb and system memory might be very challenging from POV
>> > of ordering.
>>
>> Conceptually, these addresses are *on* the PCI device. If the device is
>> accessing addresses which are local to it, they aren't subject to IOMMU
>> translation/filtering because they never even make it to the PCI bus as
>> memory transactions.
>>
>> >
>> > > +
>> > > +\devicenormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
>> > > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
>> > > +Software IOTLB bounce buffer capability}
>> > > +
>> > > +Devices which present the Software IOTLB bounce buffer capability
>> > > +SHOULD also offer the VIRTIO_F_SWIOTLB feature.
>> > > +
>> > > +\drivernormative{\paragraph}{Software IOTLB bounce buffer capability}{Virtio
>> > > +Transport Options / Virtio Over PCI Bus / PCI Device Layout /
>> > > +Software IOTLB bounce buffer capability}
>> > > +
>> > > +The driver SHOULD use the offered buffer in preference to passing system
>> > > +memory addresses to the device.
>> >
>> > Even if not using VIRTIO_F_SWIOTLB? Is that really necessary?
>>
>> That part isn't strictly necessary, but I think it makes sense, for
>> cases where the SWIOTLB support is an *optimisation* even if it isn't
>> strictly necessary.
>>
>> Why might it be an "optimisation"? Well... if we're thinking of a model
>> like pKVM where the VMM can't just arbitrarily access guest memory,
>> using the SWIOTLB is a simple way to avoid that (by using the on-board
>> memory instead, which *can* be shared with the VMM).
>>
>> But if we want to go to extra lengths to support unenlightened guests,
>> an implementation might choose to just *disable* the memory protection
>> if the guest doesn't negotiate VIRTIO_F_SWIOTLB, instead of breaking
>> that guest.
>>
>> Or it might have a complicated emulation/snooping of virtqueues in the
>> trusted part of the hypervisor so that it knows which addresses the
>> guest has truly *asked* the VMM to access. (And yes, of course that's
>> what an IOMMU is for, but when have you seen hardware companies design
>> a two-stage IOMMU which supports actual PCI passthrough *and* get it
>> right for the hypervisor to 'snoop' on the stage1 page tables to
>> support emulated devices too....)
>>
>> Ultimately I think it was natural to advertise the location of the
>> buffer with the VIRTIO_PCI_CAP_SWIOTLB capability and then to have the
>> separate VIRTIO_F_SWIOTLB for negotiation... leaving the obvious
>> question of what a device should do if it sees one but *not* the other.
>>
>> Obviously you can't have VIRTIO_F_SWIOTLB *without* there actually
>> being a buffer advertised with VIRTIO_PCI_CAP_SWIOTLB (or its
>> equivalent for other transports). But the converse seemed reasonable as
>> a *hint* even if the use of the SWIOTLB isn't mandatory.
>
>OK but I feel it's more work than you think, so we really need
>a better reason than just "why not".
>
>For example, it's not at all clear to me how the ordering is
>going to work if buffers are in memory but the ring is swiotlb
>or the reverse. Ordering will all be messed up.

Maybe. Although by the time the driver has *observed* the data written to the swiotlb on the device's BAR, it has had to cross the same PCI bus.

But sure, we could require all-or-nothing. Or require that the SWIOTLB only be used if the driver negotiates VIRTIO_F_SWIOTLB.

Even in the latter case we can still allow for SWIOTLB to either be a requirement or a hint, purely down to whether the device *allows* the driver not to negotiate `VIRTIO_F_SWIOTLB`.
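
To make the "requirement or hint" distinction concrete, here is a rough sketch in C of how the outcome might fall out under the "only use the SWIOTLB if VIRTIO_F_SWIOTLB is negotiated" rule. The two constants are the values from the draft diff quoted above and are not standardised. The struct, the enum and the function names are purely hypothetical, invented for illustration, and are not taken from any existing QEMU or Linux code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the draft spec change in this thread; not yet standardised. */
#define VIRTIO_F_SWIOTLB        27
#define VIRTIO_PCI_CAP_SWIOTLB  10

/* Hypothetical model of what a device exposes (illustrative names only). */
struct swiotlb_device {
    uint64_t offered_features;  /* device feature bits */
    bool has_swiotlb_cap;       /* a VIRTIO_PCI_CAP_SWIOTLB capability is present */
    bool requires_swiotlb;      /* device refuses to operate without the feature */
};

enum dma_mode {
    DMA_SYSTEM_MEMORY,  /* driver passes ordinary system memory addresses */
    DMA_BOUNCE_ONLY,    /* driver passes only addresses inside the bounce buffer */
    DMA_DEVICE_FAILS    /* negotiation cannot succeed */
};

static bool feature_offered(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features >> bit) & 1u;
}

/* Assumed policy: all-or-nothing, keyed purely on the negotiated feature bit. */
static enum dma_mode negotiate_swiotlb(const struct swiotlb_device *dev,
                                       bool driver_supports_swiotlb)
{
    bool offered = feature_offered(dev->offered_features, VIRTIO_F_SWIOTLB);

    /* The draft says a device MUST NOT offer the feature without a buffer. */
    if (offered && !dev->has_swiotlb_cap)
        return DMA_DEVICE_FAILS;

    /* Feature negotiated: every address handed to the device is in the buffer. */
    if (offered && driver_supports_swiotlb)
        return DMA_BOUNCE_ONLY;

    /* Feature not negotiated: a "requirement" device fails, a "hint" device
     * falls back to ordinary DMA to system memory. */
    if (offered && dev->requires_swiotlb)
        return DMA_DEVICE_FAILS;

    return DMA_SYSTEM_MEMORY;
}
```

In this model the only thing separating a requirement from a hint is the requires_swiotlb flag, i.e. whether the device tolerates a driver that declines the feature. A capability present without the feature bit offered is treated as plain system-memory DMA here, which is one possible reading and not something the draft settles.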