
Re: Design session notes: GPU acceleration in Xen



On Sun, Jun 16, 2024 at 08:38:19PM -0400, Demi Marie Obenour wrote:
> On Fri, Jun 14, 2024 at 10:39:37AM +0200, Roger Pau Monné wrote:
> > On Fri, Jun 14, 2024 at 10:12:40AM +0200, Jan Beulich wrote:
> > > On 14.06.2024 09:21, Roger Pau Monné wrote:
> > > > On Fri, Jun 14, 2024 at 08:38:51AM +0200, Jan Beulich wrote:
> > > >> On 13.06.2024 20:43, Demi Marie Obenour wrote:
> > > >>> GPU acceleration requires that pageable host memory be able to
> > > >>> be mapped into a guest.
> > > >>
> > > >> I'm sure it was explained in the session, which sadly I couldn't
> > > >> attend.  I've asked Ray and Xenia the same before, but I'm afraid it
> > > >> still hasn't become clear to me why this is a _requirement_.  After
> > > >> all, it's against what we're doing elsewhere (i.e. so far it has
> > > >> always been guest memory that's mapped in the host).  I can
> > > >> appreciate that it might be more difficult to implement, but avoiding
> > > >> a violation of this fundamental (kind of) rule might be worth the
> > > >> price (and would avoid other complexities, of which there may be more
> > > >> lurking than what you enumerate below).
> > > > 
> > > > My limited understanding (please someone correct me if wrong) is that
> > > > the GPU buffer (or context, I think it's also called?) is always
> > > > allocated from dom0 (the owner of the GPU).  The underlying memory
> > > > addresses of such a buffer need to be mapped into the guest.  The
> > > > buffer backing memory might be GPU MMIO from the device BAR(s) or
> > > > system RAM, and such a buffer can be paged by the dom0 kernel at any
> > > > time (iow: changing the backing memory from MMIO to RAM or vice
> > > > versa).  Also, the buffer must be contiguous in physical address
> > > > space.
> > > 
> > > This last one in particular would of course be a severe restriction.
> > > Yet: There's an IOMMU involved, isn't there?
> > 
> > Yup, IIRC that's why Ray said it was much easier for them to
> > support VirtIO GPUs from a PVH dom0 rather than a classic PV one.
> > 
> > It might be easier to implement from a classic PV dom0 if there's
> > pv-iommu support, so that dom0 can create its own contiguous memory
> > buffers from the device PoV.
> 
> What makes PVH an improvement here?  I thought PV dom0 uses an identity
> mapping for the IOMMU, while a PVH dom0 uses an IOMMU that mirrors the
> dom0 second-stage page tables.

Indeed, hence finding a physically contiguous buffer on classic PV is
way more complicated, because the IOMMU identity maps mfns, and the PV
address space can be completely scattered.

OTOH, on PVH the IOMMU page tables are the same as the second stage
translation, and hence the physical address space is way more compact
(as it would be on native).
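
To illustrate the difference, here is a purely artificial sketch (not
real Xen or Linux code; the p2m[] table below just stands in for dom0's
pfn-to-mfn mapping):

  #include <stdint.h>

  #define PAGE_SHIFT 12

  /* Toy model of dom0's P2M: pfn -> machine frame (mfn).  On classic
   * PV, consecutive pfns usually map to non-consecutive mfns. */
  extern const unsigned long p2m[];

  /* Address the device has to use for page 'pfn' of a buffer that is
   * contiguous in dom0's pseudo-physical address space. */
  uint64_t dev_addr_pv(unsigned long pfn)
  {
      /* Classic PV dom0: the IOMMU identity-maps mfns, so the device
       * sees the scattered machine address space. */
      return (uint64_t)p2m[pfn] << PAGE_SHIFT;
  }

  uint64_t dev_addr_pvh(unsigned long pfn)
  {
      /* PVH dom0: the IOMMU shares the second stage translation, so
       * the device sees dom0's (contiguous) pseudo-physical space. */
      return (uint64_t)pfn << PAGE_SHIFT;
  }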

> In both cases, the device physical
> addresses are identical to dom0’s physical addresses.

Yes, but a PV dom0 physical address space can be very scattered.

IIRC there's a hypercall to request physically contiguous memory for
PV, but you don't want to be using that every time you allocate a
buffer (and I'm not sure it would support the sizes needed by the GPU
anyway).
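
FWIW, on Linux that path is wrapped by xen_create_contiguous_region()
(backed by XENMEM_exchange, IIRC).  A rough, untested sketch of how a
driver could use it, assuming the current signature:

  #include <linux/types.h>
  #include <xen/xen-ops.h>

  /* Exchange the (scattered) machine frames backing the 2^order pages
   * starting at pstart for machine-contiguous ones.  This goes through
   * the hypervisor and is expensive, so it's not something to do on
   * every buffer allocation. */
  static int make_machine_contiguous(phys_addr_t pstart, unsigned int order)
  {
      dma_addr_t dma_handle;

      return xen_create_contiguous_region(pstart, order,
                                          64 /* no addressing limit */,
                                          &dma_handle);
  }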

> PV is terrible for many reasons, so I’m okay with focusing on PVH dom0,
> but I’d like to know why there is a difference.
> 
> > > > I'm not sure it's possible to ensure that when using system RAM such
> > > > memory comes from the guest rather than the host, as it would likely
> > > > require some very intrusive hooks into the kernel logic, and
> > > > negotiation with the guest to allocate the requested amount of
> > > > memory and hand it over to dom0.  If the maximum size of the buffer
> > > > is known in advance, maybe dom0 can negotiate with the guest to
> > > > allocate such a region and grant dom0 access to it at driver
> > > > attachment time.
> > > 
> > > Besides the thought of transiently converting RAM to kind-of-MMIO, this
> > 
> > As a note here, changing the type to MMIO would likely involve
> > modifying the EPT/NPT tables to propagate the new type.  On a PVH dom0
> > this would likely involve shattering superpages in order to set the
> > correct memory types.
> > 
> > Depending on how often and how randomly those system RAM changes are
> > needed, this could also create contention on the p2m lock.
> > 
> > > makes me think of another possible option: Could Dom0 transfer ownership
> > > of the RAM that wants mapping in the guest (remotely resembling
> > > grant-transfer)? Would require the guest to have ballooned down enough
> > > first, of course. (In both cases it would certainly need working out how
> > > the conversion / transfer back could be made to work safely and
> > > reasonably cleanly.)
> > 
> > Maybe.  The fact that the guest needs to balloon down that amount of
> > memory seems weird to me, as from the guest PoV that mapped memory is
> > MMIO-like and not system RAM.
> 
> I don’t like it either.  Furthermore, this would require changes to the
> virtio-GPU driver in the guest, which I’d prefer to avoid.

IMO it would be helpful if you (or someone) could write up the full
specification of how VirtIO GPU is supposed to work right now (with the
KVM model, I assume?), as it would be a good starting point for
suggestions on how to make it work (or adapt it) on Xen.

I don't think the high level layers on top of VirtIO GPU are relevant,
but it's important to understand the protocol between the VirtIO GPU
front and back ends.

So far I have only had scattered conversations about what's needed, but
no formal write-up of how this is supposed to work.
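
For reference, my current (possibly wrong) rough understanding of the
guest-side flow for host-backed (blob) resources is sketched below.
The helper functions are made-up placeholders for the real virtqueue
plumbing; only the VIRTIO_GPU_* names in the comments are from the
virtio spec / Linux uapi headers, assuming I have those right:

  #include <stdint.h>
  #include <stddef.h>

  /* Hypothetical wrappers around the relevant control-queue commands. */
  uint32_t create_host_blob(size_t size);      /* VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB
                                                  with VIRTIO_GPU_BLOB_MEM_HOST3D */
  int map_blob(uint32_t res, uint64_t offset); /* VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB */
  int unmap_blob(uint32_t res);                /* VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB */

  int use_host_buffer(size_t size, uint64_t offset)
  {
      /* 1. The back end allocates the buffer on the host (system RAM
       *    or device VRAM, at the host driver's discretion). */
      uint32_t res = create_host_blob(size);

      /* 2. The back end maps it at 'offset' of the device's
       *    host-visible shared memory region.  On KVM the VMM inserts
       *    the host mapping into the guest's second stage here; this
       *    is the step that needs a Xen equivalent, and the host
       *    kernel may move the backing between RAM and VRAM while it
       *    stays mapped. */
      if (map_blob(res, offset))
          return -1;

      /* ... guest userspace accesses the mapping ... */

      /* 3. Tear-down is the reverse. */
      return unmap_blob(res);
  }

If someone more familiar with the KVM implementation can correct the
above, that would already help.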

Thanks, Roger.



 

