
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

> From: Tim Deegan [mailto:tim@xxxxxxx]
> Sent: Thursday, December 18, 2014 11:47 PM
> Hi,
> At 07:24 +0000 on 12 Dec (1418365491), Tian, Kevin wrote:
> > > I'm afraid not.  There's nothing worrying per se in a backend knowing
> > > the MFNs of the pages -- the worry is that the backend can pass the
> > > MFNs to hardware.  If the check happens only at lookup time, then XenGT
> > > can (either through a bug or a security breach) just pass _any_ MFN to
> > > the GPU for DMA.
> > >
> > > But even without considering the security aspects, this model has bugs
> > > that may be impossible for XenGT itself to even detect.  E.g.:
> > >  1. Guest asks its virtual GPU to DMA to a frame of memory;
> > >  2. XenGT looks up the GFN->MFN mapping;
> > >  3. Guest balloons out the page;
> > >  4. Xen allocates the page to a different guest;
> > >  5. XenGT passes the MFN to the GPU, which DMAs to it.
> > >
> > > Whereas if stage 2 is a _mapping_ operation, Xen can refcount the
> > > underlying memory and make sure it doesn't get reallocated until XenGT
> > > is finished with it.
> >
> > yes, I see your point. Now we can't support ballooning in VM given above
> > reason, and refcnt is required to close that gap.
> >
> > but just to confirm one point. from my understanding whether it's a
> > mapping operation doesn't really matter. We can invent an interface
> > to get p2m mapping and then increase refcnt. the key is refcnt here.
> > when XenGT constructs a shadow GPU page table, it creates a reference
> > to guest memory page so the refcnt must be increased. :-)
> True. :)  But Xen does need to remember all the refcounts that were
> created (so it can tidy up if the domain crashes).  If Xen is already
> doing that it might as well do it in the IOMMU tables since that
> solves other problems.

Would a refcnt at the p2m layer be enough, so that we don't need separate
refcnts in both the EPT and the IOMMU page tables?
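A minimal sketch of the idea being discussed, with made-up names rather than real Xen APIs: if the map operation itself takes a reference (which Xen records so it can tidy up on domain crash), then the ballooning race in steps 1-5 above can no longer free the page while the GPU may still DMA to it.

```c
/* Illustrative sketch only: why taking a reference at map time (rather
 * than at lookup time) closes the ballooning race.  page_info,
 * backend_map_page etc. are hypothetical simplifications, not Xen APIs. */
#include <assert.h>
#include <stdbool.h>

struct page_info {
    unsigned int refcount;   /* outstanding references, e.g. backend mappings */
    bool allocated;          /* page currently owned by the guest */
};

/* Map: the GFN->MFN lookup and the refcount bump happen as one operation. */
static bool backend_map_page(struct page_info *pg)
{
    if (!pg->allocated)
        return false;
    pg->refcount++;          /* Xen also records this for crash cleanup */
    return true;
}

static void backend_unmap_page(struct page_info *pg)
{
    assert(pg->refcount > 0);
    pg->refcount--;
}

/* Balloon-out / free: refused while any reference is outstanding, so the
 * page cannot be reallocated to another guest mid-DMA. */
static bool guest_free_page(struct page_info *pg)
{
    if (pg->refcount > 0)
        return false;
    pg->allocated = false;
    return true;
}
```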

> > > [First some hopefully-helpful diagrams to explain my thinking.  I'll
> > >  borrow 'BFN' from Malcolm's discussion of IOMMUs to describe the
> > >  addresses that devices issue their DMAs in:
> >
> > what's 'BFN' short for? Bus Frame Number?
> Yes, I think so.
> > > If we replace that lookup with a _map_ hypercall, either with Xen
> > > choosing the BFN (as happens in the PV grant map operation) or with
> > > the guest choosing an unused address (as happens in the HVM/PVH
> > > grant map operation), then:
> > >  - the only extra code in XenGT itself is that you need to unmap
> > >    when you change the GTT;
> > >  - Xen can track and control exactly which MFNs XenGT/the GPU can
> access;
> > >  - running XenGT in a driver domain or PVH dom0 ought to work; and
> > >  - we fix the race condition I described above.
> >
> > ok, I see your point here. It does sound like a better design to meet
> > Xen hypervisor's security requirement and can also work with PVH
> > Dom0 or driver domain. Previously even when we said a MFN is
> > required, it's actually a BFN due to IOMMU existence, and it works
> > just because we have a 1:1 identity mapping in-place. And by finding
> > a BFN
> >
> > some follow-up think here:
> >
> > - one extra unmap call will have some performance impact, especially
> > for media processing workloads where GPU page table modifications
> > are hot. but suppose this can be optimized with batch request
> Yep.  In general I'd hope that the extra overhead of unmap is small
> compared with the trap + emulate + ioreq + schedule that's just
> happened.  Though I know that IOTLB shootdowns are potentially rather
> expensive right now so it might want some measurement.

yes, that's the hard part, requiring experiments to find a good balance
between complexity and performance. IOMMU page tables are not designed
for modifications as frequent as CPU/GPU page tables see, but the
approach above ties them together. Another option might be to reserve
enough BFNs to cover all available guest memory at boot time, so as to
eliminate the run-time modification overhead.
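Since IOTLB shootdowns are the expensive part, one way to amortise them is the batching mentioned above: queue unmaps and issue a single flush per batch. A sketch of the shape of that optimisation, with invented names (not real Xen interfaces):

```c
/* Illustrative sketch of batching IOMMU unmaps so that one IOTLB
 * shootdown covers many GTT modifications.  unmap_batch, flush_batch
 * and queue_unmap are hypothetical names for illustration only. */
#include <assert.h>

#define BATCH_MAX 64

struct unmap_batch {
    unsigned long bfns[BATCH_MAX];  /* BFNs queued for unmapping */
    unsigned int count;
    unsigned int flushes;           /* IOTLB shootdowns issued so far */
};

/* Tear down all queued IOMMU entries, then issue ONE IOTLB shootdown. */
static void flush_batch(struct unmap_batch *b)
{
    /* ... clear b->count IOMMU entries here ... */
    b->count = 0;
    b->flushes++;
}

/* Queue an unmap; flush only when the batch fills (or at a sync point),
 * instead of one shootdown per GTT write. */
static void queue_unmap(struct unmap_batch *b, unsigned long bfn)
{
    b->bfns[b->count++] = bfn;
    if (b->count == BATCH_MAX)
        flush_batch(b);
}
```

The win is that 128 GTT updates cost two shootdowns rather than 128; the cost is a window where stale IOTLB entries still allow DMA, which is why the batch boundary has to be chosen carefully.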

> > - is there existing _map_ call for this purpose per your knowledge, or
> > a new one is required? If the latter, what's the additional logic to be
> > implemented there?
> For PVH, the XENMEM_add_to_physmap (gmfn_foreign) path ought to do
> what you need, I think.  For PV, I think we probably need a new map
> operation with sensible semantics.  My inclination would be to have it
> follow the grant-map semantics (i.e. caller supplies domid + gfn,
> hypervisor supplies BFN and success/failure code).

setting up the mapping is not a big problem. It's more about finding
available BFNs in a way that doesn't conflict with other usages, e.g.
memory hotplug or ballooning (well, for this I'm not sure now whether
it's only for existing gfns from other

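The bookkeeping needed to find non-conflicting BFNs could be as simple as a bitmap over a reserved hole in the address space. A sketch under that assumption (the hole base/size and function names are invented; a real kernel would coordinate this with its ballooning and hotplug state):

```c
/* Hypothetical sketch: allocating unused BFNs from a fixed hole region
 * so mappings don't clash with ballooning or memory hotplug.  HOLE_BASE,
 * alloc_bfn and free_bfn are illustrative names, not real interfaces. */
#include <assert.h>
#include <stdint.h>

#define HOLE_BASE  0x100000UL   /* illustrative start of an unused region */
#define HOLE_SIZE  1024         /* frames tracked in this sketch */

static uint8_t used[HOLE_SIZE]; /* one byte per frame: 0 = free, 1 = taken */

static long alloc_bfn(void)
{
    for (unsigned int i = 0; i < HOLE_SIZE; i++)
        if (!used[i]) {
            used[i] = 1;
            return (long)(HOLE_BASE + i);
        }
    return -1;                  /* hole exhausted */
}

static void free_bfn(unsigned long bfn)
{
    assert(bfn >= HOLE_BASE && bfn < HOLE_BASE + HOLE_SIZE);
    used[bfn - HOLE_BASE] = 0;
}
```

Note that, as Tim says below, the mappings need not be contiguous, so a first-free scan like this is sufficient; only clash-freedom matters.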
> Malcolm might have opinions about this -- it starts looking like the
> sort of PV IOMMU interface he's suggested before.

we'd like to hear Malcolm's suggestions here.

> > - when you say _map_, do you expect this mapped into dom0's virtual
> > address space, or just guest physical space?
> For PVH, I mean into guest physical address space (and iommu tables,
> since those are the same).  For PV, I mean just the IOMMU tables --
> since the guest controls its own PFN space entirely there's nothing
> Xen can to map things into it.
> > - how is BFN or unused address (what do you mean by address here?)
> > allocated? does it need present in guest physical memory at boot time,
> > or just finding some holes?
> That's really a question for the xen maintainers in the linux kernel.
> I presume that whatever bookkeeping they currently do for grant-mapped
> memory would suffice here just as well.

I'll study that part.

> > - graphics memory size could be large. starting from BDW, there'll
> > be 64bit page table format. Do you see any limitation here on finding
> > BFN or address?
> Not really.  The IOMMU tables are also 64-bit so there must be enough
> addresses to map all of RAM.  There shouldn't be any need for these
> mappings to be _contiguous_, btw.  You just need to have one free
> address for each mapping.  Again, following how grant maps work, I'd
> imagine that PVH guests will allocate an unused GFN for each mapping
> and do enough bookkeeping to make sure they don't clash with other GFN
> users (grant mapping, ballooning, &c).  PV guests will probably be
> given a BFN by the hypervisor at map time (which will be == MFN in
> practice) and just needs to pass the same BFN to the unmap call later
> (it can store it in the GTT meanwhile).

If possible, I'd prefer to make both consistent, i.e. always finding an
unused GFN?

> > > The default policy I'm suggesting is that the XenGT backend domain
> > > should be marked IS_PRIV_FOR (or similar) over the XenGT client VMs,
> > > which will need a small extension in Xen since at the moment struct
> > > domain has only one "target" field.
> >
> > Is that connection setup by toolstack or by hypervisor today?
> It's set up by the toolstack using XEN_DOMCTL_set_target.  Extending
> that to something like XEN_DOMCTL_set_target_list would be OK, I
> think, along with some sort of lookup call.  Or maybe an
> add_target/remove_target pair would be easier?

Thanks for the suggestions. Yu and I will do a detailed study and work
out a proposal. :-)
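For reference, the extension Tim describes could be shaped roughly like this. The sketch below assumes a fixed-size target array replacing the single `target` field; `add_target` and `is_priv_for` stand in for the proposed `XEN_DOMCTL_add_target`-style hypercall and lookup, and are illustrative names only (today Xen has just `d->target` and `XEN_DOMCTL_set_target`).

```c
/* Hypothetical sketch of extending struct domain's single "target"
 * field to a list, so a XenGT backend can be IS_PRIV_FOR several
 * client VMs.  Not real Xen code; names and layout are illustrative. */
#include <assert.h>
#include <stdbool.h>

#define MAX_TARGETS 8
typedef unsigned int domid_t;   /* simplified stand-in for Xen's domid_t */

struct domain {
    domid_t domain_id;
    domid_t targets[MAX_TARGETS];   /* replaces the single ->target */
    unsigned int nr_targets;
};

/* Toolstack-driven: grant 'd' privilege over client domain 't'. */
static bool add_target(struct domain *d, domid_t t)
{
    if (d->nr_targets == MAX_TARGETS)
        return false;
    d->targets[d->nr_targets++] = t;
    return true;
}

/* The lookup call: is 'd' privileged over domain 't'?  This is what
 * the map hypercall's permission check would consult. */
static bool is_priv_for(const struct domain *d, domid_t t)
{
    for (unsigned int i = 0; i < d->nr_targets; i++)
        if (d->targets[i] == t)
            return true;
    return false;
}
```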


Xen-devel mailing list