
Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

> From: Tim Deegan [mailto:tim@xxxxxxx]
> Sent: Friday, December 12, 2014 5:29 AM
> Hi, again. :)
> As promised, I'm going to talk about more abstract design
> considerations.  This will be a lot less concrete than in the other
> email, and about a larger range of things.  Some of them may not be
> really desirable - or even possible.

Thanks for taking the time to share your thoughts on this! I'll comment
at the same level and leave the detailed technical discussion for another
thread. :-)

> [ TL;DR: read the other reply with the practical suggestions in it :) ]
> I'm talking from the point of view of a hypervisor maintainer, looking
> at introducing this new XenGT component and thinking about what
> security properties we would like the _system_ to have once XenGT is
> introduced.  I'm going to lay out a series of broadly increasing
> levels of security goodness and talk about what we'd need to do to get
> there.

That's a good clarification of the levels.

> For the purposes of this discussion, Xen does not _trust_ XenGT.  By
> that I mean that Xen can't rely on the correctness/integrity of XenGT
> itself to maintain system security.  Now, we can decide that for some
> properties we _will_ choose to trust XenGT, but the default is to
> assume that XenGT could be compromised or buggy.  (This is not
> intended as a slur on XenGT, btw -- this is how we reason about device
> driver domains, qemu-dm and other components.  There will be bugs in
> any component, and we're designing the system to minimise the effect
> of those bugs.)

Yes, it's a fair concern.

> OK.  Properties we would like to have:
> LEVEL 0: Protect Xen itself from XenGT
> --------------------------------------
> Bugs in XenGT should not be able to crash the host, and a compromised
> XenGT should not be able to take over the hypervisor.
> We're not there in the current design, purely because XenGT has to be
> in dom0 (so it can trivially DoS Xen by rebooting the host).

Can we really prevent dom0 from DoS-ing Xen? I know there's ongoing
effort such as PVH Dom0, but there is a lot of trickiness in Dom0 that
can put the platform into a bad state. One example is ACPI. All the
platform details are encapsulated in AML, and only dom0 knows how to
handle ACPI events. Unless Xen has its own parser to guard all the
resources that might be touched through ACPI, a tampered dom0 has many
ways to break out. But such a parser would be very challenging and
complex.

If we can't contain Dom0's behavior completely, I would consider dom0
and Xen to be in the same trust zone, so putting XenGT in Dom0 shouldn't
make things worse.

> But it doesn't seem too hard: as soon as we can run XenGT in a driver
> domain, and with IOMMU tables that restrict the GPU from writing to Xen's
> datastructures, we'll have this property.
> [BTW, this whole discussion assumes that the GPU has no 'back door'
>  access to issue DMA that is not translated by the IOMMU.  I have heard
>  rumours in the past that such things exist. :) If the GPU can issue
>  untranslated DMA, then whatever controls it can take over the entire
>  system, and so we can't make _any_ security guarantees about it.]

I definitely agree with this LEVEL 0 requirement in general, e.g. dom0
can't DMA into Xen's data structures (this is ensured even with the
default 1:1 identity mapping). However, I'm not sure whether XenGT must
be put in a driver domain as a hard requirement. It would be nice to have
(there are some open implementation questions; let's discuss them in
another thread).

> LEVEL 1: Isolate XenGT's clients from other VMs
> -----------------------------------------------
> In other words we partition the machine into VMs XenGT can touch
> (i.e. its clients) and those it can't.  Then a malicious client that
> compromises XenGT only gains access to other VMs that share a GPU with
> it.  That means we can deploy XenGT for some VMs without increasing
> the risk to other tenants.
> Again we're not there yet, but I think the design I was talking about
> in my other email would do it: if XenGT must map all the memory it
> wants to let the GPU DMA to, and Xen's policy is to deny mappings for
> non-client-vm memory, then VMs that aren't using XenGT are protected.

Fully agree. We have a 'vgt' control option in each VM's config file. That
can be the hint for Xen to decide whether to allow or deny a mapping
request from XenGT.
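
To make the LEVEL 1 policy concrete, here is a minimal sketch in Python.
It is not Xen code: `Domain`, `may_map_for_gpu` and the boolean `vgt`
field are hypothetical names used only for illustration. It assumes the
policy is default-deny, i.e. a mapping request from XenGT is allowed only
when the target frame belongs to a VM whose config enabled 'vgt'.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical model of a VM as seen by the policy check."""
    domid: int
    vgt: bool = False                       # mirrors the per-VM 'vgt' config option
    gfns: set = field(default_factory=set)  # guest frames owned by this VM


def may_map_for_gpu(domains, target_domid, gfn):
    """Return True only if XenGT may map (target_domid, gfn) for GPU DMA.

    Default-deny: unknown domains, non-client domains, and frames the
    domain does not own are all refused.
    """
    dom = domains.get(target_domid)
    if dom is None or not dom.vgt:
        return False
    return gfn in dom.gfns


# Example setup: domain 1 is a XenGT client, domain 2 is not.
domains = {
    1: Domain(domid=1, vgt=True, gfns={0x1000, 0x1001}),
    2: Domain(domid=2, vgt=False, gfns={0x2000}),
}
```

With this setup, a request for a frame of domain 1 is accepted, while any
request touching domain 2 (or an unknown domain) is refused, so VMs that
don't use XenGT stay out of its reach.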

> LEVEL 2: Isolate XenGT's clients from each other
> ------------------------------------------------
> This is trickier, as you pointed out.  We could:
> a) Decide that we will trust XenGT to provide this property.  After
>    all, that's its main purpose!  This is how we treat other shared
>    backends: if a NIC device driver domain is compromised, the
>    attacker controls the network traffic for all its frontends.
>    OTOH, we don't trust qemu in that way -- instead we use stub domains
>    and IS_PRIV_FOR to enforce isolation.

Yep. Just curious: I thought stub domains were not widely used, and the
typical case is to have qemu in dom0. Is this still true? :-)

> b) Move all of XenGT into Xen.  This is just defining the problem away
>    and would probably do more harm than good - after all, keeping it
>    separate has other advantages.

I'll explain below why we don't keep XenGT in Xen.

> c) Use privilege separation: break XenGT into parts, isolated from each
>    other, with the principle of least privilege applied to them.  E.g.
>    - GPU emulation could be in a per-client component that doesn't
>      share state with the other clients' emulators;

Yes, we're doing it that way now. The emulation runs in a per-VM kernel
thread. A separate main thread manages the physical GPU and does the
context switches.

>    - Shadowing GTTs and auditing GPU commands could move into Xen,
>      with a clean interface to the emulation parts.

I'm afraid there's no such a clean interface given the complexity of

Let me give some other background that affects the XenGT design (some
of it is already in place, and some is planned). I list these not to
say "we don't want to change due to other reasons", but to show the
factors we need to balance:

1. The core device model will be merged into the Intel graphics kernel
driver. This avoids duplicating physical GPU management in XenGT
(as today's implementation does), with benefits for simplicity, quality and

2. The same device model will then be shared by both XenGT and KVMGT,
requiring Xen/KVM to provide only a minimal set of emulation services,
such as event forwarding, mapping guest memory, etc.

3. GPU emulation is complex, and there are many differences from
generation to generation. Our customers need a flexible release model so
that we can release new features and bug fixes quickly through a kernel
module.

Those are the major reasons we arrived at the current XenGT architecture.
Now back to your idea of moving GTT shadowing and GPU command auditing
into Xen. It would add complexity in several ways:

- It means we would have two drivers on one device, each responsible
for part of the job. We would then likely need to hack the Intel graphics
driver's GTT management and scheduling code to cooperate with this split.
That's unlikely to be acceptable to the driver people.

- Auditing GPU commands requires understanding the vGPU context, which
means a large buffer would have to be shared and synchronized between
Xen and XenGT.

- GTT and command formats are not compatible from generation to
generation, which means unnecessary maintenance effort in Xen.

- Last but not least, the GPU hardware itself is not designed cleanly
enough to separate the GTT from the remaining parts, which means that
even if we move GTT management into the hypervisor, there are many ways
to bypass the control, e.g. changing the root pointer of the GTT (which
may be in a register or in a memory structure). Once we start moving
those parts into Xen, we will dig out more and more bits, and finally we
would have to pull the whole driver into Xen (though it would be less
complex than a real graphics driver).

Sorry for writing such long detail in this high-level discussion. I wrote
it down while thinking through whether this is practical, and I hope it
answers the concern here. :-)

>    That way, even if a client VM can exploit a bug in the emulator,
>    it can't affect other clients because it can't see their emulator
>    state, and it can't bypass the safety rules because they're
>    enforced by Xen.
>    When I talked about privilege separation before I was suggesting
>    something like this, but without moving anything into Xen -- e.g.
>    the device-emulation code for each client could be in a per-client,
>    non-root process.  The code that audits and issues commands to the
>    GPU would be in a separate process, which is allowed to make
>    hypercalls, and which does not trust the emulator processes.
>    My apologies if you're already doing this -- I know XenGT has some
>    components in a kernel driver and some elsewhere but I haven't
>    looked at the details.

That's a good comment. We're implementing it that way, though perhaps not
so strictly separated. I'll take this comment back to our engineering
team to have it properly considered.
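
As a rough illustration of the split described above, here is a minimal
Python sketch of an auditor that does not trust the per-client emulators.
It is not XenGT code: the `Auditor` class, the `(op, frame)` command tuple
and the `issued` list (standing in for the real GPU queue) are all
hypothetical, and real command auditing is far more involved.

```python
class Auditor:
    """Hypothetical privileged component: the only one that talks to the GPU."""

    def __init__(self, allowed):
        # allowed: client id -> set of frame numbers that client may reference
        self._allowed = allowed
        self.issued = []  # stands in for the real GPU command queue

    def submit(self, client_id, commands):
        """Audit a batch from an untrusted emulator; issue all of it or none."""
        frames = self._allowed.get(client_id, set())
        for op, frame in commands:
            if frame not in frames:
                return False  # reject the whole batch, nothing reaches the GPU
        self.issued.extend((client_id, op, frame) for op, frame in commands)
        return True
```

The point of the sketch is only the trust boundary: an emulator that has
been exploited by its client can submit whatever it likes, but a batch
that references another client's frames is refused before anything is
issued.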

> LEVEL 3: Isolate XenGT's clients from XenGT itself
> --------------------------------------------------
> XenGT should not be able to access parts of its client VMs that they
> have not given it permission to.  E.g. XenGT should not be able to
> read a client VM's crypto keys unless it displays them on the
> framebuffer or uses the GPU to accelerate crypto.
> Unlike level 2, device driver domains _do_ have this property: this is
> what the grant tables are used for.  A compromised NIC driver domain
> can MITM the frontend guest but it can't read any memory in the guest
> other than network buffers.
> Again there are a few approaches, like:
> a) Declare that we don't care (i.e. that we will trust XenGT for this
>    property too).  In a way it's no worse than trusting the firmware
>    on a dedicated pass-through GPU.  But on the other hand the client
>    VM is sharing that firmware with some other VMs... :(
> b) Make the GPU driver in the client use grant tables for all RAM that
>    it gives to the GPU.  Probably not practical!

Yes, and that could be a good research topic. :-)

> c) Move just the code that builds the GTTs into Xen.  That way
>    Xen would guarantee that the GPU never accessed memory it wasn't
>    allowed to.

As explained above, it's impractical to separate out self-contained GTT
logic and move it into Xen. In a GPU, the GTT is an attribute belonging to
a render context, unlike the CPU's CR3, which is very simple.

> I'm sure there are other ideas too.
> Conclusion
> ----------
> That's enough rambling from me -- time to come back down to earth.
> While I think it's useful to think about all these things, we don't
> want to get carried away. :)  And as I said, for some things we can
> decide to trust XenGT to provide them, as long as we're clear about
> what that means.
> I think that a reasonable minimum standard to expect is to enforce
> levels 0 and 1 in Xen, and trust XenGT for levels 2 and 3.  And I
> think we can do that without needing any huge engineering effort;
> as I said, I think that's covered in my earlier reply.

I agree with the conclusion that "minimum standard to expect is to enforce
levels 0 and 1 in Xen, and trust XenGT for levels 2 and 3", except for the
concern about whether PVH Dom0 is a hard requirement. Having said that,
I'm happy to discuss the technical details of how to support PVH Dom0 in
another thread.

