
Re: [RFC 0/4] Adding Virtual Memory Fuses to Xen





On 15/12/2022 19:27, Smith, Jackson wrote:
Hi Julien,

Hi Jackson,

-----Original Message-----
From: Julien Grall <julien@xxxxxxx>
Sent: Tuesday, December 13, 2022 3:55 PM
To: Smith, Jackson <rsmith@xxxxxxxxxxxxxxxxxxxxx>

On 13/12/2022 19:48, Smith, Jackson wrote:
Hi Xen Developers,

Hi Jackson,

Thanks for sharing the prototype with the community. Some
questions/remarks below.

My team at Riverside Research is currently spending IRAD funding to
prototype next-generation secure hypervisor design ideas on Xen. In
particular, we are prototyping the idea of Virtual Memory Fuses for
Software Enclaves, as described in this paper:
https://www.nspw.org/papers/2020/nspw2020-brookes.pdf. Note that that
paper talks about OS/Process while we have implemented the idea for
Hypervisor/VM.

Our goal is to emulate something akin to Intel SGX or AMD SEV, but
using only existing virtual memory features common in all processors.
The basic idea is not to map guest memory into the hypervisor, so that
a compromised hypervisor cannot compromise (e.g. read/write) the
guest. This idea has been proposed before; however, Virtual Memory
Fuses go one step further: they delete the hypervisor's mappings to
its own page tables, essentially locking the virtual memory
configuration for the lifetime of the system. This creates what we
call "Software Enclaves", ensuring that an adversary with arbitrary
code execution in the hypervisor STILL cannot read/write guest
memory.

I am confused: if the attacker is able to execute arbitrary code, then
what prevents them from writing code to map/unmap the page?

Skimming through the paper (pages 5-6), it looks like you would need
to implement extra defenses in Xen to be able to prevent mapping/unmapping
a page.


The key piece is deleting all virtual mappings to Xen's page table
structures. From the paper (4.4.1 last paragraph), "Because all memory
accesses operate through the MMU, even page table memory needs
corresponding page table entries in order to be written to." Without a
virtual mapping to the page table, no code can modify the page table
because it cannot read or write the table. Therefore the mappings to the
guest cannot be restored even with arbitrary code execution.
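
As a rough illustration, here is a minimal sketch of what "blowing the
fuse" amounts to. This is not the prototype code; all helper names are
hypothetical:

    /* Sketch of the VMF idea, not the actual prototype: once the final
     * virtual memory configuration is in place, remove the hypervisor's
     * own mappings of every page-table frame. After this, no load/store
     * issued by the hypervisor can reach the tables, so the layout is
     * frozen (modulo the TTBR0_EL2 attack discussed below). */
    static void vmf_blow_fuse(void)             /* hypothetical helper */
    {
        paddr_t pt;

        /* for_each_pagetable_frame() is assumed to enumerate the
         * physical frames backing the live page tables, root last. */
        for_each_pagetable_frame ( pt )
            unmap_hypervisor_mapping_of(pt);    /* hypothetical */

        flush_tlb_local();          /* drop stale translations */
    }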
I don't think this is sufficient. Even if the page-tables are not part of the virtual mapping, an attacker could still modify TTBR0_EL2 (that's a system register holding a host physical address). So, with a bit more work, they can gain access to everything (see more below).

AFAICT, this problem is pointed out in the paper (section 4.4.1):

"The remaining attack vector. Unfortunately, deleting the page
table mappings does not stop the kernel from creating an entirely
new page table with the necessary mappings and switching to it
as the active context. Although this would be very difficult for
an attacker, switching to a new context with a carefully crafted
new page table structure could compromise the VMFE."

I believe this will be easier to do in Xen because the virtual layout is not very complex.

It would be a matter of inserting a new entry in the root table you control. A rough sequence would be:
   1) Allocate a page
   2) Prepare the page to act as a root (e.g. mapping of your code...)
   3) Map the "existing" root as writable.
   4) Update TTBR0_EL2 to point to your new root
   5) Add a mapping in the "old" root
   6) Switch to the old root
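
Concretely, steps 4-6 are just a system-register write plus ordinary
page-table edits. A rough sketch of the pivot (illustrative arm64 code,
assuming the attacker already controls a prepared root at new_root_pa):

    /* Sketch of the attack outlined above. Assumes arbitrary code
     * execution at EL2 and that new_root_pa points to an attacker-
     * crafted root table whose mappings cover this code and a writable
     * alias of the old root. All names are illustrative. */
    static void pivot_root(paddr_t new_root_pa)
    {
        /* 4) Point the MMU at the attacker-controlled root. TTBR0_EL2
         *    is just a system register; deleting the VA mappings of
         *    the old tables does not prevent this write. */
        asm volatile("msr ttbr0_el2, %0; isb" :: "r"(new_root_pa));

        /* The TLBs may still hold entries from the old tables. */
        asm volatile("tlbi alle2; dsb sy; isb");

        /* 5) The old root is now reachable through the writable alias
         *    set up in step 3, so entries mapping guest memory can be
         *    re-inserted at leisure. */

        /* 6) Finally, switch back to the (now modified) old root. */
    }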

So can you outline how you plan to prevent/mitigate it?



With this technique, we protect the integrity and confidentiality of
guest memory. However, a compromised hypervisor can still read/write
register state during traps, or refuse to schedule a guest, denying
service. We also recognize that because this technique precludes
modifying Xen's page tables after startup, it may not be compatible
with all of Xen's potential use cases. On the other hand, there are
some use cases (in particular statically defined embedded systems)
where our technique could be adopted with minimal friction.

 From what you wrote, this sounds very much like the project Citrix and
Amazon worked on called "Secret-free hypervisor", with a twist. In your
case, you want to prevent the hypervisor from mapping/unmapping the
guest memory.

You can find some details in [1]. The code is x86 only, but I don't see
any major blocker to porting it to arm64.


Yes, we are familiar with the "secret-free hypervisor" work. As you
point out, both our work and the secret-free hypervisor remove the
directmap region to mitigate the risk of leaking sensitive guest
secrets. However, our work is slightly different because it additionally
prevents attackers from tricking Xen into remapping a guest.

I understand your goal, but I don't think this is achieved (see above). You would need an entity to prevent writes to TTBR0_EL2 in order to fully protect it.


We see our goals and the secret-free hypervisor goals as orthogonal.
While the secret-free hypervisor views guests as untrusted and wants to
keep compromised guests from leaking secrets, our work comes from the
perspective of an individual guest trying to protect its secrets from
the rest of the stack. So it wouldn't be unreasonable to say "I want a
hypervisor that is 'secret-free' and implements VMF". We see them as
different techniques with overlapping implementations.

I can see why you want to divide them. But to me, if you have VMF, then you have a secret-free hypervisor in terms of implementation.

The major difference is how the xenheap is dealt with. At the moment, the implementation we are looking at still uses the same heap.

However, there are a few drawbacks in terms of page usage:
  * A page can be allocated anywhere in the memory map, so you can end
    up allocating an L1 table (Arm) or L3 table (x86) just for a single
    page.
  * Contiguous pages may be allocated at different times.
  * Page-tables can end up empty.

x86 has some logic to handle the last two points, but Arm doesn't have
it yet. I feel this is quite complex (in particular because of
break-before-make).
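
To make the first point concrete, here is a small self-contained
example (not Xen code) of why one stray 4KB allocation is costly with a
4KB granule: each level of the walk resolves 9 bits of the virtual
address, so a page mapped in an otherwise empty region needs a fresh
table at every intermediate level:

    #include <stdio.h>

    #define PAGE_SHIFT  12
    #define TABLE_BITS  9    /* 512 entries per 4KB table */
    #define LEVELS      4    /* 4-level walk, as on arm64/x86-64 */

    static unsigned int table_index(unsigned long va, unsigned int level)
    {
        /* Level 0 is the root; the last level maps 4KB pages. */
        unsigned int shift = PAGE_SHIFT + TABLE_BITS * (LEVELS - 1 - level);
        return (va >> shift) & ((1UL << TABLE_BITS) - 1);
    }

    int main(void)
    {
        unsigned long va = 0x40345000UL; /* arbitrary example VA */

        for (unsigned int level = 0; level < LEVELS; level++)
            printf("level %u index: %u\n", level, table_index(va, level));

        /* Mapping this single page in an empty address space needs
         * LEVELS - 1 = 3 new tables below the root, none of which can
         * be freed until they happen to become empty again. */
        return 0;
    }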

So one solution would be to use a split heap. The trouble is that xenheap memory would be more "limited". That might be OK for VMF, I need to think a bit more for secret-free hypervisor.

Another solution would be to use vmap() (which would not be possible for VMF).

Using the split xenheap approach means we don't have to worry about
unmapping guest pagetables or xen's dynamically allocated tables.

We still need to unmap the handful of static pagetables that are
declared at the top of xen/arch/arm/mm.c. Remember, our goal is to
prevent Xen from reading or writing its own page tables. We can't just
unmap these static tables without shattering, because they end up part
of the superpages that map the Xen binary. We're probably only
shattering a single superpage for this right now. Maybe we can move the
static tables to a superpage-aligned region of the binary and pad that
region so we can unmap an entire superpage without shattering?

For static pages you don't even need to shatter superpages because Xen is mapped with 4KB pages.
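
For illustration, a minimal sketch of what that unmapping could look
like, assuming the static tables are page-aligned and that
destroy_xen_mappings() can be used this way; the table names are
loosely based on xen/arch/arm/mm.c and the helper itself is
hypothetical:

    /* Sketch only: unmap Xen's static boot page tables from the
     * virtual address space once the runtime tables are live. Assumes
     * Xen's text/data are mapped with 4KB pages (so no superpage
     * shattering) and that nothing touches these tables afterwards. */
    static void __init vmf_unmap_static_tables(void) /* hypothetical */
    {
        destroy_xen_mappings((unsigned long)boot_pgtable,
                             (unsigned long)boot_pgtable + PAGE_SIZE);
        destroy_xen_mappings((unsigned long)boot_first,
                             (unsigned long)boot_first + PAGE_SIZE);
        /* ...and likewise for the other static tables. */
    }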

In the future we might
adjust the boot code to avoid the dependency on static page table
locations.

You will always need at least a few static page tables for initially switching the MMU on. Now, you could possibly allocate a new set from Xen's allocator afterwards and then switch to it.

But I am not sure this is worth the trouble if you can easily unmap the static version afterwards.


Finally, our initial testing suggests that Xen never reads guest
memory (in a static, non-dom0-enhanced configuration), but we have not
really explored this thoroughly.
We know at least these things work:
        Dom0less virtual serial terminal
        Domain scheduling
We are aware that these things currently depend on accessible guest
memory:
        Some hypercalls take guest pointers as arguments

There are not many hypercalls that don't take guest pointers.

        Virtualized MMIO on arm needs to decode certain load/store
        instructions

On Arm, this can be avoided if the guest OS is not using such
instructions. In fact, they were only added to cater for "broken" guest OSes.


What do you mean by "broken" guests?

I see in the Arm ARM where it discusses interpreting the syndrome
register. But I'm not understanding which instructions populate the
syndrome register and which do not. Why are guests using instructions
that don't populate the syndrome register considered "broken"?

The short answer is that they can't be easily/safely decoded, as Xen reads the instruction through the data cache but the processor fetches it through the instruction cache. There are situations where the two could mismatch. For more details...

Is there
somewhere I can look to learn more?
... you can read [1], [2].
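
To make the distinction concrete, here is a minimal sketch (not Xen's
actual trap handler) of how a hypervisor can tell whether a data abort
carries a usable syndrome, based on the architectural ESR_EL2 layout
(EC in bits 31:26; the ISV bit is ISS bit 24 for data aborts):

    #include <stdbool.h>
    #include <stdint.h>

    #define ESR_EC_SHIFT      26
    #define ESR_EC_MASK       0x3f
    #define ESR_EC_DABT_LOWER 0x24  /* data abort from a lower EL */
    #define ESR_ISS_ISV       (1u << 24)

    /* If ISV is set, the syndrome describes the access (register,
     * size, sign-extension) and the MMIO access can be emulated from
     * it alone. If ISV is clear, the hypervisor would have to fetch
     * and decode the guest instruction itself -- the risky path
     * discussed above. */
    static bool dabt_syndrome_valid(uint64_t esr_el2)
    {
        if (((esr_el2 >> ESR_EC_SHIFT) & ESR_EC_MASK) != ESR_EC_DABT_LOWER)
            return false;           /* not a guest data abort */
        return esr_el2 & ESR_ISS_ISV;
    }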



Also, this will probably be a lot more difficult on x86 as, AFAIK,
there is no instruction syndrome. So you will need to decode the
instruction in order to emulate the access.


It's likely that other Xen features require guest memory access.

For Arm, guest memory access is also needed when using the GICv3 ITS
and/or second-level SMMU (still in RFC).


Thanks for pointing this out. We will be sure to make note of these
limitations going forward.


For x86, if you don't want to access the guest memory, then you may
need to restrict to PVH, as for HVM we need to emulate some devices in
QEMU.
That said, I am not sure PVH is even feasible.


Is that mostly in reference to the need to decode instructions on x86,
or are there other reasons why you feel it might not be feasible to
apply this to Xen on x86?

I am not aware of any others. But it would probably be best to ask someone more knowledgeable than me about x86.

Cheers,

[1] https://lore.kernel.org/xen-devel/e2d041b2-3b38-f19b-2d8e-3a255b0ac07e@xxxxxxx/
[2] https://lore.kernel.org/xen-devel/20211126131459.2bbc81ad@xxxxxxxxxxxxxxxxxxxxxxxxxx


--
Julien Grall



 

