[Xen-devel] Ongoing/future speculative mitigation work
Hello,

This is an accumulation and summary of various tasks which have been discussed since the revelation of the speculative security issues in January, and also an invitation to discuss alternative ideas. They are x86 specific, but a lot of the principles are architecture-agnostic.

1) A secrets-free hypervisor.

Basically every hypercall can be (ab)used by a guest, and used as an arbitrary cache-load gadget. Logically, this is the first half of a Spectre SP1 gadget, and is usually the first stepping stone to exploiting one of the speculative sidechannels. (A concrete sketch of the pattern is at the end of this section.)

Short of compiling Xen with LLVM's Speculative Load Hardening (which is still experimental, and comes with a ~30% perf hit in the common case), this is unavoidable. Furthermore, throwing a few array_index_nospec() into the code isn't a viable solution to the problem.

An alternative option is to have less data mapped into Xen's virtual address space - if a piece of memory isn't mapped, it can't be loaded into the cache.

An easy first step here is to remove Xen's directmap, which will mean that guests' general RAM isn't mapped by default into Xen's address space. This will come with some performance hit, as the map_domain_page() infrastructure will now have to actually create/destroy mappings (a sketch of the resulting usage pattern is at the end of this section), but removing the directmap will cause an improvement for non-speculative security as well (no possibility of ret2dir as an exploit technique).

Beyond the directmap, there are plenty of other interesting secrets in the Xen heap and other mappings, such as the stacks of the other pcpus. Fixing this requires moving Xen to having a non-uniform memory layout, and this is much harder to change. I already experimented with this as a Meltdown mitigation around about a year ago, and posted the resulting series on Jan 4th, https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html, some trivial bits of which have already found their way upstream.

To have a non-uniform memory layout, Xen may not share L4 pagetables, i.e. Xen must never have two pcpus which reference the same pagetable in %cr3.

This property already holds for 32bit PV guests, and all HVM guests, but 64bit PV guests are the sticking point. Because Linux has a flat memory layout, when a 64bit PV guest schedules two threads from the same process on separate vcpus, those two vcpus have the same virtual %cr3, and currently, Xen programs the same real %cr3 into hardware.

If we want Xen to have a non-uniform layout, our two options are:
* Fix Linux to have the same non-uniform layout that Xen wants (backwards compatibility for older 64bit PV guests can be achieved with xen-shim).
* Make use of the XPTI algorithm (specifically, the pagetable sync/copy part) forever more in the future.

Option 2 isn't great (especially for perf on fixed hardware), but does keep all the necessary changes in Xen. Option 1 looks to be the better option longterm.

As an interesting point to note: the 32bit PV ABI prohibits sharing of L3 pagetables, because back in the 32bit hypervisor days, we used to have linear mappings in the Xen virtual range. This check is stale (from a functionality point of view), but still present in Xen. A consequence of this is that 32bit PV guests definitely don't share top-level pagetables across vcpus.

Juergen/Boris: Do you have any idea if/how easy this infrastructure would be to implement for 64bit PV guests as well? If a PV guest can advertise via Elfnote that it won't share top-level pagetables, then we can audit this trivially in Xen.
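To make the "arbitrary cache-load gadget" point above concrete, here is a minimal standalone sketch of the SP1 pattern which any guest-controlled index in a hypercall handler can expose. None of the names below are real Xen code; Xen's in-tree clamp for this pattern is array_index_nospec(), which the sketch deliberately omits in order to show what goes wrong without it.

    #include <stddef.h>
    #include <stdint.h>

    #define TABLE_SIZE 16
    static uint8_t table[TABLE_SIZE];
    static volatile uint8_t probe[256 * 64]; /* guest-observable cache lines */

    /*
     * 'idx' is guest-controlled.  The architectural bounds check is
     * correct, but the CPU may speculate past it, load table[idx] for an
     * out-of-range idx (i.e. an arbitrary address relative to 'table'),
     * and then issue a second load whose cache footprint depends on the
     * value just read.  The second load is what turns "pull data into
     * the cache" into "leak data to the guest".
     */
    int hypothetical_hypercall(size_t idx)
    {
        if ( idx >= TABLE_SIZE )
            return -1;                      /* architecturally safe ...    */

        uint8_t secret = table[idx];        /* ... speculatively, not      */
        return probe[(size_t)secret * 64];  /* encodes 'secret' in the L1D */
    }

Clamping idx with array_index_nospec() before the first load closes this particular gadget, but as noted above, finding and annotating every such index across the whole hypercall surface doesn't scale.

On the directmap point, the shape of a directmap-less access to guest memory would be roughly as follows. This is illustrative only - copy_from_guest_page() is a made-up helper, although map_domain_page()/unmap_domain_page() are the real interfaces:

    /* Without a directmap, map_domain_page() has to build a real,
     * temporary mapping rather than just offsetting into an
     * always-present linear map, and unmap_domain_page() has to tear
     * it down again - hence the perf hit. */
    static void copy_from_guest_page(void *dst, mfn_t mfn, size_t len)
    {
        const void *p = map_domain_page(mfn);

        memcpy(dst, p, len);
        unmap_domain_page(p);
    }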
2) Scheduler improvements.

(I'm afraid this is rather more sparse because I'm less familiar with the scheduler details.)

At the moment, all of Xen's schedulers will happily put two vcpus from different domains on sibling hyperthreads. There has been a lot of sidechannel research over the past decade demonstrating ways for one thread to infer what is going on in the other, but L1TF is the first vulnerability I'm aware of which allows one thread to directly read data out of the other.

Either way, it is now definitely a bad thing to run different guests concurrently on siblings. Fixing this by simply not scheduling vcpus from different guests on siblings does result in lower resource utilisation, most notably when there is an odd number of runnable vcpus in a domain, as the other thread is forced to idle.

A step beyond this is core-aware scheduling, where we schedule in units of a virtual core rather than a virtual thread. This has much better behaviour from the guest's point of view, as the actually-scheduled topology remains consistent, but does potentially come with even lower utilisation if every other thread in the guest is idle. (A toy sketch of the core-level constraint follows at the end of this section.)

A side requirement for core-aware scheduling is for Xen to have an accurate idea of the topology presented to the guest. I need to dust off my toolstack CPUID/MSR improvement series and get that upstream.

One of the most insidious problems with L1TF is that, with hyperthreading enabled, a malicious guest kernel can engineer arbitrary data leakage by having one thread scanning the expected physical address, and the other thread using an arbitrary cache-load gadget in hypervisor context. This occurs because the L1 data cache is shared by the threads.

A solution to this issue was proposed, whereby Xen synchronises siblings on vmexit/entry, so we are never executing code in two different privilege levels on the same core. Getting this working would make it safe to continue using hyperthreading even in the presence of L1TF. Obviously, it's going to come with a perf hit, but compared to disabling hyperthreading, all it's got to do is beat a 60% perf hit to make it the preferable option for making your system L1TF-proof.
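Going back to the core-aware/sibling-aware scheduling point above, here is a toy, self-contained sketch of the constraint itself (not of any real Xen scheduler code - the data layout and helpers are invented purely for illustration): a vcpu from a given domain may only be placed on a pcpu whose hyperthread sibling is either idle or running the same domain.

    #include <stdbool.h>
    #include <stdio.h>

    #define NR_PCPUS   4
    #define DOMID_IDLE -1

    /* Which domain each pcpu is currently running (DOMID_IDLE if none),
     * assuming pcpus are paired as HT siblings: (0,1), (2,3), ... */
    static int current_domain[NR_PCPUS] = { 0, 1, DOMID_IDLE, 1 };

    static int sibling_of(int cpu)
    {
        return cpu ^ 1;
    }

    static bool may_run_here(int domid, int cpu)
    {
        int sib = current_domain[sibling_of(cpu)];

        return sib == DOMID_IDLE || sib == domid;
    }

    int main(void)
    {
        /* pcpu2 is idle, but its sibling (pcpu3) is running domain 1, so
         * a domain-0 vcpu can't be placed there - exactly the utilisation
         * loss described above. */
        printf("dom0 -> pcpu2: %s\n", may_run_here(0, 2) ? "ok" : "refused");
        printf("dom1 -> pcpu2: %s\n", may_run_here(1, 2) ? "ok" : "refused");
        return 0;
    }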
Anyway - enough of my rambling for now. Thoughts?

~Andrew