[Xen-devel] Xen PV PTE ABI (or lack thereof)
Hi,

Pagetable entries are a shared medium between Xen and PV guests. To the best of my knowledge, there is nothing written down about the ABI here, and there are increasing problems as new hardware features become available.

First of all, SMEP and SMAP. 32bit PV guests are subject to Xen's SMEP/SMAP choices, because they run in ring 1. SMAP in particular is problematic because older Linux guests do fall foul of it; they don't understand what a SMAP pagefault is, and enter an infinite loop of pagefaults. SMEP is also problematic because it breaks any guest wishing to use a shared address space between kernel and user. (I had some fun getting the test framework to function until I twigged what was happening.) Both of these are regressions; older guests relying on existing behaviour cease to function on newer hardware/Xen despite identical settings.

For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug builds (so there is a guest-observable difference between running on a debug and a non-debug Xen), and the comment beside it even identifies that it breaks BSD guests. PTE bits 62:59 are used by hardware if CR4.PKE is set. Currently this means that we are not able to support Protection Keys for PV guests (although this restriction technically only applies to debug builds of the hypervisor).

The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52). This bit is used to notice when a 64bit PV guest attempts to override the fixup Xen applies to its PTEs. Xen unilaterally sets _PAGE_GLOBAL for user pages, and clears _PAGE_GLOBAL for supervisor mappings, setting _PAGE_USER in both cases as the PV kernel runs in ring 3. The only thing _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL mapping, at which point a warning is logged and the kernel's choice is overridden.

Neither of the used PTE bits exists in the Xen public ABI. Neither of them serves a purpose other than as a debugging aid.
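To make the bit layout above concrete, here is a rough C sketch. It is not code from the Xen tree: the PTE_* macro names and both function names are invented for illustration, the software bit positions are taken from the description above, and the fixup is a simplified model of that description rather than Xen's actual logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86-64 PTE bits. */
#define PTE_USER          (UINT64_C(1) << 2)
#define PTE_GLOBAL        (UINT64_C(1) << 8)

/* Bits 62:59 carry the protection key when CR4.PKE is set. */
#define PTE_PKEY_SHIFT    59
#define PTE_PKEY_MASK     (UINT64_C(0xf) << PTE_PKEY_SHIFT)

/* Software bits, positions as stated in the text (hypothetical names). */
#define PTE_GNTTAB        (UINT64_C(1) << 62)  /* debug builds only */
#define PTE_GUEST_KERNEL  (UINT64_C(1) << 52)

/* True if the debug-only grant-table marker lands inside the pkey field. */
static bool gnttab_overlaps_pkey(void)
{
    return (PTE_GNTTAB & PTE_PKEY_MASK) != 0;
}

/*
 * Simplified model of the fixup described for 64bit PV guests.
 * Sets *overridden when the guest asked for a global supervisor mapping.
 */
static uint64_t pv_fixup_pte(uint64_t guest_pte, bool *overridden)
{
    bool user = (guest_pte & PTE_USER) != 0;
    uint64_t pte = guest_pte;

    *overridden = false;

    if (user) {
        /* User pages are made global. */
        pte |= PTE_GLOBAL;
    } else {
        /* Supervisor mappings lose Global and are tagged as guest-kernel. */
        if (guest_pte & PTE_GLOBAL)
            *overridden = true;
        pte &= ~PTE_GLOBAL;
        pte |= PTE_GUEST_KERNEL;
    }

    /* The PV kernel runs in ring 3, so every mapping needs _PAGE_USER. */
    return pte | PTE_USER;
}
```

The overlap check is the reason a debug build cannot offer Protection Keys to PV guests in this model: bit 62 falls inside the 62:59 key field.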
I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of "all bits available for guest use".

The other question is what we do when it comes to %cr4 and PV guests. The current SMAP issue is a blocker for XenServer, and I have some nasty logic to fix things up behind the guest's back. I have only just discovered the SMEP issue, but it is still a regression (again, nothing states that a PV guest must have a split address space; segmentation is a perfectly valid option in 32bit guests). Protection Keys, by contrast, shouldn't be a problem for us to implement for PV guests.

I am leaning towards allowing a toolstack to permit a PV guest to play with a few more CR4 bits. We can't give a guest kernel complete carte blanche, because of the security implications. However, we do already context switch CR4 for PV guests, so a few extra bits on a "nominated safe" domain is no extra hassle.

Thoughts?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel