
[Xen-devel] Xen PV PTE ABI (or lack thereof)


Pagetable entries are a shared medium between Xen and PV guests.

To the best of my knowledge, there is nothing written down about the ABI
here, and there are increasing problems as new hardware features become
available.

First of all, SMEP and SMAP.  32bit PV guests are subject to Xen's
SMEP/SMAP choices, because they run in ring 1, which the paging hardware
treats as supervisor mode.

SMAP in particular is problematic because older Linux guests do fall
foul of it; they don't understand what a SMAP pagefault is, and enter an
infinite loop of pagefaults.  SMEP is also problematic because it breaks
any guest wishing to use a shared address space between kernel and
user.  (I had some fun getting the test framework to function until I
twigged what was happening).

Both of these are regressions; older guests relying on existing
behaviour cease to function on newer hardware/Xen despite identical

For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
builds (so there is a guest-observable difference between running on a
debug and a non-debug Xen), and the comment beside it even identifies
that it breaks BSD guests.  PTE bits 62:59 are used by hardware if
CR4.PKE is set.  Currently this means that we are not able to support
Protection Keys for PV guests (although this restriction technically
only applies to debug builds of the hypervisor).
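
To make the overlap concrete, here is a minimal sketch using only the
bit positions quoted above.  Apart from the name _PAGE_GNTTAB, the macro
and function names are made up for illustration and are not taken from
Xen's headers.

```c
#include <stdint.h>

/* Bit positions as quoted in the text; names are illustrative only. */
#define PTE_GNTTAB      (UINT64_C(1) << 62)      /* _PAGE_GNTTAB, debug builds */
#define PTE_PKEY_SHIFT  59
#define PTE_PKEY_MASK   (UINT64_C(0xf) << PTE_PKEY_SHIFT)  /* bits 62:59 */

/* Extract the 4-bit protection key the hardware would read from a PTE. */
static inline unsigned int pte_pkey(uint64_t pte)
{
    return (unsigned int)((pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT);
}

/* Non-zero if the software-defined bit lands inside the pkey field. */
static inline int gnttab_overlaps_pkey(void)
{
    return (PTE_GNTTAB & PTE_PKEY_MASK) != 0;
}
```

With CR4.PKE set, a PTE carrying only the bit-62 marker would be read by
hardware as protection key 8, which is why the two uses cannot coexist.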

The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
is used to notice when a 64bit PV guest attempts to override the fixup
Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
_PAGE_USER in both cases as the PV kernel runs in ring 3.  The only thing
_PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL mapping, at which
point a warning is logged and the kernel's choice is overridden.
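
A rough sketch of the fixup as described above, for a 64bit PV guest.
This is my paraphrase of the behaviour, not the hypervisor's code: the
function name and the `warned` out-parameter are hypothetical, the bit
positions for _PAGE_USER (2) and _PAGE_GLOBAL (8) are the architectural
ones, and the sketch ignores the present bit entirely.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_USER          (UINT64_C(1) << 2)   /* architectural U/S bit */
#define PAGE_GLOBAL        (UINT64_C(1) << 8)   /* architectural G bit */
#define PAGE_GUEST_KERNEL  (UINT64_C(1) << 52)  /* software bit, per the text */

/*
 * Hypothetical helper: apply the described fixup to a guest-supplied PTE.
 * *warned is set when the guest explicitly asked for a global kernel
 * mapping, i.e. the case where a warning would be logged.
 */
static uint64_t pv64_fixup_pte(uint64_t pte, bool *warned)
{
    const uint64_t both = PAGE_GUEST_KERNEL | PAGE_GLOBAL;

    *warned = (pte & both) == both;

    /* A mapping without _PAGE_USER is a guest supervisor mapping. */
    if (!(pte & PAGE_USER))
        pte |= PAGE_GUEST_KERNEL;

    if (pte & PAGE_GUEST_KERNEL)
        pte &= ~PAGE_GLOBAL;    /* kernel mappings are never global */
    else
        pte |= PAGE_GLOBAL;     /* user mappings are always global */

    /* The PV kernel runs in ring 3, so everything needs _PAGE_USER. */
    return pte | PAGE_USER;
}
```

Note that the guest's own choice of _PAGE_GLOBAL never survives in either
direction; the marker bit only changes whether a warning is raised.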

Neither of the used PTE bits exists in the Xen public ABI.  Neither of
them serves a purpose other than as a debugging aid.

I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
"all bits available for guest use".

The other question is what we do when it comes to %cr4 and PV guests.

The current SMAP issue is a blocker for XenServer, and I have some nasty
logic to fix things up behind the guest's back.  I have only just
discovered the SMEP issue, but it is still a regression (again, nothing
states that a PV guest must have a split address space;  segmentation is
a perfectly valid option in 32bit guests).  Protection Keys are something
which shouldn't be an issue for us to implement in PV guests.

I am leaning towards allowing a toolstack to permit a PV guest to
control a few more CR4 bits.  We can't give a guest kernel complete
carte blanche, because of the security implications.  However, we do
already context switch CR4 for PV guests, so a few extra bits on a
"nominated safe" domain is no extra hassle.
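
One way this could look, purely as a sketch: the toolstack hands Xen a
per-domain mask of CR4 bits the guest may control, and the value loaded
on context switch takes those bits from the guest and everything else
from Xen.  The structure and function names are hypothetical; only the
CR4 bit positions (SMEP 20, SMAP 21, PKE 22) are architectural.

```c
#include <stdint.h>

#define X86_CR4_SMEP  (UINT64_C(1) << 20)
#define X86_CR4_SMAP  (UINT64_C(1) << 21)
#define X86_CR4_PKE   (UINT64_C(1) << 22)

/* Hypothetical per-domain policy, set by the toolstack at domain build. */
struct pv_cr4_policy {
    uint64_t guest_controllable;  /* CR4 bits the guest may choose itself */
};

/*
 * Hypothetical helper: compute the CR4 value to load when switching to a
 * PV vcpu.  Bits outside the policy mask always follow Xen's own choice.
 */
static uint64_t pv_effective_cr4(uint64_t xen_cr4, uint64_t guest_cr4,
                                 const struct pv_cr4_policy *policy)
{
    uint64_t mask = policy->guest_controllable;

    return (xen_cr4 & ~mask) | (guest_cr4 & mask);
}
```

A domain with an empty mask behaves exactly as today, so only domains
explicitly nominated by the toolstack would see any change.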


