
Re: [Xen-devel] Xen PV PTE ABI (or lack thereof)

>>> On 21.01.16 at 12:16, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 21/01/16 11:07, Jan Beulich wrote:
>>>>> On 20.01.16 at 21:10, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> First of all, SMEP and SMAP.  32bit PV guests are subject to Xen's
>>> SMEP/SMAP choices, because of running in ring 1.
>>> SMAP in particular is problematic because older Linux guests do fall
>>> foul of it; they don't understand what a SMAP pagefault is, and enter an
>>> infinite loop of pagefaults.  SMEP is also problematic because it breaks
>>> any guest wishing to use a shared address space between kernel and
>>> user.  (I had some fun getting the test framework to function until I
>>> twigged what was happening).
>>> Both of these are regressions; older guests relying on existing
>>> behaviour cease to function on newer hardware/Xen despite identical
>>> settings.
>> And for both of them there simply should be a way for the guest to
>> state whether it's compatible (which should be the case for anything
>> we can't deal with completely transparently to guests).
>>> For the PTE bits, _PAGE_GNTTAB (bit 62) is used exclusively in debug
>>> build (so there is a guest observable difference between running on a
>>> debug and a non-debug Xen), and the comment beside it even identifies
>>> that it breaks BSD guests.  PTE bits 62:59 are used by hardware if CR4.PKE
>>> is set.  Currently this means that we are not able to support Protection
>>> Key for PV guests (although this restriction technically only applies to
>>> debug builds of the hypervisor).
>>> The other PTE bit used by Xen is _PAGE_GUEST_KERNEL (bit 52).  This bit
>>> is used to notice when a 64bit PV guest attempts to override the fixup
>>> Xen applies to its PTEs.  Xen unilaterally sets _PAGE_GLOBAL for user
>>> pages, and clears _PAGE_GLOBAL for supervisor mappings, setting
>>> _PAGE_USER in both cases as the PV kernel runs in ring3.  The only thing
>>> _PAGE_GUEST_KERNEL is used for is to notice when the kernel deliberately
>>> tries to create a _PAGE_GUEST_KERNEL|_PAGE_GLOBAL, at which point a
>>> warning is logged and the kernel overridden.
>>> Neither of the used PTE bits exists in the Xen public ABI.  Neither of
>>> them serves a purpose other than as a debugging aid.
>>> I propose hiding them behind CONFIG_PV_PTE_DEBUG and declaring an ABI of
>>> "all bits available for guest use".
>> And a kernel using any of the conflicting bits would then become
>> unusable on a hypervisor with that debug option enabled? I'd
>> rather see us document the state things are in...
> The comment beside _PAGE_GNTTAB already states:
> /*
>  * Debug option: Ensure that granted mappings are not implicitly unmapped.
>  * WARNING: This will need to be disabled to run OSes that use the spare PTE
>  * bits themselves (e.g., *BSD).
>  */

But that's (assuming the use of the two bits were spelled out) a
guest OS not fully playing by the spec. To me, "available" PTE bits
being shared implies that some of them may be claimed by Xen,
while others may be claimed by guests. You're right that this needs
to be written down, but I don't think we need to go as far as
forbidding Xen to use any of them. And even less so should we
preclude their use for any purpose going forward.

In the end, with (as it seems) not much effort this could even be
made dynamic: A guest could advertise which of the bits it doesn't
use, and then Xen could pick two of them for the two purposes it
currently needs them for. Should a guest leave none or just one of
the bits available, the debugging aid could then be disabled.
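
To make the dynamic variant concrete, here is a minimal sketch of the
selection step only. Everything in it is hypothetical: the names
pick_debug_bits, struct pv_pte_debug_bits and PTE_AVAIL_MASK do not exist
in Xen, and how a guest would advertise its unused bits is left open. It
assumes the software-available bits of a 64-bit PTE are 9-11 and 52-62
(which ignores that 62:59 are taken by hardware once CR4.PKE is set), and
it treats the debugging aid as all-or-nothing: either both bits can be
found, or both checks are disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed software-available PTE bits in long mode: 9-11 and 52-62. */
#define PTE_AVAIL_MASK ((UINT64_C(0x7) << 9) | (UINT64_C(0x7ff) << 52))

/* Hypothetical result of the negotiation; not an existing Xen structure. */
struct pv_pte_debug_bits {
    bool enabled;           /* false: debugging aid switched off */
    uint64_t gnttab;        /* bit mask for the grant-mapping marker */
    uint64_t guest_kernel;  /* bit mask for the guest-kernel marker */
};

/* Remove and return the lowest set bit of *mask, or 0 if none is left. */
static uint64_t take_lowest_bit(uint64_t *mask)
{
    uint64_t bit = *mask & (~*mask + 1);

    *mask &= ~bit;
    return bit;
}

/*
 * guest_unused: mask of PTE bits the guest says it does not use.
 * Bits outside PTE_AVAIL_MASK are ignored rather than trusted.
 */
static void pick_debug_bits(uint64_t guest_unused,
                            struct pv_pte_debug_bits *out)
{
    uint64_t candidates = guest_unused & PTE_AVAIL_MASK;
    uint64_t first, second;

    assert(out != NULL);

    out->enabled = false;
    out->gnttab = 0;
    out->guest_kernel = 0;

    first = take_lowest_bit(&candidates);
    second = take_lowest_bit(&candidates);

    /* No bit or only one bit left over: disable the debugging aid. */
    if (first == 0 || second == 0)
        return;

    out->enabled = true;
    out->gnttab = first;
    out->guest_kernel = second;
}
```

A guest advertising only bits 52 and 62 as unused would end up with
exactly those two bits assigned, while a guest advertising a single bit
(or only bits that are not software-available) would run with the
debugging aid off. Picking the lowest bits first is an arbitrary choice
made for the sketch.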

> I was intending to have CONFIG_PV_PTE_DEBUG as an EXPERT option,
> disabled by default even in debug builds.
> There should not be an ABI difference between release and "normal" debug
> builds.

Well, I see your point, but as said above I'm not convinced
disabling all that code is the right solution. In fact, what you
propose is not far away from removing that code altogether.

