
Re: [RFC] docs: FRED support in Xen


  • To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 7 Jan 2025 12:11:51 +0100
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 07 Jan 2025 11:12:12 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 06.01.2025 17:06, Andrew Cooper wrote:
> On 06/01/2025 2:28 pm, Jan Beulich wrote:
>> On 03.01.2025 21:47, Andrew Cooper wrote:
>>> +There are several uses of the vm86 fields in Xen:
>>> +
>>> + #. ``struct vcpu`` embeds a ``struct cpu_user_regs`` to hold GPRs/etc when
>>> +    the vCPU is scheduled out.  The vm86 fields are used by the PV logic only
>>> +    (``{save,load}_segments()``) and can be moved into separate fields in
>>> +    ``struct pv_vcpu``.  PV's ``dom0_construct()`` sets these fields directly,
>>> +    and needs a matching adjustment.
>>> +
>>> + #. As part of ``arch_{get,set}_info_guest()`` during hypercalls.  The
>>> +    guest side needs to remain as-is, but the Xen side can be rearranged to
>>> +    use the new fields from above.
>>> +
>>> + #. As part of vCPU diagnostics (``show_registers()`` etc).  The ``#DF`` path
>>> +    uses the fields as scratch storage for the current register values, while
>>> +    the other diagnostics are simply accessing the state of a scheduled-out
>>> +    vCPU.
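
(As an aside for readers without the series at hand: a minimal sketch of
the split described in point 1.  All names here are illustrative only,
not taken from the working branch:)

    #include <stdint.h>

    /*
     * Sketch: the vm86 selector fields leave the shared
     * struct cpu_user_regs and become PV-only state, saved/restored by
     * the PV context switch path.  "pv_vcpu_segs" is a hypothetical
     * stand-in for the new fields in struct pv_vcpu.
     */
    struct pv_vcpu_segs {
        uint16_t ds, es, fs, gs;    /* data segment selectors */
    };

    /* Hypothetical helper for the save side of {save,load}_segments(). */
    static inline void save_segments_sketch(struct pv_vcpu_segs *s)
    {
        asm volatile ( "mov %%ds, %0" : "=m" (s->ds) );
        asm volatile ( "mov %%es, %0" : "=m" (s->es) );
        asm volatile ( "mov %%fs, %0" : "=m" (s->fs) );
        asm volatile ( "mov %%gs, %0" : "=m" (s->gs) );
    }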
>> Unlike for the former 2 points and for the one immediately below, but like
>> for the final one below, you don't mention what you intend to do. Here I
>> assume it'll be reasonably straightforward to use scratch space elsewhere.
> 
> https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-fred
> is my working branch if you want to peek at things.
> 
> The diagnostics are handled by:
> 
> 1) "x86/traps: Rework register state printing to use an extra_state struct"
> 2) "x86/traps: Avoid OoB accesses to print the data selectors"

Doesn't this one remove the sole caller of read_sregs(), hence wanting to also
purge the function itself? Apart from this ...

> 3) "Revert "x86/traps: 'Fix' safety of read_registers() in #DF path""

... these all look fine to me; I'll hold off on a formal R-b until the
actual submission, though.

> Something else you might want to proactively look at: "x86/idt:
> Generate bsp_idt[] at build time".  I figured out how to construct the
> IDT at build time, without using an external tool to format the table,
> and with only some slightly disgusting linker script hackery.

Clever.
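
(For context on why that needs hackery at all, a sketch of the
architectural gate layout; this is not Andrew's actual approach.  The
handler address must be split across three fields, and a C static
initialiser cannot split a relocation that way, hence the need for
linker script assistance.  The runtime equivalent is simple enough:)

    #include <stdint.h>

    /* 64-bit IDT gate descriptor (architectural layout). */
    struct idt_entry {
        uint16_t offset_lo;   /* handler address bits 0..15  */
        uint16_t selector;    /* code segment selector       */
        uint8_t  ist;         /* IST index, 0 = none         */
        uint8_t  type_attr;   /* gate type, DPL, present     */
        uint16_t offset_mid;  /* handler address bits 16..31 */
        uint32_t offset_hi;   /* handler address bits 32..63 */
        uint32_t reserved;
    };

    /* Runtime construction of one gate, for comparison. */
    static void set_gate(struct idt_entry *e, void (*handler)(void),
                         uint16_t sel, uint8_t ist)
    {
        uintptr_t addr = (uintptr_t)handler;

        e->offset_lo  = addr & 0xffff;
        e->selector   = sel;
        e->ist        = ist;
        e->type_attr  = 0x8e;   /* present, DPL0, 64-bit interrupt gate */
        e->offset_mid = (addr >> 16) & 0xffff;
        e->offset_hi  = addr >> 32;
        e->reserved   = 0;
    }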

>>> +Stack layout
>>> +""""""""""""
>>> +
>>> +Xen's CPU stacks are 8 pages in size (8-page aligned), arranged as::
>>> +
>>> +  7 - Primary stack (with a struct cpu_info at the top)
>>> +  6 - Primary stack
>>> +  5 - Primary Shadow Stack (read-only)
>>> +  4 - #DF IST stack
>>> +  3 - #DB IST stack
>>> +  2 - NMI IST stack
>>> +  1 - #MC IST stack
>>> +  0 - IST Shadow Stacks (4x 1k, read-only)
>>> +
>>> +which needs mapping into FRED's Stack Levels.
>>> +
>>> +FRED Stack Levels replace IST.  Most events from Ring3 enter Ring0 at SL0,
>>> +including interrupts, and even exceptions with a non-zero Stack Level
>>> +configured.  Nested exceptions originate from Ring0 even if they were
>>> +trying to push a Ring3 event frame onto the stack, so do follow the Ring0
>>> +CSL rules.
>>> +
>>> +Within Ring0, a stack switch occurs on event delivery if the event has a
>>> +higher configured Stack Level (exceptions in ``MSR_FRED_STK_LVLS``,
>>> +interrupts in ``MSR_FRED_CONFIG``).  Otherwise, the new event is delivered
>>> +on the current stack.
>>> +
>>> +Under FRED, most sources of ``#DF`` are gone; failure to push a new event
>>> +frame onto a stack is the main remaining one, so ``#DF`` needs to be the
>>> +highest stack level (SL3) to catch errors at all other stack levels.
>>> +
>>> +Also, FRED removes the "syscall gap", removing the primary reason for
>>> +``NMI``, ``#DB`` and ``#MC`` to need separate stacks.
>>> +
>>> +Therefore, Xen has no need for SL1 or SL2.  Under IDT delivery, we poison
>>> +the unused stack pointers with a non-canonical address, but we cannot do
>>> +that under FRED; they're held in MSRs and checked at WRMSR time.  Instead,
>>> +we can point the SL pairs (RSP + SSP) at each others (regular and shadow
>>> +stack) guard pages such that any use of an unused SL will escalate to
>>> +``#DF``.
>> I may have a language issue here: "each others" reads wrong to me; do you
>> perhaps mean "each ones"?
> 
> It's poor grammar, but not wrong per se.  I'll try to find a different
> way to phrase it.
> 
>>
>> Further, mainly to double check my own understanding: Almost half of the
>> stack area then isn't used anymore when FRED is active: 2 pages for the
>> primary stack, 1 page for the primary shadow stack, 1 page for the SL3
>> stack, and (somewhat wastefully) 1 more for the SL3 shadow stack.
> 
> This matches my understanding (on the proviso that I've still not wired
> up the stack handling yet.  Still cleaning up the initialisation paths.)
> 
>>  There'll be 3 unused pages, and
>> hence space again to have true guard pages, e.g.
>>
>>   7 - Primary stack (with a struct cpu_info at the top)
>>   6 - Primary stack
>>   5 - Guard page
>>   4 - Primary Shadow Stack (read-only)
>>   3 - Guard page
>>   2 - #DF stack
>>   1 - #DF Shadow Stack (read-only)
>>   0 - Guard page
> 
> Shadow stack frames are perfectly good guard pages for regular stacks,
> and vice versa.  That's why I set them up as shadow stack pages even
> when CET isn't active.

"Perfectly valid" isn't quite true: These pages being readable means
writes below the stack bottom (likely the prevailing kind of problem)
are detected, but reads wouldn't be.
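
(Concretely, a sketch using the architectural PTE bits: a CET shadow
stack page is mapped present and readable but not writeable, so a stray
write below the stack bottom faults while a stray read sails through:)

    #include <stdint.h>

    /* x86 PTE flag bits (architectural). */
    #define _PAGE_PRESENT  (1ULL <<  0)
    #define _PAGE_RW       (1ULL <<  1)
    #define _PAGE_DIRTY    (1ULL << 6)
    #define _PAGE_NX       (1ULL << 63)

    /* Regular stack page: writeable data. */
    static const uint64_t stack_pte = _PAGE_PRESENT | _PAGE_RW | _PAGE_NX;

    /*
     * Shadow stack page: read-only + Dirty.  W=0 is what makes it a
     * good guard page against writes; its readability is why reads
     * below the stack bottom go undetected.
     */
    static const uint64_t shstk_pte = _PAGE_PRESENT | _PAGE_DIRTY | _PAGE_NX;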

> And yes, we could rearrange the stack.  But there's also a good reason
> not to.  Our code has to cope with both IDT and FRED layouts, which is
> much easier if they're the same.

I certainly can see the value of keeping stack layout uniform.
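
(For illustration, a sketch of the resulting configuration, assuming the
architectural MSR_FRED_STK_LVLS layout of 2 bits of Stack Level per
exception vector; the macro names are made up:)

    #include <stdint.h>

    #define X86_EXC_DF         8    /* #DF vector number */
    #define FRED_STKLVL(v, l)  ((uint64_t)(l) << ((v) * 2))

    /*
     * Everything stays at SL0 except #DF, which is routed to SL3.  SL1
     * and SL2 remain unused, with their RSP/SSP MSRs pointing at guard
     * pages so that any stray use escalates to #DF.
     */
    static const uint64_t fred_stklvls = FRED_STKLVL(X86_EXC_DF, 3);

    /* wrmsrl(MSR_FRED_STK_LVLS, fred_stklvls); at FRED setup time. */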

>> Having reached the bottom of the section, there's one special IST aspect that
>> I'm missing, special enough imo to warrant mentioning even if only to state
>> that it's (hopefully) going to be irrelevant (i.e. not require replacing by
>> another similar hack): {dis,en}able_each_ist() as used by SVM (on the
>> assumption that sooner or later AMD is likely to also implement FRED, and
>> that you may already know of details of their respective VMCB integration).
> 
> AMD haven't said anything about FRED yet, despite a very large number of
> software partners asking about it.
> 
> However, if AMD were to do FRED, I'd expect it to work just like CET
> does today, seeing as there's a proper host/guest split of CR4, and
> everything else is in MSRs the guest can't touch.

As in "can be prevented to touch directly", aiui.

Jan



 

