
Re: [PATCH v2 00/18] x86: adventures in Address Space Isolation



On Tue, Jan 14, 2025 at 05:20:04PM +0100, Jan Beulich wrote:
> On 08.01.2025 15:26, Roger Pau Monne wrote:
> > Hello,
> > 
> > The aim of this series is to introduce the functionality required to
> > create linear mappings visible to a single pCPU.
> > 
> > Doing so requires having a per-vCPU root page-table (L4), and hence
> > requires shadowing the guest selected L4 on PV guests.  As follow ups
> > (and partially to ensure the per-CPU mappings work fine) the CPU stacks
> > are switched to use per-CPU mappings, so that remote stack contents are
> > not by default mapped on all page-tables (note: for this to be true the
> > directmap entries for the stack pages would need to be removed also).
> > 
> > There's one known shortcoming with the presented code: migration of PV
> > guests using per-vCPU root page-tables is not working.  I need to
> > introduce extra logic to deal with PV shadow mode when using unique root
> > page-tables.  I don't think this should block the series however, such
> > missing functionality can always be added as follow up work.
> > paging_domctl() is adjusted to reflect this restriction.
> > 
> > The main differences compared to v1 are the usage of per-vCPU root page
> > tables (as opposed to per-pCPU), and the usage of the existing perdomain
> > family of functions to manage the mappings in the per-domain slot, that
> > now becomes per-vCPU.
> > 
> > All patches up to 17 are mostly preparatory; I think there's a nice
> > cleanup and generalization of the creation and management of per-domain
> > mappings, by no longer storing references to L1 page-tables in the vCPU
> > or domain struct.
> 
> Since you referred me to the cover letter, I've looked back here after
> making some more progress with the series. Along with my earlier comment
> towards the need or ultimate goal, ...
> 
> > Patch 13 introduces the command line option, and would need discussion
> > and integration with the sparse direct map series.  IMO we should get
> > consensus on how we want the command line to look ASAP, so that we can
> > get basic parsing logic in place to be used by both the work here and the
> > direct map removal series.
> > 
> > As part of this series the map_domain_page() helpers are also switched
> > to create per-vCPU mappings (see patch 15), which converts an existing
> > interface into creating per-vCPU mappings.  Such an interface can be used
> > to hide (map per-vCPU) further data that we don't want to be part of the
> > direct map, or even shared between vCPUs of the same domain.  Also all
> > existing users of the interface will already create per-vCPU mappings
> > without needing additional changes.
> > 
> > Note that none of the logic introduced in the series removes entries from
> > the directmap, so even when creating the per-CPU mappings the underlying
> > physical addresses are fully accessible when using their directmap
> > entries.
> > 
> > I also haven't done any benchmarking.  It doesn't seem to cripple
> > performance to the point that XenRT jobs would time out before
> > finishing, which is the only objective reference I can provide at the
> > moment.
> > 
> > The series has been extensively tested on XenRT, but that doesn't cover
> > all possible use-cases, so it's likely to still have some rough edges,
> > handle with care.
> > 
> > Thanks, Roger.
> > 
> > Roger Pau Monne (18):
> >   x86/mm: purge unneeded destroy_perdomain_mapping()
> >   x86/domain: limit window where curr_vcpu != current on context switch
> >   x86/mm: introduce helper to detect per-domain L1 entries that need
> >     freeing
> >   x86/pv: introduce function to populate perdomain area and use it to
> >     map Xen GDT
> >   x86/mm: switch destroy_perdomain_mapping() parameter from domain to
> >     vCPU
> >   x86/pv: set/clear guest GDT mappings using
> >     {populate,destroy}_perdomain_mapping()
> >   x86/pv: update guest LDT mappings using the linear entries
> >   x86/pv: remove stashing of GDT/LDT L1 page-tables
> >   x86/mm: simplify create_perdomain_mapping() interface
> >   x86/mm: switch {create,destroy}_perdomain_mapping() domain parameter
> >     to vCPU
> >   x86/pv: untie issuing FLUSH_ROOT_PGTBL from XPTI
> >   x86/mm: move FLUSH_ROOT_PGTBL handling before TLB flush
> >   x86/spec-ctrl: introduce Address Space Isolation command line option
> >   x86/mm: introduce per-vCPU L3 page-table
> >   x86/mm: introduce a per-vCPU mapcache when using ASI
> >   x86/pv: allow using a unique per-pCPU root page table (L4)
> >   x86/mm: switch to a per-CPU mapped stack when using ASI
> >   x86/mm: zero stack on context switch
> > 
> >  docs/misc/xen-command-line.pandoc    |  24 +++
> >  xen/arch/x86/cpu/mcheck/mce.c        |   4 +
> >  xen/arch/x86/domain.c                | 157 +++++++++++----
> >  xen/arch/x86/domain_page.c           | 105 ++++++----
> >  xen/arch/x86/flushtlb.c              |  28 ++-
> >  xen/arch/x86/hvm/hvm.c               |   6 -
> >  xen/arch/x86/include/asm/config.h    |  16 +-
> >  xen/arch/x86/include/asm/current.h   |  58 +++++-
> >  xen/arch/x86/include/asm/desc.h      |   6 +-
> >  xen/arch/x86/include/asm/domain.h    |  50 +++--
> >  xen/arch/x86/include/asm/flushtlb.h  |   2 +-
> >  xen/arch/x86/include/asm/mm.h        |  15 +-
> >  xen/arch/x86/include/asm/processor.h |   5 +
> >  xen/arch/x86/include/asm/pv/mm.h     |   5 +
> >  xen/arch/x86/include/asm/smp.h       |  12 ++
> >  xen/arch/x86/include/asm/spec_ctrl.h |   4 +
> >  xen/arch/x86/mm.c                    | 291 +++++++++++++++++++++------
> >  xen/arch/x86/mm/hap/hap.c            |   2 +-
> >  xen/arch/x86/mm/paging.c             |   6 +
> >  xen/arch/x86/mm/shadow/hvm.c         |   2 +-
> >  xen/arch/x86/mm/shadow/multi.c       |   2 +-
> >  xen/arch/x86/pv/descriptor-tables.c  |  47 ++---
> >  xen/arch/x86/pv/dom0_build.c         |  12 +-
> >  xen/arch/x86/pv/domain.c             |  57 ++++--
> >  xen/arch/x86/pv/mm.c                 |  43 +++-
> >  xen/arch/x86/setup.c                 |  32 ++-
> >  xen/arch/x86/smp.c                   |  39 ++++
> >  xen/arch/x86/smpboot.c               |  26 ++-
> >  xen/arch/x86/spec_ctrl.c             | 205 ++++++++++++++++++-
> >  xen/arch/x86/traps.c                 |  25 ++-
> >  xen/arch/x86/x86_64/mm.c             |   7 +-
> >  xen/common/smp.c                     |  10 +
> >  xen/common/stop_machine.c            |  10 +
> >  xen/include/xen/smp.h                |   8 +
> >  34 files changed, 1052 insertions(+), 269 deletions(-)
> 
> ... this diffstat (even after subtracting out the contribution of the last two
> patches in the series) doesn't really look like a cleanup / simplification.

To be fair you would need to subtract the contribution of the last 8
patches, as all those are strictly related to ASI.  The perdomain
mapping interface cleanup is just the first 10 patches.  Which leaves
a diffstat of:

 xen/arch/x86/domain.c                |  81 ++++++++++++----
 xen/arch/x86/domain_page.c           |  19 ++--
 xen/arch/x86/hvm/hvm.c               |   6 --
 xen/arch/x86/include/asm/desc.h      |   6 +-
 xen/arch/x86/include/asm/domain.h    |  13 +--
 xen/arch/x86/include/asm/mm.h        |  11 ++-
 xen/arch/x86/include/asm/processor.h |   5 +
 xen/arch/x86/mm.c                    | 175 ++++++++++++++++++++++++++---------
 xen/arch/x86/pv/descriptor-tables.c  |  47 +++++-----
 xen/arch/x86/pv/domain.c             |  24 ++---
 xen/arch/x86/pv/mm.c                 |   3 +-
 xen/arch/x86/smpboot.c               |   6 +-
 xen/arch/x86/traps.c                 |  12 +--
 xen/arch/x86/x86_64/mm.c             |   7 +-
 14 files changed, 260 insertions(+), 155 deletions(-)

That's including the context switch change, and without differentiating
between lines of code and comments.

However, I don't think cleanups / simplifications should be judged
purely on diffstat LoC.  Arguably the current set of
create_perdomain_mapping() parameters is not the most obvious one:

int create_perdomain_mapping(struct domain *d, unsigned long va,
                             unsigned int nr, l1_pgentry_t **pl1tab,
                             struct page_info **ppg);

Compared to the result after the first 10 patches in the series:

int create_perdomain_mapping(struct vcpu *v, unsigned long va,
                             unsigned int nr, bool populate);

That's on top of the fact that callers no longer need to keep references
to the L1 table(s) in order to populate the area.
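
For illustration, a caller-side before/after sketch (identifiers and the
populate step are illustrative, not copied verbatim from the patches):

/* Before: the caller gets back and stashes a reference to the L1 table. */
l1_pgentry_t *gdt_l1tab;
rc = create_perdomain_mapping(d, GDT_VIRT_START(v), NR_RESERVED_GDT_PAGES,
                              &gdt_l1tab, NULL);
/* ... gdt_l1tab kept in the domain struct and written to directly ... */

/* After: no L1 reference escapes, later updates go through
   {populate,destroy}_perdomain_mapping(). */
rc = create_perdomain_mapping(v, GDT_VIRT_START(v), NR_RESERVED_GDT_PAGES,
                              true /* populate */);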

> Things becoming slightly slower (because of the L1 no longer directly
> available to modify) may, otoh, not be a significant issue, if we assume
> that GDT/LDT manipulation isn't normally a very frequent operation.

I introduce a fast path in both {populate,destroy}_perdomain_mapping()
that uses the recursive linear slot for manipulating the L1
page-table.  There's still a slow path that relies on walking the
page-tables, but that should only be used when the vCPU is not
running, and hence the added latency shouldn't be too critical.
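
Conceptually the split looks something like the below sketch (not actual
code from the series; walk_perdomain_pt() is a made-up placeholder for
the slow path):

static l1_pgentry_t *perdomain_l1e(const struct vcpu *v, unsigned long va)
{
    if ( v == current )
        /*
         * Fast path: the per-domain area is part of the currently loaded
         * page-tables, so the L1 entry is reachable through the recursive
         * (linear) slot without walking the page-tables.
         */
        return &__linear_l1_table[l1_linear_offset(va)];

    /* Slow path: vCPU not running, walk and map its page-tables. */
    return walk_perdomain_pt(v, va);
}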

As a side-effect of this logic the pages allocated for the per-domain
region page-tables can now uniformly come from the domheap.  Using
xenheap pages for the L1 page-tables is no longer needed once those are
no longer stashed in the domain structure.
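
i.e. roughly (just a sketch; the actual patches may handle page ownership
and accounting differently):

struct page_info *pg = alloc_domheap_page(NULL, MEMF_no_owner);
l1_pgentry_t *l1t;

if ( !pg )
    return -ENOMEM;

l1t = __map_domain_page(pg);   /* transient mapping instead of xenheap */
clear_page(l1t);
/* ... fill in / update L1 entries ... */
unmap_domain_page(l1t);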

> IOW my earlier request stands: Can you please try to make more clear (in the
> patch descriptions) what exactly the motivation for these changes is? Just
> doing things differently with more code overall can't be it, I don't think.

The main motivation for the change is to stop stashing L1 page-tables
for the per-domain area(s) in the domain struct.  With the introduction
of ASI, Xen would also need to stash L1 page-tables for the per-domain
area in the vcpu struct, as a result of the per-domain slot becoming
per-vCPU.  IMO managing such references, and having logic to deal with
both domain and vcpu L1 page-tables, is too complex and error prone.

Instead I propose to add an interface,
{populate,destroy}_perdomain_mapping(), that can be used to manage
mappings in the per-domain area with callers completely unaware of
whether the domain is running with per-vCPU mappings or not.
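
i.e. prototypes along the lines of (illustrative only, the exact
parameter types and names in the patches may differ):

void populate_perdomain_mapping(const struct vcpu *v, unsigned long va,
                                mfn_t *mfn, unsigned long nr);
void destroy_perdomain_mapping(const struct vcpu *v, unsigned long va,
                               unsigned int nr);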

Please let me know if you are happy with the reasoning and arguments
provided.  I think the resulting perdomain mapping interface is much
better than what Xen currently does for manipulating per-domain
mapping entries, but I might be spoiled.

Thanks, Roger.



 

