[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/6] x86/EPT: don't walk entire page tables when globally changing types



At 13:28 +0100 on 22 Apr (1398169701), Jan Beulich wrote:
> Instead leverage the EPT_MISCONFIG VM exit by marking just the top
> level entries as needing recalculation of their type, propagating the
> the recalculation state down as necessary such that the actual
> recalculation gets done upon access.
> 
> For this to work, we have to
> - restrict the types between which conversions can be done (right now
>   only the two types involved in log dirty tracking need to be taken
>   care of)
> - remember the ranges that log dirty tracking was requested for as well
>   as whether global log dirty tracking is in effect
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>

Generally looks good; I think the core idea of reusing the misconfig
mechanism is the right way to go. 

The naming you've chosen implies that it might be extended later to
other lazily-updated types; I'm not sure how well that will scale.
OTOH right now I can't think of any other users for this (and I can 
see why you're not happy with using change_type() to flush out
foreign mappings on teardown after this changes).

A few comments on detail below:

> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -110,11 +110,18 @@ int hap_track_dirty_vram(struct domain *
>          if ( begin_pfn != dirty_vram->begin_pfn ||
>               begin_pfn + nr != dirty_vram->end_pfn )
>          {
> +            unsigned long ostart = dirty_vram->begin_pfn;
> +            unsigned long oend = dirty_vram->end_pfn;
> +
>              dirty_vram->begin_pfn = begin_pfn;
>              dirty_vram->end_pfn = begin_pfn + nr;
>  
>              paging_unlock(d);
>  
> +            if ( oend > ostart )
> +                p2m_change_type_range(d, ostart, oend,
> +                                      p2m_ram_logdirty, p2m_ram_rw);
> +

Does this (and the simlar change below) not risk losing updates if
we're in full log-dirty mode?  I think it should be fine to leave
those entries as log-dirty, since at worst they'll provoke another
fault-and-fixup.

> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -116,8 +116,14 @@ static int p2m_init_hostp2m(struct domai
>  
>      if ( p2m )
>      {
> -        d->arch.p2m = p2m;
> -        return 0;
> +        p2m->logdirty_ranges = rangeset_new(d, "log-dirty",
> +                                            RANGESETF_prettyprint_hex);

Is there anything the guest can do to cause this rangeset to grow very
large?  Like moving its video RAM around?

> +/*
> + * Resolve deliberately mis-configured (EMT field set to an invalid value)
> + * entries in the page table hierarchy for the given GFN:
> + * - calculate the correct value for the EMT field
> + * - if marked so, re-calculate the P2M type
> + * - propagate EMT and re-calculation flag down to the next page table level
> + *   for entries not involved in the translation of the given GFN
> + */
> +static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
>  {

Please document the different return values here as well.

>  
> -    return !!okay;
> +    return rc >= !spurious;

Please just write out this logic in a human-readable way.  The compiler
will DTRT. :)  

>  }
>  
>  /*
> @@ -416,12 +470,11 @@ ept_set_entry(struct p2m_domain *p2m, un
>      unsigned long gfn_remainder = gfn;
>      int i, target = order / EPT_TABLE_ORDER;
>      int rc = 0;
> -    int ret = 0;
>      bool_t direct_mmio = (p2mt == p2m_mmio_direct);
>      uint8_t ipat = 0;
>      int need_modify_vtd_table = 1;
>      int vtd_pte_present = 0;
> -    int needs_sync = 1;
> +    int ret, needs_sync = -1;

Hmmm, another tristate.  I find these hard to follow, esp.  without
comments to say which non-zero value is which.  I wonder if we could
add some sort of framework for these to make them clearer.  Ideally
we'd have an enum-like type and rely on the compiler to DTRT about
optimizing the tests.

In this case, where needs_sync isn't being passed as a return value, I
think we'd be fine with two booleans in place of this int.

And more generally, I would like to see at least a comment on every
new instance of this trick before I ack it in x86/mm code.

Cheers,

Tim.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.