
Re: [Xen-devel] [PATCH v4 7/9] xen/arm: Implement virtual-linear page table for guest p2m mapping in live migration



> -----Original Message-----
> From: Ian Campbell [mailto:Ian.Campbell@xxxxxxxxxx]
> Sent: Wednesday, October 16, 2013 9:51 PM
> To: Jaeyong Yoo
> Cc: xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH v4 7/9] xen/arm: Implement virtual-linear
> page table for guest p2m mapping in live migration
> 
> On Fri, 2013-10-04 at 13:44 +0900, Jaeyong Yoo wrote:
> > Implement vlpt (virtual-linear page table) for fast accessing of 3rd
> > PTE of guest p2m. When creating a mapping for vlpt, just copy the 1st
> > level PTE of guest p2m to the xen's 2nd level PTE. Then the mapping
> > becomes the following:
> >     xen's 1st PTE -->
> >     xen's 2nd PTE (which is the same as 1st PTE of guest p2m) -->
> >     guest p2m's 2nd PTE -->
> >     guest p2m's 3rd PTE (the memory contents where the vlpt points)
> > For more info about vlpt, see:
> > http://www.technovelty.org/linux/virtual-linear-page-table.html
> >
> > This function is used in dirty-page tracing: when domU write-fault is
> > trapped by xen, xen can immediately locate the 3rd PTE of guest p2m.
> 
> ISTR you had some numbers which backed up the choice of this solution.
> Can you either include them here or link to the list archives?

I think a link to the list archives is OK; the full text is too long to include here, I guess.

> 
> > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@xxxxxxxxxxx>
> > ---
> >  xen/arch/arm/mm.c            | 78 ++++++++++++++++++++++++++++++++++++++++++++
> >  xen/include/asm-arm/config.h |  6 ++++
> >  xen/include/asm-arm/domain.h |  7 ++++
> >  xen/include/asm-arm/mm.h     | 22 +++++++++++++
> >  4 files changed, 113 insertions(+)
> >
> > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > index 120eae8..98edbc9 100644
> > --- a/xen/arch/arm/mm.c
> > +++ b/xen/arch/arm/mm.c
> > @@ -1382,6 +1382,84 @@ int is_iomem_page(unsigned long mfn)
> >          return 1;
> >      return 0;
> >  }
> > +
> > +/* flush the vlpt area */
> > +static void flush_vlpt(struct domain *d)
> > +{
> > +    int flush_size;
> > +    flush_size = (d->arch.dirty.second_lvl_end -
> > +                  d->arch.dirty.second_lvl_start) << SECOND_SHIFT;
> > +    /* flushing the 3rd level mapping */
> > +    flush_xen_data_tlb_range_va(VIRT_LIN_P2M_START,
> > +                                flush_size);
> 
> Shouldn't the base here be VIRT_LIN_P2M_START +
> (d->arch.dirty.second_lvl_start) << SECOND_SHIFT) or something like that?

Yes, right. It will also decrease the flush overhead.
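
Something like this is what I have in mind (an untested sketch reusing the field
names from the patch; second_lvl_start/end already include
second_linear_offset(VIRT_LIN_P2M_START), so shifting them left by SECOND_SHIFT
gives the virtual addresses of the mapped window directly):

    /* flush only the 2MB slots actually used by this domain */
    static void flush_vlpt(struct domain *d)
    {
        unsigned long start = (unsigned long)d->arch.dirty.second_lvl_start
                              << SECOND_SHIFT;
        unsigned long size = (unsigned long)(d->arch.dirty.second_lvl_end -
                                             d->arch.dirty.second_lvl_start)
                             << SECOND_SHIFT;

        /* flushing the 3rd level mapping */
        flush_xen_data_tlb_range_va(start, size);
    }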

> 
> flush_xen_data_tlb_range_va just turns into a loop over the addresses, so
> you might find you may as well do the flushes as you update the ptes in
> the below, perhaps with an optimisation to pull the barriers outside that
> loop.

You mean the barriers in write_pte? OK, that looks better.
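
As an untested sketch of what that could look like (the write_pte_nobarrier()
helper is hypothetical; the point is to issue one barrier before and one after
the batch, and to flush each 2MB slot as it is rewritten instead of flushing the
whole window at the end):

    void restore_vlpt(struct domain *d)
    {
        int i;

        dsb();                      /* once, before the batch of updates */
        for ( i = d->arch.dirty.second_lvl_start;
              i < d->arch.dirty.second_lvl_end;
              ++i )
        {
            int j = i - d->arch.dirty.second_lvl_start;

            if ( xen_second[i].bits == d->arch.dirty.second_lvl[j].bits )
                continue;

            write_pte_nobarrier(&xen_second[i], d->arch.dirty.second_lvl[j]);
            /* flush just the 2MB window covered by this slot */
            flush_xen_data_tlb_range_va((unsigned long)i << SECOND_SHIFT,
                                        1UL << SECOND_SHIFT);
        }
        dsb();                      /* once, after the batch */
    }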

> 
> > +}
> > +
> > +/* restore the xen page table for vlpt mapping for domain d */
> > +void restore_vlpt(struct domain *d)
> > +{
> > +    int i, j = 0;
> > +    int need_flush = 0;
> > +
> > +    for ( i = d->arch.dirty.second_lvl_start;
> > +          i < d->arch.dirty.second_lvl_end;
> > +          ++i, ++j )
> > +    {
> > +        if ( xen_second[i].bits != d->arch.dirty.second_lvl[j].bits )
> > +        {
> > +            write_pte(&xen_second[i], d->arch.dirty.second_lvl[j]);
> > +            need_flush = 1;
> > +        }
> > +    }
> > +
> > +    if ( need_flush ) flush_vlpt(d);
> > +}
> > +
> > +/* setting up the xen page table for vlpt mapping for domain d */
> > +void prepare_vlpt(struct domain *d)
> > +{
> > +    int xen_second_linear_base;
> > +    int gp2m_start_index, gp2m_end_index;
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +    struct page_info *second_lvl_page;
> > +    vaddr_t gma_start = 0;
> > +    vaddr_t gma_end = 0;
> > +    lpae_t *first;
> > +    int i, j;
> > +
> > +    xen_second_linear_base = second_linear_offset(VIRT_LIN_P2M_START);
> > +    get_gma_start_end(d, &gma_start, &gma_end);
> > +
> > +    gp2m_start_index = gma_start >> FIRST_SHIFT;
> > +    gp2m_end_index = (gma_end >> FIRST_SHIFT) + 1;
> > +
> > +    second_lvl_page = alloc_domheap_page(NULL, 0);
> 
> The p2m first is two concatenated pages with a total of 1024 entries
> (which is needed to give the full 40 bit IPA space). I have a feeling this
> means you need two pages here?
> 
> Or maybe we shouldn't be supporting the full 40 bit addresses on 32-bit?

For generality, I think it is better to support it. But, honestly, I'm not
sure how many people would use a 40-bit ARM guest.
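
If we do keep 40-bit support, the reference copy has to cover 1024 entries, i.e.
two pages. A rough, untested sketch of just the allocation side (turning
second_lvl into a two-entry array and giving prepare_vlpt() an error path are my
assumptions, not what the posted patch does):

    struct page_info *pg;

    /* order-1 allocation: two pages, enough for 1024 lpae_t entries */
    pg = alloc_domheap_pages(NULL, 1, 0);
    if ( pg == NULL )
        return -ENOMEM;   /* prepare_vlpt() would have to grow a return value */

    d->arch.dirty.second_lvl[0] = map_domain_page_global(page_to_mfn(pg));
    d->arch.dirty.second_lvl[1] = map_domain_page_global(page_to_mfn(pg) + 1);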

> 
> NB currently create_p2m_entries doesn't actually support 40 bits, but I
> need to fix that for an arm64 platform I'm looking at.
> 
> > +    d->arch.dirty.second_lvl = map_domain_page_global(
> > +                                 page_to_mfn(second_lvl_page) );
> > +
> > +    first = __map_domain_page(p2m->first_level);
> > +    for ( i = gp2m_start_index, j = 0; i < gp2m_end_index; ++i, ++j )
> > +    {
> > +        write_pte(&xen_second[xen_second_linear_base+i],
> > +                  first[i]);
> > +
> > +        /* we copy the mapping into domain's structure as a reference
> > +         * in case of the context switch (used in restore_vlpt) */
> > +        d->arch.dirty.second_lvl[j] = first[i];
> > +    }
> > +    unmap_domain_page(first);
> > +
> > +    /* storing the start and end index */
> > +    d->arch.dirty.second_lvl_start = xen_second_linear_base + gp2m_start_index;
> > +    d->arch.dirty.second_lvl_end = xen_second_linear_base + gp2m_end_index;
> > +
> > +    flush_vlpt(d);
> > +}
> > +
> > +void cleanup_vlpt(struct domain *d)
> > +{
> > +    unmap_domain_page_global(d->arch.dirty.second_lvl);
> > +}
> > +
> >  /*
> >   * Local variables:
> >   * mode: C
> > diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
> > index 9e395c2..47fc26e 100644
> > --- a/xen/include/asm-arm/config.h
> > +++ b/xen/include/asm-arm/config.h
> > @@ -87,6 +87,7 @@
> >   *   0  -   8M   <COMMON>
> >   *
> >   *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
> > + * 128M - 256M   Virtual-linear mapping to P2M table
> >   * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
> >   *                    space
> >   *
> > @@ -132,6 +133,9 @@
> >
> >  #define VMAP_VIRT_END    XENHEAP_VIRT_START
> >
> > +#define VIRT_LIN_P2M_START     _AT(vaddr_t,0x08000000)
> 
> Can you put this up in the list of addresses in order please.
> 
> > +#define VIRT_LIN_P2M_END       VMAP_VIRT_START
> 
> > +
> >  #define DOMHEAP_ENTRIES        1024  /* 1024 2MB mapping slots */
> >
> >  /* Number of domheap pagetable pages required at the second level (2MB mappings) */
> > @@ -158,6 +162,8 @@
> >
> >  #define HYPERVISOR_VIRT_END    DIRECTMAP_VIRT_END
> >
> > +/*TODO (ARM_64): define VIRT_LIN_P2M_START VIRT_LIN_P2M_END */
> > +
> >  #endif
> >
> >  /* Fixmap slots */
> > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> > index 755c7af..7e7743a 100644
> > --- a/xen/include/asm-arm/domain.h
> > +++ b/xen/include/asm-arm/domain.h
> > @@ -112,6 +112,13 @@ struct arch_domain
> >          spinlock_t                  lock;
> >      } vuart;
> >
> > +    /* dirty-page tracing */
> > +    struct {
> > +        int second_lvl_start;            /* for context switch */
> > +        int second_lvl_end;
> > +        lpae_t *second_lvl;
> > +    } dirty;
> > +
> >      struct dt_mem_info map_domain;
> >      spinlock_t         map_lock;
> >  }  __cacheline_aligned;
> > diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> > index 021e8d6..b9cbcf2 100644
> > --- a/xen/include/asm-arm/mm.h
> > +++ b/xen/include/asm-arm/mm.h
> > @@ -344,6 +344,28 @@ static inline void put_page_and_type(struct page_info *page)
> >
> >  void get_gma_start_end(struct domain *d, vaddr_t *start, vaddr_t *end);
> >
> > +/* vlpt (virtual linear page table) related functions */
> > +void prepare_vlpt(struct domain *d);
> > +void cleanup_vlpt(struct domain *d);
> > +void restore_vlpt(struct domain *d);
> > +
> > +/* calculate the xen's virtual address for accessing the leaf PTE of
> > + * a given address (GPA) */
> > +static inline lpae_t * get_vlpt_3lvl_pte(vaddr_t addr)
> 
> Argument is an IPA and should therefore be a paddr_t I think?

Oh, right, and the type in traps.c is also paddr_t; that was my mistake.

> 
> > +{
> > +    unsigned long ipa_third = third_table_offset(addr);
> > +    unsigned long ipa_second = second_table_offset(addr);
> > +    unsigned long ipa_first = first_table_offset(addr);
> > +
> > +    /* guest p2m's first level index is the index of xen's second level,
> > +       and guest p2m's second level index is the index of xen's third level */
> > +    return (lpae_t *)(
> > +        ( (second_linear_offset(VIRT_LIN_P2M_START) + ipa_first)
> > +           << SECOND_SHIFT) +
> > +        (ipa_second << THIRD_SHIFT) +
> > +        (ipa_third << 3) );
> > +}
> 
> Can we assign the table base to an "lpae_t *table" and then use
> &table[ipa_third] rather than hardcoding << 3?

OK.

> 
> I can't quite shake the feeling that the "ipa_first<<SECOND_SHIFT" and
> "ipa_second<<THIRD_SHIFT" elements of this could be achieved with a single
> shift of the original address (and maybe a mask).

Right, a single shift by LPAE_SHIFT on the addr is sufficient. I'm surprised
you figured this out immediately.
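
Putting both suggestions together (paddr_t argument, table indexing instead of
the open-coded << 3, and a single shift), something like this untested sketch,
ignoring the masking that the *_table_offset() macros apply:

    static inline lpae_t *get_vlpt_3lvl_pte(paddr_t addr)
    {
        lpae_t *table = (lpae_t *)VIRT_LIN_P2M_START;

        /* one lpae_t per guest page: the 3rd level PTE for IPA 'addr'
         * lives at index (addr >> PAGE_SHIFT) in the linear table */
        return &table[addr >> PAGE_SHIFT];
    }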

> 
> > +
> >  #endif /*  __ARCH_ARM_MM__ */
> >  /*
> >   * Local variables:


