
Re: [Xen-devel] [for-4.8][PATCH v2 17/23] xen/arm: p2m: Introduce p2m_set_entry and __p2m_set_entry



On Thu, 15 Sep 2016, Julien Grall wrote:
> The ARM architecture mandates the use of a break-before-make sequence
> when changing translation entries if the page table is shared between
> multiple CPUs whenever a valid entry is replaced by another valid entry
> (see D4.7.1 in ARM DDI 0487A.j for more details).
> 
> The break-before-make sequence can be divided in the following steps:
>     1) Invalidate the old entry in the page table
>     2) Issue a TLB invalidation instruction for the address associated
>     to this entry
>     3) Write the new entry
> 
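To illustrate, in terms of the helpers introduced further down in this patch
(p2m_remove_pte, p2m_flush_tlb_sync, p2m_write_pte), a valid-to-valid update
ends up looking roughly like the following sketch, where new_pte is only a
placeholder name for the entry being installed:

    p2m_remove_pte(entry, p2m->clean_pte);           /* 1) invalidate the old entry   */
    p2m_flush_tlb_sync(p2m);                          /* 2) TLB invalidation           */
    p2m_write_pte(entry, new_pte, p2m->clean_pte);    /* 3) write the new entry        */
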
> The current P2M code implemented in apply_one_level does not respect
> this sequence and may break coherency on some processors.
> 
> Adapting the current implementation to use the break-before-make
> sequence would imply some code duplication and more TLB invalidations
> than necessary. For instance, if we are replacing a 4KB page and the
> current mapping in the P2M is using a 1GB superpage, the following steps
> will happen:
>     1) Shatter the 1GB superpage into a series of 2MB superpages
>     2) Shatter the 2MB superpage into a series of 4KB pages
>     3) Replace the 4KB page
> 
> As the current implementation shatters while descending and installs the
> mapping, Xen would need to issue 3 TLB invalidation instructions, which
> is clearly inefficient.
> 
> Furthermore, all the operations which modify the page table are using
> the same skeleton. It is more complicated to maintain different code paths
> than to have a generic function that sets an entry and takes care of the
> break-before-make sequence.
> 
> The new implementation is based on the x86 EPT one which, I think,
> fits quite well for the break-before-make sequence whilst keeping
> the code simple.
> 
> The main function of the new implementation is __p2m_set_entry. It will
> only work on mappings that are aligned to a block entry in the page table
> (i.e. 1GB, 2MB, 4KB when using a 4KB granularity).
> 
> Another function, p2m_set_entry, is provided to break down a region into
> mappings that are aligned to a block entry, or to 4KB when memaccess is
> enabled.
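
As a worked example of that break-down (assuming the usual 4KB granularity,
where THIRD_ORDER, SECOND_ORDER and FIRST_ORDER are 0, 9 and 18): for a 4MB
region whose start gfn and mfn are 2MB-aligned but not 1GB-aligned,

    mask = gfn_x(sgfn) | mfn_x(smfn) | nr;
    /* mask & ((1UL << FIRST_ORDER) - 1)  != 0  =>  a 1GB mapping is not possible */
    /* mask & ((1UL << SECOND_ORDER) - 1) == 0  =>  order = SECOND_ORDER          */

so the loop ends up calling __p2m_set_entry twice with page_order = 9, i.e.
two 2MB block mappings.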
> 
> Signed-off-by: Julien Grall <julien.grall@xxxxxxx>
> 
> ---
>     Changes in v2:
>         - fix typos in the commit message
>         - don't use the default access when shattering the superpage
>         - remove duplicated comment
>         - export p2m_set_entry
>         - s/todo/nr/
>         - iommu flush
>         - update the stats when removing/adding a mapping
>         - update doc
>         - fix p2m_split_superpage
>         - don't try to allocate intermediate page table when removing an
>         entry
>         - the TLB flush is not necessary when only permissions are
>         changed (e.g. memaccess).
> ---
>  xen/arch/arm/p2m.c         | 374 +++++++++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/p2m.h  |  11 ++
>  xen/include/asm-arm/page.h |   4 +
>  3 files changed, 389 insertions(+)
> 
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 3ca2e90..5f7aef0 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -783,6 +783,380 @@ static void p2m_put_l3_page(const lpae_t pte)
>      }
>  }
>  
> +/* Free lpae sub-tree behind an entry */
> +static void p2m_free_entry(struct p2m_domain *p2m,
> +                           lpae_t entry, unsigned int level)
> +{
> +    unsigned int i;
> +    lpae_t *table;
> +    mfn_t mfn;
> +
> +    /* Nothing to do if the entry is invalid. */
> +    if ( !p2m_valid(entry) )
> +        return;
> +
> +    /* Nothing to do but updating the states if the entry is a super-page. */
                                           ^ stats

Aside from this small typo, everything else is good:

Reviewed-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>


> +    if ( p2m_is_superpage(entry, level) )
> +    {
> +        p2m->stats.mappings[level]--;
> +        return;
> +    }
> +
> +    if ( level == 3 )
> +    {
> +        p2m->stats.mappings[level]--;
> +        p2m_put_l3_page(entry);
> +        return;
> +    }
> +
> +    table = map_domain_page(_mfn(entry.p2m.base));
> +    for ( i = 0; i < LPAE_ENTRIES; i++ )
> +        p2m_free_entry(p2m, *(table + i), level + 1);
> +
> +    unmap_domain_page(table);
> +
> +    /*
> +     * Make sure all the references in the TLB have been removed before
> +     * freeing the intermediate page table.
> +     * XXX: Should we defer the free of the page table to avoid the
> +     * flush?
> +     */
> +    if ( p2m->need_flush )
> +        p2m_flush_tlb_sync(p2m);
> +
> +    mfn = _mfn(entry.p2m.base);
> +    ASSERT(mfn_valid(mfn_x(mfn)));
> +
> +    free_domheap_page(mfn_to_page(mfn_x(mfn)));
> +}
> +
> +static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
> +                                unsigned int level, unsigned int target,
> +                                const unsigned int *offsets)
> +{
> +    struct page_info *page;
> +    unsigned int i;
> +    lpae_t pte, *table;
> +    bool rv = true;
> +
> +    /* Convenience aliases */
> +    mfn_t mfn = _mfn(entry->p2m.base);
> +    unsigned int next_level = level + 1;
> +    unsigned int level_order = level_orders[next_level];
> +
> +    /*
> +     * This should only be called with target != level and the entry is
> +     * a superpage.
> +     */
> +    ASSERT(level < target);
> +    ASSERT(p2m_is_superpage(*entry, level));
> +
> +    page = alloc_domheap_page(NULL, 0);
> +    if ( !page )
> +        return false;
> +
> +    page_list_add(page, &p2m->pages);
> +    table = __map_domain_page(page);
> +
> +    /*
> +     * We are either splitting a first level 1G page into 512 second level
> +     * 2M pages, or a second level 2M page into 512 third level 4K pages.
> +     */
> +    for ( i = 0; i < LPAE_ENTRIES; i++ )
> +    {
> +        lpae_t *new_entry = table + i;
> +
> +        /*
> +         * Use the content of the superpage entry and override
> +         * the necessary fields. So the correct permissions are kept.
> +         */
> +        pte = *entry;
> +        pte.p2m.base = mfn_x(mfn_add(mfn, i << level_order));
> +
> +        /*
> +         * First and second level pages set p2m.table = 0, but third
> +         * level entries set p2m.table = 1.
> +         */
> +        pte.p2m.table = (next_level == 3);
> +
> +        write_pte(new_entry, pte);
> +    }
> +
> +    /* Update stats */
> +    p2m->stats.shattered[level]++;
> +    p2m->stats.mappings[level]--;
> +    p2m->stats.mappings[next_level] += LPAE_ENTRIES;
> +
> +    /*
> +     * Shatter superpage in the page to the level we want to make the
> +     * changes.
> +     * This is done outside the loop to avoid checking the offset to
> +     * know whether the entry should be shattered for every entry.
> +     */
> +    if ( next_level != target )
> +        rv = p2m_split_superpage(p2m, table + offsets[next_level],
> +                                 level + 1, target, offsets);
> +
> +    if ( p2m->clean_pte )
> +        clean_dcache_va_range(table, PAGE_SIZE);
> +
> +    unmap_domain_page(table);
> +
> +    pte = mfn_to_p2m_entry(_mfn(page_to_mfn(page)), p2m_invalid,
> +                           p2m->default_access);
> +
> +    /*
> +     * Even if we failed, we should install the newly allocated LPAE
> +     * entry. The caller will be in charge to free the sub-tree.
> +     */
> +    p2m_write_pte(entry, pte, p2m->clean_pte);
> +
> +    return rv;
> +}
> +
> +/*
> + * Insert an entry in the p2m. This should be called with a mapping
> + * equal to a page/superpage (4K, 2M, 1G).
> + */
> +static int __p2m_set_entry(struct p2m_domain *p2m,
> +                           gfn_t sgfn,
> +                           unsigned int page_order,
> +                           mfn_t smfn,
> +                           p2m_type_t t,
> +                           p2m_access_t a)
> +{
> +    paddr_t addr = pfn_to_paddr(gfn_x(sgfn));
> +    unsigned int level = 0;
> +    unsigned int target = 3 - (page_order / LPAE_SHIFT);
> +    lpae_t *entry, *table, orig_pte;
> +    int rc;
> +
> +    /* Convenience aliases */
> +    const unsigned int offsets[4] = {
> +        zeroeth_table_offset(addr),
> +        first_table_offset(addr),
> +        second_table_offset(addr),
> +        third_table_offset(addr)
> +    };
> +
> +    ASSERT(p2m_is_write_locked(p2m));
> +
> +    /*
> +     * Check if the level target is valid: we only support
> +     * 4K - 2M - 1G mapping.
> +     */
> +    ASSERT(target > 0 && target <= 3);
> +
> +    table = p2m_get_root_pointer(p2m, sgfn);
> +    if ( !table )
> +        return -EINVAL;
> +
> +    for ( level = P2M_ROOT_LEVEL; level < target; level++ )
> +    {
> +        /*
> +         * Don't try to allocate intermediate page table if the mapping
> +         * is about to be removed (i.e mfn == INVALID_MFN).
> +         */
> +        rc = p2m_next_level(p2m, mfn_eq(smfn, INVALID_MFN),
> +                            &table, offsets[level]);
> +        if ( rc == GUEST_TABLE_MAP_FAILED )
> +        {
> +            /*
> +             * We are here because p2m_next_level has failed to map
> +             * the intermediate page table (e.g the table does not exist
> +             * and the p2m tree is read-only). It is a valid case
> +             * when removing a mapping as it may not exist in the
> +             * page table. In this case, just ignore it.
> +             */
> +            rc = mfn_eq(smfn, INVALID_MFN) ? 0 : -ENOENT;
> +            goto out;
> +        }
> +        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
> +            break;
> +    }
> +
> +    entry = table + offsets[level];
> +
> +    /*
> +     * If we are here with level < target, we must be at a leaf node,
> +     * and we need to break up the superpage.
> +     */
> +    if ( level < target )
> +    {
> +        /* We need to split the original page. */
> +        lpae_t split_pte = *entry;
> +
> +        ASSERT(p2m_is_superpage(*entry, level));
> +
> +        if ( !p2m_split_superpage(p2m, &split_pte, level, target, offsets) )
> +        {
> +            /*
> +             * The current super-page is still in-place, so re-increment
> +             * the stats.
> +             */
> +            p2m->stats.mappings[level]++;
> +
> +            /* Free the allocated sub-tree */
> +            p2m_free_entry(p2m, split_pte, level);
> +
> +            rc = -ENOMEM;
> +            goto out;
> +        }
> +
> +        /*
> +         * Follow the break-before-make sequence to update the entry.
> +         * For more details see (D4.7.1 in ARM DDI 0487A.j).
> +         */
> +        p2m_remove_pte(entry, p2m->clean_pte);
> +        p2m_flush_tlb_sync(p2m);
> +
> +        p2m_write_pte(entry, split_pte, p2m->clean_pte);
> +
> +        /* then move to the level we want to make real changes */
> +        for ( ; level < target; level++ )
> +        {
> +            rc = p2m_next_level(p2m, true, &table, offsets[level]);
> +
> +            /*
> +             * The entry should be found and either be a table
> +             * or a superpage if level 3 is not targeted
> +             */
> +            ASSERT(rc == GUEST_TABLE_NORMAL_PAGE ||
> +                   (rc == GUEST_TABLE_SUPER_PAGE && target < 3));
> +        }
> +
> +        entry = table + offsets[level];
> +    }
> +
> +    /*
> +     * We should always be there with the correct level because
> +     * all the intermediate tables have been installed if necessary.
> +     */
> +    ASSERT(level == target);
> +
> +    orig_pte = *entry;
> +
> +    /*
> +     * The radix-tree can only work on 4KB. This is only used when
> +     * memaccess is enabled.
> +     */
> +    ASSERT(!p2m->mem_access_enabled || page_order == 0);
> +    /*
> +     * The access type should always be p2m_access_rwx when the mapping
> +     * is removed.
> +     */
> +    ASSERT(!mfn_eq(INVALID_MFN, smfn) || (a == p2m_access_rwx));
> +    /*
> +     * Update the mem access permission before updating the P2M. So we
> +     * don't have to revert the mapping if it has failed.
> +     */
> +    rc = p2m_mem_access_radix_set(p2m, sgfn, a);
> +    if ( rc )
> +        goto out;
> +
> +    /*
> +     * Always remove the entry in order to follow the break-before-make
> +     * sequence when updating the translation table (D4.7.1 in ARM DDI
> +     * 0487A.j).
> +     */
> +    if ( p2m_valid(orig_pte) )
> +        p2m_remove_pte(entry, p2m->clean_pte);
> +
> +    if ( mfn_eq(smfn, INVALID_MFN) )
> +        /* Flush can be deferred if the entry is removed */
> +        p2m->need_flush |= !!p2m_valid(orig_pte);
> +    else
> +    {
> +        lpae_t pte = mfn_to_p2m_entry(smfn, t, a);
> +
> +        if ( level < 3 )
> +            pte.p2m.table = 0; /* Superpage entry */
> +
> +        /*
> +         * It is necessary to flush the TLB before writing the new entry
> +         * to keep coherency when the previous entry was valid.
> +         *
> +     * However, it could be deferred when only the permissions are
> +         * changed (e.g in case of memaccess).
> +         */
> +        if ( p2m_valid(orig_pte) )
> +        {
> +            if ( likely(!p2m->mem_access_enabled) ||
> +                 P2M_CLEAR_PERM(pte) != P2M_CLEAR_PERM(orig_pte) )
> +                p2m_flush_tlb_sync(p2m);
> +            else
> +                p2m->need_flush = true;
> +        }
> +        else /* new mapping */
> +            p2m->stats.mappings[level]++;
> +
> +        p2m_write_pte(entry, pte, p2m->clean_pte);
> +
> +        p2m->max_mapped_gfn = gfn_max(p2m->max_mapped_gfn,
> +                                      gfn_add(sgfn, 1 << page_order));
> +        p2m->lowest_mapped_gfn = gfn_min(p2m->lowest_mapped_gfn, sgfn);
> +    }
> +
> +    /*
> +     * Free the entry only if the original pte was valid and the base
> +     * is different (to avoid freeing when permission is changed).
> +     */
> +    if ( p2m_valid(orig_pte) && entry->p2m.base != orig_pte.p2m.base )
> +        p2m_free_entry(p2m, orig_pte, level);
> +
> +    if ( need_iommu(p2m->domain) && (p2m_valid(orig_pte) || p2m_valid(*entry)) )
> +        rc = iommu_iotlb_flush(p2m->domain, gfn_x(sgfn), 1UL << page_order);
> +    else
> +        rc = 0;
> +
> +out:
> +    unmap_domain_page(table);
> +
> +    return rc;
> +}
> +
> +int p2m_set_entry(struct p2m_domain *p2m,
> +                  gfn_t sgfn,
> +                  unsigned long nr,
> +                  mfn_t smfn,
> +                  p2m_type_t t,
> +                  p2m_access_t a)
> +{
> +    int rc = 0;
> +
> +    while ( nr )
> +    {
> +        /*
> +         * XXX: Support superpage mappings if nr is not aligned to a
> +         * superpage size.
> +         */
> +        unsigned long mask = gfn_x(sgfn) | mfn_x(smfn) | nr;
> +        unsigned long order;
> +
> +        /* Always map 4k by 4k when memaccess is enabled */
> +        if ( unlikely(p2m->mem_access_enabled) )
> +            order = THIRD_ORDER;
> +        else if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
> +            order = FIRST_ORDER;
> +        else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
> +            order = SECOND_ORDER;
> +        else
> +            order = THIRD_ORDER;
> +
> +        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
> +        if ( rc )
> +            break;
> +
> +        sgfn = gfn_add(sgfn, (1 << order));
> +        if ( !mfn_eq(smfn, INVALID_MFN) )
> +           smfn = mfn_add(smfn, (1 << order));
> +
> +        nr -= (1 << order);
> +    }
> +
> +    return rc;
> +}
> +
>  /*
>   * Returns true if start_gpaddr..end_gpaddr contains at least one
>   * suitably aligned level_size mappping of maddr.
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 6fe6a37..cca86ef 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -187,6 +187,17 @@ mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
>                      p2m_type_t *t, p2m_access_t *a,
>                      unsigned int *page_order);
>  
> +/*
> + * Direct set a p2m entry: only for use by the P2M code.
> + * The P2M write lock should be taken.
> + */
> +int p2m_set_entry(struct p2m_domain *p2m,
> +                  gfn_t sgfn,
> +                  unsigned long nr,
> +                  mfn_t smfn,
> +                  p2m_type_t t,
> +                  p2m_access_t a);
> +
>  /* Clean & invalidate caches corresponding to a region of guest address space */
>  int p2m_cache_flush(struct domain *d, gfn_t start, unsigned long nr);
>  
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index a43b0fa..ba63322 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -166,6 +166,10 @@ typedef struct __packed {
>      unsigned long sbz1:5;
>  } lpae_p2m_t;
>  
> +/* Permission mask: xn, write, read */
> +#define P2M_PERM_MASK (0x00400000000000C0ULL)
> +#define P2M_CLEAR_PERM(pte) ((pte).bits & ~P2M_PERM_MASK)
> +
>  /*
>   * Walk is the common bits of p2m and pt entries which are needed to
>   * simply walk the table (e.g. for debug).
> -- 
> 1.9.1
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

