[Xen-devel] [PATCH v2 for-next 5/8] x86/mm: split PV guest supporting code to pv/mm.c
Move the following PV-specific code to the new file:

1. Several hypercalls that are tied to PV:
   1. do_mmuext_op
   2. do_mmu_update
   3. do_update_va_mapping
   4. do_update_va_mapping_otherdomain
   5. do_set_gdt
   6. do_update_descriptor
2. PV MMIO emulation code
3. PV writable page table emulation code
4. PV grant table mapping creation / destruction code
5. Other supporting code for the above items

Move everything in one patch because they share a lot of code. Also
move the PV page table API comment to the new file.

Remove all trailing whitespace.

Due to the code movement, a few functions are exported via relevant
header files. Some configuration variables are made non-static.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@xxxxxxxxxx>
---
 xen/arch/x86/mm.c                 | 4964 ++++---------------------------------
 xen/arch/x86/pv/Makefile          |    1 +
 xen/arch/x86/pv/mm.c              | 4118 ++++++++++++++++++++++++++++++
 xen/include/asm-x86/grant_table.h |    4 +
 xen/include/asm-x86/mm.h          |    9 +
 xen/include/xen/mm.h              |    1 +
 6 files changed, 4581 insertions(+), 4516 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e1ce77b9ac..169ae7e4a1 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -18,71 +18,6 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
-/*
- * A description of the x86 page table API:
- *
- * Domains trap to do_mmu_update with a list of update requests.
- * This is a list of (ptr, val) pairs, where the requested operation
- * is *ptr = val.
- *
- * Reference counting of pages:
- * ----------------------------
- * Each page has two refcounts: tot_count and type_count.
- *
- * TOT_COUNT is the obvious reference count. It counts all uses of a
- * physical page frame by a domain, including uses as a page directory,
- * a page table, or simple mappings via a PTE. This count prevents a
- * domain from releasing a frame back to the free pool when it still holds
- * a reference to it.
- *
- * TYPE_COUNT is more subtle. A frame can be put to one of three
- * mutually-exclusive uses: it might be used as a page directory, or a
- * page table, or it may be mapped writable by the domain [of course, a
- * frame may not be used in any of these three ways!].
- * So, type_count is a count of the number of times a frame is being
- * referred to in its current incarnation. Therefore, a page can only
- * change its type when its type count is zero.
- *
- * Pinning the page type:
- * ----------------------
- * The type of a page can be pinned/unpinned with the commands
- * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
- * pinning is not reference counted, so it can't be nested).
- * This is useful to prevent a page's type count falling to zero, at which
- * point safety checks would need to be carried out next time the count
- * is increased again.
- *
- * A further note on writable page mappings:
- * -----------------------------------------
- * For simplicity, the count of writable mappings for a page may not
- * correspond to reality. The 'writable count' is incremented for every
- * PTE which maps the page with the _PAGE_RW flag set. However, for
- * write access to be possible the page directory entry must also have
- * its _PAGE_RW bit set. We do not check this as it complicates the
- * reference counting considerably [consider the case of multiple
- * directory entries referencing a single page table, some with the RW
- * bit set, others not -- it starts getting a bit messy].
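The "a page can only change its type when its type count is zero" rule above is the load-bearing invariant of the code being moved. As a minimal, self-contained C11 sketch of a typed-reference acquire under that rule -- the bit layout and names are illustrative, not Xen's real PGT_* encoding:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define SK_count_mask 0x00ffffffu   /* type_count in the low bits */
    #define SK_type_mask  0x0f000000u   /* current type above it */

    /* Take a typed reference; retyping is only legal while count == 0. */
    static bool get_type_ref(_Atomic uint32_t *type_info, uint32_t type)
    {
        uint32_t x = atomic_load(type_info);

        for ( ; ; )
        {
            uint32_t nx = x + 1;

            if ( (nx & SK_count_mask) == 0 )
                return false;                      /* count would overflow */
            if ( (x & SK_count_mask) == 0 )
                nx = (nx & ~SK_type_mask) | type;  /* count was 0: retype */
            else if ( (x & SK_type_mask) != type )
                return false;                      /* busy with another type */
            if ( atomic_compare_exchange_weak(type_info, &x, nx) )
                return true;                       /* x reloaded on failure */
        }
    }

Note that on the retype path the compare-exchange installs the new type and takes the first reference in one step, so there is no window in which the page has a type but no holder.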
- * In normal use, this simplification shouldn't be a problem. - * However, the logic can be added if required. - * - * One more note on read-only page mappings: - * ----------------------------------------- - * We want domains to be able to map pages for read-only access. The - * main reason is that page tables and directories should be readable - * by a domain, but it would not be safe for them to be writable. - * However, domains have free access to rings 1 & 2 of the Intel - * privilege model. In terms of page protection, these are considered - * to be part of 'supervisor mode'. The WP bit in CR0 controls whether - * read-only restrictions are respected in supervisor mode -- if the - * bit is clear then any mapped page is writable. - * - * We get round this by always setting the WP bit and disallowing - * updates to it. This is very unlikely to cause a problem for guest - * OS's, which will generally use the WP bit to simplify copy-on-write - * implementation (in that case, OS wants a fault when it writes to - * an application-supplied buffer). - */ - #include <xen/init.h> #include <xen/kernel.h> #include <xen/lib.h> @@ -151,30 +86,9 @@ struct rangeset *__read_mostly mmio_ro_ranges; bool_t __read_mostly opt_allow_superpage; boolean_param("allowsuperpage", opt_allow_superpage); -static void put_superpage(unsigned long mfn); - -static uint32_t base_disallow_mask; -/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */ -#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL) - -#define L2_DISALLOW_MASK (unlikely(opt_allow_superpage) \ - ? base_disallow_mask & ~_PAGE_PSE \ - : base_disallow_mask) - -#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \ - base_disallow_mask : 0xFFFFF198U) - -#define L4_DISALLOW_MASK (base_disallow_mask) - -#define l1_disallow_mask(d) \ - ((d != dom_io) && \ - (rangeset_is_empty((d)->iomem_caps) && \ - rangeset_is_empty((d)->arch.ioport_caps) && \ - !has_arch_pdevs(d) && \ - is_pv_domain(d)) ? \ - L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS)) +uint32_t base_disallow_mask; -static s8 __read_mostly opt_mmio_relax; +s8 __read_mostly opt_mmio_relax; static void __init parse_mmio_relax(const char *s) { if ( !*s ) @@ -539,165 +453,7 @@ void update_cr3(struct vcpu *v) make_cr3(v, cr3_mfn); } -/* Get a mapping of a PV guest's l1e for this virtual address. */ -static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn) -{ - l2_pgentry_t l2e; - - ASSERT(!paging_mode_translate(current->domain)); - ASSERT(!paging_mode_external(current->domain)); - - if ( unlikely(!__addr_ok(addr)) ) - return NULL; - - /* Find this l1e and its enclosing l1mfn in the linear map. */ - if ( __copy_from_user(&l2e, - &__linear_l2_table[l2_linear_offset(addr)], - sizeof(l2_pgentry_t)) ) - return NULL; - - /* Check flags that it will be safe to read the l1e. */ - if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT ) - return NULL; - - *gl1mfn = l2e_get_pfn(l2e); - - return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) + - l1_table_offset(addr); -} - -/* Pull down the mapping we got from guest_map_l1e(). */ -static inline void guest_unmap_l1e(void *p) -{ - unmap_domain_page(p); -} - -/* Read a PV guest's l1e that maps this virtual address. 
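guest_map_l1e() above leans on the recursive ("linear") page-table slot: one L4 entry points back at the L4 page itself, so every L1e and L2e becomes visible at a fixed virtual address. A rough sketch of the index arithmetic only; LINEAR_BASE_SK is an invented constant standing in for the window that __linear_l1_table/__linear_l2_table index into:

    #include <stdint.h>

    #define PAGE_SHIFT_SK  12
    #define L2_SHIFT_SK    21
    #define LINEAR_BASE_SK 0xffff800000000000ull  /* invented window base */

    /* Virtual address of the L1 entry mapping @va, inside the window. */
    static uint64_t linear_l1e_addr(uint64_t va)
    {
        return LINEAR_BASE_SK + ((va >> PAGE_SHIFT_SK) << 3); /* 8-byte PTEs */
    }

    /* One level up: the L2 entries form a similar, 512x smaller array. */
    static uint64_t linear_l2e_addr(uint64_t l2_window, uint64_t va)
    {
        return l2_window + ((va >> L2_SHIFT_SK) << 3);
    }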
*/ -static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e) -{ - ASSERT(!paging_mode_translate(current->domain)); - ASSERT(!paging_mode_external(current->domain)); - - if ( unlikely(!__addr_ok(addr)) || - __copy_from_user(eff_l1e, - &__linear_l1_table[l1_linear_offset(addr)], - sizeof(l1_pgentry_t)) ) - *eff_l1e = l1e_empty(); -} - -/* - * Read the guest's l1e that maps this address, from the kernel-mode - * page tables. - */ -static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr, - void *eff_l1e) -{ - bool_t user_mode = !(v->arch.flags & TF_kernel_mode); -#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v) - - TOGGLE_MODE(); - guest_get_eff_l1e(addr, eff_l1e); - TOGGLE_MODE(); -} - -const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE) - zero_page[PAGE_SIZE]; - -static void invalidate_shadow_ldt(struct vcpu *v, int flush) -{ - l1_pgentry_t *pl1e; - unsigned int i; - struct page_info *page; - - BUG_ON(unlikely(in_irq())); - - spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock); - - if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 ) - goto out; - - v->arch.pv_vcpu.shadow_ldt_mapcnt = 0; - pl1e = gdt_ldt_ptes(v->domain, v); - - for ( i = 16; i < 32; i++ ) - { - if ( !(l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) ) - continue; - page = l1e_get_page(pl1e[i]); - l1e_write(&pl1e[i], l1e_empty()); - ASSERT_PAGE_IS_TYPE(page, PGT_seg_desc_page); - ASSERT_PAGE_IS_DOMAIN(page, v->domain); - put_page_and_type(page); - } - - /* Rid TLBs of stale mappings (guest mappings and shadow mappings). */ - if ( flush ) - flush_tlb_mask(v->vcpu_dirty_cpumask); - - out: - spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock); -} - - -static int alloc_segdesc_page(struct page_info *page) -{ - const struct domain *owner = page_get_owner(page); - struct desc_struct *descs = __map_domain_page(page); - unsigned i; - - for ( i = 0; i < 512; i++ ) - if ( unlikely(!check_descriptor(owner, &descs[i])) ) - break; - - unmap_domain_page(descs); - - return i == 512 ? 0 : -EINVAL; -} - - -/* Map shadow page at offset @off. 
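The TOGGLE_MODE() pair in guest_get_eff_kern_l1e() above is a "switch to the kernel page tables, do one access, switch back" bracket. Shape only, with invented stand-ins for toggle_guest_mode() and the TF_kernel_mode flag:

    #include <stdbool.h>
    #include <stdint.h>

    struct vcpu_sketch {
        bool kernel_mode;                 /* stand-in for TF_kernel_mode */
    };

    static void toggle_mode(struct vcpu_sketch *v)
    {
        v->kernel_mode = !v->kernel_mode; /* Xen also switches CR3 here */
    }

    /* One read through the kernel-mode tables, whatever mode is live. */
    static uint64_t read_in_kernel_mode(struct vcpu_sketch *v,
                                        uint64_t (*read_one)(void))
    {
        bool was_user = !v->kernel_mode;
        uint64_t val;

        if ( was_user )
            toggle_mode(v);
        val = read_one();
        if ( was_user )
            toggle_mode(v);               /* always restore the entry mode */

        return val;
    }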
*/ -int map_ldt_shadow_page(unsigned int off) -{ - struct vcpu *v = current; - struct domain *d = v->domain; - unsigned long gmfn; - struct page_info *page; - l1_pgentry_t l1e, nl1e; - unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT); - int okay; - - BUG_ON(unlikely(in_irq())); - - if ( is_pv_32bit_domain(d) ) - gva = (u32)gva; - guest_get_eff_kern_l1e(v, gva, &l1e); - if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) ) - return 0; - - gmfn = l1e_get_pfn(l1e); - page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); - if ( unlikely(!page) ) - return 0; - - okay = get_page_type(page, PGT_seg_desc_page); - if ( unlikely(!okay) ) - { - put_page(page); - return 0; - } - - nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW); - - spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock); - l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e); - v->arch.pv_vcpu.shadow_ldt_mapcnt++; - spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock); - - return 1; -} - - -static int get_page_from_pagenr(unsigned long page_nr, struct domain *d) +int get_page_from_pagenr(unsigned long page_nr, struct domain *d) { struct page_info *page = mfn_to_page(page_nr); @@ -712,11 +468,11 @@ static int get_page_from_pagenr(unsigned long page_nr, struct domain *d) } -static int get_page_and_type_from_pagenr(unsigned long page_nr, - unsigned long type, - struct domain *d, - int partial, - int preemptible) +int get_page_and_type_from_pagenr(unsigned long page_nr, + unsigned long type, + struct domain *d, + int partial, + int preemptible) { struct page_info *page = mfn_to_page(page_nr); int rc; @@ -736,72 +492,6 @@ static int get_page_and_type_from_pagenr(unsigned long page_nr, return rc; } -static void put_data_page( - struct page_info *page, int writeable) -{ - if ( writeable ) - put_page_and_type(page); - else - put_page(page); -} - -/* - * We allow root tables to map each other (a.k.a. linear page tables). It - * needs some special care with reference counts and access permissions: - * 1. The mapping entry must be read-only, or the guest may get write access - * to its own PTEs. - * 2. We must only bump the reference counts for an *already validated* - * L2 table, or we can end up in a deadlock in get_page_type() by waiting - * on a validation that is required to complete that validation. - * 3. We only need to increment the reference counts for the mapped page - * frame if it is mapped by a different root table. This is sufficient and - * also necessary to allow validation of a root table mapping itself. - */ -#define define_get_linear_pagetable(level) \ -static int \ -get_##level##_linear_pagetable( \ - level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d) \ -{ \ - unsigned long x, y; \ - struct page_info *page; \ - unsigned long pfn; \ - \ - if ( (level##e_get_flags(pde) & _PAGE_RW) ) \ - { \ - gdprintk(XENLOG_WARNING, \ - "Attempt to create linear p.t. with write perms\n"); \ - return 0; \ - } \ - \ - if ( (pfn = level##e_get_pfn(pde)) != pde_pfn ) \ - { \ - /* Make sure the mapped frame belongs to the correct domain. */ \ - if ( unlikely(!get_page_from_pagenr(pfn, d)) ) \ - return 0; \ - \ - /* \ - * Ensure that the mapped frame is an already-validated page table. \ - * If so, atomically increment the count (checking for overflow). 
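get_page_and_type_from_pagenr() above composes the two counters in the canonical order: general reference first, typed reference second, rolling back the first step if the second fails. A toy sketch of that ordering (the take/drop helpers here are trivial stand-ins, not Xen's failure-checking versions):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    struct pg_sketch {
        _Atomic uint32_t tot_count;
        _Atomic uint32_t type_count;  /* single-type toy model */
    };

    static bool take_general_ref(struct pg_sketch *p)
    { return atomic_fetch_add(&p->tot_count, 1) + 1 != 0; }

    static void drop_general_ref(struct pg_sketch *p)
    { atomic_fetch_sub(&p->tot_count, 1); }

    static bool take_type_ref(struct pg_sketch *p)
    { return atomic_fetch_add(&p->type_count, 1) + 1 != 0; }

    /* General ref first, typed ref second, rollback on failure. */
    static int get_page_and_type_sketch(struct pg_sketch *pg)
    {
        if ( !take_general_ref(pg) )
            return -1;
        if ( !take_type_ref(pg) )
        {
            drop_general_ref(pg);     /* undo step one */
            return -1;
        }
        return 0;                     /* caller now holds both references */
    }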
\ - */ \ - page = mfn_to_page(pfn); \ - y = page->u.inuse.type_info; \ - do { \ - x = y; \ - if ( unlikely((x & PGT_count_mask) == PGT_count_mask) || \ - unlikely((x & (PGT_type_mask|PGT_validated)) != \ - (PGT_##level##_page_table|PGT_validated)) ) \ - { \ - put_page(page); \ - return 0; \ - } \ - } \ - while ( (y = cmpxchg(&page->u.inuse.type_info, x, x + 1)) != x ); \ - } \ - \ - return 1; \ -} - - bool is_iomem_page(mfn_t mfn) { struct page_info *page; @@ -816,7 +506,7 @@ bool is_iomem_page(mfn_t mfn) return (page_get_owner(page) == dom_io); } -static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr) +int update_xen_mappings(unsigned long mfn, unsigned int cacheattr) { int err = 0; bool_t alias = mfn >= PFN_DOWN(xen_phys_start) && @@ -834,3414 +524,489 @@ static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr) return err; } -#ifndef NDEBUG -struct mmio_emul_range_ctxt { - const struct domain *d; - unsigned long mfn; -}; - -static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg) +bool_t fill_ro_mpt(unsigned long mfn) { - const struct mmio_emul_range_ctxt *ctxt = arg; - - if ( ctxt->mfn > e ) - return 0; + l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn)); + bool_t ret = 0; - if ( ctxt->mfn >= s ) + if ( !l4e_get_intpte(l4tab[l4_table_offset(RO_MPT_VIRT_START)]) ) { - static DEFINE_SPINLOCK(last_lock); - static const struct domain *last_d; - static unsigned long last_s = ~0UL, last_e; - bool_t print = 0; + l4tab[l4_table_offset(RO_MPT_VIRT_START)] = + idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]; + ret = 1; + } + unmap_domain_page(l4tab); - spin_lock(&last_lock); - if ( last_d != ctxt->d || last_s != s || last_e != e ) - { - last_d = ctxt->d; - last_s = s; - last_e = e; - print = 1; - } - spin_unlock(&last_lock); + return ret; +} - if ( print ) - printk(XENLOG_G_INFO - "d%d: Forcing write emulation on MFNs %lx-%lx\n", - ctxt->d->domain_id, s, e); - } +void zap_ro_mpt(unsigned long mfn) +{ + l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn)); - return 1; + l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty(); + unmap_domain_page(l4tab); } -#endif -int -get_page_from_l1e( - l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner) +int page_lock(struct page_info *page) { - unsigned long mfn = l1e_get_pfn(l1e); - struct page_info *page = mfn_to_page(mfn); - uint32_t l1f = l1e_get_flags(l1e); - struct vcpu *curr = current; - struct domain *real_pg_owner; - bool_t write; - - if ( !(l1f & _PAGE_PRESENT) ) - return 0; + unsigned long x, nx; - if ( unlikely(l1f & l1_disallow_mask(l1e_owner)) ) - { - gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n", - l1f & l1_disallow_mask(l1e_owner)); - return -EINVAL; - } + do { + while ( (x = page->u.inuse.type_info) & PGT_locked ) + cpu_relax(); + nx = x + (1 | PGT_locked); + if ( !(x & PGT_validated) || + !(x & PGT_count_mask) || + !(nx & PGT_count_mask) ) + return 0; + } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x ); - if ( !mfn_valid(_mfn(mfn)) || - (real_pg_owner = page_get_owner_and_reference(page)) == dom_io ) - { - int flip = 0; + return 1; +} - /* Only needed the reference to confirm dom_io ownership. */ - if ( mfn_valid(_mfn(mfn)) ) - put_page(page); +void page_unlock(struct page_info *page) +{ + unsigned long x, nx, y = page->u.inuse.type_info; - /* DOMID_IO reverts to caller for privilege checks. 
*/ - if ( pg_owner == dom_io ) - pg_owner = curr->domain; + do { + x = y; + nx = x - (1 | PGT_locked); + } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x ); +} - if ( !iomem_access_permitted(pg_owner, mfn, mfn) ) - { - if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */ - { - gdprintk(XENLOG_WARNING, - "d%d non-privileged attempt to map MMIO space %"PRI_mfn"\n", - pg_owner->domain_id, mfn); - return -EPERM; - } - return -EINVAL; - } +static int cleanup_page_cacheattr(struct page_info *page) +{ + unsigned int cacheattr = + (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base; - if ( pg_owner != l1e_owner && - !iomem_access_permitted(l1e_owner, mfn, mfn) ) - { - if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */ - { - gdprintk(XENLOG_WARNING, - "d%d attempted to map MMIO space %"PRI_mfn" in d%d to d%d\n", - curr->domain->domain_id, mfn, pg_owner->domain_id, - l1e_owner->domain_id); - return -EPERM; - } - return -EINVAL; - } + if ( likely(cacheattr == 0) ) + return 0; - if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) - { - /* MMIO pages must not be mapped cachable unless requested so. */ - switch ( opt_mmio_relax ) - { - case 0: - break; - case 1: - if ( !is_hardware_domain(l1e_owner) ) - break; - /* fallthrough */ - case -1: - return 0; - default: - ASSERT_UNREACHABLE(); - } - } - else if ( l1f & _PAGE_RW ) - { -#ifndef NDEBUG - const unsigned long *ro_map; - unsigned int seg, bdf; - - if ( !pci_mmcfg_decode(mfn, &seg, &bdf) || - ((ro_map = pci_get_ro_map(seg)) != NULL && - test_bit(bdf, ro_map)) ) - printk(XENLOG_G_WARNING - "d%d: Forcing read-only access to MFN %lx\n", - l1e_owner->domain_id, mfn); - else - rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL, - print_mmio_emul_range, - &(struct mmio_emul_range_ctxt){ - .d = l1e_owner, - .mfn = mfn }); -#endif - flip = _PAGE_RW; - } + page->count_info &= ~PGC_cacheattr_mask; - switch ( l1f & PAGE_CACHE_ATTRS ) - { - case 0: /* WB */ - flip |= _PAGE_PWT | _PAGE_PCD; - break; - case _PAGE_PWT: /* WT */ - case _PAGE_PWT | _PAGE_PAT: /* WP */ - flip |= _PAGE_PCD | (l1f & _PAGE_PAT); - break; - } + BUG_ON(is_xen_heap_page(page)); - return flip; - } + return update_xen_mappings(page_to_mfn(page), 0); +} - if ( unlikely( (real_pg_owner != pg_owner) && - (real_pg_owner != dom_cow) ) ) - { - /* - * Let privileged domains transfer the right to map their target - * domain's pages. This is used to allow stub-domain pvfb export to - * dom0, until pvfb supports granted mappings. At that time this - * minor hack can go away. - */ - if ( (real_pg_owner == NULL) || (pg_owner == l1e_owner) || - xsm_priv_mapping(XSM_TARGET, pg_owner, real_pg_owner) ) - { - gdprintk(XENLOG_WARNING, - "pg_owner d%d l1e_owner d%d, but real_pg_owner d%d\n", - pg_owner->domain_id, l1e_owner->domain_id, - real_pg_owner ? real_pg_owner->domain_id : -1); - goto could_not_pin; - } - pg_owner = real_pg_owner; - } +void put_page(struct page_info *page) +{ + unsigned long nx, x, y = page->count_info; - /* Extra paranoid check for shared memory. Writable mappings - * disallowed (unshare first!) */ - if ( (l1f & _PAGE_RW) && (real_pg_owner == dom_cow) ) - goto could_not_pin; - - /* Foreign mappings into guests in shadow external mode don't - * contribute to writeable mapping refcounts. (This allows the - * qemu-dm helper process in dom0 to map the domain's memory without - * messing up the count of "real" writable mappings.) 
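The cache-attribute switch in the MMIO path above turns cacheable requests into their uncacheable PAT relatives by XORing a "flip" mask into the PTE flags (via l1e_flip_flags). Stand-alone version of just that mapping, assuming Xen's default PAT layout:

    #include <stdint.h>

    #define F_PWT (1u << 3)   /* architectural bit positions */
    #define F_PCD (1u << 4)
    #define F_PAT (1u << 7)

    /*
     * The returned mask is XORed into the flags, so a set bit toggles.
     * WB becomes UC- via PWT|PCD; WT and WP become UC variants via PCD
     * (with the PAT bit toggled off where it was set); anything already
     * uncacheable is left alone.
     */
    static uint32_t mmio_cache_flip(uint32_t l1f)
    {
        switch ( l1f & (F_PWT | F_PCD | F_PAT) )
        {
        case 0:                /* WB requested */
            return F_PWT | F_PCD;
        case F_PWT:            /* WT requested */
        case F_PWT | F_PAT:    /* WP requested */
            return F_PCD | (l1f & F_PAT);
        default:               /* UC/UC- and friends: no change */
            return 0;
        }
    }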
*/ - write = (l1f & _PAGE_RW) && - ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)); - if ( write && !get_page_type(page, PGT_writable_page) ) - { - gdprintk(XENLOG_WARNING, "Could not get page type PGT_writable_page\n"); - goto could_not_pin; + do { + ASSERT((y & PGC_count_mask) != 0); + x = y; + nx = x - 1; } + while ( unlikely((y = cmpxchg(&page->count_info, x, nx)) != x) ); - if ( pte_flags_to_cacheattr(l1f) != - ((page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base) ) + if ( unlikely((nx & PGC_count_mask) == 0) ) { - unsigned long x, nx, y = page->count_info; - unsigned long cacheattr = pte_flags_to_cacheattr(l1f); - int err; - - if ( is_xen_heap_page(page) ) - { - if ( write ) - put_page_type(page); - put_page(page); + if ( cleanup_page_cacheattr(page) == 0 ) + free_domheap_page(page); + else gdprintk(XENLOG_WARNING, - "Attempt to change cache attributes of Xen heap page\n"); - return -EACCES; - } + "Leaking mfn %" PRI_pfn "\n", page_to_mfn(page)); + } +} - do { - x = y; - nx = (x & ~PGC_cacheattr_mask) | (cacheattr << PGC_cacheattr_base); - } while ( (y = cmpxchg(&page->count_info, x, nx)) != x ); - err = update_xen_mappings(mfn, cacheattr); - if ( unlikely(err) ) - { - cacheattr = y & PGC_cacheattr_mask; - do { - x = y; - nx = (x & ~PGC_cacheattr_mask) | cacheattr; - } while ( (y = cmpxchg(&page->count_info, x, nx)) != x ); - - if ( write ) - put_page_type(page); - put_page(page); +struct domain *page_get_owner_and_reference(struct page_info *page) +{ + unsigned long x, y = page->count_info; + struct domain *owner; - gdprintk(XENLOG_WARNING, "Error updating mappings for mfn %" PRI_mfn - " (pfn %" PRI_pfn ", from L1 entry %" PRIpte ") for d%d\n", - mfn, get_gpfn_from_mfn(mfn), - l1e_get_intpte(l1e), l1e_owner->domain_id); - return err; - } + do { + x = y; + /* + * Count == 0: Page is not allocated, so we cannot take a reference. + * Count == -1: Reference count would wrap, which is invalid. + * Count == -2: Remaining unused ref is reserved for get_page_light(). + */ + if ( unlikely(((x + 2) & PGC_count_mask) <= 2) ) + return NULL; } + while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x ); - return 0; + owner = page_get_owner(page); + ASSERT(owner); - could_not_pin: - gdprintk(XENLOG_WARNING, "Error getting mfn %" PRI_mfn " (pfn %" PRI_pfn - ") from L1 entry %" PRIpte " for l1e_owner d%d, pg_owner d%d", - mfn, get_gpfn_from_mfn(mfn), - l1e_get_intpte(l1e), l1e_owner->domain_id, pg_owner->domain_id); - if ( real_pg_owner != NULL ) - put_page(page); - return -EBUSY; + return owner; } -/* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. 
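page_get_owner_and_reference() above deliberately refuses the last usable count values so that get_page_light() can never fail: one reference is always held in reserve. The trick in isolation:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define CNT_MASK 0x00ffffffu

    /*
     * (x + 2) & mask <= 2 rejects exactly three values: 0 (page is
     * free), mask (count would wrap), and mask - 1 (the reserved slot).
     */
    static bool get_ref_sketch(_Atomic uint32_t *count)
    {
        uint32_t x = atomic_load(count);

        do {
            if ( ((x + 2) & CNT_MASK) <= 2 )
                return false;
        } while ( !atomic_compare_exchange_weak(count, &x, x + 1) );

        return true;
    }

    /* Thanks to the reserved slot, the light variant cannot overflow. */
    static void get_ref_light_sketch(_Atomic uint32_t *count)
    {
        atomic_fetch_add(count, 1);
    }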
*/ -define_get_linear_pagetable(l2); -static int -get_page_from_l2e( - l2_pgentry_t l2e, unsigned long pfn, struct domain *d) +int get_page(struct page_info *page, struct domain *domain) { - unsigned long mfn = l2e_get_pfn(l2e); - int rc; + struct domain *owner = page_get_owner_and_reference(page); - if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) ) + if ( likely(owner == domain) ) return 1; - if ( unlikely((l2e_get_flags(l2e) & L2_DISALLOW_MASK)) ) - { - gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n", - l2e_get_flags(l2e) & L2_DISALLOW_MASK); - return -EINVAL; - } - - if ( !(l2e_get_flags(l2e) & _PAGE_PSE) ) - { - rc = get_page_and_type_from_pagenr(mfn, PGT_l1_page_table, d, 0, 0); - if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) ) - rc = 0; - return rc; - } + if ( !paging_mode_refcounts(domain) && !domain->is_dying ) + gprintk(XENLOG_INFO, + "Error pfn %lx: rd=%d od=%d caf=%08lx taf=%" PRtype_info "\n", + page_to_mfn(page), domain->domain_id, + owner ? owner->domain_id : DOMID_INVALID, + page->count_info - !!owner, page->u.inuse.type_info); - if ( !opt_allow_superpage ) - { - gdprintk(XENLOG_WARNING, "PV superpages disabled in hypervisor\n"); - return -EINVAL; - } + if ( owner ) + put_page(page); - if ( mfn & (L1_PAGETABLE_ENTRIES-1) ) - { - gdprintk(XENLOG_WARNING, - "Unaligned superpage map attempt mfn %" PRI_mfn "\n", mfn); - return -EINVAL; - } - - return get_superpage(mfn, d); -} - - -define_get_linear_pagetable(l3); -static int -get_page_from_l3e( - l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial) -{ - int rc; - - if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) ) - return 1; - - if ( unlikely((l3e_get_flags(l3e) & l3_disallow_mask(d))) ) - { - gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n", - l3e_get_flags(l3e) & l3_disallow_mask(d)); - return -EINVAL; - } - - rc = get_page_and_type_from_pagenr( - l3e_get_pfn(l3e), PGT_l2_page_table, d, partial, 1); - if ( unlikely(rc == -EINVAL) && - !is_pv_32bit_domain(d) && - get_l3_linear_pagetable(l3e, pfn, d) ) - rc = 0; - - return rc; -} - -define_get_linear_pagetable(l4); -static int -get_page_from_l4e( - l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial) -{ - int rc; - - if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) ) - return 1; - - if ( unlikely((l4e_get_flags(l4e) & L4_DISALLOW_MASK)) ) - { - gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n", - l4e_get_flags(l4e) & L4_DISALLOW_MASK); - return -EINVAL; - } - - rc = get_page_and_type_from_pagenr( - l4e_get_pfn(l4e), PGT_l3_page_table, d, partial, 1); - if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) ) - rc = 0; - - return rc; -} - -#define adjust_guest_l1e(pl1e, d) \ - do { \ - if ( likely(l1e_get_flags((pl1e)) & _PAGE_PRESENT) && \ - likely(!is_pv_32bit_domain(d)) ) \ - { \ - /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. 
*/ \ - if ( (l1e_get_flags((pl1e)) & (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)) \ - == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL) ) \ - gdprintk(XENLOG_WARNING, \ - "Global bit is set to kernel page %lx\n", \ - l1e_get_pfn((pl1e))); \ - if ( !(l1e_get_flags((pl1e)) & _PAGE_USER) ) \ - l1e_add_flags((pl1e), (_PAGE_GUEST_KERNEL|_PAGE_USER)); \ - if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL) ) \ - l1e_add_flags((pl1e), (_PAGE_GLOBAL|_PAGE_USER)); \ - } \ - } while ( 0 ) - -#define adjust_guest_l2e(pl2e, d) \ - do { \ - if ( likely(l2e_get_flags((pl2e)) & _PAGE_PRESENT) && \ - likely(!is_pv_32bit_domain(d)) ) \ - l2e_add_flags((pl2e), _PAGE_USER); \ - } while ( 0 ) - -#define adjust_guest_l3e(pl3e, d) \ - do { \ - if ( likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) ) \ - l3e_add_flags((pl3e), likely(!is_pv_32bit_domain(d)) ? \ - _PAGE_USER : \ - _PAGE_USER|_PAGE_RW); \ - } while ( 0 ) - -#define adjust_guest_l4e(pl4e, d) \ - do { \ - if ( likely(l4e_get_flags((pl4e)) & _PAGE_PRESENT) && \ - likely(!is_pv_32bit_domain(d)) ) \ - l4e_add_flags((pl4e), _PAGE_USER); \ - } while ( 0 ) - -#define unadjust_guest_l3e(pl3e, d) \ - do { \ - if ( unlikely(is_pv_32bit_domain(d)) && \ - likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) ) \ - l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED); \ - } while ( 0 ) - -void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner) -{ - unsigned long pfn = l1e_get_pfn(l1e); - struct page_info *page; - struct domain *pg_owner; - struct vcpu *v; - - if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || is_iomem_page(_mfn(pfn)) ) - return; - - page = mfn_to_page(pfn); - pg_owner = page_get_owner(page); - - /* - * Check if this is a mapping that was established via a grant reference. - * If it was then we should not be here: we require that such mappings are - * explicitly destroyed via the grant-table interface. - * - * The upshot of this is that the guest can end up with active grants that - * it cannot destroy (because it no longer has a PTE to present to the - * grant-table interface). This can lead to subtle hard-to-catch bugs, - * hence a special grant PTE flag can be enabled to catch the bug early. - * - * (Note that the undestroyable active grants are not a security hole in - * Xen. All active grants can safely be cleaned up when the domain dies.) - */ - if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) && - !l1e_owner->is_shutting_down && !l1e_owner->is_dying ) - { - gdprintk(XENLOG_WARNING, - "Attempt to implicitly unmap a granted PTE %" PRIpte "\n", - l1e_get_intpte(l1e)); - domain_crash(l1e_owner); - } - - /* Remember we didn't take a type-count of foreign writable mappings - * to paging-external domains */ - if ( (l1e_get_flags(l1e) & _PAGE_RW) && - ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) ) - { - put_page_and_type(page); - } - else - { - /* We expect this is rare so we blow the entire shadow LDT. */ - if ( unlikely(((page->u.inuse.type_info & PGT_type_mask) == - PGT_seg_desc_page)) && - unlikely(((page->u.inuse.type_info & PGT_count_mask) != 0)) && - (l1e_owner == pg_owner) ) - { - for_each_vcpu ( pg_owner, v ) - invalidate_shadow_ldt(v, 1); - } - put_page(page); - } + return 0; } - /* - * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. - * Note also that this automatically deals correctly with linear p.t.'s. 
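The adjust_guest_l1e() family above exists because 64-bit PV guest kernels run in ring 3: every present guest mapping must gain _PAGE_USER, and the GLOBAL bit is reserved for mappings that are identical in the guest's kernel and user page tables and so may survive the switch between them. The flag fix-up as a plain function (bit values illustrative):

    #include <stdint.h>

    #define F_PRESENT      (1u << 0)
    #define F_USER         (1u << 2)
    #define F_GLOBAL       (1u << 8)
    #define F_GUEST_KERNEL (1u << 11)  /* a software-defined bit */

    static uint32_t adjust_guest_l1e_sketch(uint32_t f)
    {
        if ( !(f & F_PRESENT) )
            return f;

        if ( !(f & F_USER) )               /* guest kernel-only mapping */
            f |= F_GUEST_KERNEL | F_USER;  /* ring-3 kernel still needs USER */

        if ( !(f & F_GUEST_KERNEL) )       /* guest user mapping */
            f |= F_GLOBAL | F_USER;        /* shared by both pagetable sets */

        return f;
    }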
+ * Special version of get_page() to be used exclusively when + * - a page is known to already have a non-zero reference count + * - the page does not need its owner to be checked + * - it will not be called more than once without dropping the thus + * acquired reference again. + * Due to get_page() reserving one reference, this call cannot fail. */ -static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn) -{ - if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) ) - return 1; - - if ( l2e_get_flags(l2e) & _PAGE_PSE ) - put_superpage(l2e_get_pfn(l2e)); - else - put_page_and_type(l2e_get_page(l2e)); - - return 0; -} - -static int __put_page_type(struct page_info *, int preemptible); - -static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, - int partial, bool_t defer) -{ - struct page_info *pg; - - if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) ) - return 1; - - if ( unlikely(l3e_get_flags(l3e) & _PAGE_PSE) ) - { - unsigned long mfn = l3e_get_pfn(l3e); - int writeable = l3e_get_flags(l3e) & _PAGE_RW; - - ASSERT(!(mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1))); - do { - put_data_page(mfn_to_page(mfn), writeable); - } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) ); - - return 0; - } - - pg = l3e_get_page(l3e); - - if ( unlikely(partial > 0) ) - { - ASSERT(!defer); - return __put_page_type(pg, 1); - } - - if ( defer ) - { - current->arch.old_guest_table = pg; - return 0; - } - - return put_page_and_type_preemptible(pg); -} - -static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, - int partial, bool_t defer) -{ - if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) && - (l4e_get_pfn(l4e) != pfn) ) - { - struct page_info *pg = l4e_get_page(l4e); - - if ( unlikely(partial > 0) ) - { - ASSERT(!defer); - return __put_page_type(pg, 1); - } - - if ( defer ) - { - current->arch.old_guest_table = pg; - return 0; - } - - return put_page_and_type_preemptible(pg); - } - return 1; -} - -static int alloc_l1_table(struct page_info *page) +void get_page_light(struct page_info *page) { - struct domain *d = page_get_owner(page); - unsigned long pfn = page_to_mfn(page); - l1_pgentry_t *pl1e; - unsigned int i; - int ret = 0; - - pl1e = map_domain_page(_mfn(pfn)); - - for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ ) - { - if ( is_guest_l1_slot(i) ) - switch ( ret = get_page_from_l1e(pl1e[i], d, d) ) - { - default: - goto fail; - case 0: - break; - case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS: - ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS))); - l1e_flip_flags(pl1e[i], ret); - break; - } + unsigned long x, nx, y = page->count_info; - adjust_guest_l1e(pl1e[i], d); + do { + x = y; + nx = x + 1; + BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */ + BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */ + y = cmpxchg(&page->count_info, x, nx); } - - unmap_domain_page(pl1e); - return 0; - - fail: - gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: slot %#x\n", i); - while ( i-- > 0 ) - if ( is_guest_l1_slot(i) ) - put_page_from_l1e(pl1e[i], d); - - unmap_domain_page(pl1e); - return ret; + while ( unlikely(y != x) ); } -static int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e) +static int __put_final_page_type( + struct page_info *page, unsigned long type, int preemptible) { - struct page_info *page; - l3_pgentry_t l3e3; - - if ( !is_pv_32bit_domain(d) ) - return 1; - - pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK); - - /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. 
*/ - l3e3 = pl3e[3]; - if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) ) - { - gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n"); - return 0; - } + int rc = free_page_type(page, type, preemptible); - /* - * The Xen-private mappings include linear mappings. The L2 thus cannot - * be shared by multiple L3 tables. The test here is adequate because: - * 1. Cannot appear in slots != 3 because get_page_type() checks the - * PGT_pae_xen_l2 flag, which is asserted iff the L2 appears in slot 3 - * 2. Cannot appear in another page table's L3: - * a. alloc_l3_table() calls this function and this check will fail - * b. mod_l3_entry() disallows updates to slot 3 in an existing table - */ - page = l3e_get_page(l3e3); - BUG_ON(page->u.inuse.type_info & PGT_pinned); - BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0); - BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2)); - if ( (page->u.inuse.type_info & PGT_count_mask) != 1 ) + /* No need for atomic update of type_info here: noone else updates it. */ + if ( rc == 0 ) { - gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n"); - return 0; + /* + * Record TLB information for flush later. We do not stamp page tables + * when running in shadow mode: + * 1. Pointless, since it's the shadow pt's which must be tracked. + * 2. Shadow mode reuses this field for shadowed page tables to + * store flags info -- we don't want to conflict with that. + */ + if ( !(shadow_mode_enabled(page_get_owner(page)) && + (page->count_info & PGC_page_table)) ) + page->tlbflush_timestamp = tlbflush_current_time(); + wmb(); + page->u.inuse.type_info--; } - - return 1; -} - -static int alloc_l2_table(struct page_info *page, unsigned long type, - int preemptible) -{ - struct domain *d = page_get_owner(page); - unsigned long pfn = page_to_mfn(page); - l2_pgentry_t *pl2e; - unsigned int i; - int rc = 0; - - pl2e = map_domain_page(_mfn(pfn)); - - for ( i = page->nr_validated_ptes; i < L2_PAGETABLE_ENTRIES; i++ ) + else if ( rc == -EINTR ) { - if ( preemptible && i > page->nr_validated_ptes - && hypercall_preempt_check() ) - { - page->nr_validated_ptes = i; - rc = -ERESTART; - break; - } - - if ( !is_guest_l2_slot(d, type, i) || - (rc = get_page_from_l2e(pl2e[i], pfn, d)) > 0 ) - continue; - - if ( rc < 0 ) - { - gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i); - while ( i-- > 0 ) - if ( is_guest_l2_slot(d, type, i) ) - put_page_from_l2e(pl2e[i], pfn); - break; - } - - adjust_guest_l2e(pl2e[i], d); + ASSERT((page->u.inuse.type_info & + (PGT_count_mask|PGT_validated|PGT_partial)) == 1); + if ( !(shadow_mode_enabled(page_get_owner(page)) && + (page->count_info & PGC_page_table)) ) + page->tlbflush_timestamp = tlbflush_current_time(); + wmb(); + page->u.inuse.type_info |= PGT_validated; } - - if ( rc >= 0 && (type & PGT_pae_xen_l2) ) + else { - /* Xen private mappings. */ - memcpy(&pl2e[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)], - &compat_idle_pg_table_l2[ - l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)], - COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*pl2e)); + BUG_ON(rc != -ERESTART); + wmb(); + get_page_light(page); + page->u.inuse.type_info |= PGT_partial; } - unmap_domain_page(pl2e); - return rc > 0 ? 
0 : rc; + return rc; } -static int alloc_l3_table(struct page_info *page) +int __put_page_type(struct page_info *page, + int preemptible) { - struct domain *d = page_get_owner(page); - unsigned long pfn = page_to_mfn(page); - l3_pgentry_t *pl3e; - unsigned int i; - int rc = 0, partial = page->partial_pte; - - pl3e = map_domain_page(_mfn(pfn)); - - /* - * PAE guests allocate full pages, but aren't required to initialize - * more than the first four entries; when running in compatibility - * mode, however, the full page is visible to the MMU, and hence all - * 512 entries must be valid/verified, which is most easily achieved - * by clearing them out. - */ - if ( is_pv_32bit_domain(d) ) - memset(pl3e + 4, 0, (L3_PAGETABLE_ENTRIES - 4) * sizeof(*pl3e)); + unsigned long nx, x, y = page->u.inuse.type_info; + int rc = 0; - for ( i = page->nr_validated_ptes; i < L3_PAGETABLE_ENTRIES; - i++, partial = 0 ) + for ( ; ; ) { - if ( is_pv_32bit_domain(d) && (i == 3) ) - { - if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) || - (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) ) - rc = -EINVAL; - else - rc = get_page_and_type_from_pagenr(l3e_get_pfn(pl3e[i]), - PGT_l2_page_table | - PGT_pae_xen_l2, - d, partial, 1); - } - else if ( !is_guest_l3_slot(i) || - (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 ) - continue; - - if ( rc == -ERESTART ) - { - page->nr_validated_ptes = i; - page->partial_pte = partial ?: 1; - } - else if ( rc == -EINTR && i ) - { - page->nr_validated_ptes = i; - page->partial_pte = 0; - rc = -ERESTART; - } - if ( rc < 0 ) - break; + x = y; + nx = x - 1; - adjust_guest_l3e(pl3e[i], d); - } + ASSERT((x & PGT_count_mask) != 0); - if ( rc >= 0 && !create_pae_xen_mappings(d, pl3e) ) - rc = -EINVAL; - if ( rc < 0 && rc != -ERESTART && rc != -EINTR ) - { - gdprintk(XENLOG_WARNING, "Failure in alloc_l3_table: slot %#x\n", i); - if ( i ) - { - page->nr_validated_ptes = i; - page->partial_pte = 0; - current->arch.old_guest_table = page; - } - while ( i-- > 0 ) + if ( unlikely((nx & PGT_count_mask) == 0) ) { - if ( !is_guest_l3_slot(i) ) - continue; - unadjust_guest_l3e(pl3e[i], d); - } - } - - unmap_domain_page(pl3e); - return rc > 0 ? 0 : rc; -} - -void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d, - bool_t zap_ro_mpt) -{ - /* Xen private mappings. 
*/ - memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT], - &idle_pg_table[ROOT_PAGETABLE_FIRST_XEN_SLOT], - root_pgt_pv_xen_slots * sizeof(l4_pgentry_t)); -#ifndef NDEBUG - if ( l4e_get_intpte(split_l4e) ) - l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT + root_pgt_pv_xen_slots] = - split_l4e; -#endif - l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] = - l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR); - l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] = - l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR); - if ( zap_ro_mpt || is_pv_32bit_domain(d) || paging_mode_refcounts(d) ) - l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty(); -} - -bool_t fill_ro_mpt(unsigned long mfn) -{ - l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn)); - bool_t ret = 0; - - if ( !l4e_get_intpte(l4tab[l4_table_offset(RO_MPT_VIRT_START)]) ) - { - l4tab[l4_table_offset(RO_MPT_VIRT_START)] = - idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]; - ret = 1; - } - unmap_domain_page(l4tab); - - return ret; -} - -void zap_ro_mpt(unsigned long mfn) -{ - l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn)); - - l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty(); - unmap_domain_page(l4tab); -} - -static int alloc_l4_table(struct page_info *page) -{ - struct domain *d = page_get_owner(page); - unsigned long pfn = page_to_mfn(page); - l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn)); - unsigned int i; - int rc = 0, partial = page->partial_pte; - - for ( i = page->nr_validated_ptes; i < L4_PAGETABLE_ENTRIES; - i++, partial = 0 ) - { - if ( !is_guest_l4_slot(d, i) || - (rc = get_page_from_l4e(pl4e[i], pfn, d, partial)) > 0 ) - continue; - - if ( rc == -ERESTART ) - { - page->nr_validated_ptes = i; - page->partial_pte = partial ?: 1; - } - else if ( rc < 0 ) - { - if ( rc != -EINTR ) - gdprintk(XENLOG_WARNING, - "Failure in alloc_l4_table: slot %#x\n", i); - if ( i ) - { - page->nr_validated_ptes = i; - page->partial_pte = 0; - if ( rc == -EINTR ) - rc = -ERESTART; - else - { - if ( current->arch.old_guest_table ) - page->nr_validated_ptes++; - current->arch.old_guest_table = page; - } - } - } - if ( rc < 0 ) - { - unmap_domain_page(pl4e); - return rc; - } - - adjust_guest_l4e(pl4e[i], d); - } - - if ( rc >= 0 ) - { - init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict)); - atomic_inc(&d->arch.pv_domain.nr_l4_pages); - rc = 0; - } - unmap_domain_page(pl4e); - - return rc; -} - -static void free_l1_table(struct page_info *page) -{ - struct domain *d = page_get_owner(page); - unsigned long pfn = page_to_mfn(page); - l1_pgentry_t *pl1e; - unsigned int i; - - pl1e = map_domain_page(_mfn(pfn)); - - for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ ) - if ( is_guest_l1_slot(i) ) - put_page_from_l1e(pl1e[i], d); - - unmap_domain_page(pl1e); -} - - -static int free_l2_table(struct page_info *page, int preemptible) -{ - struct domain *d = page_get_owner(page); - unsigned long pfn = page_to_mfn(page); - l2_pgentry_t *pl2e; - unsigned int i = page->nr_validated_ptes - 1; - int err = 0; - - pl2e = map_domain_page(_mfn(pfn)); - - ASSERT(page->nr_validated_ptes); - do { - if ( is_guest_l2_slot(d, page->u.inuse.type_info, i) && - put_page_from_l2e(pl2e[i], pfn) == 0 && - preemptible && i && hypercall_preempt_check() ) - { - page->nr_validated_ptes = i; - err = -ERESTART; - } - } while ( !err && i-- ); - - unmap_domain_page(pl2e); - - if ( !err ) - page->u.inuse.type_info &= ~PGT_pae_xen_l2; - - return err; -} - -static int free_l3_table(struct page_info *page) -{ - struct domain *d = page_get_owner(page); - unsigned long pfn = page_to_mfn(page); - 
l3_pgentry_t *pl3e; - int rc = 0, partial = page->partial_pte; - unsigned int i = page->nr_validated_ptes - !partial; - - pl3e = map_domain_page(_mfn(pfn)); - - do { - if ( is_guest_l3_slot(i) ) - { - rc = put_page_from_l3e(pl3e[i], pfn, partial, 0); - if ( rc < 0 ) - break; - partial = 0; - if ( rc > 0 ) - continue; - unadjust_guest_l3e(pl3e[i], d); - } - } while ( i-- ); - - unmap_domain_page(pl3e); - - if ( rc == -ERESTART ) - { - page->nr_validated_ptes = i; - page->partial_pte = partial ?: -1; - } - else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 ) - { - page->nr_validated_ptes = i + 1; - page->partial_pte = 0; - rc = -ERESTART; - } - return rc > 0 ? 0 : rc; -} - -static int free_l4_table(struct page_info *page) -{ - struct domain *d = page_get_owner(page); - unsigned long pfn = page_to_mfn(page); - l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn)); - int rc = 0, partial = page->partial_pte; - unsigned int i = page->nr_validated_ptes - !partial; - - do { - if ( is_guest_l4_slot(d, i) ) - rc = put_page_from_l4e(pl4e[i], pfn, partial, 0); - if ( rc < 0 ) - break; - partial = 0; - } while ( i-- ); - - if ( rc == -ERESTART ) - { - page->nr_validated_ptes = i; - page->partial_pte = partial ?: -1; - } - else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 ) - { - page->nr_validated_ptes = i + 1; - page->partial_pte = 0; - rc = -ERESTART; - } - - unmap_domain_page(pl4e); - - if ( rc >= 0 ) - { - atomic_dec(&d->arch.pv_domain.nr_l4_pages); - rc = 0; - } - - return rc; -} - -int page_lock(struct page_info *page) -{ - unsigned long x, nx; - - do { - while ( (x = page->u.inuse.type_info) & PGT_locked ) - cpu_relax(); - nx = x + (1 | PGT_locked); - if ( !(x & PGT_validated) || - !(x & PGT_count_mask) || - !(nx & PGT_count_mask) ) - return 0; - } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x ); - - return 1; -} - -void page_unlock(struct page_info *page) -{ - unsigned long x, nx, y = page->u.inuse.type_info; - - do { - x = y; - nx = x - (1 | PGT_locked); - } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x ); -} - -/* How to write an entry to the guest pagetables. - * Returns 0 for failure (pointer not valid), 1 for success. */ -static inline int update_intpte(intpte_t *p, - intpte_t old, - intpte_t new, - unsigned long mfn, - struct vcpu *v, - int preserve_ad) -{ - int rv = 1; -#ifndef PTE_UPDATE_WITH_CMPXCHG - if ( !preserve_ad ) - { - rv = paging_write_guest_entry(v, p, new, _mfn(mfn)); - } - else -#endif - { - intpte_t t = old; - for ( ; ; ) - { - intpte_t _new = new; - if ( preserve_ad ) - _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY); - - rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn)); - if ( unlikely(rv == 0) ) - { - gdprintk(XENLOG_WARNING, - "Failed to update %" PRIpte " -> %" PRIpte - ": saw %" PRIpte "\n", old, _new, t); - break; - } - - if ( t == old ) - break; - - /* Allowed to change in Accessed/Dirty flags only. */ - BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY)); - - old = t; - } - } - return rv; -} - -/* Macro that wraps the appropriate type-changes around update_intpte(). - * Arguments are: type, ptr, old, new, mfn, vcpu */ -#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad) \ - update_intpte(&_t ## e_get_intpte(*(_p)), \ - _t ## e_get_intpte(_o), _t ## e_get_intpte(_n), \ - (_m), (_v), (_ad)) - -/* - * PTE flags that a guest may change without re-validating the PTE. - * All other bits affect translation, caching, or Xen's safety. 
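update_intpte() above must not lose Accessed/Dirty bits that the hardware may set concurrently, so its cmpxchg loop folds the freshly observed A/D bits into the new value and retries. Self-contained version of that loop:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define F_ACCESSED (1ull << 5)
    #define F_DIRTY    (1ull << 6)

    static bool update_pte_preserve_ad(_Atomic uint64_t *pte,
                                       uint64_t seen, uint64_t new_val)
    {
        for ( ; ; )
        {
            uint64_t expect = seen;
            uint64_t want = new_val | (seen & (F_ACCESSED | F_DIRTY));

            if ( atomic_compare_exchange_strong(pte, &expect, want) )
                return true;

            /* Only A/D may legitimately change underneath us. */
            if ( (expect ^ seen) & ~(F_ACCESSED | F_DIRTY) )
                return false;             /* the real code BUG()s here */

            seen = expect;                /* retry with the new A/D bits */
        }
    }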
- */ -#define FASTPATH_FLAG_WHITELIST \ - (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \ - _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER) - -/* Update the L1 entry at pl1e to new value nl1e. */ -static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e, - unsigned long gl1mfn, int preserve_ad, - struct vcpu *pt_vcpu, struct domain *pg_dom) -{ - l1_pgentry_t ol1e; - struct domain *pt_dom = pt_vcpu->domain; - int rc = 0; - - if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) ) - return -EFAULT; - - if ( unlikely(paging_mode_refcounts(pt_dom)) ) - { - if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) ) - return 0; - return -EBUSY; - } - - if ( l1e_get_flags(nl1e) & _PAGE_PRESENT ) - { - /* Translate foreign guest addresses. */ - struct page_info *page = NULL; - - if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) ) - { - gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n", - l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)); - return -EINVAL; - } - - if ( paging_mode_translate(pg_dom) ) - { - page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC); - if ( !page ) - return -EINVAL; - nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e)); - } - - /* Fast path for sufficiently-similar mappings. */ - if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) ) - { - adjust_guest_l1e(nl1e, pt_dom); - rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, - preserve_ad); - if ( page ) - put_page(page); - return rc ? 0 : -EBUSY; - } - - switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) ) - { - default: - if ( page ) - put_page(page); - return rc; - case 0: - break; - case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS: - ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS))); - l1e_flip_flags(nl1e, rc); - rc = 0; - break; - } - if ( page ) - put_page(page); - - adjust_guest_l1e(nl1e, pt_dom); - if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, - preserve_ad)) ) - { - ol1e = nl1e; - rc = -EBUSY; - } - } - else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, - preserve_ad)) ) - { - return -EBUSY; - } - - put_page_from_l1e(ol1e, pt_dom); - return rc; -} - - -/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */ -static int mod_l2_entry(l2_pgentry_t *pl2e, - l2_pgentry_t nl2e, - unsigned long pfn, - int preserve_ad, - struct vcpu *vcpu) -{ - l2_pgentry_t ol2e; - struct domain *d = vcpu->domain; - struct page_info *l2pg = mfn_to_page(pfn); - unsigned long type = l2pg->u.inuse.type_info; - int rc = 0; - - if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) ) - { - gdprintk(XENLOG_WARNING, "L2 update in Xen-private area, slot %#lx\n", - pgentry_ptr_to_slot(pl2e)); - return -EPERM; - } - - if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) ) - return -EFAULT; - - if ( l2e_get_flags(nl2e) & _PAGE_PRESENT ) - { - if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) ) - { - gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n", - l2e_get_flags(nl2e) & L2_DISALLOW_MASK); - return -EINVAL; - } - - /* Fast path for sufficiently-similar mappings. 
*/ - if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) ) - { - adjust_guest_l2e(nl2e, d); - if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) ) - return 0; - return -EBUSY; - } - - if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) ) - return rc; - - adjust_guest_l2e(nl2e, d); - if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, - preserve_ad)) ) - { - ol2e = nl2e; - rc = -EBUSY; - } - } - else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, - preserve_ad)) ) - { - return -EBUSY; - } - - put_page_from_l2e(ol2e, pfn); - return rc; -} - -/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */ -static int mod_l3_entry(l3_pgentry_t *pl3e, - l3_pgentry_t nl3e, - unsigned long pfn, - int preserve_ad, - struct vcpu *vcpu) -{ - l3_pgentry_t ol3e; - struct domain *d = vcpu->domain; - int rc = 0; - - if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) ) - { - gdprintk(XENLOG_WARNING, "L3 update in Xen-private area, slot %#lx\n", - pgentry_ptr_to_slot(pl3e)); - return -EINVAL; - } - - /* - * Disallow updates to final L3 slot. It contains Xen mappings, and it - * would be a pain to ensure they remain continuously valid throughout. - */ - if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) ) - return -EINVAL; - - if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) ) - return -EFAULT; - - if ( l3e_get_flags(nl3e) & _PAGE_PRESENT ) - { - if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) ) - { - gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n", - l3e_get_flags(nl3e) & l3_disallow_mask(d)); - return -EINVAL; - } - - /* Fast path for sufficiently-similar mappings. */ - if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) ) - { - adjust_guest_l3e(nl3e, d); - rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad); - return rc ? 0 : -EFAULT; - } - - rc = get_page_from_l3e(nl3e, pfn, d, 0); - if ( unlikely(rc < 0) ) - return rc; - rc = 0; - - adjust_guest_l3e(nl3e, d); - if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, - preserve_ad)) ) - { - ol3e = nl3e; - rc = -EFAULT; - } - } - else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, - preserve_ad)) ) - { - return -EFAULT; - } - - if ( likely(rc == 0) ) - if ( !create_pae_xen_mappings(d, pl3e) ) - BUG(); - - put_page_from_l3e(ol3e, pfn, 0, 1); - return rc; -} - -/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */ -static int mod_l4_entry(l4_pgentry_t *pl4e, - l4_pgentry_t nl4e, - unsigned long pfn, - int preserve_ad, - struct vcpu *vcpu) -{ - struct domain *d = vcpu->domain; - l4_pgentry_t ol4e; - int rc = 0; - - if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) ) - { - gdprintk(XENLOG_WARNING, "L4 update in Xen-private area, slot %#lx\n", - pgentry_ptr_to_slot(pl4e)); - return -EINVAL; - } - - if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) ) - return -EFAULT; - - if ( l4e_get_flags(nl4e) & _PAGE_PRESENT ) - { - if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) ) - { - gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n", - l4e_get_flags(nl4e) & L4_DISALLOW_MASK); - return -EINVAL; - } - - /* Fast path for sufficiently-similar mappings. */ - if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) ) - { - adjust_guest_l4e(nl4e, d); - rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad); - return rc ? 
0 : -EFAULT; - } - - rc = get_page_from_l4e(nl4e, pfn, d, 0); - if ( unlikely(rc < 0) ) - return rc; - rc = 0; - - adjust_guest_l4e(nl4e, d); - if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, - preserve_ad)) ) - { - ol4e = nl4e; - rc = -EFAULT; - } - } - else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, - preserve_ad)) ) - { - return -EFAULT; - } - - put_page_from_l4e(ol4e, pfn, 0, 1); - return rc; -} - -static int cleanup_page_cacheattr(struct page_info *page) -{ - unsigned int cacheattr = - (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base; - - if ( likely(cacheattr == 0) ) - return 0; - - page->count_info &= ~PGC_cacheattr_mask; - - BUG_ON(is_xen_heap_page(page)); - - return update_xen_mappings(page_to_mfn(page), 0); -} - -void put_page(struct page_info *page) -{ - unsigned long nx, x, y = page->count_info; - - do { - ASSERT((y & PGC_count_mask) != 0); - x = y; - nx = x - 1; - } - while ( unlikely((y = cmpxchg(&page->count_info, x, nx)) != x) ); - - if ( unlikely((nx & PGC_count_mask) == 0) ) - { - if ( cleanup_page_cacheattr(page) == 0 ) - free_domheap_page(page); - else - gdprintk(XENLOG_WARNING, - "Leaking mfn %" PRI_pfn "\n", page_to_mfn(page)); - } -} - - -struct domain *page_get_owner_and_reference(struct page_info *page) -{ - unsigned long x, y = page->count_info; - struct domain *owner; - - do { - x = y; - /* - * Count == 0: Page is not allocated, so we cannot take a reference. - * Count == -1: Reference count would wrap, which is invalid. - * Count == -2: Remaining unused ref is reserved for get_page_light(). - */ - if ( unlikely(((x + 2) & PGC_count_mask) <= 2) ) - return NULL; - } - while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x ); - - owner = page_get_owner(page); - ASSERT(owner); - - return owner; -} - - -int get_page(struct page_info *page, struct domain *domain) -{ - struct domain *owner = page_get_owner_and_reference(page); - - if ( likely(owner == domain) ) - return 1; - - if ( !paging_mode_refcounts(domain) && !domain->is_dying ) - gprintk(XENLOG_INFO, - "Error pfn %lx: rd=%d od=%d caf=%08lx taf=%" PRtype_info "\n", - page_to_mfn(page), domain->domain_id, - owner ? owner->domain_id : DOMID_INVALID, - page->count_info - !!owner, page->u.inuse.type_info); - - if ( owner ) - put_page(page); - - return 0; -} - -/* - * Special version of get_page() to be used exclusively when - * - a page is known to already have a non-zero reference count - * - the page does not need its owner to be checked - * - it will not be called more than once without dropping the thus - * acquired reference again. - * Due to get_page() reserving one reference, this call cannot fail. - */ -static void get_page_light(struct page_info *page) -{ - unsigned long x, nx, y = page->count_info; - - do { - x = y; - nx = x + 1; - BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */ - BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */ - y = cmpxchg(&page->count_info, x, nx); - } - while ( unlikely(y != x) ); -} - -static int alloc_page_type(struct page_info *page, unsigned long type, - int preemptible) -{ - struct domain *owner = page_get_owner(page); - int rc; - - /* A page table is dirtied when its type count becomes non-zero. 
*/ - if ( likely(owner != NULL) ) - paging_mark_dirty(owner, _mfn(page_to_mfn(page))); - - switch ( type & PGT_type_mask ) - { - case PGT_l1_page_table: - rc = alloc_l1_table(page); - break; - case PGT_l2_page_table: - rc = alloc_l2_table(page, type, preemptible); - break; - case PGT_l3_page_table: - ASSERT(preemptible); - rc = alloc_l3_table(page); - break; - case PGT_l4_page_table: - ASSERT(preemptible); - rc = alloc_l4_table(page); - break; - case PGT_seg_desc_page: - rc = alloc_segdesc_page(page); - break; - default: - printk("Bad type in alloc_page_type %lx t=%" PRtype_info " c=%lx\n", - type, page->u.inuse.type_info, - page->count_info); - rc = -EINVAL; - BUG(); - } - - /* No need for atomic update of type_info here: noone else updates it. */ - wmb(); - switch ( rc ) - { - case 0: - page->u.inuse.type_info |= PGT_validated; - break; - case -EINTR: - ASSERT((page->u.inuse.type_info & - (PGT_count_mask|PGT_validated|PGT_partial)) == 1); - page->u.inuse.type_info &= ~PGT_count_mask; - break; - default: - ASSERT(rc < 0); - gdprintk(XENLOG_WARNING, "Error while validating mfn %" PRI_mfn - " (pfn %" PRI_pfn ") for type %" PRtype_info - ": caf=%08lx taf=%" PRtype_info "\n", - page_to_mfn(page), get_gpfn_from_mfn(page_to_mfn(page)), - type, page->count_info, page->u.inuse.type_info); - if ( page != current->arch.old_guest_table ) - page->u.inuse.type_info = 0; - else - { - ASSERT((page->u.inuse.type_info & - (PGT_count_mask | PGT_validated)) == 1); - case -ERESTART: - get_page_light(page); - page->u.inuse.type_info |= PGT_partial; - } - break; - } - - return rc; -} - - -int free_page_type(struct page_info *page, unsigned long type, - int preemptible) -{ - struct domain *owner = page_get_owner(page); - unsigned long gmfn; - int rc; - - if ( likely(owner != NULL) && unlikely(paging_mode_enabled(owner)) ) - { - /* A page table is dirtied when its type count becomes zero. */ - paging_mark_dirty(owner, _mfn(page_to_mfn(page))); - - if ( shadow_mode_refcounts(owner) ) - return 0; - - gmfn = mfn_to_gmfn(owner, page_to_mfn(page)); - ASSERT(VALID_M2P(gmfn)); - /* Page sharing not supported for shadowed domains */ - if(!SHARED_M2P(gmfn)) - shadow_remove_all_shadows(owner, _mfn(gmfn)); - } - - if ( !(type & PGT_partial) ) - { - page->nr_validated_ptes = 1U << PAGETABLE_ORDER; - page->partial_pte = 0; - } - - switch ( type & PGT_type_mask ) - { - case PGT_l1_page_table: - free_l1_table(page); - rc = 0; - break; - case PGT_l2_page_table: - rc = free_l2_table(page, preemptible); - break; - case PGT_l3_page_table: - ASSERT(preemptible); - rc = free_l3_table(page); - break; - case PGT_l4_page_table: - ASSERT(preemptible); - rc = free_l4_table(page); - break; - default: - gdprintk(XENLOG_WARNING, "type %" PRtype_info " mfn %" PRI_mfn "\n", - type, page_to_mfn(page)); - rc = -EINVAL; - BUG(); - } - - return rc; -} - - -static int __put_final_page_type( - struct page_info *page, unsigned long type, int preemptible) -{ - int rc = free_page_type(page, type, preemptible); - - /* No need for atomic update of type_info here: noone else updates it. */ - if ( rc == 0 ) - { - /* - * Record TLB information for flush later. We do not stamp page tables - * when running in shadow mode: - * 1. Pointless, since it's the shadow pt's which must be tracked. - * 2. Shadow mode reuses this field for shadowed page tables to - * store flags info -- we don't want to conflict with that. 
- */ - if ( !(shadow_mode_enabled(page_get_owner(page)) && - (page->count_info & PGC_page_table)) ) - page->tlbflush_timestamp = tlbflush_current_time(); - wmb(); - page->u.inuse.type_info--; - } - else if ( rc == -EINTR ) - { - ASSERT((page->u.inuse.type_info & - (PGT_count_mask|PGT_validated|PGT_partial)) == 1); - if ( !(shadow_mode_enabled(page_get_owner(page)) && - (page->count_info & PGC_page_table)) ) - page->tlbflush_timestamp = tlbflush_current_time(); - wmb(); - page->u.inuse.type_info |= PGT_validated; - } - else - { - BUG_ON(rc != -ERESTART); - wmb(); - get_page_light(page); - page->u.inuse.type_info |= PGT_partial; - } - - return rc; -} - - -static int __put_page_type(struct page_info *page, - int preemptible) -{ - unsigned long nx, x, y = page->u.inuse.type_info; - int rc = 0; - - for ( ; ; ) - { - x = y; - nx = x - 1; - - ASSERT((x & PGT_count_mask) != 0); - - if ( unlikely((nx & PGT_count_mask) == 0) ) - { - if ( unlikely((nx & PGT_type_mask) <= PGT_l4_page_table) && - likely(nx & (PGT_validated|PGT_partial)) ) - { - /* - * Page-table pages must be unvalidated when count is zero. The - * 'free' is safe because the refcnt is non-zero and validated - * bit is clear => other ops will spin or fail. - */ - nx = x & ~(PGT_validated|PGT_partial); - if ( unlikely((y = cmpxchg(&page->u.inuse.type_info, - x, nx)) != x) ) - continue; - /* We cleared the 'valid bit' so we do the clean up. */ - rc = __put_final_page_type(page, x, preemptible); - if ( x & PGT_partial ) - put_page(page); - break; - } - - /* - * Record TLB information for flush later. We do not stamp page - * tables when running in shadow mode: - * 1. Pointless, since it's the shadow pt's which must be tracked. - * 2. Shadow mode reuses this field for shadowed page tables to - * store flags info -- we don't want to conflict with that. - */ - if ( !(shadow_mode_enabled(page_get_owner(page)) && - (page->count_info & PGC_page_table)) ) - page->tlbflush_timestamp = tlbflush_current_time(); - } - - if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) ) - break; - - if ( preemptible && hypercall_preempt_check() ) - return -EINTR; - } - - return rc; -} - - -static int __get_page_type(struct page_info *page, unsigned long type, - int preemptible) -{ - unsigned long nx, x, y = page->u.inuse.type_info; - int rc = 0, iommu_ret = 0; - - ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2))); - ASSERT(!in_irq()); - - for ( ; ; ) - { - x = y; - nx = x + 1; - if ( unlikely((nx & PGT_count_mask) == 0) ) - { - gdprintk(XENLOG_WARNING, - "Type count overflow on mfn %"PRI_mfn"\n", - page_to_mfn(page)); - return -EINVAL; - } - else if ( unlikely((x & PGT_count_mask) == 0) ) - { - struct domain *d = page_get_owner(page); - - /* Normally we should never let a page go from type count 0 - * to type count 1 when it is shadowed. One exception: - * out-of-sync shadowed pages are allowed to become - * writeable. */ - if ( d && shadow_mode_enabled(d) - && (page->count_info & PGC_page_table) - && !((page->shadow_flags & (1u<<29)) - && type == PGT_writable_page) ) - shadow_remove_all_shadows(d, _mfn(page_to_mfn(page))); - - ASSERT(!(x & PGT_pae_xen_l2)); - if ( (x & PGT_type_mask) != type ) - { - /* - * On type change we check to flush stale TLB entries. This - * may be unnecessary (e.g., page was GDT/LDT) but those - * circumstances should be very rare. 
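The tlbflush_timestamp stamping above and the tlbflush_filter() call just below implement one idea: a page is stamped when it is freed or changes type, and a CPU can be dropped from the flush mask if it has done a full flush since that stamp. A hedged sketch of the scheme, with illustrative names rather than Xen's:

    #define NR_CPUS 8

    static unsigned int tlbflush_clock = 1;        /* global flush epoch  */
    static unsigned int cpu_flush_time[NR_CPUS];   /* last flush, per CPU */

    /* stamp taken when a page is freed or changes type
       (page->tlbflush_timestamp analogue) */
    static unsigned int stamp_now(void)
    {
        return tlbflush_clock;
    }

    /* a CPU may be filtered out of the flush mask only if it has
       flushed since the page was stamped */
    static int cpu_needs_flush(unsigned int cpu, unsigned int page_stamp)
    {
        return cpu_flush_time[cpu] <= page_stamp;
    }
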
- */ - cpumask_t *mask = this_cpu(scratch_cpumask); - - BUG_ON(in_irq()); - cpumask_copy(mask, d->domain_dirty_cpumask); - - /* Don't flush if the timestamp is old enough */ - tlbflush_filter(mask, page->tlbflush_timestamp); - - if ( unlikely(!cpumask_empty(mask)) && - /* Shadow mode: track only writable pages. */ - (!shadow_mode_enabled(page_get_owner(page)) || - ((nx & PGT_type_mask) == PGT_writable_page)) ) - { - perfc_incr(need_flush_tlb_flush); - flush_tlb_mask(mask); - } - - /* We lose existing type and validity. */ - nx &= ~(PGT_type_mask | PGT_validated); - nx |= type; - - /* No special validation needed for writable pages. */ - /* Page tables and GDT/LDT need to be scanned for validity. */ - if ( type == PGT_writable_page || type == PGT_shared_page ) - nx |= PGT_validated; - } - } - else if ( unlikely((x & (PGT_type_mask|PGT_pae_xen_l2)) != type) ) - { - /* Don't log failure if it could be a recursive-mapping attempt. */ - if ( ((x & PGT_type_mask) == PGT_l2_page_table) && - (type == PGT_l1_page_table) ) - return -EINVAL; - if ( ((x & PGT_type_mask) == PGT_l3_page_table) && - (type == PGT_l2_page_table) ) - return -EINVAL; - if ( ((x & PGT_type_mask) == PGT_l4_page_table) && - (type == PGT_l3_page_table) ) - return -EINVAL; - gdprintk(XENLOG_WARNING, - "Bad type (saw %" PRtype_info " != exp %" PRtype_info ") " - "for mfn %" PRI_mfn " (pfn %" PRI_pfn ")\n", - x, type, page_to_mfn(page), - get_gpfn_from_mfn(page_to_mfn(page))); - return -EINVAL; - } - else if ( unlikely(!(x & PGT_validated)) ) - { - if ( !(x & PGT_partial) ) - { - /* Someone else is updating validation of this page. Wait... */ - while ( (y = page->u.inuse.type_info) == x ) - { - if ( preemptible && hypercall_preempt_check() ) - return -EINTR; - cpu_relax(); - } - continue; - } - /* Type ref count was left at 1 when PGT_partial got set. */ - ASSERT((x & PGT_count_mask) == 1); - nx = x & ~PGT_partial; - } - - if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) ) - break; - - if ( preemptible && hypercall_preempt_check() ) - return -EINTR; - } - - if ( unlikely((x & PGT_type_mask) != type) ) - { - /* Special pages should not be accessible from devices. 
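The IOMMU handling further down reduces to a simple rule: when a page stops being plain-writable (e.g. it becomes a page table) it must disappear from device address spaces, and when it becomes plain-writable again the device mapping is restored. A sketch of that rule, with dev_map()/dev_unmap() as hypothetical one-argument stand-ins for iommu_map_page()/iommu_unmap_page():

    enum pgtype { TYPE_WRITABLE, TYPE_PAGE_TABLE };

    static void dev_unmap(unsigned long mfn) { (void)mfn; }
    static void dev_map(unsigned long mfn)   { (void)mfn; }

    static void sync_iommu(unsigned long mfn, enum pgtype old, enum pgtype new)
    {
        if ( old == TYPE_WRITABLE && new != TYPE_WRITABLE )
            dev_unmap(mfn);          /* no DMA into page-table pages */
        else if ( old != TYPE_WRITABLE && new == TYPE_WRITABLE )
            dev_map(mfn);            /* plain-writable again: restore */
    }
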
*/ - struct domain *d = page_get_owner(page); - if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) ) - { - if ( (x & PGT_type_mask) == PGT_writable_page ) - iommu_ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page))); - else if ( type == PGT_writable_page ) - iommu_ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)), - page_to_mfn(page), - IOMMUF_readable|IOMMUF_writable); - } - } - - if ( unlikely(!(nx & PGT_validated)) ) - { - if ( !(x & PGT_partial) ) - { - page->nr_validated_ptes = 0; - page->partial_pte = 0; - } - rc = alloc_page_type(page, type, preemptible); - } - - if ( (x & PGT_partial) && !(nx & PGT_partial) ) - put_page(page); - - if ( !rc ) - rc = iommu_ret; - - return rc; -} - -void put_page_type(struct page_info *page) -{ - int rc = __put_page_type(page, 0); - ASSERT(rc == 0); - (void)rc; -} - -int get_page_type(struct page_info *page, unsigned long type) -{ - int rc = __get_page_type(page, type, 0); - if ( likely(rc == 0) ) - return 1; - ASSERT(rc != -EINTR && rc != -ERESTART); - return 0; -} - -int put_page_type_preemptible(struct page_info *page) -{ - return __put_page_type(page, 1); -} - -int get_page_type_preemptible(struct page_info *page, unsigned long type) -{ - ASSERT(!current->arch.old_guest_table); - return __get_page_type(page, type, 1); -} - -static int get_spage_pages(struct page_info *page, struct domain *d) -{ - int i; - - for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++) - { - if (!get_page_and_type(page, d, PGT_writable_page)) - { - while (--i >= 0) - put_page_and_type(--page); - return 0; - } - } - return 1; -} - -static void put_spage_pages(struct page_info *page) -{ - int i; - - for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++) - { - put_page_and_type(page); - } - return; -} - -static int mark_superpage(struct spage_info *spage, struct domain *d) -{ - unsigned long x, nx, y = spage->type_info; - int pages_done = 0; - - ASSERT(opt_allow_superpage); - - do { - x = y; - nx = x + 1; - if ( (x & SGT_type_mask) == SGT_mark ) - { - gdprintk(XENLOG_WARNING, - "Duplicate superpage mark attempt mfn %" PRI_mfn "\n", - spage_to_mfn(spage)); - if ( pages_done ) - put_spage_pages(spage_to_page(spage)); - return -EINVAL; - } - if ( (x & SGT_type_mask) == SGT_dynamic ) - { - if ( pages_done ) - { - put_spage_pages(spage_to_page(spage)); - pages_done = 0; - } - } - else if ( !pages_done ) - { - if ( !get_spage_pages(spage_to_page(spage), d) ) - { - gdprintk(XENLOG_WARNING, - "Superpage type conflict in mark attempt mfn %" PRI_mfn "\n", - spage_to_mfn(spage)); - return -EINVAL; - } - pages_done = 1; - } - nx = (nx & ~SGT_type_mask) | SGT_mark; - - } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x ); - - return 0; -} - -static int unmark_superpage(struct spage_info *spage) -{ - unsigned long x, nx, y = spage->type_info; - unsigned long do_pages = 0; - - ASSERT(opt_allow_superpage); - - do { - x = y; - nx = x - 1; - if ( (x & SGT_type_mask) != SGT_mark ) - { - gdprintk(XENLOG_WARNING, - "Attempt to unmark unmarked superpage mfn %" PRI_mfn "\n", - spage_to_mfn(spage)); - return -EINVAL; - } - if ( (nx & SGT_count_mask) == 0 ) - { - nx = (nx & ~SGT_type_mask) | SGT_none; - do_pages = 1; - } - else - { - nx = (nx & ~SGT_type_mask) | SGT_dynamic; - } - } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x ); - - if ( do_pages ) - put_spage_pages(spage_to_page(spage)); - - return 0; -} - -void clear_superpage_mark(struct page_info *page) -{ - struct spage_info *spage; - - if ( !opt_allow_superpage ) - return; - - spage = page_to_spage(page); - if 
((spage->type_info & SGT_type_mask) == SGT_mark) - unmark_superpage(spage); - -} - -int get_superpage(unsigned long mfn, struct domain *d) -{ - struct spage_info *spage; - unsigned long x, nx, y; - int pages_done = 0; - - ASSERT(opt_allow_superpage); - - if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) ) - return -EINVAL; - - spage = mfn_to_spage(mfn); - y = spage->type_info; - do { - x = y; - nx = x + 1; - if ( (x & SGT_type_mask) != SGT_none ) - { - if ( pages_done ) - { - put_spage_pages(spage_to_page(spage)); - pages_done = 0; - } - } - else - { - if ( !get_spage_pages(spage_to_page(spage), d) ) - { - gdprintk(XENLOG_WARNING, - "Type conflict on superpage mapping mfn %" PRI_mfn "\n", - spage_to_mfn(spage)); - return -EINVAL; - } - pages_done = 1; - nx = (nx & ~SGT_type_mask) | SGT_dynamic; - } - } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x ); - - return 0; -} - -static void put_superpage(unsigned long mfn) -{ - struct spage_info *spage; - unsigned long x, nx, y; - unsigned long do_pages = 0; - - if ( !opt_allow_superpage ) - { - put_spage_pages(mfn_to_page(mfn)); - return; - } - - spage = mfn_to_spage(mfn); - y = spage->type_info; - do { - x = y; - nx = x - 1; - if ((x & SGT_type_mask) == SGT_dynamic) - { - if ((nx & SGT_count_mask) == 0) - { - nx = (nx & ~SGT_type_mask) | SGT_none; - do_pages = 1; - } - } - - } while ((y = cmpxchg(&spage->type_info, x, nx)) != x); - - if (do_pages) - put_spage_pages(spage_to_page(spage)); - - return; -} - -int put_old_guest_table(struct vcpu *v) -{ - int rc; - - if ( !v->arch.old_guest_table ) - return 0; - - switch ( rc = put_page_and_type_preemptible(v->arch.old_guest_table) ) - { - case -EINTR: - case -ERESTART: - return -ERESTART; - } - - v->arch.old_guest_table = NULL; - - return rc; -} - -int vcpu_destroy_pagetables(struct vcpu *v) -{ - unsigned long mfn = pagetable_get_pfn(v->arch.guest_table); - struct page_info *page; - l4_pgentry_t *l4tab = NULL; - int rc = put_old_guest_table(v); - - if ( rc ) - return rc; - - if ( is_pv_32bit_vcpu(v) ) - { - l4tab = map_domain_page(_mfn(mfn)); - mfn = l4e_get_pfn(*l4tab); - } - - if ( mfn ) - { - page = mfn_to_page(mfn); - if ( paging_mode_refcounts(v->domain) ) - put_page(page); - else - rc = put_page_and_type_preemptible(page); - } - - if ( l4tab ) - { - if ( !rc ) - l4e_write(l4tab, l4e_empty()); - unmap_domain_page(l4tab); - } - else if ( !rc ) - { - v->arch.guest_table = pagetable_null(); - - /* Drop ref to guest_table_user (from MMUEXT_NEW_USER_BASEPTR) */ - mfn = pagetable_get_pfn(v->arch.guest_table_user); - if ( mfn ) - { - page = mfn_to_page(mfn); - if ( paging_mode_refcounts(v->domain) ) - put_page(page); - else - rc = put_page_and_type_preemptible(page); - } - if ( !rc ) - v->arch.guest_table_user = pagetable_null(); - } - - v->arch.cr3 = 0; - - /* - * put_page_and_type_preemptible() is liable to return -EINTR. The - * callers of us expect -ERESTART so convert it over. - */ - return rc != -EINTR ? rc : -ERESTART; -} - -int new_guest_cr3(unsigned long mfn) -{ - struct vcpu *curr = current; - struct domain *d = curr->domain; - int rc; - unsigned long old_base_mfn; - - if ( is_pv_32bit_domain(d) ) - { - unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table); - l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn)); - - rc = paging_mode_refcounts(d) - ? -EINVAL /* Old code was broken, but what should it be? 
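The native path of new_guest_cr3() further down follows a strict ordering: reference the new root before switching, drop the old root only afterwards, so any failure leaves the previous page tables fully intact. A condensed sketch under that assumption; take_root_ref(), drop_root_ref() and load_root() are stand-ins for the refcounting and write_ptbase() steps:

    struct vcpu_s { unsigned long root_mfn; };

    static int take_root_ref(unsigned long mfn) { (void)mfn; return 1; }
    static void drop_root_ref(unsigned long mfn) { (void)mfn; }
    static void load_root(unsigned long mfn) { (void)mfn; }

    static int switch_root(struct vcpu_s *v, unsigned long new_mfn)
    {
        unsigned long old_mfn = v->root_mfn;

        if ( old_mfn == new_mfn )
            return 0;    /* restarted after preemption: nothing left to do */

        if ( !take_root_ref(new_mfn) )
            return -1;   /* failure leaves the old root fully intact */

        v->root_mfn = new_mfn;
        load_root(new_mfn);
        drop_root_ref(old_mfn);   /* only now is the old root released */
        return 0;
    }
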
*/ - : mod_l4_entry( - pl4e, - l4e_from_pfn( - mfn, - (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)), - gt_mfn, 0, curr); - unmap_domain_page(pl4e); - switch ( rc ) - { - case 0: - break; - case -EINTR: - case -ERESTART: - return -ERESTART; - default: - gdprintk(XENLOG_WARNING, - "Error while installing new compat baseptr %" PRI_mfn "\n", - mfn); - return rc; - } - - invalidate_shadow_ldt(curr, 0); - write_ptbase(curr); - - return 0; - } - - rc = put_old_guest_table(curr); - if ( unlikely(rc) ) - return rc; - - old_base_mfn = pagetable_get_pfn(curr->arch.guest_table); - /* - * This is particularly important when getting restarted after the - * previous attempt got preempted in the put-old-MFN phase. - */ - if ( old_base_mfn == mfn ) - { - write_ptbase(curr); - return 0; - } - - rc = paging_mode_refcounts(d) - ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL) - : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1); - switch ( rc ) - { - case 0: - break; - case -EINTR: - case -ERESTART: - return -ERESTART; - default: - gdprintk(XENLOG_WARNING, - "Error while installing new baseptr %" PRI_mfn "\n", mfn); - return rc; - } - - invalidate_shadow_ldt(curr, 0); - - if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) - fill_ro_mpt(mfn); - curr->arch.guest_table = pagetable_from_pfn(mfn); - update_cr3(curr); - - write_ptbase(curr); - - if ( likely(old_base_mfn != 0) ) - { - struct page_info *page = mfn_to_page(old_base_mfn); - - if ( paging_mode_refcounts(d) ) - put_page(page); - else - switch ( rc = put_page_and_type_preemptible(page) ) - { - case -EINTR: - rc = -ERESTART; - /* fallthrough */ - case -ERESTART: - curr->arch.old_guest_table = page; - break; - default: - BUG_ON(rc); - break; - } - } - - return rc; -} - -static struct domain *get_pg_owner(domid_t domid) -{ - struct domain *pg_owner = NULL, *curr = current->domain; - - if ( likely(domid == DOMID_SELF) ) - { - pg_owner = rcu_lock_current_domain(); - goto out; - } - - if ( unlikely(domid == curr->domain_id) ) - { - gdprintk(XENLOG_WARNING, "Cannot specify itself as foreign domain\n"); - goto out; - } - - if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) ) - { - gdprintk(XENLOG_WARNING, - "Cannot mix foreign mappings with translated domains\n"); - goto out; - } - - switch ( domid ) - { - case DOMID_IO: - pg_owner = rcu_lock_domain(dom_io); - break; - case DOMID_XEN: - pg_owner = rcu_lock_domain(dom_xen); - break; - default: - if ( (pg_owner = rcu_lock_domain_by_id(domid)) == NULL ) - { - gdprintk(XENLOG_WARNING, "Unknown domain d%d\n", domid); - break; - } - break; - } - - out: - return pg_owner; -} - -static void put_pg_owner(struct domain *pg_owner) -{ - rcu_unlock_domain(pg_owner); -} - -static inline int vcpumask_to_pcpumask( - struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask) -{ - unsigned int vcpu_id, vcpu_bias, offs; - unsigned long vmask; - struct vcpu *v; - bool_t is_native = !is_pv_32bit_domain(d); - - cpumask_clear(pmask); - for ( vmask = 0, offs = 0; ; ++offs) - { - vcpu_bias = offs * (is_native ? BITS_PER_LONG : 32); - if ( vcpu_bias >= d->max_vcpus ) - return 0; - - if ( unlikely(is_native ? 
- copy_from_guest_offset(&vmask, bmap, offs, 1) : - copy_from_guest_offset((unsigned int *)&vmask, bmap, - offs, 1)) ) - { - cpumask_clear(pmask); - return -EFAULT; - } - - while ( vmask ) - { - vcpu_id = find_first_set_bit(vmask); - vmask &= ~(1UL << vcpu_id); - vcpu_id += vcpu_bias; - if ( (vcpu_id >= d->max_vcpus) ) - return 0; - if ( ((v = d->vcpu[vcpu_id]) != NULL) ) - cpumask_or(pmask, pmask, v->vcpu_dirty_cpumask); - } - } -} - -long do_mmuext_op( - XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops, - unsigned int count, - XEN_GUEST_HANDLE_PARAM(uint) pdone, - unsigned int foreigndom) -{ - struct mmuext_op op; - unsigned long type; - unsigned int i, done = 0; - struct vcpu *curr = current; - struct domain *d = curr->domain; - struct domain *pg_owner; - int rc = put_old_guest_table(curr); - - if ( unlikely(rc) ) - { - if ( likely(rc == -ERESTART) ) - rc = hypercall_create_continuation( - __HYPERVISOR_mmuext_op, "hihi", uops, count, pdone, - foreigndom); - return rc; - } - - if ( unlikely(count == MMU_UPDATE_PREEMPTED) && - likely(guest_handle_is_null(uops)) ) - { - /* See the curr->arch.old_guest_table related - * hypercall_create_continuation() below. */ - return (int)foreigndom; - } - - if ( unlikely(count & MMU_UPDATE_PREEMPTED) ) - { - count &= ~MMU_UPDATE_PREEMPTED; - if ( unlikely(!guest_handle_is_null(pdone)) ) - (void)copy_from_guest(&done, pdone, 1); - } - else - perfc_incr(calls_to_mmuext_op); - - if ( unlikely(!guest_handle_okay(uops, count)) ) - return -EFAULT; - - if ( (pg_owner = get_pg_owner(foreigndom)) == NULL ) - return -ESRCH; - - if ( !is_pv_domain(pg_owner) ) - { - put_pg_owner(pg_owner); - return -EINVAL; - } - - rc = xsm_mmuext_op(XSM_TARGET, d, pg_owner); - if ( rc ) - { - put_pg_owner(pg_owner); - return rc; - } - - for ( i = 0; i < count; i++ ) - { - if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) ) - { - rc = -ERESTART; - break; - } - - if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) ) - { - rc = -EFAULT; - break; - } - - if ( is_hvm_domain(d) ) - { - switch ( op.cmd ) - { - case MMUEXT_PIN_L1_TABLE: - case MMUEXT_PIN_L2_TABLE: - case MMUEXT_PIN_L3_TABLE: - case MMUEXT_PIN_L4_TABLE: - case MMUEXT_UNPIN_TABLE: - break; - default: - rc = -EOPNOTSUPP; - goto done; - } - } - - rc = 0; - - switch ( op.cmd ) - { - case MMUEXT_PIN_L1_TABLE: - type = PGT_l1_page_table; - goto pin_page; - - case MMUEXT_PIN_L2_TABLE: - type = PGT_l2_page_table; - goto pin_page; - - case MMUEXT_PIN_L3_TABLE: - type = PGT_l3_page_table; - goto pin_page; - - case MMUEXT_PIN_L4_TABLE: - if ( is_pv_32bit_domain(pg_owner) ) - break; - type = PGT_l4_page_table; - - pin_page: { - struct page_info *page; - - /* Ignore pinning of invalid paging levels. 
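The pin path below pairs one extra type reference with a pinned flag, so a pin can be taken at most once and unpin knows exactly what to release. A simplified, non-atomic sketch of the pairing (the real code uses atomic test_and_set_bit() on _PGT_pinned and the PGT_* refcount machinery):

    #define PINNED 0x1u

    struct pg { unsigned int flags; unsigned int type_refs; };

    static int pin(struct pg *p)
    {
        p->type_refs++;               /* get_page_type() analogue */
        if ( p->flags & PINNED )      /* second pin: refuse, undo the ref */
        {
            p->type_refs--;
            return -1;
        }
        p->flags |= PINNED;
        return 0;
    }

    static int unpin(struct pg *p)
    {
        if ( !(p->flags & PINNED) )   /* never pinned: nothing to release */
            return -1;
        p->flags &= ~PINNED;
        p->type_refs--;               /* drop the reference the pin held */
        return 0;
    }
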
*/ - if ( (op.cmd - MMUEXT_PIN_L1_TABLE) > (CONFIG_PAGING_LEVELS - 1) ) - break; - - if ( paging_mode_refcounts(pg_owner) ) - break; - - page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC); - if ( unlikely(!page) ) - { - rc = -EINVAL; - break; - } - - rc = get_page_type_preemptible(page, type); - if ( unlikely(rc) ) - { - if ( rc == -EINTR ) - rc = -ERESTART; - else if ( rc != -ERESTART ) - gdprintk(XENLOG_WARNING, - "Error %d while pinning mfn %" PRI_mfn "\n", - rc, page_to_mfn(page)); - if ( page != curr->arch.old_guest_table ) - put_page(page); - break; - } - - rc = xsm_memory_pin_page(XSM_HOOK, d, pg_owner, page); - if ( !rc && unlikely(test_and_set_bit(_PGT_pinned, - &page->u.inuse.type_info)) ) - { - gdprintk(XENLOG_WARNING, - "mfn %" PRI_mfn " already pinned\n", page_to_mfn(page)); - rc = -EINVAL; - } - - if ( unlikely(rc) ) - goto pin_drop; - - /* A page is dirtied when its pin status is set. */ - paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page))); - - /* We can race domain destruction (domain_relinquish_resources). */ - if ( unlikely(pg_owner != d) ) - { - int drop_ref; - spin_lock(&pg_owner->page_alloc_lock); - drop_ref = (pg_owner->is_dying && - test_and_clear_bit(_PGT_pinned, - &page->u.inuse.type_info)); - spin_unlock(&pg_owner->page_alloc_lock); - if ( drop_ref ) - { - pin_drop: - if ( type == PGT_l1_page_table ) - put_page_and_type(page); - else - curr->arch.old_guest_table = page; - } - } - - break; - } - - case MMUEXT_UNPIN_TABLE: { - struct page_info *page; - - if ( paging_mode_refcounts(pg_owner) ) - break; - - page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC); - if ( unlikely(!page) ) - { - gdprintk(XENLOG_WARNING, - "mfn %" PRI_mfn " bad, or bad owner d%d\n", - op.arg1.mfn, pg_owner->domain_id); - rc = -EINVAL; - break; - } - - if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) ) - { - put_page(page); - gdprintk(XENLOG_WARNING, - "mfn %" PRI_mfn " not pinned\n", op.arg1.mfn); - rc = -EINVAL; - break; - } - - switch ( rc = put_page_and_type_preemptible(page) ) - { - case -EINTR: - case -ERESTART: - curr->arch.old_guest_table = page; - rc = 0; - break; - default: - BUG_ON(rc); - break; - } - put_page(page); - - /* A page is dirtied when its pin status is cleared. */ - paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page))); - - break; - } - - case MMUEXT_NEW_BASEPTR: - if ( unlikely(d != pg_owner) ) - rc = -EPERM; - else if ( unlikely(paging_mode_translate(d)) ) - rc = -EINVAL; - else - rc = new_guest_cr3(op.arg1.mfn); - break; - - case MMUEXT_NEW_USER_BASEPTR: { - unsigned long old_mfn; - - if ( unlikely(d != pg_owner) ) - rc = -EPERM; - else if ( unlikely(paging_mode_translate(d)) ) - rc = -EINVAL; - if ( unlikely(rc) ) - break; - - old_mfn = pagetable_get_pfn(curr->arch.guest_table_user); - /* - * This is particularly important when getting restarted after the - * previous attempt got preempted in the put-old-MFN phase. - */ - if ( old_mfn == op.arg1.mfn ) - break; - - if ( op.arg1.mfn != 0 ) - { - if ( paging_mode_refcounts(d) ) - rc = get_page_from_pagenr(op.arg1.mfn, d) ? 
0 : -EINVAL; - else - rc = get_page_and_type_from_pagenr( - op.arg1.mfn, PGT_root_page_table, d, 0, 1); - - if ( unlikely(rc) ) - { - if ( rc == -EINTR ) - rc = -ERESTART; - else if ( rc != -ERESTART ) - gdprintk(XENLOG_WARNING, - "Error %d installing new mfn %" PRI_mfn "\n", - rc, op.arg1.mfn); - break; - } - if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) - zap_ro_mpt(op.arg1.mfn); - } - - curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn); - - if ( old_mfn != 0 ) - { - struct page_info *page = mfn_to_page(old_mfn); - - if ( paging_mode_refcounts(d) ) - put_page(page); - else - switch ( rc = put_page_and_type_preemptible(page) ) - { - case -EINTR: - rc = -ERESTART; - /* fallthrough */ - case -ERESTART: - curr->arch.old_guest_table = page; - break; - default: - BUG_ON(rc); - break; - } - } - - break; - } - - case MMUEXT_TLB_FLUSH_LOCAL: - if ( likely(d == pg_owner) ) - flush_tlb_local(); - else - rc = -EPERM; - break; - - case MMUEXT_INVLPG_LOCAL: - if ( unlikely(d != pg_owner) ) - rc = -EPERM; - else - paging_invlpg(curr, op.arg1.linear_addr); - break; - - case MMUEXT_TLB_FLUSH_MULTI: - case MMUEXT_INVLPG_MULTI: - { - cpumask_t *mask = this_cpu(scratch_cpumask); - - if ( unlikely(d != pg_owner) ) - rc = -EPERM; - else if ( unlikely(vcpumask_to_pcpumask(d, - guest_handle_to_param(op.arg2.vcpumask, - const_void), - mask)) ) - rc = -EINVAL; - if ( unlikely(rc) ) - break; - - if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI ) - flush_tlb_mask(mask); - else if ( __addr_ok(op.arg1.linear_addr) ) - flush_tlb_one_mask(mask, op.arg1.linear_addr); - break; - } - - case MMUEXT_TLB_FLUSH_ALL: - if ( likely(d == pg_owner) ) - flush_tlb_mask(d->domain_dirty_cpumask); - else - rc = -EPERM; - break; - - case MMUEXT_INVLPG_ALL: - if ( unlikely(d != pg_owner) ) - rc = -EPERM; - else if ( __addr_ok(op.arg1.linear_addr) ) - flush_tlb_one_mask(d->domain_dirty_cpumask, op.arg1.linear_addr); - break; - - case MMUEXT_FLUSH_CACHE: - if ( unlikely(d != pg_owner) ) - rc = -EPERM; - else if ( unlikely(!cache_flush_permitted(d)) ) - rc = -EACCES; - else - wbinvd(); - break; - - case MMUEXT_FLUSH_CACHE_GLOBAL: - if ( unlikely(d != pg_owner) ) - rc = -EPERM; - else if ( likely(cache_flush_permitted(d)) ) - { - unsigned int cpu; - cpumask_t *mask = this_cpu(scratch_cpumask); - - cpumask_clear(mask); - for_each_online_cpu(cpu) - if ( !cpumask_intersects(mask, - per_cpu(cpu_sibling_mask, cpu)) ) - __cpumask_set_cpu(cpu, mask); - flush_mask(mask, FLUSH_CACHE); - } - else - rc = -EINVAL; - break; - - case MMUEXT_SET_LDT: - { - unsigned int ents = op.arg2.nr_ents; - unsigned long ptr = ents ? 
op.arg1.linear_addr : 0; - - if ( unlikely(d != pg_owner) ) - rc = -EPERM; - else if ( paging_mode_external(d) ) - rc = -EINVAL; - else if ( ((ptr & (PAGE_SIZE - 1)) != 0) || !__addr_ok(ptr) || - (ents > 8192) ) - { - gdprintk(XENLOG_WARNING, - "Bad args to SET_LDT: ptr=%lx, ents=%x\n", ptr, ents); - rc = -EINVAL; - } - else if ( (curr->arch.pv_vcpu.ldt_ents != ents) || - (curr->arch.pv_vcpu.ldt_base != ptr) ) - { - invalidate_shadow_ldt(curr, 0); - flush_tlb_local(); - curr->arch.pv_vcpu.ldt_base = ptr; - curr->arch.pv_vcpu.ldt_ents = ents; - load_LDT(curr); - } - break; - } - - case MMUEXT_CLEAR_PAGE: { - struct page_info *page; - - page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC); - if ( !page || !get_page_type(page, PGT_writable_page) ) - { - if ( page ) - put_page(page); - gdprintk(XENLOG_WARNING, - "Error clearing mfn %" PRI_mfn "\n", op.arg1.mfn); - rc = -EINVAL; - break; - } - - /* A page is dirtied when it's being cleared. */ - paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page))); - - clear_domain_page(_mfn(page_to_mfn(page))); - - put_page_and_type(page); - break; - } - - case MMUEXT_COPY_PAGE: - { - struct page_info *src_page, *dst_page; - - src_page = get_page_from_gfn(pg_owner, op.arg2.src_mfn, NULL, - P2M_ALLOC); - if ( unlikely(!src_page) ) - { - gdprintk(XENLOG_WARNING, - "Error copying from mfn %" PRI_mfn "\n", - op.arg2.src_mfn); - rc = -EINVAL; - break; - } - - dst_page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, - P2M_ALLOC); - rc = (dst_page && - get_page_type(dst_page, PGT_writable_page)) ? 0 : -EINVAL; - if ( unlikely(rc) ) - { - put_page(src_page); - if ( dst_page ) - put_page(dst_page); - gdprintk(XENLOG_WARNING, - "Error copying to mfn %" PRI_mfn "\n", op.arg1.mfn); - break; - } - - /* A page is dirtied when it's being copied to. */ - paging_mark_dirty(pg_owner, _mfn(page_to_mfn(dst_page))); - - copy_domain_page(_mfn(page_to_mfn(dst_page)), - _mfn(page_to_mfn(src_page))); - - put_page_and_type(dst_page); - put_page(src_page); - break; - } - - case MMUEXT_MARK_SUPER: - case MMUEXT_UNMARK_SUPER: - { - unsigned long mfn = op.arg1.mfn; - - if ( !opt_allow_superpage ) - rc = -EOPNOTSUPP; - else if ( unlikely(d != pg_owner) ) - rc = -EPERM; - else if ( mfn & (L1_PAGETABLE_ENTRIES - 1) ) - { - gdprintk(XENLOG_WARNING, - "Unaligned superpage mfn %" PRI_mfn "\n", mfn); - rc = -EINVAL; - } - else if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) ) - rc = -EINVAL; - else if ( op.cmd == MMUEXT_MARK_SUPER ) - rc = mark_superpage(mfn_to_spage(mfn), d); - else - rc = unmark_superpage(mfn_to_spage(mfn)); - break; - } - - default: - rc = -ENOSYS; - break; - } - - done: - if ( unlikely(rc) ) - break; - - guest_handle_add_offset(uops, 1); - } - - if ( rc == -ERESTART ) - { - ASSERT(i < count); - rc = hypercall_create_continuation( - __HYPERVISOR_mmuext_op, "hihi", - uops, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom); - } - else if ( curr->arch.old_guest_table ) - { - XEN_GUEST_HANDLE_PARAM(void) null; - - ASSERT(rc || i == count); - set_xen_guest_handle(null, NULL); - /* - * In order to have a way to communicate the final return value to - * our continuation, we pass this in place of "foreigndom", building - * on the fact that this argument isn't needed anymore. - */ - rc = hypercall_create_continuation( - __HYPERVISOR_mmuext_op, "hihi", null, - MMU_UPDATE_PREEMPTED, null, rc); - } - - put_pg_owner(pg_owner); - - perfc_add(num_mmuext_ops, i); - - /* Add incremental work we have done to the @done output parameter. 
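The continuation logic above (repeated by do_mmu_update below) encodes the restart state in the count argument itself: the remaining op count is re-issued with a flag bit set, and stripped again on re-entry. A toy model of that encoding; PREEMPTED plays the role of MMU_UPDATE_PREEMPTED and the budget counter stands in for hypercall_preempt_check():

    #include <stdint.h>

    #define PREEMPTED (1u << 31)

    /* Returns 0 when every op completed, otherwise the encoded count to
       hand to the re-issued hypercall. */
    static uint32_t run_ops(uint32_t count)
    {
        uint32_t i, budget = 4;

        count &= ~PREEMPTED;         /* strip the flag when resumed */

        for ( i = 0; i < count; i++ )
        {
            /* ... process op i ... */
            if ( i + 1 < count && --budget == 0 )
                return (count - i - 1) | PREEMPTED;  /* ops still pending */
        }

        return 0;
    }
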
*/ - if ( unlikely(!guest_handle_is_null(pdone)) ) - { - done += i; - copy_to_guest(pdone, &done, 1); - } - - return rc; -} - -long do_mmu_update( - XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs, - unsigned int count, - XEN_GUEST_HANDLE_PARAM(uint) pdone, - unsigned int foreigndom) -{ - struct mmu_update req; - void *va; - unsigned long gpfn, gmfn, mfn; - struct page_info *page; - unsigned int cmd, i = 0, done = 0, pt_dom; - struct vcpu *curr = current, *v = curr; - struct domain *d = v->domain, *pt_owner = d, *pg_owner; - struct domain_mmap_cache mapcache; - uint32_t xsm_needed = 0; - uint32_t xsm_checked = 0; - int rc = put_old_guest_table(curr); - - if ( unlikely(rc) ) - { - if ( likely(rc == -ERESTART) ) - rc = hypercall_create_continuation( - __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone, - foreigndom); - return rc; - } - - if ( unlikely(count == MMU_UPDATE_PREEMPTED) && - likely(guest_handle_is_null(ureqs)) ) - { - /* See the curr->arch.old_guest_table related - * hypercall_create_continuation() below. */ - return (int)foreigndom; - } - - if ( unlikely(count & MMU_UPDATE_PREEMPTED) ) - { - count &= ~MMU_UPDATE_PREEMPTED; - if ( unlikely(!guest_handle_is_null(pdone)) ) - (void)copy_from_guest(&done, pdone, 1); - } - else - perfc_incr(calls_to_mmu_update); - - if ( unlikely(!guest_handle_okay(ureqs, count)) ) - return -EFAULT; - - if ( (pt_dom = foreigndom >> 16) != 0 ) - { - /* Pagetables belong to a foreign domain (PFD). */ - if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL ) - return -ESRCH; - - if ( pt_owner == d ) - rcu_unlock_domain(pt_owner); - else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL ) - { - rc = -EINVAL; - goto out; - } - } - - if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL ) - { - rc = -ESRCH; - goto out; - } - - domain_mmap_cache_init(&mapcache); - - for ( i = 0; i < count; i++ ) - { - if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) ) - { - rc = -ERESTART; - break; - } - - if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) ) - { - rc = -EFAULT; - break; - } - - cmd = req.ptr & (sizeof(l1_pgentry_t)-1); - - switch ( cmd ) - { - /* - * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table. - * MMU_UPDATE_PT_PRESERVE_AD: As above but also preserve (OR) - * current A/D bits. 
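The MMU_PT_UPDATE_PRESERVE_AD variant handled below boils down to OR-ing the old entry's accessed/dirty bits into the new value. In isolation, using the actual x86 PTE bit positions (the macro names mirror _PAGE_ACCESSED/_PAGE_DIRTY):

    #include <stdint.h>

    #define PAGE_ACCESSED 0x20u      /* x86 PTE bit 5 */
    #define PAGE_DIRTY    0x40u      /* x86 PTE bit 6 */

    static uint64_t merge_ad(uint64_t old_pte, uint64_t new_pte)
    {
        /* keep whatever A/D state the hardware has already recorded */
        return new_pte | (old_pte & (PAGE_ACCESSED | PAGE_DIRTY));
    }
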
- */ - case MMU_NORMAL_PT_UPDATE: - case MMU_PT_UPDATE_PRESERVE_AD: - { - p2m_type_t p2mt; - - rc = -EOPNOTSUPP; - if ( unlikely(paging_mode_refcounts(pt_owner)) ) - break; - - xsm_needed |= XSM_MMU_NORMAL_UPDATE; - if ( get_pte_flags(req.val) & _PAGE_PRESENT ) - { - xsm_needed |= XSM_MMU_UPDATE_READ; - if ( get_pte_flags(req.val) & _PAGE_RW ) - xsm_needed |= XSM_MMU_UPDATE_WRITE; - } - if ( xsm_needed != xsm_checked ) - { - rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed); - if ( rc ) - break; - xsm_checked = xsm_needed; - } - rc = -EINVAL; - - req.ptr -= cmd; - gmfn = req.ptr >> PAGE_SHIFT; - page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC); - - if ( p2m_is_paged(p2mt) ) - { - ASSERT(!page); - p2m_mem_paging_populate(pg_owner, gmfn); - rc = -ENOENT; - break; - } - - if ( unlikely(!page) ) - { - gdprintk(XENLOG_WARNING, - "Could not get page for normal update\n"); - break; - } - - mfn = page_to_mfn(page); - va = map_domain_page_with_cache(mfn, &mapcache); - va = (void *)((unsigned long)va + - (unsigned long)(req.ptr & ~PAGE_MASK)); - - if ( page_lock(page) ) - { - switch ( page->u.inuse.type_info & PGT_type_mask ) - { - case PGT_l1_page_table: - { - l1_pgentry_t l1e = l1e_from_intpte(req.val); - p2m_type_t l1e_p2mt = p2m_ram_rw; - struct page_info *target = NULL; - p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ? - P2M_UNSHARE : P2M_ALLOC; - - if ( paging_mode_translate(pg_owner) ) - target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e), - &l1e_p2mt, q); - - if ( p2m_is_paged(l1e_p2mt) ) - { - if ( target ) - put_page(target); - p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e)); - rc = -ENOENT; - break; - } - else if ( p2m_ram_paging_in == l1e_p2mt && !target ) - { - rc = -ENOENT; - break; - } - /* If we tried to unshare and failed */ - else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) ) - { - /* We could not have obtained a page ref. */ - ASSERT(target == NULL); - /* And mem_sharing_notify has already been called. 
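The paging checks above all follow one pattern: if the frame is paged out, kick the pager and fail the op with -ENOENT so the guest retries once the frame is resident. Schematically, with hypothetical stand-ins for p2m_is_paged() and p2m_mem_paging_populate():

    #include <errno.h>

    static int frame_is_paged_out(unsigned long gfn) { (void)gfn; return 0; }
    static void request_page_in(unsigned long gfn) { (void)gfn; }

    static int resolve_frame(unsigned long gfn)
    {
        if ( frame_is_paged_out(gfn) )
        {
            request_page_in(gfn);    /* asynchronous populate */
            return -ENOENT;          /* caller retries later */
        }
        return 0;
    }
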
*/ - rc = -ENOMEM; - break; - } - - rc = mod_l1_entry(va, l1e, mfn, - cmd == MMU_PT_UPDATE_PRESERVE_AD, v, - pg_owner); - if ( target ) - put_page(target); - } - break; - case PGT_l2_page_table: - rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn, - cmd == MMU_PT_UPDATE_PRESERVE_AD, v); - break; - case PGT_l3_page_table: - rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn, - cmd == MMU_PT_UPDATE_PRESERVE_AD, v); - break; - case PGT_l4_page_table: - rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn, - cmd == MMU_PT_UPDATE_PRESERVE_AD, v); - break; - case PGT_writable_page: - perfc_incr(writable_mmu_updates); - if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) ) - rc = 0; - break; - } - page_unlock(page); - if ( rc == -EINTR ) - rc = -ERESTART; - } - else if ( get_page_type(page, PGT_writable_page) ) - { - perfc_incr(writable_mmu_updates); - if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) ) - rc = 0; - put_page_type(page); - } - - unmap_domain_page_with_cache(va, &mapcache); - put_page(page); - } - break; - - case MMU_MACHPHYS_UPDATE: - if ( unlikely(d != pt_owner) ) - { - rc = -EPERM; - break; - } - - if ( unlikely(paging_mode_translate(pg_owner)) ) - { - rc = -EINVAL; - break; - } - - mfn = req.ptr >> PAGE_SHIFT; - gpfn = req.val; - - xsm_needed |= XSM_MMU_MACHPHYS_UPDATE; - if ( xsm_needed != xsm_checked ) - { - rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed); - if ( rc ) - break; - xsm_checked = xsm_needed; - } - - if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) ) + if ( unlikely((nx & PGT_type_mask) <= PGT_l4_page_table) && + likely(nx & (PGT_validated|PGT_partial)) ) { - gdprintk(XENLOG_WARNING, - "Could not get page for mach->phys update\n"); - rc = -EINVAL; + /* + * Page-table pages must be unvalidated when count is zero. The + * 'free' is safe because the refcnt is non-zero and validated + * bit is clear => other ops will spin or fail. + */ + nx = x & ~(PGT_validated|PGT_partial); + if ( unlikely((y = cmpxchg(&page->u.inuse.type_info, + x, nx)) != x) ) + continue; + /* We cleared the 'valid bit' so we do the clean up. */ + rc = __put_final_page_type(page, x, preemptible); + if ( x & PGT_partial ) + put_page(page); break; } - set_gpfn_from_mfn(mfn, gpfn); - - paging_mark_dirty(pg_owner, _mfn(mfn)); - - put_page(mfn_to_page(mfn)); - break; - - default: - rc = -ENOSYS; - break; + /* + * Record TLB information for flush later. We do not stamp page + * tables when running in shadow mode: + * 1. Pointless, since it's the shadow pt's which must be tracked. + * 2. Shadow mode reuses this field for shadowed page tables to + * store flags info -- we don't want to conflict with that. + */ + if ( !(shadow_mode_enabled(page_get_owner(page)) && + (page->count_info & PGC_page_table)) ) + page->tlbflush_timestamp = tlbflush_current_time(); } - if ( unlikely(rc) ) + if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) ) break; - guest_handle_add_offset(ureqs, 1); - } - - if ( rc == -ERESTART ) - { - ASSERT(i < count); - rc = hypercall_create_continuation( - __HYPERVISOR_mmu_update, "hihi", - ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom); - } - else if ( curr->arch.old_guest_table ) - { - XEN_GUEST_HANDLE_PARAM(void) null; - - ASSERT(rc || i == count); - set_xen_guest_handle(null, NULL); - /* - * In order to have a way to communicate the final return value to - * our continuation, we pass this in place of "foreigndom", building - * on the fact that this argument isn't needed anymore. 
- */ - rc = hypercall_create_continuation( - __HYPERVISOR_mmu_update, "hihi", null, - MMU_UPDATE_PREEMPTED, null, rc); - } - - put_pg_owner(pg_owner); - - domain_mmap_cache_destroy(&mapcache); - - perfc_add(num_page_updates, i); - - out: - if ( pt_owner != d ) - rcu_unlock_domain(pt_owner); - - /* Add incremental work we have done to the @done output parameter. */ - if ( unlikely(!guest_handle_is_null(pdone)) ) - { - done += i; - copy_to_guest(pdone, &done, 1); + if ( preemptible && hypercall_preempt_check() ) + return -EINTR; } return rc; } -static int create_grant_pte_mapping( - uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v) +static int __get_page_type(struct page_info *page, unsigned long type, + int preemptible) { - int rc = GNTST_okay; - void *va; - unsigned long gmfn, mfn; - struct page_info *page; - l1_pgentry_t ol1e; - struct domain *d = v->domain; - - adjust_guest_l1e(nl1e, d); - - gmfn = pte_addr >> PAGE_SHIFT; - page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); - - if ( unlikely(!page) ) - { - gdprintk(XENLOG_WARNING, "Could not get page for normal update\n"); - return GNTST_general_error; - } - - mfn = page_to_mfn(page); - va = map_domain_page(_mfn(mfn)); - va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK)); + unsigned long nx, x, y = page->u.inuse.type_info; + int rc = 0, iommu_ret = 0; - if ( !page_lock(page) ) - { - rc = GNTST_general_error; - goto failed; - } + ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2))); + ASSERT(!in_irq()); - if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + for ( ; ; ) { - page_unlock(page); - rc = GNTST_general_error; - goto failed; - } + x = y; + nx = x + 1; + if ( unlikely((nx & PGT_count_mask) == 0) ) + { + gdprintk(XENLOG_WARNING, + "Type count overflow on mfn %"PRI_mfn"\n", + page_to_mfn(page)); + return -EINVAL; + } + else if ( unlikely((x & PGT_count_mask) == 0) ) + { + struct domain *d = page_get_owner(page); - ol1e = *(l1_pgentry_t *)va; - if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) ) - { - page_unlock(page); - rc = GNTST_general_error; - goto failed; - } + /* Normally we should never let a page go from type count 0 + * to type count 1 when it is shadowed. One exception: + * out-of-sync shadowed pages are allowed to become + * writeable. */ + if ( d && shadow_mode_enabled(d) + && (page->count_info & PGC_page_table) + && !((page->shadow_flags & (1u<<29)) + && type == PGT_writable_page) ) + shadow_remove_all_shadows(d, _mfn(page_to_mfn(page))); - page_unlock(page); + ASSERT(!(x & PGT_pae_xen_l2)); + if ( (x & PGT_type_mask) != type ) + { + /* + * On type change we check to flush stale TLB entries. This + * may be unnecessary (e.g., page was GDT/LDT) but those + * circumstances should be very rare. + */ + cpumask_t *mask = this_cpu(scratch_cpumask); - if ( !paging_mode_refcounts(d) ) - put_page_from_l1e(ol1e, d); + BUG_ON(in_irq()); + cpumask_copy(mask, d->domain_dirty_cpumask); - failed: - unmap_domain_page(va); - put_page(page); + /* Don't flush if the timestamp is old enough */ + tlbflush_filter(mask, page->tlbflush_timestamp); - return rc; -} + if ( unlikely(!cpumask_empty(mask)) && + /* Shadow mode: track only writable pages. 
*/ + (!shadow_mode_enabled(page_get_owner(page)) || + ((nx & PGT_type_mask) == PGT_writable_page)) ) + { + perfc_incr(need_flush_tlb_flush); + flush_tlb_mask(mask); + } -static int destroy_grant_pte_mapping( - uint64_t addr, unsigned long frame, struct domain *d) -{ - int rc = GNTST_okay; - void *va; - unsigned long gmfn, mfn; - struct page_info *page; - l1_pgentry_t ol1e; + /* We lose existing type and validity. */ + nx &= ~(PGT_type_mask | PGT_validated); + nx |= type; - gmfn = addr >> PAGE_SHIFT; - page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); + /* No special validation needed for writable pages. */ + /* Page tables and GDT/LDT need to be scanned for validity. */ + if ( type == PGT_writable_page || type == PGT_shared_page ) + nx |= PGT_validated; + } + } + else if ( unlikely((x & (PGT_type_mask|PGT_pae_xen_l2)) != type) ) + { + /* Don't log failure if it could be a recursive-mapping attempt. */ + if ( ((x & PGT_type_mask) == PGT_l2_page_table) && + (type == PGT_l1_page_table) ) + return -EINVAL; + if ( ((x & PGT_type_mask) == PGT_l3_page_table) && + (type == PGT_l2_page_table) ) + return -EINVAL; + if ( ((x & PGT_type_mask) == PGT_l4_page_table) && + (type == PGT_l3_page_table) ) + return -EINVAL; + gdprintk(XENLOG_WARNING, + "Bad type (saw %" PRtype_info " != exp %" PRtype_info ") " + "for mfn %" PRI_mfn " (pfn %" PRI_pfn ")\n", + x, type, page_to_mfn(page), + get_gpfn_from_mfn(page_to_mfn(page))); + return -EINVAL; + } + else if ( unlikely(!(x & PGT_validated)) ) + { + if ( !(x & PGT_partial) ) + { + /* Someone else is updating validation of this page. Wait... */ + while ( (y = page->u.inuse.type_info) == x ) + { + if ( preemptible && hypercall_preempt_check() ) + return -EINTR; + cpu_relax(); + } + continue; + } + /* Type ref count was left at 1 when PGT_partial got set. */ + ASSERT((x & PGT_count_mask) == 1); + nx = x & ~PGT_partial; + } - if ( unlikely(!page) ) - { - gdprintk(XENLOG_WARNING, "Could not get page for normal update\n"); - return GNTST_general_error; - } - - mfn = page_to_mfn(page); - va = map_domain_page(_mfn(mfn)); - va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK)); + if ( likely((y = cmpxchg(&page->u.inuse.type_info, x, nx)) == x) ) + break; - if ( !page_lock(page) ) - { - rc = GNTST_general_error; - goto failed; + if ( preemptible && hypercall_preempt_check() ) + return -EINTR; } - if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + if ( unlikely((x & PGT_type_mask) != type) ) { - page_unlock(page); - rc = GNTST_general_error; - goto failed; + /* Special pages should not be accessible from devices. */ + struct domain *d = page_get_owner(page); + if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) ) + { + if ( (x & PGT_type_mask) == PGT_writable_page ) + iommu_ret = iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page))); + else if ( type == PGT_writable_page ) + iommu_ret = iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)), + page_to_mfn(page), + IOMMUF_readable|IOMMUF_writable); + } } - ol1e = *(l1_pgentry_t *)va; - - /* Check that the virtual address supplied is actually mapped to frame. 
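Tearing down a grant mapping must first verify the PTE still points at the granted frame, as the check that follows does; a guest could have rewritten the slot in the meantime. Condensed, with the standard x86-64 PTE address mask as an assumption of the sketch:

    #include <stdint.h>

    #define PTE_PFN(pte) (((pte) & 0x000ffffffffff000ULL) >> 12)

    /* clear the entry only if it still maps the granted frame */
    static int zap_grant_pte(uint64_t *pte, unsigned long frame)
    {
        if ( PTE_PFN(*pte) != frame )
            return -1;     /* entry was changed behind our back: refuse */

        *pte = 0;          /* l1e_empty() analogue */
        return 0;
    }
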
*/ - if ( unlikely(l1e_get_pfn(ol1e) != frame) ) + if ( unlikely(!(nx & PGT_validated)) ) { - page_unlock(page); - gdprintk(XENLOG_WARNING, - "PTE entry %"PRIpte" for address %"PRIx64" doesn't match frame %lx\n", - l1e_get_intpte(ol1e), addr, frame); - rc = GNTST_general_error; - goto failed; + if ( !(x & PGT_partial) ) + { + page->nr_validated_ptes = 0; + page->partial_pte = 0; + } + rc = alloc_page_type(page, type, preemptible); } - /* Delete pagetable entry. */ - if ( unlikely(!UPDATE_ENTRY - (l1, - (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, - d->vcpu[0] /* Change if we go to per-vcpu shadows. */, - 0)) ) - { - page_unlock(page); - gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", va); - rc = GNTST_general_error; - goto failed; - } + if ( (x & PGT_partial) && !(nx & PGT_partial) ) + put_page(page); - page_unlock(page); + if ( !rc ) + rc = iommu_ret; - failed: - unmap_domain_page(va); - put_page(page); return rc; } - -static int create_grant_va_mapping( - unsigned long va, l1_pgentry_t nl1e, struct vcpu *v) +void put_page_type(struct page_info *page) { - l1_pgentry_t *pl1e, ol1e; - struct domain *d = v->domain; - unsigned long gl1mfn; - struct page_info *l1pg; - int okay; - - adjust_guest_l1e(nl1e, d); - - pl1e = guest_map_l1e(va, &gl1mfn); - if ( !pl1e ) - { - gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", va); - return GNTST_general_error; - } - - if ( !get_page_from_pagenr(gl1mfn, current->domain) ) - { - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } - - l1pg = mfn_to_page(gl1mfn); - if ( !page_lock(l1pg) ) - { - put_page(l1pg); - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } - - if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - page_unlock(l1pg); - put_page(l1pg); - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } - - ol1e = *pl1e; - okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0); + int rc = __put_page_type(page, 0); + ASSERT(rc == 0); + (void)rc; +} - page_unlock(l1pg); - put_page(l1pg); - guest_unmap_l1e(pl1e); +int get_page_type(struct page_info *page, unsigned long type) +{ + int rc = __get_page_type(page, type, 0); + if ( likely(rc == 0) ) + return 1; + ASSERT(rc != -EINTR && rc != -ERESTART); + return 0; +} - if ( okay && !paging_mode_refcounts(d) ) - put_page_from_l1e(ol1e, d); +int put_page_type_preemptible(struct page_info *page) +{ + return __put_page_type(page, 1); +} - return okay ? 
GNTST_okay : GNTST_general_error; +int get_page_type_preemptible(struct page_info *page, unsigned long type) +{ + ASSERT(!current->arch.old_guest_table); + return __get_page_type(page, type, 1); } -static int replace_grant_va_mapping( - unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v) +int vcpu_destroy_pagetables(struct vcpu *v) { - l1_pgentry_t *pl1e, ol1e; - unsigned long gl1mfn; - struct page_info *l1pg; - int rc = 0; - - pl1e = guest_map_l1e(addr, &gl1mfn); - if ( !pl1e ) - { - gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", addr); - return GNTST_general_error; - } + unsigned long mfn = pagetable_get_pfn(v->arch.guest_table); + struct page_info *page; + l4_pgentry_t *l4tab = NULL; + int rc = put_old_guest_table(v); - if ( !get_page_from_pagenr(gl1mfn, current->domain) ) - { - rc = GNTST_general_error; - goto out; - } + if ( rc ) + return rc; - l1pg = mfn_to_page(gl1mfn); - if ( !page_lock(l1pg) ) + if ( is_pv_32bit_vcpu(v) ) { - rc = GNTST_general_error; - put_page(l1pg); - goto out; + l4tab = map_domain_page(_mfn(mfn)); + mfn = l4e_get_pfn(*l4tab); } - if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + if ( mfn ) { - rc = GNTST_general_error; - goto unlock_and_out; + page = mfn_to_page(mfn); + if ( paging_mode_refcounts(v->domain) ) + put_page(page); + else + rc = put_page_and_type_preemptible(page); } - ol1e = *pl1e; - - /* Check that the virtual address supplied is actually mapped to frame. */ - if ( unlikely(l1e_get_pfn(ol1e) != frame) ) + if ( l4tab ) { - gdprintk(XENLOG_WARNING, - "PTE entry %lx for address %lx doesn't match frame %lx\n", - l1e_get_pfn(ol1e), addr, frame); - rc = GNTST_general_error; - goto unlock_and_out; + if ( !rc ) + l4e_write(l4tab, l4e_empty()); + unmap_domain_page(l4tab); } - - /* Delete pagetable entry. */ - if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) ) + else if ( !rc ) { - gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e); - rc = GNTST_general_error; - goto unlock_and_out; + v->arch.guest_table = pagetable_null(); + + /* Drop ref to guest_table_user (from MMUEXT_NEW_USER_BASEPTR) */ + mfn = pagetable_get_pfn(v->arch.guest_table_user); + if ( mfn ) + { + page = mfn_to_page(mfn); + if ( paging_mode_refcounts(v->domain) ) + put_page(page); + else + rc = put_page_and_type_preemptible(page); + } + if ( !rc ) + v->arch.guest_table_user = pagetable_null(); } - unlock_and_out: - page_unlock(l1pg); - put_page(l1pg); - out: - guest_unmap_l1e(pl1e); - return rc; -} + v->arch.cr3 = 0; -static int destroy_grant_va_mapping( - unsigned long addr, unsigned long frame, struct vcpu *v) -{ - return replace_grant_va_mapping(addr, frame, l1e_empty(), v); + /* + * put_page_and_type_preemptible() is liable to return -EINTR. The + * callers of us expect -ERESTART so convert it over. + */ + return rc != -EINTR ? 
rc : -ERESTART; } static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame, @@ -4267,34 +1032,6 @@ static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame, return GNTST_okay; } -static int create_grant_pv_mapping(uint64_t addr, unsigned long frame, - unsigned int flags, unsigned int cache_flags) -{ - l1_pgentry_t pte; - uint32_t grant_pte_flags; - - grant_pte_flags = - _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB; - if ( cpu_has_nx ) - grant_pte_flags |= _PAGE_NX_BIT; - - pte = l1e_from_pfn(frame, grant_pte_flags); - if ( (flags & GNTMAP_application_map) ) - l1e_add_flags(pte,_PAGE_USER); - if ( !(flags & GNTMAP_readonly) ) - l1e_add_flags(pte,_PAGE_RW); - - l1e_add_flags(pte, - ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0) - & _PAGE_AVAIL); - - l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5)); - - if ( flags & GNTMAP_contains_pte ) - return create_grant_pte_mapping(addr, pte, current); - return create_grant_va_mapping(addr, pte, current); -} - int create_grant_host_mapping(uint64_t addr, unsigned long frame, unsigned int flags, unsigned int cache_flags) { @@ -4327,453 +1064,108 @@ static int replace_grant_p2m_mapping( guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K); put_gfn(d, gfn); - return GNTST_okay; -} - -static int replace_grant_pv_mapping(uint64_t addr, unsigned long frame, - uint64_t new_addr, unsigned int flags) -{ - struct vcpu *curr = current; - l1_pgentry_t *pl1e, ol1e; - unsigned long gl1mfn; - struct page_info *l1pg; - int rc; - - if ( flags & GNTMAP_contains_pte ) - { - if ( !new_addr ) - return destroy_grant_pte_mapping(addr, frame, curr->domain); - - return GNTST_general_error; - } - - if ( !new_addr ) - return destroy_grant_va_mapping(addr, frame, curr); - - pl1e = guest_map_l1e(new_addr, &gl1mfn); - if ( !pl1e ) - { - gdprintk(XENLOG_WARNING, - "Could not find L1 PTE for address %"PRIx64"\n", new_addr); - return GNTST_general_error; - } - - if ( !get_page_from_pagenr(gl1mfn, current->domain) ) - { - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } - - l1pg = mfn_to_page(gl1mfn); - if ( !page_lock(l1pg) ) - { - put_page(l1pg); - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } - - if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - page_unlock(l1pg); - put_page(l1pg); - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } - - ol1e = *pl1e; - - if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), - gl1mfn, curr, 0)) ) - { - page_unlock(l1pg); - put_page(l1pg); - gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e); - guest_unmap_l1e(pl1e); - return GNTST_general_error; - } - - page_unlock(l1pg); - put_page(l1pg); - guest_unmap_l1e(pl1e); - - rc = replace_grant_va_mapping(addr, frame, ol1e, curr); - if ( rc && !paging_mode_refcounts(curr->domain) ) - put_page_from_l1e(ol1e, curr->domain); - - return rc; -} - -int replace_grant_host_mapping(uint64_t addr, unsigned long frame, - uint64_t new_addr, unsigned int flags) -{ - if ( paging_mode_external(current->domain) ) - return replace_grant_p2m_mapping(addr, frame, new_addr, flags); - - return replace_grant_pv_mapping(addr, frame, new_addr, flags); -} - -int donate_page( - struct domain *d, struct page_info *page, unsigned int memflags) -{ - const struct domain *owner = dom_xen; - - spin_lock(&d->page_alloc_lock); - - if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != NULL) ) - goto fail; - - if ( d->is_dying ) - goto fail; - - if ( page->count_info & ~(PGC_allocated | 1) ) - 
goto fail; - - if ( !(memflags & MEMF_no_refcount) ) - { - if ( d->tot_pages >= d->max_pages ) - goto fail; - domain_adjust_tot_pages(d, 1); - } - - page->count_info = PGC_allocated | 1; - page_set_owner(page, d); - page_list_add_tail(page,&d->page_list); - - spin_unlock(&d->page_alloc_lock); - return 0; - - fail: - spin_unlock(&d->page_alloc_lock); - gdprintk(XENLOG_WARNING, "Bad donate mfn %" PRI_mfn - " to d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n", - page_to_mfn(page), d->domain_id, - owner ? owner->domain_id : DOMID_INVALID, - page->count_info, page->u.inuse.type_info); - return -1; -} - -int steal_page( - struct domain *d, struct page_info *page, unsigned int memflags) -{ - unsigned long x, y; - bool_t drop_dom_ref = 0; - const struct domain *owner = dom_xen; - - spin_lock(&d->page_alloc_lock); - - if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) ) - goto fail; - - /* - * We require there is just one reference (PGC_allocated). We temporarily - * drop this reference now so that we can safely swizzle the owner. - */ - y = page->count_info; - do { - x = y; - if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) ) - goto fail; - y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask); - } while ( y != x ); - - /* Swizzle the owner then reinstate the PGC_allocated reference. */ - page_set_owner(page, NULL); - y = page->count_info; - do { - x = y; - BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated); - } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x ); - - /* Unlink from original owner. */ - if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) ) - drop_dom_ref = 1; - page_list_del(page, &d->page_list); - - spin_unlock(&d->page_alloc_lock); - if ( unlikely(drop_dom_ref) ) - put_domain(d); - return 0; - - fail: - spin_unlock(&d->page_alloc_lock); - gdprintk(XENLOG_WARNING, "Bad steal mfn %" PRI_mfn - " from d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n", - page_to_mfn(page), d->domain_id, - owner ? 
owner->domain_id : DOMID_INVALID, - page->count_info, page->u.inuse.type_info); - return -1; -} - -static int __do_update_va_mapping( - unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner) -{ - l1_pgentry_t val = l1e_from_intpte(val64); - struct vcpu *v = current; - struct domain *d = v->domain; - struct page_info *gl1pg; - l1_pgentry_t *pl1e; - unsigned long bmap_ptr, gl1mfn; - cpumask_t *mask = NULL; - int rc; - - perfc_incr(calls_to_update_va); - - rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val); - if ( rc ) - return rc; - - rc = -EINVAL; - pl1e = guest_map_l1e(va, &gl1mfn); - if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) ) - goto out; - - gl1pg = mfn_to_page(gl1mfn); - if ( !page_lock(gl1pg) ) - { - put_page(gl1pg); - goto out; - } - - if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - page_unlock(gl1pg); - put_page(gl1pg); - goto out; - } - - rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner); - - page_unlock(gl1pg); - put_page(gl1pg); - - out: - if ( pl1e ) - guest_unmap_l1e(pl1e); - - switch ( flags & UVMF_FLUSHTYPE_MASK ) - { - case UVMF_TLB_FLUSH: - switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) ) - { - case UVMF_LOCAL: - flush_tlb_local(); - break; - case UVMF_ALL: - mask = d->domain_dirty_cpumask; - break; - default: - mask = this_cpu(scratch_cpumask); - rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr, - void), - mask); - break; - } - if ( mask ) - flush_tlb_mask(mask); - break; - - case UVMF_INVLPG: - switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) ) - { - case UVMF_LOCAL: - paging_invlpg(v, va); - break; - case UVMF_ALL: - mask = d->domain_dirty_cpumask; - break; - default: - mask = this_cpu(scratch_cpumask); - rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr, - void), - mask); - break; - } - if ( mask ) - flush_tlb_one_mask(mask, va); - break; - } - - return rc; -} - -long do_update_va_mapping(unsigned long va, u64 val64, - unsigned long flags) -{ - return __do_update_va_mapping(va, val64, flags, current->domain); -} - -long do_update_va_mapping_otherdomain(unsigned long va, u64 val64, - unsigned long flags, - domid_t domid) -{ - struct domain *pg_owner; - int rc; - - if ( (pg_owner = get_pg_owner(domid)) == NULL ) - return -ESRCH; - - rc = __do_update_va_mapping(va, val64, flags, pg_owner); - - put_pg_owner(pg_owner); - - return rc; -} - - - -/************************* - * Descriptor Tables - */ - -void destroy_gdt(struct vcpu *v) -{ - l1_pgentry_t *pl1e; - unsigned int i; - unsigned long pfn, zero_pfn = PFN_DOWN(__pa(zero_page)); - - v->arch.pv_vcpu.gdt_ents = 0; - pl1e = gdt_ldt_ptes(v->domain, v); - for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ ) - { - pfn = l1e_get_pfn(pl1e[i]); - if ( (l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) && pfn != zero_pfn ) - put_page_and_type(mfn_to_page(pfn)); - l1e_write(&pl1e[i], l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO)); - v->arch.pv_vcpu.gdt_frames[i] = 0; - } -} - - -long set_gdt(struct vcpu *v, - unsigned long *frames, - unsigned int entries) -{ - struct domain *d = v->domain; - l1_pgentry_t *pl1e; - /* NB. There are 512 8-byte entries per GDT page. */ - unsigned int i, nr_pages = (entries + 511) / 512; - - if ( entries > FIRST_RESERVED_GDT_ENTRY ) - return -EINVAL; - - /* Check the pages in the new GDT. 
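The loop below, with its `while ( i-- > 0 )` unwind at the fail label, is the standard acquire-all-or-roll-back idiom. Isolated, with get_ref()/put_ref() as stand-ins for get_page_type()/put_page_and_type():

    struct pageref;

    static int get_ref(struct pageref *p) { (void)p; return 1; }
    static void put_ref(struct pageref *p) { (void)p; }

    /* take a reference on every page, or on none at all */
    static int get_all(struct pageref *pg[], unsigned int n)
    {
        unsigned int i;

        for ( i = 0; i < n; i++ )
            if ( !get_ref(pg[i]) )
            {
                while ( i-- > 0 )     /* unwind the refs already taken */
                    put_ref(pg[i]);
                return -1;
            }

        return 0;
    }
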
*/ - for ( i = 0; i < nr_pages; i++ ) - { - struct page_info *page; - - page = get_page_from_gfn(d, frames[i], NULL, P2M_ALLOC); - if ( !page ) - goto fail; - if ( !get_page_type(page, PGT_seg_desc_page) ) - { - put_page(page); - goto fail; - } - frames[i] = page_to_mfn(page); - } - - /* Tear down the old GDT. */ - destroy_gdt(v); - - /* Install the new GDT. */ - v->arch.pv_vcpu.gdt_ents = entries; - pl1e = gdt_ldt_ptes(d, v); - for ( i = 0; i < nr_pages; i++ ) - { - v->arch.pv_vcpu.gdt_frames[i] = frames[i]; - l1e_write(&pl1e[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR_RW)); - } - - return 0; - - fail: - while ( i-- > 0 ) - { - put_page_and_type(mfn_to_page(frames[i])); - } - return -EINVAL; + return GNTST_okay; } - -long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list, - unsigned int entries) +int replace_grant_host_mapping(uint64_t addr, unsigned long frame, + uint64_t new_addr, unsigned int flags) { - int nr_pages = (entries + 511) / 512; - unsigned long frames[16]; - struct vcpu *curr = current; - long ret; + if ( paging_mode_external(current->domain) ) + return replace_grant_p2m_mapping(addr, frame, new_addr, flags); - /* Rechecked in set_gdt, but ensures a sane limit for copy_from_user(). */ - if ( entries > FIRST_RESERVED_GDT_ENTRY ) - return -EINVAL; - - if ( copy_from_guest(frames, frame_list, nr_pages) ) - return -EFAULT; + return replace_grant_pv_mapping(addr, frame, new_addr, flags); +} + +int donate_page( + struct domain *d, struct page_info *page, unsigned int memflags) +{ + const struct domain *owner = dom_xen; - domain_lock(curr->domain); + spin_lock(&d->page_alloc_lock); - if ( (ret = set_gdt(curr, frames, entries)) == 0 ) - flush_tlb_local(); + if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != NULL) ) + goto fail; - domain_unlock(curr->domain); + if ( d->is_dying ) + goto fail; - return ret; -} + if ( page->count_info & ~(PGC_allocated | 1) ) + goto fail; + if ( !(memflags & MEMF_no_refcount) ) + { + if ( d->tot_pages >= d->max_pages ) + goto fail; + domain_adjust_tot_pages(d, 1); + } -long do_update_descriptor(u64 pa, u64 desc) -{ - struct domain *dom = current->domain; - unsigned long gmfn = pa >> PAGE_SHIFT; - unsigned long mfn; - unsigned int offset; - struct desc_struct *gdt_pent, d; - struct page_info *page; - long ret = -EINVAL; + page->count_info = PGC_allocated | 1; + page_set_owner(page, d); + page_list_add_tail(page,&d->page_list); - offset = ((unsigned int)pa & ~PAGE_MASK) / sizeof(struct desc_struct); + spin_unlock(&d->page_alloc_lock); + return 0; - *(u64 *)&d = desc; + fail: + spin_unlock(&d->page_alloc_lock); + gdprintk(XENLOG_WARNING, "Bad donate mfn %" PRI_mfn + " to d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n", + page_to_mfn(page), d->domain_id, + owner ? owner->domain_id : DOMID_INVALID, + page->count_info, page->u.inuse.type_info); + return -1; +} - page = get_page_from_gfn(dom, gmfn, NULL, P2M_ALLOC); - if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) || - !page || - !check_descriptor(dom, &d) ) - { - if ( page ) - put_page(page); - return -EINVAL; - } - mfn = page_to_mfn(page); +int steal_page( + struct domain *d, struct page_info *page, unsigned int memflags) +{ + unsigned long x, y; + bool_t drop_dom_ref = 0; + const struct domain *owner = dom_xen; - /* Check if the given frame is in use in an unsafe context. 
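The descriptor write performed further down via write_atomic() is a single aligned 64-bit store, which x86-64 performs atomically, so a CPU loading the descriptor never sees a torn mixture of old and new halves. A sketch of the same operation using the GCC-style atomic builtin; gdt_page stands in for the frame mapped via map_domain_page():

    #include <stdint.h>

    static void update_desc(uint64_t *gdt_page, uint64_t pa, uint64_t desc)
    {
        unsigned int slot = (pa & 0xfff) / sizeof(uint64_t);

        /* one aligned 8-byte store; readers see old or new, never a mix */
        __atomic_store_n(&gdt_page[slot], desc, __ATOMIC_RELAXED);
    }
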
*/ - switch ( page->u.inuse.type_info & PGT_type_mask ) - { - case PGT_seg_desc_page: - if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) ) - goto out; - break; - default: - if ( unlikely(!get_page_type(page, PGT_writable_page)) ) - goto out; - break; - } + spin_lock(&d->page_alloc_lock); - paging_mark_dirty(dom, _mfn(mfn)); + if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) ) + goto fail; - /* All is good so make the update. */ - gdt_pent = map_domain_page(_mfn(mfn)); - write_atomic((uint64_t *)&gdt_pent[offset], *(uint64_t *)&d); - unmap_domain_page(gdt_pent); + /* + * We require there is just one reference (PGC_allocated). We temporarily + * drop this reference now so that we can safely swizzle the owner. + */ + y = page->count_info; + do { + x = y; + if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) ) + goto fail; + y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask); + } while ( y != x ); - put_page_type(page); + /* Swizzle the owner then reinstate the PGC_allocated reference. */ + page_set_owner(page, NULL); + y = page->count_info; + do { + x = y; + BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated); + } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x ); - ret = 0; /* success */ + /* Unlink from original owner. */ + if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) ) + drop_dom_ref = 1; + page_list_del(page, &d->page_list); - out: - put_page(page); + spin_unlock(&d->page_alloc_lock); + if ( unlikely(drop_dom_ref) ) + put_domain(d); + return 0; - return ret; + fail: + spin_unlock(&d->page_alloc_lock); + gdprintk(XENLOG_WARNING, "Bad steal mfn %" PRI_mfn + " from d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n", + page_to_mfn(page), d->domain_id, + owner ? owner->domain_id : DOMID_INVALID, + page->count_info, page->u.inuse.type_info); + return -1; } typedef struct e820entry e820entry_t; @@ -5181,466 +1573,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) return 0; } - -/************************* - * Writable Pagetables - */ - -struct ptwr_emulate_ctxt { - struct x86_emulate_ctxt ctxt; - unsigned long cr2; - l1_pgentry_t pte; -}; - -static int ptwr_emulated_read( - enum x86_segment seg, - unsigned long offset, - void *p_data, - unsigned int bytes, - struct x86_emulate_ctxt *ctxt) -{ - unsigned int rc = bytes; - unsigned long addr = offset; - - if ( !__addr_ok(addr) || - (rc = __copy_from_user(p_data, (void *)addr, bytes)) ) - { - x86_emul_pagefault(0, addr + bytes - rc, ctxt); /* Read fault. */ - return X86EMUL_EXCEPTION; - } - - return X86EMUL_OKAY; -} - -static int ptwr_emulated_update( - unsigned long addr, - paddr_t old, - paddr_t val, - unsigned int bytes, - unsigned int do_cmpxchg, - struct ptwr_emulate_ctxt *ptwr_ctxt) -{ - unsigned long mfn; - unsigned long unaligned_addr = addr; - struct page_info *page; - l1_pgentry_t pte, ol1e, nl1e, *pl1e; - struct vcpu *v = current; - struct domain *d = v->domain; - int ret; - - /* Only allow naturally-aligned stores within the original %cr2 page. */ - if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) ) - { - gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n", - ptwr_ctxt->cr2, addr, bytes); - return X86EMUL_UNHANDLEABLE; - } - - /* Turn a sub-word access into a full-word access. */ - if ( bytes != sizeof(paddr_t) ) - { - paddr_t full; - unsigned int rc, offset = addr & (sizeof(paddr_t)-1); - - /* Align address; read full word. 
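Stepping back to steal_page() above: the two cmpxchg() loops are the interesting part. Below is a user-space sketch of the first one, with C11 atomics standing in for Xen's cmpxchg() and simplified flag values (the PGC_* layout here is an assumption, not Xen's real bit positions). The point of the dance is that the owner can only be swizzled safely while no reference, not even the allocation one, is visible to other CPUs.

#include <stdatomic.h>
#include <stdbool.h>

#define PGC_allocated  (1UL << 63)      /* simplified layout */
#define PGC_count_mask ((1UL << 24) - 1)

static bool drop_sole_reference(_Atomic unsigned long *count_info)
{
    unsigned long x, y = atomic_load(count_info);

    do {
        x = y;
        /* Require exactly one reference, plus PGC_allocated. */
        if ( (x & (PGC_count_mask | PGC_allocated)) != (1 | PGC_allocated) )
            return false;
        /* On failure y is refreshed with the current value; retry. */
    } while ( !atomic_compare_exchange_strong(count_info, &y,
                                              x & ~PGC_count_mask) );

    return true;
}

int main(void)
{
    _Atomic unsigned long ci = PGC_allocated | 1;

    return drop_sole_reference(&ci) ? 0 : 1;
}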
*/ - addr &= ~(sizeof(paddr_t)-1); - if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 ) - { - x86_emul_pagefault(0, /* Read fault. */ - addr + sizeof(paddr_t) - rc, - &ptwr_ctxt->ctxt); - return X86EMUL_EXCEPTION; - } - /* Mask out bits provided by caller. */ - full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8)); - /* Shift the caller value and OR in the missing bits. */ - val &= (((paddr_t)1 << (bytes*8)) - 1); - val <<= (offset)*8; - val |= full; - /* Also fill in missing parts of the cmpxchg old value. */ - old &= (((paddr_t)1 << (bytes*8)) - 1); - old <<= (offset)*8; - old |= full; - } - - pte = ptwr_ctxt->pte; - mfn = l1e_get_pfn(pte); - page = mfn_to_page(mfn); - - /* We are looking only for read-only mappings of p.t. pages. */ - ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT); - ASSERT(mfn_valid(_mfn(mfn))); - ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table); - ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0); - ASSERT(page_get_owner(page) == d); - - /* Check the new PTE. */ - nl1e = l1e_from_intpte(val); - switch ( ret = get_page_from_l1e(nl1e, d, d) ) - { - default: - if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) && - !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) ) - { - /* - * If this is an upper-half write to a PAE PTE then we assume that - * the guest has simply got the two writes the wrong way round. We - * zap the PRESENT bit on the assumption that the bottom half will - * be written immediately after we return to the guest. - */ - gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %" - PRIpte"\n", l1e_get_intpte(nl1e)); - l1e_remove_flags(nl1e, _PAGE_PRESENT); - } - else - { - gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n"); - return X86EMUL_UNHANDLEABLE; - } - break; - case 0: - break; - case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS: - ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS))); - l1e_flip_flags(nl1e, ret); - break; - } - - adjust_guest_l1e(nl1e, d); - - /* Checked successfully: do the update (write or cmpxchg). */ - pl1e = map_domain_page(_mfn(mfn)); - pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK)); - if ( do_cmpxchg ) - { - int okay; - intpte_t t = old; - ol1e = l1e_from_intpte(old); - - okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e), - &t, l1e_get_intpte(nl1e), _mfn(mfn)); - okay = (okay && t == old); - - if ( !okay ) - { - unmap_domain_page(pl1e); - put_page_from_l1e(nl1e, d); - return X86EMUL_RETRY; - } - } - else - { - ol1e = *pl1e; - if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) ) - BUG(); - } - - trace_ptwr_emulation(addr, nl1e); - - unmap_domain_page(pl1e); - - /* Finally, drop the old PTE. 
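The sub-word widening near the top of ptwr_emulated_update() is the easiest part to get wrong, so here is a standalone sketch of the mask/shift merge, valid only for bytes < sizeof(paddr_t) as the caller guarantees (widen_write is an illustrative helper, not Xen code):

#include <assert.h>
#include <stdint.h>

typedef uint64_t paddr_t; /* stand-in for Xen's paddr_t */

/* Merge a bytes-wide value into the full word at byte offset 'offset'. */
static paddr_t widen_write(paddr_t full, paddr_t val,
                           unsigned int bytes, unsigned int offset)
{
    full &= ~((((paddr_t)1 << (bytes * 8)) - 1) << (offset * 8));
    val  &= ((paddr_t)1 << (bytes * 8)) - 1;
    return full | (val << (offset * 8));
}

int main(void)
{
    /* A 2-byte write of 0xbeef landing at byte offset 2. */
    assert(widen_write(0x1111111111111111ULL, 0xbeef, 2, 2) ==
           0x11111111beef1111ULL);
    return 0;
}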
*/ - put_page_from_l1e(ol1e, d); - - return X86EMUL_OKAY; -} - -static int ptwr_emulated_write( - enum x86_segment seg, - unsigned long offset, - void *p_data, - unsigned int bytes, - struct x86_emulate_ctxt *ctxt) -{ - paddr_t val = 0; - - if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes ) - { - gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n", - offset, bytes); - return X86EMUL_UNHANDLEABLE; - } - - memcpy(&val, p_data, bytes); - - return ptwr_emulated_update( - offset, 0, val, bytes, 0, - container_of(ctxt, struct ptwr_emulate_ctxt, ctxt)); -} - -static int ptwr_emulated_cmpxchg( - enum x86_segment seg, - unsigned long offset, - void *p_old, - void *p_new, - unsigned int bytes, - struct x86_emulate_ctxt *ctxt) -{ - paddr_t old = 0, new = 0; - - if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) ) - { - gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n", - offset, bytes); - return X86EMUL_UNHANDLEABLE; - } - - memcpy(&old, p_old, bytes); - memcpy(&new, p_new, bytes); - - return ptwr_emulated_update( - offset, old, new, bytes, 1, - container_of(ctxt, struct ptwr_emulate_ctxt, ctxt)); -} - -static int pv_emul_is_mem_write(const struct x86_emulate_state *state, - struct x86_emulate_ctxt *ctxt) -{ - return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY - : X86EMUL_UNHANDLEABLE; -} - -static const struct x86_emulate_ops ptwr_emulate_ops = { - .read = ptwr_emulated_read, - .insn_fetch = ptwr_emulated_read, - .write = ptwr_emulated_write, - .cmpxchg = ptwr_emulated_cmpxchg, - .validate = pv_emul_is_mem_write, - .cpuid = pv_emul_cpuid, -}; - -/* Write page fault handler: check if guest is trying to modify a PTE. */ -int ptwr_do_page_fault(struct vcpu *v, unsigned long addr, - struct cpu_user_regs *regs) -{ - struct domain *d = v->domain; - struct page_info *page; - l1_pgentry_t pte; - struct ptwr_emulate_ctxt ptwr_ctxt = { - .ctxt = { - .regs = regs, - .vendor = d->arch.cpuid->x86_vendor, - .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG, - .sp_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG, - .swint_emulate = x86_swint_emulate_none, - }, - }; - int rc; - - /* Attempt to read the PTE that maps the VA being accessed. */ - guest_get_eff_l1e(addr, &pte); - - /* We are looking only for read-only mappings of p.t. pages. */ - if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) || - rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) || - !get_page_from_pagenr(l1e_get_pfn(pte), d) ) - goto bail; - - page = l1e_get_page(pte); - if ( !page_lock(page) ) - { - put_page(page); - goto bail; - } - - if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) - { - page_unlock(page); - put_page(page); - goto bail; - } - - ptwr_ctxt.cr2 = addr; - ptwr_ctxt.pte = pte; - - rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops); - - page_unlock(page); - put_page(page); - - switch ( rc ) - { - case X86EMUL_EXCEPTION: - /* - * This emulation only covers writes to pagetables which are marked - * read-only by Xen. We tolerate #PF (in case a concurrent pagetable - * update has succeeded on a different vcpu). Anything else is an - * emulation bug, or a guest playing with the instruction stream under - * Xen's feet. 
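The size validation in ptwr_emulated_write() and ptwr_emulated_cmpxchg() is the classic power-of-two test. A minimal sketch, with valid_access_size as an illustrative stand-in for the open-coded checks:

#include <assert.h>
#include <stdbool.h>

static bool valid_access_size(unsigned int bytes, unsigned int max)
{
    /* Non-zero, within limit, and a power of two. */
    return bytes && bytes <= max && !(bytes & (bytes - 1));
}

int main(void)
{
    assert(valid_access_size(1, 8) && valid_access_size(8, 8));
    assert(!valid_access_size(0, 8));
    assert(!valid_access_size(3, 8));
    assert(!valid_access_size(16, 8));
    return 0;
}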
- */ - if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION && - ptwr_ctxt.ctxt.event.vector == TRAP_page_fault ) - pv_inject_event(&ptwr_ctxt.ctxt.event); - else - gdprintk(XENLOG_WARNING, - "Unexpected event (type %u, vector %#x) from emulation\n", - ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector); - - /* Fallthrough */ - case X86EMUL_OKAY: - - if ( ptwr_ctxt.ctxt.retire.singlestep ) - pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC); - - /* Fallthrough */ - case X86EMUL_RETRY: - perfc_incr(ptwr_emulations); - return EXCRET_fault_fixed; - } - - bail: - return 0; -} - -/************************* - * fault handling for read-only MMIO pages - */ - -int mmio_ro_emulated_write( - enum x86_segment seg, - unsigned long offset, - void *p_data, - unsigned int bytes, - struct x86_emulate_ctxt *ctxt) -{ - struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data; - - /* Only allow naturally-aligned stores at the original %cr2 address. */ - if ( ((bytes | offset) & (bytes - 1)) || !bytes || - offset != mmio_ro_ctxt->cr2 ) - { - gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n", - mmio_ro_ctxt->cr2, offset, bytes); - return X86EMUL_UNHANDLEABLE; - } - - return X86EMUL_OKAY; -} - -static const struct x86_emulate_ops mmio_ro_emulate_ops = { - .read = x86emul_unhandleable_rw, - .insn_fetch = ptwr_emulated_read, - .write = mmio_ro_emulated_write, - .validate = pv_emul_is_mem_write, - .cpuid = pv_emul_cpuid, -}; - -int mmcfg_intercept_write( - enum x86_segment seg, - unsigned long offset, - void *p_data, - unsigned int bytes, - struct x86_emulate_ctxt *ctxt) -{ - struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data; - - /* - * Only allow naturally-aligned stores no wider than 4 bytes to the - * original %cr2 address. - */ - if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes || - offset != mmio_ctxt->cr2 ) - { - gdprintk(XENLOG_WARNING, "bad write (cr2=%lx, addr=%lx, bytes=%u)\n", - mmio_ctxt->cr2, offset, bytes); - return X86EMUL_UNHANDLEABLE; - } - - offset &= 0xfff; - if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf, - offset, bytes, p_data) >= 0 ) - pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf), - PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes, - *(uint32_t *)p_data); - - return X86EMUL_OKAY; -} - -static const struct x86_emulate_ops mmcfg_intercept_ops = { - .read = x86emul_unhandleable_rw, - .insn_fetch = ptwr_emulated_read, - .write = mmcfg_intercept_write, - .validate = pv_emul_is_mem_write, - .cpuid = pv_emul_cpuid, -}; - -/* Check if guest is trying to modify a r/o MMIO page. */ -int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr, - struct cpu_user_regs *regs) -{ - l1_pgentry_t pte; - unsigned long mfn; - unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG; - struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr }; - struct x86_emulate_ctxt ctxt = { - .regs = regs, - .vendor = v->domain->arch.cpuid->x86_vendor, - .addr_size = addr_size, - .sp_size = addr_size, - .swint_emulate = x86_swint_emulate_none, - .data = &mmio_ro_ctxt - }; - int rc; - - /* Attempt to read the PTE that maps the VA being accessed. */ - guest_get_eff_l1e(addr, &pte); - - /* We are looking only for read-only mappings of MMIO pages. 
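For reference, mmcfg_intercept_write() above can mask the offset with 0xfff because ECAM/MMCFG assigns each bus/device/function a 4KiB configuration window. A standalone sketch of the address layout, using the conventional PCI BDF encoding (the macros mirror, but are not copied from, Xen's headers):

#include <stdio.h>

#define PCI_BUS(bdf)  (((bdf) >> 8) & 0xff)
#define PCI_SLOT(bdf) (((bdf) >> 3) & 0x1f)
#define PCI_FUNC(bdf) ((bdf) & 0x7)

int main(void)
{
    unsigned int bdf = (3 << 8) | (2 << 3) | 1; /* 03:02.1 */
    unsigned long addr = 0xe0000000UL + ((unsigned long)bdf << 12) + 0x44;

    printf("bus=%02x dev=%02x fn=%x reg=%#lx\n",
           PCI_BUS(bdf), PCI_SLOT(bdf), PCI_FUNC(bdf), addr & 0xfff);
    return 0;
}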
*/ - if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ) - return 0; - - mfn = l1e_get_pfn(pte); - if ( mfn_valid(_mfn(mfn)) ) - { - struct page_info *page = mfn_to_page(mfn); - struct domain *owner = page_get_owner_and_reference(page); - - if ( owner ) - put_page(page); - if ( owner != dom_io ) - return 0; - } - - if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) - return 0; - - if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) ) - rc = x86_emulate(&ctxt, &mmcfg_intercept_ops); - else - rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops); - - switch ( rc ) - { - case X86EMUL_EXCEPTION: - /* - * This emulation only covers writes to MMCFG space or read-only MFNs. - * We tolerate #PF (from hitting an adjacent page or a successful - * concurrent pagetable update). Anything else is an emulation bug, - * or a guest playing with the instruction stream under Xen's feet. - */ - if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION && - ctxt.event.vector == TRAP_page_fault ) - pv_inject_event(&ctxt.event); - else - gdprintk(XENLOG_WARNING, - "Unexpected event (type %u, vector %#x) from emulation\n", - ctxt.event.type, ctxt.event.vector); - - /* Fallthrough */ - case X86EMUL_OKAY: - - if ( ctxt.retire.singlestep ) - pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC); - - /* Fallthrough */ - case X86EMUL_RETRY: - perfc_incr(ptwr_emulations); - return EXCRET_fault_fixed; - } - - return 0; -} - void *alloc_xen_pagetable(void) { if ( system_state != SYS_STATE_early_boot ) diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile index ea94599438..665be5536c 100644 --- a/xen/arch/x86/pv/Makefile +++ b/xen/arch/x86/pv/Makefile @@ -1,2 +1,3 @@ obj-y += hypercall.o obj-bin-y += dom0_build.init.o +obj-y += mm.o diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c new file mode 100644 index 0000000000..b5277b5d28 --- /dev/null +++ b/xen/arch/x86/pv/mm.c @@ -0,0 +1,4118 @@ +/****************************************************************************** + * arch/x86/pv/mm.c + * + * Copyright (c) 2002-2005 K A Fraser + * Copyright (c) 2004 Christian Limpach + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; If not, see <http://www.gnu.org/licenses/>. + */ + +/* + * A description of the x86 page table API: + * + * Domains trap to do_mmu_update with a list of update requests. + * This is a list of (ptr, val) pairs, where the requested operation + * is *ptr = val. + * + * Reference counting of pages: + * ---------------------------- + * Each page has two refcounts: tot_count and type_count. + * + * TOT_COUNT is the obvious reference count. It counts all uses of a + * physical page frame by a domain, including uses as a page directory, + * a page table, or simple mappings via a PTE. This count prevents a + * domain from releasing a frame back to the free pool when it still holds + * a reference to it. + * + * TYPE_COUNT is more subtle. 
A frame can be put to one of three + * mutually-exclusive uses: it might be used as a page directory, or a + * page table, or it may be mapped writable by the domain [of course, a + * frame may not be used in any of these three ways!]. + * So, type_count is a count of the number of times a frame is being + * referred to in its current incarnation. Therefore, a page can only + * change its type when its type count is zero. + * + * Pinning the page type: + * ---------------------- + * The type of a page can be pinned/unpinned with the commands + * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is, + * pinning is not reference counted, so it can't be nested). + * This is useful to prevent a page's type count falling to zero, at which + * point safety checks would need to be carried out next time the count + * is increased again. + * + * A further note on writable page mappings: + * ----------------------------------------- + * For simplicity, the count of writable mappings for a page may not + * correspond to reality. The 'writable count' is incremented for every + * PTE which maps the page with the _PAGE_RW flag set. However, for + * write access to be possible the page directory entry must also have + * its _PAGE_RW bit set. We do not check this as it complicates the + * reference counting considerably [consider the case of multiple + * directory entries referencing a single page table, some with the RW + * bit set, others not -- it starts getting a bit messy]. + * In normal use, this simplification shouldn't be a problem. + * However, the logic can be added if required. + * + * One more note on read-only page mappings: + * ----------------------------------------- + * We want domains to be able to map pages for read-only access. The + * main reason is that page tables and directories should be readable + * by a domain, but it would not be safe for them to be writable. + * However, domains have free access to rings 1 & 2 of the Intel + * privilege model. In terms of page protection, these are considered + * to be part of 'supervisor mode'. The WP bit in CR0 controls whether + * read-only restrictions are respected in supervisor mode -- if the + * bit is clear then any mapped page is writable. + * + * We get round this by always setting the WP bit and disallowing + * updates to it. This is very unlikely to cause a problem for guest + * OS's, which will generally use the WP bit to simplify copy-on-write + * implementation (in that case, OS wants a fault when it writes to + * an application-supplied buffer). + */ + +#include <xen/event.h> +#include <xen/guest_access.h> +#include <xen/hypercall.h> +#include <xen/iocap.h> +#include <xen/mm.h> +#include <xen/sched.h> +#include <xen/trace.h> +#include <xsm/xsm.h> + +#include <asm/ldt.h> +#include <asm/p2m.h> +#include <asm/paging.h> +#include <asm/shadow.h> +#include <asm/x86_emulate.h> + +extern s8 __read_mostly opt_mmio_relax; + +extern uint32_t base_disallow_mask; +/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */ +#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL) + +#define L2_DISALLOW_MASK (unlikely(opt_allow_superpage) \ + ? base_disallow_mask & ~_PAGE_PSE \ + : base_disallow_mask) + +#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? 
\ + base_disallow_mask : 0xFFFFF198U) + +#define L4_DISALLOW_MASK (base_disallow_mask) + +#define l1_disallow_mask(d) \ + ((d != dom_io) && \ + (rangeset_is_empty((d)->iomem_caps) && \ + rangeset_is_empty((d)->arch.ioport_caps) && \ + !has_arch_pdevs(d) && \ + is_pv_domain(d)) ? \ + L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS)) + +/* Get a mapping of a PV guest's l1e for this virtual address. */ +static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn) +{ + l2_pgentry_t l2e; + + ASSERT(!paging_mode_translate(current->domain)); + ASSERT(!paging_mode_external(current->domain)); + + if ( unlikely(!__addr_ok(addr)) ) + return NULL; + + /* Find this l1e and its enclosing l1mfn in the linear map. */ + if ( __copy_from_user(&l2e, + &__linear_l2_table[l2_linear_offset(addr)], + sizeof(l2_pgentry_t)) ) + return NULL; + + /* Check flags that it will be safe to read the l1e. */ + if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT ) + return NULL; + + *gl1mfn = l2e_get_pfn(l2e); + + return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) + + l1_table_offset(addr); +} + +/* Pull down the mapping we got from guest_map_l1e(). */ +static inline void guest_unmap_l1e(void *p) +{ + unmap_domain_page(p); +} + +/* Read a PV guest's l1e that maps this virtual address. */ +static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e) +{ + ASSERT(!paging_mode_translate(current->domain)); + ASSERT(!paging_mode_external(current->domain)); + + if ( unlikely(!__addr_ok(addr)) || + __copy_from_user(eff_l1e, + &__linear_l1_table[l1_linear_offset(addr)], + sizeof(l1_pgentry_t)) ) + *eff_l1e = l1e_empty(); +} + +/* + * Read the guest's l1e that maps this address, from the kernel-mode + * page tables. + */ +static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr, + void *eff_l1e) +{ + bool_t user_mode = !(v->arch.flags & TF_kernel_mode); +#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v) + + TOGGLE_MODE(); + guest_get_eff_l1e(addr, eff_l1e); + TOGGLE_MODE(); +} + +const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE) + zero_page[PAGE_SIZE]; + +static void invalidate_shadow_ldt(struct vcpu *v, int flush) +{ + l1_pgentry_t *pl1e; + unsigned int i; + struct page_info *page; + + BUG_ON(unlikely(in_irq())); + + spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock); + + if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 ) + goto out; + + v->arch.pv_vcpu.shadow_ldt_mapcnt = 0; + pl1e = gdt_ldt_ptes(v->domain, v); + + for ( i = 16; i < 32; i++ ) + { + if ( !(l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) ) + continue; + page = l1e_get_page(pl1e[i]); + l1e_write(&pl1e[i], l1e_empty()); + ASSERT_PAGE_IS_TYPE(page, PGT_seg_desc_page); + ASSERT_PAGE_IS_DOMAIN(page, v->domain); + put_page_and_type(page); + } + + /* Rid TLBs of stale mappings (guest mappings and shadow mappings). */ + if ( flush ) + flush_tlb_mask(v->vcpu_dirty_cpumask); + + out: + spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock); +} + + +static int alloc_segdesc_page(struct page_info *page) +{ + const struct domain *owner = page_get_owner(page); + struct desc_struct *descs = __map_domain_page(page); + unsigned i; + + for ( i = 0; i < 512; i++ ) + if ( unlikely(!check_descriptor(owner, &descs[i])) ) + break; + + unmap_domain_page(descs); + + return i == 512 ? 0 : -EINVAL; +} + + +/* Map shadow page at offset @off. 
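guest_map_l1e() above leans on the linear-map trick: indexing __linear_l2_table with l2_linear_offset(addr) turns a virtual address directly into the L2 entry covering it. The index arithmetic itself is just 9 bits per level starting at bit 12; a standalone sketch (PT_SHIFT/PT_INDEX are illustrative, not Xen macros):

#include <stdio.h>

#define PT_SHIFT(level)     (12 + 9 * ((level) - 1))
#define PT_INDEX(va, level) (((va) >> PT_SHIFT(level)) & 0x1ff)

int main(void)
{
    unsigned long va = 0x7f0012345678UL;

    printf("l4=%lu l3=%lu l2=%lu l1=%lu\n",
           PT_INDEX(va, 4), PT_INDEX(va, 3),
           PT_INDEX(va, 2), PT_INDEX(va, 1));
    return 0;
}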
*/ +int map_ldt_shadow_page(unsigned int off) +{ + struct vcpu *v = current; + struct domain *d = v->domain; + unsigned long gmfn; + struct page_info *page; + l1_pgentry_t l1e, nl1e; + unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT); + int okay; + + BUG_ON(unlikely(in_irq())); + + if ( is_pv_32bit_domain(d) ) + gva = (u32)gva; + guest_get_eff_kern_l1e(v, gva, &l1e); + if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) ) + return 0; + + gmfn = l1e_get_pfn(l1e); + page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); + if ( unlikely(!page) ) + return 0; + + okay = get_page_type(page, PGT_seg_desc_page); + if ( unlikely(!okay) ) + { + put_page(page); + return 0; + } + + nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW); + + spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock); + l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e); + v->arch.pv_vcpu.shadow_ldt_mapcnt++; + spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock); + + return 1; +} + + +/* + * We allow root tables to map each other (a.k.a. linear page tables). It + * needs some special care with reference counts and access permissions: + * 1. The mapping entry must be read-only, or the guest may get write access + * to its own PTEs. + * 2. We must only bump the reference counts for an *already validated* + * L2 table, or we can end up in a deadlock in get_page_type() by waiting + * on a validation that is required to complete that validation. + * 3. We only need to increment the reference counts for the mapped page + * frame if it is mapped by a different root table. This is sufficient and + * also necessary to allow validation of a root table mapping itself. + */ +#define define_get_linear_pagetable(level) \ +static int \ +get_##level##_linear_pagetable( \ + level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d) \ +{ \ + unsigned long x, y; \ + struct page_info *page; \ + unsigned long pfn; \ + \ + if ( (level##e_get_flags(pde) & _PAGE_RW) ) \ + { \ + gdprintk(XENLOG_WARNING, \ + "Attempt to create linear p.t. with write perms\n"); \ + return 0; \ + } \ + \ + if ( (pfn = level##e_get_pfn(pde)) != pde_pfn ) \ + { \ + /* Make sure the mapped frame belongs to the correct domain. */ \ + if ( unlikely(!get_page_from_pagenr(pfn, d)) ) \ + return 0; \ + \ + /* \ + * Ensure that the mapped frame is an already-validated page table. \ + * If so, atomically increment the count (checking for overflow). 
\ + */ \ + page = mfn_to_page(pfn); \ + y = page->u.inuse.type_info; \ + do { \ + x = y; \ + if ( unlikely((x & PGT_count_mask) == PGT_count_mask) || \ + unlikely((x & (PGT_type_mask|PGT_validated)) != \ + (PGT_##level##_page_table|PGT_validated)) ) \ + { \ + put_page(page); \ + return 0; \ + } \ + } \ + while ( (y = cmpxchg(&page->u.inuse.type_info, x, x + 1)) != x ); \ + } \ + \ + return 1; \ +} + +#ifndef NDEBUG +struct mmio_emul_range_ctxt { + const struct domain *d; + unsigned long mfn; +}; + +static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg) +{ + const struct mmio_emul_range_ctxt *ctxt = arg; + + if ( ctxt->mfn > e ) + return 0; + + if ( ctxt->mfn >= s ) + { + static DEFINE_SPINLOCK(last_lock); + static const struct domain *last_d; + static unsigned long last_s = ~0UL, last_e; + bool_t print = 0; + + spin_lock(&last_lock); + if ( last_d != ctxt->d || last_s != s || last_e != e ) + { + last_d = ctxt->d; + last_s = s; + last_e = e; + print = 1; + } + spin_unlock(&last_lock); + + if ( print ) + printk(XENLOG_G_INFO + "d%d: Forcing write emulation on MFNs %lx-%lx\n", + ctxt->d->domain_id, s, e); + } + + return 1; +} +#endif + +int +get_page_from_l1e( + l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner) +{ + unsigned long mfn = l1e_get_pfn(l1e); + struct page_info *page = mfn_to_page(mfn); + uint32_t l1f = l1e_get_flags(l1e); + struct vcpu *curr = current; + struct domain *real_pg_owner; + bool_t write; + + if ( !(l1f & _PAGE_PRESENT) ) + return 0; + + if ( unlikely(l1f & l1_disallow_mask(l1e_owner)) ) + { + gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n", + l1f & l1_disallow_mask(l1e_owner)); + return -EINVAL; + } + + if ( !mfn_valid(_mfn(mfn)) || + (real_pg_owner = page_get_owner_and_reference(page)) == dom_io ) + { + int flip = 0; + + /* Only needed the reference to confirm dom_io ownership. */ + if ( mfn_valid(_mfn(mfn)) ) + put_page(page); + + /* DOMID_IO reverts to caller for privilege checks. */ + if ( pg_owner == dom_io ) + pg_owner = curr->domain; + + if ( !iomem_access_permitted(pg_owner, mfn, mfn) ) + { + if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */ + { + gdprintk(XENLOG_WARNING, + "d%d non-privileged attempt to map MMIO space %"PRI_mfn"\n", + pg_owner->domain_id, mfn); + return -EPERM; + } + return -EINVAL; + } + + if ( pg_owner != l1e_owner && + !iomem_access_permitted(l1e_owner, mfn, mfn) ) + { + if ( mfn != (PADDR_MASK >> PAGE_SHIFT) ) /* INVALID_MFN? */ + { + gdprintk(XENLOG_WARNING, + "d%d attempted to map MMIO space %"PRI_mfn" in d%d to d%d\n", + curr->domain->domain_id, mfn, pg_owner->domain_id, + l1e_owner->domain_id); + return -EPERM; + } + return -EINVAL; + } + + if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) + { + /* MMIO pages must not be mapped cachable unless requested so. 
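define_get_linear_pagetable() above stamps out one validation function per page-table level via token pasting. A compilable toy showing the same ## technique, with the semantics reduced to rule 3 of the preceding comment (an extra reference is needed only when a table maps a different table):

#include <stdio.h>

#define define_needs_extra_ref(level)                              \
static int level##_needs_extra_ref(unsigned long pfn,              \
                                   unsigned long table_pfn)        \
{                                                                  \
    /* A table mapping itself takes no extra reference. */         \
    return pfn != table_pfn;                                       \
}

define_needs_extra_ref(l2)
define_needs_extra_ref(l3)

int main(void)
{
    printf("self-map: %d, cross-map: %d\n",
           l2_needs_extra_ref(7, 7), l3_needs_extra_ref(7, 9));
    return 0;
}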
*/ + switch ( opt_mmio_relax ) + { + case 0: + break; + case 1: + if ( !is_hardware_domain(l1e_owner) ) + break; + /* fallthrough */ + case -1: + return 0; + default: + ASSERT_UNREACHABLE(); + } + } + else if ( l1f & _PAGE_RW ) + { +#ifndef NDEBUG + const unsigned long *ro_map; + unsigned int seg, bdf; + + if ( !pci_mmcfg_decode(mfn, &seg, &bdf) || + ((ro_map = pci_get_ro_map(seg)) != NULL && + test_bit(bdf, ro_map)) ) + printk(XENLOG_G_WARNING + "d%d: Forcing read-only access to MFN %lx\n", + l1e_owner->domain_id, mfn); + else + rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL, + print_mmio_emul_range, + &(struct mmio_emul_range_ctxt){ + .d = l1e_owner, + .mfn = mfn }); +#endif + flip = _PAGE_RW; + } + + switch ( l1f & PAGE_CACHE_ATTRS ) + { + case 0: /* WB */ + flip |= _PAGE_PWT | _PAGE_PCD; + break; + case _PAGE_PWT: /* WT */ + case _PAGE_PWT | _PAGE_PAT: /* WP */ + flip |= _PAGE_PCD | (l1f & _PAGE_PAT); + break; + } + + return flip; + } + + if ( unlikely( (real_pg_owner != pg_owner) && + (real_pg_owner != dom_cow) ) ) + { + /* + * Let privileged domains transfer the right to map their target + * domain's pages. This is used to allow stub-domain pvfb export to + * dom0, until pvfb supports granted mappings. At that time this + * minor hack can go away. + */ + if ( (real_pg_owner == NULL) || (pg_owner == l1e_owner) || + xsm_priv_mapping(XSM_TARGET, pg_owner, real_pg_owner) ) + { + gdprintk(XENLOG_WARNING, + "pg_owner d%d l1e_owner d%d, but real_pg_owner d%d\n", + pg_owner->domain_id, l1e_owner->domain_id, + real_pg_owner ? real_pg_owner->domain_id : -1); + goto could_not_pin; + } + pg_owner = real_pg_owner; + } + + /* Extra paranoid check for shared memory. Writable mappings + * disallowed (unshare first!) */ + if ( (l1f & _PAGE_RW) && (real_pg_owner == dom_cow) ) + goto could_not_pin; + + /* Foreign mappings into guests in shadow external mode don't + * contribute to writeable mapping refcounts. (This allows the + * qemu-dm helper process in dom0 to map the domain's memory without + * messing up the count of "real" writable mappings.) 
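The cacheability switch just above maps PTE attribute bits onto a PAT index; flipping _PAGE_PWT/_PAGE_PCD is what forces non-WB MMIO mappings towards UC. A standalone sketch of the 3-bit index, using the architectural bit positions (the WB/WT/UC naming assumes the power-on PAT MSR layout):

#include <stdio.h>

#define _PAGE_PWT 0x008 /* bit 3 */
#define _PAGE_PCD 0x010 /* bit 4 */
#define _PAGE_PAT 0x080 /* bit 7 in a 4KiB PTE */

static unsigned int pat_index(unsigned long l1f)
{
    return (!!(l1f & _PAGE_PAT) << 2) |
           (!!(l1f & _PAGE_PCD) << 1) |
            !!(l1f & _PAGE_PWT);
}

int main(void)
{
    /* Power-on PAT MSR: index 0=WB, 1=WT, 2=UC-, 3=UC. */
    printf("WB->%u WT->%u UC->%u\n",
           pat_index(0), pat_index(_PAGE_PWT),
           pat_index(_PAGE_PCD | _PAGE_PWT));
    return 0;
}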
*/ + write = (l1f & _PAGE_RW) && + ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)); + if ( write && !get_page_type(page, PGT_writable_page) ) + { + gdprintk(XENLOG_WARNING, "Could not get page type PGT_writable_page\n"); + goto could_not_pin; + } + + if ( pte_flags_to_cacheattr(l1f) != + ((page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base) ) + { + unsigned long x, nx, y = page->count_info; + unsigned long cacheattr = pte_flags_to_cacheattr(l1f); + int err; + + if ( is_xen_heap_page(page) ) + { + if ( write ) + put_page_type(page); + put_page(page); + gdprintk(XENLOG_WARNING, + "Attempt to change cache attributes of Xen heap page\n"); + return -EACCES; + } + + do { + x = y; + nx = (x & ~PGC_cacheattr_mask) | (cacheattr << PGC_cacheattr_base); + } while ( (y = cmpxchg(&page->count_info, x, nx)) != x ); + + err = update_xen_mappings(mfn, cacheattr); + if ( unlikely(err) ) + { + cacheattr = y & PGC_cacheattr_mask; + do { + x = y; + nx = (x & ~PGC_cacheattr_mask) | cacheattr; + } while ( (y = cmpxchg(&page->count_info, x, nx)) != x ); + + if ( write ) + put_page_type(page); + put_page(page); + + gdprintk(XENLOG_WARNING, "Error updating mappings for mfn %" PRI_mfn + " (pfn %" PRI_pfn ", from L1 entry %" PRIpte ") for d%d\n", + mfn, get_gpfn_from_mfn(mfn), + l1e_get_intpte(l1e), l1e_owner->domain_id); + return err; + } + } + + return 0; + + could_not_pin: + gdprintk(XENLOG_WARNING, "Error getting mfn %" PRI_mfn " (pfn %" PRI_pfn + ") from L1 entry %" PRIpte " for l1e_owner d%d, pg_owner d%d", + mfn, get_gpfn_from_mfn(mfn), + l1e_get_intpte(l1e), l1e_owner->domain_id, pg_owner->domain_id); + if ( real_pg_owner != NULL ) + put_page(page); + return -EBUSY; +} + + +/* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. 
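The write predicate above is worth spelling out: a writable type reference is taken only for RW mappings that are not foreign mappings of a paging-external domain. A truth-table sketch (takes_writable_ref is an illustrative reduction of the real predicate):

#include <assert.h>
#include <stdbool.h>

static bool takes_writable_ref(bool rw, bool same_owner, bool owner_external)
{
    return rw && (same_owner || !owner_external);
}

int main(void)
{
    assert(takes_writable_ref(true, true, true));    /* guest maps own page */
    assert(!takes_writable_ref(true, false, true));  /* dom0 maps HVM guest page */
    assert(takes_writable_ref(true, false, false));  /* foreign map, PV target */
    assert(!takes_writable_ref(false, true, false)); /* read-only mapping */
    return 0;
}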
*/ +define_get_linear_pagetable(l2); +static int +get_page_from_l2e( + l2_pgentry_t l2e, unsigned long pfn, struct domain *d) +{ + unsigned long mfn = l2e_get_pfn(l2e); + int rc; + + if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) ) + return 1; + + if ( unlikely((l2e_get_flags(l2e) & L2_DISALLOW_MASK)) ) + { + gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n", + l2e_get_flags(l2e) & L2_DISALLOW_MASK); + return -EINVAL; + } + + if ( !(l2e_get_flags(l2e) & _PAGE_PSE) ) + { + rc = get_page_and_type_from_pagenr(mfn, PGT_l1_page_table, d, 0, 0); + if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) ) + rc = 0; + return rc; + } + + if ( !opt_allow_superpage ) + { + gdprintk(XENLOG_WARNING, "PV superpages disabled in hypervisor\n"); + return -EINVAL; + } + + if ( mfn & (L1_PAGETABLE_ENTRIES-1) ) + { + gdprintk(XENLOG_WARNING, + "Unaligned superpage map attempt mfn %" PRI_mfn "\n", mfn); + return -EINVAL; + } + + return get_superpage(mfn, d); +} + + +define_get_linear_pagetable(l3); +static int +get_page_from_l3e( + l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial) +{ + int rc; + + if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) ) + return 1; + + if ( unlikely((l3e_get_flags(l3e) & l3_disallow_mask(d))) ) + { + gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n", + l3e_get_flags(l3e) & l3_disallow_mask(d)); + return -EINVAL; + } + + rc = get_page_and_type_from_pagenr( + l3e_get_pfn(l3e), PGT_l2_page_table, d, partial, 1); + if ( unlikely(rc == -EINVAL) && + !is_pv_32bit_domain(d) && + get_l3_linear_pagetable(l3e, pfn, d) ) + rc = 0; + + return rc; +} + +define_get_linear_pagetable(l4); +static int +get_page_from_l4e( + l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial) +{ + int rc; + + if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) ) + return 1; + + if ( unlikely((l4e_get_flags(l4e) & L4_DISALLOW_MASK)) ) + { + gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n", + l4e_get_flags(l4e) & L4_DISALLOW_MASK); + return -EINVAL; + } + + rc = get_page_and_type_from_pagenr( + l4e_get_pfn(l4e), PGT_l3_page_table, d, partial, 1); + if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) ) + rc = 0; + + return rc; +} + +#define adjust_guest_l1e(pl1e, d) \ + do { \ + if ( likely(l1e_get_flags((pl1e)) & _PAGE_PRESENT) && \ + likely(!is_pv_32bit_domain(d)) ) \ + { \ + /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */ \ + if ( (l1e_get_flags((pl1e)) & (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)) \ + == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL) ) \ + gdprintk(XENLOG_WARNING, \ + "Global bit is set to kernel page %lx\n", \ + l1e_get_pfn((pl1e))); \ + if ( !(l1e_get_flags((pl1e)) & _PAGE_USER) ) \ + l1e_add_flags((pl1e), (_PAGE_GUEST_KERNEL|_PAGE_USER)); \ + if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL) ) \ + l1e_add_flags((pl1e), (_PAGE_GLOBAL|_PAGE_USER)); \ + } \ + } while ( 0 ) + +#define adjust_guest_l2e(pl2e, d) \ + do { \ + if ( likely(l2e_get_flags((pl2e)) & _PAGE_PRESENT) && \ + likely(!is_pv_32bit_domain(d)) ) \ + l2e_add_flags((pl2e), _PAGE_USER); \ + } while ( 0 ) + +#define adjust_guest_l3e(pl3e, d) \ + do { \ + if ( likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) ) \ + l3e_add_flags((pl3e), likely(!is_pv_32bit_domain(d)) ? 
\ + _PAGE_USER : \ + _PAGE_USER|_PAGE_RW); \ + } while ( 0 ) + +#define adjust_guest_l4e(pl4e, d) \ + do { \ + if ( likely(l4e_get_flags((pl4e)) & _PAGE_PRESENT) && \ + likely(!is_pv_32bit_domain(d)) ) \ + l4e_add_flags((pl4e), _PAGE_USER); \ + } while ( 0 ) + +#define unadjust_guest_l3e(pl3e, d) \ + do { \ + if ( unlikely(is_pv_32bit_domain(d)) && \ + likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) ) \ + l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED); \ + } while ( 0 ) + +void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner) +{ + unsigned long pfn = l1e_get_pfn(l1e); + struct page_info *page; + struct domain *pg_owner; + struct vcpu *v; + + if ( !(l1e_get_flags(l1e) & _PAGE_PRESENT) || is_iomem_page(_mfn(pfn)) ) + return; + + page = mfn_to_page(pfn); + pg_owner = page_get_owner(page); + + /* + * Check if this is a mapping that was established via a grant reference. + * If it was then we should not be here: we require that such mappings are + * explicitly destroyed via the grant-table interface. + * + * The upshot of this is that the guest can end up with active grants that + * it cannot destroy (because it no longer has a PTE to present to the + * grant-table interface). This can lead to subtle hard-to-catch bugs, + * hence a special grant PTE flag can be enabled to catch the bug early. + * + * (Note that the undestroyable active grants are not a security hole in + * Xen. All active grants can safely be cleaned up when the domain dies.) + */ + if ( (l1e_get_flags(l1e) & _PAGE_GNTTAB) && + !l1e_owner->is_shutting_down && !l1e_owner->is_dying ) + { + gdprintk(XENLOG_WARNING, + "Attempt to implicitly unmap a granted PTE %" PRIpte "\n", + l1e_get_intpte(l1e)); + domain_crash(l1e_owner); + } + + /* Remember we didn't take a type-count of foreign writable mappings + * to paging-external domains */ + if ( (l1e_get_flags(l1e) & _PAGE_RW) && + ((l1e_owner == pg_owner) || !paging_mode_external(pg_owner)) ) + { + put_page_and_type(page); + } + else + { + /* We expect this is rare so we blow the entire shadow LDT. */ + if ( unlikely(((page->u.inuse.type_info & PGT_type_mask) == + PGT_seg_desc_page)) && + unlikely(((page->u.inuse.type_info & PGT_count_mask) != 0)) && + (l1e_owner == pg_owner) ) + { + for_each_vcpu ( pg_owner, v ) + invalidate_shadow_ldt(v, 1); + } + put_page(page); + } +} + +static void put_superpage(unsigned long mfn); +/* + * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. + * Note also that this automatically deals correctly with linear p.t.'s. 
+ */ +static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn) +{ + if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) ) + return 1; + + if ( l2e_get_flags(l2e) & _PAGE_PSE ) + put_superpage(l2e_get_pfn(l2e)); + else + put_page_and_type(l2e_get_page(l2e)); + + return 0; +} + +static void put_data_page( + struct page_info *page, int writeable) +{ + if ( writeable ) + put_page_and_type(page); + else + put_page(page); +} + +extern int __put_page_type(struct page_info *, int preemptible); + +static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, + int partial, bool_t defer) +{ + struct page_info *pg; + + if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) ) + return 1; + + if ( unlikely(l3e_get_flags(l3e) & _PAGE_PSE) ) + { + unsigned long mfn = l3e_get_pfn(l3e); + int writeable = l3e_get_flags(l3e) & _PAGE_RW; + + ASSERT(!(mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1))); + do { + put_data_page(mfn_to_page(mfn), writeable); + } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) ); + + return 0; + } + + pg = l3e_get_page(l3e); + + if ( unlikely(partial > 0) ) + { + ASSERT(!defer); + return __put_page_type(pg, 1); + } + + if ( defer ) + { + current->arch.old_guest_table = pg; + return 0; + } + + return put_page_and_type_preemptible(pg); +} + +static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, + int partial, bool_t defer) +{ + if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) && + (l4e_get_pfn(l4e) != pfn) ) + { + struct page_info *pg = l4e_get_page(l4e); + + if ( unlikely(partial > 0) ) + { + ASSERT(!defer); + return __put_page_type(pg, 1); + } + + if ( defer ) + { + current->arch.old_guest_table = pg; + return 0; + } + + return put_page_and_type_preemptible(pg); + } + return 1; +} + +static int alloc_l1_table(struct page_info *page) +{ + struct domain *d = page_get_owner(page); + unsigned long pfn = page_to_mfn(page); + l1_pgentry_t *pl1e; + unsigned int i; + int ret = 0; + + pl1e = map_domain_page(_mfn(pfn)); + + for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ ) + { + if ( is_guest_l1_slot(i) ) + switch ( ret = get_page_from_l1e(pl1e[i], d, d) ) + { + default: + goto fail; + case 0: + break; + case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS: + ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS))); + l1e_flip_flags(pl1e[i], ret); + break; + } + + adjust_guest_l1e(pl1e[i], d); + } + + unmap_domain_page(pl1e); + return 0; + + fail: + gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: slot %#x\n", i); + while ( i-- > 0 ) + if ( is_guest_l1_slot(i) ) + put_page_from_l1e(pl1e[i], d); + + unmap_domain_page(pl1e); + return ret; +} + +static int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e) +{ + struct page_info *page; + l3_pgentry_t l3e3; + + if ( !is_pv_32bit_domain(d) ) + return 1; + + pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK); + + /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. */ + l3e3 = pl3e[3]; + if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) ) + { + gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n"); + return 0; + } + + /* + * The Xen-private mappings include linear mappings. The L2 thus cannot + * be shared by multiple L3 tables. The test here is adequate because: + * 1. Cannot appear in slots != 3 because get_page_type() checks the + * PGT_pae_xen_l2 flag, which is asserted iff the L2 appears in slot 3 + * 2. Cannot appear in another page table's L3: + * a. alloc_l3_table() calls this function and this check will fail + * b. 
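The 1GiB teardown in put_page_from_l3e() above visits every 4KiB frame of the superpage; with L3_PAGETABLE_SHIFT 30 and PAGE_SHIFT 12 that is 1 << 18 iterations. A standalone check of the loop's mask arithmetic, with the counter standing in for put_data_page():

#include <assert.h>

#define PAGE_SHIFT         12
#define L3_PAGETABLE_SHIFT 30

int main(void)
{
    unsigned long mfn = 0, frames = 0;

    do {
        frames++; /* stands in for put_data_page() on each frame */
    } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) );

    assert(frames == (1UL << 18)); /* 262144 frames per 1GiB mapping */
    return 0;
}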
mod_l3_entry() disallows updates to slot 3 in an existing table + */ + page = l3e_get_page(l3e3); + BUG_ON(page->u.inuse.type_info & PGT_pinned); + BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0); + BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2)); + if ( (page->u.inuse.type_info & PGT_count_mask) != 1 ) + { + gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n"); + return 0; + } + + return 1; +} + +static int alloc_l2_table(struct page_info *page, unsigned long type, + int preemptible) +{ + struct domain *d = page_get_owner(page); + unsigned long pfn = page_to_mfn(page); + l2_pgentry_t *pl2e; + unsigned int i; + int rc = 0; + + pl2e = map_domain_page(_mfn(pfn)); + + for ( i = page->nr_validated_ptes; i < L2_PAGETABLE_ENTRIES; i++ ) + { + if ( preemptible && i > page->nr_validated_ptes + && hypercall_preempt_check() ) + { + page->nr_validated_ptes = i; + rc = -ERESTART; + break; + } + + if ( !is_guest_l2_slot(d, type, i) || + (rc = get_page_from_l2e(pl2e[i], pfn, d)) > 0 ) + continue; + + if ( rc < 0 ) + { + gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i); + while ( i-- > 0 ) + if ( is_guest_l2_slot(d, type, i) ) + put_page_from_l2e(pl2e[i], pfn); + break; + } + + adjust_guest_l2e(pl2e[i], d); + } + + if ( rc >= 0 && (type & PGT_pae_xen_l2) ) + { + /* Xen private mappings. */ + memcpy(&pl2e[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)], + &compat_idle_pg_table_l2[ + l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)], + COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*pl2e)); + } + + unmap_domain_page(pl2e); + return rc > 0 ? 0 : rc; +} + +static int alloc_l3_table(struct page_info *page) +{ + struct domain *d = page_get_owner(page); + unsigned long pfn = page_to_mfn(page); + l3_pgentry_t *pl3e; + unsigned int i; + int rc = 0, partial = page->partial_pte; + + pl3e = map_domain_page(_mfn(pfn)); + + /* + * PAE guests allocate full pages, but aren't required to initialize + * more than the first four entries; when running in compatibility + * mode, however, the full page is visible to the MMU, and hence all + * 512 entries must be valid/verified, which is most easily achieved + * by clearing them out. + */ + if ( is_pv_32bit_domain(d) ) + memset(pl3e + 4, 0, (L3_PAGETABLE_ENTRIES - 4) * sizeof(*pl3e)); + + for ( i = page->nr_validated_ptes; i < L3_PAGETABLE_ENTRIES; + i++, partial = 0 ) + { + if ( is_pv_32bit_domain(d) && (i == 3) ) + { + if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) || + (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) ) + rc = -EINVAL; + else + rc = get_page_and_type_from_pagenr(l3e_get_pfn(pl3e[i]), + PGT_l2_page_table | + PGT_pae_xen_l2, + d, partial, 1); + } + else if ( !is_guest_l3_slot(i) || + (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 ) + continue; + + if ( rc == -ERESTART ) + { + page->nr_validated_ptes = i; + page->partial_pte = partial ?: 1; + } + else if ( rc == -EINTR && i ) + { + page->nr_validated_ptes = i; + page->partial_pte = 0; + rc = -ERESTART; + } + if ( rc < 0 ) + break; + + adjust_guest_l3e(pl3e[i], d); + } + + if ( rc >= 0 && !create_pae_xen_mappings(d, pl3e) ) + rc = -EINVAL; + if ( rc < 0 && rc != -ERESTART && rc != -EINTR ) + { + gdprintk(XENLOG_WARNING, "Failure in alloc_l3_table: slot %#x\n", i); + if ( i ) + { + page->nr_validated_ptes = i; + page->partial_pte = 0; + current->arch.old_guest_table = page; + } + while ( i-- > 0 ) + { + if ( !is_guest_l3_slot(i) ) + continue; + unadjust_guest_l3e(pl3e[i], d); + } + } + + unmap_domain_page(pl3e); + return rc > 0 ? 
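alloc_l2_table() and alloc_l3_table() above share the same preemption idiom: stash progress in nr_validated_ptes, return -ERESTART, and resume from there on re-entry. A self-contained sketch of the pattern (the ERESTART value, should_preempt() and the struct are stand-ins for the Xen equivalents):

#include <stdio.h>

#define ERESTART 85 /* stand-in; Xen's -ERESTART is hypervisor-internal */
#define ENTRIES  512

struct pt_page { unsigned int nr_validated_ptes; };

/* Stand-in for hypercall_preempt_check(): yield every 128 entries. */
static int should_preempt(unsigned int i) { return !(i % 128); }

static int validate(struct pt_page *pg)
{
    unsigned int i;

    for ( i = pg->nr_validated_ptes; i < ENTRIES; i++ )
    {
        if ( i > pg->nr_validated_ptes && should_preempt(i) )
        {
            pg->nr_validated_ptes = i; /* remember progress in the page */
            return -ERESTART;
        }
        /* ... validate entry i here ... */
    }
    return 0;
}

int main(void)
{
    struct pt_page pg = { 0 };
    int rc, restarts = 0;

    while ( (rc = validate(&pg)) == -ERESTART )
        restarts++;
    printf("done: rc=%d after %d restarts\n", rc, restarts);
    return 0;
}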
0 : rc; +} + +#ifndef NDEBUG +static unsigned int __read_mostly root_pgt_pv_xen_slots + = ROOT_PAGETABLE_PV_XEN_SLOTS; +static l4_pgentry_t __read_mostly split_l4e; +#else +#define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS +#endif + +void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d, + bool_t zap_ro_mpt) +{ + /* Xen private mappings. */ + memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT], + &idle_pg_table[ROOT_PAGETABLE_FIRST_XEN_SLOT], + root_pgt_pv_xen_slots * sizeof(l4_pgentry_t)); +#ifndef NDEBUG + if ( l4e_get_intpte(split_l4e) ) + l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT + root_pgt_pv_xen_slots] = + split_l4e; +#endif + l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] = + l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR); + l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] = + l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR); + if ( zap_ro_mpt || is_pv_32bit_domain(d) || paging_mode_refcounts(d) ) + l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty(); +} + +static int alloc_l4_table(struct page_info *page) +{ + struct domain *d = page_get_owner(page); + unsigned long pfn = page_to_mfn(page); + l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn)); + unsigned int i; + int rc = 0, partial = page->partial_pte; + + for ( i = page->nr_validated_ptes; i < L4_PAGETABLE_ENTRIES; + i++, partial = 0 ) + { + if ( !is_guest_l4_slot(d, i) || + (rc = get_page_from_l4e(pl4e[i], pfn, d, partial)) > 0 ) + continue; + + if ( rc == -ERESTART ) + { + page->nr_validated_ptes = i; + page->partial_pte = partial ?: 1; + } + else if ( rc < 0 ) + { + if ( rc != -EINTR ) + gdprintk(XENLOG_WARNING, + "Failure in alloc_l4_table: slot %#x\n", i); + if ( i ) + { + page->nr_validated_ptes = i; + page->partial_pte = 0; + if ( rc == -EINTR ) + rc = -ERESTART; + else + { + if ( current->arch.old_guest_table ) + page->nr_validated_ptes++; + current->arch.old_guest_table = page; + } + } + } + if ( rc < 0 ) + { + unmap_domain_page(pl4e); + return rc; + } + + adjust_guest_l4e(pl4e[i], d); + } + + if ( rc >= 0 ) + { + init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict)); + atomic_inc(&d->arch.pv_domain.nr_l4_pages); + rc = 0; + } + unmap_domain_page(pl4e); + + return rc; +} + +static void free_l1_table(struct page_info *page) +{ + struct domain *d = page_get_owner(page); + unsigned long pfn = page_to_mfn(page); + l1_pgentry_t *pl1e; + unsigned int i; + + pl1e = map_domain_page(_mfn(pfn)); + + for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ ) + if ( is_guest_l1_slot(i) ) + put_page_from_l1e(pl1e[i], d); + + unmap_domain_page(pl1e); +} + +static int free_l2_table(struct page_info *page, int preemptible) +{ + struct domain *d = page_get_owner(page); + unsigned long pfn = page_to_mfn(page); + l2_pgentry_t *pl2e; + unsigned int i = page->nr_validated_ptes - 1; + int err = 0; + + pl2e = map_domain_page(_mfn(pfn)); + + ASSERT(page->nr_validated_ptes); + do { + if ( is_guest_l2_slot(d, page->u.inuse.type_info, i) && + put_page_from_l2e(pl2e[i], pfn) == 0 && + preemptible && i && hypercall_preempt_check() ) + { + page->nr_validated_ptes = i; + err = -ERESTART; + } + } while ( !err && i-- ); + + unmap_domain_page(pl2e); + + if ( !err ) + page->u.inuse.type_info &= ~PGT_pae_xen_l2; + + return err; +} + +static int free_l3_table(struct page_info *page) +{ + struct domain *d = page_get_owner(page); + unsigned long pfn = page_to_mfn(page); + l3_pgentry_t *pl3e; + int rc = 0, partial = page->partial_pte; + unsigned int i = page->nr_validated_ptes - !partial; + + pl3e = map_domain_page(_mfn(pfn)); + + do { + if 
( is_guest_l3_slot(i) ) + { + rc = put_page_from_l3e(pl3e[i], pfn, partial, 0); + if ( rc < 0 ) + break; + partial = 0; + if ( rc > 0 ) + continue; + unadjust_guest_l3e(pl3e[i], d); + } + } while ( i-- ); + + unmap_domain_page(pl3e); + + if ( rc == -ERESTART ) + { + page->nr_validated_ptes = i; + page->partial_pte = partial ?: -1; + } + else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 ) + { + page->nr_validated_ptes = i + 1; + page->partial_pte = 0; + rc = -ERESTART; + } + return rc > 0 ? 0 : rc; +} + +static int free_l4_table(struct page_info *page) +{ + struct domain *d = page_get_owner(page); + unsigned long pfn = page_to_mfn(page); + l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn)); + int rc = 0, partial = page->partial_pte; + unsigned int i = page->nr_validated_ptes - !partial; + + do { + if ( is_guest_l4_slot(d, i) ) + rc = put_page_from_l4e(pl4e[i], pfn, partial, 0); + if ( rc < 0 ) + break; + partial = 0; + } while ( i-- ); + + if ( rc == -ERESTART ) + { + page->nr_validated_ptes = i; + page->partial_pte = partial ?: -1; + } + else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 ) + { + page->nr_validated_ptes = i + 1; + page->partial_pte = 0; + rc = -ERESTART; + } + + unmap_domain_page(pl4e); + + if ( rc >= 0 ) + { + atomic_dec(&d->arch.pv_domain.nr_l4_pages); + rc = 0; + } + + return rc; +} + + +/* How to write an entry to the guest pagetables. + * Returns 0 for failure (pointer not valid), 1 for success. */ +static inline int update_intpte(intpte_t *p, + intpte_t old, + intpte_t new, + unsigned long mfn, + struct vcpu *v, + int preserve_ad) +{ + int rv = 1; +#ifndef PTE_UPDATE_WITH_CMPXCHG + if ( !preserve_ad ) + { + rv = paging_write_guest_entry(v, p, new, _mfn(mfn)); + } + else +#endif + { + intpte_t t = old; + for ( ; ; ) + { + intpte_t _new = new; + if ( preserve_ad ) + _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY); + + rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn)); + if ( unlikely(rv == 0) ) + { + gdprintk(XENLOG_WARNING, + "Failed to update %" PRIpte " -> %" PRIpte + ": saw %" PRIpte "\n", old, _new, t); + break; + } + + if ( t == old ) + break; + + /* Allowed to change in Accessed/Dirty flags only. */ + BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY)); + + old = t; + } + } + return rv; +} + +/* Macro that wraps the appropriate type-changes around update_intpte(). + * Arguments are: type, ptr, old, new, mfn, vcpu */ +#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad) \ + update_intpte(&_t ## e_get_intpte(*(_p)), \ + _t ## e_get_intpte(_o), _t ## e_get_intpte(_n), \ + (_m), (_v), (_ad)) + +/* + * PTE flags that a guest may change without re-validating the PTE. + * All other bits affect translation, caching, or Xen's safety. + */ +#define FASTPATH_FLAG_WHITELIST \ + (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \ + _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER) + +/* Update the L1 entry at pl1e to new value nl1e. */ +static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e, + unsigned long gl1mfn, int preserve_ad, + struct vcpu *pt_vcpu, struct domain *pg_dom) +{ + l1_pgentry_t ol1e; + struct domain *pt_dom = pt_vcpu->domain; + int rc = 0; + + if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) ) + return -EFAULT; + + if ( unlikely(paging_mode_refcounts(pt_dom)) ) + { + if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) ) + return 0; + return -EBUSY; + } + + if ( l1e_get_flags(nl1e) & _PAGE_PRESENT ) + { + /* Translate foreign guest addresses. 
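FASTPATH_FLAG_WHITELIST above is what lets mod_l1_entry() skip get_page_from_l1e() for benign updates: if old and new entries differ only in whitelisted bits, no re-validation is needed. A sketch of the XOR test with a reduced flag set (values are illustrative, and the real l1e_has_changed() also folds in the frame mask):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define _PAGE_PRESENT  0x001
#define _PAGE_RW       0x002
#define _PAGE_ACCESSED 0x020
#define _PAGE_DIRTY    0x040

#define WHITELIST (_PAGE_ACCESSED | _PAGE_DIRTY)

static bool needs_revalidation(uint64_t old, uint64_t new)
{
    /* Any difference outside the whitelist forces the slow path. */
    return (old ^ new) & ~(uint64_t)WHITELIST;
}

int main(void)
{
    uint64_t pte = 0x1000 | _PAGE_PRESENT | _PAGE_RW;

    assert(!needs_revalidation(pte, pte | _PAGE_DIRTY));        /* fast path */
    assert(needs_revalidation(pte, pte & ~(uint64_t)_PAGE_RW)); /* perms */
    assert(needs_revalidation(pte, pte + 0x1000));              /* new frame */
    return 0;
}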
*/ + struct page_info *page = NULL; + + if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) ) + { + gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n", + l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)); + return -EINVAL; + } + + if ( paging_mode_translate(pg_dom) ) + { + page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC); + if ( !page ) + return -EINVAL; + nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e)); + } + + /* Fast path for sufficiently-similar mappings. */ + if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) ) + { + adjust_guest_l1e(nl1e, pt_dom); + rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, + preserve_ad); + if ( page ) + put_page(page); + return rc ? 0 : -EBUSY; + } + + switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) ) + { + default: + if ( page ) + put_page(page); + return rc; + case 0: + break; + case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS: + ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS))); + l1e_flip_flags(nl1e, rc); + rc = 0; + break; + } + if ( page ) + put_page(page); + + adjust_guest_l1e(nl1e, pt_dom); + if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, + preserve_ad)) ) + { + ol1e = nl1e; + rc = -EBUSY; + } + } + else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, + preserve_ad)) ) + { + return -EBUSY; + } + + put_page_from_l1e(ol1e, pt_dom); + return rc; +} + + +/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */ +static int mod_l2_entry(l2_pgentry_t *pl2e, + l2_pgentry_t nl2e, + unsigned long pfn, + int preserve_ad, + struct vcpu *vcpu) +{ + l2_pgentry_t ol2e; + struct domain *d = vcpu->domain; + struct page_info *l2pg = mfn_to_page(pfn); + unsigned long type = l2pg->u.inuse.type_info; + int rc = 0; + + if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) ) + { + gdprintk(XENLOG_WARNING, "L2 update in Xen-private area, slot %#lx\n", + pgentry_ptr_to_slot(pl2e)); + return -EPERM; + } + + if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) ) + return -EFAULT; + + if ( l2e_get_flags(nl2e) & _PAGE_PRESENT ) + { + if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) ) + { + gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n", + l2e_get_flags(nl2e) & L2_DISALLOW_MASK); + return -EINVAL; + } + + /* Fast path for sufficiently-similar mappings. */ + if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) ) + { + adjust_guest_l2e(nl2e, d); + if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) ) + return 0; + return -EBUSY; + } + + if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) ) + return rc; + + adjust_guest_l2e(nl2e, d); + if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, + preserve_ad)) ) + { + ol2e = nl2e; + rc = -EBUSY; + } + } + else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, + preserve_ad)) ) + { + return -EBUSY; + } + + put_page_from_l2e(ol2e, pfn); + return rc; +} + +/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */ +static int mod_l3_entry(l3_pgentry_t *pl3e, + l3_pgentry_t nl3e, + unsigned long pfn, + int preserve_ad, + struct vcpu *vcpu) +{ + l3_pgentry_t ol3e; + struct domain *d = vcpu->domain; + int rc = 0; + + if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) ) + { + gdprintk(XENLOG_WARNING, "L3 update in Xen-private area, slot %#lx\n", + pgentry_ptr_to_slot(pl3e)); + return -EINVAL; + } + + /* + * Disallow updates to final L3 slot. 
It contains Xen mappings, and it + * would be a pain to ensure they remain continuously valid throughout. + */ + if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) ) + return -EINVAL; + + if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) ) + return -EFAULT; + + if ( l3e_get_flags(nl3e) & _PAGE_PRESENT ) + { + if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) ) + { + gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n", + l3e_get_flags(nl3e) & l3_disallow_mask(d)); + return -EINVAL; + } + + /* Fast path for sufficiently-similar mappings. */ + if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) ) + { + adjust_guest_l3e(nl3e, d); + rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad); + return rc ? 0 : -EFAULT; + } + + rc = get_page_from_l3e(nl3e, pfn, d, 0); + if ( unlikely(rc < 0) ) + return rc; + rc = 0; + + adjust_guest_l3e(nl3e, d); + if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, + preserve_ad)) ) + { + ol3e = nl3e; + rc = -EFAULT; + } + } + else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, + preserve_ad)) ) + { + return -EFAULT; + } + + if ( likely(rc == 0) ) + if ( !create_pae_xen_mappings(d, pl3e) ) + BUG(); + + put_page_from_l3e(ol3e, pfn, 0, 1); + return rc; +} + +/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */ +static int mod_l4_entry(l4_pgentry_t *pl4e, + l4_pgentry_t nl4e, + unsigned long pfn, + int preserve_ad, + struct vcpu *vcpu) +{ + struct domain *d = vcpu->domain; + l4_pgentry_t ol4e; + int rc = 0; + + if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) ) + { + gdprintk(XENLOG_WARNING, "L4 update in Xen-private area, slot %#lx\n", + pgentry_ptr_to_slot(pl4e)); + return -EINVAL; + } + + if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) ) + return -EFAULT; + + if ( l4e_get_flags(nl4e) & _PAGE_PRESENT ) + { + if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) ) + { + gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n", + l4e_get_flags(nl4e) & L4_DISALLOW_MASK); + return -EINVAL; + } + + /* Fast path for sufficiently-similar mappings. */ + if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) ) + { + adjust_guest_l4e(nl4e, d); + rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad); + return rc ? 0 : -EFAULT; + } + + rc = get_page_from_l4e(nl4e, pfn, d, 0); + if ( unlikely(rc < 0) ) + return rc; + rc = 0; + + adjust_guest_l4e(nl4e, d); + if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, + preserve_ad)) ) + { + ol4e = nl4e; + rc = -EFAULT; + } + } + else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, + preserve_ad)) ) + { + return -EFAULT; + } + + put_page_from_l4e(ol4e, pfn, 0, 1); + return rc; +} + + +int alloc_page_type(struct page_info *page, unsigned long type, + int preemptible) +{ + struct domain *owner = page_get_owner(page); + int rc; + + /* A page table is dirtied when its type count becomes non-zero. 
*/ + if ( likely(owner != NULL) ) + paging_mark_dirty(owner, _mfn(page_to_mfn(page))); + + switch ( type & PGT_type_mask ) + { + case PGT_l1_page_table: + rc = alloc_l1_table(page); + break; + case PGT_l2_page_table: + rc = alloc_l2_table(page, type, preemptible); + break; + case PGT_l3_page_table: + ASSERT(preemptible); + rc = alloc_l3_table(page); + break; + case PGT_l4_page_table: + ASSERT(preemptible); + rc = alloc_l4_table(page); + break; + case PGT_seg_desc_page: + rc = alloc_segdesc_page(page); + break; + default: + printk("Bad type in alloc_page_type %lx t=%" PRtype_info " c=%lx\n", + type, page->u.inuse.type_info, + page->count_info); + rc = -EINVAL; + BUG(); + } + + /* No need for atomic update of type_info here: no one else updates it. */ + wmb(); + switch ( rc ) + { + case 0: + page->u.inuse.type_info |= PGT_validated; + break; + case -EINTR: + ASSERT((page->u.inuse.type_info & + (PGT_count_mask|PGT_validated|PGT_partial)) == 1); + page->u.inuse.type_info &= ~PGT_count_mask; + break; + default: + ASSERT(rc < 0); + gdprintk(XENLOG_WARNING, "Error while validating mfn %" PRI_mfn + " (pfn %" PRI_pfn ") for type %" PRtype_info + ": caf=%08lx taf=%" PRtype_info "\n", + page_to_mfn(page), get_gpfn_from_mfn(page_to_mfn(page)), + type, page->count_info, page->u.inuse.type_info); + if ( page != current->arch.old_guest_table ) + page->u.inuse.type_info = 0; + else + { + ASSERT((page->u.inuse.type_info & + (PGT_count_mask | PGT_validated)) == 1); + case -ERESTART: + get_page_light(page); + page->u.inuse.type_info |= PGT_partial; + } + break; + } + + return rc; +} + +int free_page_type(struct page_info *page, unsigned long type, + int preemptible) +{ + struct domain *owner = page_get_owner(page); + unsigned long gmfn; + int rc; + + if ( likely(owner != NULL) && unlikely(paging_mode_enabled(owner)) ) + { + /* A page table is dirtied when its type count becomes zero.
*/ + paging_mark_dirty(owner, _mfn(page_to_mfn(page))); + + if ( shadow_mode_refcounts(owner) ) + return 0; + + gmfn = mfn_to_gmfn(owner, page_to_mfn(page)); + ASSERT(VALID_M2P(gmfn)); + /* Page sharing not supported for shadowed domains */ + if(!SHARED_M2P(gmfn)) + shadow_remove_all_shadows(owner, _mfn(gmfn)); + } + + if ( !(type & PGT_partial) ) + { + page->nr_validated_ptes = 1U << PAGETABLE_ORDER; + page->partial_pte = 0; + } + + switch ( type & PGT_type_mask ) + { + case PGT_l1_page_table: + free_l1_table(page); + rc = 0; + break; + case PGT_l2_page_table: + rc = free_l2_table(page, preemptible); + break; + case PGT_l3_page_table: + ASSERT(preemptible); + rc = free_l3_table(page); + break; + case PGT_l4_page_table: + ASSERT(preemptible); + rc = free_l4_table(page); + break; + default: + gdprintk(XENLOG_WARNING, "type %" PRtype_info " mfn %" PRI_mfn "\n", + type, page_to_mfn(page)); + rc = -EINVAL; + BUG(); + } + + return rc; +} + +static int get_spage_pages(struct page_info *page, struct domain *d) +{ + int i; + + for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++) + { + if (!get_page_and_type(page, d, PGT_writable_page)) + { + while (--i >= 0) + put_page_and_type(--page); + return 0; + } + } + return 1; +} + +static void put_spage_pages(struct page_info *page) +{ + int i; + + for (i = 0; i < (1<<PAGETABLE_ORDER); i++, page++) + { + put_page_and_type(page); + } + return; +} + +static int mark_superpage(struct spage_info *spage, struct domain *d) +{ + unsigned long x, nx, y = spage->type_info; + int pages_done = 0; + + ASSERT(opt_allow_superpage); + + do { + x = y; + nx = x + 1; + if ( (x & SGT_type_mask) == SGT_mark ) + { + gdprintk(XENLOG_WARNING, + "Duplicate superpage mark attempt mfn %" PRI_mfn "\n", + spage_to_mfn(spage)); + if ( pages_done ) + put_spage_pages(spage_to_page(spage)); + return -EINVAL; + } + if ( (x & SGT_type_mask) == SGT_dynamic ) + { + if ( pages_done ) + { + put_spage_pages(spage_to_page(spage)); + pages_done = 0; + } + } + else if ( !pages_done ) + { + if ( !get_spage_pages(spage_to_page(spage), d) ) + { + gdprintk(XENLOG_WARNING, + "Superpage type conflict in mark attempt mfn %" PRI_mfn "\n", + spage_to_mfn(spage)); + return -EINVAL; + } + pages_done = 1; + } + nx = (nx & ~SGT_type_mask) | SGT_mark; + + } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x ); + + return 0; +} + +static int unmark_superpage(struct spage_info *spage) +{ + unsigned long x, nx, y = spage->type_info; + unsigned long do_pages = 0; + + ASSERT(opt_allow_superpage); + + do { + x = y; + nx = x - 1; + if ( (x & SGT_type_mask) != SGT_mark ) + { + gdprintk(XENLOG_WARNING, + "Attempt to unmark unmarked superpage mfn %" PRI_mfn "\n", + spage_to_mfn(spage)); + return -EINVAL; + } + if ( (nx & SGT_count_mask) == 0 ) + { + nx = (nx & ~SGT_type_mask) | SGT_none; + do_pages = 1; + } + else + { + nx = (nx & ~SGT_type_mask) | SGT_dynamic; + } + } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x ); + + if ( do_pages ) + put_spage_pages(spage_to_page(spage)); + + return 0; +} + +void clear_superpage_mark(struct page_info *page) +{ + struct spage_info *spage; + + if ( !opt_allow_superpage ) + return; + + spage = page_to_spage(page); + if ((spage->type_info & SGT_type_mask) == SGT_mark) + unmark_superpage(spage); + +} + +int get_superpage(unsigned long mfn, struct domain *d) +{ + struct spage_info *spage; + unsigned long x, nx, y; + int pages_done = 0; + + ASSERT(opt_allow_superpage); + + if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) ) + return -EINVAL; + + spage = mfn_to_spage(mfn); + y 
= spage->type_info; + do { + x = y; + nx = x + 1; + if ( (x & SGT_type_mask) != SGT_none ) + { + if ( pages_done ) + { + put_spage_pages(spage_to_page(spage)); + pages_done = 0; + } + } + else + { + if ( !get_spage_pages(spage_to_page(spage), d) ) + { + gdprintk(XENLOG_WARNING, + "Type conflict on superpage mapping mfn %" PRI_mfn "\n", + spage_to_mfn(spage)); + return -EINVAL; + } + pages_done = 1; + nx = (nx & ~SGT_type_mask) | SGT_dynamic; + } + } while ( (y = cmpxchg(&spage->type_info, x, nx)) != x ); + + return 0; +} + +static void put_superpage(unsigned long mfn) +{ + struct spage_info *spage; + unsigned long x, nx, y; + unsigned long do_pages = 0; + + if ( !opt_allow_superpage ) + { + put_spage_pages(mfn_to_page(mfn)); + return; + } + + spage = mfn_to_spage(mfn); + y = spage->type_info; + do { + x = y; + nx = x - 1; + if ((x & SGT_type_mask) == SGT_dynamic) + { + if ((nx & SGT_count_mask) == 0) + { + nx = (nx & ~SGT_type_mask) | SGT_none; + do_pages = 1; + } + } + + } while ((y = cmpxchg(&spage->type_info, x, nx)) != x); + + if (do_pages) + put_spage_pages(spage_to_page(spage)); + + return; +} + +int put_old_guest_table(struct vcpu *v) +{ + int rc; + + if ( !v->arch.old_guest_table ) + return 0; + + switch ( rc = put_page_and_type_preemptible(v->arch.old_guest_table) ) + { + case -EINTR: + case -ERESTART: + return -ERESTART; + } + + v->arch.old_guest_table = NULL; + + return rc; +} + +int new_guest_cr3(unsigned long mfn) +{ + struct vcpu *curr = current; + struct domain *d = curr->domain; + int rc; + unsigned long old_base_mfn; + + if ( is_pv_32bit_domain(d) ) + { + unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table); + l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn)); + + rc = paging_mode_refcounts(d) + ? -EINVAL /* Old code was broken, but what should it be? */ + : mod_l4_entry( + pl4e, + l4e_from_pfn( + mfn, + (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)), + gt_mfn, 0, curr); + unmap_domain_page(pl4e); + switch ( rc ) + { + case 0: + break; + case -EINTR: + case -ERESTART: + return -ERESTART; + default: + gdprintk(XENLOG_WARNING, + "Error while installing new compat baseptr %" PRI_mfn "\n", + mfn); + return rc; + } + + invalidate_shadow_ldt(curr, 0); + write_ptbase(curr); + + return 0; + } + + rc = put_old_guest_table(curr); + if ( unlikely(rc) ) + return rc; + + old_base_mfn = pagetable_get_pfn(curr->arch.guest_table); + /* + * This is particularly important when getting restarted after the + * previous attempt got preempted in the put-old-MFN phase. + */ + if ( old_base_mfn == mfn ) + { + write_ptbase(curr); + return 0; + } + + rc = paging_mode_refcounts(d) + ? (get_page_from_pagenr(mfn, d) ? 
0 : -EINVAL) + : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1); + switch ( rc ) + { + case 0: + break; + case -EINTR: + case -ERESTART: + return -ERESTART; + default: + gdprintk(XENLOG_WARNING, + "Error while installing new baseptr %" PRI_mfn "\n", mfn); + return rc; + } + + invalidate_shadow_ldt(curr, 0); + + if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) + fill_ro_mpt(mfn); + curr->arch.guest_table = pagetable_from_pfn(mfn); + update_cr3(curr); + + write_ptbase(curr); + + if ( likely(old_base_mfn != 0) ) + { + struct page_info *page = mfn_to_page(old_base_mfn); + + if ( paging_mode_refcounts(d) ) + put_page(page); + else + switch ( rc = put_page_and_type_preemptible(page) ) + { + case -EINTR: + rc = -ERESTART; + /* fallthrough */ + case -ERESTART: + curr->arch.old_guest_table = page; + break; + default: + BUG_ON(rc); + break; + } + } + + return rc; +} + +static struct domain *get_pg_owner(domid_t domid) +{ + struct domain *pg_owner = NULL, *curr = current->domain; + + if ( likely(domid == DOMID_SELF) ) + { + pg_owner = rcu_lock_current_domain(); + goto out; + } + + if ( unlikely(domid == curr->domain_id) ) + { + gdprintk(XENLOG_WARNING, "Cannot specify itself as foreign domain\n"); + goto out; + } + + if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) ) + { + gdprintk(XENLOG_WARNING, + "Cannot mix foreign mappings with translated domains\n"); + goto out; + } + + switch ( domid ) + { + case DOMID_IO: + pg_owner = rcu_lock_domain(dom_io); + break; + case DOMID_XEN: + pg_owner = rcu_lock_domain(dom_xen); + break; + default: + if ( (pg_owner = rcu_lock_domain_by_id(domid)) == NULL ) + { + gdprintk(XENLOG_WARNING, "Unknown domain d%d\n", domid); + break; + } + break; + } + + out: + return pg_owner; +} + +static void put_pg_owner(struct domain *pg_owner) +{ + rcu_unlock_domain(pg_owner); +} + +static inline int vcpumask_to_pcpumask( + struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask) +{ + unsigned int vcpu_id, vcpu_bias, offs; + unsigned long vmask; + struct vcpu *v; + bool_t is_native = !is_pv_32bit_domain(d); + + cpumask_clear(pmask); + for ( vmask = 0, offs = 0; ; ++offs) + { + vcpu_bias = offs * (is_native ? BITS_PER_LONG : 32); + if ( vcpu_bias >= d->max_vcpus ) + return 0; + + if ( unlikely(is_native ? + copy_from_guest_offset(&vmask, bmap, offs, 1) : + copy_from_guest_offset((unsigned int *)&vmask, bmap, + offs, 1)) ) + { + cpumask_clear(pmask); + return -EFAULT; + } + + while ( vmask ) + { + vcpu_id = find_first_set_bit(vmask); + vmask &= ~(1UL << vcpu_id); + vcpu_id += vcpu_bias; + if ( (vcpu_id >= d->max_vcpus) ) + return 0; + if ( ((v = d->vcpu[vcpu_id]) != NULL) ) + cpumask_or(pmask, pmask, v->vcpu_dirty_cpumask); + } + } +} + +long do_mmuext_op( + XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops, + unsigned int count, + XEN_GUEST_HANDLE_PARAM(uint) pdone, + unsigned int foreigndom) +{ + struct mmuext_op op; + unsigned long type; + unsigned int i, done = 0; + struct vcpu *curr = current; + struct domain *d = curr->domain; + struct domain *pg_owner; + int rc = put_old_guest_table(curr); + + if ( unlikely(rc) ) + { + if ( likely(rc == -ERESTART) ) + rc = hypercall_create_continuation( + __HYPERVISOR_mmuext_op, "hihi", uops, count, pdone, + foreigndom); + return rc; + } + + if ( unlikely(count == MMU_UPDATE_PREEMPTED) && + likely(guest_handle_is_null(uops)) ) + { + /* See the curr->arch.old_guest_table related + * hypercall_create_continuation() below. 
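+ * That continuation passes the final return value in place of + * "foreigndom", which is why that argument is simply returned here.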
*/ + return (int)foreigndom; + } + + if ( unlikely(count & MMU_UPDATE_PREEMPTED) ) + { + count &= ~MMU_UPDATE_PREEMPTED; + if ( unlikely(!guest_handle_is_null(pdone)) ) + (void)copy_from_guest(&done, pdone, 1); + } + else + perfc_incr(calls_to_mmuext_op); + + if ( unlikely(!guest_handle_okay(uops, count)) ) + return -EFAULT; + + if ( (pg_owner = get_pg_owner(foreigndom)) == NULL ) + return -ESRCH; + + if ( !is_pv_domain(pg_owner) ) + { + put_pg_owner(pg_owner); + return -EINVAL; + } + + rc = xsm_mmuext_op(XSM_TARGET, d, pg_owner); + if ( rc ) + { + put_pg_owner(pg_owner); + return rc; + } + + for ( i = 0; i < count; i++ ) + { + if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) ) + { + rc = -ERESTART; + break; + } + + if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) ) + { + rc = -EFAULT; + break; + } + + if ( is_hvm_domain(d) ) + { + switch ( op.cmd ) + { + case MMUEXT_PIN_L1_TABLE: + case MMUEXT_PIN_L2_TABLE: + case MMUEXT_PIN_L3_TABLE: + case MMUEXT_PIN_L4_TABLE: + case MMUEXT_UNPIN_TABLE: + break; + default: + rc = -EOPNOTSUPP; + goto done; + } + } + + rc = 0; + + switch ( op.cmd ) + { + case MMUEXT_PIN_L1_TABLE: + type = PGT_l1_page_table; + goto pin_page; + + case MMUEXT_PIN_L2_TABLE: + type = PGT_l2_page_table; + goto pin_page; + + case MMUEXT_PIN_L3_TABLE: + type = PGT_l3_page_table; + goto pin_page; + + case MMUEXT_PIN_L4_TABLE: + if ( is_pv_32bit_domain(pg_owner) ) + break; + type = PGT_l4_page_table; + + pin_page: { + struct page_info *page; + + /* Ignore pinning of invalid paging levels. */ + if ( (op.cmd - MMUEXT_PIN_L1_TABLE) > (CONFIG_PAGING_LEVELS - 1) ) + break; + + if ( paging_mode_refcounts(pg_owner) ) + break; + + page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC); + if ( unlikely(!page) ) + { + rc = -EINVAL; + break; + } + + rc = get_page_type_preemptible(page, type); + if ( unlikely(rc) ) + { + if ( rc == -EINTR ) + rc = -ERESTART; + else if ( rc != -ERESTART ) + gdprintk(XENLOG_WARNING, + "Error %d while pinning mfn %" PRI_mfn "\n", + rc, page_to_mfn(page)); + if ( page != curr->arch.old_guest_table ) + put_page(page); + break; + } + + rc = xsm_memory_pin_page(XSM_HOOK, d, pg_owner, page); + if ( !rc && unlikely(test_and_set_bit(_PGT_pinned, + &page->u.inuse.type_info)) ) + { + gdprintk(XENLOG_WARNING, + "mfn %" PRI_mfn " already pinned\n", page_to_mfn(page)); + rc = -EINVAL; + } + + if ( unlikely(rc) ) + goto pin_drop; + + /* A page is dirtied when its pin status is set. */ + paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page))); + + /* We can race domain destruction (domain_relinquish_resources). 
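+ * If pg_owner is already dying, drop the pin reference ourselves, + * under pg_owner's page_alloc lock.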
*/ + if ( unlikely(pg_owner != d) ) + { + int drop_ref; + spin_lock(&pg_owner->page_alloc_lock); + drop_ref = (pg_owner->is_dying && + test_and_clear_bit(_PGT_pinned, + &page->u.inuse.type_info)); + spin_unlock(&pg_owner->page_alloc_lock); + if ( drop_ref ) + { + pin_drop: + if ( type == PGT_l1_page_table ) + put_page_and_type(page); + else + curr->arch.old_guest_table = page; + } + } + + break; + } + + case MMUEXT_UNPIN_TABLE: { + struct page_info *page; + + if ( paging_mode_refcounts(pg_owner) ) + break; + + page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC); + if ( unlikely(!page) ) + { + gdprintk(XENLOG_WARNING, + "mfn %" PRI_mfn " bad, or bad owner d%d\n", + op.arg1.mfn, pg_owner->domain_id); + rc = -EINVAL; + break; + } + + if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) ) + { + put_page(page); + gdprintk(XENLOG_WARNING, + "mfn %" PRI_mfn " not pinned\n", op.arg1.mfn); + rc = -EINVAL; + break; + } + + switch ( rc = put_page_and_type_preemptible(page) ) + { + case -EINTR: + case -ERESTART: + curr->arch.old_guest_table = page; + rc = 0; + break; + default: + BUG_ON(rc); + break; + } + put_page(page); + + /* A page is dirtied when its pin status is cleared. */ + paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page))); + + break; + } + + case MMUEXT_NEW_BASEPTR: + if ( unlikely(d != pg_owner) ) + rc = -EPERM; + else if ( unlikely(paging_mode_translate(d)) ) + rc = -EINVAL; + else + rc = new_guest_cr3(op.arg1.mfn); + break; + + case MMUEXT_NEW_USER_BASEPTR: { + unsigned long old_mfn; + + if ( unlikely(d != pg_owner) ) + rc = -EPERM; + else if ( unlikely(paging_mode_translate(d)) ) + rc = -EINVAL; + if ( unlikely(rc) ) + break; + + old_mfn = pagetable_get_pfn(curr->arch.guest_table_user); + /* + * This is particularly important when getting restarted after the + * previous attempt got preempted in the put-old-MFN phase. + */ + if ( old_mfn == op.arg1.mfn ) + break; + + if ( op.arg1.mfn != 0 ) + { + if ( paging_mode_refcounts(d) ) + rc = get_page_from_pagenr(op.arg1.mfn, d) ? 
0 : -EINVAL; + else + rc = get_page_and_type_from_pagenr( + op.arg1.mfn, PGT_root_page_table, d, 0, 1); + + if ( unlikely(rc) ) + { + if ( rc == -EINTR ) + rc = -ERESTART; + else if ( rc != -ERESTART ) + gdprintk(XENLOG_WARNING, + "Error %d installing new mfn %" PRI_mfn "\n", + rc, op.arg1.mfn); + break; + } + if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) + zap_ro_mpt(op.arg1.mfn); + } + + curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn); + + if ( old_mfn != 0 ) + { + struct page_info *page = mfn_to_page(old_mfn); + + if ( paging_mode_refcounts(d) ) + put_page(page); + else + switch ( rc = put_page_and_type_preemptible(page) ) + { + case -EINTR: + rc = -ERESTART; + /* fallthrough */ + case -ERESTART: + curr->arch.old_guest_table = page; + break; + default: + BUG_ON(rc); + break; + } + } + + break; + } + + case MMUEXT_TLB_FLUSH_LOCAL: + if ( likely(d == pg_owner) ) + flush_tlb_local(); + else + rc = -EPERM; + break; + + case MMUEXT_INVLPG_LOCAL: + if ( unlikely(d != pg_owner) ) + rc = -EPERM; + else + paging_invlpg(curr, op.arg1.linear_addr); + break; + + case MMUEXT_TLB_FLUSH_MULTI: + case MMUEXT_INVLPG_MULTI: + { + cpumask_t *mask = this_cpu(scratch_cpumask); + + if ( unlikely(d != pg_owner) ) + rc = -EPERM; + else if ( unlikely(vcpumask_to_pcpumask(d, + guest_handle_to_param(op.arg2.vcpumask, + const_void), + mask)) ) + rc = -EINVAL; + if ( unlikely(rc) ) + break; + + if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI ) + flush_tlb_mask(mask); + else if ( __addr_ok(op.arg1.linear_addr) ) + flush_tlb_one_mask(mask, op.arg1.linear_addr); + break; + } + + case MMUEXT_TLB_FLUSH_ALL: + if ( likely(d == pg_owner) ) + flush_tlb_mask(d->domain_dirty_cpumask); + else + rc = -EPERM; + break; + + case MMUEXT_INVLPG_ALL: + if ( unlikely(d != pg_owner) ) + rc = -EPERM; + else if ( __addr_ok(op.arg1.linear_addr) ) + flush_tlb_one_mask(d->domain_dirty_cpumask, op.arg1.linear_addr); + break; + + case MMUEXT_FLUSH_CACHE: + if ( unlikely(d != pg_owner) ) + rc = -EPERM; + else if ( unlikely(!cache_flush_permitted(d)) ) + rc = -EACCES; + else + wbinvd(); + break; + + case MMUEXT_FLUSH_CACHE_GLOBAL: + if ( unlikely(d != pg_owner) ) + rc = -EPERM; + else if ( likely(cache_flush_permitted(d)) ) + { + unsigned int cpu; + cpumask_t *mask = this_cpu(scratch_cpumask); + + cpumask_clear(mask); + for_each_online_cpu(cpu) + if ( !cpumask_intersects(mask, + per_cpu(cpu_sibling_mask, cpu)) ) + __cpumask_set_cpu(cpu, mask); + flush_mask(mask, FLUSH_CACHE); + } + else + rc = -EINVAL; + break; + + case MMUEXT_SET_LDT: + { + unsigned int ents = op.arg2.nr_ents; + unsigned long ptr = ents ? 
op.arg1.linear_addr : 0; + + if ( unlikely(d != pg_owner) ) + rc = -EPERM; + else if ( paging_mode_external(d) ) + rc = -EINVAL; + else if ( ((ptr & (PAGE_SIZE - 1)) != 0) || !__addr_ok(ptr) || + (ents > 8192) ) + { + gdprintk(XENLOG_WARNING, + "Bad args to SET_LDT: ptr=%lx, ents=%x\n", ptr, ents); + rc = -EINVAL; + } + else if ( (curr->arch.pv_vcpu.ldt_ents != ents) || + (curr->arch.pv_vcpu.ldt_base != ptr) ) + { + invalidate_shadow_ldt(curr, 0); + flush_tlb_local(); + curr->arch.pv_vcpu.ldt_base = ptr; + curr->arch.pv_vcpu.ldt_ents = ents; + load_LDT(curr); + } + break; + } + + case MMUEXT_CLEAR_PAGE: { + struct page_info *page; + + page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC); + if ( !page || !get_page_type(page, PGT_writable_page) ) + { + if ( page ) + put_page(page); + gdprintk(XENLOG_WARNING, + "Error clearing mfn %" PRI_mfn "\n", op.arg1.mfn); + rc = -EINVAL; + break; + } + + /* A page is dirtied when it's being cleared. */ + paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page))); + + clear_domain_page(_mfn(page_to_mfn(page))); + + put_page_and_type(page); + break; + } + + case MMUEXT_COPY_PAGE: + { + struct page_info *src_page, *dst_page; + + src_page = get_page_from_gfn(pg_owner, op.arg2.src_mfn, NULL, + P2M_ALLOC); + if ( unlikely(!src_page) ) + { + gdprintk(XENLOG_WARNING, + "Error copying from mfn %" PRI_mfn "\n", + op.arg2.src_mfn); + rc = -EINVAL; + break; + } + + dst_page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, + P2M_ALLOC); + rc = (dst_page && + get_page_type(dst_page, PGT_writable_page)) ? 0 : -EINVAL; + if ( unlikely(rc) ) + { + put_page(src_page); + if ( dst_page ) + put_page(dst_page); + gdprintk(XENLOG_WARNING, + "Error copying to mfn %" PRI_mfn "\n", op.arg1.mfn); + break; + } + + /* A page is dirtied when it's being copied to. */ + paging_mark_dirty(pg_owner, _mfn(page_to_mfn(dst_page))); + + copy_domain_page(_mfn(page_to_mfn(dst_page)), + _mfn(page_to_mfn(src_page))); + + put_page_and_type(dst_page); + put_page(src_page); + break; + } + + case MMUEXT_MARK_SUPER: + case MMUEXT_UNMARK_SUPER: + { + unsigned long mfn = op.arg1.mfn; + + if ( !opt_allow_superpage ) + rc = -EOPNOTSUPP; + else if ( unlikely(d != pg_owner) ) + rc = -EPERM; + else if ( mfn & (L1_PAGETABLE_ENTRIES - 1) ) + { + gdprintk(XENLOG_WARNING, + "Unaligned superpage mfn %" PRI_mfn "\n", mfn); + rc = -EINVAL; + } + else if ( !mfn_valid(_mfn(mfn | (L1_PAGETABLE_ENTRIES - 1))) ) + rc = -EINVAL; + else if ( op.cmd == MMUEXT_MARK_SUPER ) + rc = mark_superpage(mfn_to_spage(mfn), d); + else + rc = unmark_superpage(mfn_to_spage(mfn)); + break; + } + + default: + rc = -ENOSYS; + break; + } + + done: + if ( unlikely(rc) ) + break; + + guest_handle_add_offset(uops, 1); + } + + if ( rc == -ERESTART ) + { + ASSERT(i < count); + rc = hypercall_create_continuation( + __HYPERVISOR_mmuext_op, "hihi", + uops, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom); + } + else if ( curr->arch.old_guest_table ) + { + XEN_GUEST_HANDLE_PARAM(void) null; + + ASSERT(rc || i == count); + set_xen_guest_handle(null, NULL); + /* + * In order to have a way to communicate the final return value to + * our continuation, we pass this in place of "foreigndom", building + * on the fact that this argument isn't needed anymore. + */ + rc = hypercall_create_continuation( + __HYPERVISOR_mmuext_op, "hihi", null, + MMU_UPDATE_PREEMPTED, null, rc); + } + + put_pg_owner(pg_owner); + + perfc_add(num_mmuext_ops, i); + + /* Add incremental work we have done to the @done output parameter. 
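+ * (The count accumulates across continuations: "done" was re-read + * from pdone above when this invocation was restarted.)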
*/ + if ( unlikely(!guest_handle_is_null(pdone)) ) + { + done += i; + copy_to_guest(pdone, &done, 1); + } + + return rc; +} + +long do_mmu_update( + XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs, + unsigned int count, + XEN_GUEST_HANDLE_PARAM(uint) pdone, + unsigned int foreigndom) +{ + struct mmu_update req; + void *va; + unsigned long gpfn, gmfn, mfn; + struct page_info *page; + unsigned int cmd, i = 0, done = 0, pt_dom; + struct vcpu *curr = current, *v = curr; + struct domain *d = v->domain, *pt_owner = d, *pg_owner; + struct domain_mmap_cache mapcache; + uint32_t xsm_needed = 0; + uint32_t xsm_checked = 0; + int rc = put_old_guest_table(curr); + + if ( unlikely(rc) ) + { + if ( likely(rc == -ERESTART) ) + rc = hypercall_create_continuation( + __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone, + foreigndom); + return rc; + } + + if ( unlikely(count == MMU_UPDATE_PREEMPTED) && + likely(guest_handle_is_null(ureqs)) ) + { + /* See the curr->arch.old_guest_table related + * hypercall_create_continuation() below. */ + return (int)foreigndom; + } + + if ( unlikely(count & MMU_UPDATE_PREEMPTED) ) + { + count &= ~MMU_UPDATE_PREEMPTED; + if ( unlikely(!guest_handle_is_null(pdone)) ) + (void)copy_from_guest(&done, pdone, 1); + } + else + perfc_incr(calls_to_mmu_update); + + if ( unlikely(!guest_handle_okay(ureqs, count)) ) + return -EFAULT; + + if ( (pt_dom = foreigndom >> 16) != 0 ) + { + /* Pagetables belong to a foreign domain (PFD). */ + if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL ) + return -ESRCH; + + if ( pt_owner == d ) + rcu_unlock_domain(pt_owner); + else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL ) + { + rc = -EINVAL; + goto out; + } + } + + if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL ) + { + rc = -ESRCH; + goto out; + } + + domain_mmap_cache_init(&mapcache); + + for ( i = 0; i < count; i++ ) + { + if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) ) + { + rc = -ERESTART; + break; + } + + if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) ) + { + rc = -EFAULT; + break; + } + + cmd = req.ptr & (sizeof(l1_pgentry_t)-1); + + switch ( cmd ) + { + /* + * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table. + * MMU_PT_UPDATE_PRESERVE_AD: As above but also preserve (OR) + * current A/D bits.
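+ * Both act on the PTE at machine address req.ptr; the command itself + * travels in the low bits of req.ptr.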
+ */ + case MMU_NORMAL_PT_UPDATE: + case MMU_PT_UPDATE_PRESERVE_AD: + { + p2m_type_t p2mt; + + rc = -EOPNOTSUPP; + if ( unlikely(paging_mode_refcounts(pt_owner)) ) + break; + + xsm_needed |= XSM_MMU_NORMAL_UPDATE; + if ( get_pte_flags(req.val) & _PAGE_PRESENT ) + { + xsm_needed |= XSM_MMU_UPDATE_READ; + if ( get_pte_flags(req.val) & _PAGE_RW ) + xsm_needed |= XSM_MMU_UPDATE_WRITE; + } + if ( xsm_needed != xsm_checked ) + { + rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed); + if ( rc ) + break; + xsm_checked = xsm_needed; + } + rc = -EINVAL; + + req.ptr -= cmd; + gmfn = req.ptr >> PAGE_SHIFT; + page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC); + + if ( p2m_is_paged(p2mt) ) + { + ASSERT(!page); + p2m_mem_paging_populate(pg_owner, gmfn); + rc = -ENOENT; + break; + } + + if ( unlikely(!page) ) + { + gdprintk(XENLOG_WARNING, + "Could not get page for normal update\n"); + break; + } + + mfn = page_to_mfn(page); + va = map_domain_page_with_cache(mfn, &mapcache); + va = (void *)((unsigned long)va + + (unsigned long)(req.ptr & ~PAGE_MASK)); + + if ( page_lock(page) ) + { + switch ( page->u.inuse.type_info & PGT_type_mask ) + { + case PGT_l1_page_table: + { + l1_pgentry_t l1e = l1e_from_intpte(req.val); + p2m_type_t l1e_p2mt = p2m_ram_rw; + struct page_info *target = NULL; + p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ? + P2M_UNSHARE : P2M_ALLOC; + + if ( paging_mode_translate(pg_owner) ) + target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e), + &l1e_p2mt, q); + + if ( p2m_is_paged(l1e_p2mt) ) + { + if ( target ) + put_page(target); + p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e)); + rc = -ENOENT; + break; + } + else if ( p2m_ram_paging_in == l1e_p2mt && !target ) + { + rc = -ENOENT; + break; + } + /* If we tried to unshare and failed */ + else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) ) + { + /* We could not have obtained a page ref. */ + ASSERT(target == NULL); + /* And mem_sharing_notify has already been called. 
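+ * All that remains is to fail the update.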
*/ + rc = -ENOMEM; + break; + } + + rc = mod_l1_entry(va, l1e, mfn, + cmd == MMU_PT_UPDATE_PRESERVE_AD, v, + pg_owner); + if ( target ) + put_page(target); + } + break; + case PGT_l2_page_table: + rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn, + cmd == MMU_PT_UPDATE_PRESERVE_AD, v); + break; + case PGT_l3_page_table: + rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn, + cmd == MMU_PT_UPDATE_PRESERVE_AD, v); + break; + case PGT_l4_page_table: + rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn, + cmd == MMU_PT_UPDATE_PRESERVE_AD, v); + break; + case PGT_writable_page: + perfc_incr(writable_mmu_updates); + if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) ) + rc = 0; + break; + } + page_unlock(page); + if ( rc == -EINTR ) + rc = -ERESTART; + } + else if ( get_page_type(page, PGT_writable_page) ) + { + perfc_incr(writable_mmu_updates); + if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) ) + rc = 0; + put_page_type(page); + } + + unmap_domain_page_with_cache(va, &mapcache); + put_page(page); + } + break; + + case MMU_MACHPHYS_UPDATE: + if ( unlikely(d != pt_owner) ) + { + rc = -EPERM; + break; + } + + if ( unlikely(paging_mode_translate(pg_owner)) ) + { + rc = -EINVAL; + break; + } + + mfn = req.ptr >> PAGE_SHIFT; + gpfn = req.val; + + xsm_needed |= XSM_MMU_MACHPHYS_UPDATE; + if ( xsm_needed != xsm_checked ) + { + rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed); + if ( rc ) + break; + xsm_checked = xsm_needed; + } + + if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) ) + { + gdprintk(XENLOG_WARNING, + "Could not get page for mach->phys update\n"); + rc = -EINVAL; + break; + } + + set_gpfn_from_mfn(mfn, gpfn); + + paging_mark_dirty(pg_owner, _mfn(mfn)); + + put_page(mfn_to_page(mfn)); + break; + + default: + rc = -ENOSYS; + break; + } + + if ( unlikely(rc) ) + break; + + guest_handle_add_offset(ureqs, 1); + } + + if ( rc == -ERESTART ) + { + ASSERT(i < count); + rc = hypercall_create_continuation( + __HYPERVISOR_mmu_update, "hihi", + ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom); + } + else if ( curr->arch.old_guest_table ) + { + XEN_GUEST_HANDLE_PARAM(void) null; + + ASSERT(rc || i == count); + set_xen_guest_handle(null, NULL); + /* + * In order to have a way to communicate the final return value to + * our continuation, we pass this in place of "foreigndom", building + * on the fact that this argument isn't needed anymore. + */ + rc = hypercall_create_continuation( + __HYPERVISOR_mmu_update, "hihi", null, + MMU_UPDATE_PREEMPTED, null, rc); + } + + put_pg_owner(pg_owner); + + domain_mmap_cache_destroy(&mapcache); + + perfc_add(num_page_updates, i); + + out: + if ( pt_owner != d ) + rcu_unlock_domain(pt_owner); + + /* Add incremental work we have done to the @done output parameter. 
*/ + if ( unlikely(!guest_handle_is_null(pdone)) ) + { + done += i; + copy_to_guest(pdone, &done, 1); + } + + return rc; +} + + +static int create_grant_pte_mapping( + uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v) +{ + int rc = GNTST_okay; + void *va; + unsigned long gmfn, mfn; + struct page_info *page; + l1_pgentry_t ol1e; + struct domain *d = v->domain; + + adjust_guest_l1e(nl1e, d); + + gmfn = pte_addr >> PAGE_SHIFT; + page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); + + if ( unlikely(!page) ) + { + gdprintk(XENLOG_WARNING, "Could not get page for normal update\n"); + return GNTST_general_error; + } + + mfn = page_to_mfn(page); + va = map_domain_page(_mfn(mfn)); + va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK)); + + if ( !page_lock(page) ) + { + rc = GNTST_general_error; + goto failed; + } + + if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(page); + rc = GNTST_general_error; + goto failed; + } + + ol1e = *(l1_pgentry_t *)va; + if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) ) + { + page_unlock(page); + rc = GNTST_general_error; + goto failed; + } + + page_unlock(page); + + if ( !paging_mode_refcounts(d) ) + put_page_from_l1e(ol1e, d); + + failed: + unmap_domain_page(va); + put_page(page); + + return rc; +} + +static int destroy_grant_pte_mapping( + uint64_t addr, unsigned long frame, struct domain *d) +{ + int rc = GNTST_okay; + void *va; + unsigned long gmfn, mfn; + struct page_info *page; + l1_pgentry_t ol1e; + + gmfn = addr >> PAGE_SHIFT; + page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC); + + if ( unlikely(!page) ) + { + gdprintk(XENLOG_WARNING, "Could not get page for normal update\n"); + return GNTST_general_error; + } + + mfn = page_to_mfn(page); + va = map_domain_page(_mfn(mfn)); + va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK)); + + if ( !page_lock(page) ) + { + rc = GNTST_general_error; + goto failed; + } + + if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(page); + rc = GNTST_general_error; + goto failed; + } + + ol1e = *(l1_pgentry_t *)va; + + /* Check that the virtual address supplied is actually mapped to frame. */ + if ( unlikely(l1e_get_pfn(ol1e) != frame) ) + { + page_unlock(page); + gdprintk(XENLOG_WARNING, + "PTE entry %"PRIpte" for address %"PRIx64" doesn't match frame %lx\n", + l1e_get_intpte(ol1e), addr, frame); + rc = GNTST_general_error; + goto failed; + } + + /* Delete pagetable entry. */ + if ( unlikely(!UPDATE_ENTRY + (l1, + (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, + d->vcpu[0] /* Change if we go to per-vcpu shadows. 
*/, + 0)) ) + { + page_unlock(page); + gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", va); + rc = GNTST_general_error; + goto failed; + } + + page_unlock(page); + + failed: + unmap_domain_page(va); + put_page(page); + return rc; +} + + +static int create_grant_va_mapping( + unsigned long va, l1_pgentry_t nl1e, struct vcpu *v) +{ + l1_pgentry_t *pl1e, ol1e; + struct domain *d = v->domain; + unsigned long gl1mfn; + struct page_info *l1pg; + int okay; + + adjust_guest_l1e(nl1e, d); + + pl1e = guest_map_l1e(va, &gl1mfn); + if ( !pl1e ) + { + gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", va); + return GNTST_general_error; + } + + if ( !get_page_from_pagenr(gl1mfn, current->domain) ) + { + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + l1pg = mfn_to_page(gl1mfn); + if ( !page_lock(l1pg) ) + { + put_page(l1pg); + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(l1pg); + put_page(l1pg); + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + ol1e = *pl1e; + okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0); + + page_unlock(l1pg); + put_page(l1pg); + guest_unmap_l1e(pl1e); + + if ( okay && !paging_mode_refcounts(d) ) + put_page_from_l1e(ol1e, d); + + return okay ? GNTST_okay : GNTST_general_error; +} + +static int replace_grant_va_mapping( + unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v) +{ + l1_pgentry_t *pl1e, ol1e; + unsigned long gl1mfn; + struct page_info *l1pg; + int rc = 0; + + pl1e = guest_map_l1e(addr, &gl1mfn); + if ( !pl1e ) + { + gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", addr); + return GNTST_general_error; + } + + if ( !get_page_from_pagenr(gl1mfn, current->domain) ) + { + rc = GNTST_general_error; + goto out; + } + + l1pg = mfn_to_page(gl1mfn); + if ( !page_lock(l1pg) ) + { + rc = GNTST_general_error; + put_page(l1pg); + goto out; + } + + if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + rc = GNTST_general_error; + goto unlock_and_out; + } + + ol1e = *pl1e; + + /* Check that the virtual address supplied is actually mapped to frame. */ + if ( unlikely(l1e_get_pfn(ol1e) != frame) ) + { + gdprintk(XENLOG_WARNING, + "PTE entry %lx for address %lx doesn't match frame %lx\n", + l1e_get_pfn(ol1e), addr, frame); + rc = GNTST_general_error; + goto unlock_and_out; + } + + /* Delete pagetable entry. 
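+ * (nl1e is l1e_empty() on the destroy path; the replace path passes + * the entry lifted from the new mapping location instead.)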
*/ + if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) ) + { + gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e); + rc = GNTST_general_error; + goto unlock_and_out; + } + + unlock_and_out: + page_unlock(l1pg); + put_page(l1pg); + out: + guest_unmap_l1e(pl1e); + return rc; +} + +static int destroy_grant_va_mapping( + unsigned long addr, unsigned long frame, struct vcpu *v) +{ + return replace_grant_va_mapping(addr, frame, l1e_empty(), v); +} + +int create_grant_pv_mapping(uint64_t addr, unsigned long frame, + unsigned int flags, unsigned int cache_flags) +{ + l1_pgentry_t pte; + uint32_t grant_pte_flags; + + grant_pte_flags = + _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB; + if ( cpu_has_nx ) + grant_pte_flags |= _PAGE_NX_BIT; + + pte = l1e_from_pfn(frame, grant_pte_flags); + if ( (flags & GNTMAP_application_map) ) + l1e_add_flags(pte,_PAGE_USER); + if ( !(flags & GNTMAP_readonly) ) + l1e_add_flags(pte,_PAGE_RW); + + l1e_add_flags(pte, + ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0) + & _PAGE_AVAIL); + + l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5)); + + if ( flags & GNTMAP_contains_pte ) + return create_grant_pte_mapping(addr, pte, current); + return create_grant_va_mapping(addr, pte, current); +} + +int replace_grant_pv_mapping(uint64_t addr, unsigned long frame, + uint64_t new_addr, unsigned int flags) +{ + struct vcpu *curr = current; + l1_pgentry_t *pl1e, ol1e; + unsigned long gl1mfn; + struct page_info *l1pg; + int rc; + + if ( flags & GNTMAP_contains_pte ) + { + if ( !new_addr ) + return destroy_grant_pte_mapping(addr, frame, curr->domain); + + return GNTST_general_error; + } + + if ( !new_addr ) + return destroy_grant_va_mapping(addr, frame, curr); + + pl1e = guest_map_l1e(new_addr, &gl1mfn); + if ( !pl1e ) + { + gdprintk(XENLOG_WARNING, + "Could not find L1 PTE for address %"PRIx64"\n", new_addr); + return GNTST_general_error; + } + + if ( !get_page_from_pagenr(gl1mfn, current->domain) ) + { + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + l1pg = mfn_to_page(gl1mfn); + if ( !page_lock(l1pg) ) + { + put_page(l1pg); + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(l1pg); + put_page(l1pg); + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + ol1e = *pl1e; + + if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), + gl1mfn, curr, 0)) ) + { + page_unlock(l1pg); + put_page(l1pg); + gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e); + guest_unmap_l1e(pl1e); + return GNTST_general_error; + } + + page_unlock(l1pg); + put_page(l1pg); + guest_unmap_l1e(pl1e); + + rc = replace_grant_va_mapping(addr, frame, ol1e, curr); + if ( rc && !paging_mode_refcounts(curr->domain) ) + put_page_from_l1e(ol1e, curr->domain); + + return rc; +} + +static int __do_update_va_mapping( + unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner) +{ + l1_pgentry_t val = l1e_from_intpte(val64); + struct vcpu *v = current; + struct domain *d = v->domain; + struct page_info *gl1pg; + l1_pgentry_t *pl1e; + unsigned long bmap_ptr, gl1mfn; + cpumask_t *mask = NULL; + int rc; + + perfc_incr(calls_to_update_va); + + rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val); + if ( rc ) + return rc; + + rc = -EINVAL; + pl1e = guest_map_l1e(va, &gl1mfn); + if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) ) + goto out; + + gl1pg = mfn_to_page(gl1mfn); + if ( !page_lock(gl1pg) ) + { + 
put_page(gl1pg); + goto out; + } + + if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(gl1pg); + put_page(gl1pg); + goto out; + } + + rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner); + + page_unlock(gl1pg); + put_page(gl1pg); + + out: + if ( pl1e ) + guest_unmap_l1e(pl1e); + + switch ( flags & UVMF_FLUSHTYPE_MASK ) + { + case UVMF_TLB_FLUSH: + switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) ) + { + case UVMF_LOCAL: + flush_tlb_local(); + break; + case UVMF_ALL: + mask = d->domain_dirty_cpumask; + break; + default: + mask = this_cpu(scratch_cpumask); + rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr, + void), + mask); + break; + } + if ( mask ) + flush_tlb_mask(mask); + break; + + case UVMF_INVLPG: + switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) ) + { + case UVMF_LOCAL: + paging_invlpg(v, va); + break; + case UVMF_ALL: + mask = d->domain_dirty_cpumask; + break; + default: + mask = this_cpu(scratch_cpumask); + rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr, + void), + mask); + break; + } + if ( mask ) + flush_tlb_one_mask(mask, va); + break; + } + + return rc; +} + +long do_update_va_mapping(unsigned long va, u64 val64, + unsigned long flags) +{ + return __do_update_va_mapping(va, val64, flags, current->domain); +} + +long do_update_va_mapping_otherdomain(unsigned long va, u64 val64, + unsigned long flags, + domid_t domid) +{ + struct domain *pg_owner; + int rc; + + if ( (pg_owner = get_pg_owner(domid)) == NULL ) + return -ESRCH; + + rc = __do_update_va_mapping(va, val64, flags, pg_owner); + + put_pg_owner(pg_owner); + + return rc; +} + + +long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list, + unsigned int entries) +{ + int nr_pages = (entries + 511) / 512; + unsigned long frames[16]; + struct vcpu *curr = current; + long ret; + + /* Rechecked in set_gdt, but ensures a sane limit for copy_from_user(). */ + if ( entries > FIRST_RESERVED_GDT_ENTRY ) + return -EINVAL; + + if ( copy_from_guest(frames, frame_list, nr_pages) ) + return -EFAULT; + + domain_lock(curr->domain); + + if ( (ret = set_gdt(curr, frames, entries)) == 0 ) + flush_tlb_local(); + + domain_unlock(curr->domain); + + return ret; +} + + +long do_update_descriptor(u64 pa, u64 desc) +{ + struct domain *dom = current->domain; + unsigned long gmfn = pa >> PAGE_SHIFT; + unsigned long mfn; + unsigned int offset; + struct desc_struct *gdt_pent, d; + struct page_info *page; + long ret = -EINVAL; + + offset = ((unsigned int)pa & ~PAGE_MASK) / sizeof(struct desc_struct); + + *(u64 *)&d = desc; + + page = get_page_from_gfn(dom, gmfn, NULL, P2M_ALLOC); + if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) || + !page || + !check_descriptor(dom, &d) ) + { + if ( page ) + put_page(page); + return -EINVAL; + } + mfn = page_to_mfn(page); + + /* Check if the given frame is in use in an unsafe context. */ + switch ( page->u.inuse.type_info & PGT_type_mask ) + { + case PGT_seg_desc_page: + if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) ) + goto out; + break; + default: + if ( unlikely(!get_page_type(page, PGT_writable_page)) ) + goto out; + break; + } + + paging_mark_dirty(dom, _mfn(mfn)); + + /* All is good so make the update. 
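+ * A single atomic 64-bit write, so no CPU can ever load a torn + * descriptor.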
*/ + gdt_pent = map_domain_page(_mfn(mfn)); + write_atomic((uint64_t *)&gdt_pent[offset], *(uint64_t *)&d); + unmap_domain_page(gdt_pent); + + put_page_type(page); + + ret = 0; /* success */ + + out: + put_page(page); + + return ret; +} + + +/************************* + * Descriptor Tables + */ + +void destroy_gdt(struct vcpu *v) +{ + l1_pgentry_t *pl1e; + unsigned int i; + unsigned long pfn, zero_pfn = PFN_DOWN(__pa(zero_page)); + + v->arch.pv_vcpu.gdt_ents = 0; + pl1e = gdt_ldt_ptes(v->domain, v); + for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ ) + { + pfn = l1e_get_pfn(pl1e[i]); + if ( (l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) && pfn != zero_pfn ) + put_page_and_type(mfn_to_page(pfn)); + l1e_write(&pl1e[i], l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO)); + v->arch.pv_vcpu.gdt_frames[i] = 0; + } +} + + +long set_gdt(struct vcpu *v, + unsigned long *frames, + unsigned int entries) +{ + struct domain *d = v->domain; + l1_pgentry_t *pl1e; + /* NB. There are 512 8-byte entries per GDT page. */ + unsigned int i, nr_pages = (entries + 511) / 512; + + if ( entries > FIRST_RESERVED_GDT_ENTRY ) + return -EINVAL; + + /* Check the pages in the new GDT. */ + for ( i = 0; i < nr_pages; i++ ) + { + struct page_info *page; + + page = get_page_from_gfn(d, frames[i], NULL, P2M_ALLOC); + if ( !page ) + goto fail; + if ( !get_page_type(page, PGT_seg_desc_page) ) + { + put_page(page); + goto fail; + } + frames[i] = page_to_mfn(page); + } + + /* Tear down the old GDT. */ + destroy_gdt(v); + + /* Install the new GDT. */ + v->arch.pv_vcpu.gdt_ents = entries; + pl1e = gdt_ldt_ptes(d, v); + for ( i = 0; i < nr_pages; i++ ) + { + v->arch.pv_vcpu.gdt_frames[i] = frames[i]; + l1e_write(&pl1e[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR_RW)); + } + + return 0; + + fail: + while ( i-- > 0 ) + { + put_page_and_type(mfn_to_page(frames[i])); + } + return -EINVAL; +} + +/************************* + * Writable Pagetables + */ + +struct ptwr_emulate_ctxt { + struct x86_emulate_ctxt ctxt; + unsigned long cr2; + l1_pgentry_t pte; +}; + +static int ptwr_emulated_read( + enum x86_segment seg, + unsigned long offset, + void *p_data, + unsigned int bytes, + struct x86_emulate_ctxt *ctxt) +{ + unsigned int rc = bytes; + unsigned long addr = offset; + + if ( !__addr_ok(addr) || + (rc = __copy_from_user(p_data, (void *)addr, bytes)) ) + { + x86_emul_pagefault(0, addr + bytes - rc, ctxt); /* Read fault. */ + return X86EMUL_EXCEPTION; + } + + return X86EMUL_OKAY; +} + +static int ptwr_emulated_update( + unsigned long addr, + paddr_t old, + paddr_t val, + unsigned int bytes, + unsigned int do_cmpxchg, + struct ptwr_emulate_ctxt *ptwr_ctxt) +{ + unsigned long mfn; + unsigned long unaligned_addr = addr; + struct page_info *page; + l1_pgentry_t pte, ol1e, nl1e, *pl1e; + struct vcpu *v = current; + struct domain *d = v->domain; + int ret; + + /* Only allow naturally-aligned stores within the original %cr2 page. */ + if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) ) + { + gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n", + ptwr_ctxt->cr2, addr, bytes); + return X86EMUL_UNHANDLEABLE; + } + + /* Turn a sub-word access into a full-word access. */ + if ( bytes != sizeof(paddr_t) ) + { + paddr_t full; + unsigned int rc, offset = addr & (sizeof(paddr_t)-1); + + /* Align address; read full word. */ + addr &= ~(sizeof(paddr_t)-1); + if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 ) + { + x86_emul_pagefault(0, /* Read fault. 
*/ + addr + sizeof(paddr_t) - rc, + &ptwr_ctxt->ctxt); + return X86EMUL_EXCEPTION; + } + /* Mask out bits provided by caller. */ + full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8)); + /* Shift the caller value and OR in the missing bits. */ + val &= (((paddr_t)1 << (bytes*8)) - 1); + val <<= (offset)*8; + val |= full; + /* Also fill in missing parts of the cmpxchg old value. */ + old &= (((paddr_t)1 << (bytes*8)) - 1); + old <<= (offset)*8; + old |= full; + } + + pte = ptwr_ctxt->pte; + mfn = l1e_get_pfn(pte); + page = mfn_to_page(mfn); + + /* We are looking only for read-only mappings of p.t. pages. */ + ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT); + ASSERT(mfn_valid(_mfn(mfn))); + ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table); + ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0); + ASSERT(page_get_owner(page) == d); + + /* Check the new PTE. */ + nl1e = l1e_from_intpte(val); + switch ( ret = get_page_from_l1e(nl1e, d, d) ) + { + default: + if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) && + !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) ) + { + /* + * If this is an upper-half write to a PAE PTE then we assume that + * the guest has simply got the two writes the wrong way round. We + * zap the PRESENT bit on the assumption that the bottom half will + * be written immediately after we return to the guest. + */ + gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %" + PRIpte"\n", l1e_get_intpte(nl1e)); + l1e_remove_flags(nl1e, _PAGE_PRESENT); + } + else + { + gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n"); + return X86EMUL_UNHANDLEABLE; + } + break; + case 0: + break; + case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS: + ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS))); + l1e_flip_flags(nl1e, ret); + break; + } + + adjust_guest_l1e(nl1e, d); + + /* Checked successfully: do the update (write or cmpxchg). */ + pl1e = map_domain_page(_mfn(mfn)); + pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK)); + if ( do_cmpxchg ) + { + int okay; + intpte_t t = old; + ol1e = l1e_from_intpte(old); + + okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e), + &t, l1e_get_intpte(nl1e), _mfn(mfn)); + okay = (okay && t == old); + + if ( !okay ) + { + unmap_domain_page(pl1e); + put_page_from_l1e(nl1e, d); + return X86EMUL_RETRY; + } + } + else + { + ol1e = *pl1e; + if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) ) + BUG(); + } + + trace_ptwr_emulation(addr, nl1e); + + unmap_domain_page(pl1e); + + /* Finally, drop the old PTE. 
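+ * (This releases the reference taken when ol1e was originally + * validated by get_page_from_l1e().)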
*/ + put_page_from_l1e(ol1e, d); + + return X86EMUL_OKAY; +} + +static int ptwr_emulated_write( + enum x86_segment seg, + unsigned long offset, + void *p_data, + unsigned int bytes, + struct x86_emulate_ctxt *ctxt) +{ + paddr_t val = 0; + + if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes ) + { + gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n", + offset, bytes); + return X86EMUL_UNHANDLEABLE; + } + + memcpy(&val, p_data, bytes); + + return ptwr_emulated_update( + offset, 0, val, bytes, 0, + container_of(ctxt, struct ptwr_emulate_ctxt, ctxt)); +} + +static int ptwr_emulated_cmpxchg( + enum x86_segment seg, + unsigned long offset, + void *p_old, + void *p_new, + unsigned int bytes, + struct x86_emulate_ctxt *ctxt) +{ + paddr_t old = 0, new = 0; + + if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) ) + { + gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n", + offset, bytes); + return X86EMUL_UNHANDLEABLE; + } + + memcpy(&old, p_old, bytes); + memcpy(&new, p_new, bytes); + + return ptwr_emulated_update( + offset, old, new, bytes, 1, + container_of(ctxt, struct ptwr_emulate_ctxt, ctxt)); +} + +static int pv_emul_is_mem_write(const struct x86_emulate_state *state, + struct x86_emulate_ctxt *ctxt) +{ + return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY + : X86EMUL_UNHANDLEABLE; +} + +static const struct x86_emulate_ops ptwr_emulate_ops = { + .read = ptwr_emulated_read, + .insn_fetch = ptwr_emulated_read, + .write = ptwr_emulated_write, + .cmpxchg = ptwr_emulated_cmpxchg, + .validate = pv_emul_is_mem_write, + .cpuid = pv_emul_cpuid, +}; + +/* Write page fault handler: check if guest is trying to modify a PTE. */ +int ptwr_do_page_fault(struct vcpu *v, unsigned long addr, + struct cpu_user_regs *regs) +{ + struct domain *d = v->domain; + struct page_info *page; + l1_pgentry_t pte; + struct ptwr_emulate_ctxt ptwr_ctxt = { + .ctxt = { + .regs = regs, + .vendor = d->arch.cpuid->x86_vendor, + .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG, + .sp_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG, + .swint_emulate = x86_swint_emulate_none, + }, + }; + int rc; + + /* Attempt to read the PTE that maps the VA being accessed. */ + guest_get_eff_l1e(addr, &pte); + + /* We are looking only for read-only mappings of p.t. pages. */ + if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) || + rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) || + !get_page_from_pagenr(l1e_get_pfn(pte), d) ) + goto bail; + + page = l1e_get_page(pte); + if ( !page_lock(page) ) + { + put_page(page); + goto bail; + } + + if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table ) + { + page_unlock(page); + put_page(page); + goto bail; + } + + ptwr_ctxt.cr2 = addr; + ptwr_ctxt.pte = pte; + + rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops); + + page_unlock(page); + put_page(page); + + switch ( rc ) + { + case X86EMUL_EXCEPTION: + /* + * This emulation only covers writes to pagetables which are marked + * read-only by Xen. We tolerate #PF (in case a concurrent pagetable + * update has succeeded on a different vcpu). Anything else is an + * emulation bug, or a guest playing with the instruction stream under + * Xen's feet. 
+ */ + if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION && + ptwr_ctxt.ctxt.event.vector == TRAP_page_fault ) + pv_inject_event(&ptwr_ctxt.ctxt.event); + else + gdprintk(XENLOG_WARNING, + "Unexpected event (type %u, vector %#x) from emulation\n", + ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector); + + /* Fallthrough */ + case X86EMUL_OKAY: + + if ( ptwr_ctxt.ctxt.retire.singlestep ) + pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC); + + /* Fallthrough */ + case X86EMUL_RETRY: + perfc_incr(ptwr_emulations); + return EXCRET_fault_fixed; + } + + bail: + return 0; +} + +/************************* + * fault handling for read-only MMIO pages + */ + +int mmio_ro_emulated_write( + enum x86_segment seg, + unsigned long offset, + void *p_data, + unsigned int bytes, + struct x86_emulate_ctxt *ctxt) +{ + struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data; + + /* Only allow naturally-aligned stores at the original %cr2 address. */ + if ( ((bytes | offset) & (bytes - 1)) || !bytes || + offset != mmio_ro_ctxt->cr2 ) + { + gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n", + mmio_ro_ctxt->cr2, offset, bytes); + return X86EMUL_UNHANDLEABLE; + } + + return X86EMUL_OKAY; +} + +static const struct x86_emulate_ops mmio_ro_emulate_ops = { + .read = x86emul_unhandleable_rw, + .insn_fetch = ptwr_emulated_read, + .write = mmio_ro_emulated_write, + .validate = pv_emul_is_mem_write, + .cpuid = pv_emul_cpuid, +}; + +int mmcfg_intercept_write( + enum x86_segment seg, + unsigned long offset, + void *p_data, + unsigned int bytes, + struct x86_emulate_ctxt *ctxt) +{ + struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data; + + /* + * Only allow naturally-aligned stores no wider than 4 bytes to the + * original %cr2 address. + */ + if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes || + offset != mmio_ctxt->cr2 ) + { + gdprintk(XENLOG_WARNING, "bad write (cr2=%lx, addr=%lx, bytes=%u)\n", + mmio_ctxt->cr2, offset, bytes); + return X86EMUL_UNHANDLEABLE; + } + + offset &= 0xfff; + if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf, + offset, bytes, p_data) >= 0 ) + pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf), + PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes, + *(uint32_t *)p_data); + + return X86EMUL_OKAY; +} + +static const struct x86_emulate_ops mmcfg_intercept_ops = { + .read = x86emul_unhandleable_rw, + .insn_fetch = ptwr_emulated_read, + .write = mmcfg_intercept_write, + .validate = pv_emul_is_mem_write, + .cpuid = pv_emul_cpuid, +}; + +/* Check if guest is trying to modify a r/o MMIO page. */ +int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr, + struct cpu_user_regs *regs) +{ + l1_pgentry_t pte; + unsigned long mfn; + unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG; + struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr }; + struct x86_emulate_ctxt ctxt = { + .regs = regs, + .vendor = v->domain->arch.cpuid->x86_vendor, + .addr_size = addr_size, + .sp_size = addr_size, + .swint_emulate = x86_swint_emulate_none, + .data = &mmio_ro_ctxt + }; + int rc; + + /* Attempt to read the PTE that maps the VA being accessed. */ + guest_get_eff_l1e(addr, &pte); + + /* We are looking only for read-only mappings of MMIO pages. 
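+ * (Present in the guest's view, but with _PAGE_RW clear.)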
*/ + if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ) + return 0; + + mfn = l1e_get_pfn(pte); + if ( mfn_valid(_mfn(mfn)) ) + { + struct page_info *page = mfn_to_page(mfn); + struct domain *owner = page_get_owner_and_reference(page); + + if ( owner ) + put_page(page); + if ( owner != dom_io ) + return 0; + } + + if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) + return 0; + + if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) ) + rc = x86_emulate(&ctxt, &mmcfg_intercept_ops); + else + rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops); + + switch ( rc ) + { + case X86EMUL_EXCEPTION: + /* + * This emulation only covers writes to MMCFG space or read-only MFNs. + * We tolerate #PF (from hitting an adjacent page or a successful + * concurrent pagetable update). Anything else is an emulation bug, + * or a guest playing with the instruction stream under Xen's feet. + */ + if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION && + ctxt.event.vector == TRAP_page_fault ) + pv_inject_event(&ctxt.event); + else + gdprintk(XENLOG_WARNING, + "Unexpected event (type %u, vector %#x) from emulation\n", + ctxt.event.type, ctxt.event.vector); + + /* Fallthrough */ + case X86EMUL_OKAY: + + if ( ctxt.retire.singlestep ) + pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC); + + /* Fallthrough */ + case X86EMUL_RETRY: + perfc_incr(ptwr_emulations); + return EXCRET_fault_fixed; + } + + return 0; +} + + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h index e1b3391efc..9580dc32dc 100644 --- a/xen/include/asm-x86/grant_table.h +++ b/xen/include/asm-x86/grant_table.h @@ -17,6 +17,10 @@ int create_grant_host_mapping(uint64_t addr, unsigned long frame, unsigned int flags, unsigned int cache_flags); int replace_grant_host_mapping( uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags); +int create_grant_pv_mapping(uint64_t addr, unsigned long frame, + unsigned int flags, unsigned int cache_flags); +int replace_grant_pv_mapping(uint64_t addr, unsigned long frame, + uint64_t new_addr, unsigned int flags); #define gnttab_create_shared_page(d, t, i) \ do { \ diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h index 8e55593154..8e2bf91070 100644 --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -319,6 +319,8 @@ static inline void *__page_to_virt(const struct page_info *pg) (PAGE_SIZE / (sizeof(*pg) & -sizeof(*pg)))); } +int alloc_page_type(struct page_info *page, unsigned long type, + int preemptible); int free_page_type(struct page_info *page, unsigned long type, int preemptible); @@ -364,6 +366,13 @@ int put_old_guest_table(struct vcpu *); int get_page_from_l1e( l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner); void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner); +int get_page_and_type_from_pagenr(unsigned long page_nr, + unsigned long type, + struct domain *d, + int partial, + int preemptible); +int get_page_from_pagenr(unsigned long page_nr, struct domain *d); +void get_page_light(struct page_info *page); static inline void put_page_and_type(struct page_info *page) { diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h index c92cba41a0..8929a7e01c 100644 --- a/xen/include/xen/mm.h +++ b/xen/include/xen/mm.h @@ -161,6 +161,7 @@ int map_pages_to_xen( /* Alter the permissions of a range of Xen virtual address space. 
*/ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags); int destroy_xen_mappings(unsigned long v, unsigned long e); +int update_xen_mappings(unsigned long mfn, unsigned int cacheattr); /* * Create only non-leaf page table entries for the * page range in Xen virtual address space. -- 2.11.0 _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel