Re: [Xen-devel] [PATCH V4] xen/privcmd: improve-performance-of-mapping-of-guest
Thanks, I'll try.

2012/12/13 Mats Petersson <mats.petersson@xxxxxxxxxx>:
> On 13/12/12 11:27, Vasiliy Tolstov wrote:
>>
>> If I apply this patch to kernel 3.6.7, does it need any other
>> patches to work?
>
> No, it should "just work". Obviously, if it doesn't, let me know.
>
> --
> Mats
>
>>
>> 2012/12/13 Mats Petersson <mats.petersson@xxxxxxxxxx>:
>>>
>>> One comment asked for more details on the improvements.
>>> Using a small test program to map guest memory into Dom0
>>> (repeatedly, for "Iterations", mapping the same first "Num Pages"):
>>>
>>>   Iterations   Num Pages   Time 3.7rc4   Time with this patch
>>>   5000         4096        76.107        37.027
>>>   10000        2048        75.703        37.177
>>>   20000        1024        75.893        37.247
>>>
>>> So, a little better than twice as fast.
>>>
>>> Using this patch in migration, using "time" to measure the overall
>>> time it takes to migrate a guest (Debian Squeeze 6.0, 1024MB guest
>>> memory, one network card, one disk, guest at login prompt):
>>>
>>>   Time 3.7rc5   Time with this patch
>>>   6.697         5.667
>>>
>>> Since migration involves a whole lot of other things, it is only
>>> about 15% faster - but still a good improvement. A similar
>>> measurement with a guest that is running code to "dirty" memory
>>> shows about 23% improvement, as more time is spent copying dirtied
>>> memory.
>>>
>>> As discussed elsewhere, a good deal more can be had from improving
>>> the munmap system call, but it is a little tricky to get that in
>>> without making non-PVOPS kernels worse, so I will have another look
>>> at it.
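The test program itself was not posted. As a rough illustration of the
methodology, a harness along the following lines could be used; this is
a minimal sketch assuming libxc's xc_map_foreign_pages() and a guest
whose first pages are safe to map, not Mats's actual code:

    /* map_bench.c - hypothetical reconstruction of the benchmark loop.
     * Build against libxenctrl; run as: time ./map_bench <domid>     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <xenctrl.h>

    #define NUM_PAGES  4096
    #define ITERATIONS 5000

    int main(int argc, char *argv[])
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        uint32_t domid = argc > 1 ? atoi(argv[1]) : 1;
        xen_pfn_t pfns[NUM_PAGES];
        int i, j;

        if (!xch)
            return 1;
        for (j = 0; j < NUM_PAGES; j++)
            pfns[j] = j;               /* the same first "Num Pages" */

        for (i = 0; i < ITERATIONS; i++) {
            /* Map the batch into our address space, then tear it down. */
            void *addr = xc_map_foreign_pages(xch, domid, PROT_READ,
                                              pfns, NUM_PAGES);
            if (!addr) {
                perror("xc_map_foreign_pages");
                break;
            }
            munmap(addr, (size_t)NUM_PAGES * XC_PAGE_SIZE);
        }
        xc_interface_close(xch);
        return 0;
    }

Timing the whole run with time(1), once on a stock kernel and once with
the patch applied, gives numbers comparable to the table above.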
>>>
>>> ---
>>> Update since the last posting:
>>> I have just run some benchmarks on a 16GB guest, and the improvement
>>> with this patch is around 23-30% for the overall copy time, and a
>>> 42% shorter downtime on a "busy" guest (writing in a loop to about
>>> 8GB of memory). I think this is definitely worth having.
>>>
>>> The V4 patch consists of cosmetic adjustments: a spelling mistake in
>>> a comment, a condition in an if-statement reversed to avoid an
>>> "else" branch, and a stray empty line added by accident reverted to
>>> its previous state. Thanks to Jan Beulich for the comments.
>>>
>>> The V3 patch contains suggested improvements from:
>>> David Vrabel - make it two distinct external functions, doc-comments.
>>> Ian Campbell - use one common function for the main work.
>>> Jan Beulich - found a bug and pointed out some whitespace problems.
>>>
>>> Signed-off-by: Mats Petersson <mats.petersson@xxxxxxxxxx>
>>>
>>> ---
>>>  arch/x86/xen/mmu.c    | 132 +++++++++++++++++++++++++++++++++++++++++++------
>>>  drivers/xen/privcmd.c |  55 +++++++++++++++++----
>>>  include/xen/xen-ops.h |   5 ++
>>>  3 files changed, 169 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>>> index dcf5f2d..a67774f 100644
>>> --- a/arch/x86/xen/mmu.c
>>> +++ b/arch/x86/xen/mmu.c
>>> @@ -2477,7 +2477,8 @@ void __init xen_hvm_init_mmu_ops(void)
>>>  #define REMAP_BATCH_SIZE 16
>>>
>>>  struct remap_data {
>>> -	unsigned long mfn;
>>> +	unsigned long *mfn;
>>> +	bool contiguous;
>>>  	pgprot_t prot;
>>>  	struct mmu_update *mmu_update;
>>>  };
>>> @@ -2486,7 +2487,13 @@ static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
>>>  				 unsigned long addr, void *data)
>>>  {
>>>  	struct remap_data *rmd = data;
>>> -	pte_t pte = pte_mkspecial(pfn_pte(rmd->mfn++, rmd->prot));
>>> +	pte_t pte = pte_mkspecial(pfn_pte(*rmd->mfn, rmd->prot));
>>> +	/* If we have a contiguous range, just update the mfn itself,
>>> +	   else update pointer to be "next mfn". */
>>> +	if (rmd->contiguous)
>>> +		(*rmd->mfn)++;
>>> +	else
>>> +		rmd->mfn++;
>>>
>>>  	rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
>>>  	rmd->mmu_update->val = pte_val_ma(pte);
>>> @@ -2495,16 +2502,34 @@ static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
>>>  	return 0;
>>>  }
>>>
>>> -int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>>> -			       unsigned long addr,
>>> -			       unsigned long mfn, int nr,
>>> -			       pgprot_t prot, unsigned domid)
>>> -{
>>> +/* do_remap_mfn() - helper function to map foreign pages
>>> + * @vma:     the vma for the pages to be mapped into
>>> + * @addr:    the address at which to map the pages
>>> + * @mfn:     pointer to array of MFNs to map
>>> + * @nr:      the number of entries in the MFN array
>>> + * @err_ptr: pointer to array of per-page error values, or NULL
>>> + * @prot:    page protection mask
>>> + * @domid:   id of the domain that we are mapping from
>>> + *
>>> + * This function takes an array of mfns and maps nr pages from it into
>>> + * this kernel's memory. The owner of the pages is defined by domid.
>>> + * Where the pages are mapped is determined by addr, and vma is used
>>> + * for "accounting" of the pages.
>>> + *
>>> + * Return value is zero for success, negative for failure.
>>> + *
>>> + * Note that err_ptr is used to indicate whether *mfn is a list or the
>>> + * first mfn of a contiguous range. */
>>> +static int do_remap_mfn(struct vm_area_struct *vma,
>>> +			unsigned long addr,
>>> +			unsigned long *mfn, int nr,
>>> +			int *err_ptr, pgprot_t prot,
>>> +			unsigned domid)
>>> +{
>>> +	int err, last_err = 0;
>>>  	struct remap_data rmd;
>>>  	struct mmu_update mmu_update[REMAP_BATCH_SIZE];
>>> -	int batch;
>>>  	unsigned long range;
>>> -	int err = 0;
>>>
>>>  	if (xen_feature(XENFEAT_auto_translated_physmap))
>>>  		return -EINVAL;
>>> @@ -2515,9 +2540,15 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>>>
>>>  	rmd.mfn = mfn;
>>>  	rmd.prot = prot;
>>> +	/* We use err_ptr to indicate whether we are doing a contiguous
>>> +	 * mapping or a discontiguous mapping. */
>>> +	rmd.contiguous = !err_ptr;
>>>
>>>  	while (nr) {
>>> -		batch = min(REMAP_BATCH_SIZE, nr);
>>> +		int index = 0;
>>> +		int done = 0;
>>> +		int batch = min(REMAP_BATCH_SIZE, nr);
>>> +		int batch_left = batch;
>>>  		range = (unsigned long)batch << PAGE_SHIFT;
>>>
>>>  		rmd.mmu_update = mmu_update;
>>> @@ -2526,19 +2557,92 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>>>  		if (err)
>>>  			goto out;
>>>
>>> -		err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid);
>>> -		if (err < 0)
>>> -			goto out;
>>> +		/* We record the error for each page that gives an error, but
>>> +		 * continue mapping until the whole set is done */
>>> +		do {
>>> +			err = HYPERVISOR_mmu_update(&mmu_update[index],
>>> +						    batch_left, &done, domid);
>>> +			if (err < 0) {
>>> +				if (!err_ptr)
>>> +					goto out;
>>> +				/* increment done so we skip the error item */
>>> +				done++;
>>> +				last_err = err_ptr[index] = err;
>>> +			}
>>> +			batch_left -= done;
>>> +			index += done;
>>> +		} while (batch_left);
>>>
>>>  		nr -= batch;
>>>  		addr += range;
>>> +		if (err_ptr)
>>> +			err_ptr += batch;
>>>  	}
>>>
>>> -	err = 0;
>>> +	err = last_err;
>>> out:
>>>
>>>  	xen_flush_tlb_all();
>>>
>>>  	return err;
>>>  }
>>> +
>>> +/* xen_remap_domain_mfn_range() - Used to map foreign pages
>>> + * @vma:   the vma for the pages to be mapped into
>>> + * @addr:  the address at which to map the pages
>>> + * @mfn:   the first MFN to map
>>> + * @nr:    the number of consecutive mfns to map
>>> + * @prot:  page protection mask
>>> + * @domid: id of the domain that we are mapping from
>>> + *
>>> + * This function takes an mfn and maps nr pages on from it into this
>>> + * kernel's memory. The owner of the pages is defined by domid. Where
>>> + * the pages are mapped is determined by addr, and vma is used for
>>> + * "accounting" of the pages.
>>> + *
>>> + * Return value is zero for success, negative ERRNO value for failure.
>>> + */
>>> +int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>>> +			       unsigned long addr,
>>> +			       unsigned long mfn, int nr,
>>> +			       pgprot_t prot, unsigned domid)
>>> +{
>>> +	return do_remap_mfn(vma, addr, &mfn, nr, NULL, prot, domid);
>>> +}
>>>  EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
>>> +
>>> +/* xen_remap_domain_mfn_array() - Used to map an array of foreign pages
>>> + * @vma:     the vma for the pages to be mapped into
>>> + * @addr:    the address at which to map the pages
>>> + * @mfn:     pointer to array of MFNs to map
>>> + * @nr:      the number of entries in the MFN array
>>> + * @err_ptr: pointer to array of integers, one per MFN, for an error
>>> + *           value for each page. err_ptr must not be NULL.
>>> + * @prot:    page protection mask
>>> + * @domid:   id of the domain that we are mapping from
>>> + *
>>> + * This function takes an array of mfns and maps nr pages from it into
>>> + * this kernel's memory. The owner of the pages is defined by domid.
>>> + * Where the pages are mapped is determined by addr, and vma is used
>>> + * for "accounting" of the pages. The err_ptr array is filled in for
>>> + * any page that is not successfully mapped in.
>>> + *
>>> + * Return value is zero for success, negative ERRNO value for failure.
>>> + * Note that the error value -ENOENT is considered a "retry", so when
>>> + * this error code is seen, another call should be made with the list
>>> + * of pages that are marked as -ENOENT in the err_ptr array.
>>> + */
>>> +int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
>>> +			       unsigned long addr,
>>> +			       unsigned long *mfn, int nr,
>>> +			       int *err_ptr, pgprot_t prot,
>>> +			       unsigned domid)
>>> +{
>>> +	/* We BUG_ON because it is a programmer error to pass a NULL
>>> +	 * err_ptr, and it would be very hard later to work out that the
>>> +	 * actual cause was "wrong memory was mapped in".
>>> +	 */
>>> +	BUG_ON(err_ptr == NULL);
>>> +	return do_remap_mfn(vma, addr, mfn, nr, err_ptr, prot, domid);
>>> +}
>>> +EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
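The -ENOENT "retry" convention documented above is worth spelling out.
A hedged sketch of a caller honouring it; the helper name
map_with_retry and its surroundings are illustrative, not part of the
patch:

    #include <linux/mm.h>
    #include <xen/xen-ops.h>

    /* Illustrative only: remap pages whose err[] slot reported -ENOENT. */
    static int map_with_retry(struct vm_area_struct *vma, unsigned long addr,
                              unsigned long *mfns, int *err, int nr,
                              unsigned domid)
    {
        int i, rc;

        rc = xen_remap_domain_mfn_array(vma, addr, mfns, nr, err,
                                        vma->vm_page_prot, domid);
        if (rc != -ENOENT)
            return rc;        /* zero, or a hard error */

        /* Retry just the pages flagged -ENOENT, one at a time. */
        for (i = 0; i < nr; i++) {
            if (err[i] != -ENOENT)
                continue;
            rc = xen_remap_domain_mfn_array(vma, addr + i * PAGE_SIZE,
                                            &mfns[i], 1, &err[i],
                                            vma->vm_page_prot, domid);
            if (rc)
                return rc;
        }
        return 0;
    }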
>>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>>> index 71f5c45..75f6e86 100644
>>> --- a/drivers/xen/privcmd.c
>>> +++ b/drivers/xen/privcmd.c
>>> @@ -151,6 +151,40 @@ static int traverse_pages(unsigned nelem, size_t size,
>>>  	return ret;
>>>  }
>>>
>>> +/*
>>> + * Similar to traverse_pages, but use each page as a "block" of
>>> + * data to be processed as one unit.
>>> + */
>>> +static int traverse_pages_block(unsigned nelem, size_t size,
>>> +				struct list_head *pos,
>>> +				int (*fn)(void *data, int nr, void *state),
>>> +				void *state)
>>> +{
>>> +	void *pagedata;
>>> +	unsigned pageidx;
>>> +	int ret = 0;
>>> +
>>> +	BUG_ON(size > PAGE_SIZE);
>>> +
>>> +	pageidx = PAGE_SIZE;
>>> +
>>> +	while (nelem) {
>>> +		int nr = (PAGE_SIZE/size);
>>> +		struct page *page;
>>> +		if (nr > nelem)
>>> +			nr = nelem;
>>> +		pos = pos->next;
>>> +		page = list_entry(pos, struct page, lru);
>>> +		pagedata = page_address(page);
>>> +		ret = (*fn)(pagedata, nr, state);
>>> +		if (ret)
>>> +			break;
>>> +		nelem -= nr;
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>  struct mmap_mfn_state {
>>>  	unsigned long va;
>>>  	struct vm_area_struct *vma;
>>> @@ -250,7 +284,7 @@ struct mmap_batch_state {
>>>  	 *      0 for no errors
>>>  	 *      1 if at least one error has happened (and no
>>>  	 *      -ENOENT errors have happened)
>>> -	 *      -ENOENT if at least 1 -ENOENT has happened.
>>> +	 *      -ENOENT if at least one -ENOENT has happened.
>>>  	 */
>>>  	int global_error;
>>>  	/* An array for individual errors */
>>> @@ -260,17 +294,20 @@ struct mmap_batch_state {
>>>  	xen_pfn_t __user *user_mfn;
>>>  };
>>>
>>> -static int mmap_batch_fn(void *data, void *state)
>>> +static int mmap_batch_fn(void *data, int nr, void *state)
>>>  {
>>>  	xen_pfn_t *mfnp = data;
>>>  	struct mmap_batch_state *st = state;
>>>  	int ret;
>>>
>>> -	ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
>>> -					 st->vma->vm_page_prot, st->domain);
>>> +	BUG_ON(nr < 0);
>>>
>>> -	/* Store error code for second pass. */
>>> -	*(st->err++) = ret;
>>> +	ret = xen_remap_domain_mfn_array(st->vma,
>>> +					 st->va & PAGE_MASK,
>>> +					 mfnp, nr,
>>> +					 st->err,
>>> +					 st->vma->vm_page_prot,
>>> +					 st->domain);
>>>
>>>  	/* And see if it affects the global_error. */
>>>  	if (ret < 0) {
>>> @@ -282,7 +319,7 @@ static int mmap_batch_fn(void *data, void *state)
>>>  			st->global_error = 1;
>>>  		}
>>>  	}
>>> -	st->va += PAGE_SIZE;
>>> +	st->va += PAGE_SIZE * nr;
>>>
>>>  	return 0;
>>>  }
>>> @@ -378,8 +415,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>>>  	state.err = err_array;
>>>
>>>  	/* mmap_batch_fn guarantees ret == 0 */
>>> -	BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t),
>>> -			      &pagelist, mmap_batch_fn, &state));
>>> +	BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
>>> +				    &pagelist, mmap_batch_fn, &state));
>>>
>>>  	up_write(&mm->mmap_sem);
>>>
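For context, privcmd_ioctl_mmap_batch() is what backs
IOCTL_PRIVCMD_MMAPBATCH_V2 on /dev/xen/privcmd, so this is the path the
mapping benchmark at the top exercises; with traverse_pages_block() it
now hands up to a page's worth of MFNs to one xen_remap_domain_mfn_array()
call instead of one call per page. A hedged userspace sketch of that
ioctl follows; the struct is written out locally to mirror the kernel's
privcmd ABI (header locations vary by system), and the domid and frame
numbers are made up:

    #include <fcntl.h>
    #include <linux/ioctl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    typedef unsigned long xen_pfn_t;

    /* Mirrors the kernel's struct privcmd_mmapbatch_v2. */
    struct privcmd_mmapbatch_v2 {
        unsigned int num;        /* number of pages to map */
        uint16_t dom;            /* domain to map from (domid_t) */
        uint64_t addr;           /* virtual address to map at */
        const xen_pfn_t *arr;    /* array of guest frame numbers */
        int *err;                /* per-page error array */
    };
    #define IOCTL_PRIVCMD_MMAPBATCH_V2 \
        _IOC(_IOC_NONE, 'P', 4, sizeof(struct privcmd_mmapbatch_v2))

    #define NPAGES 16

    int main(void)
    {
        xen_pfn_t mfns[NPAGES];
        int err[NPAGES], i;
        struct privcmd_mmapbatch_v2 batch;
        void *addr;
        int fd = open("/dev/xen/privcmd", O_RDWR);

        if (fd < 0) { perror("open"); return 1; }
        /* Reserve a VA range backed by the privcmd device. */
        addr = mmap(NULL, NPAGES * 4096, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
        if (addr == MAP_FAILED) { perror("mmap"); return 1; }

        for (i = 0; i < NPAGES; i++)
            mfns[i] = i;                  /* illustrative frame numbers */

        memset(&batch, 0, sizeof(batch));
        batch.num  = NPAGES;
        batch.dom  = 1;                   /* illustrative domid */
        batch.addr = (uintptr_t)addr;
        batch.arr  = mfns;
        batch.err  = err;

        if (ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, &batch))
            perror("IOCTL_PRIVCMD_MMAPBATCH_V2");
        return 0;
    }

Real code would normally go through libxc, whose
xc_map_foreign_pages() is, roughly speaking, a wrapper around this
ioctl.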
>>> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
>>> index 6a198e4..22cad75 100644
>>> --- a/include/xen/xen-ops.h
>>> +++ b/include/xen/xen-ops.h
>>> @@ -29,4 +29,9 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>>>  			       unsigned long mfn, int nr,
>>>  			       pgprot_t prot, unsigned domid);
>>>
>>> +int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
>>> +			       unsigned long addr,
>>> +			       unsigned long *mfn, int nr,
>>> +			       int *err_ptr, pgprot_t prot,
>>> +			       unsigned domid);
>>>  #endif /* INCLUDE_XEN_OPS_H */
>>> --
>>> 1.7.9.5
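Of the two exported functions, only the array variant takes err_ptr;
the range variant keeps the old one-call-per-contiguous-range
interface. A sketch of a hypothetical in-kernel user of that variant;
the handler name, starting MFN, and domid are all illustrative:

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <xen/xen-ops.h>

    /* Hypothetical driver mmap handler mapping a contiguous range. */
    static int example_mmap(struct file *file, struct vm_area_struct *vma)
    {
        unsigned long first_mfn = 0x100000;   /* illustrative start MFN */
        unsigned domid = 1;                   /* illustrative domain id */
        int nr = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;

        /* Passing err_ptr == NULL selects the contiguous path in
         * do_remap_mfn(), so the single starting MFN is simply
         * incremented for each page. */
        return xen_remap_domain_mfn_range(vma, vma->vm_start, first_mfn,
                                          nr, vma->vm_page_prot, domid);
    }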
*/ >>> if (ret < 0) { >>> @@ -282,7 +319,7 @@ static int mmap_batch_fn(void *data, void *state) >>> st->global_error = 1; >>> } >>> } >>> - st->va += PAGE_SIZE; >>> + st->va += PAGE_SIZE * nr; >>> >>> return 0; >>> } >>> @@ -378,8 +415,8 @@ static long privcmd_ioctl_mmap_batch(void __user >>> *udata, int version) >>> state.err = err_array; >>> >>> /* mmap_batch_fn guarantees ret == 0 */ >>> - BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t), >>> - &pagelist, mmap_batch_fn, &state)); >>> + BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t), >>> + &pagelist, mmap_batch_fn, &state)); >>> >>> up_write(&mm->mmap_sem); >>> >>> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h >>> index 6a198e4..22cad75 100644 >>> --- a/include/xen/xen-ops.h >>> +++ b/include/xen/xen-ops.h >>> @@ -29,4 +29,9 @@ int xen_remap_domain_mfn_range(struct vm_area_struct >>> *vma, >>> unsigned long mfn, int nr, >>> pgprot_t prot, unsigned domid); >>> >>> +int xen_remap_domain_mfn_array(struct vm_area_struct *vma, >>> + unsigned long addr, >>> + unsigned long *mfn, int nr, >>> + int *err_ptr, pgprot_t prot, >>> + unsigned domid); >>> #endif /* INCLUDE_XEN_OPS_H */ >>> -- >>> 1.7.9.5 >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@xxxxxxxxxxxxx >>> http://lists.xen.org/xen-devel >> >> >> >> -- >> Vasiliy Tolstov, >> Clodo.ru >> e-mail: v.tolstov@xxxxxxxxx >> jabber: vase@xxxxxxxxx > > -- Vasiliy Tolstov, Clodo.ru e-mail: v.tolstov@xxxxxxxxx jabber: vase@xxxxxxxxx _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel