Re: [Xen-devel] [RFC/PATCH] Improve speed of mapping guest memory into Dom0
On 14/11/12 13:08, David Vrabel wrote:
Mats,
Your patch has been white-space damaged and will not apply. You should
use git send-email which will do the right thing. I've also CC'd Konrad
who is the maintainer for the Xen parts of the kernel.
Thanks for the advice. Will use git send-email for V2 of this patch.
On 14/11/12 11:13, Mats Petersson wrote:
[a long, rambling commit message?]
The text as-is isn't really suitable for a commit message (too much
irrelevant stuff) and there is no suitable subject line.
I have also found that the munmap() call used to unmap the guest memory
from Dom0 is about 35% slower in the 3.7 kernel than in the 2.6 kernel
(3.8M cycles vs 2.8M cycles).
This performance reduction only occurs with 32-bit guests, as Xen then
traps-and-emulates both halves of the PTE write.
I think this could be made quicker by using a
direct write of zero rather than the compare exchange operation that is
currently used [which traps into Xen, performs the compare & exchange] -
This is something I noticed but never got around to producing a patch.
How about this (uncompiled!) patch?
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1146,8 +1146,16 @@ again:
 				    page->index > details->last_index))
 					continue;
 			}
-			ptent = ptep_get_and_clear_full(mm, addr, pte,
-							tlb->fullmm);
+			/*
+			 * No need for the expensive atomic get and
+			 * clear for anonymous mappings as the dirty
+			 * and young bits are not used.
+			 */
+			if (PageAnon(page))
+				pte_clear(mm, addr, pte);
+			else
+				ptent = ptep_get_and_clear_full(mm, addr, pte,
+								tlb->fullmm);
 			tlb_remove_tlb_entry(tlb, pte, addr);
 			if (unlikely(!page))
 				continue;
I came up with something very similar, but found that page is sometimes
NULL, which causes problems. if (page && PageAnon(page)) will probably
work... Trying that now.
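I.e. roughly this on top of your hunk (equally uncompiled):

	if (page && PageAnon(page))
		pte_clear(mm, addr, pte);
	else
		ptent = ptep_get_and_clear_full(mm, addr, pte,
						tlb->fullmm);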
Now for the patch itself. On the whole, the approach looks good and the
real-world performance improvements are nice. Specific comments inline.
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2542,3 +2561,77 @@ out:
 	return err;
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_range);
+
+/* like xen_remap_domain_mfn_range, but does a list of mfn's, rather
+ * than the, for xen, quite useless, consecutive pages.
+ */
/*
* Like xen_remap_domain_mfn_range(), but more efficiently handles MFNs
* that are not contiguous (which is common for a domain's memory).
*/
Will update.
+int xen_remap_domain_mfn_list(struct vm_area_struct *vma,
+			      unsigned long addr,
+			      unsigned long *mfn, int nr,
+			      int *err_ptr,
+			      pgprot_t prot, unsigned domid)
xen_remap_domain_mfn_array() ? It's not taking a list.
Yes, better name. Although perhaps following Ian Campbell's suggestion
and changing xen_remap_domain_mfn_range() to "do the right thing" is
even better. I will have a quick look at how difficult that may be - I
feel it may be rather simpler than I first thought, as there is only one
call to this function in privcmd.c.
The arm/xen/enlighten.c will probably also need fixing, I suppose...
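Roughly what I have in mind for the combined interface (just a sketch,
not what this patch implements; treating a NULL err_ptr as "contiguous
range" is my own convention):

	/* If err_ptr is NULL, mfn points at the first frame of a
	 * contiguous range (today's behaviour); otherwise it is an
	 * array of nr possibly-discontiguous frames and per-frame
	 * errors are written to err_ptr. */
	int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
				       unsigned long addr,
				       unsigned long *mfn, int nr,
				       int *err_ptr, pgprot_t prot,
				       unsigned domid);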
+{
+	struct remap_list_data rmd;
+	struct mmu_update mmu_update[REMAP_BATCH_SIZE];
This is surprisingly large (256 bytes) but I note that the existing
xen_remap_domain_mfn_range() does the same thing so I guess it's ok.
I would actually like to increase it, but that would probably require
allocation. One HV call per 32
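For reference, the arithmetic (struct mmu_update is Xen's public
interface from xen/interface/xen.h; REMAP_BATCH_SIZE being 16 is
inferred from the 256-byte figure above):

	#define REMAP_BATCH_SIZE 16

	struct mmu_update {
		uint64_t ptr;	/* machine address of the PTE to update */
		uint64_t val;	/* new PTE contents */
	};

	/* sizeof(struct mmu_update) == 16, so the on-stack array is
	 * 16 * REMAP_BATCH_SIZE == 256 bytes, and one hypercall then
	 * covers at most REMAP_BATCH_SIZE pages. */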
+	int batch;
+	int done;
+	unsigned long range;
+	int err = 0;
+
+	if (xen_feature(XENFEAT_auto_translated_physmap))
+		return -EINVAL;
+
+	prot = __pgprot(pgprot_val(prot) | _PAGE_IOMAP);
+
+	BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
+
+	rmd.mfn = mfn;
+	rmd.prot = prot;
+
+	while (nr) {
+		batch = min(REMAP_BATCH_SIZE, nr);
+		range = (unsigned long)batch << PAGE_SHIFT;
+
+		rmd.mmu_update = mmu_update;
+		err = apply_to_page_range(vma->vm_mm, addr, range,
+					  remap_area_mfn_list_pte_fn, &rmd);
+		if (err)
+		{
+			printk("xen_remap_domain_mfn_list: apply_to_range: err=%d\n", err);
Stray printk?
Yes.
+			goto out;
+		}
+
+		err = HYPERVISOR_mmu_update(mmu_update, batch, &done, domid);
+		if (err < 0)
+		{
+			int i;
+			/* TODO: We should remove this printk later */
+			printk("xen_remap_domain_mfn_list: mmu_update: err=%d, done=%d, batch=%d\n", err, done, batch);
Yes, you should...
+			err_ptr[done] = err;
+
+			/* now do the remaining part of this batch */
+			for(i = done+1; i < batch; i++)
+			{
+				int tmp_err = HYPERVISOR_mmu_update(&mmu_update[i], 1, NULL, domid);
+				if (tmp_err < 0)
+				{
+					err_ptr[i] = tmp_err;
+				}
+			}
There's no need to fall back to doing it page-by-page here. You can
still batch the remainder.
Yes, I agree.
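Something along these lines should keep the retries batched (untested,
reusing the patch's variables):

	/* Entry 'done' is the one the hypervisor rejected. */
	int idx = done;

	err_ptr[idx] = err;
	idx++;

	/* Resubmit everything after the failing entry in one hypercall,
	 * repeating until the whole batch has been attempted. */
	while (idx < batch) {
		int sub_done = 0;
		int sub_err = HYPERVISOR_mmu_update(&mmu_update[idx],
						    batch - idx,
						    &sub_done, domid);
		if (sub_err >= 0)
			break;			/* remainder all succeeded */
		idx += sub_done;		/* skip the entries that worked */
		err_ptr[idx] = sub_err;		/* record the next failure */
		idx++;				/* and continue past it */
	}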
+
+			goto out;
+		}
+
+		nr -= batch;
+		addr += range;
+		err_ptr += batch;
+	}
+
+	err = 0;
+out:
+
+	xen_flush_tlb_all();
Probably not that important anymore since we would now do far fewer TLB
flushes, but this TLB flush is only needed if the PTEs being updated
were already present -- if they're all clear then the TLB flush can be
omitted entirely.
The 2.6 kernel actually had a check BUG_ON(!pte_none(pte)) [or something
like that], but the 3.x kernel seems to have "lost" that. Either way,
one flush per, say, 1024 pages isn't measurable in my benchmarks.
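If we did want to skip the flush, the pte_fn could record whether it
ever replaced a live PTE, roughly like this (the need_flush field is
invented):

	struct remap_list_data {
		unsigned long *mfn;
		pgprot_t prot;
		struct mmu_update *mmu_update;
		bool need_flush;	/* set if any old PTE was present */
	};

	/* in remap_area_mfn_list_pte_fn(): */
	if (!pte_none(*ptep))
		rmd->need_flush = true;

	/* and at the end of xen_remap_domain_mfn_list(): */
	if (rmd.need_flush)
		xen_flush_tlb_all();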
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_list);
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 8adb9cc..b39a7b7 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -151,6 +151,41 @@ static int traverse_pages(unsigned nelem, size_t size,
 	return ret;
 }
+/*
+ * Similar to traverse_pages, but use each page as a "block" of
+ * data to be processed as one unit.
+ */
+static int traverse_pages_block(unsigned nelem, size_t size,
+				struct list_head *pos,
+				int (*fn)(void *data, int nr, void *state),
+				void *state)
+{
+	void *pagedata;
+	unsigned pageidx;
+	int ret = 0;
+
+	BUG_ON(size > PAGE_SIZE);
+
+	pageidx = PAGE_SIZE;
+	pagedata = NULL;	/* hush, gcc */
What was gcc upset about? I don't see anything it could get confused about.
This comes from the original traverse_pages code, which does this. I
will see if it compiles without warnings if I remove the initialization.
@@ -260,17 +295,17 @@ struct mmap_batch_state {
 	xen_pfn_t __user *user_mfn;
 };

-static int mmap_batch_fn(void *data, void *state)
+static int mmap_batch_fn(void *data, int nr, void *state)
 {
 	xen_pfn_t *mfnp = data;
+
 	struct mmap_batch_state *st = state;
 	int ret;

-	ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
-					 st->vma->vm_page_prot, st->domain);
+	BUG_ON(nr < 0);
Is this BUG_ON() useful?
I think very bad things will happen when nr is less than zero, but I
guess they will be pretty obvious... So perhaps just let it die from
the consequences. It does while(nr) in the remap function, and then
min(REMAP_BATCH_SIZE, nr) and various other things that will definitely
not work as expected if nr is negative. I thought it's much easier to
detect when it's gone wrong if you assert early, though.
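A middle ground would be the usual WARN_ON() idiom, which still asserts
early but lets the ioctl fail instead of killing the machine (sketch):

	if (WARN_ON(nr < 0))
		return -EINVAL;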
--
Mats
David