
Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write fault for dirty-page tracing



> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-
> bounces@xxxxxxxxxxxxx] On Behalf Of Ian Campbell
> Sent: Sunday, August 18, 2013 7:16 AM
> To: Jaeyong Yoo
> Cc: 'Stefano Stabellini'; xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH v3 07/10] xen/arm: Add handling write
> fault for dirty-page tracing
> 
> On Thu, 2013-08-15 at 13:24 +0900, Jaeyong Yoo wrote:
> 
> > > Why don't we just context switch the slots for now, only for domains
> > > where log dirty is enabled, and then we can measure and see how bad it
> is etc.
> >
> >
> > Here goes the measurement results:
> 
> Wow, that was quick, thanks.

Your explanation with ascii art does help a lot. Thanks again!

> 
> > For a better understanding of the trade-off between vlpt and page-table
> > walk in dirty-page handling, let's consider the following two cases:
> >  - Migrating a single domain at a time:
> >  - Migrating multiple domains concurrently:
> >
> > For each case, the metrics that we are going to see is the following:
> >  - page-table walk overhead: for handling a single dirty page, a
> >     page-table walk requires 6us and vlpt (improved version) requires
> >     1.5us. From this, we consider 4.5us of pure overhead compared to
> >     vlpt, and it is incurred for every dirty page.
> 
> map_domain_page uses a hash table structure in which the PTE entries are
> reference counted; however, we don't clear the PTE when the refcount
> reaches 0, so if we immediately use it again we don't need to flush. But
> we may need to flush if there is a hash table collision. So in practice
> there will be a bit more overhead; I'm not sure how significant that will
> be. I suppose the chance of collision depends on the size of the guest.

Yes, right. The overhead of unmap_domain_page may be under-estimated.
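To make the collision/flush behaviour concrete, here is a toy model (NOT Xen code; the slot count, hash function, and names are invented for illustration) of a reference-counted map cache where the PTE is left in place at refcount 0 and a flush is only needed when a different page hashes to an already-used slot:

```python
# Toy model of a map_domain_page-style cache (not Xen code: the slot
# count, hashing, and names here are invented for illustration).
NR_SLOTS = 8

class MapCache:
    def __init__(self):
        self.slot_mfn = [None] * NR_SLOTS  # which mfn each slot's PTE maps
        self.refcnt = [0] * NR_SLOTS
        self.flushes = 0                   # TLB flushes we had to issue

    def map(self, mfn):
        slot = mfn % NR_SLOTS              # toy hash: mfn -> slot
        if self.slot_mfn[slot] not in (None, mfn):
            # Hash collision with a stale entry: repoint the PTE and
            # flush the old translation from the TLB.
            self.flushes += 1
        self.slot_mfn[slot] = mfn
        self.refcnt[slot] += 1
        return slot

    def unmap(self, slot):
        # The PTE is deliberately left in place at refcount 0, so an
        # immediate re-map of the same mfn needs no flush.
        self.refcnt[slot] -= 1

c = MapCache()
s = c.map(42)
c.unmap(s)
c.map(42)       # same mfn again: PTE still valid, no flush needed
print(c.flushes)  # 0
c.map(50)       # 50 % 8 == 42 % 8: collision forces a flush
print(c.flushes)  # 1
```

This is only meant to show why reuse of the same page is cheap while collisions add flush cost, which is the "bit more overhead" mentioned above.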

> 
> >  - vlpt overhead: the only vlpt overhead is the flushes at context
> >     switch. Flushing a 34MB virtual address range (which is needed to
> >     support a 16GB domU) requires 130us, and it happens whenever two
> >     migrating domUs are context switched.
> >
> > Here goes the results:
> >
> >  - Migrating a domain at a time:
> >     * page-table walk overhead: 4.5us * 611 times = 2.7ms
> >     * vlpt overhead: 0 (no flush required)
> >
> >  - Migrating two domains concurrently:
> >     * page-table walk overhead: 4.5us * 8653 times = 39 ms
> >     * vlpt overhead: 130us * 357 times = 46 ms
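As a sanity check on the figures above, the arithmetic can be sketched as a quick back-of-envelope script (the constants are the measured numbers from this thread; nothing else is assumed):

```python
# Back-of-envelope check of the overhead figures quoted above.
# All constants come from the measurements in this thread.
WALK_OVERHEAD_US = 4.5     # extra cost per dirty page: 6us walk - 1.5us vlpt
FLUSH_OVERHEAD_US = 130.0  # flushing the 34MB vlpt range at context switch

def walk_total_ms(dirty_pages):
    """Total page-table-walk overhead for a migration run, in ms."""
    return dirty_pages * WALK_OVERHEAD_US / 1000.0

def vlpt_total_ms(context_switches):
    """Total vlpt flush overhead for a migration run, in ms."""
    return context_switches * FLUSH_OVERHEAD_US / 1000.0

# Single domain: 611 dirty pages, no flushes needed.
print(walk_total_ms(611))   # ~2.7 ms
print(vlpt_total_ms(0))     # 0.0 ms

# Two domains concurrently: 8653 dirty pages vs 357 flushes.
print(walk_total_ms(8653))  # ~39 ms
print(vlpt_total_ms(357))   # ~46 ms
```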
> 
> The 611, 8653 and 357's in here are from an actual test, right?
> 
> Out of interest what was the total time for each case?
> 
> > Although the page-table walk gives slightly better performance when
> > migrating two domains, I think it is better to choose vlpt for the
> > following reasons:
> >  - In the above tests, I did not run any workload in the migrating
> >     domU, and IIRC, when I run gzip or bonnie++ in the domU, the
> >     number of dirty pages grows to a few thousand. Then the page-table
> >     walk overhead becomes a few hundred milliseconds even when
> >     migrating a single domain.
> >  - I would expect that migrating a single domain would be used more
> >     frequently than migrating multiple domains at a time.
> 
> Both of those seem like sound arguments to me.
> 
> > One more thing: regarding your comments about tlb lockdown, which is:
> > > It occurs to me now that with 16 slots changing on context switch
> > > and a further 16 aliasing them (and hence requiring maintenance too)
> > > for the super pages it is possible that the TLB maintenance at
> > > context switch might get prohibitively expensive. We could address
> > > this by firstly only doing it when switching to/from domains which
> > > have log dirty mode enabled and then secondly by seeing if we can
> > > make use of global or locked down mappings for the static Xen
> > > .text/.data/.xenheap mappings and therefore allow us to use a bigger
> global flush.
> >
> > Unfortunately, the Cortex-A15 does not appear to support TLB lockdown:
> > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438d/CHDGEDAE.html
> 
> Oh well.
> 
> > And I am not sure whether marking a page table entry as global
> > prevents it from being invalidated by a TLB flush operation.
> > If it does, we could decrease the vlpt overhead a lot.
> 
> yes, this is something to investigate, but not urgently I don't think.

Got it. Making it absolutely stable is more important, I think.

> 
> Ian.
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

