Xen project Mailing List

Re: [Xen-devel] [RFC PATCH v2] xen: free_domheap_pages: delay page scrub to idle loop

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Tue, 20 May 2014 09:20:00 +0100

Cc: keir@xxxxxxx, ian.campbell@xxxxxxxxxx, andrew.cooper3@xxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Tue, 20 May 2014 08:20:20 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 20.05.14 at 04:15, <lliubbo@xxxxxxxxx> wrote: > So I use a percpu scrub page list in this patch, the tradeoff is we may not > use all idle cpus. It depends on free_domheap_pages() runs on which cpu. So this means the time until all memory can be used again a completely arbitrary amount of time can pass, as it depends on the freeing CPU to be idle long enough (many minutes according to your observations) - even if all the rest of the system was idle. I see the problem with the lock contention, but I think an at least slightly more sophisticated solution is going to be needed. > @@ -633,6 +635,9 @@ static struct page_info *alloc_heap_pages( > goto found; > } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */ > > + if ( scrub_free_pages() ) > + continue; This will recover memory only from the current CPU's list - the larger the system, the less likely that this will turn up anything. Furthermore you're creating an almost unbounded loop here - for order > 0 the ability of scrub_pages() to make memory available doesn't mean that on the next iteration the loop wouldn't come back here. > @@ -1417,6 +1422,23 @@ void free_xenheap_pages(void *v, unsigned int order) > #endif > > > +unsigned long scrub_free_pages(void) The return value and the local variable below could easily be unsigned int. > +{ > + struct page_info *pg; > + unsigned long nr_scrubed = 0; > + > + /* Scrub around 400M memory every time */ > + while ( nr_scrubed < 100000 ) Without explanation such a hard coded number wouldn't be acceptable in any case. How long does it take to scrub 400Mb on a _slow_ system? I hope you realize that the amount of work you do here affects the wakeup time of a vCPU supposed to run on the given CPU. > @@ -1564,8 +1586,15 @@ void free_domheap_pages(struct page_info *pg, unsigned > int order) > * domain has died we assume responsibility for erasure. > */ > if ( unlikely(d->is_dying) ) > + { > + /* > + * Add page to page_scrub_list to speed up domain destroy, those > + * pages will be zeroed later by scrub_page_tasklet. > + */ > for ( i = 0; i < (1 << order); i++ ) > - scrub_one_page(&pg[i]); > + page_list_add_tail( &pg[i], &this_cpu(page_scrub_list) ); > + goto out; > + } If done this way, I see no reason why you couldn't add the page in one chunk to the list (i.e. even if order > 0), by making use of PFN_ORDER() to communicate the order to the scrubbing routine. But having sent a v2 patch without the conceptional questions being sorted out I consider kind of odd anyway. I.e. before sending another version I think you need to - explain that the latency gain here outweighs the performance effects on other guests, - explain why alternative approaches (like the suggested flagging of the pages as needing scrubbing during freeing, and doing the scrubbing in the background as well as on the allocation path) are worse, or at least not better than this one (this model may even allow avoiding the scrubbing altogether for hypervisor internal allocations). Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.