Xen project Mailing List

Re: [Xen-devel] [PATCH v2 3/3] xen: use idle vcpus to scrub pages

On 07/01/2014 05:12 PM, Jan Beulich wrote: >>>> On 30.06.14 at 15:39, <lliubbo@xxxxxxxxx> wrote: >> --- a/xen/arch/x86/domain.c >> +++ b/xen/arch/x86/domain.c >> @@ -116,6 +116,7 @@ static void idle_loop(void) >> { >> if ( cpu_is_offline(smp_processor_id()) ) >> play_dead(); >> + scrub_free_pages(); >> (*pm_idle)(); > > So is it really appropriate to call pm_idle() if scrub_free_pages() didn't > return immediately? I.e. I would suppose the function ought to return > a boolean indicator whether any meaningful amount of processing > was done, in which case we may want to skip entering any kind of C > state (even if it would end up being just C1), or doing any of the > processing associated with this possible entering. > Thanks, I'll add a return value for scrub_free_pages() and skip pm_idle if necessary. >> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c >> index ab293c8..6ab1d1d 100644 >> --- a/xen/common/page_alloc.c >> +++ b/xen/common/page_alloc.c >> @@ -86,6 +86,12 @@ PAGE_LIST_HEAD(page_offlined_list); >> /* Broken page list, protected by heap_lock. */ >> PAGE_LIST_HEAD(page_broken_list); >> >> +/* A rough flag to indicate whether a node have need_scrub pages */ >> +static bool_t node_need_scrub[MAX_NUMNODES]; >> +static DEFINE_PER_CPU(bool_t, is_scrubbing); >> +static DEFINE_PER_CPU(struct page_list_head, scrub_list_cpu); >> +static DEFINE_PER_CPU(struct page_list_head, free_list_cpu); >> + >> /************************* >> * BOOT-TIME ALLOCATOR >> */ >> @@ -948,6 +954,7 @@ static void free_heap_pages( >> { >> if ( !tainted ) >> { >> + node_need_scrub[node] = 1; >> for ( i = 0; i < (1 << order); i++ ) >> pg[i].count_info |= PGC_need_scrub; >> } > > Iirc it was more than this single place where you set > PGC_need_scrub, and hence where you'd now need to set the > other flag too. > I'm afraid this is the only place where PGC_need_scrub was set. >> @@ -1525,7 +1532,130 @@ void __init scrub_heap_pages(void) >> setup_low_mem_virq(); >> } >> >> +#define SCRUB_BATCH_ORDER 12 > > Please make this adjustable via command line, so that eventual > latency issues can be worked around. > Okay. >> +static void __scrub_free_pages(unsigned int node, unsigned int cpu) >> +{ >> + struct page_info *pg, *tmp; >> + unsigned int i; >> + int order; >> + struct page_list_head *local_scrub_list = &this_cpu(scrub_list_cpu); >> + struct page_list_head *local_free_list = &this_cpu(free_list_cpu); >> + >> + /* Scrub percpu list */ >> + while ( !page_list_empty(local_scrub_list) ) >> + { >> + pg = page_list_remove_head(local_scrub_list); >> + order = PFN_ORDER(pg); >> + ASSERT( pg && order <= SCRUB_BATCH_ORDER ); >> + for ( i = 0; i < (1 << order); i++ ) >> + { >> + ASSERT( test_bit(_PGC_need_scrub, &pg[i].count_info) ); >> + scrub_one_page(&pg[i]); >> + } >> + page_list_add_tail(pg, local_free_list); >> + if ( softirq_pending(cpu) ) >> + return; > > Hard tabs. Please, also with further violations below in mind, check > your code before submitting. > I'm sorry for all of the coding style problems. By the way is there any script which can be used to check the code before submitting? Something like ./scripts/checkpatch.pl under linux. >> + } >> + >> + /* free percpu free list */ >> + if ( !page_list_empty(local_free_list) ) >> + { >> + spin_lock(&heap_lock); >> + page_list_for_each_safe( pg, tmp, local_free_list ) >> + { >> + order = PFN_ORDER(pg); >> + page_list_del(pg, local_free_list); >> + for ( i = 0; i < (1 << order); i++ ) >> + { >> + pg[i].count_info |= PGC_state_free; >> + pg[i].count_info &= ~PGC_need_scrub; > > This needs to happen earlier - the scrub flag should be cleared right > after scrubbing, and the free flag should imo be set when the page > gets freed. That's for two reasons: > 1) Hypervisor allocations don't need scrubbed pages, i.e. they can > allocate memory regardless of the scrub flag's state. AFAIR, the reason I set those flags here is to avoid a panic happen. > 2) You still detain the memory on the local lists from allocation. On a > many-node system, the 16Mb per node can certainly sum up (which > is not to say that I don't view the 16Mb on a single node as already > problematic). > Right, but we can adjust SCRUB_BATCH_ORDER. Anyway I'll take a retry as you suggested. >> + } >> + merge_free_trunks(pg, order, node, page_to_zone(pg), 0); >> + } >> + spin_unlock(&heap_lock); >> + } >> +} >> + >> +void scrub_free_pages(void) >> +{ >> + int order; >> + struct page_info *pg, *tmp; >> + unsigned int i, zone, nr_delisted = 0; >> + unsigned int cpu = smp_processor_id(); >> + unsigned int node = cpu_to_node(cpu); >> + struct page_list_head *local_scrub_list = &this_cpu(scrub_list_cpu); >> + >> + /* Return if our sibling already started scrubbing */ >> + for_each_cpu( i, per_cpu(cpu_sibling_mask,cpu) ) >> + if ( per_cpu(is_scrubbing, i) ) >> + return; >> + this_cpu(is_scrubbing) = 1; > > If you really want to enforce exclusiveness, you need a single > canonical flag per core (could e.g. be > per_cpu(is_scrubbing, cpumask_first(per_cpu(cpu_sibling_mask, cpu))), > set via test_and_set_bool()). Will be updated. > >> + >> + while ( !softirq_pending(cpu) ) >> + { >> + if ( !node_need_scrub[node] ) >> + { >> + /* Free local per cpu list before we exit */ >> + __scrub_free_pages(node, cpu); >> + goto out; >> + } > > This seems unnecessary: Just have the if() below be > > if ( node_need_scrub[node] && page_list_empty(local_scrub_list) ) > > along with __scrub_free_pages() returning whether it exited because > of softirq_pending(cpu) (which at once eliminates the need for the > check at the loop header above, allowing this to become a nice > do { ... } while ( !__scrub_free_pages() ). > Same here. >> + >> + /* Delist a batch of pages from global scrub list */ >> + if ( page_list_empty(local_scrub_list) ) >> + { >> + spin_lock(&heap_lock); >> + for ( zone = 0; zone < NR_ZONES; zone++ ) >> + { >> + for ( order = MAX_ORDER; order >= 0; order-- ) >> + { >> + page_list_for_each_safe( pg, tmp, &heap(node, zone, >> order) ) >> + { >> + if ( !test_bit(_PGC_need_scrub, &(pg->count_info)) ) > > Please avoid the inner parentheses here. > >> + continue; >> + >> + page_list_del( pg, &heap(node, zone, order) ); >> + if ( order > SCRUB_BATCH_ORDER) > > Coding style. > >> + { >> + /* putback extra pages */ >> + i = order; >> + while ( i != SCRUB_BATCH_ORDER ) > > This would better be a do/while construct - on the first iteration you > already know the condition is true. > >> + { >> + PFN_ORDER(pg) = --i; >> + page_list_add_tail(pg, &heap(node, zone, >> i)); >> + pg += 1 << i; > > I realize there are cases where this is also not being done correctly in > existing code, but please use 1UL here and in all similar cases. > Sure. >> + } >> + PFN_ORDER(pg) = SCRUB_BATCH_ORDER; >> + } >> + >> + for ( i = 0; i < (1 << PFN_ORDER(pg)); i++ ) > > Can't you just use "order" here (and a few lines down)? > Then I have to use another temporary variable. Anyway, I'll make a update. >> + { >> + ASSERT( test_bit(_PGC_need_scrub, >> &pg[i].count_info) ); >> + ASSERT( !test_bit(_PGC_broken, >> &pg[i].count_info) ); >> + mark_page_offline(&pg[i], 0); > > mark_page_offline() ??? > Yes, it's a bit tricky here and I don't have a good idea right now. The problem is when free a page frame we have to avoid merging with pages on percpu scrub/free list, so I marked pages offline temporarily while adding to percpu lists. Another thing I'm still not clear about is how to handle the situation if #mc happened for pages on percpu scrub/free list. Thank you very much for your review. -Bob _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.