Re: [Xen-devel] [PATCH v2 3/3] xen: use idle vcpus to scrub pages
>>> On 30.06.14 at 15:39, <lliubbo@xxxxxxxxx> wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -116,6 +116,7 @@ static void idle_loop(void)
> {
> if ( cpu_is_offline(smp_processor_id()) )
> play_dead();
> + scrub_free_pages();
> (*pm_idle)();
So is it really appropriate to call pm_idle() if scrub_free_pages() didn't
return immediately? I.e. I would suppose the function ought to return
a boolean indicating whether any meaningful amount of processing was
done, in which case we may want to skip entering any kind of C state
(even if it would end up being just C1), as well as any of the
processing associated with such an entry.
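Concretely, something along these lines (just a sketch - the boolean
return is my suggestion above, not what the patch implements):

    static void idle_loop(void)
    {
        for ( ; ; )
        {
            if ( cpu_is_offline(smp_processor_id()) )
                play_dead();
            /* Skip the C state entry path (and its associated work) when
             * the scrub pass did a meaningful amount of processing. */
            if ( !scrub_free_pages() )
                (*pm_idle)();
            do_softirq();
        }
    }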
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index ab293c8..6ab1d1d 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -86,6 +86,12 @@ PAGE_LIST_HEAD(page_offlined_list);
> /* Broken page list, protected by heap_lock. */
> PAGE_LIST_HEAD(page_broken_list);
>
> +/* A rough flag to indicate whether a node have need_scrub pages */
> +static bool_t node_need_scrub[MAX_NUMNODES];
> +static DEFINE_PER_CPU(bool_t, is_scrubbing);
> +static DEFINE_PER_CPU(struct page_list_head, scrub_list_cpu);
> +static DEFINE_PER_CPU(struct page_list_head, free_list_cpu);
> +
> /*************************
> * BOOT-TIME ALLOCATOR
> */
> @@ -948,6 +954,7 @@ static void free_heap_pages(
> {
> if ( !tainted )
> {
> + node_need_scrub[node] = 1;
> for ( i = 0; i < (1 << order); i++ )
> pg[i].count_info |= PGC_need_scrub;
> }
IIRC it was more than just this one place where you set
PGC_need_scrub, and hence there's more than one place where you'd now
need to set the other flag too.
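A small helper (hypothetical name) used at all such sites would keep
the two flags in sync, e.g.:

    static void mark_need_scrub(struct page_info *pg, unsigned int order,
                                unsigned int node)
    {
        unsigned long i;

        node_need_scrub[node] = 1;
        for ( i = 0; i < (1UL << order); i++ )
            pg[i].count_info |= PGC_need_scrub;
    }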
> @@ -1525,7 +1532,130 @@ void __init scrub_heap_pages(void)
> setup_low_mem_virq();
> }
>
> +#define SCRUB_BATCH_ORDER 12
Please make this adjustable via command line, so that eventual
latency issues can be worked around.
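E.g. via integer_param() (the option name here is invented for
illustration):

    static unsigned int __read_mostly scrub_batch_order = 12;
    integer_param("scrub-batch-order", scrub_batch_order);

with the SCRUB_BATCH_ORDER uses then replaced by the variable.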
> +static void __scrub_free_pages(unsigned int node, unsigned int cpu)
> +{
> + struct page_info *pg, *tmp;
> + unsigned int i;
> + int order;
> + struct page_list_head *local_scrub_list = &this_cpu(scrub_list_cpu);
> + struct page_list_head *local_free_list = &this_cpu(free_list_cpu);
> +
> + /* Scrub percpu list */
> + while ( !page_list_empty(local_scrub_list) )
> + {
> + pg = page_list_remove_head(local_scrub_list);
> + order = PFN_ORDER(pg);
> + ASSERT( pg && order <= SCRUB_BATCH_ORDER );
> + for ( i = 0; i < (1 << order); i++ )
> + {
> + ASSERT( test_bit(_PGC_need_scrub, &pg[i].count_info) );
> + scrub_one_page(&pg[i]);
> + }
> + page_list_add_tail(pg, local_free_list);
> + if ( softirq_pending(cpu) )
> + return;
Hard tabs. Please check your code for such violations (there are
further instances below) before submitting.
> + }
> +
> + /* free percpu free list */
> + if ( !page_list_empty(local_free_list) )
> + {
> + spin_lock(&heap_lock);
> + page_list_for_each_safe( pg, tmp, local_free_list )
> + {
> + order = PFN_ORDER(pg);
> + page_list_del(pg, local_free_list);
> + for ( i = 0; i < (1 << order); i++ )
> + {
> + pg[i].count_info |= PGC_state_free;
> + pg[i].count_info &= ~PGC_need_scrub;
This needs to happen earlier - the scrub flag should be cleared right
after scrubbing, and the free flag should imo be set when the page
gets freed. That's for two reasons:
1) Hypervisor allocations don't need scrubbed pages, i.e. they can
allocate memory regardless of the scrub flag's state.
2) You still withhold the memory on the local lists from allocation.
On a many-node system, the 16MB per node can certainly add up (which
is not to say that I don't view the 16MB on a single node as already
problematic).
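I.e. roughly (sketch):

    /* In __scrub_free_pages(), right after scrubbing: */
    for ( i = 0; i < (1UL << order); i++ )
    {
        scrub_one_page(&pg[i]);
        pg[i].count_info &= ~PGC_need_scrub;
    }

with PGC_state_free then getting set only when the pages actually get
handed back to the heap.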
> + }
> + merge_free_trunks(pg, order, node, page_to_zone(pg), 0);
> + }
> + spin_unlock(&heap_lock);
> + }
> +}
> +
> +void scrub_free_pages(void)
> +{
> + int order;
> + struct page_info *pg, *tmp;
> + unsigned int i, zone, nr_delisted = 0;
> + unsigned int cpu = smp_processor_id();
> + unsigned int node = cpu_to_node(cpu);
> + struct page_list_head *local_scrub_list = &this_cpu(scrub_list_cpu);
> +
> + /* Return if our sibling already started scrubbing */
> + for_each_cpu( i, per_cpu(cpu_sibling_mask,cpu) )
> + if ( per_cpu(is_scrubbing, i) )
> + return;
> + this_cpu(is_scrubbing) = 1;
If you really want to enforce exclusiveness, you need a single
canonical flag per core (could e.g. be
per_cpu(is_scrubbing, cpumask_first(per_cpu(cpu_sibling_mask, cpu))),
set via test_and_set_bool()).
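I.e. something like:

    unsigned int first = cpumask_first(per_cpu(cpu_sibling_mask, cpu));

    if ( test_and_set_bool(per_cpu(is_scrubbing, first)) )
        return; /* a sibling of ours is already scrubbing */

with the flag of that same first sibling (rather than this_cpu()'s)
getting cleared again on the way out.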
> +
> + while ( !softirq_pending(cpu) )
> + {
> + if ( !node_need_scrub[node] )
> + {
> + /* Free local per cpu list before we exit */
> + __scrub_free_pages(node, cpu);
> + goto out;
> + }
This seems unnecessary: Just have the if() below be
if ( node_need_scrub[node] && page_list_empty(local_scrub_list) )
along with __scrub_free_pages() returning whether it exited because of
softirq_pending(cpu). That at once eliminates the need for the check
at the loop header above, allowing this to become a nice
do { ... } while ( !__scrub_free_pages() );
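The overall shape would then be something like (sketch; the batch
delisting below is folded into a hypothetical refill helper):

    do {
        if ( page_list_empty(local_scrub_list) )
        {
            if ( !node_need_scrub[node] )
                break; /* all work done */
            refill_local_scrub_list(node); /* the delisting code below */
        }
    } while ( !__scrub_free_pages(node, cpu) );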
> +
> + /* Delist a batch of pages from global scrub list */
> + if ( page_list_empty(local_scrub_list) )
> + {
> + spin_lock(&heap_lock);
> + for ( zone = 0; zone < NR_ZONES; zone++ )
> + {
> + for ( order = MAX_ORDER; order >= 0; order-- )
> + {
> + page_list_for_each_safe( pg, tmp, &heap(node, zone, order) )
> + {
> + if ( !test_bit(_PGC_need_scrub, &(pg->count_info)) )
Please avoid the inner parentheses here.
> + continue;
> +
> + page_list_del( pg, &heap(node, zone, order) );
> + if ( order > SCRUB_BATCH_ORDER)
Coding style.
> + {
> + /* putback extra pages */
> + i = order;
> + while ( i != SCRUB_BATCH_ORDER )
This would better be a do/while construct - on the first iteration you
already know the condition is true.
> + {
> + PFN_ORDER(pg) = --i;
> + page_list_add_tail(pg, &heap(node, zone, i));
> + pg += 1 << i;
I realize there are cases where this is also not being done correctly in
existing code, but please use 1UL here and in all similar cases.
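Taking this together with the do/while remark above, the splitting
logic might then read:

    /* Put back the pages in excess of SCRUB_BATCH_ORDER. */
    i = order;
    do {
        PFN_ORDER(pg) = --i;
        page_list_add_tail(pg, &heap(node, zone, i));
        pg += 1UL << i;
    } while ( i != SCRUB_BATCH_ORDER );
    PFN_ORDER(pg) = SCRUB_BATCH_ORDER;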
> + }
> + PFN_ORDER(pg) = SCRUB_BATCH_ORDER;
> + }
> +
> + for ( i = 0; i < (1 << PFN_ORDER(pg)); i++ )
Can't you just use "order" here (and a few lines down)?
> + {
> + ASSERT( test_bit(_PGC_need_scrub, &pg[i].count_info) );
> + ASSERT( !test_bit(_PGC_broken, &pg[i].count_info) );
> + mark_page_offline(&pg[i], 0);
mark_page_offline() ???
Jan
> + }
> + page_list_add_tail(pg, local_scrub_list);
> + nr_delisted += ( 1 << PFN_ORDER(pg) );
> + if ( nr_delisted >= (1 << SCRUB_BATCH_ORDER) )
> + {
> + nr_delisted = 0;
> + spin_unlock(&heap_lock);
> + goto start_scrub;
> + }
> + }
> + }
> + }
> +
> + node_need_scrub[node] = 0;
> + spin_unlock(&heap_lock);
> + }
>
> + start_scrub:
> + __scrub_free_pages(node, cpu);
> + }
> +
> + out:
> + this_cpu(is_scrubbing) = 0;
> +}