
Re: [Xen-devel] [PATCH v2 3/3] xen: use idle vcpus to scrub pages



On 07/01/2014 05:12 PM, Jan Beulich wrote:
>>>> On 30.06.14 at 15:39, <lliubbo@xxxxxxxxx> wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -116,6 +116,7 @@ static void idle_loop(void)
>>      {
>>          if ( cpu_is_offline(smp_processor_id()) )
>>              play_dead();
>> +        scrub_free_pages();
>>          (*pm_idle)();
> 
> So is it really appropriate to call pm_idle() if scrub_free_pages() didn't
> return immediately? I.e. I would suppose the function ought to return
> a boolean indicator whether any meaningful amount of processing
> was done, in which case we may want to skip entering any kind of C
> state (even if it would end up being just C1), or doing any of the
> processing associated with this possible entering.
>

Thanks, I'll add a return value to scrub_free_pages() and skip pm_idle()
when it reports that meaningful work was done.

>> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>> index ab293c8..6ab1d1d 100644
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -86,6 +86,12 @@ PAGE_LIST_HEAD(page_offlined_list);
>>  /* Broken page list, protected by heap_lock. */
>>  PAGE_LIST_HEAD(page_broken_list);
>>  
>> +/* A rough flag to indicate whether a node have need_scrub pages */
>> +static bool_t node_need_scrub[MAX_NUMNODES];
>> +static DEFINE_PER_CPU(bool_t, is_scrubbing);
>> +static DEFINE_PER_CPU(struct page_list_head, scrub_list_cpu);
>> +static DEFINE_PER_CPU(struct page_list_head, free_list_cpu);
>> +
>>  /*************************
>>   * BOOT-TIME ALLOCATOR
>>   */
>> @@ -948,6 +954,7 @@ static void free_heap_pages(
>>      {
>>          if ( !tainted )
>>          {
>> +            node_need_scrub[node] = 1;
>>              for ( i = 0; i < (1 << order); i++ )
>>                  pg[i].count_info |= PGC_need_scrub;
>>          }
> 
> Iirc it was more than this single place where you set
> PGC_need_scrub, and hence where you'd now need to set the
> other flag too.
> 

I'm afraid this is the only place where PGC_need_scrub is set.

>> @@ -1525,7 +1532,130 @@ void __init scrub_heap_pages(void)
>>      setup_low_mem_virq();
>>  }
>>  
>> +#define SCRUB_BATCH_ORDER 12
> 
> Please make this adjustable via command line, so that eventual
> latency issues can be worked around.
> 

Okay.

>> +static void __scrub_free_pages(unsigned int node, unsigned int cpu)
>> +{
>> +    struct page_info *pg, *tmp;
>> +    unsigned int i;
>> +    int order;
>> +    struct page_list_head *local_scrub_list = &this_cpu(scrub_list_cpu);
>> +    struct page_list_head *local_free_list = &this_cpu(free_list_cpu);
>> +
>> +    /* Scrub percpu list */
>> +    while ( !page_list_empty(local_scrub_list) )
>> +    {
>> +        pg = page_list_remove_head(local_scrub_list);
>> +        order = PFN_ORDER(pg);
>> +        ASSERT( pg && order <= SCRUB_BATCH_ORDER );
>> +        for ( i = 0; i < (1 << order); i++ )
>> +        {
>> +            ASSERT( test_bit(_PGC_need_scrub, &pg[i].count_info) );
>> +            scrub_one_page(&pg[i]);
>> +        }
>> +        page_list_add_tail(pg, local_free_list);
>> +        if ( softirq_pending(cpu) )
>> +            return;
> 
> Hard tabs. Please, also with further violations below in mind, check
> your code before submitting.
>

I'm sorry for all of the coding style problems.

By the way, is there a script that can be used to check the code
before submitting? Something like ./scripts/checkpatch.pl in Linux.

>> +    }
>> +
>> +    /* free percpu free list */
>> +    if ( !page_list_empty(local_free_list) )
>> +    {
>> +        spin_lock(&heap_lock);
>> +        page_list_for_each_safe( pg, tmp, local_free_list )
>> +        {
>> +            order = PFN_ORDER(pg);
>> +            page_list_del(pg, local_free_list);
>> +            for ( i = 0; i < (1 << order); i++ )
>> +        {
>> +                pg[i].count_info |= PGC_state_free;
>> +                pg[i].count_info &= ~PGC_need_scrub;
> 
> This needs to happen earlier - the scrub flag should be cleared right
> after scrubbing, and the free flag should imo be set when the page
> gets freed. That's for two reasons:
> 1) Hypervisor allocations don't need scrubbed pages, i.e. they can
> allocate memory regardless of the scrub flag's state.

AFAIR, the reason I set those flags here was to avoid a panic.

> 2) You still detain the memory on the local lists from allocation. On a
> many-node system, the 16Mb per node can certainly sum up (which
> is not to say that I don't view the 16Mb on a single node as already
> problematic).
> 

Right, but we can adjust SCRUB_BATCH_ORDER.
Anyway, I'll rework this as you suggested.

>> +            }
>> +            merge_free_trunks(pg, order, node, page_to_zone(pg), 0);
>> +        }
>> +        spin_unlock(&heap_lock);
>> +    }
>> +}
>> +
>> +void scrub_free_pages(void)
>> +{
>> +    int order;
>> +    struct page_info *pg, *tmp;
>> +    unsigned int i, zone, nr_delisted = 0;
>> +    unsigned int cpu = smp_processor_id();
>> +    unsigned int node = cpu_to_node(cpu);
>> +    struct page_list_head *local_scrub_list = &this_cpu(scrub_list_cpu);
>> +
>> +    /* Return if our sibling already started scrubbing */
>> +    for_each_cpu( i, per_cpu(cpu_sibling_mask,cpu) )
>> +        if ( per_cpu(is_scrubbing, i) )
>> +            return;
>> +    this_cpu(is_scrubbing) = 1;
> 
> If you really want to enforce exclusiveness, you need a single
> canonical flag per core (could e.g. be
> per_cpu(is_scrubbing, cpumask_first(per_cpu(cpu_sibling_mask, cpu))),
> set via test_and_set_bool()).

Will be updated.

> 
>> +
>> +    while ( !softirq_pending(cpu) )
>> +    {
>> +        if ( !node_need_scrub[node] )
>> +        {
>> +            /* Free local per cpu list before we exit */
>> +            __scrub_free_pages(node, cpu);
>> +            goto out;
>> +        }
> 
> This seems unnecessary: Just have the if() below be
> 
>         if ( node_need_scrub[node] && page_list_empty(local_scrub_list) )
> 
> along with __scrub_free_pages() returning whether it exited because
> of softirq_pending(cpu) (which at once eliminates the need for the
> check at the loop header above, allowing this to become a nice
> do { ... } while ( !__scrub_free_pages() ).
>

Same here, will do.

>> +
>> +        /* Delist a batch of pages from global scrub list */
>> +        if ( page_list_empty(local_scrub_list) )
>> +        {
>> +            spin_lock(&heap_lock);
>> +            for ( zone = 0; zone < NR_ZONES; zone++ )
>> +            {
>> +                for ( order = MAX_ORDER; order >= 0; order-- )
>> +                {
>> +                    page_list_for_each_safe( pg, tmp, &heap(node, zone, order) )
>> +                    {
>> +                        if ( !test_bit(_PGC_need_scrub, &(pg->count_info)) )
> 
> Please avoid the inner parentheses here.
> 
>> +                            continue;
>> +
>> +                        page_list_del( pg, &heap(node, zone, order) );
>> +                        if ( order > SCRUB_BATCH_ORDER)
> 
> Coding style.
> 
>> +                        {
>> +                            /* putback extra pages */
>> +                            i = order;
>> +                            while ( i != SCRUB_BATCH_ORDER )
> 
> This would better be a do/while construct - on the first iteration you
> already know the condition is true.
> 
>> +                            {
>> +                                PFN_ORDER(pg) = --i;
>> +                                page_list_add_tail(pg, &heap(node, zone, i));
>> +                                pg += 1 << i;
> 
> I realize there are cases where this is also not being done correctly in
> existing code, but please use 1UL here and in all similar cases.
> 

Sure.

>> +                            }
>> +                            PFN_ORDER(pg) = SCRUB_BATCH_ORDER;
>> +                        }
>> +
>> +                        for ( i = 0; i < (1 << PFN_ORDER(pg)); i++ )
> 
> Can't you just use "order" here (and a few lines down)?
> 

Then I'd have to use another temporary variable. Anyway, I'll make an update.

>> +                        {
>> +                            ASSERT( test_bit(_PGC_need_scrub, &pg[i].count_info) );
>> +                            ASSERT( !test_bit(_PGC_broken, &pg[i].count_info) );
>> +                            mark_page_offline(&pg[i], 0);
> 
> mark_page_offline() ???
>

Yes, it's a bit tricky here and I don't have a good idea right now.
The problem is that when freeing a page frame we have to avoid merging
it with pages on the percpu scrub/free lists, so I mark pages offline
temporarily while adding them to the percpu lists.

Another thing I'm still not clear about is how to handle the situation
where a machine check (#MC) hits a page on a percpu scrub/free list.

Thank you very much for your review.

-Bob
