
Re: [Xen-devel] [PATCH v3 4/9] mm: Scrub memory from idle loop



On 05/04/2017 11:31 AM, Jan Beulich wrote:
>>>> On 14.04.17 at 17:37, <boris.ostrovsky@xxxxxxxxxx> wrote:
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1035,16 +1035,82 @@ merge_and_free_buddy(struct page_info *pg, unsigned int node,
>>      return pg;
>>  }
>>  
>> -static void scrub_free_pages(unsigned int node)
>> +static nodemask_t node_scrubbing;
>> +
>> +static unsigned int node_to_scrub(bool get_node)
>> +{
>> +    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
>> +    nodeid_t closest = NUMA_NO_NODE;
>> +    u8 dist, shortest = 0xff;
>> +
>> +    if ( node == NUMA_NO_NODE )
>> +        node = 0;
>> +
>> +    if ( node_need_scrub[node] &&
>> +         (!get_node || !node_test_and_set(node, node_scrubbing)) )
>> +        return node;
>> +
>> +    /*
>> +     * See if there are memory-only nodes that need scrubbing and choose
>> +     * the closest one.
>> +     */
>> +    local_node = node;
>> +    while ( 1 )
>> +    {
>> +        do {
>> +            node = cycle_node(node, node_online_map);
>> +        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
>> +                  (node != local_node) );
>> +
>> +        if ( node == local_node )
>> +            break;
>> +
>> +        if ( node_need_scrub[node] )
>> +        {
>> +            if ( !get_node )
>> +                return node;
> I think the function parameter name is not / no longer suitable. The
> caller wants to get _some_ node in either case. The difference is
> whether it wants to just know whether there's _any_ needing scrub
> work done, or whether it wants _the one_ to actually scrub on. So
> how about "get_any" or "get_any_node" or just "any"?


The point is not only to find out whether there is anything to scrub but
also, when get_node is true, to actually "get" the node, i.e. to set its
bit in the node_scrubbing mask. Thus the name.
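
To illustrate the intended calling convention (just a sketch, not code
from the patch):

    /* Query only: is there any node that needs scrubbing? */
    if ( node_to_scrub(false) != NUMA_NO_NODE )
        /* ... there is work available somewhere ... */

    /* Claim a node to scrub on (sets its bit in node_scrubbing). */
    node = node_to_scrub(true);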

>
>> +            if ( !node_test_and_set(node, node_scrubbing) )
>> +            {
>> +                dist = __node_distance(local_node, node);
>> +                if ( (dist < shortest) || (dist == NUMA_NO_DISTANCE) )
>> +                {
>> +                    /* Release previous node. */
>> +                    if ( closest != NUMA_NO_NODE )
>> +                        node_clear(closest, node_scrubbing);
>> +                    shortest = dist;
>> +                    closest = node;
>> +                }
>> +                else
>> +                    node_clear(node, node_scrubbing);
>> +            }
> Wouldn't it be better to check distance before setting the bit in
> node_scrubbing, avoiding the possible need to clear it again later
> (potentially misguiding other CPUs)?

Yes.

>
> And why would NUMA_NO_DISTANCE lead to a switch of nodes?
> I can see that being needed when closest == NUMA_NO_NODE,
> but once you've picked one I think you should switch only when
> you've found another that's truly closer.

OK --- yes, this was indeed done to "get started", i.e. to obtain the
first valid value for 'closest'.
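
Combining this with your earlier point about checking the distance before
the test-and-set, something like this perhaps (untested sketch):

            dist = __node_distance(local_node, node);

            /*
             * Take the node if it is the first candidate (which covers
             * the NUMA_NO_DISTANCE case) or truly closer than the current
             * choice, and only then try to claim it, so we never have to
             * release a node we shouldn't have grabbed.
             */
            if ( (closest == NUMA_NO_NODE || dist < shortest) &&
                 !node_test_and_set(node, node_scrubbing) )
            {
                /* Release previous node. */
                if ( closest != NUMA_NO_NODE )
                    node_clear(closest, node_scrubbing);
                shortest = dist;
                closest = node;
            }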

>
>> +        }
>> +    }
>> +
>> +    return closest;
>> +}
>> +
>> +bool scrub_free_pages(void)
>>  {
>>      struct page_info *pg;
>>      unsigned int zone, order;
>>      unsigned long i;
>> +    unsigned int cpu = smp_processor_id();
>> +    bool preempt = false;
>> +    nodeid_t node;
>>  
>> -    ASSERT(spin_is_locked(&heap_lock));
>> +    /*
>> +     * Don't scrub while dom0 is being constructed since we may
>> +     * fail trying to call map_domain_page() from scrub_one_page().
>> +     */
>> +    if ( system_state < SYS_STATE_active )
>> +        return false;
> I assume that's because of the mapcache vcpu override? That's x86
> specific though, so the restriction here ought to be arch specific.
> Even better would be to find a way to avoid this restriction
> altogether, as on bigger systems only one CPU is actually busy
> while building Dom0, so all others could be happily scrubbing. Could
> that override become a per-CPU one perhaps?

Is it worth doing, though? What you are saying below is exactly why I
simply return here --- at that point there are very few dirty pages. This
may change if we decide to use idle-loop scrubbing for boot-time scrubbing
as well (as Andrew suggested earlier), but there is little reason to do it
now IMO.
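
If we did want to make the restriction arch-specific, I suppose the check
could be hidden behind a helper, e.g. (hypothetical name, not existing
code):

    /*
     * Would return true on x86 before SYS_STATE_active, because of the
     * mapcache vcpu override used while building dom0.
     */
    if ( arch_defer_idle_scrub() )
        return false;

but, again, I am not sure it buys us much at this point.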

> Otoh there's not much to scrub yet until Dom0 had all its memory
> allocated, and we know which pages truly remain free (wanting
> what is currently the boot time scrubbing done on them). But that
> point in time may still be earlier than when we switch to
> SYS_STATE_active.
>
>> @@ -1065,16 +1131,29 @@ static void scrub_free_pages(unsigned int node)
>>                          pg[i].count_info &= ~PGC_need_scrub;
>>                          node_need_scrub[node]--;
>>                      }
>> +                    if ( softirq_pending(cpu) )
>> +                    {
>> +                        preempt = true;
>> +                        break;
>> +                    }
> Isn't this a little too eager, especially if you didn't have to scrub
> the page on this iteration?

What would be a good place then? Count how many pages were actually
scrubbed and check for pending softirqs every so many? Even if we don't
scrub anything, walking the whole heap can take a while.
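
Something along these lines, maybe (untested; the chunk size is made up):

                        /*
                         * Check for pending softirqs only every so many
                         * pages, whether or not they actually needed
                         * scrubbing, so that walking a mostly clean heap
                         * still gets preempted eventually.
                         */
                        if ( !((i + 1) % 64) && softirq_pending(cpu) )
                        {
                            preempt = true;
                            break;
                        }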


>
>> @@ -1141,9 +1220,6 @@ static void free_heap_pages(
>>      if ( tainted )
>>          reserve_offlined_page(pg);
>>  
>> -    if ( need_scrub )
>> -        scrub_free_pages(node);
> I'd expect this eliminates the need for the need_scrub variable.

We still need it to decide whether to set PGC_need_scrub on pages.
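I.e. earlier in free_heap_pages() we still have, roughly:

    if ( need_scrub )
    {
        /* Mark the buddy dirty so the idle loop will pick it up later. */
        pg->count_info |= PGC_need_scrub;
        node_need_scrub[node] += (1UL << order);
    }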

-boris

