Re: [Xen-devel] [PATCH v3 4/9] mm: Scrub memory from idle loop
On 05/04/2017 11:31 AM, Jan Beulich wrote:
>>>> On 14.04.17 at 17:37, <boris.ostrovsky@xxxxxxxxxx> wrote:
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1035,16 +1035,82 @@ merge_and_free_buddy(struct page_info *pg,
>>                                       unsigned int node,
>>      return pg;
>>  }
>>
>> -static void scrub_free_pages(unsigned int node)
>> +static nodemask_t node_scrubbing;
>> +
>> +static unsigned int node_to_scrub(bool get_node)
>> +{
>> +    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
>> +    nodeid_t closest = NUMA_NO_NODE;
>> +    u8 dist, shortest = 0xff;
>> +
>> +    if ( node == NUMA_NO_NODE )
>> +        node = 0;
>> +
>> +    if ( node_need_scrub[node] &&
>> +         (!get_node || !node_test_and_set(node, node_scrubbing)) )
>> +        return node;
>> +
>> +    /*
>> +     * See if there are memory-only nodes that need scrubbing and choose
>> +     * the closest one.
>> +     */
>> +    local_node = node;
>> +    while ( 1 )
>> +    {
>> +        do {
>> +            node = cycle_node(node, node_online_map);
>> +        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
>> +                  (node != local_node) );
>> +
>> +        if ( node == local_node )
>> +            break;
>> +
>> +        if ( node_need_scrub[node] )
>> +        {
>> +            if ( !get_node )
>> +                return node;
>
> I think the function parameter name is not / no longer suitable. The
> caller wants to get _some_ node in either case. The difference is
> whether it wants to just know whether there's _any_ needing scrub
> work done, or whether it wants _the one_ to actually scrub on. So
> how about "get_any" or "get_any_node" or just "any"?

Not only to find out whether there is anything to scrub but, if get_node
is true, to actually "get" it, i.e. set the bit in the node_scrubbing
mask. Thus the name.

>> +            if ( !node_test_and_set(node, node_scrubbing) )
>> +            {
>> +                dist = __node_distance(local_node, node);
>> +                if ( (dist < shortest) || (dist == NUMA_NO_DISTANCE) )
>> +                {
>> +                    /* Release previous node. */
>> +                    if ( closest != NUMA_NO_NODE )
>> +                        node_clear(closest, node_scrubbing);
>> +                    shortest = dist;
>> +                    closest = node;
>> +                }
>> +                else
>> +                    node_clear(node, node_scrubbing);
>> +            }
>
> Wouldn't it be better to check distance before setting the bit in
> node_scrubbing, avoiding the possible need to clear it again later
> (potentially misguiding other CPUs)?

Yes.

> And why would NUMA_NO_DISTANCE lead to a switch of nodes?
> I can see that being needed when closest == NUMA_NO_NODE,
> but once you've picked one I think you should switch only when
> you've found another that's truly closer.

OK --- yes, this was indeed done to "get started" (i.e. to get the first
valid value for 'closest').

>> +        }
>> +    }
>> +
>> +    return closest;
>> +}
>> +
>> +bool scrub_free_pages(void)
>>  {
>>      struct page_info *pg;
>>      unsigned int zone, order;
>>      unsigned long i;
>> +    unsigned int cpu = smp_processor_id();
>> +    bool preempt = false;
>> +    nodeid_t node;
>>
>> -    ASSERT(spin_is_locked(&heap_lock));
>> +    /*
>> +     * Don't scrub while dom0 is being constructed since we may
>> +     * fail trying to call map_domain_page() from scrub_one_page().
>> +     */
>> +    if ( system_state < SYS_STATE_active )
>> +        return false;
>
> I assume that's because of the mapcache vcpu override? That's x86
> specific though, so the restriction here ought to be arch specific.
> Even better would be to find a way to avoid this restriction
> altogether, as on bigger systems only one CPU is actually busy
> while building Dom0, so all others could be happily scrubbing. Could
> that override become a per-CPU one perhaps?

Is it worth doing, though? What you are saying below is exactly why I
simply return here --- there would be very few dirty pages. This may
change if we decide to use idle-loop scrubbing for boot scrubbing as
well (as Andrew suggested earlier), but there is little reason to do it
now IMO.
> Otoh there's not much to scrub yet until Dom0 had all its memory
> allocated, and we know which pages truly remain free (wanting
> what is currently the boot time scrubbing done on them). But that
> point in time may still be earlier than when we switch to
> SYS_STATE_active.

>> @@ -1065,16 +1131,29 @@ static void scrub_free_pages(unsigned int node)
>>                  pg[i].count_info &= ~PGC_need_scrub;
>>                  node_need_scrub[node]--;
>>              }
>> +            if ( softirq_pending(cpu) )
>> +            {
>> +                preempt = true;
>> +                break;
>> +            }
>
> Isn't this a little too eager, especially if you didn't have to scrub
> the page on this iteration?

What would be a good place then? Count how many pages were actually
scrubbed and check for pending softirqs every so many? Even if we don't
scrub at all, walking the whole heap can take a while.

>> @@ -1141,9 +1220,6 @@ static void free_heap_pages(
>>      if ( tainted )
>>          reserve_offlined_page(pg);
>>
>> -    if ( need_scrub )
>> -        scrub_free_pages(node);
>
> I'd expect this eliminates the need for the need_scrub variable.

We still need it to decide whether to set PGC_need_scrub on pages.

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel