Xen project Mailing List

Re: [Xen-devel] [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v6)

On Fri, Jun 13, 2014 at 07:32:46PM +0100, Andrew Cooper wrote: > On 13/06/14 19:05, Konrad Rzeszutek Wilk wrote: > > From: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx> > > > > The page scrubbing is done in 128MB chunks in lockstep across all the > > non-SMT CPU's. This allows for the boot CPU to hold the heap_lock whilst > > each > > chunk is being scrubbed and then release the heap_lock when the CPU's are > > finished scrubing their individual chunk. This allows for the heap_lock to > > not be held continously and for pending softirqs are to be serviced > > periodically across the CPU's. > > > > The page scrub memory chunks are allocated to the CPU's in a NUMA aware > > fashion to reduce socket interconnect overhead and improve performance. > > Specifically in the first phase we scrub at the same time on all the > > NUMA nodes that have CPUs - we also weed out the SMT threads so that > > we only use cores (that gives a 50% boost). The second phase is for NUMA > > nodes that have no CPUs - for that we use the closest NUMA node's CPUs > > (non-SMT again) to do the job. > > > > This patch reduces the boot page scrub time on a 128GB 64 core AMD Opteron > > 6386 machine from 49 seconds to 3 seconds. > > On a IvyBridge-EX 8 socket box with 1.5TB it cuts it down from 15 minutes > > to 63 seconds. > > > > Signed-off-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx> > > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> > > Reviewed-by: Tim Deegan <tim@xxxxxxx> > > Functionally, Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > > A few minor nits... > > > > > --- > > > > v2 > > - Reduced default chunk size to 128MB > > - Added code to scrub NUMA nodes with no active CPU linked to them > > - Be robust to boot CPU not being linked to a NUMA node > > > > v3: > > - Don't use SMT threads > > - Take care of remainder if the number of CPUs (or memory) is odd > > - Restructure the worker thread > > - s/u64/unsigned long/ > > > > v4: > > - Don't use all CPUs on non-CPU NUMA nodes, just closest one > > - Syntax, and docs updates > > - Compile on ARM > > - Fix bug when NUMA node has 0 pages > > > > v5: > > - Properly figure out best NUMA node. > > - Fix comments to be proper style. > > > > v6: > > - Missing page shift on default values, optimize cpumask usage. > > - Fix case of NODE having one page and below first_valid_mfn > > - Add Ack > > --- > > docs/misc/xen-command-line.markdown | 10 ++ > > xen/common/page_alloc.c | 211 > > ++++++++++++++++++++++++++++++++--- > > xen/include/asm-arm/numa.h | 1 + > > 3 files changed, 204 insertions(+), 18 deletions(-) > > > > diff --git a/docs/misc/xen-command-line.markdown > > b/docs/misc/xen-command-line.markdown > > index 25829fe..509f462 100644 > > --- a/docs/misc/xen-command-line.markdown > > +++ b/docs/misc/xen-command-line.markdown > > @@ -198,6 +198,16 @@ Scrub free RAM during boot. This is a safety feature > > to prevent > > accidentally leaking sensitive VM data into other VMs if Xen crashes > > and reboots. > > > > +### bootscrub_chunk > > _ needs escaping with a backslash > > > +> `= <size>` > > + > > +> Default: `134217728` > > Due to the implicit 'k' suffix on sizes, this is not a useful hint as to > how to change the default. > > Furthermore, `128M` is substantially clearer. <smiles> > > > + > > +Maximum RAM block size chunks to be scrubbed whilst holding the page heap > > lock > > +and not running softirqs. Reduce this if softirqs are not being run > > frequently > > +enough. Setting this to a high value may cause boot failure, particularly > > if > > +the NMI watchdog is also enabled. > > + > > ### cachesize > > > `= <size>` > > > > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c > > ... > > - * Scrub all unallocated pages in all heap zones. This function is more > > - * convoluted than appears necessary because we do not want to continuously > > - * hold the lock while scrubbing very large memory areas. > > + * Scrub all unallocated pages in all heap zones. This function uses all > > + * online cpu's to scrub the memory in parallel. > > */ > > void __init scrub_heap_pages(void) > > { > > - unsigned long mfn; > > - struct page_info *pg; > > + cpumask_t node_cpus, all_worker_cpus; > > + unsigned int i, j; > > + unsigned long offset, max_per_cpu_sz = 0; > > + unsigned long start, end; > > + unsigned long rem = 0; > > + int last_distance, best_node; > > + int cpus; > > > > if ( !opt_bootscrub ) > > return; > > > > - printk("Scrubbing Free RAM: "); > > + cpumask_clear(&all_worker_cpus); > > + /* Scrub block size. */ > > + chunk_size = opt_bootscrub_chunk >> PAGE_SHIFT; > > + if ( chunk_size == 0 ) > > + chunk_size = MB(128) >> PAGE_SHIFT; > > + > > + /* Round #0 - figure out amounts and which CPUs to use. */ > > + for_each_online_node ( i ) > > + { > > + if ( !node_spanned_pages(i) ) > > + continue; > > + /* Calculate Node memory start and end address. */ > > + start = max(node_start_pfn(i), first_valid_mfn); > > + end = min(node_start_pfn(i) + node_spanned_pages(i), max_page); > > + /* Just in case NODE has 1 page and starts below first_valid_mfn. > > */ > > + end = max(end, start); > > + /* CPUs that are online and on this node (if none, that it is OK). > > */ > > + cpus = find_non_smt(i, &node_cpus); > > + cpumask_or(&all_worker_cpus, &all_worker_cpus, &node_cpus); > > + if ( cpus <= 0 ) > > + { > > + /* No CPUs on this node. Round #2 will take of it. */ > > + rem = 0; > > + region[i].per_cpu_sz = (end - start); > > + } > > + else > > + { > > + rem = (end - start) % cpumask_weight(&node_cpus); > > + region[i].per_cpu_sz = (end - start) / > > cpumask_weight(&node_cpus); > > The 'cpus' variable still holds cpumask_weight(&node_cpus), and is > rather more efficient to use. Grrr. Right! > > > + if ( region[i].per_cpu_sz > max_per_cpu_sz ) > > + max_per_cpu_sz = region[i].per_cpu_sz; > > + } > > + region[i].start = start; > > + region[i].rem = rem; > > + cpumask_copy(&region[i].cpus, &node_cpus); > > + } > > + > > + printk("Scrubbing Free RAM on %u nodes using %u CPUs\n", > > num_online_nodes(), > > + cpumask_weight(&all_worker_cpus)); > > Both of these are signed quantities, rather than unsigned. Done! > > ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.