|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] Xen: Spread boot time page scrubbing across all available CPU's (v6)
On Fri, Jun 13, 2014 at 07:32:46PM +0100, Andrew Cooper wrote:
> On 13/06/14 19:05, Konrad Rzeszutek Wilk wrote:
> > From: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> >
> > The page scrubbing is done in 128MB chunks in lockstep across all the
> > non-SMT CPU's. This allows for the boot CPU to hold the heap_lock whilst
> > each
> > chunk is being scrubbed and then release the heap_lock when the CPU's are
> > finished scrubing their individual chunk. This allows for the heap_lock to
> > not be held continously and for pending softirqs are to be serviced
> > periodically across the CPU's.
> >
> > The page scrub memory chunks are allocated to the CPU's in a NUMA aware
> > fashion to reduce socket interconnect overhead and improve performance.
> > Specifically in the first phase we scrub at the same time on all the
> > NUMA nodes that have CPUs - we also weed out the SMT threads so that
> > we only use cores (that gives a 50% boost). The second phase is for NUMA
> > nodes that have no CPUs - for that we use the closest NUMA node's CPUs
> > (non-SMT again) to do the job.
> >
> > This patch reduces the boot page scrub time on a 128GB 64 core AMD Opteron
> > 6386 machine from 49 seconds to 3 seconds.
> > On a IvyBridge-EX 8 socket box with 1.5TB it cuts it down from 15 minutes
> > to 63 seconds.
> >
> > Signed-off-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > Reviewed-by: Tim Deegan <tim@xxxxxxx>
>
> Functionally, Reviewed-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>
> A few minor nits...
>
> >
> > ---
> >
> > v2
> > - Reduced default chunk size to 128MB
> > - Added code to scrub NUMA nodes with no active CPU linked to them
> > - Be robust to boot CPU not being linked to a NUMA node
> >
> > v3:
> > - Don't use SMT threads
> > - Take care of remainder if the number of CPUs (or memory) is odd
> > - Restructure the worker thread
> > - s/u64/unsigned long/
> >
> > v4:
> > - Don't use all CPUs on non-CPU NUMA nodes, just closest one
> > - Syntax, and docs updates
> > - Compile on ARM
> > - Fix bug when NUMA node has 0 pages
> >
> > v5:
> > - Properly figure out best NUMA node.
> > - Fix comments to be proper style.
> >
> > v6:
> > - Missing page shift on default values, optimize cpumask usage.
> > - Fix case of NODE having one page and below first_valid_mfn
> > - Add Ack
> > ---
> > docs/misc/xen-command-line.markdown | 10 ++
> > xen/common/page_alloc.c | 211
> > ++++++++++++++++++++++++++++++++---
> > xen/include/asm-arm/numa.h | 1 +
> > 3 files changed, 204 insertions(+), 18 deletions(-)
> >
> > diff --git a/docs/misc/xen-command-line.markdown
> > b/docs/misc/xen-command-line.markdown
> > index 25829fe..509f462 100644
> > --- a/docs/misc/xen-command-line.markdown
> > +++ b/docs/misc/xen-command-line.markdown
> > @@ -198,6 +198,16 @@ Scrub free RAM during boot. This is a safety feature
> > to prevent
> > accidentally leaking sensitive VM data into other VMs if Xen crashes
> > and reboots.
> >
> > +### bootscrub_chunk
>
> _ needs escaping with a backslash
>
> > +> `= <size>`
> > +
> > +> Default: `134217728`
>
> Due to the implicit 'k' suffix on sizes, this is not a useful hint as to
> how to change the default.
>
> Furthermore, `128M` is substantially clearer.
<smiles>
>
> > +
> > +Maximum RAM block size chunks to be scrubbed whilst holding the page heap
> > lock
> > +and not running softirqs. Reduce this if softirqs are not being run
> > frequently
> > +enough. Setting this to a high value may cause boot failure, particularly
> > if
> > +the NMI watchdog is also enabled.
> > +
> > ### cachesize
> > > `= <size>`
> >
> > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > ...
> > - * Scrub all unallocated pages in all heap zones. This function is more
> > - * convoluted than appears necessary because we do not want to continuously
> > - * hold the lock while scrubbing very large memory areas.
> > + * Scrub all unallocated pages in all heap zones. This function uses all
> > + * online cpu's to scrub the memory in parallel.
> > */
> > void __init scrub_heap_pages(void)
> > {
> > - unsigned long mfn;
> > - struct page_info *pg;
> > + cpumask_t node_cpus, all_worker_cpus;
> > + unsigned int i, j;
> > + unsigned long offset, max_per_cpu_sz = 0;
> > + unsigned long start, end;
> > + unsigned long rem = 0;
> > + int last_distance, best_node;
> > + int cpus;
> >
> > if ( !opt_bootscrub )
> > return;
> >
> > - printk("Scrubbing Free RAM: ");
> > + cpumask_clear(&all_worker_cpus);
> > + /* Scrub block size. */
> > + chunk_size = opt_bootscrub_chunk >> PAGE_SHIFT;
> > + if ( chunk_size == 0 )
> > + chunk_size = MB(128) >> PAGE_SHIFT;
> > +
> > + /* Round #0 - figure out amounts and which CPUs to use. */
> > + for_each_online_node ( i )
> > + {
> > + if ( !node_spanned_pages(i) )
> > + continue;
> > + /* Calculate Node memory start and end address. */
> > + start = max(node_start_pfn(i), first_valid_mfn);
> > + end = min(node_start_pfn(i) + node_spanned_pages(i), max_page);
> > + /* Just in case NODE has 1 page and starts below first_valid_mfn.
> > */
> > + end = max(end, start);
> > + /* CPUs that are online and on this node (if none, that it is OK).
> > */
> > + cpus = find_non_smt(i, &node_cpus);
> > + cpumask_or(&all_worker_cpus, &all_worker_cpus, &node_cpus);
> > + if ( cpus <= 0 )
> > + {
> > + /* No CPUs on this node. Round #2 will take of it. */
> > + rem = 0;
> > + region[i].per_cpu_sz = (end - start);
> > + }
> > + else
> > + {
> > + rem = (end - start) % cpumask_weight(&node_cpus);
> > + region[i].per_cpu_sz = (end - start) /
> > cpumask_weight(&node_cpus);
>
> The 'cpus' variable still holds cpumask_weight(&node_cpus), and is
> rather more efficient to use.
Grrr. Right!
>
> > + if ( region[i].per_cpu_sz > max_per_cpu_sz )
> > + max_per_cpu_sz = region[i].per_cpu_sz;
> > + }
> > + region[i].start = start;
> > + region[i].rem = rem;
> > + cpumask_copy(®ion[i].cpus, &node_cpus);
> > + }
> > +
> > + printk("Scrubbing Free RAM on %u nodes using %u CPUs\n",
> > num_online_nodes(),
> > + cpumask_weight(&all_worker_cpus));
>
> Both of these are signed quantities, rather than unsigned.
Done!
>
> ~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |