
Re: [Xen-devel] [PATCH 00/12] cpumask handling scalability improvements



On 20/10/2011 14:36, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:

> This patch set makes some first steps towards eliminating the old cpumask
> accessors, replacing them by such that don't require the full NR_CPUS
> bits to be allocated (which obviously can be pretty wasteful when
> NR_CPUS is high, but the actual number is low or moderate).
> 
> 01: introduce and use nr_cpu_ids and nr_cpumask_bits
> 02: eliminate cpumask accessors referencing NR_CPUS
> 03: eliminate direct assignments of CPU masks
> 04: x86: allocate IRQ actions' cpu_eoi_map dynamically
> 05: allocate CPU sibling and core maps dynamically

I'm not sure about this. We can save ~500 bytes per cpumask_t when
NR_CPUS=4096 and actual nr_cpus<64. But how many cpumask_t's do we typically
have dynamically allocated all at once? Let's say we waste 2kB per VCPU and
per IRQ, and we have a massive system with ~1k VCPUs and ~1k IRQs -- we'd
save ~4MB in that extreme case. But such a large system will probably
actually have a lot of CPUs, and also a lot of memory, such that 4MB is
quite insignificant.
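
To put rough numbers on that (the masks-per-object figure below is just an
assumption to make the 2kB guess concrete):

    /* Illustrative arithmetic only -- not actual Xen code.              */
    /* sizeof(cpumask_t) with NR_CPUS=4096:  4096 / 8       = 512 bytes  */
    /* dynamically sized mask, nr_cpus<=64:    64 / 8       =   8 bytes  */
    /* per-mask saving:                       512 - 8       ~ 500 bytes  */
    /* assuming ~4 embedded masks per vcpu/irq: 4 * 500     ~   2 kB     */
    /* ~1k VCPUs + ~1k IRQs:                  2k * 2kB      ~   4 MB     */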

I suppose there is a second argument that it shrinks the containing
structures (struct domain, struct vcpu, struct irq_desc, ...) and maybe
helps reduce our order!=0 allocations?
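
A minimal sketch of that structure-shrinking point (field and type names
below are made up for illustration; this is not the real struct vcpu
layout):

    /* Today: a full NR_CPUS-wide mask is embedded in the structure. */
    struct vcpu_embedded_example {
        /* ... other fields ... */
        cpumask_t cpu_affinity;      /* 512 bytes with NR_CPUS=4096 */
    };

    /* With dynamic allocation only a pointer stays embedded; the mask
     * itself is allocated once at vcpu creation, sized for nr_cpu_ids
     * bits rather than NR_CPUS. */
    struct vcpu_dynamic_example {
        /* ... other fields ... */
        cpumask_var_t cpu_affinity;  /* a pointer on such a build: 8 bytes */
    };

If the embedded masks are what pushes struct domain/vcpu/irq_desc past a
page, that is where the order!=0 allocations would go away.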

By the way, I think we could avoid the NR_CPUS copying overhead everywhere
by having the cpumask.h functions respect nr_cpu_ids, but continue to
return NR_CPUS as the sentinel value (e.g., end of loop, or no bit found)?
That would not require changing tonnes of code. It only gets part of the
benefit (reducing CPU time overhead), but is perhaps more palatable?
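
Roughly what I have in mind -- a sketch only, with the helper names
standing in for whatever we actually settle on:

    /* Scan only the first nr_cpu_ids bits of the mask, but keep
     * returning NR_CPUS as the "nothing found" sentinel, so existing
     * callers and loop bounds stay untouched. */
    static inline int cpumask_next(int n, const cpumask_t *srcp)
    {
        int cpu = find_next_bit(srcp->bits, nr_cpu_ids, n + 1);

        return (cpu < nr_cpu_ids) ? cpu : NR_CPUS;
    }

    #define for_each_cpu_mask(cpu, mask)                \
        for ( (cpu) = cpumask_next(-1, &(mask));        \
              (cpu) < NR_CPUS;                          \
              (cpu) = cpumask_next((cpu), &(mask)) )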

> 06: allow efficient allocation of multiple CPU masks at once

That is utterly hideous, and for an insignificant saving.

> One reason I put the following ones together was to reduce the
> differences between the disassembly of hypervisors built for 4095
> and 2047 CPUs, which I looked at to determine the places where
> cpumask_t variables get copied without using cpumask_copy() (a
> job where grep is of no help). Hence consider these patches optional,
> but recommended.
> 
> 07: cpufreq: allocate CPU masks dynamically
> 08: x86/p2m: allocate CPU masks dynamically
> 09: cpupools: allocate CPU masks dynamically
> 10: credit: allocate CPU masks dynamically
> 11: x86/hpet: allocate CPU masks dynamically
> 12: cpumask <=> xenctl_cpumap: allocate CPU masks and byte maps dynamically

Questionable. Any subsystem that allocates no more than a handful of
cpumask_t's is possibly just as well left alone... I'm not dead set against
them if we decide that 01-05 are actually worth pursuing, however.

 -- Keir

> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

