I am confused by a problem:
I have a blade with 64 physical CPUs and 64 GB of physical RAM, and defined only one VM with 1 CPU and 40 GB of RAM.
The first time I started the VM it took only 3 s, but the second start took 30 s.
After studying it by adding log prints, I located a place in the hypervisor that costs too much time,
occupying 98% of the whole starting time.
xen/common/page_alloc.c

/* Allocate 2^@order contiguous pages. */
static struct page_info *alloc_heap_pages(
    unsigned int zone_lo, unsigned int zone_hi,
    unsigned int node, unsigned int order, unsigned int memflags)
{
    ...
    if ( pg[i].u.free.need_tlbflush )
    {
        /* Add in extra CPUs that need flushing because of this page. */
        cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
        tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
        cpus_or(mask, mask, extra_cpus_mask);
    }
    ...
}
1. On the first start, most pages have need_tlbflush = false, so this costs little; on the second start, most of the RAM has already been used, so need_tlbflush = true for most pages, which costs much more.
2. But I repeated the same experiment on another blade with 16 physical CPUs and 64 GB of physical RAM, and there the second start cost only 3 s. After tracing the two second starts, I found that the number of times execution enters the pg[i].u.free.need_tlbflush branch is the same on both blades, so it is the number of CPUs that makes the difference.
3. The code I pasted computes a mask that determines which CPUs' TLBs should be flushed. I traced the values during the starting period:
cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
//after, mask=0, cpu_online_map=0xFFFFFFFFFFFFFFFF, extra_cpus_mask=0xFFFFFFFFFFFFFFFF
tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
//after, mask=0, extra_cpus_mask=0
cpus_or(mask, mask, extra_cpus_mask);
//after, mask=0, extra_cpus_mask=0
Every iteration starts with mask = 0 and ends with the same result, mask = 0, so all this per-page work never adds a CPU to the flush mask,
which seems meaningless in the starting process.
4. The problem is that this seemingly meaningless code is very time-consuming, and the more CPUs there are, the more time it costs: it takes 30 s on my blade
with 64 CPUs. This may need some solution to improve the efficiency.