Re: [Xen-devel] alloc_heap_pages is low efficient with more CPUs
- To: tupeng212 <tupeng212@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>
- From: Keir Fraser <keir@xxxxxxx>
- Date: Thu, 11 Oct 2012 16:41:09 +0100
- Delivery-date: Thu, 11 Oct 2012 15:42:05 +0000
- List-id: Xen developer discussion <xen-devel.lists.xen.org>
- Thread-index: Ac2nxtVyq1Ry/EgQ/EKDteC17SNtPA==
- Thread-topic: [Xen-devel] alloc_heap_pages is low efficient with more CPUs
Not sure I understand. With 16 CPUs you find domain startup takes 3s always. With 64 CPUs you find it takes 3s first time, then 30s in future? And this is due to cost of tlbflush_filter() (not actual TLB flushes, because you always end up with mask=0)? If tlbflush_filter() were that expensive I’d expect the 16-CPU case to have slowdown after the first domain startup, too.
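For reference, tlbflush_filter() drops from the mask every CPU whose TLB has already been flushed since the page's timestamp, so its cost is proportional to the number of online CPUs even when every bit ends up cleared. A minimal sketch of the idea follows (not the real Xen macro, which works on cpumask_t and per-CPU flush times; NR_CPUS, tlbflush_time and need_flush() below are simplified stand-ins):

    /* Sketch only -- illustrates why the filter's cost grows with the CPU
     * count.  The real implementation is a macro over cpumask_t; the names
     * here are simplified stand-ins. */
    #include <stdbool.h>
    #include <stdint.h>

    #define NR_CPUS 64

    static uint32_t tlbflush_time[NR_CPUS];   /* last flush time seen per CPU */

    /* True if 'cpu' has not flushed its TLB since the page was freed. */
    static bool need_flush(unsigned int cpu, uint32_t page_timestamp)
    {
        return tlbflush_time[cpu] <= page_timestamp;
    }

    /* Clear every CPU that no longer needs a flush for this page.  Even when
     * the result is 0 (as traced below), the loop still visits every online
     * CPU, so the per-page work scales with the number of CPUs. */
    static void tlbflush_filter_sketch(uint64_t *mask, uint32_t page_timestamp)
    {
        unsigned int cpu;

        for ( cpu = 0; cpu < NR_CPUS; cpu++ )
            if ( (*mask & (1ULL << cpu)) && !need_flush(cpu, page_timestamp) )
                *mask &= ~(1ULL << cpu);
    }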
-- Keir
On 11/10/2012 16:18, "tupeng212" <tupeng212@xxxxxxxxx> wrote:
I am confused by a problem:
I have a blade with 64 physical CPUs and 64G of physical RAM, and defined only one VM with 1 CPU and 40G of RAM.
The first time I started the VM it took just 3s, but the second start took 30s.
After studying it by adding log output, I located a place in the hypervisor that costs most of the time,
accounting for about 98% of the whole start-up time.
xen/common/page_alloc.c

/* Allocate 2^@order contiguous pages. */
static struct page_info *alloc_heap_pages(
    unsigned int zone_lo, unsigned int zone_hi,
    unsigned int node, unsigned int order, unsigned int memflags)
{
    ...
    for ( i = 0; i < (1 << order); i++ )   /* per-page loop, rest elided */
    {
        ...
        if ( pg[i].u.free.need_tlbflush )
        {
            /* Add in extra CPUs that need flushing because of this page. */
            cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
            tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
            cpus_or(mask, mask, extra_cpus_mask);
        }
        ...
    }
    ...
}
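For context on why the second start differs from the first: when the first domain's memory is freed back to the heap, free_heap_pages() flags each page and stamps it with the current flush time, so the next allocation of that page hits the branch above. Roughly (paraphrased from a tree of that era, not a verbatim quote):

    /* free_heap_pages(), paraphrased: pages that belonged to a domain are
     * marked so the next allocator knows stale TLB entries may still
     * reference them. */
    for ( i = 0; i < (1 << order); i++ )
    {
        /* ... */
        pg[i].u.free.need_tlbflush = (page_get_owner(&pg[i]) != NULL);
        if ( pg[i].u.free.need_tlbflush )
            pg[i].tlbflush_timestamp = tlbflush_current_time();
        /* ... */
    }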
1 In the first start, need_tlbflush is unset for most pages, so this costs little; in the second start, most of the RAM has already been
used and freed, so need_tlbflush is true for most pages, and the cost becomes large.
2 But I repeated the same experiment on another blade with 16 physical CPUs and 64G of physical RAM, and the second
start took only 3s. Tracing the two second starts showed that the number of times the
pg[i].u.free.need_tlbflush branch is entered is the same; it is the number of CPUs that makes the difference.
3 The code I pasted computes a mask that decides which CPUs' TLBs need to be flushed. I traced the values during the start period below:
cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
//after, mask=0, cpu_online_map=0xFFFFFFFFFFFFFFFF, extra_cpus_mask=0xFFFFFFFFFFFFFFFF
tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
//after, mask=0, extra_cpus_mask=0
cpus_or(mask, mask, extra_cpus_mask);
//after, mask=0, extra_cpus_mask=0
Every time it starts with mask=0 and ends with the same result, mask=0, so no CPU's TLB actually gets flushed,
which makes all of this filtering work seem meaningless during the start process.
4 The problem is that this seemingly meaningless code is very time-consuming: the more CPUs there are, the more time it costs. It takes 30s here on my blade
with 64 CPUs, which may need some solution to improve the efficiency (a rough estimate of the scaling is sketched below).
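To put a rough number on it (my own estimate, not measured inside the hypervisor, and assuming one filter pass per 4K page): 40G is about 10.5 million 4K pages, and each pass walks a cpumask whose size is the number of online CPUs, so the total work grows roughly as pages x CPUs:

    /* Back-of-envelope estimate only; assumes one tlbflush_filter() pass per
     * 4K page of the domain's memory. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long long ram_bytes = 40ULL << 30;      /* the domain's 40G */
        unsigned long long pages     = ram_bytes >> 12;  /* 4K pages: ~10.5M */
        unsigned int cpus16 = 16, cpus64 = 64;

        /* Work is roughly proportional to pages * CPUs visited per pass. */
        printf("pages               : %llu\n", pages);
        printf("cpu visits, 16 CPUs : %llu\n", pages * cpus16);
        printf("cpu visits, 64 CPUs : %llu\n", pages * cpus64);
        return 0;
    }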
tupeng
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel