Re: [Xen-devel] alloc_heap_pages is low efficient with more CPUs
- To: tupeng212 <tupeng212@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>
- From: Keir Fraser <keir@xxxxxxx>
- Date: Thu, 11 Oct 2012 16:41:09 +0100
- Delivery-date: Thu, 11 Oct 2012 15:42:05 +0000
- List-id: Xen developer discussion <xen-devel.lists.xen.org>
- Thread-index: Ac2nxtVyq1Ry/EgQ/EKDteC17SNtPA==
- Thread-topic: [Xen-devel] alloc_heap_pages is low efficient with more CPUs
Not sure I understand. With 16 CPUs you find domain startup takes 3s always. With 64 CPUs you find it takes 3s first time, then 30s in future? And this is due to cost of tlbflush_filter() (not actual TLB flushes, because you always end up with mask=0)? If tlbflush_filter() were that expensive I’d expect the 16-CPU case to have slowdown after the first domain startup, too.
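For reference, tlbflush_filter() drops from the mask every CPU whose TLB has already been flushed since the page's timestamp, so its cost is proportional to the number of online CPUs even when every bit ends up cleared. A minimal sketch of the idea follows (not the real Xen macro, which works on cpumask_t and per-CPU flush times; NR_CPUS, tlbflush_time and need_flush() below are simplified stand-ins):

    /* Sketch only -- illustrates why the filter's cost grows with the CPU
     * count.  The real implementation is a macro over cpumask_t; the names
     * here are simplified stand-ins. */
    #include <stdbool.h>
    #include <stdint.h>

    #define NR_CPUS 64

    static uint32_t tlbflush_time[NR_CPUS];   /* last flush time seen per CPU */

    /* True if 'cpu' has not flushed its TLB since the page was freed. */
    static bool need_flush(unsigned int cpu, uint32_t page_timestamp)
    {
        return tlbflush_time[cpu] <= page_timestamp;
    }

    /* Clear every CPU that no longer needs a flush for this page.  Even when
     * the result is 0 (as traced below), the loop still visits every online
     * CPU, so the per-page work scales with the number of CPUs. */
    static void tlbflush_filter_sketch(uint64_t *mask, uint32_t page_timestamp)
    {
        unsigned int cpu;

        for ( cpu = 0; cpu < NR_CPUS; cpu++ )
            if ( (*mask & (1ULL << cpu)) && !need_flush(cpu, page_timestamp) )
                *mask &= ~(1ULL << cpu);
    }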
-- Keir
On 11/10/2012 16:18, "tupeng212" <tupeng212@xxxxxxxxx> wrote:
I am confused by a problem:
I have a blade with 64 physical CPUs and 64G of physical RAM, and defined only one VM with 1 CPU and 40G of RAM.
The first time I started the VM it took just 3s, but the second start took 30s.
After studying it by adding log output, I located a place in the hypervisor that costs most of the time,
accounting for about 98% of the whole start-up time.
xen/common/page_alloc.c

/* Allocate 2^@order contiguous pages. */
static struct page_info *alloc_heap_pages(
    unsigned int zone_lo, unsigned int zone_hi,
    unsigned int node, unsigned int order, unsigned int memflags)
{
    ...
    for ( i = 0; i < (1 << order); i++ )   /* per-page loop, rest elided */
    {
        ...
        if ( pg[i].u.free.need_tlbflush )
        {
            /* Add in extra CPUs that need flushing because of this page. */
            cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
            tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
            cpus_or(mask, mask, extra_cpus_mask);
        }
        ...
    }
    ...
}
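For context on why the second start differs from the first: when the first domain's memory is freed back to the heap, free_heap_pages() flags each page and stamps it with the current flush time, so the next allocation of that page hits the branch above. Roughly (paraphrased from a tree of that era, not a verbatim quote):

    /* free_heap_pages(), paraphrased: pages that belonged to a domain are
     * marked so the next allocator knows stale TLB entries may still
     * reference them. */
    for ( i = 0; i < (1 << order); i++ )
    {
        /* ... */
        pg[i].u.free.need_tlbflush = (page_get_owner(&pg[i]) != NULL);
        if ( pg[i].u.free.need_tlbflush )
            pg[i].tlbflush_timestamp = tlbflush_current_time();
        /* ... */
    }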
1 In the first start, need_tlbflush is unset for most pages, so this costs little; in the second start, most of the RAM has already been
used and freed, so need_tlbflush is true for most pages, and the cost becomes large.
2 But I repeated the same experiment on another blade with 16 physical CPUs and 64G of physical RAM, and the second
start took only 3s. Tracing the two second starts showed that the number of times the
pg[i].u.free.need_tlbflush branch is entered is the same; it is the number of CPUs that makes the difference.
3 The code I pasted computes a mask that decides which CPUs' TLBs need to be flushed. I traced the values during the start period below:
cpus_andnot(extra_cpus_mask, cpu_online_map, mask);
//after, mask=0, cpu_online_map=0xFFFFFFFFFFFFFFFF, extra_cpus_mask=0xFFFFFFFFFFFFFFFF
tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
//after, mask=0, extra_cpus_mask=0
cpus_or(mask, mask, extra_cpus_mask);
//after, mask=0, extra_cpus_mask=0
Every time it starts with mask=0 and ends with the same result, mask=0, so no CPU's TLB actually gets flushed,
which makes all of this filtering work seem meaningless during the start process.
4 The problem is that this seemingly meaningless code is very time-consuming: the more CPUs there are, the more time it costs. It takes 30s here on my blade
with 64 CPUs, which may need some solution to improve the efficiency (a rough estimate of the scaling is sketched below).
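To put a rough number on it (my own estimate, not measured inside the hypervisor, and assuming one filter pass per 4K page): 40G is about 10.5 million 4K pages, and each pass walks a cpumask whose size is the number of online CPUs, so the total work grows roughly as pages x CPUs:

    /* Back-of-envelope estimate only; assumes one tlbflush_filter() pass per
     * 4K page of the domain's memory. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long long ram_bytes = 40ULL << 30;      /* the domain's 40G */
        unsigned long long pages     = ram_bytes >> 12;  /* 4K pages: ~10.5M */
        unsigned int cpus16 = 16, cpus64 = 64;

        /* Work is roughly proportional to pages * CPUs visited per pass. */
        printf("pages               : %llu\n", pages);
        printf("cpu visits, 16 CPUs : %llu\n", pages * cpus16);
        printf("cpu visits, 64 CPUs : %llu\n", pages * cpus64);
        return 0;
    }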
tupeng
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel