Re: [Xen-devel] alloc_heap_pages is low efficient with more CPUs

With 16 CPUs you find domain startup always takes 3s.
: With 16 CPUs, the first startup takes 0.3s and the second takes 3s.
With 64 CPUs, the first startup takes 3s and the second takes 30s.
With 64 CPUs you find it takes 3s first time, then 30s in future?
: Yes
And this is due to the cost of tlbflush_filter() (not actual TLB flushes, because you always end up with mask=0)?
: Yes, most of the time is spent in tlbflush_filter(), in the check that decides whether each CPU needs a flush.
The TLB flush itself is very fast; it just sends an IPI to the relevant CPUs.
For the allocations made during domain startup, the filter always ends up with mask=0, so the work seems needless.
If tlbflush_filter() were that expensive I'd expect the 16-CPU case to have slowdown after the first domain startup, too.
: Yes, you are right, the 16-CPU case also slows down after its first startup.
The reason is clear, and I have discussed it with others: there is no doubt that tlbflush_filter() is inefficient,
but I don't know how to improve it.
I also used Xen oprofile and found that the following two functions account for most of the samples:
alloc_heap_pages: 40%
__next_cpu : 20%
others: 0.x%
alloc_heap_pages -> tlbflush_filter -> for_each_cpu_mask -> next_cpu -> __next_cpu
It seems that iterating over the CPUs is expensive.
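
To make the cost of that call chain concrete, here is a minimal standalone sketch. It is not the actual Xen code: the names filter_sketch(), next_cpu_sketch(), cpu_flush_time[] and cpu_visits are hypothetical, the CPU mask is assumed to fit in one 64-bit word, and the timestamp comparison ignores wraparound. It only shows that the filter walks every CPU set in the mask for each page, even when the final result is mask=0.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64                        /* assumption: mask fits in 64 bits */

/* Hypothetical stand-in for the per-CPU "time of last TLB flush". */
static uint32_t cpu_flush_time[NR_CPUS];

/* Counts how many CPUs the filter has examined so far. */
static unsigned long cpu_visits;

/* Return the next CPU set in mask after prev, or NR_CPUS if there is none. */
static int next_cpu_sketch(int prev, uint64_t mask)
{
    for (int cpu = prev + 1; cpu < NR_CPUS; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return NR_CPUS;
}

/*
 * Drop from mask every CPU that has flushed its TLB since the page was
 * last used. Simplified: a plain '>' comparison, no wraparound handling.
 */
static uint64_t filter_sketch(uint64_t mask, uint32_t page_time)
{
    for (int cpu = next_cpu_sketch(-1, mask);
         cpu < NR_CPUS;
         cpu = next_cpu_sketch(cpu, mask)) {
        cpu_visits++;
        if (cpu_flush_time[cpu] > page_time)
            mask &= ~(UINT64_C(1) << cpu);
    }
    return mask;
}
```

If every page of an allocation is filtered this way, the work grows roughly as (number of pages) x (number of CPUs in the mask), which would match startup getting slower as the CPU count goes up.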
