With 16 CPUs you find domain startup always
takes 3s?
: No.
With 16 CPUs, the first startup takes 0.3s and the second 3s.
With 64 CPUs, the first takes 3s and the second 30s.
With 64 CPUs you find it takes 3s first time,
then 30s in future?
: Yes
And this is due to cost of tlbflush_filter() (not
actual TLB flushes, because you always end up with mask=0)?
: Yes, most of the cost is in tlbflush_filter(), in the
check that decides which CPUs need a flush.
The TLB flush itself is really very fast; it just sends an
IPI to the relevant CPUs.
During the allocations at domain startup, the filter always
ends up with mask=0, so the work seems needless.
If tlbflush_filter() were that expensive I'd
expect the 16-CPU case to have slowdown after the first domain startup,
too.
: Yes, you are right, the 16-CPU case slows down too after
its first startup.
The reason is very clear, and I have discussed it
with others: there is no doubt that tlbflush_filter() is inefficient,
but I don't know how to improve it.
I also used Xen oprofile and
found that the following two functions show up most frequently:
alloc_heap_pages: 40%
__next_cpu : 20%
others: 0.x%
.....
alloc_heap_pages -> tlbflush_filter -> for_each_cpu_mask
next_cpu -> __next_cpu
It seems that iterating over the CPUs is expensive.
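
To make the call chain above concrete, here is a rough sketch of why the per-page filtering cost grows with the CPU count even when the final mask is 0. This is a simplified model and not Xen's actual code: the 64-bit cpumask_t, the flush_time array, the plain timestamp comparison, and the names filter_mask, filter_pages and next_cpu_calls are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 64                  /* hypothetical limit for this sketch */

typedef uint64_t cpumask_t;          /* one bit per CPU; not Xen's real type */

static unsigned long next_cpu_calls; /* counts scan steps, for illustration */

/* Return the first CPU set in mask with index > prev, or nr_cpus if none. */
static int next_cpu(int prev, cpumask_t mask, int nr_cpus)
{
    next_cpu_calls++;
    for (int cpu = prev + 1; cpu < nr_cpus; cpu++)
        if (mask & ((cpumask_t)1 << cpu))
            return cpu;
    return nr_cpus;
}

/* Clear from mask every CPU that has flushed its TLB since page_stamp.
 * The '>' comparison is a simplification of a real timestamp check. */
static cpumask_t filter_mask(cpumask_t mask, const uint32_t *flush_time,
                             uint32_t page_stamp, int nr_cpus)
{
    assert(nr_cpus > 0 && nr_cpus <= MAX_CPUS);
    for (int cpu = next_cpu(-1, mask, nr_cpus); cpu < nr_cpus;
         cpu = next_cpu(cpu, mask, nr_cpus))
        if (flush_time[cpu] > page_stamp)
            mask &= ~((cpumask_t)1 << cpu);
    return mask;
}

/* Run the filter once per page, as an allocator loop might.
 * Returns how many pages would still need a TLB flush. */
static unsigned long filter_pages(unsigned long nr_pages, int nr_cpus)
{
    uint32_t flush_time[MAX_CPUS];
    cpumask_t all = (nr_cpus == MAX_CPUS) ? ~(cpumask_t)0
                                          : (((cpumask_t)1 << nr_cpus) - 1);
    unsigned long need_flush = 0;

    /* Every CPU has flushed (time 2) after the pages were stamped (time 1). */
    for (int cpu = 0; cpu < nr_cpus; cpu++)
        flush_time[cpu] = 2;

    for (unsigned long p = 0; p < nr_pages; p++)
        if (filter_mask(all, flush_time, 1, nr_cpus) != 0)
            need_flush++;

    return need_flush;
}
```

In this model, filter_pages(1000, 16) makes 17 next_cpu calls per page and filter_pages(1000, 64) makes 65 per page, yet no page ends up needing a flush in either case. All of the time goes into walking the mask, which would be consistent with __next_cpu showing up so high in the profile.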