
Re: [Xen-users] Xen VMs and Unixbench: single vs multiple cpu behaviour



On Tue, 2015-11-24 at 23:41 +0100, Dario Faggioli wrote:
> So, let me see if I can put the numbers together and recap.
> 
> With a 4 vCPUs VM, we have:
>                                       no pinning / all on 1 pCPU / 1-to-1 pin
> Dhrystone 2 using register variables      3355.0          3359.4       3385.2
> Double-Precision Whetstone                 787.6           785.3        784.2
> Execl Throughput                           298.8           193.0        303.7
> File Copy 1024 bufsize 2000 maxblocks     3292.7          3303.1       3294.0
> File Copy 256 bufsize 500 maxblocks       2078.2          2089.2       2083.3
> File Copy 4096 bufsize 8000 maxblocks     5516.9          5559.8       5576.7
> Pipe Throughput                           1855.9          1857.8       1856.1
> Pipe-based Context Switching               999.9           987.6        999.5
> Process Creation                           254.4           826.4        354.1
> Shell Scripts (1 concurrent)               818.0           840.1        815.8
> Shell Scripts (8 concurrent)              6493.1          1100.4       6497.7
> System Call Overhead                      2870.2          2866.0       2847.9
> System Benchmarks Index Score             1564.2          1438.5       1611.2
> 
I've run some more tests, and I think I have reproduced the issue. On
the box I'm using, performance is worse in general, which makes the
effect a little less evident, but it is still noticeable. (I'm also
stuck with PV guests for now, which may be part of the reason for the
smaller numbers.)

In any case, I've used cpupools to isolate 4 pCPUs of a bigger machine.
None of these 4 pCPUs were a hyperthreading sibling of any other one
(yes, the box has hyperthreading), so that should not be an issue.

Now that I think more about this, I can probably set things up even
better, but in any case, here's what I've got for now:

                                      1 vCPU / 4 vCPUs / 4 vCPUs*
Dhrystone 2 using register variables  2303.8   2301.3     2315.7
Double-Precision Whetstone             620.2    620.4      620.3
Execl Throughput                       404.2    199.8      392.9
File Copy 1024 bufsize 2000 maxblocks  824.4    802.8      802.0
File Copy 256 bufsize 500 maxblocks    508.3    492.8      494.3
File Copy 4096 bufsize 8000 maxblocks 1568.1   1528.5     1537.5
Pipe Throughput                        371.8    362.7      365.4
Pipe-based Context Switching           215.4     86.8      211.3
Process Creation                       238.3    140.8      240.7
Shell Scripts (1 concurrent)           918.2    766.2      892.7
Shell Scripts (8 concurrent)           861.9   2346.6      839.8
System Call Overhead                   337.3    330.5      332.5
System Benchmarks Index Score          594.1    526.5      584.6

In "4 vCPUs*", I've done everything like in "4 vCPUs", except that I
pinned the benchmark to one guest vCPU _from_inside_ the guest itself.
That is, I ran UnixBench like this (again, inside the VM):

 # schedtool -a 1 -e ./Run -c 1

The fact that performance improves when doing that makes me think
that the issue is related to how Xen's scheduler copes with the
specific way Linux's scheduler migrates tasks between vCPUs during
some of the tests above.
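
To put rough numbers on that, here is a small sketch that computes the
relative change against the 1 vCPU column for two of the affected tests.
The scores are copied from the table above; pct_change is just a
hypothetical helper name, and a POSIX awk is assumed to be available.

```shell
# Percentage change of a score relative to the 1 vCPU score.
# pct_change is a hypothetical helper; the values come from the table above.
pct_change() {
    LC_ALL=C awk -v base="$1" -v val="$2" \
        'BEGIN { printf "%+.1f\n", (val - base) / base * 100 }'
}

pct_change 404.2 199.8   # Execl Throughput, 1 vCPU -> 4 vCPUs
pct_change 404.2 392.9   # Execl Throughput, 1 vCPU -> 4 vCPUs*
pct_change 215.4  86.8   # Pipe-based Context Switching, 1 vCPU -> 4 vCPUs
pct_change 215.4 211.3   # Pipe-based Context Switching, 1 vCPU -> 4 vCPUs*
```

With these inputs, the unpinned 4 vCPUs case loses about half of the
Execl Throughput score, while the pinned case stays within a few percent
of the 1 vCPU result.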

I have heard similar reports, so I'll keep investigating. I've got
theories, but I'd like to collect a bit more data before drawing
conclusions... The next step will be tracing some of the tests.

In the meantime, Marko, if you're still up for it, can you try these
two commands, in your 4 vCPUs VM, and report here the results?

From the UnixBench directory:

 # ./Run -c 1 spawn

 # schedtool -a 1 -e ./Run -c 1 spawn

(For the latter, you may have to install 'schedtool'. There's another
tool that lets you do basically the same thing; I think it's called
'taskset'.)
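
For reference, a taskset-based equivalent might look like the sketch
below. It assumes taskset from util-linux is installed in the guest, and
that 'schedtool -a 1' is being read as "CPU 1" (list form) rather than
as a bitmask; pin_to_cpu is a hypothetical wrapper name, not part of
UnixBench.

```shell
# Run a command pinned to a single CPU of the guest, using taskset
# from util-linux. pin_to_cpu is a hypothetical helper name.
pin_to_cpu() {
    cpu="$1"
    shift
    taskset -c "$cpu" "$@"
}

# Intended use, from the UnixBench directory inside the VM (assumes the
# guest has a CPU numbered 1):
#   pin_to_cpu 1 ./Run -c 1 spawn
```

Since the affinity is inherited by child processes, everything the
benchmark spawns should stay on the same guest vCPU.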

Here is what I see:

# ./Run -c 1 spawn
Process Creation                       1789.9 lps   (30.0 s, 2 samples)
System Benchmarks Partial Index      BASELINE       RESULT    INDEX
Process Creation                        126.0       1789.9    142.1

# schedtool -a 1 -e ./Run -c 1 spawn
Process Creation                       3074.9 lps   (30.0 s, 2 samples)
System Benchmarks Partial Index      BASELINE       RESULT    INDEX
Process Creation                        126.0       3074.9    244.0
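
As a sanity check on the two partial results above, the INDEX column
appears to be 10 x RESULT / BASELINE, which is what the numbers work out
to with a 126.0 lps baseline. A minimal sketch, assuming a POSIX awk;
ub_index is a hypothetical helper name:

```shell
# Recompute the partial index from RESULT and BASELINE, assuming
# index = 10 * result / baseline. ub_index is a hypothetical helper.
ub_index() {
    LC_ALL=C awk -v result="$1" -v baseline="$2" \
        'BEGIN { printf "%.1f\n", result / baseline * 10 }'
}

ub_index 1789.9 126.0   # unpinned run
ub_index 3074.9 126.0   # run pinned with schedtool

# Ratio between the pinned and unpinned process creation rates.
LC_ALL=C awk 'BEGIN { printf "%.2f\n", 3074.9 / 1789.9 }'
```

In other words, the pinned run is roughly 1.7 times faster at process
creation than the unpinned one.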

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
