Xen project Mailing List

Re: [Xen-users] Xen VMs and Unixbench: single vs multiple cpu behaviour

To: Marko Đukić <marko.djukic@xxxxxxxxx>

From: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Date: Wed, 25 Nov 2015 10:54:45 +0100

Cc: xen-users@xxxxxxxxxxxxx, george.dunlap@xxxxxxxxxx

Delivery-date: Wed, 25 Nov 2015 09:55:11 +0000

List-id: Xen user discussion <xen-users.lists.xen.org>

On Tue, 2015-11-24 at 23:41 +0100, Dario Faggioli wrote:
> So, let me see if I can put the numbers together and recap.
>
> With a 4 vCPUs VM, we have:
>                                        no pinning  all on 1 pCPU  1-to-1 pin
> Dhrystone 2 using register variables       3355.0         3359.4      3385.2
> Double-Precision Whetstone                  787.6          785.3       784.2
> Execl Throughput                            298.8          193.0       303.7
> File Copy 1024 bufsize 2000 maxblocks      3292.7         3303.1      3294.0
> File Copy 256 bufsize 500 maxblocks        2078.2         2089.2      2083.3
> File Copy 4096 bufsize 8000 maxblocks      5516.9         5559.8      5576.7
> Pipe Throughput                            1855.9         1857.8      1856.1
> Pipe-based Context Switching                999.9          987.6       999.5
> Process Creation                            254.4          826.4       354.1
> Shell Scripts (1 concurrent)                818.0          840.1       815.8
> Shell Scripts (8 concurrent)               6493.1         1100.4      6497.7
> System Call Overhead                       2870.2         2866.0      2847.9
> System Benchmarks Index Score              1564.2         1438.5      1611.2
>
I've done some more tests. I think I reproduced the issue. On the box
I'm using, I'm getting worse performance in general, though, which
makes it a little bit less evident, but it's noticeable (also, I'm
stuck with PV guests, for now, which also may be part of the reason for
smaller numbers).

In any case, I've used cpupools to isolate 4 pCPUs of a bigger machine.
None of these 4 pCPUs were a hyperthreading sibling of any other one
(yes, the box has hyperthreading), so that should not be an issue.
Now that I think more about this, I can probably set things up even
better, but in any case, here's what I've got for now:

                                         1 vCPU   4 vCPUs  4 vCPUs*
Dhrystone 2 using register variables     2303.8    2301.3    2315.7
Double-Precision Whetstone                620.2     620.4     620.3
Execl Throughput                          404.2     199.8     392.9
File Copy 1024 bufsize 2000 maxblocks     824.4     802.8     802.0
File Copy 256 bufsize 500 maxblocks       508.3     492.8     494.3
File Copy 4096 bufsize 8000 maxblocks    1568.1    1528.5    1537.5
Pipe Throughput                           371.8     362.7     365.4
Pipe-based Context Switching              215.4      86.8     211.3
Process Creation                          238.3     140.8     240.7
Shell Scripts (1 concurrent)              918.2     766.2     892.7
Shell Scripts (8 concurrent)              861.9    2346.6     839.8
System Call Overhead                      337.3     330.5     332.5
System Benchmarks Index Score             594.1     526.5     584.6

In "4 vCPUs*", I've done everything like in "4 vCPUs", except that I
pinned the benchmarks on one guest vCPU _from_inside_ the guest itself.
That is, I ran UnixBench like this (again, inside the VM):

 # schedtool -a 1 -e ./Run -c 1

The fact that performance improves when doing that makes me think that
the issue is related to how Xen's scheduler handles the specific way in
which the Linux scheduler migrates tasks between vCPUs during some of
the tests above.

I have heard similar reports, so I'll keep investigating. I've got
theories, but I'd like to collect some more data before drawing
conclusions... Next step will be tracing some of the tests.

In the meantime, Marko, if you're still up for it, can you try these
two commands, in your 4 vCPUs VM, and report the results here? From the
UnixBench directory:

 # ./Run -c 1 spawn
 # schedtool -a 1 -e ./Run -c 1 spawn

(for the latter, you may have to install 'schedtool'. There's another
tool that lets you do basically the same thing; I think it's called
'taskset').
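For readers who don't have schedtool at hand, here is a minimal sketch of
the same in-guest pinning done with taskset (from util-linux). The helper
name run_pinned is hypothetical, and the CPU number is an assumption: use
whichever guest vCPU you want the benchmark confined to. It illustrates
the idea and is not a command taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of UnixBench or Xen): run a command with
# its CPU affinity restricted to a single CPU, using taskset from util-linux.
run_pinned() {
    cpu=$1
    shift
    # -c makes taskset read a CPU list (here one CPU number) instead of a
    # hex mask; processes started by the command inherit the affinity.
    taskset -c "$cpu" "$@"
}

# Example, from the UnixBench directory inside the guest
# (assumes vCPU 1 exists in the VM):
#   run_pinned 1 ./Run -c 1 spawn
```

Because the affinity is inherited by child processes, everything the
benchmark forks stays on that one vCPU, which is the point of the test.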
Here, this is what I see:

 # ./Run -c 1 spawn

 Process Creation                   1789.9 lps   (30.0 s, 2 samples)

 System Benchmarks Partial Index    BASELINE       RESULT    INDEX
 Process Creation                      126.0       1789.9    142.1

 # schedtool -a 1 -e ./Run -c 1 spawn

 Process Creation                   3074.9 lps   (30.0 s, 2 samples)

 System Benchmarks Partial Index    BASELINE       RESULT    INDEX
 Process Creation                      126.0       3074.9    244.0

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
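
The INDEX figures in the two "spawn" runs above are consistent with
ten times RESULT divided by BASELINE, with a Process Creation baseline of
126.0. A small sketch to recompute them and the pinned/unpinned ratio; the
function names ub_index and ub_ratio are hypothetical helpers, not part of
UnixBench:

```shell
#!/bin/sh
# Hypothetical helper: UnixBench-style index, i.e. 10 * result / baseline.
ub_index() {
    awk -v base="$1" -v res="$2" 'BEGIN { printf "%.1f\n", 10 * res / base }'
}

# Hypothetical helper: ratio of the second result to the first.
ub_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

ub_index 126.0 1789.9    # unpinned run            -> 142.1
ub_index 126.0 3074.9    # pinned with schedtool   -> 244.0
ub_ratio 1789.9 3074.9   # pinned / unpinned       -> 1.72
```

So, on this box, pinning from inside the guest gives roughly 1.7 times
the process creation rate of the unpinned run.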


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
