
Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?



Hi Elena,

Thank you very much for sharing this! :-)

On Tue, Mar 1, 2016 at 1:20 PM, Elena Ufimtseva
<elena.ufimtseva@xxxxxxxxxx> wrote:
>
> On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote:
> > On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk
> > <konrad.wilk@xxxxxxxxxx> wrote:
> > >> > Hey!
> > >> >
> > >> > CC-ing Elena.
> > >>
> > >> I think you forgot to cc her..
> > >> Anyway, let's cc her now... :-)
> > >>
> > >> >
> > >> >> We are measuring the execution time between native machine environment
> > >> >> and xen virtualization environment using PARSEC Benchmark [1].
> > >> >>
> > >> >> In the virtualization environment, we run a domU with three VCPUs, each of
> > >> >> them pinned to a core; we pin dom0 to another core that is not
> > >> >> used by the domU.
> > >> >>
> > >> >> Inside Linux, both in the domU (virtualization environment) and in the
> > >> >> native environment, we used cpuset to isolate a core (or VCPU) for the
> > >> >> system processes and another core for the benchmark processes.
> > >> >> We also configured the Linux boot command line with the isolcpus= option
> > >> >> to shield the benchmark core from other unnecessary processes.
> > >> >
> > >> > You may want to just offline them and also boot the machine with NUMA
> > >> > disabled.
> > >>
> > >> Right, the machine is booted up with NUMA disabled.
> > >> We will offline the unnecessary cores then.
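
(For reference, the kind of per-core pinning + timing we mean is roughly the
Python sketch below; the core number and the benchmark command are
placeholders, and our actual runs use cpuset and isolcpus= rather than this
script.)

    #!/usr/bin/env python3
    # Rough illustration only: pin a benchmark to one isolated core and time it.
    # CORE and CMD are placeholders, not the exact setup described above.
    import os
    import subprocess
    import time

    CORE = 2                          # assumed isolated core (placeholder)
    CMD = ["./run_benchmark.sh"]      # placeholder benchmark command

    os.sched_setaffinity(0, {CORE})   # pin this process; children inherit the mask

    start = time.monotonic()
    subprocess.run(CMD, check=True)
    end = time.monotonic()

    print("wall-clock time: %.3f s" % (end - start))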
> > >>
> > >> >
> > >> >>
> > >> >> We expect that execution time of benchmarks in xen virtualization
> > >> >> environment is larger than the execution time in native machine
> > >> >> environment. However, the evaluation gave us an opposite result.
> > >> >>
> > >> >> Below is the evaluation data for the canneal and streamcluster 
> > >> >> benchmarks:
> > >> >>
> > >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial
> > >> >> Native: 6.387s
> > >> >> Virtualization: 5.890s
> > >> >>
> > >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial
> > >> >> Native: 5.276s
> > >> >> Virtualization: 5.240s
> > >> >>
> > >> >> Is there anything wrong with our evaluation that leads to the abnormal
> > >> >> performance results?
> > >> >
> > >> > Nothing is wrong. Virtualization is naturally faster than baremetal!
> > >> >
> > >> > :-)
> > >> >
> > >> > No clue sadly.
> > >>
> > >> Ah-ha. This is really surprising to me.... Why would adding one more
> > >> layer speed up the system? Unless virtualization disables some services
> > >> that run in the native case and interfere with the benchmark.
> > >>
> > >> If virtualization is naturally faster than baremetal, why do some
> > >> experiments show that virtualization introduces overhead?
> > >
> > > Elena told me that there was a weird regression in Linux 4.1 - where
> > > CPU-burning workloads were _slower_ on baremetal than as guests.
> >
> > Hi Elena,
> > Would you mind sharing with us some of your experience of how you
> > found the real reason? Did you use some tool or methodology to
> > pin down the cause (i.e., why CPU-burning workloads are _slower_
> > on baremetal than as guests)?
> >
>
> Hi Meng
>
> Yes, sure!
>
> While working on performance tests for the smt-exposing patches from Joao,
> I ran a CPU-bound workload in an HVM guest and then ran the same test on
> baremetal using the same kernel.
> While testing the cpu-bound workload on baremetal Linux (4.1.0-rc2),
> I found that the time to complete the test is a few times longer than
> it takes under the HVM guest.
> I tried tests with the kernel threads pinned to cores and without pinning.
> The execution times are usually about twice as long, sometimes 4
> times longer than in the HVM case.
>
> What is interesting is not only that it sometimes takes 3-4 times longer
> than in the HVM guest, but also that the test with threads bound to cores
> takes almost 3 times longer to execute than the same cpu-bound test under
> HVM (in all configurations).


wow~ I didn't expect the native performance could be so "bad".... ;-)
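
(To make sure I understand the kind of workload: below is a rough userspace
stand-in in Python for an N-worker CPU-bound test, optionally pinned one
worker per core, and timed. Your actual test uses kernel threads, so this is
only an approximation, and the iteration count is a placeholder.)

    #!/usr/bin/env python3
    # Userspace stand-in for an 8-thread CPU-bound test, optionally pinned
    # one worker per core, and timed.  Only an approximation of the real
    # kernel-thread test; NUM_WORKERS and ITERATIONS are placeholders.
    import multiprocessing as mp
    import os
    import time

    NUM_WORKERS = 8         # matches the "8 kernel threads" in the real test
    ITERATIONS = 50000000   # placeholder amount of busy work per worker
    PIN = True              # True = one worker per core, False = unpinned

    def burn(worker_id):
        if PIN:
            os.sched_setaffinity(0, {worker_id % os.cpu_count()})
        x = 0
        for i in range(ITERATIONS):   # pure integer arithmetic, no I/O
            x += i * i
        return x

    if __name__ == "__main__":
        start = time.monotonic()
        procs = [mp.Process(target=burn, args=(w,)) for w in range(NUM_WORKERS)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print("total wall-clock time: %.2f s" % (time.monotonic() - start))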

>
>
> I ran each test 5 times; here are the execution times (seconds):
>
> -------------------------------------------------
>           baremetal         |
> thread_bind | thread_unbind | HVM pinned to cores
> ------------|---------------|--------------------
>      74     |     83        |        28
>      74     |     88        |        28
>      74     |     38        |        28
>      74     |     73        |        28
>      74     |     87        |        28
>
> Sometimes the unbound tests had better times, but not often enough
> to present here. Some results were much worse, reaching up to 120
> seconds.
>
> Each test has 8 kernel threads. In the baremetal case I tried the following:
> - NUMA off and on;
> - all cpus online;
> - isolating the cpus of the first node;
> - setting intel_idle.max_cstate=1;
> - disabling intel_pstate;
>
> I don't think I have exhausted all the options here, but it looked like
> the last two changes did improve performance; it was still not comparable
> to the HVM case, though.
> I am trying to find where the regression happened. Performance on a newer
> kernel (I tried 4.5.0-rc4+) was close to or better than HVM.
>
> I am trying to find if there were some relevant regressions, to understand
> the reason for this.
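
(Regarding the c-state/p-state settings above: a quick way to confirm they
actually took effect is something like the sketch below. The sysfs paths are
the usual ones, but they may vary across kernels, so treat this only as an
illustration.)

    #!/usr/bin/env python3
    # Quick check of idle/frequency settings before a run (the paths are the
    # usual sysfs locations; they may differ on your kernel).
    import glob

    def read(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return "<not available>"

    print("intel_idle max_cstate:",
          read("/sys/module/intel_idle/parameters/max_cstate"))
    print("cpufreq scaling driver:",
          read("/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver"))
    print("cpu0 idle states:",
          ", ".join(read(p) for p in
                    sorted(glob.glob("/sys/devices/system/cpu/cpu0/cpuidle/state*/name"))))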


I see. If this only happens with SMT, it may be caused by the
SMT-related load balancing in the Linux scheduler.
However, I have disabled HT on my machine. That is probably also
the reason why I didn't see so much difference in performance.
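
(A quick way I use to double-check that HT is really off is to look at the
thread sibling lists in sysfs, roughly as in the sketch below; if each online
CPU lists only itself, no hyperthread siblings are online. This is just an
illustrative snippet.)

    #!/usr/bin/env python3
    # Illustrative check: if every online CPU lists only itself as a thread
    # sibling, hyperthreading (SMT) is effectively off.
    import glob

    for path in sorted(glob.glob(
            "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list")):
        cpu = path.split("/")[5]          # e.g. "cpu0"
        with open(path) as f:
            siblings = f.read().strip()   # e.g. "0" or "0,4"
        print("%s: siblings = %s" % (cpu, siblings))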

>
>
>
> What kernel do you guys use?


I'm using a quite old kernel, 3.10.31. The reason I'm using this kernel
is that I want to use LITMUS^RT [1], which is a Linux testbed for
real-time scheduling research. (It has a newer version though, and I can
upgrade to the latest version to see if the "problem" still occurs.)

Thanks and Best Regards,

Meng

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

