Re: [Xen-devel] Virt overehead with HT [was: Re: Xen 4.5 development update]
On 07/14/2014 06:22 PM, Dario Faggioli wrote:
> On Mon, 2014-07-14 at 17:55 +0100, George Dunlap wrote:
>> On 07/14/2014 05:44 PM, Dario Faggioli wrote:
>>> On Mon, 2014-07-14 at 17:32 +0100, Gordan Bobic wrote:
>>>> On 07/14/2014 05:12 PM, Dario Faggioli wrote:
>>>>> Elapsed(stddev)   BAREMETAL             HVM
>>>>> kernbench -j4     31.604 (0.0963328)    34.078 (0.168582)
>>>>> kernbench -j8     26.586 (0.145705)     26.672 (0.0432435)
>>>>> kernbench -j      27.358 (0.440307)     27.49 (0.364897)
>>>>>
>>>>> With HT disabled in BIOS (which means only 4 CPUs for both):
>>>>>
>>>>> Elapsed(stddev)   BAREMETAL             HVM
>>>>> kernbench -j4     57.754 (0.0642651)    56.46 (0.0578792)
>>>>> kernbench -j8     31.228 (0.0775887)    31.362 (0.210998)
>>>>> kernbench -j      32.316 (0.0270185)    33.084 (0.600442)
>>>
>>> BTW, there's a mistake here. The three runs, in the no-HT case are as
>>> follows:
>>>
>>> kernbench -j2
>>> kernbench -j4
>>> kernbench -j
>>>
>>> I.e., half the number of VCPUs, as much as there are VCPUs and
>>> unlimited, exactly as for the HT case.
>>
>> Ah -- that's a pretty critical piece of information.
>>
>> So actually, on native, HT enabled and disabled effectively produce the
>> same exact thing if HT is not actually being used: 31 seconds in both
>> cases. But on Xen, enabling HT when it's not being used (i.e., when in
>> theory each core should have exactly one process running), performance
>> goes from 31 seconds to 34 seconds -- roughly a 10% degradation.
>
> Yes. 7.96% degradation, to be precise.
>
> I attempted an analysis in my first e-mail. Cutting and pasting it
> here... What do you think?
>
> "I guess I can investigate a bit more about what happens with '-j4'.
> What I suspect is that the scheduler may make a few non-optimal
> decisions wrt HT, when there are more PCPUs than busy guest VCPUs. This
> may be due to the fact that Dom0 (or another guest VCPU doing other
> stuff than kernbench) may be already running on PCPUs that are on
> different cores than the guest's one (i.e., the guest VCPUs that wants
> to run kernbench), and that may force two guest's vCPUs to execute on
> two HTs some of the time (which of course is something that does not
> happen on baremetal!)."
>
> I just re-ran the benchmark with credit2, which has no SMT knowledge,
> and the first run (the one that does not use HT) ended up being 37.54,
> while the other two were pretty much the same as above (26.81 and
> 27.92).
>
> This confirms, for me, that it's an SMT balancing issue that we're
> seeing.
>
> I'll try more runs, e.g. with the number of VCPUs equal to or less than
> nr_cores/2, and see what happens.
>
> Again, thoughts?

Have you tried it with VCPUs pinned to appropriate PCPUs?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
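
For anyone wanting to try the pinning suggested above, here is a minimal sketch that only generates `xl vcpu-pin` command lines; it does not run them. Everything specific in it is an assumption rather than something from the thread: the domain name `kernbench-hvm` is hypothetical, the guest is assumed to have 8 VCPUs on an 8-PCPU host (4 cores with HT), and the `stride=2` variant assumes sibling hyperthreads are numbered adjacently (PCPUs 2k and 2k+1 share core k). The real topology should be checked with `xl info -n` before using any of the output.

```python
def pin_commands(domain, n_vcpus, n_pcpus, stride=1):
    """Build 'xl vcpu-pin' command lines that pin VCPU i to PCPU i * stride.

    stride=1 gives a plain 1:1 mapping (VCPU i -> PCPU i).
    stride=2 puts each VCPU on a separate core, ASSUMING sibling
    hyperthreads are numbered adjacently (an assumption, verify it
    against the output of 'xl info -n').
    """
    commands = []
    for vcpu in range(n_vcpus):
        pcpu = vcpu * stride
        if pcpu >= n_pcpus:
            raise ValueError(
                f"VCPU {vcpu} would map to PCPU {pcpu}, "
                f"but the host only has {n_pcpus} PCPUs"
            )
        commands.append(f"xl vcpu-pin {domain} {vcpu} {pcpu}")
    return commands


if __name__ == "__main__":
    # Hypothetical domain name; 8 VCPUs pinned 1:1 onto 8 PCPUs.
    for cmd in pin_commands("kernbench-hvm", n_vcpus=8, n_pcpus=8):
        print(cmd)
```

With a 1:1 mapping the hypervisor scheduler no longer chooses where each VCPU runs, so any remaining gap in the '-j4' case would depend on where the guest places its busy tasks, which may help separate the two effects.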