Re: [Xen-devel] DomU vs Dom0 performance.
On Thu, Oct 03, 2013 at 02:50:27PM -0400, sushrut shirole wrote:
> Hi Konrad,
>
> Thank you for the simple and wonderful explanation. Now I understand why
> the syscall micro-benchmark performs better on domU than on dom0. But I am
> still confused about the 'memory bandwidth' micro-benchmark performance.
> The memory bandwidth micro-benchmark will cause a page fault when a page is
> accessed for the first time. I presume the PTE updates are the major reason
> for the performance degradation of dom0. But after the first few page
> faults, all

Correct. Each PTE update at worst requires a hypercall. We do have batching,
which means you can batch up to 32 PTE updates in one hypercall. But if you
mix the PTE updates with mprotect, etc., it gets worse.

> the pages would be in memory (both dom0 and domU have 4096M of memory, and
> the micro-benchmark uses less than test_size * 3, i.e. 1000M * 3 in this
> case), so why is there a considerable amount of performance difference?

I don't know what the micro-benchmark does. Does it use mprotect and any
page manipulations?

>
> Thank you,
> Sushrut.
>
> On 1 October 2013 10:24, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
>
> > On Tue, Oct 01, 2013 at 12:55:18PM +0000, sushrut shirole wrote:
> > > Please find my response inline.
> > >
> > > Thank you,
> > > Sushrut.
> > >
> > > On 1 October 2013 10:05, Felipe Franciosi <felipe.franciosi@xxxxxxxxxx> wrote:
> > >
> > > > 1) Can you paste your entire config file here?
> > > >
> > > > This is just for clarification on the HVM bit.
> > > >
> > > > Your "disk" config suggests you are using the PV protocol for storage
> > > > (blkback).
> > >
> > > kernel = "hvmloader"
> > > builder='hvm'
> > > memory = 4096
> > > name = "ArchHVM"
> > > vcpus=8
> > > disk = [ 'phy:/dev/sda5,hda,w',
> > >          'file:/root/dev/iso/archlinux.iso,hdc:cdrom,r' ]
> > > device_model = 'qemu-dm'
> > > boot="c"
> > > sdl=0
> > > xen_platform_pci=1
> > > opengl=0
> > > vnc=0
> > > vncpasswd=''
> > > nographic=1
> > > stdvga=0
> > > serial='pty'
> > >
> > > > 2) Also, can you run "uname -a" in both dom0 and domU and paste it
> > > > here as well?
> > > >
> > > > Based on the syscall latencies you presented, it sounds like one
> > > > domain may be 32bit and the other 64bit.
> > >
> > > kernel information on dom0 is:
> > > Linux localhost 3.5.0-IDD #5 SMP PREEMPT Fri Sep 6 23:31:56 UTC 2013
> > > x86_64 GNU/Linux
> > >
> > > on domU is:
> > > Linux domu 3.5.0-IDD-12913 #2 SMP PREEMPT Sun Dec 9 17:54:30 EST 2012
> > > x86_64 GNU/Linux
> > >
> > > > 3) You are doing this:
> > > >
> > > > > <snip>
> > > > > for i in `ls test_file.*`
> > > > > do
> > > > >     sudo dd if=./$i of=/dev/zero
> > > > > done
> > > > > </snip>
> > >
> > > My bad. I have changed it to /dev/null.
> > >
> > > > I don't know what you intended with this, but you can't output to
> > > > /dev/zero (you can read from /dev/zero, but you can only output to
> > > > /dev/null).
> > > >
> > > > If your "img" is 5G and your guest has 4G of RAM, you will not
> > > > consistently buffer the entire image.
> > >
> > > Even though I am using a 5G img, the read operations executed are only
> > > 1G in size. Also, lm_benchmark doesn't involve any reads/writes to this
> > > ".img"; still, the results I am getting are better on domU when measured
> > > with the lm micro-benchmarks.
> > > > You are then doing buffered IO (note that some of your requests are
> > > > completing in 10us). That can only happen if you are reading from
> > > > memory and not from disk.
> > >
> > > Even though a single request is completing in 10us, the total time
> > > required to complete all requests (5000000) is 17 and 13 seconds for
> > > dom0 and domU respectively.
> > >
> > > (I forgot to mention that I have an SSD installed on this machine)
> > >
> > > > If you want to consistently compare the performance between two
> > > > domains, you should always bypass the VM's cache with O_DIRECT.
> > >
> > > But looking at the results of the lat_syscall and bw_mem micro-benchmarks,
> > > it shows that syscalls are executed faster in domU and memory bandwidth
> > > is higher in domU.
> >
> > Yes. That is expected with HVM guests. Their syscall overhead is lower and
> > their memory bandwidth higher than PV guests (which is what dom0 is).
> >
> > That is why PVH is such an interesting future direction - it is PV with HVM
> > containers to lower the syscall overhead and the cost of memory page table
> > operations.
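
To make the batching Konrad describes above concrete, here is a rough sketch
of how a PV guest kernel can fold up to 32 PTE writes into a single mmu_update
hypercall. struct mmu_update, MMU_NORMAL_PT_UPDATE and DOMID_SELF come from
the public Xen interface; the header paths and the hypercall wrapper follow
the usual Linux pvops layout, but the helper itself is illustrative rather
than actual kernel code.

/* Sketch only: batching PTE writes into one mmu_update hypercall. */
#include <linux/types.h>
#include <xen/interface/xen.h>      /* struct mmu_update, MMU_NORMAL_PT_UPDATE, DOMID_SELF */
#include <asm/xen/hypercall.h>      /* HYPERVISOR_mmu_update() */

#define PTE_BATCH 32                /* the batch size Konrad mentions above */

static int set_ptes_batched(const uint64_t *pte_machine_addr,
                            const uint64_t *pte_val, unsigned int n)
{
        struct mmu_update req[PTE_BATCH];
        int done;

        while (n) {
                unsigned int i, chunk = n < PTE_BATCH ? n : PTE_BATCH;

                for (i = 0; i < chunk; i++) {
                        /* the low bits of .ptr select the MMU command */
                        req[i].ptr = pte_machine_addr[i] | MMU_NORMAL_PT_UPDATE;
                        req[i].val = pte_val[i];
                }
                /* one trap into Xen validates and applies all 'chunk' PTEs */
                if (HYPERVISOR_mmu_update(req, chunk, &done, DOMID_SELF))
                        return -1;

                pte_machine_addr += chunk;
                pte_val += chunk;
                n -= chunk;
        }
        return 0;
}

Without batching, each PTE write is its own trap into the hypervisor; that
per-page cost is what a PV dom0 pays on every first touch of a page, whereas
an HVM domU with hardware-assisted paging updates its page tables directly.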
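
And for the O_DIRECT point Felipe makes above, a minimal userspace sketch of
reading one of the test files with the guest page cache bypassed; the file
name and block size are made up for illustration (with GNU dd the same effect
comes from iflag=direct).

/* Sketch only: read a file with O_DIRECT so the guest page cache is bypassed. */
#define _GNU_SOURCE                 /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
        const size_t blk = 1 << 20;             /* 1 MiB per read (illustrative) */
        void *buf;
        ssize_t n;
        int fd;

        /* O_DIRECT needs an aligned buffer; 4096 covers common sector sizes */
        if (posix_memalign(&buf, 4096, blk))
                return 1;

        fd = open("test_file.0", O_RDONLY | O_DIRECT);   /* illustrative name */
        if (fd < 0) {
                perror("open");
                return 1;
        }

        while ((n = read(fd, buf, blk)) > 0)
                ;                               /* data comes from the disk, not the cache */

        close(fd);
        free(buf);
        return 0;
}

With buffered reads, the 10us completions Felipe points out are page-cache
hits inside the guest; with O_DIRECT every request actually travels the
storage path, so the dom0 and domU numbers become comparable.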