
[Xen-users] Investigating memory performance: bare metal vs. xen-pv vs. xen-hvm


  • To: xen-users@xxxxxxxxxxxxx
  • From: "Tudor-Ioan Salomie [PERSONAL]" <tsalomie@xxxxxxxxx>
  • Date: Wed, 28 Aug 2013 17:16:21 +0200
  • Delivery-date: Wed, 28 Aug 2013 15:18:14 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>


I've been trying to compare memory access speed between bare metal, xen-pv and xen-pvhvm (HVM with PV drivers). In all three setups I'm running the same kernel (3.6.6), built with Xen support, on a 64-core AMD Opteron 6378 machine. The relevant parts of the xm info output:
machine                : x86_64
nr_cpus                : 64
nr_nodes               : 8
cores_per_socket       : 16
threads_per_core       : 1
cpu_mhz                : 2400
hw_caps                : 178bf3ff:2fd3fbff:00000000:00001710:32983203:00000000:01ebbfff:00000008
virt_caps              : hvm
total_memory           : 524262
free_memory            : 498318
free_cpus              : 0
xen_major              : 4
xen_minor              : 1
xen_extra              : .2
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : Tue Apr 10 10:50:08 2012 +0200 12:efd10c64454c
xen_commandline        : auto BOOT_IMAGE=user-xen root=801 placeholder no-bootscrub dom0_mem=4096M dom0_max_vcpus=16 dom0_vcpus_pin root=/dev/sda1 noreboot
cc_compiler            : gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)

To avoid as much noise as possible I've got a kernel module that performs the following straightforward test:
for (j = 2; j < 18; j++) {
        cpysize = 1 << j;           /* chunk size: 4 bytes up to 128 KB */
        cpycnt  = size / cpysize;   /* number of chunks covering the buffer */

        do_gettimeofday(&tvb);
        for (i = 0; i < loops; ++i) {
                for (k = 0; k < cpycnt; ++k) {
                        /* copy one chunk of the zeroed src buffer into dst */
                        memcpy(dst + k*cpysize, src + k*cpysize, cpysize);
                }
        }
        do_gettimeofday(&tve);

        msec = timevaldiff(&tvb, &tve);
        printk(KERN_INFO "Did the loops in %ld msec for cpysize = %d\n", msec, cpysize);
}
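
(timevaldiff isn't shown above; it's just a small helper that returns the elapsed time in milliseconds, roughly along these lines:)

static long timevaldiff(const struct timeval *start, const struct timeval *end)
{
        /* elapsed time between two struct timeval, in milliseconds */
        return (end->tv_sec - start->tv_sec) * 1000 +
               (end->tv_usec - start->tv_usec) / 1000;
}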

Here src and dst are buffers of "size" bytes allocated with vmalloc, and src is initialized with zeros.
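In other words, the buffer setup is roughly:

/* needs <linux/vmalloc.h> and <linux/string.h> */
char *src, *dst;

src = vmalloc(size);    /* size = 16 MB in the runs below */
dst = vmalloc(size);
memset(src, 0, size);   /* src starts out all zeros, as mentioned */
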
Since it's the same code, run inside the kernel, I would have expected the obvious ordering: bare metal fastest, then pvhvm, then pv. Curiously, the numbers I get (summarized in the table below) contradict that assumption:


time (msec) for 100 loops over a 16 MB buffer:

cpysize (bytes)   bare metal   pvhvm     pv
              4         5827    4606   4971
              8         3865    3030   3448
             16         3177    2134   2270
             32         3241    2216   2059
             64         1009     943    925
            128          760     599    566
            256          767     592    559
            512          727     587    544
           1024          701     575    524
           2048          688     570    507
           4096          678     566    498
           8192          662     552    489
          16384          652     542    480
          32768          646     539    478
          65536          644     535    474
         131072          643     535    473
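
(For scale: each measurement copies 16 MB x 100 = 1600 MB, so the 131072-byte row works out to roughly 1600 MB / 0.643 s ~ 2.5 GB/s on bare metal and 1600 MB / 0.473 s ~ 3.4 GB/s under pv.)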

The peculiar observations:
=================
1. bare metal seems to be slower in all cases (???)
2. pvhvm is faster than pv, but only for small chunks
3. for large chunks, the order is the exact reverse of what I would have anticipated: pv (fastest), then pvhvm, then bare metal (slowest).

Does anyone have any ideas why this might be happening?
Am I missing something?

Cheers,
-- Tudor.
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
