Angela, 
 
I'm not sure what you EXPECTED to see. A virtual machine 
will always be (somewhat) slower than the "real" hardware, because there is an 
extra software layer involved in some operations. Basically, this is the price 
you pay for the extended system functionality that you get. It's the same as 
saying "If I remove the file-system from my operating system, I can read from 
or write to the disk much quicker than going through the file-system"[1]. You 
gain some functionality, and you lose some performance. 
 
Comments on Byte-Bench:
I can't explain the pipe throughput, because I just don't 
know anything about how that works.
 
Process creation involves a lot of page-table work, which 
is definitely a typical situation where the hypervisor (Xen) has to take extra 
action on top of what is normally done in the OS: each operation that would 
normally be a trivial write to a page-table entry now becomes a call into Xen 
to perform that "trivial operation". So instead of a few simple instructions, 
we now have a software interrupt, a function call and several extra checks just 
to find out what needs to be done, and only then the actual page-table update. 
I'd expect this to be an order of magnitude slower than the native 
operation.
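
To make that concrete, here is a minimal sketch (not taken from any real 
kernel source, just an illustration of the mechanism) of what updating a 
single page-table entry looks like natively versus in a paravirtualised guest. 
struct mmu_update, HYPERVISOR_mmu_update() and DOMID_SELF are the names from 
the Xen public interface; pte, new_val and pte_machine_addr are placeholders: 

    /* Native kernel: the update is a single memory write. */
    *pte = new_val;

    /* Xen PV guest: the page tables are read-only to the guest, so the
     * same update has to be requested from the hypervisor (trap into
     * Xen, validation, and only then the actual write). */
    struct mmu_update req = {
        .ptr = pte_machine_addr,  /* machine address of the PTE    */
        .val = new_val,           /* the value we wanted to write  */
    };
    int done;
    HYPERVISOR_mmu_update(&req, 1, &done, DOMID_SELF);

The second argument lets a guest batch many such requests into one hypercall, 
which is exactly how Xen tries to amortise the cost, but it's still a long way 
from a single store instruction. 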
 
My guess is that the shell scripts aren't slower in themselves, 
but that several new processes get created within each shell script, so the 
process-creation overhead above is what dominates. 
 
 
Comments on lmbench:
read & write are slower - no big surprise. Most likely the reads and writes go to a file, and the DomU's "disk" is commonly emulated through a loopback-mounted 
file in Dom0. So you get twice the amount of reads: one in Dom0 reading the disk image, and then the data is 
transferred to DomU through a second "read" operation. 
 
The same goes for the other file-related operations: 
they become two-step operations, with Dom0 doing the actual work and then 
transferring the result to DomU.
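
If that is what's happening in your setup (i.e. the domU disk is a file-backed 
image rather than a real partition - from your description it sounds like you 
may already be using a partition, in which case this doesn't apply), one 
concrete tuning option is to export a physical partition instead of a loopback 
file in the domU config. The paths and partitions below are just examples: 

    # file-backed disk image, served through a loop device in Dom0:
    disk = [ 'file:/var/lib/xen/images/sl3.img,sda1,w' ]

    # physical partition exported directly to the guest, which avoids
    # the extra loop-device layer in Dom0:
    disk = [ 'phy:hda7,sda1,w' ]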
 
Protection fault handling goes through extra steps, as 
the fault first enters Xen itself and then has to be passed back to the guest 
that prot-faulted, so it's expected that those take longer than the same 
operation in a native OS.
 
I still have no explanation for the pipe behaviour - in 
the few minutes I've been working on this answer, I haven't learnt how pipes 
work ;-)
 
Sockets, probably related to pipes... But I have no 
real idea how pipes or sockets work... 
 
fork+<something>: More work is needed in the virtual 
machine than on the real hardware, as described under process creation above. A 
factor of ~2x slower isn't bad at all... Some of these operations also involve 
file operations, which adds to the already slower process creation. 
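
Just to put numbers on that, from your lmbench results: fork+exit is 
365.9/143.7, about 2.5x slower; fork+execve is 1066.4/483.1, about 2.2x; and 
fork+/bin/sh is 3826/1884, about 2.0x - so roughly a factor of two across the 
board. 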
 
[1] This assumes the file-system is relatively stupid 
about caching things, because a modern file-system performs a lot of clever 
caching/optimisation to increase overall system 
performance.
 
--
Mats
 
  
  Hi all,
  While doing some benchmarking of Xen, I ran across a couple of performance 
  issues. I am wondering if anyone else has noticed this and whether there is 
  anything I can do to tune the performance.
  The setup:
  CPU: Athlon XP 2500+ (1826.005 MHz)
  RAM: Limited to 256 MB in native and xenU
  Disk: Maxtor 6B200P0, ATA DISK drive
  Motherboard: ASUS A7VBX-MX SE
  Network: tested only loopback interface.
I have Fedora 
  Core 4 installed as dom0, with Scientific Linux 3.0.7 (RHEL3) installed on a 
  separate partition as the single domU. I installed the FC4 xen rpms 
  (xen-3.0-0.20050912.fc4, kernel-xenU-2.6.12-1.1454_FC4, 
  kernel-xen0-2.6.12-1.1454_FC4) using yum.
  I used the following benchmark tools/suites:
  bonnie++-1.03a
  UnixBench 4.1.0
  ab
  lmbench 3.0-a5
The areas where I saw the greatest performance hit were in 
  system calls, process creation, and pipe throughput. Here are some selected 
  results:
UnixBench:
============
Scientific Linux 3 Native:
  BYTE UNIX Benchmarks (Version 4.1.0)
  System -- Linux localhost.localdomain 2.4.21-27.0.2.EL #1 Tue Jan 18 20:27:31 CST 2005 i686 athlon i386 GNU/Linux
  Start Benchmark Run: Thu Sep 22 15:23:17 PDT 2005
   2 interactive users.
   15:23:17  up 12 min,  2 users,  load average: 0.03, 0.08, 0.05
  lrwxr-xr-x    1 root     root            4 Sep  9 10:56 /bin/sh -> bash
  /bin/sh: symbolic link to bash
  /dev/hdc11            20161172   5059592  14077440  27% /
<--snip-->
System Call Overhead                     995605.1 lps   (10.0 secs, 10 samples)
Pipe Throughput                          1135376.3 lps   (10.0 secs, 10 samples)
Pipe-based Context Switching             375521.7 lps   (10.0 secs, 10 samples)
Process Creation                           9476.4 lps   (30.0 secs, 3 samples)
Execl Throughput                           2918.3 lps   (29.7 secs, 3 samples)
<--snip-->
                     
  INDEX VALUES
TEST                                        BASELINE     RESULT      INDEX
Dhrystone 2 using register variables        116700.0  4307104.5      369.1
Double-Precision Whetstone                      55.0      980.4      178.3
Execl Throughput                                43.0     2918.3      678.7
File Copy 1024 bufsize 2000 maxblocks         3960.0   143780.0      363.1
File Copy 256 bufsize 500 maxblocks           1655.0    72156.0      436.0
File Copy 4096 bufsize 8000 maxblocks         5800.0   192427.0      331.8
Pipe Throughput                              12440.0  1135376.3      912.7
Process Creation                               126.0     9476.4      752.1
Shell Scripts (8 concurrent)                     6.0      329.7      549.5
System Call Overhead                         15000.0   995605.1      663.7
                                                                 =========
     FINAL SCORE                                                     475.2
--------------------------------------------
SL3 XenU
  BYTE UNIX Benchmarks (Version 4.1.0)
  System -- Linux localhost.localdomain 2.6.12-1.1454_FC4xenU #1 SMP Fri Sep 9 00:45:34 EDT 2005 i686 athlon i386 GNU/Linux
  Start Benchmark Run: Fri Sep 23 09:08:23 PDT 2005
   1 interactive users.
   09:08:23  up 0 min,  1 user,  load average: 0.95, 0.25, 0.08
  lrwxr-xr-x    1 root     root            4 Sep  9 10:56 /bin/sh -> bash
  /bin/sh: symbolic link to bash
  /dev/sda1             20161172   5058964  14078068  27% /
<--snip-->
System Call Overhead                     969225.3 lps   (10.0 secs, 10 samples)
Pipe Throughput                          619270.7 lps   (10.0 secs, 10 samples)
Pipe-based Context Switching              85183.9 lps   (10.0 secs, 10 samples)
Process Creation                           3014.6 lps   (30.0 secs, 3 samples)
Execl Throughput                           1807.4 lps   (29.9 secs, 3 samples)
<--snip-->
                     
  INDEX VALUES
TEST                                        BASELINE     RESULT      INDEX
Dhrystone 2 using register variables        116700.0  4288647.9      367.5
Double-Precision Whetstone                      55.0      976.3      177.5
Execl Throughput                                43.0     1807.4      420.3
File Copy 1024 bufsize 2000 maxblocks         3960.0   143559.0      362.5
File Copy 256 bufsize 500 maxblocks           1655.0    70328.0      424.9
File Copy 4096 bufsize 8000 maxblocks         5800.0   186297.0      321.2
Pipe Throughput                              12440.0   619270.7      497.8
Process Creation                               126.0     3014.6      239.3
Shell Scripts (8 concurrent)                     6.0      188.0      313.3
System Call Overhead                         15000.0   969225.3      646.2
                                                                 =========
     FINAL SCORE                                                     356.0
---------------------------------------------------------------------------------
lmbench Selected Results:
==========================
SL3 Native:
<--snip-->
Simple syscall: 0.1516 microseconds
Simple read: 0.2147 microseconds
Simple write: 0.1817 microseconds
Simple stat: 1.8486 microseconds
Simple fstat: 0.3026 microseconds
Simple open/close: 2.2201 microseconds
<--snip-->
Protection fault: 0.2196 microseconds
Pipe latency: 2.2539 microseconds
AF_UNIX sock stream latency: 4.8221 microseconds
Process fork+exit: 143.7297 microseconds
Process fork+execve: 483.0833 microseconds
Process fork+/bin/sh -c: 1884.0000 microseconds
-------------------------------------------------
SL3 XenU:
<--snip-->
Simple syscall: 0.1671 microseconds
Simple read: 0.4090 microseconds
Simple write: 0.3588 microseconds
Simple stat: 3.5761 microseconds
Simple fstat: 0.5530 microseconds
Simple open/close: 3.9425 microseconds
<--snip-->
Protection fault: 0.5993 microseconds
Pipe latency: 12.1886 microseconds
AF_UNIX sock stream latency: 22.3485 microseconds
Process fork+exit: 365.8667 microseconds
Process fork+execve: 1066.4000 microseconds
Process fork+/bin/sh -c: 3826.0000 microseconds
<--snip-->
-------------------------------------------------------------------------
I can post the full results of these tests if anyone is interested.
Does anyone have any ideas for tuning the performance of the domUs? Are there 
any configurations that perform better than others?
Thank You,
Angela Norton