[Xen-users] lots of cycles in i/o wait state
Hi Folks,

I've been doing some experimenting to see how far I can push some old hardware into a virtualized environment - partially to see how much use I can get out of the hardware, and partially to learn more about the behavior of, and interactions between, software RAID, LVM, DRBD, and Xen.

Basic configuration:
- two machines, 4 disk drives each, two 1G ethernet ports (1 each to the outside world, 1 each as a cross-connect)
- each machine runs Xen 3 on top of Debian Lenny (the basic install)
- very basic Dom0s - just running the hypervisor and i/o (including disk management)
---- software RAID6 (md)
---- LVM
---- DRBD
---- heartbeat to provide some failure migration
- dom0, on each machine, runs directly on md RAID volumes (RAID1 for boot, RAID6 for root and swap)
- each Xen VM uses 2 DRBD volumes - one for root, one for swap
- one of the VMs has a third volume, used for backup copies of files

One domU, on one machine, runs a medium volume mail/list server. This used to run non-virtualized on one of the machines, and I moved it into a domU. Before virtualization, everything just hummed along (98% idle time as reported by top). Virtualized, the machine is mostly idle, but now top reports a lot of i/o wait time (usually in the 20-25% range).

As I've started experimenting with adding additional domUs, in various configurations, I've found that my mail server can get into a state where it's spending almost all of its cycles in an i/o wait state (95% and higher as reported by top). This is particularly noticeable when I run a backup job (essentially a large tar job that reads from the root volume and writes to the backup volume). The domU grinds to a halt.

So I've been trying to track down the bottlenecks.

At first, I thought this was probably a function of pushing my disk stack beyond reasonable limits - what with multiple domUs on top of DRBD volumes, on top of LVM volumes, on top of software RAID6 (md). I figured I was seeing a lot of disk churning.

But... after running some disk benchmarks, what I'm seeing is something else:
- I took one machine, turned off all the domUs, and turned off DRBD
- I ran a disk benchmark (bonnie++) on dom0, which reported 50MB/sec to 90MB/sec of throughput depending on the test (not exactly sure what this means, but it's a baseline)
- I then brought up DRBD and various combinations of domUs, and ran the benchmark in various places
- the most interesting result, running in the same domU as the mail server: 34-60MB/sec depending on the test (not much degradation from running directly on the RAID volume) - but... while running the benchmark, the baseline i/o wait percentage jumps from 25% to the 70-90% range

So... the question becomes, if it's not disk churning, what's causing all those i/o wait cycles? I'm starting to think it might involve buffering or other interactions in the hypervisor.

Any thoughts or suggestions regarding diagnostics and/or tuning? (Other than "throw hardware at it" of course :-).

Thanks very much,

Miles Fidelman

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
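
For anyone wanting to reproduce the i/o wait percentages quoted above without relying on top, here is a minimal sketch (not part of the original message) that derives the figure from two samples of the aggregate "cpu" line in /proc/stat. It assumes a Linux 2.6 or later kernel, where the fifth counter on that line is iowait. The function names (parse_cpu_line, iowait_percent, measure) are made up for illustration.

```python
import time


def parse_cpu_line(line):
    """Return the integer tick counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two counter samples.

    Counter order: user, nice, system, idle, iowait, irq, softirq, steal, ...
    Only the first eight counters are summed, because guest time (when the
    kernel reports it) is already included in user time.
    """
    deltas = [b - a for a, b in zip(after[:8], before[:8])]
    # deltas are (before - after), so negate to get elapsed ticks per counter
    deltas = [-d for d in deltas]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def measure(interval=5.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart, and return the iowait percentage."""
    with open(path) as stat_file:
        before = parse_cpu_line(stat_file.readline())
    time.sleep(interval)
    with open(path) as stat_file:
        after = parse_cpu_line(stat_file.readline())
    return iowait_percent(before, after)
```

Calling measure() inside the domU and inside dom0 while the benchmark or backup job runs would give comparable numbers for both sides. Whether the two figures line up is exactly the open question here, so treat this as a measurement aid rather than a diagnosis.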