Re: [Xen-API] XCP 1.1 - poor disk performance, high load and wa
Hi all,

Since I am still looking for a solution to this mystery, I've posted some
additional data about the hardware: http://pastebin.com/Z4BVShvF - there you
can find the output of lspci, dmidecode and dmesg. As a side note, I've
reinstalled the OS with 4K sectors in mind and GPT, and performance is still
crawling compared to the installation on the older hardware (the same VM
imports/exports in ~50 minutes vs ~3 minutes). I really hope someone can shed
some light on what is happening and how to fix it.

Cheers,
S.

On Fri, Sep 21, 2012 at 3:18 PM, SpamMePlease PleasePlease
<spankthespam@xxxxxxxxx> wrote:
> Actually, I've reinstalled the OS without md raid (despite the fact that I
> have this configuration working perfectly fine on another server) and
> I still have extremely poor performance when importing VMs:
>
> * the process takes over an hour for a VM that was exported in ~3 minutes
> * top looks like this when importing the VM into a fresh, empty (no
>   running VMs) system:
>
> top - 16:09:40 up  1:49,  3 users,  load average: 1.36, 1.44, 1.38
> Tasks: 134 total,   1 running, 133 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.3%us,  0.0%sy,  0.0%ni, 60.7%id, 39.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:    771328k total,   763052k used,     8276k free,   269656k buffers
> Swap:   524280k total,        0k used,   524280k free,   305472k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  6433 root      20   0  217m  25m 8348 S  0.7  3.4   0:35.63 xapi
>    92 root      20   0     0    0    0 S  0.3  0.0   0:00.02 bdi-default
> 11775 root      20   0  2036 1072  584 S  0.3  0.1   0:01.63 xe
> 15058 root      20   0  2424 1120  832 S  0.3  0.1   0:06.77 top
> 17127 root      20   0  2424 1108  828 R  0.3  0.1   0:00.13 top
>
> * iostat looks like this:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0.15    0.00    0.20   44.85    0.05   54.75
>
> Device:  rrqm/s  wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await  svctm  %util
> sda        0.00    0.40  2.40  2.40  1864.00   575.60   508.25     2.97   649.58 208.33 100.00
> sda1       0.00    0.00  2.40  0.40  1864.00    11.20   669.71     0.82   346.43 281.43  78.80
> sda2       0.00    0.00  0.00  0.00     0.00     0.00     0.00     0.00     0.00   0.00   0.00
> sda3       0.00    0.40  0.00  2.00     0.00   564.40   282.20     2.14  1074.00 500.00 100.00
> sdb        0.00    0.00  0.00  0.00     0.00     0.00     0.00     0.00     0.00   0.00   0.00
> dm-0       0.00    0.00  0.00  0.00     0.00     0.00     0.00     0.00     0.00   0.00   0.00
> dm-1       0.00    0.00  0.00  2.20     0.00   759.20   345.09     2.57   976.36 454.55 100.00
> tda        0.00    0.00  1.00  8.60     8.00   756.80    79.67    16.00  1546.04 104.17 100.00
> xvda       0.00  130.00  1.00  8.60     8.00   756.80    79.67   148.83 10160.42 104.17 100.00
>
> * smartctl shows both disks to be in perfect health
> * hdparm reports decent speeds on the raw sda/sdb devices (~160 MB/s)
> * it was pointed out to me that the drives are new 4K-sector ones; I've
>   modified the install.img .py files to accommodate that change in a few
>   places, and will try to reinstall the machine afterwards
>
> Any clue?
> S.
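
For what it's worth, a quick way to double-check that the new GPT partitions
really ended up aligned to the 4K physical sectors is something like the
following (run in dom0; /dev/sda is only an example device, and align-check
needs a reasonably recent parted):

    # logical vs physical sector size reported by the drive
    cat /sys/block/sda/queue/logical_block_size
    cat /sys/block/sda/queue/physical_block_size

    # list partition start sectors; a start divisible by 8 is 4K-aligned
    parted /dev/sda unit s print

    # let parted check the alignment of partition 1 directly
    parted /dev/sda align-check optimal 1

If a partition starts on a sector that is not a multiple of 8, filesystem
blocks end up straddling physical sectors and the drive has to do
read-modify-write cycles, which can cripple write performance in exactly the
way described above.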
>
> On Fri, Sep 21, 2012 at 3:17 PM, SpamMePlease PleasePlease
> <spankthespam@xxxxxxxxx> wrote:
>>
>> On Mon, Sep 17, 2012 at 2:44 PM, Denis Cardon
>> <denis.cardon@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>>> Hi George,
>>>
>>>> By default XCP 1.1 does not support software raid.
>>>>
>>>> Under certain conditions you can use it, but you need to know that you
>>>> are going into deep water. And it's better to know how to swim... I mean,
>>>> understand the internals of XCP.
>>>>
>>>> If you are 'just a user' - do not use software raid.
>>>>
>>>> If you wish some help here - say which device (virtual or real) is the
>>>> bottleneck.
>>>
>>> first of all, thank you for the time you dedicate to the xcp mailing
>>> list. It is definitely a big advantage for the project to have such a
>>> dynamic mailing list.
>>>
>>> I came across the slow md raid5 issue on an XCP 1.1 box a month ago and
>>> didn't have much time to look into it, since it is a small non-production
>>> dev/test server and I'm using hardware raid on the production servers.
>>>
>>> Like in the initial poster's mail, on my dev server write I/O goes to
>>> hell and the load average skyrockets even under very light disk writes.
>>> However, the behaviour is not consistent with ordinary I/O saturation,
>>> since parallel I/O is not much affected... That is to say, I can launch
>>> "iozone -a -n512m -g512m" in a 256MB VM and at the same time a "find /"
>>> still goes through quite smoothly... Using vmstat on dom0 I have
>>> sometimes seen up to 60k blocks per second (4K blocks, I guess), so
>>> throughput seems to be acceptable some of the time. Note: I have enabled
>>> the write cache on the SATA disks (I know, a bad idea for production, but
>>> OK for my dev use). I've not seen this behaviour on installations with
>>> hardware RAID.
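
A rough sketch of that test, in case anyone wants to reproduce it (the device
names are only examples; the iozone sizes are the ones quoted above):

    # inside the test VM: automatic iozone run up to a 512 MB file
    iozone -a -n512m -g512m

    # meanwhile, in dom0: watch block I/O throughput and I/O wait
    vmstat 1
    iostat -x 2

    # and check whether the SATA write cache is actually enabled per disk
    hdparm -W /dev/sda
    hdparm -W /dev/sdb

hdparm -W with no value only reports the current write-caching setting;
-W0/-W1 would turn it off/on.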
>>>
>>> If I have some time this week, I'll try to convert the setup to an ext3
>>> partition (thin provisioning) so that I can run iozone directly on the
>>> SR. I know md soft RAID is not a supported configuration, and I agree it
>>> should not be, unless the SATA disk cache is deactivated (which gives bad
>>> performance). However, for some very small setups or dev setups it might
>>> still be interesting.
>>>
>>> Cheers and keep up the good work!
>>>
>>> Denis
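
If it helps, creating a local thin-provisioned (VHD-on-ext3) SR on a spare
disk or partition is roughly the following; the device path here is only an
example, and sr-create will wipe it, so double-check before running:

    # find the uuid of this host
    xe host-list

    # create a local ext (thin-provisioned) storage repository on /dev/sdb
    xe sr-create host-uuid=<host-uuid> \
        name-label="Local ext3 SR" \
        type=ext content-type=user shared=false \
        device-config:device=/dev/sdb

Once created, iozone can be pointed at the SR's mount point under
/var/run/sr-mount/ in dom0 to test the filesystem without a VM in the path.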
>>>>
>>>> On 16.09.2012 20:43, SpamMePlease PleasePlease wrote:
>>>> > All,
>>>> >
>>>> > I've installed XCP 1.1 from the latest available ISO on a Hetzner 4S
>>>> > server, using md raid 1 and LVM for the local storage repository.
>>>> > The problem I'm seeing is extremely poor disk performance - importing
>>>> > a VM file that was exported in ~3 minutes from another XCP 1.1 host
>>>> > (also a Hetzner server, but a bit older one, an EQ6) takes up to 2
>>>> > hours, and in the meantime dom0 becomes almost unusable: the load
>>>> > goes up to 2, it sits at a constant 50% (or higher) wa(it) and is
>>>> > extremely sluggish.
>>>> >
>>>> > Now, I wouldn't mind the sluggishness of dom0, but 2 hours for a VM
>>>> > import seems crazy and unacceptable. I've done multiple installations
>>>> > of the server to make sure I am not doing anything wrong, but the
>>>> > same setup works flawlessly on the older machine. I've tested the
>>>> > drives and they seem to be fine, with up to ~170 MB/s throughput on
>>>> > SATA3.
>>>> >
>>>> > Is there anything else I can check to see if it's a hardware problem,
>>>> > or anything that could be configured on dom0 to make it operational
>>>> > and usable?
>>>> >
>>>> > Kind regards,
>>>> > S.
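
As a starting point for the "anything else I can check" question, a few
low-level checks that can be run from dom0 without touching the VMs (device
names and the scratch-file path are only examples):

    # raw sequential read from the disk, bypassing the page cache
    dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct

    # direct, synced write to a scratch file on the dom0 filesystem
    dd if=/dev/zero of=/var/tmp/ddtest bs=1M count=512 oflag=direct conv=fsync
    rm -f /var/tmp/ddtest

    # SMART attributes: look for pending/reallocated sectors and error counts
    smartctl -A /dev/sda

    # kernel messages about link resets or controller errors
    dmesg | grep -i -E 'ata|sd[ab]'

If the raw dd numbers are fine but writes through tapdisk/xvda are not (as in
the iostat output above), the hardware itself is probably not the culprit.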

_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api