Re: [Xen-users] add_random must be set to 1 for me - Archlinux HVM x64 - XenServer 7 Latest Patched
On Tue, Oct 25, 2016 at 1:29 PM, Austin S. Hemmelgarn <ahferroin7@xxxxxxxxx> wrote:
> On 2016-10-25 13:40, WebDawg wrote:
>>
>> On Tue, Oct 25, 2016 at 6:29 AM, Austin S. Hemmelgarn
>> <ahferroin7@xxxxxxxxx> wrote:
>>>
>>> On 2016-10-24 14:53, WebDawg wrote:
>>>>
>>>> On Wed, Oct 19, 2016 at 2:23 PM, Austin S. Hemmelgarn
>>>> <ahferroin7@xxxxxxxxx> wrote:
>>>>
>>>> So adding 3 more vCPUs, for a total of 4 on the domU, by itself
>>>> speeds up the dd write to xvda to 20 MB a second. But the IO load
>>>> also puts sy (system) CPU time load on almost all the CPUs (the CPU
>>>> load has been sy load the entire time). All in all it sticks to
>>>> about 200-300% CPU use at this point.
>>>>
>>>> The only thing that changes anything at this point is adding
>>>> oflag=direct to dd. When I add that, CPU use drops dramatically and
>>>> write speed goes much higher. Still, compared to Debian the CPU use
>>>> is unchanged.
>>>
>>> OK, this suggests the issue is somewhere in the caching in the guest OS.
>>> My first thoughts knowing that are:
>>> 1. How much RAM does the VM have?
>>> 2. What value does `cat /proc/sys/vm/vfs_cache_pressure` show?
>>> 3. Are you doing anything with memory ballooning?
>>
>> /proc/sys/vm/vfs_cache_pressure shows a value of 100 on both domUs.
>>
>> THE BALLOONING IS THE ANSWER.
>>
>> Okay, so I do not know what the deal is, but when you mentioned
>> ballooning I looked at the memory settings of the domU. These are the
>> settings that I had:
>>
>> Static: 128 MiB / 2 GiB
>> Dynamic: 2 GiB / 4 GiB
>>
>> I cannot even set this at the command line. I wanted to replicate the
>> problem after I fixed it and tried this:
>>
>> xe vm-memory-limits-set dynamic-max=400000000 dynamic-min=200000000
>> static-max=200000000 static-min=16777216 name-label=domU-name
>>
>> Error code: MEMORY_CONSTRAINT_VIOLATION
>>
>> Error parameters: Memory limits must satisfy: static_min <=
>> dynamic_min <= dynamic_max <= static_max
>>
>> The dynamic MAX was bigger than the static MAX, which is impossible to
>> set, but somehow happened.
>>
>> I do not know if it happened during the import from XenServer 6.5 to
>> XenServer 7, or because of the multiple software products I was using
>> to manage it, or whether something just got corrupted.
>>
>> I have been checking it all out after setting everything to this:
>>
>> Static: 128 MiB / 2 GiB
>> Dynamic: 2 GiB / 2 GiB
>>
>> It is all working as expected now!
>>
>> Like I said... I cannot understand how the dynamic MAX ended up bigger
>> than the static MAX when XenServer does not allow you to set this. Does
>> anyone have experience with these bad settings and can explain why I
>> was having such bad CPU use issues?
>
> I have no idea how it could have happened, or why it was causing what
> you were seeing to happen.
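For reference, the ordering that MEMORY_CONSTRAINT_VIOLATION complains about can be restored with the same command. A minimal sketch, assuming the XenServer 7 xe CLI and the working configuration quoted above; the name-label is a placeholder, and the values are simply 128 MiB and 2 GiB written out in bytes:

    # Sketch only: substitute the real domU name. static-min is 128 MiB, the
    # other three limits are 2 GiB, so
    # static_min <= dynamic_min <= dynamic_max <= static_max holds.
    xe vm-memory-limits-set name-label=domU-name \
        static-min=134217728 \
        dynamic-min=2147483648 \
        dynamic-max=2147483648 \
        static-max=2147483648

    # Confirm what is now stored in the VM record:
    xe vm-list name-label=domU-name \
        params=memory-static-min,memory-dynamic-min,memory-dynamic-max,memory-static-max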
>>
>>>> Debian is: 3.16.0-4-amd64
>>>>
>>>> archlinux is: 4.8.4-1-ARCH #1 SMP PREEMPT Sat Oct 22 18:26:57 CEST
>>>> 2016 x86_64 GNU/Linux
>>>>
>>>> I set scsi_mod.use_blk_mq=0 and it looks like it did nothing. My
>>>> /proc/cmdline shows that it is there and it should be doing
>>>> something... but my scheduler setting in the queue dir still says
>>>> none.
>>>>
>>>> Looking into this, I think the "none" is a result of the Xen PVHVM
>>>> block front driver?
>>>
>>> Probably. I've not been able to find any way to turn it off for the
>>> Xen PV block device driver (which doesn't really surprise me:
>>> xen-blkfront has multiple (virtual) 'hardware' queues, and stuff like
>>> that is exactly what blk-mq was designed to address (although it kind
>>> of sucks for anything like that except NVMe devices right now)).
>>
>> If I remember right, the kernel command line options to tune this are
>> not in the docs yet :/ At least not the ones I was looking at.
>
> Well, I've been looking straight at the code in Linux, and can't find it
> either (although I could just be missing it, C is not my language of
> choice, and I have even less experience reading kernel code).
>
>>>>>> If someone could shed some insight why enabling IO
>>>>>> generation/linking of timing/entropy data to /dev/random makes the
>>>>>> 'system work' this would be great. Like I said, I am just getting
>>>>>> into this and I will be doing more tuning if I can.
>>>>
>>>> ONCE AGAIN, I am wrong here. add_random does nothing to help me
>>>> anymore. In fact I cannot find any setting under queue that does
>>>> anything to help, at least with what I am trying to fix.
>>>>
>>>> I am sorry for this false information.
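For anyone following along, all of the knobs discussed here can be read back from inside the domU, which makes it easy to see what the running kernel actually ended up with. A small sketch, assuming the PV disk shows up as xvda as in the dd write mentioned earlier:

    # Kernel command line in effect (should include scsi_mod.use_blk_mq=0 if it was applied)
    cat /proc/cmdline

    # I/O scheduler for the PV disk; a blk-mq device typically reports "none" here
    cat /sys/block/xvda/queue/scheduler

    # The add_random tunable: whether this disk feeds timing entropy to the
    # random pool (0 or 1)
    cat /sys/block/xvda/queue/add_random

Note that scsi_mod.use_blk_mq only applies to devices behind the SCSI layer, so it would not be expected to change anything for xen-blkfront, which is consistent with the scheduler still showing "none".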
>>>>> I'm kind of surprised at this though, since I've got half a dozen
>>>>> domains running fine with blk-mq getting within 1% of the disk
>>>>> access speed the host sees (and the host is using blk-mq too, both
>>>>> in the device-mapper layer and the lower block layer). Some info
>>>>> about the rest of the storage stack might be helpful (i.e., what
>>>>> type of backing storage you are using for the VM disks (LVM, MD
>>>>> RAID, flat partitions, flat files, etc.), what Xen driver (raw
>>>>> disk, blktap, something else?), and what you are accessing in the
>>>>> VM (raw disk, partition, LVM volume, etc.))?
>>>>
>>>> This is a RAID 6 SAS array.
>>>>
>>>> The kernel that I am using (archlinux: linux) is all vanilla except
>>>> for, it looks like, one patch:
>>>>
>>>> https://git.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/linux
>>>>
>>>> That patch changes CONSOLE_LOGLEVEL_DEFAULT from 7 to 4.
>>>>
>>>> These are the results from some tests:
>>>>
>>>> dd if=/dev/zero of=/root/test2 bs=1M oflag=direct
>>>> 19.7 MB/s
>>>> CPU: 20%
>>>>
>>>> dd if=/dev/zero of=/root/test2 bs=1M
>>>> 2.5 MB/s
>>>> CPU: 100%
>>>
>>> This brings one other thing to mind: what filesystem are you using in
>>> the domU? My guess is that this is some kind of interaction between
>>> the filesystem and the blk-mq code. One way to check that would be to
>>> try writing directly to a disk from the domU instead of through the
>>> filesystem.
>>
>> I am using ext4.
>
> Even aside from the fact that you figured out what was causing the
> issue, the fact that you're using ext4 pretty much completely rules out
> the filesystem as a potential cause.
>>
>>>> This is sy CPU / system CPU use; so something in the kernel?
>>>>
>>>> On the Debian domU almost no CPU is used.
>>>>
>>>> I am also thinking that 20 MB/s is bad in general for my RAID6, as
>>>> almost nothing else is reading from or writing to it. But one thing
>>>> at a time; the only reason I mention it is that it might help figure
>>>> out this issue.
>>
>> So, now that it looks like I have figured out what is wrong, but not
>> how it got that wrong: does anyone have any pointers for increasing
>> the speed of the local storage array? I know I can add the backup
>> battery, but even without that... a SAS RAID6 running at 20 MB a
>> second in a domU seems so slow...
>
> It really depends on a bunch of factors. Things like how many disks are
> in the array, how many arrays the HBA is managing, whether it is set up
> for multipath, how much RAM dom0 and the domU have, what kind of memory
> bandwidth you have, and even how well the HBA driver is written can all
> have an impact.
>>
>> In dom0 I get about 20-21 MB a second with dd oflag=direct.
>
> I would expect this to be better without oflag=direct. Direct I/O tends
> to slow things down because it completely bypasses the page cache _and_
> it pretty much forces things to be synchronous (which means it slows
> down writes _way_ more than it slows down reads).
>
> As a couple of points of comparison, using the O_DIRECT flag on the
> output file (which is what oflag=direct does in dd) cuts write speed by
> about 25% on the high-end SSD I have in my laptop, and roughly 30% on
> the consumer-grade SATA drives in my home server system.
>
> And, as a general rule, performance with O_DIRECT isn't a good
> reference point for most cases, since very little software uses it
> without being configured to do so, and pretty much everything in
> widespread use that can use it is set up so it's opt-in (and usually
> used with AIO as well, which dd can't emulate).
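For reproducing that buffered-versus-direct comparison, the two kinds of run look roughly like the sketch below. The file name and count are arbitrary placeholders; conv=fsync is added to the buffered run so the reported rate includes the final writeback rather than just the speed of filling the page cache:

    # Buffered write through the page cache; fsync at the end so the timing is honest
    dd if=/dev/zero of=/root/ddtest bs=1M count=1024 conv=fsync

    # Direct write: O_DIRECT on the output file bypasses the page cache entirely
    dd if=/dev/zero of=/root/ddtest bs=1M count=1024 oflag=direct

    rm -f /root/ddtest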
>>
>> I think XenServer or Xen has some type of disk IO load balancing, or
>> am I wrong?
>
> Not that I know of. All the I/O scheduling is done by Domain 0 (or at
> least all the scheduling on the host side). You might try booting dom0
> with blk-mq disabled if it isn't already.
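A rough sketch of how to check whether a dom0 (or domU) disk is being driven through blk-mq, and the boot parameters that were commonly used to opt out of it on kernels of this era; the device name is only an example, and whether the parameters are honoured depends on the dom0 kernel in use:

    # A device on blk-mq exposes an mq/ directory listing its hardware queues
    [ -d /sys/block/sda/mq ] && echo "sda is using blk-mq"

    # On the legacy path this lists the classic elevators (noop/deadline/cfq)
    # instead of "none"
    cat /sys/block/sda/queue/scheduler

    # Boot parameters sometimes used to keep the SCSI and device-mapper layers
    # off blk-mq (assuming the dom0 kernel supports them):
    #   scsi_mod.use_blk_mq=0 dm_mod.use_blk_mq=0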
Thanks for all the help with this. It has been about a week coming, and
while I am very satisfied to have figured it out, putting this much work
into a configuration error was really dissatisfying. At least I learned a
great deal about the IO schedulers in Linux and even more about how Xen
handles it all. I have been meaning to dive into Linux/Xen IO because of
some bad performance in the past but never got to it. I have also been
meaning to work on compiling kernels in Archlinux and such, so I have
learned a bit about that. Overall the experience has not been all that bad.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
https://lists.xen.org/xen-users