High "steal" on new dom0 with no domu's running
A bit of backstory. My Xen experience started about 5 years ago with an Aaeon EMB-CV1 with 4GB of memory, running four VMs. As I outgrew that hardware, I moved to an Aaeon EMB-KB1 with 8GB of memory, and when I outgrew that, to an Asrock J5040-ITX with 16GB of memory. These have all worked beautifully and have been very performant; the capabilities of paravirtualization and the efficiency of the platform have amazed me. All have run a Debian dom0 with Debian domUs, using the Debian-maintained Xen packages. Each dom0 install has been a fresh build, with the VMs migrated over afterwards.

I recently decided to add another board to the mix to help with load, so that things requiring more horsepower (my ELK stack, Minecraft server, Nextcloud instance, etc.) can live on the 5040, while lower-CPU stuff like DNS, VPN server, and mail server can live on a board with a slower CPU. I wanted to stay on the same CPU architecture so that I could live-migrate VMs for maintenance, so I picked up an Asrock J4205 with 8GB of memory. After installing Debian 10 (currently my standard build), the board was snappy and performant. I then installed xen-tools and xen-system-amd64, and after rebooting, the system took significantly longer to boot and was very laggy from the console (and over SSH as well). At this point I wasn't running any VMs and didn't have any custom tweaks. Looking at top, there was a lot of "steal," averaging around 10% across all four cores (all physical cores).

I tried tying the dom0 to only one CPU, and at that point the dom0 was consistently performant again. However, any domUs I tried to spin up were very laggy, with high "steal." Live-migrated back to the 5040, they'd be fine again. The same thing happened if I didn't live-migrate but just started them fresh on the 4205.
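For anyone wanting to reproduce the single-vCPU pinning, it was done via the Xen command line in GRUB, roughly like this (a sketch, not a verbatim copy of my config; Debian paths assumed):

```shell
# /etc/default/grub on the dom0 -- sketch of the pinning setup.
# dom0_max_vcpus=1 limits dom0 to a single vCPU, and dom0_vcpus_pin
# pins that vCPU to a physical core, so dom0 and the guests don't
# bounce across the same pCPUs. The dom0_mem options match what
# "xl info" already shows on this box:
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=512M,max:512M dom0_max_vcpus=1 dom0_vcpus_pin"
# then: update-grub && reboot
```

A similar (but non-persistent) effect is possible at runtime with `xl vcpu-set Domain-0 1` followed by `xl vcpu-pin Domain-0 0 0`.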
I thought maybe I'd goofed something up in the build somehow, so I blew away that installation and rebuilt it from scratch, and experienced the same thing. I started logging performance with sysstat, and this is what I see.

On the 5040 with 6 VMs running:

08:25:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:35:01 AM     all      0.03      0.00      0.11      0.00      0.05     99.81
08:45:01 AM     all      0.03      0.00      0.11      0.00      0.05     99.81
08:55:01 AM     all      0.03      0.00      0.17      0.00      0.20     99.60
09:05:01 AM     all      0.03      0.00      0.12      0.00      0.05     99.80
09:15:01 AM     all      0.03      0.00      0.13      0.00      0.09     99.75
09:25:01 AM     all      0.03      0.00      0.18      0.00      0.26     99.53
Average:        all      0.03      0.00      0.14      0.00      0.12     99.72

On the 4205 with no VMs running:

08:35:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:45:02 AM     all      0.03      0.00      0.07      0.01      7.74     92.15
08:55:02 AM     all      0.03      0.00      0.09      0.00      8.95     90.93
09:05:01 AM     all      0.03      0.00      0.07      0.00      9.19     90.70
09:15:01 AM     all      0.03      0.00      0.08      0.00      7.93     91.96
09:25:01 AM     all      0.03      0.00      0.07      0.00      8.85     91.05
09:35:01 AM     all      0.03      0.00      0.19      0.00      6.73     93.05
Average:        all      0.03      0.00      0.09      0.00      8.24     91.63

# top
top - 09:45:55 up 1:29, 1 user, load average: 0.15, 0.11, 0.08
Tasks: 161 total, 2 running, 159 sleeping, 0 stopped, 0 zombie
%Cpu0 :  0.0 us,  1.1 sy,  0.0 ni, 96.6 id,  0.0 wa,  0.0 hi,  0.0 si,  2.3 st
%Cpu1 :  0.0 us,  0.0 sy,  0.0 ni, 78.9 id,  0.0 wa,  0.0 hi,  0.0 si, 21.1 st
%Cpu2 :  0.0 us,  1.1 sy,  0.0 ni, 89.4 id,  0.0 wa,  0.0 hi,  0.0 si,  9.6 st
%Cpu3 :  0.0 us,  0.0 sy,  0.0 ni, 62.8 id,  0.0 wa,  0.0 hi,  0.0 si, 37.2 st

Is there any way to tell what's causing the performance degradation, and what the dom0 is doing when it's "stealing" the CPU? I've been googling the issue a lot over the last few days and haven't found anything useful so far, only threads saying that this happens when you oversubscribe your domUs. But since I'm not running any domUs at this point, I don't see how that could be the issue; the box is just sitting there looking cool, not doing any real work.

Local disk storage on both dom0s is a single 20GB Intel 313 SLC SSD.
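As a sanity check that the steal figure isn't an artifact of top or sar, the same number can be computed straight from /proc/stat (a minimal sketch, assuming the usual 10-value aggregate "cpu" line of a modern kernel, where field 9 is steal time in ticks):

```shell
# Sample the aggregate "cpu" line twice, one second apart, and compute
# the steal percentage over the interval. The sum of the 10 numeric
# fields is total time; field 9 is steal. This is the same counter
# that sar and top report.
s1=$(awk '/^cpu /' /proc/stat)
sleep 1
s2=$(awk '/^cpu /' /proc/stat)

steal_pct=$(echo "$s1 $s2" | awk '{
    t1 = 0; for (i = 2;  i <= 11; i++) t1 += $i   # total ticks, first sample
    t2 = 0; for (i = 13; i <= 22; i++) t2 += $i   # total ticks, second sample
    printf "%.1f", 100 * ($20 - $9) / (t2 - t1)   # $9/$20 = steal, then/now
}')
echo "steal%: $steal_pct"
```

On the 4205 this should track the ~8% average that sar reports; on an unvirtualized box it stays at 0.0.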
VMs are stored on a Debian NAS box, connected via iSCSI.

# uname -a
Linux vhost2 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
# cat /etc/debian_version
10.9
# xl info
host                   :
release                : 4.19.0-14-amd64
version                : #1 SMP Debian 4.19.171-2 (2021-01-30)
machine                : x86_64
nr_cpus                : 4
max_cpu_id             : 3
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 1497.612
hw_caps                : bfebfbff:47f8e3bf:2c100800:00000101:0000000f:2094e283:00000000:00000100
virt_caps              : hvm hvm_directio
total_memory           : 8040
free_memory            : 7413
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 11
xen_extra              : .4
xen_version            : 4.11.4
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          :
xen_commandline        : placeholder dom0_mem=512M,max:512M no-real-mode edd=off
cc_compiler            : gcc (Debian 8.3.0-6) 8.3.0
cc_compile_by          : pkg-xen-devel
cc_compile_domain      : lists.alioth.debian.org
cc_compile_date        : Fri Dec 11 21:33:51 UTC 2020
build_id               : 6d8e0fa3ddb825695eb6c6832631b4fa2331fe41
xend_config_format     : 4

Chris