Re: High xen_hypercall_sched_op usage
On 14.11.23 15:54, Klaus Darilion wrote:
> Hi!
>
> Server: AMD Rome 64C/128T, 2x NVMe SSDs -> Linux Softraid -> LVM (some LVs use DRBD)
> dom0:   Ubuntu 22.04, 16 vCPUs, dom0_vcpus_pin
> domU 1: PV, Ubuntu 22.04, 80 vCPUs, no pinning, load 30, PostgreSQL DB server
> domU 2: PV, Ubuntu 22.04, 16 vCPUs, no pinning, load 1-2, webserver
>
> For whatever reason, today the DB server was getting slow. We saw:
> - increased load
> - increased CPU (only "system" increased)
> - reduced disk IOPS
> - increased disk IO latency
> - no increase in userspace workload
>
> We still do not know whether the reduced IO performance was the cause of the
> issue or a consequence of it. We reduced load on the DB, dis-/reconnected
> DRBD and ran fstrim in the domU. After some time things were fine again.
>
> To better understand what was happening, maybe someone can answer my
> questions:
>
> a) I used the "perf top" utility in the domU and it reports something like:
>
>   76.23%  [kernel]   [k] xen_hypercall_sched_op
>    4.14%  [kernel]   [k] xen_hypercall_xen_version
>    0.97%  [kernel]   [k] pvclock_clocksource_read
>    0.84%  perf       [.] queue_event
>    0.81%  [kernel]   [k] pte_mfn_to_pfn.part.0
>    0.57%  postgres   [.] hash_search_with_hash_value
>
> So most of the CPU time is consumed by xen_hypercall_sched_op. Is it normal
> that xen_hypercall_sched_op basically eats up all CPU? Is this an indication
> of some underlying problem, or is that normal?

In a PV guest the sched_op hypercall is used e.g. for going to idle. I guess
you are adding up all idle time to the sched_op hypercall.

> b) I know that we only have CPU pinning for the dom0, but not for the domU
> (reason: some legacy thing that was probably not implemented correctly).
>
> # xl vcpu-list
> Name       ID  VCPU  CPU  State   Time(s)   Affinity (Hard / Soft)
> Domain-0    0     0    0  -b-      66581.0  0 / all
> Domain-0    0     1    1  -b-      60248.8  1 / all
> ...
> Domain-0    0    14   14  -b-      65531.2  14 / all
> Domain-0    0    15   15  -b-      68970.9  15 / all
> domU1       3     0   74  -b-     113149.8  all / 0-127
> ...
>
> b1) So, as the VMs are not pinned, it may happen that the same CPU is used
> for the dom0 and the domU. But why? There are 128 vCPUs available, and only
> 112 vCPUs used. Is Xen not smart enough to use all vCPUs?

You are mixing up vcpus and physical cpus. A vcpu is a virtualized cpu
presented to the guest. It can run on any physical cpu if no pinning etc. is
involved.

> b2) Sometimes I see that 2 vCPUs use the same CPU. How can it be that a CPU
> is used concurrently for 2 vCPUs? And why, as there are plenty of vCPUs left?
>
> root@cc6-vie:/home/darilion# xl vcpu-list | grep 102
> Name       ID  VCPU  CPU  State   Time(s)   Affinity (Hard / Soft)
> domU1       3    67  102  r--     119730.3  all / 0-127
> domU1       3    77  102  -b-     119224.1  all / 0-127

This shows that vcpu 77 is blocked (AKA idle), so it is not waiting for a
physical cpu to become free.

The Xen credit2 scheduler prefers to run only a single vcpu per core, as long
as enough cores are available to achieve that goal. This maximizes
performance, but it can result in a situation like the one you are seeing: if
the idle vcpu currently hooked to the same cpu as an already running one wants
to run again, it needs to switch cpus.

Juergen
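
One rough way to cross-check that the sched_op samples are mostly idle time
(a sketch only; mpstat needs the sysstat package in the domU, and "domU1" is
the domain name from the listing above):

  # Inside the domU: the idle percentage reported here should roughly
  # track the share perf attributes to xen_hypercall_sched_op.
  mpstat 1 5

  # In dom0: count how many of the domain's vcpus are blocked ("-b-"),
  # i.e. idle rather than waiting for a physical cpu.
  xl vcpu-list domU1 | grep -c -- '-b-'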
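
And a minimal sketch of how the domU could be given a hard affinity that keeps
it off the dom0 cpus, assuming dom0_vcpus_pin keeps dom0 on pcpus 0-15 as in
the listing above (the cpu range is only an example):

  # At runtime: pin all of domU1's vcpus to pcpus 16-127 (hard affinity).
  xl vcpu-pin domU1 all 16-127

  # Or persistently, via the hard-affinity setting in the domU config file:
  cpus = "16-127"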