Hello!
Replying to my own question to share my findings.
1. The XEN wiki performance tips are very old; some are still true, some are simply wrong.
2. The bottleneck was CPU: either too few vCPUs in the dom0, or interrupt workload not distributed across multiple CPUs.
   - Some wiki page mentions that in the domU the network handling is always on vCPU 0. This is not true. The current vif interface is multi-queue, so the interrupt of each queue should be pinned to a dedicated vCPU. This can be done manually or with irqbalance.
   - The vif in the dom0 and the physical NICs are also multi-queue, and their interrupts need to be distributed over vCPUs too, again by manual pinning or irqbalance.
   - The dom0 needs enough vCPUs to handle the interrupts.
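For the manual pinning variant, a minimal sketch of how the queue interrupts could be spread round-robin over the vCPUs. This is not what I ran (I used irqbalance); the IRQ numbers are hypothetical and would have to be looked up in /proc/interrupts first. It writes to /proc/irq/<irq>/smp_affinity_list, which takes plain CPU numbers, and needs root.

```python
def affinity_plan(irqs, cpus):
    """Assign each IRQ number to one CPU, round-robin over the given CPUs."""
    return {irq: cpus[i % len(cpus)] for i, irq in enumerate(irqs)}


def apply_plan(plan):
    """Write the plan to the kernel. Needs root; the write can fail for
    interrupts whose affinity cannot be changed."""
    for irq, cpu in plan.items():
        with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as f:
            f.write(str(cpu))


# Hypothetical example: five queue interrupts spread over four vCPUs.
plan = affinity_plan([50, 51, 52, 53, 54], [0, 1, 2, 3])
for irq, cpu in sorted(plan.items()):
    print(f"IRQ {irq} -> CPU {cpu}")
# apply_plan(plan)  # uncomment on the real system, as root
```

If irqbalance is running at the same time, it may move the interrupts again, so it is probably best to use one approach or the other.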
I often found claims that PV is old and slow and that the newer PVH is much faster. I cannot confirm that. In my tests PV was as fast as PVH.
So what I did:
- stay with PV
- increased dom0 vCPU from 4 to 16
- installed irqbalance in domU
- (irqbalance was already installed in dom0)
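To check whether the interrupts of an interface are really spread over the vCPUs, the per-CPU counters in /proc/interrupts can be summed up. A small sketch; the sample text and the interrupt names in it (eth0-q0-tx and so on) are made up for illustration, and the names on a real system may differ.

```python
def per_cpu_counts(text, match):
    """Sum the per-CPU counters of all numbered IRQ lines in /proc/interrupts
    text whose description contains `match`. Returns one total per CPU."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())          # header line: CPU0 CPU1 ...
    totals = [0] * ncpu
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Skip summary rows (NMI, ERR, ...) and rows without a description.
        if not label.strip().isdigit() or len(fields) <= ncpu:
            continue
        if match not in " ".join(fields[ncpu:]):
            continue
        for cpu in range(ncpu):
            totals[cpu] += int(fields[cpu])
    return totals


# Made-up sample in the layout of /proc/interrupts.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 50:        900          0          0          0  xen-dyn  -event  eth0-q0-tx
 51:        100          0          0          0  xen-dyn  -event  eth0-q0-rx
 52:          0        500          0          0  xen-dyn  -event  eth0-q1-tx
 53:          0          0        300          0  xen-dyn  -event  eth0-q1-rx
 60:         10         10         10         10  xen-dyn  -event  blkif
NMI:          0          0          0          0  Non-maskable interrupts
"""

print(per_cpu_counts(SAMPLE, "eth0"))  # [1000, 500, 300, 0]
```

On a real system the text would come from open("/proc/interrupts").read(). If almost all counts end up on one CPU while the others stay near zero, the interrupts are not distributed.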
With these changes, the performance of the name server in the domU increased from 170,000 pps to 850,000 pps, a fivefold improvement.
regards
Klaus
From: Klaus Darilion
Sent: Monday, 19 September 2022 23:07
To: 'xen-users@xxxxxxxxxxxxxxxxxxxx' <xen-users@xxxxxxxxxxxxxxxxxxxx>
Subject: Performance Problems, probably network related
Hello!
Hardware: 2 servers, hardware is more or less identical
Server 1: Ubuntu 20.04 (xen 4.11, Kernel 5.4, Linux Bridge)
AMD EPYC 7702P 64-Core Processor
BCM57416 10G NIC
dom0 has 4 vCPUs
Server 2: VMware ESXi 7.0
AMD EPYC 7543P 32-Core Processor
BCM57414 NetXtreme-E 10Gb/25Gb
VM: Ubuntu 20.04, 8 vCPUs, running the Knot DNS name server. I am running benchmark tests against a VM hosted either on XEN or on VMware.
In both cases there is no tuning (no CPU pinning …).
The XEN VM: 170,000 qps
The ESX VM: 575,000 qps
So the XEN VM is much slower than the VMware VM. I thought this was because the XEN VM is "good old" PV, so I repeated the test with type=pvh, but the results were the same. I did some more tests:
When I test with a name server that is CPU-intensive, VMware is only a bit faster. But if the workload is more network-heavy (pps), VMware is much faster.
I have read
https://wiki.xenproject.org/wiki/Network_Throughput_and_Performance_Guide but it covers so many things that I do not know which of them are still relevant or where to start.
Is there any general advice on where to start debugging and tuning (i.e. are there known network bottlenecks)? Or is XEN known to be slower than VMware in network throughput (then I could just stop tuning)?
Thanks
Klaus
--
Klaus Darilion, Head of Operations
nic.at GmbH, Jakob-Haringer-Straße 8/V
5020 Salzburg, Austria