
Re: [Xen-devel] Poor network performance between DomU with multiqueue support



> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> Sent: Tuesday, December 02, 2014 11:59 PM
> To: Zhangleiqiang (Trump)
> Cc: Wei Liu; zhangleiqiang; xen-devel@xxxxxxxxxxxxx; Luohao (brian); Xiaoding
> (B); Yuzhou (C); Zhuangyuxin
> Subject: Re: [Xen-devel] Poor network performance between DomU with
> multiqueue support
> 
> On Tue, Dec 02, 2014 at 02:46:36PM +0000, Zhangleiqiang (Trump) wrote:
> > Thanks for your reply, Wei.
> >
> > I did the following testing just now and found the results as follows:
> >
> > Three DomUs (4U4G) are running on Host A (6U6G) and one DomU (4U4G) is
> > running on Host B (6U6G). I send packets from the three DomUs to the
> > DomU on Host B simultaneously.
> >
> > 1. The "top" output on Host B is as follows:
> >
> > top - 09:42:11 up  1:07,  2 users,  load average: 2.46, 1.90, 1.47
> > Tasks: 173 total,   4 running, 169 sleeping,   0 stopped,   0 zombie
> > %Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.0 hi,  0.8 si,  1.9 st
> > %Cpu1  :  0.0 us, 27.0 sy,  0.0 ni, 63.1 id,  0.0 wa,  0.0 hi,  9.5 si,  0.4 st
> > %Cpu2  :  0.0 us, 90.0 sy,  0.0 ni,  8.3 id,  0.0 wa,  0.0 hi,  1.7 si,  0.0 st
> > %Cpu3  :  0.4 us,  1.4 sy,  0.0 ni, 95.4 id,  0.0 wa,  0.0 hi,  1.4 si,  1.4 st
> > %Cpu4  :  0.0 us, 60.2 sy,  0.0 ni, 39.5 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
> > %Cpu5  :  0.0 us,  2.8 sy,  0.0 ni, 89.4 id,  0.0 wa,  0.0 hi,  6.9 si,  0.9 st
> > KiB Mem:   4517144 total,  3116480 used,  1400664 free,      876 buffers
> > KiB Swap:  2103292 total,        0 used,  2103292 free.  2374656 cached Mem
> >
> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >  7440 root      20   0       0      0      0 R 71.10 0.000   8:15.38 vif4.0-q3-guest
> >  7434 root      20   0       0      0      0 R 59.14 0.000   9:00.58 vif4.0-q0-guest
> >    18 root      20   0       0      0      0 R 33.89 0.000   2:35.06 ksoftirqd/2
> >    28 root      20   0       0      0      0 S 20.93 0.000   3:01.81 ksoftirqd/4
> >
> >
> > As shown above, only two netback-related processes (vif4.0-*) are running
> > with high cpu usage, and the other two netback processes are idle. The "ps"
> > output for the vif4.0-* processes is as follows:
> >
> > root      7434 50.5  0.0      0     0 ?        R    09:23  11:29 [vif4.0-q0-guest]
> > root      7435  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q0-deall]
> > root      7436  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q1-guest]
> > root      7437  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q1-deall]
> > root      7438  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q2-guest]
> > root      7439  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q2-deall]
> > root      7440 48.1  0.0      0     0 ?        R    09:23  10:55 [vif4.0-q3-guest]
> > root      7441  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q3-deall]
> > root      9724  0.0  0.0   9244  1520 pts/0    S+   09:46   0:00 grep --color=auto
> >
> >
> > 2. The "rx"-related content of /proc/interrupts in the receiver DomU (on Host B):
> >
> > 73:        2        0  2925405        0  xen-dyn-event  eth0-q0-rx
> > 75:       43       93        0      118  xen-dyn-event  eth0-q1-rx
> > 77:        2     3376       14     1983  xen-dyn-event  eth0-q2-rx
> > 79:  2414666        0        9        0  xen-dyn-event  eth0-q3-rx
> >
> > As shown above, it seems that only q0 and q3 handle the interrupts
> > triggered by packet receiving.
> >
> > Any advice? Thanks.
> 
> Netback selects the queue based on the return value of skb_get_queue_mapping.
> The queue mapping is set by the core driver or by ndo_select_queue (if provided
> by an individual driver). In this case netback doesn't have its own
> implementation of ndo_select_queue, so it's up to the core driver to decide
> which queue to dispatch the packet to.  I think you need to inspect why Dom0
> only steers traffic to these two queues and not all of them.
> 
> I don't know which utility is handy for this job. Probably tc(8) is useful?

Thanks, Wei.

I think the reason for the above results, i.e. that only two netback/netfront 
processes work hard, is the queue selection method. I have tried sending packets 
from multiple hosts/VMs to one VM, and a few times all of the netback/netfront 
processes were running with high CPU usage.
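
For reference, this is a minimal sketch of how the per-queue steering can be 
inspected on the Dom0 side (assuming the backend device is vif4.0 as in the 
output above, and that the usual sysfs and tc tooling is available in Dom0):

    # How many queues does the backend vif expose?
    ls /sys/class/net/vif4.0/queues/
    # Per-queue packet counters (the mq root qdisc keeps one child qdisc per tx queue)
    tc -s qdisc show dev vif4.0
    # Which Dom0 CPUs may transmit on each queue (XPS mask; empty means any CPU)
    grep . /sys/class/net/vif4.0/queues/tx-*/xps_cpus

If almost all packets show up under only two of the child qdiscs, then the core 
driver's queue hash is collapsing the flows onto those two queues, as Wei 
suggested.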

However, I have found another issue. Even when using 6 queues and making sure 
that all 6 netback processes run with high CPU usage (indeed, each of them 
running at about 87% CPU), the overall VM receive throughput is not much higher 
than the results with 4 queues: it only rises from 4.5 Gbps to 5.04 Gbps using 
TCP with 512-byte messages, and from 4.3 Gbps to 5.78 Gbps using TCP with 
1460-byte messages.
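
To see where the receive-side cycles actually go when all queues are busy, a 
quick profile in Dom0 might help (a sketch, assuming perf is installed in Dom0; 
PID 7440 is just the example PID from the ps output above, substitute one of the 
currently busy vif threads):

    # System-wide view while the receive test is running
    perf top -g
    # Or record one busy netback thread for 10 seconds and inspect the result
    perf record -g -p 7440 -- sleep 10
    perf report

If most samples land in grant-copy or memcpy-style symbols, the per-packet copy 
into the guest is probably the ceiling rather than the number of queues.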

According to the test results on the wiki 
(http://wiki.xen.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing),
the VM receive throughput is also much lower than the VM transmit throughput.

I am wondering why the VM receive throughput cannot reach 8-10 Gbps like the VM 
transmit throughput under multi-queue. I also tried sending packets directly 
from the local Dom0 to the DomU, and the DomU receive throughput can reach about 
8-12 Gbps, so I am also wondering why transmitting packets from Dom0 to a remote 
DomU can only reach about 4-5 Gbps of throughput.
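
For completeness, pinning each rx queue's interrupt in the receiver DomU to its 
own vcpu can be done roughly like this (a sketch; the IRQ numbers 73/75/77/79 
are taken from the /proc/interrupts excerpt above and will differ on other 
guests, irqbalance should be stopped first or it may rewrite the masks, and it 
assumes the guest kernel allows re-binding these xen-dyn-event interrupts):

    echo 1 > /proc/irq/73/smp_affinity   # eth0-q0-rx -> cpu0
    echo 2 > /proc/irq/75/smp_affinity   # eth0-q1-rx -> cpu1
    echo 4 > /proc/irq/77/smp_affinity   # eth0-q2-rx -> cpu2
    echo 8 > /proc/irq/79/smp_affinity   # eth0-q3-rx -> cpu3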

> Wei.
> 
> > ----------
> > zhangleiqiang (Trump)
> >
> > Best Regards
> >
> >
> > > -----Original Message-----
> > > From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> > > Sent: Tuesday, December 02, 2014 8:12 PM
> > > To: Zhangleiqiang (Trump)
> > > Cc: Wei Liu; zhangleiqiang; xen-devel@xxxxxxxxxxxxx; Luohao (brian);
> > > Xiaoding (B); Yuzhou (C); Zhuangyuxin
> > > Subject: Re: [Xen-devel] Poor network performance between DomU with
> > > multiqueue support
> > >
> > > On Tue, Dec 02, 2014 at 11:50:59AM +0000, Zhangleiqiang (Trump) wrote:
> > > > > -----Original Message-----
> > > > > From: xen-devel-bounces@xxxxxxxxxxxxx
> > > > > [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Wei Liu
> > > > > Sent: Tuesday, December 02, 2014 7:02 PM
> > > > > To: zhangleiqiang
> > > > > Cc: wei.liu2@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxx
> > > > > Subject: Re: [Xen-devel] Poor network performance between DomU
> > > > > with multiqueue support
> > > > >
> > > > > On Tue, Dec 02, 2014 at 04:30:49PM +0800, zhangleiqiang wrote:
> > > > > > Hi, all
> > > > > >     I am testing the performance of the xen netfront-netback driver
> > > > > > with multi-queue support. The throughput from DomU to a remote Dom0
> > > > > > is 9.2 Gb/s, but the throughput from DomU to a remote DomU is only
> > > > > > 3.6 Gb/s, so I think the bottleneck is the path from Dom0 to the
> > > > > > local DomU. However, we have done some testing and found that the
> > > > > > throughput from Dom0 to the local DomU is 5.8 Gb/s.
> > > > > >     And if we send packets from one DomU to 3 other DomUs on
> > > > > > different hosts simultaneously, the sum of the throughput can reach
> > > > > > 9 Gbps. It seems like the bottleneck is the receiver?
> > > > > >     After some analysis, I found that even when max_queue of
> > > > > > netfront/netback is set to 4, there are some strange results, as follows:
> > > > > >     1. In the DomU, only one rx queue deals with softirqs
> > > > >
> > > > > Try to bind irq to different vcpus?
> > > >
> > > > Do you mean we should try to bind the irqs to different vcpus in the
> > > > DomU? I will try it now.
> > > >
> > >
> > > Yes. Given the fact that you have two backend threads running while
> > > only one DomU vcpu is busy, it smells like a misconfiguration in the DomU.
> > >
> > > If this phenomenon persists after correctly binding the irqs, you might
> > > want to check that traffic is being steered correctly to different queues.
> > >
> > > > >
> > > > > >     2. In dom0, only two netback queue processes are scheduled; the
> > > > > > other two processes aren't scheduled.
> > > > >
> > > > > How many Dom0 vcpus do you have? If it only has two, then there
> > > > > will only be two processes running at a time.
> > > >
> > > > Dom0 has 6 vcpus and 6G of memory. There is only one DomU running in
> > > > Dom0, and so four netback processes are running in Dom0 (because the
> > > > max_queue param of the netback kernel module is set to 4).
> > > > The phenomenon is that only 2 of these four netback processes were
> > > > running with about 70% cpu usage, while the other two used little CPU.
> > > > Is there a hash algorithm to determine which netback process handles
> > > > an incoming packet?
> > > >
> > >
> > > I think that's whatever default algorithm the Linux kernel is using.
> > >
> > > We don't currently support other algorithms.
> > >
> > > Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

