Xen project Mailing List

Re: [Xen-devel] Poor network performance between DomU with multiqueue support

On Wed, Dec 03, 2014 at 02:43:37PM +0000, Zhangleiqiang (Trump) wrote:
> > -----Original Message-----
> > From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> > Sent: Tuesday, December 02, 2014 11:59 PM
> > To: Zhangleiqiang (Trump)
> > Cc: Wei Liu; zhangleiqiang; xen-devel@xxxxxxxxxxxxx; Luohao (brian);
> > Xiaoding (B); Yuzhou (C); Zhuangyuxin
> > Subject: Re: [Xen-devel] Poor network performance between DomU with
> > multiqueue support
> >
> > On Tue, Dec 02, 2014 at 02:46:36PM +0000, Zhangleiqiang (Trump) wrote:
> > > Thanks for your reply, Wei.
> > >
> > > I ran the following test just now and found the results below:
> > >
> > > Three DomUs (4U4G) are running on Host A (6U6G) and one DomU (4U4G)
> > > is running on Host B (6U6G). I send packets from the three DomUs to
> > > the DomU on Host B simultaneously.
> > >
> > > 1. The "top" output of Host B is as follows:
> > >
> > > top - 09:42:11 up 1:07, 2 users, load average: 2.46, 1.90, 1.47
> > > Tasks: 173 total, 4 running, 169 sleeping, 0 stopped, 0 zombie
> > > %Cpu0 : 0.0 us,  0.0 sy, 0.0 ni, 97.3 id, 0.0 wa, 0.0 hi, 0.8 si, 1.9 st
> > > %Cpu1 : 0.0 us, 27.0 sy, 0.0 ni, 63.1 id, 0.0 wa, 0.0 hi, 9.5 si, 0.4 st
> > > %Cpu2 : 0.0 us, 90.0 sy, 0.0 ni,  8.3 id, 0.0 wa, 0.0 hi, 1.7 si, 0.0 st
> > > %Cpu3 : 0.4 us,  1.4 sy, 0.0 ni, 95.4 id, 0.0 wa, 0.0 hi, 1.4 si, 1.4 st
> > > %Cpu4 : 0.0 us, 60.2 sy, 0.0 ni, 39.5 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
> > > %Cpu5 : 0.0 us,  2.8 sy, 0.0 ni, 89.4 id, 0.0 wa, 0.0 hi, 6.9 si, 0.9 st
> > > KiB Mem:  4517144 total, 3116480 used, 1400664 free, 876 buffers
> > > KiB Swap: 2103292 total, 0 used, 2103292 free. 2374656 cached Mem
> > >
> > >   PID USER  PR  NI  VIRT  RES  SHR S  %CPU  %MEM    TIME+ COMMAND
> > >  7440 root  20   0     0    0    0 R 71.10 0.000  8:15.38 vif4.0-q3-guest
> > >  7434 root  20   0     0    0    0 R 59.14 0.000  9:00.58 vif4.0-q0-guest
> > >    18 root  20   0     0    0    0 R 33.89 0.000  2:35.06 ksoftirqd/2
> > >    28 root  20   0     0    0    0 S 20.93 0.000  3:01.81 ksoftirqd/4
> > >
> > >
> > > As shown above, only two netback related processes (vif4.0-*) are
> > > running with high cpu usage, and the other 2 netback processes are
> > > idle. The "ps" result of the vif4.0-* processes is as follows:
> > >
> > > root  7434 50.5  0.0    0    0 ?     R  09:23 11:29 [vif4.0-q0-guest]
> > > root  7435  0.0  0.0    0    0 ?     S  09:23  0:00 [vif4.0-q0-deall]
> > > root  7436  0.0  0.0    0    0 ?     S  09:23  0:00 [vif4.0-q1-guest]
> > > root  7437  0.0  0.0    0    0 ?     S  09:23  0:00 [vif4.0-q1-deall]
> > > root  7438  0.0  0.0    0    0 ?     S  09:23  0:00 [vif4.0-q2-guest]
> > > root  7439  0.0  0.0    0    0 ?     S  09:23  0:00 [vif4.0-q2-deall]
> > > root  7440 48.1  0.0    0    0 ?     R  09:23 10:55 [vif4.0-q3-guest]
> > > root  7441  0.0  0.0    0    0 ?     S  09:23  0:00 [vif4.0-q3-deall]
> > > root  9724  0.0  0.0 9244 1520 pts/0 S+ 09:46  0:00 grep --color=auto
> > >
> > >
> > > 2. The "rx" related content in /proc/interrupts in the receiver DomU
> > > (on Host B):
> > >
> > > 73:       2       0 2925405       0 xen-dyn-event eth0-q0-rx
> > > 75:      43      93       0     118 xen-dyn-event eth0-q1-rx
> > > 77:       2    3376      14    1983 xen-dyn-event eth0-q2-rx
> > > 79: 2414666       0       9       0 xen-dyn-event eth0-q3-rx
> > >
> > > As shown above, it seems that only q0 and q3 handle the interrupts
> > > triggered by packet receiving.
> > >
> > > Any advice? Thanks.
> >
> > Netback selects queue based on the return value of
> > skb_get_queue_mapping. The queue mapping is set by the core driver or
> > by ndo_select_queue (if specified by the individual driver).
> > In this case netback doesn't have its own implementation of
> > ndo_select_queue, so it's up to the core driver to decide which queue
> > to dispatch the packet to. I think you need to inspect why Dom0 only
> > steers traffic to these two queues but not all of them.
> >
> > Don't know which utility is handy for this job. Probably tc(8) is
> > useful?
>
> Thanks Wei.
>
> I think the reason that only two netback/netfront processes work hard
> in the above results is the queue selection method. I have tried to
> send packets from multiple hosts/VMs to a VM, and all of the
> netback/netfront processes were running with high cpu usage a few
> times.
>

A few times? You might want to check some patches to rework RX stall
detection by David Vrabel that went in after 3.16.

> However, I find another issue. Even using 6 queues and making sure
> that all of these 6 netback processes are running with high cpu usage
> (indeed, any of them running with 87% cpu usage), the whole VM receive
> throughput is not much higher than the results when using 4 queues.
> The results are from 4.5Gbps to 5.04Gbps using TCP with 512 bytes
> length and 4.3Gbps to 5.78Gbps using TCP with 1460 bytes length.
>

I would like to ask if you're still using the 4U4G (4 CPU 4 G?)
configuration? If so, please make sure there are at least the same
number of vcpus as queues.

> According to the testing result from the wiki:
> http://wiki.xen.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing,
> the VM receive throughput is also much lower than VM transmit.
>

I think that's expected, because the guest RX data path still uses
grant_copy while guest TX uses grant_map to do zero-copy transmit.

> I am wondering why the VM receive throughput cannot be up to 8-10Gbps
> as VM transmit under multi-queue? I also tried to send packets
> directly from local Dom0 to DomU, and the DomU receive throughput can
> reach about 8-12Gbps, so I am also wondering why transmitting packets
> from Dom0 to a remote DomU can only reach about 4-5Gbps throughput?

If data is from Dom0 to DomU then the SKB is probably not fragmented by
the network stack. You can use tcpdump to check that.

Wei.

>
> > Wei.
> >
> > > ----------
> > > zhangleiqiang (Trump)
> > >
> > > Best Regards
> > >
> > >
> > > > -----Original Message-----
> > > > From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> > > > Sent: Tuesday, December 02, 2014 8:12 PM
> > > > To: Zhangleiqiang (Trump)
> > > > Cc: Wei Liu; zhangleiqiang; xen-devel@xxxxxxxxxxxxx; Luohao (brian);
> > > > Xiaoding (B); Yuzhou (C); Zhuangyuxin
> > > > Subject: Re: [Xen-devel] Poor network performance between DomU with
> > > > multiqueue support
> > > >
> > > > On Tue, Dec 02, 2014 at 11:50:59AM +0000, Zhangleiqiang (Trump) wrote:
> > > > > > -----Original Message-----
> > > > > > From: xen-devel-bounces@xxxxxxxxxxxxx
> > > > > > [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Wei Liu
> > > > > > Sent: Tuesday, December 02, 2014 7:02 PM
> > > > > > To: zhangleiqiang
> > > > > > Cc: wei.liu2@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxx
> > > > > > Subject: Re: [Xen-devel] Poor network performance between DomU
> > > > > > with multiqueue support
> > > > > >
> > > > > > On Tue, Dec 02, 2014 at 04:30:49PM +0800, zhangleiqiang wrote:
> > > > > > > Hi, all
> > > > > > > I am testing the performance of the xen netfront-netback
> > > > > > > driver with multi-queue support. The throughput from domU
> > > > > > > to remote dom0 is 9.2Gb/s, but the throughput from domU to
> > > > > > > remote domU is only 3.6Gb/s, so I think the bottleneck is
> > > > > > > the throughput from dom0 to local domU. However, we have
> > > > > > > done some testing and found the throughput from dom0 to
> > > > > > > local domU is 5.8Gb/s.
> > > > > > > And if we send packets from one DomU to 3 other DomUs on
> > > > > > > different hosts simultaneously, the sum of throughput can
> > > > > > > reach 9Gbps. It seems like the bottleneck is the receiver?
> > > > > > > After some analysis, I found that even though the max_queue
> > > > > > > of netfront/back is set to 4, there are some strange
> > > > > > > results as follows:
> > > > > > > 1. In domU, only one rx queue deals with softirq
> > > > > >
> > > > > > Try to bind irq to different vcpus?
> > > > >
> > > > > Do you mean we try to bind irq to different vcpus in DomU? I will
> > > > > try it now.
> > > > >
> > > >
> > > > Yes. Given the fact that you have two backend threads running while
> > > > only one DomU vcpu is busy, it smells like misconfiguration in DomU.
> > > >
> > > > If this phenomenon persists after correctly binding irqs, you might
> > > > want to check that traffic is steered correctly to different queues.
> > > >
> > > > >
> > > > > > > 2. In dom0, only two netback queue processes are scheduled;
> > > > > > > the other two processes aren't scheduled.
> > > > > >
> > > > > > How many Dom0 vcpus do you have? If it only has two then there
> > > > > > will only be two processes running at a time.
> > > > >
> > > > > Dom0 has 6 vcpus, and 6G memory. There is only one DomU running
> > > > > on Dom0 and so four netback processes are running in Dom0
> > > > > (because the max_queue param of the netback kernel module is set
> > > > > to 4).
> > > > > The phenomenon is that only 2 of these four netback processes
> > > > > were running with about 70% cpu usage, and the other two use
> > > > > little CPU.
> > > > > Is there a hash algorithm to determine which netback process
> > > > > handles the input packet?
> > > > >
> > > >
> > > > I think that's whatever default algorithm Linux kernel is using.
> > > >
> > > > We don't currently support other algorithms.
> > > >
> > > > Wei.
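
The queue-selection point in this thread (netback takes the queue from
skb_get_queue_mapping, and the core driver picks it from a per-flow hash)
can be illustrated with a rough sketch. This is not the kernel's
algorithm: SHA-256 and the modulo step are stand-ins, and the addresses,
ports and helper names (pick_queue, queue_load) are hypothetical. It only
shows that every packet of a flow lands on the same queue, so three
senders with one stream each can keep at most three queues busy, and
fewer when two flows collide.

```python
import hashlib
from collections import Counter


def pick_queue(flow, num_queues):
    """Map a flow tuple to a queue index.

    SHA-256 is only a stand-in here; the kernel uses its own flow hash
    and its own way of scaling the hash to the queue count.
    """
    key = "|".join(str(field) for field in flow).encode()
    flow_hash = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return flow_hash % num_queues


def queue_load(flows, num_queues):
    """Return a list with the number of flows landing on each queue."""
    counts = Counter(pick_queue(flow, num_queues) for flow in flows)
    return [counts[q] for q in range(num_queues)]


# Hypothetical flows: three senders, one TCP stream each, one receiver.
# Fields: (source address, source port, destination address, destination port).
flows = [
    ("192.0.2.11", 40001, "192.0.2.100", 5001),
    ("192.0.2.12", 40002, "192.0.2.100", 5001),
    ("192.0.2.13", 40003, "192.0.2.100", 5001),
]

for num_queues in (4, 6):
    load = queue_load(flows, num_queues)
    busy = sum(1 for n in load if n > 0)
    print(f"{num_queues} queues: load={load}, busy={busy}")
```

Under these assumptions, adding queues without adding flows cannot raise
the number of busy queues above the number of flows, which is one
possible reading of the q0/q3-only interrupt counts quoted above. Testing
with more parallel streams per sender would be a way to check it.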
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
