
Re: [Xen-devel] Poor network performance between DomU with multiqueue support



> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> Sent: Tuesday, December 02, 2014 11:59 PM
> To: Zhangleiqiang (Trump)
> Cc: Wei Liu; zhangleiqiang; xen-devel@xxxxxxxxxxxxx; Luohao (brian); Xiaoding
> (B); Yuzhou (C); Zhuangyuxin
> Subject: Re: [Xen-devel] Poor network performance between DomU with
> multiqueue support
> 
> On Tue, Dec 02, 2014 at 02:46:36PM +0000, Zhangleiqiang (Trump) wrote:
> > Thanks for your reply, Wei.
> >
> > I did the following testing just now and found the results as follows:
> >
> > Three DomUs (4U4G) are running on Host A (6U6G) and one DomU (4U4G) is
> > running on Host B (6U6G). I send packets from the three DomUs to the
> > DomU on Host B simultaneously.
> >
> > 1. The "top" output on Host B is as follows:
> >
> > top - 09:42:11 up  1:07,  2 users,  load average: 2.46, 1.90, 1.47
> > Tasks: 173 total,   4 running, 169 sleeping,   0 stopped,   0 zombie
> > %Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.0 hi,  0.8 si,  1.9 st
> > %Cpu1  :  0.0 us, 27.0 sy,  0.0 ni, 63.1 id,  0.0 wa,  0.0 hi,  9.5 si,  0.4 st
> > %Cpu2  :  0.0 us, 90.0 sy,  0.0 ni,  8.3 id,  0.0 wa,  0.0 hi,  1.7 si,  0.0 st
> > %Cpu3  :  0.4 us,  1.4 sy,  0.0 ni, 95.4 id,  0.0 wa,  0.0 hi,  1.4 si,  1.4 st
> > %Cpu4  :  0.0 us, 60.2 sy,  0.0 ni, 39.5 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
> > %Cpu5  :  0.0 us,  2.8 sy,  0.0 ni, 89.4 id,  0.0 wa,  0.0 hi,  6.9 si,  0.9 st
> > KiB Mem:   4517144 total,  3116480 used,  1400664 free,      876 buffers
> > KiB Swap:  2103292 total,        0 used,  2103292 free.  2374656 cached Mem
> >
> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >  7440 root      20   0       0      0      0 R 71.10 0.000   8:15.38 vif4.0-q3-guest
> >  7434 root      20   0       0      0      0 R 59.14 0.000   9:00.58 vif4.0-q0-guest
> >    18 root      20   0       0      0      0 R 33.89 0.000   2:35.06 ksoftirqd/2
> >    28 root      20   0       0      0      0 S 20.93 0.000   3:01.81 ksoftirqd/4
> >
> >
> > As shown above, only two netback-related processes (vif4.0-*) are running
> > with high cpu usage, and the other two netback processes are idle. The "ps"
> > output for the vif4.0-* processes is as follows:
> >
> > root      7434 50.5  0.0      0     0 ?        R    09:23  11:29 [vif4.0-q0-guest]
> > root      7435  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q0-deall]
> > root      7436  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q1-guest]
> > root      7437  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q1-deall]
> > root      7438  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q2-guest]
> > root      7439  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q2-deall]
> > root      7440 48.1  0.0      0     0 ?        R    09:23  10:55 [vif4.0-q3-guest]
> > root      7441  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q3-deall]
> > root      9724  0.0  0.0   9244  1520 pts/0    S+   09:46   0:00 grep --color=auto
> >
> >
> > 2. The "rx"-related content of /proc/interrupts in the receiver DomU (on Host B):
> >
> > 73:        2        0  2925405        0  xen-dyn-event  eth0-q0-rx
> > 75:       43       93        0      118  xen-dyn-event  eth0-q1-rx
> > 77:        2     3376       14     1983  xen-dyn-event  eth0-q2-rx
> > 79:  2414666        0        9        0  xen-dyn-event  eth0-q3-rx
> >
> > As shown above, it seems that only q0 and q3 handle the interrupts
> > triggered by packet receiving.
> >
> > Any advice? Thanks.
> 
> Netback selects the queue based on the return value of skb_get_queue_mapping.
> The queue mapping is set by the core driver or by ndo_select_queue (if provided
> by an individual driver). In this case netback doesn't have its own
> implementation of ndo_select_queue, so it's up to the core driver to decide
> which queue to dispatch the packet to.  I think you need to inspect why Dom0
> only steers traffic to these two queues and not all of them.
> 
> I don't know which utility is handy for this job. Probably tc(8) is useful?

Thanks, Wei.

I think the reason for the above results, i.e. that only two netback/netfront 
processes work hard, is the queue selection method. I have tried sending packets 
from multiple hosts/VMs to one VM, and a few times all of the netback/netfront 
processes were running with high CPU usage.
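
For reference, this is a minimal sketch of how the per-queue steering can be 
inspected on the Dom0 side (assuming the backend device is vif4.0 as in the 
output above, and that the usual sysfs and tc tooling is available in Dom0):

    # How many queues does the backend vif expose?
    ls /sys/class/net/vif4.0/queues/
    # Per-queue packet counters (the mq root qdisc keeps one child qdisc per tx queue)
    tc -s qdisc show dev vif4.0
    # Which Dom0 CPUs may transmit on each queue (XPS mask; empty means any CPU)
    grep . /sys/class/net/vif4.0/queues/tx-*/xps_cpus

If almost all packets show up under only two of the child qdiscs, then the core 
driver's queue hash is collapsing the flows onto those two queues, as Wei 
suggested.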

However, I have found another issue. Even when using 6 queues and making sure 
that all 6 netback processes run with high CPU usage (indeed, each of them 
running at about 87% CPU), the overall VM receive throughput is not much higher 
than the results with 4 queues: it only rises from 4.5 Gbps to 5.04 Gbps using 
TCP with 512-byte messages, and from 4.3 Gbps to 5.78 Gbps using TCP with 
1460-byte messages.
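
To see where the receive-side cycles actually go when all queues are busy, a 
quick profile in Dom0 might help (a sketch, assuming perf is installed in Dom0; 
PID 7440 is just the example PID from the ps output above, substitute one of the 
currently busy vif threads):

    # System-wide view while the receive test is running
    perf top -g
    # Or record one busy netback thread for 10 seconds and inspect the result
    perf record -g -p 7440 -- sleep 10
    perf report

If most samples land in grant-copy or memcpy-style symbols, the per-packet copy 
into the guest is probably the ceiling rather than the number of queues.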

According to the test results on the wiki 
(http://wiki.xen.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing),
the VM receive throughput is also much lower than the VM transmit throughput.

I am wondering why the VM receive throughput cannot reach 8-10 Gbps like the VM 
transmit throughput under multi-queue. I also tried sending packets directly 
from the local Dom0 to the DomU, and the DomU receive throughput can reach about 
8-12 Gbps, so I am also wondering why transmitting packets from Dom0 to a remote 
DomU can only reach about 4-5 Gbps of throughput.
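
For completeness, pinning each rx queue's interrupt in the receiver DomU to its 
own vcpu can be done roughly like this (a sketch; the IRQ numbers 73/75/77/79 
are taken from the /proc/interrupts excerpt above and will differ on other 
guests, irqbalance should be stopped first or it may rewrite the masks, and it 
assumes the guest kernel allows re-binding these xen-dyn-event interrupts):

    echo 1 > /proc/irq/73/smp_affinity   # eth0-q0-rx -> cpu0
    echo 2 > /proc/irq/75/smp_affinity   # eth0-q1-rx -> cpu1
    echo 4 > /proc/irq/77/smp_affinity   # eth0-q2-rx -> cpu2
    echo 8 > /proc/irq/79/smp_affinity   # eth0-q3-rx -> cpu3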

> Wei.
> 
> > ----------
> > zhangleiqiang (Trump)
> >
> > Best Regards
> >
> >
> > > -----Original Message-----
> > > From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> > > Sent: Tuesday, December 02, 2014 8:12 PM
> > > To: Zhangleiqiang (Trump)
> > > Cc: Wei Liu; zhangleiqiang; xen-devel@xxxxxxxxxxxxx; Luohao (brian);
> > > Xiaoding (B); Yuzhou (C); Zhuangyuxin
> > > Subject: Re: [Xen-devel] Poor network performance between DomU with
> > > multiqueue support
> > >
> > > On Tue, Dec 02, 2014 at 11:50:59AM +0000, Zhangleiqiang (Trump) wrote:
> > > > > -----Original Message-----
> > > > > From: xen-devel-bounces@xxxxxxxxxxxxx
> > > > > [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Wei Liu
> > > > > Sent: Tuesday, December 02, 2014 7:02 PM
> > > > > To: zhangleiqiang
> > > > > Cc: wei.liu2@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxx
> > > > > Subject: Re: [Xen-devel] Poor network performance between DomU
> > > > > with multiqueue support
> > > > >
> > > > > On Tue, Dec 02, 2014 at 04:30:49PM +0800, zhangleiqiang wrote:
> > > > > > Hi, all
> > > > > >     I am testing the performance of the xen netfront-netback driver
> > > > > > with multi-queue support. The throughput from DomU to a remote Dom0
> > > > > > is 9.2 Gb/s, but the throughput from DomU to a remote DomU is only
> > > > > > 3.6 Gb/s, so I think the bottleneck is the path from Dom0 to the
> > > > > > local DomU. However, we have done some testing and found that the
> > > > > > throughput from Dom0 to the local DomU is 5.8 Gb/s.
> > > > > >     And if we send packets from one DomU to 3 other DomUs on
> > > > > > different hosts simultaneously, the sum of the throughput can reach
> > > > > > 9 Gbps. It seems like the bottleneck is the receiver?
> > > > > >     After some analysis, I found that even when max_queue of
> > > > > > netfront/netback is set to 4, there are some strange results, as follows:
> > > > > >     1. In the DomU, only one rx queue deals with softirqs
> > > > >
> > > > > Try to bind irq to different vcpus?
> > > >
> > > > Do you mean we should try to bind the irqs to different vcpus in the
> > > > DomU? I will try it now.
> > > >
> > >
> > > Yes. Given the fact that you have two backend threads running while
> > > only one DomU vcpu is busy, it smells like a misconfiguration in the DomU.
> > >
> > > If this phenomenon persists after correctly binding the irqs, you might
> > > want to check that traffic is being steered correctly to different queues.
> > >
> > > > >
> > > > > >     2. In dom0, only two netback queue processes are scheduled; the
> > > > > > other two processes aren't scheduled.
> > > > >
> > > > > How many Dom0 vcpus do you have? If it only has two, then there
> > > > > will only be two processes running at a time.
> > > >
> > > > Dom0 has 6 vcpus and 6G of memory. There is only one DomU running in
> > > > Dom0, and so four netback processes are running in Dom0 (because the
> > > > max_queue param of the netback kernel module is set to 4).
> > > > The phenomenon is that only 2 of these four netback processes were
> > > > running with about 70% cpu usage, while the other two used little CPU.
> > > > Is there a hash algorithm to determine which netback process handles
> > > > an incoming packet?
> > > >
> > >
> > > I think that's whatever default algorithm the Linux kernel is using.
> > >
> > > We don't currently support other algorithms.
> > >
> > > Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

