Re: [Xen-devel] Poor network performance between DomU with multiqueue support



On Tue, Dec 02, 2014 at 02:46:36PM +0000, Zhangleiqiang (Trump) wrote:
> Thanks for your reply, Wei.
> 
> I do the following testing just now and found the results as follows:
> 
> There are three DomUs (4U4G) running on Host A (6U6G) and one DomU (4U4G)
> running on Host B (6U6G). I send packets from the three DomUs to the DomU
> on Host B simultaneously.
> 
> 1. The "top" output of Host B as follows:
> 
> top - 09:42:11 up  1:07,  2 users,  load average: 2.46, 1.90, 1.47
> Tasks: 173 total,   4 running, 169 sleeping,   0 stopped,   0 zombie
> %Cpu0  :  0.0 us,  0.0 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.0 hi,  0.8 si,  1.9 st
> %Cpu1  :  0.0 us, 27.0 sy,  0.0 ni, 63.1 id,  0.0 wa,  0.0 hi,  9.5 si,  0.4 st
> %Cpu2  :  0.0 us, 90.0 sy,  0.0 ni,  8.3 id,  0.0 wa,  0.0 hi,  1.7 si,  0.0 st
> %Cpu3  :  0.4 us,  1.4 sy,  0.0 ni, 95.4 id,  0.0 wa,  0.0 hi,  1.4 si,  1.4 st
> %Cpu4  :  0.0 us, 60.2 sy,  0.0 ni, 39.5 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
> %Cpu5  :  0.0 us,  2.8 sy,  0.0 ni, 89.4 id,  0.0 wa,  0.0 hi,  6.9 si,  0.9 st
> KiB Mem:   4517144 total,  3116480 used,  1400664 free,      876 buffers
> KiB Swap:  2103292 total,        0 used,  2103292 free.  2374656 cached Mem
> 
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>  7440 root      20   0       0      0      0 R 71.10 0.000   8:15.38 vif4.0-q3-guest
>  7434 root      20   0       0      0      0 R 59.14 0.000   9:00.58 vif4.0-q0-guest
>    18 root      20   0       0      0      0 R 33.89 0.000   2:35.06 ksoftirqd/2
>    28 root      20   0       0      0      0 S 20.93 0.000   3:01.81 ksoftirqd/4
> 
> 
> As shown above, only two netback-related processes (vif4.0-*) are running
> with high CPU usage; the other two netback processes are idle. The "ps"
> output for the vif4.0-* processes is as follows:
> 
> root      7434 50.5  0.0      0     0 ?        R    09:23  11:29 [vif4.0-q0-guest]
> root      7435  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q0-deall]
> root      7436  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q1-guest]
> root      7437  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q1-deall]
> root      7438  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q2-guest]
> root      7439  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q2-deall]
> root      7440 48.1  0.0      0     0 ?        R    09:23  10:55 [vif4.0-q3-guest]
> root      7441  0.0  0.0      0     0 ?        S    09:23   0:00 [vif4.0-q3-deall]
> root      9724  0.0  0.0   9244  1520 pts/0    S+   09:46   0:00 grep --color=auto
> 
> 
> 2. The rx-related content of /proc/interrupts in the receiver DomU (on Host B):
> 
> 73:   2               0               2925405         0                       
> xen-dyn-event           eth0-q0-rx
> 75:   43              93              0                       118             
>         xen-dyn-event           eth0-q1-rx
> 77:   2               3376    14                      1983            
> xen-dyn-event           eth0-q2-rx
> 79:   2414666 0               9                       0                       
> xen-dyn-event           eth0-q3-rx
> 
> As shown above, it seems that only q0 and q3 handle the interrupts
> triggered by packet receiving.
> 
> Any advise? Thanks.

Netback selects the queue based on the return value of
skb_get_queue_mapping. The queue mapping is set by the core network
stack, or by ndo_select_queue if the individual driver provides one. In
this case netback doesn't implement ndo_select_queue, so it's up to the
core stack to decide which queue to dispatch each packet to. I think
you need to inspect why Dom0 only steers traffic to these two queues
and not to all of them.
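As a rough starting point for that inspection, something like the
following could be run in Dom0. It is only a sketch: the device name
vif4.0 is taken from the process listing above, and whether each queue
exposes an XPS mask depends on your kernel.

```shell
#!/bin/sh
# Sketch: inspect per-queue transmit steering on a backend vif in Dom0.
# "vif4.0" comes from the ps output above; adjust for your domain/device.

show_queue_maps() {
    dev="$1"
    qdir="/sys/class/net/$dev/queues"
    if [ ! -d "$qdir" ]; then
        echo "$dev: not present, skipping"
        return 0
    fi
    for q in "$qdir"/tx-*; do
        # xps_cpus is the bitmask of CPUs allowed to transmit on this queue;
        # an all-zero mask means the kernel falls back to its default hash.
        printf '%s: xps_cpus=%s\n' "${q##*/}" "$(cat "$q/xps_cpus" 2>/dev/null)"
    done
}

show_queue_maps vif4.0
```

Comparing these masks against which vif4.0-q*-guest threads are busy
might show whether the imbalance comes from steering or from the flows
themselves hashing onto the same queues.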

I don't know which utility is handy for this job; perhaps tc(8) is
useful?
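For the earlier suggestion of binding the guest's rx interrupts to
different vcpus, a sketch like this could be run inside the DomU. The
IRQ numbers 73/75/77/79 are the eth0-q*-rx lines from the
/proc/interrupts output above; the script only prints the commands
instead of writing the masks, since the actual write needs root inside
a live guest.

```shell
#!/bin/sh
# Sketch: pin each eth0 rx-queue IRQ to its own vcpu, round-robin.
# IRQ numbers taken from the /proc/interrupts output above; adjust to match.
irqs="73 75 77 79"   # eth0-q0-rx .. eth0-q3-rx

cpu_mask() {
    # Affinity mask for a single CPU: bit N set for CPU N, printed in hex,
    # which is the format /proc/irq/<n>/smp_affinity expects.
    printf '%x' $((1 << $1))
}

cpu=0
for irq in $irqs; do
    mask=$(cpu_mask "$cpu")
    # Print the command rather than executing it; run by hand as root.
    echo "echo $mask > /proc/irq/$irq/smp_affinity"
    cpu=$((cpu + 1))
done
```

After binding, /proc/interrupts should show each eth0-q*-rx counter
growing on its own vcpu if traffic really reaches all four queues.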

Wei.

> ----------
> zhangleiqiang (Trump)
> 
> Best Regards
> 
> 
> > -----Original Message-----
> > From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> > Sent: Tuesday, December 02, 2014 8:12 PM
> > To: Zhangleiqiang (Trump)
> > Cc: Wei Liu; zhangleiqiang; xen-devel@xxxxxxxxxxxxx; Luohao (brian); 
> > Xiaoding
> > (B); Yuzhou (C); Zhuangyuxin
> > Subject: Re: [Xen-devel] Poor network performance between DomU with
> > multiqueue support
> > 
> > On Tue, Dec 02, 2014 at 11:50:59AM +0000, Zhangleiqiang (Trump) wrote:
> > > > -----Original Message-----
> > > > From: xen-devel-bounces@xxxxxxxxxxxxx
> > > > [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Wei Liu
> > > > Sent: Tuesday, December 02, 2014 7:02 PM
> > > > To: zhangleiqiang
> > > > Cc: wei.liu2@xxxxxxxxxx; xen-devel@xxxxxxxxxxxxx
> > > > Subject: Re: [Xen-devel] Poor network performance between DomU with
> > > > multiqueue support
> > > >
> > > > On Tue, Dec 02, 2014 at 04:30:49PM +0800, zhangleiqiang wrote:
> > > > > Hi, all
> > > > >     I am testing the performance of the xen netfront-netback driver
> > > > > with multi-queue support. The throughput from domU to remote dom0 is
> > > > > 9.2Gb/s, but the throughput from domU to remote domU is only 3.6Gb/s,
> > > > > so I suspected the bottleneck was the path from dom0 to the local
> > > > > domU. However, we have done some testing and found that the
> > > > > throughput from dom0 to the local domU is 5.8Gb/s.
> > > > >     And if we send packets from one DomU to 3 other DomUs on a
> > > > > different host simultaneously, the sum of the throughput can reach
> > > > > 9Gbps. It seems that the bottleneck is the receiver?
> > > > >     After some analysis, I found that even when the max_queue of
> > > > > netfront/back is set to 4, there are some strange results as follows:
> > > > >     1. In domU, only one rx queue deals with softirq
> > > >
> > > > Try to bind irq to different vcpus?
> > >
> > > Do you mean we try to bind irq to different vcpus in DomU? I will try it 
> > > now.
> > >
> > 
> > Yes. Given the fact that you have two backend threads running while only one
> > DomU vcpu is busy, it smells like misconfiguration in DomU.
> > 
> > If this phenomenon persists after correctly binding irqs, you might want to
> > check that traffic is steered correctly to the different queues.
> > 
> > > >
> > > > >     2. In dom0, only two netback queue processes are scheduled; the
> > > > > other two processes aren't scheduled.
> > > >
> > > > How many Dom0 vcpu do you have? If it only has two then there will
> > > > only be two processes running at a time.
> > >
> > > Dom0 has 6 vcpus and 6G memory. There is only one DomU running in
> > > Dom0, so four netback processes are running in Dom0 (because the
> > > max_queue param of the netback kernel module is set to 4).
> > > The phenomenon is that only two of these four netback processes were
> > > running, at about 70% cpu usage, while the other two used little CPU.
> > > Is there a hash algorithm that determines which netback process
> > > handles the input packet?
> > >
> > 
> > I think that's whatever default algorithm the Linux kernel is using.
> > 
> > We don't currently support other algorithms.
> > 
> > Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
