Re: [Xen-devel] Poor network performance between DomU with multiqueue support
On 04/12/14 14:31, Zhangleiqiang (Trump) wrote:
> -----Original Message-----
> From: Zoltan Kiss [mailto:zoltan.kiss@xxxxxxxxxx]
> Sent: Thursday, December 04, 2014 9:35 PM
> To: Zhangleiqiang (Trump); Wei Liu; xen-devel@xxxxxxxxxxxxx
> Cc: Xiaoding (B); Zhuangyuxin; zhangleiqiang; Luohao (brian); Yuzhou (C)
> Subject: Re: [Xen-devel] Poor network performance between DomU with multiqueue support
>
>> On 04/12/14 12:09, Zhangleiqiang (Trump) wrote:
>>>> I think that's expected, because the guest RX data path still uses
>>>> grant_copy, while guest TX uses grant_map to do zero-copy transmit.
>>>
>>> As I understand it, the RX process is as follows:
>>> 1. The physical NIC receives the packet.
>>> 2. The Xen hypervisor triggers an interrupt to Dom0.
>>> 3. Dom0's NIC driver does the "RX" operation, and the packet is stored
>>>    into an SKB which is also owned/shared with netback.
>>
>> Not that easy. There is something between the NIC driver and netback which
>> directs the packets, e.g. the old bridge driver, ovs, or the IP stack of
>> the kernel.
>>
>>> 4. Netback notifies netfront through the event channel that a packet is
>>>    arriving.
>>> 5. Netfront grants a buffer for receiving and notifies netback of the
>>>    grant reference (if using the grant-reuse mechanism, netfront just
>>>    notifies netback of the GR) through the I/O ring.
>>
>> It looks a bit confusing in the code, but netfront puts "requests" on the
>> ring buffer, which contain the grant ref of the guest page where the
>> backend can copy. When the packet comes, netback consumes these requests
>> and sends back a response telling the guest that the grant copy of the
>> packet has finished and it can start handling the data. (Sending a
>> response means placing a response in the ring and triggering the event
>> channel.) Ideally netback should always have requests in the ring, so it
>> doesn't have to wait for the guest to fill it up.
>>
>>> 6. Netback does the grant_copy to copy the packet from its SKB to the
>>>    buffer referenced by the GR, and notifies netfront through the event
>>>    channel.
>>> 7. Netfront copies the data from the buffer to the user-level app's SKB.
>>
>> Or wherever that SKB should go, yes. Like with any received packet on a
>> real network interface.
>>
>>> Am I right? Why not use zero-copy transmit in the guest RX data path too?
>>
>> Because that means you are mapping that memory into the guest, and you
>> won't have any guarantee when the guest will release it. And netback can't
>> just unmap it forcibly after a timeout, because finding a correct timeout
>> value would be quite impossible. A malicious/buggy/overloaded guest can
>> hold on to Dom0 memory indefinitely, and it becomes even worse if the
>> memory came from another guest: you can't shut down that guest, for
>> example, until all its memory is returned to it.
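To make the quoted explanation concrete, here is a rough, self-contained
sketch of the per-packet RX flow on the backend side. It is purely
illustrative: the structures, ring layout and function names below are
placeholders made up for the example, not the actual xen-netback code or the
shared-ring macros, and the hypervisor's grant copy is simulated with a plain
memcpy.

/* Simplified, self-contained sketch of the guest RX path in a
 * netback-style backend, as described in the thread above.  Purely
 * illustrative: real netback uses the shared-ring macros from
 * xen/interface/io/ring.h and batches GNTTABOP_copy hypercalls,
 * none of which appear here.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RING_SIZE 256
#define PAGE_SIZE 4096

typedef uint32_t grant_ref_t;

struct rx_request {                /* posted by netfront: "copy into this page" */
    uint16_t    id;
    grant_ref_t gref;              /* grant reference of a guest page */
};

struct rx_response {               /* posted by netback: "copy finished" */
    uint16_t id;
    int16_t  status;               /* bytes copied, or negative error */
};

struct rx_ring {                   /* stands in for the shared I/O ring */
    struct rx_request  req[RING_SIZE];
    struct rx_response rsp[RING_SIZE];
    uint32_t req_prod, req_cons, rsp_prod;
};

/* Fake "guest memory": grant ref N maps to guest_pages[N].  In reality the
 * copy is done by the hypervisor (a grant-copy operation), not by memcpy. */
static uint8_t guest_pages[RING_SIZE][PAGE_SIZE];

static int grant_copy_to_guest(grant_ref_t gref, const void *src, size_t len)
{
    if (gref >= RING_SIZE || len > PAGE_SIZE)
        return -1;
    memcpy(guest_pages[gref], src, len);   /* the per-byte cost lives here */
    return 0;
}

static void notify_frontend(void)          /* stands in for the event channel */
{
    printf("event channel: kick netfront\n");
}

/* Deliver one packet (already sitting in a Dom0 SKB) to the guest. */
static int backend_rx_one(struct rx_ring *ring, const void *pkt, size_t len)
{
    if (ring->req_cons == ring->req_prod)
        return 0;                          /* no granted buffer posted yet: wait */

    /* 1. Consume a request the frontend posted in advance. */
    struct rx_request *req = &ring->req[ring->req_cons++ % RING_SIZE];

    /* 2. Grant-copy the packet into the guest page named by req->gref. */
    int rc = grant_copy_to_guest(req->gref, pkt, len);

    /* 3. Place a response in the ring and kick the event channel so
     *    netfront can hand the data to its network stack. */
    struct rx_response *rsp = &ring->rsp[ring->rsp_prod++ % RING_SIZE];
    rsp->id     = req->id;
    rsp->status = rc ? -1 : (int16_t)len;
    notify_frontend();
    return 1;
}

int main(void)
{
    struct rx_ring ring = { .req = {{ .id = 0, .gref = 0 }}, .req_prod = 1 };
    const char pkt[] = "example frame";

    if (backend_rx_one(&ring, pkt, sizeof(pkt)))
        printf("copied %d bytes into guest page %u\n",
               ring.rsp[0].status, (unsigned)ring.req[0].gref);
    return 0;
}

The point to notice is step 2: every byte of every received packet goes
through that copy for a given queue, which is why a single RX queue is
ultimately limited by how fast one Dom0 CPU can copy.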
> Thanks for your detailed explanation of the RX data path, I've got it now :)
> About the issue of poor performance from DomU to DomU, but high throughput
> from Dom0 to a remote Dom0/DomU, mentioned in my previous mail: do you have
> any idea about it? I am wondering whether netfront/netback can be optimized
> to reach 10 Gbps throughput between DomUs running on different hosts
> connected by a 10GE network. Currently it seems that TX is not the
> bottleneck, because we can reach an aggregate throughput of 9 Gbps when
> sending packets from one DomU to three other DomUs running on a different
> host. So I think the bottleneck may be RX; do you agree? I am wondering
> what the main reason is that prevents RX from reaching higher throughput.
> Compared to KVM+virtio+vhost, which can reach high throughput, RX has the
> extra grant-copy operation, and that may be one reason for it. Do you have
> any idea about that too?

It's quite certain that the grant copy is the bottleneck for single-queue RX
traffic. I don't know what the plan is to help with that; currently only a
faster CPU can help you there.
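On the multiqueue angle from the subject line: the reason extra queues help
is that each queue has its own ring, event channel and backend thread, so the
copy work for different flows can run on different Dom0 CPUs. A minimal
sketch of that idea follows; the flow structure and the FNV-1a hash are
illustrative assumptions, not the steering algorithm xen-netback actually
uses.

/* Illustrative only: how multiple queues let per-packet grant-copy work run
 * on different Dom0 CPUs.  Each queue has its own ring, event channel and
 * backend thread; the steering below is a generic flow hash.
 */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct flow {                       /* minimal 4-tuple for hashing */
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* FNV-1a over the 4-tuple keeps packets of one TCP stream on one queue
 * (preserving ordering) while different streams spread across queues. */
static unsigned pick_queue(const struct flow *f, unsigned num_queues)
{
    const uint8_t *p = (const uint8_t *)f;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof(*f); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h % num_queues;          /* queue index == backend thread/CPU */
}

int main(void)
{
    struct flow a = { 0x0a000001, 0x0a000002, 40000, 5001 };
    struct flow b = { 0x0a000001, 0x0a000002, 40001, 5001 };

    /* Two iperf-style streams between the same pair of guests can land on
     * different queues, so their grant copies run on different Dom0 vCPUs. */
    printf("stream a -> queue %u\n", pick_queue(&a, 4));
    printf("stream b -> queue %u\n", pick_queue(&b, 4));
    return 0;
}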
Regards,
Zoli

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel