Re: [Xen-devel] [PATCH 05/10] net: move destructor_arg to the front of sk_buff.
On 04/11/2012 01:20 AM, Eric Dumazet wrote:
> On Tue, 2012-04-10 at 12:15 -0700, Alexander Duyck wrote:
>
>> Actually now that I think about it, my concerns go much further than the
>> memset.  I'm convinced that this is going to cause a pretty significant
>> performance regression on multiple drivers, especially on non-x86_64
>> architectures.  What we have right now on most platforms is an
>> skb_shared_info structure in which everything up to and including frag 0
>> is in one cache line.  This gives us pretty good performance for igb and
>> ixgbe, since our common case when jumbo frames are not enabled is to
>> split the head and place the data in a page.
>
> I don't understand this split thing for MTU=1500 frames.
>
> Even using half a page per fragment, each skb:
>
> needs 2 allocations for sk_buff and skb->head, plus one page alloc /
> reference.
>
> skb->truesize = ksize(skb->head) + sizeof(*skb) + PAGE_SIZE/2 = 512 +
> 256 + 2048 = 2816 bytes

The head size you quote is currently only available for 128-byte skb
allocations; anything larger than that will generate a 1K allocation.
Also, after all of these patches the smallest size you can allocate will
be 1K for anything under 504 bytes.

The size advantage is actually greater for smaller frames.  In the case
of igb, the behaviour is to place anything less than 512 bytes into just
the header and skip using the page.  As such we get a much more ideal
allocation for small packets, since the truesize is only 1280 in that
case.

In the case of ixgbe, the advantage is more of a cache-miss advantage.
Ixgbe only receives data into pages now, so I can prefetch the first two
cache lines of the page while allocating the skb to receive it.  As such
we essentially cut the number of cache misses in half versus the old
approach, which had us generating cache misses on the sk_buff during
allocation, and then generating more cache misses once we received the
buffer and filled out the sk_buff fields.
A similar size advantage exists as well, but only for frames of 256
bytes or smaller.

> With non-split you have:
>
> 2 allocations for sk_buff and skb->head.
>
> skb->truesize = ksize(skb->head) + sizeof(*skb) = 2048 + 256 = 2304
> bytes
>
> less overhead and fewer calls to the page allocator...
>
> This can only benefit if GRO is on, since aggregation can use fragments
> and a single sk_buff, instead of a frag_list

There is much more than truesize involved here.  My main argument is
that if we are going to align this modified skb_shared_info so that it
is aligned on nr_frags, we should do it on all architectures, not just
x86_64.

Thanks,

Alex

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel