
Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used



On Tue, Dec 18, 2012 at 10:02:48AM +0000, Ian Campbell wrote:
> On Mon, 2012-12-17 at 20:09 +0000, Matt Wilson wrote:
> > On Mon, Dec 17, 2012 at 11:26:38AM +0000, Ian Campbell wrote:
[...]
> > > Do you mean the ring or the actual buffers?
> > 
> > Sorry, the actual buffers.
> > 
> > > The current code tries to coalesce multiple small frags/heads because it
> > > is usually trivial but doesn't try too hard with multiple larger frags,
> > > since they take up most of a page by themselves anyway. I suppose this
> > > does waste a bit of buffer space and therefore could take more ring
> > > slots, but it's not clear to me how much this matters in practice (it
> > > might be tricky to measure this with any realistic workload).
> > 
> > In the case where we're consistently handling large heads (like when
> > using an MTU value of 9000 for streaming traffic), we're wasting 1/3 of
> > the available buffers.
> 
> Sorry if I missed this earlier in the thread, but how do we end up
> wasting so much?

I see SKBs with:
  skb_headlen(skb) == 8157
  offset_in_page(skb->data) == 64

when handling long streaming ingress flows from ixgbe with MTU (on the
NIC and both sides of the VIF) set to 9000. When all the SKBs making
up the flow have the above property, xen-netback uses three pages instead
of two. The first buffer gets 4032 bytes copied into it. The next
buffer gets 4096 bytes copied into it. The final buffer gets 29 bytes
copied into it. See this post in the archives for a more detailed
walk through netbk_gop_frag_copy():
  http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html
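
To make the arithmetic concrete, here's a small userspace sketch of how I
read the current coalescing logic (my approximation of netbk_gop_skb()'s
per-source-page walk plus start_new_rx_buffer(); the constants and the
predicate are paraphrased from my reading of the driver, so treat it as
illustrative rather than the code itself). For headlen 8157 at page
offset 64 it reproduces the 4032/4096/29 split:

/*
 * Standalone model, not netback itself: the linear area is walked one
 * source page at a time, and start_new_rx_buffer() decides whether the
 * next chunk opens a fresh buffer rather than splitting across the
 * remainder of the current one.
 */
#include <stdio.h>
#include <stdbool.h>

#define PAGE_SIZE         4096u
#define MAX_BUFFER_OFFSET PAGE_SIZE

static bool start_new_rx_buffer(unsigned copy_off, unsigned size, bool head)
{
        if (copy_off == MAX_BUFFER_OFFSET)
                return true;
        /* open a fresh buffer rather than split a chunk that would fit
         * whole in the next one (never for the head chunk) */
        if (copy_off + size > MAX_BUFFER_OFFSET &&
            size <= MAX_BUFFER_OFFSET && copy_off && !head)
                return true;
        return false;
}

int main(void)
{
        unsigned headlen = 8157, src_off = 64;  /* the example skb */
        unsigned copy_off = 0, buffers = 1;
        bool head = true;

        while (headlen) {
                /* a single copy never crosses a source page boundary */
                unsigned chunk = PAGE_SIZE - src_off;
                if (chunk > headlen)
                        chunk = headlen;

                if (start_new_rx_buffer(copy_off, chunk, head)) {
                        printf("buffer %u: %u bytes\n", buffers++, copy_off);
                        copy_off = 0;
                }
                /* nor does it overflow the destination buffer */
                if (chunk > MAX_BUFFER_OFFSET - copy_off)
                        chunk = MAX_BUFFER_OFFSET - copy_off;

                copy_off += chunk;
                headlen -= chunk;
                src_off = (src_off + chunk) % PAGE_SIZE;
                head = false;
        }
        printf("buffer %u: %u bytes\n", buffers, copy_off);
        return 0;
}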

> For an skb with 9000 bytes in the linear area, which must necessarily be
> contiguous, do we not fill the first two page-sized buffers completely?
> The remaining 808 bytes must then go into their own buffer. Hrm, I suppose
> that's about 27% wasted over the three pages. If we are doing something
> worse than that though then we have a bug somewhere (nb: older netbacks
> would only fill the first 2048 bytes of each buffer, the wastage is
> presumably phenomenal in that case ;-), MAX_BUFFER_OFFSET is now ==
> PAGE_SIZE though)

Sorry, I should have said 8157 bytes for my example. :-)

> Unless I've misunderstood this thread and we are considering packing
> data from multiple skbs into a single buffer? (i.e. the remaining
> 4096-808=3288 bytes in the third buffer above would contain data for the
> next skb). Does the ring protocol even allow for that possibility? It
> seems like a path to complexity to me.

No, I'm not suggesting that we come up with an extension to pack the
next skb into any remaining space left over from the current one. I
agree that would make for a lot of complexity managing the ratio of
meta slots to buffers, etc.

> > > The cost of splitting a copy into two should be low though, the copies
> > > are already batched into a single hypercall and I'd expect things to be
> > > mostly dominated by the data copy itself rather than the setup of each
> > > individual op, which would argue for splitting a copy in two if that
> > > helps fill the buffers.
> > 
> > That was my thought as well. We're testing a patch that does just this
> > now.
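
The rough shape of what we're testing (again just a standalone model,
not the actual patch): size each copy to fill whatever is left in the
current destination buffer, splitting a chunk in two where it straddles
the buffer boundary, and only open a new buffer once the current one is
full. For the 8157-byte example above that packs the head into two
buffers at the cost of one extra copy op:

#include <stdio.h>

#define PAGE_SIZE         4096u
#define MAX_BUFFER_OFFSET PAGE_SIZE

int main(void)
{
        unsigned headlen = 8157, src_off = 64;  /* same example skb */
        unsigned copy_off = 0, buffers = 1, ops = 0;

        while (headlen) {
                /* cap the copy at the source page boundary ... */
                unsigned chunk = PAGE_SIZE - src_off;
                if (chunk > headlen)
                        chunk = headlen;
                /* ... and at the space left in the destination buffer,
                 * i.e. split the chunk instead of skipping the space */
                if (chunk > MAX_BUFFER_OFFSET - copy_off)
                        chunk = MAX_BUFFER_OFFSET - copy_off;

                ops++;          /* one more grant copy op */
                copy_off += chunk;
                headlen -= chunk;
                src_off = (src_off + chunk) % PAGE_SIZE;

                if (copy_off == MAX_BUFFER_OFFSET && headlen) {
                        printf("buffer %u: %u bytes\n", buffers++, copy_off);
                        copy_off = 0;
                }
        }
        printf("buffer %u: %u bytes\n", buffers, copy_off);
        printf("%u buffers, %u copy ops\n", buffers, ops);
        return 0;
}

That prints 4096 + 4061 bytes in 2 buffers using 4 copy ops, versus
4032 + 4096 + 29 bytes in 3 buffers using 3 copy ops today.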
> > 
> > > The flip side is that once you get past the headers etc the paged frags
> > > likely tend to either bits and bobs (fine) or mostly whole pages. In the
> > > whole pages case trying to fill the buffers will result in every copy
> > > getting split. My gut tells me that the whole pages case probably
> > > dominates, but I'm not sure what the real world impact of splitting all
> > > the copies would be.
> > 
> > Right, I'm less concerned about the paged frags. It might make sense
> > to skip some space so that the copying can be page aligned. I suppose
> > it depends on how many different pages are in the list, and what the
> > total size is.
> > 
> > In practice I'd think it would be rare to see a paged SKB for ingress
> > traffic to domUs unless there is significant intra-host communication
> > (dom0->domU, domU->domU). When domU ingress traffic is originating
> > from an Ethernet device it shouldn't be paged. Paged SKBs would come
> > into play when a SKB is formed for transmit on an egress device that
> > is SG-capable. Or am I misunderstanding how paged SKBs are used these
> > days?
> 
> I think it depends on the hardware and/or driver. IIRC some devices push
> down frag zero into the device for RX DMA and then share it with the
> linear area (I think this might have something to do with making LRO or
> GRO easier/workable).
> 
> Also things such as GRO can commonly cause the received skbs passed up
> the stack to contain several frags.
> 
> I'm not quite sure how this works but in the case of s/w GRO I wouldn't
> be surprised if this resulted in a skb with lots of 1500 byte (i.e. wire
> MTU) frags. I think we would end up at least coalescing those, two per
> buffer, on transmit (3000/4096 = ~73% of the page filled).
> 
> Doing better would either need start_new_rx_buffer to always completely
> fill each buffer or to take a much more global view of the frags (e.g.
> taking the size of the next frag and how it fits into consideration
> too).

What's the downside to making start_new_rx_buffer() always try to
fill each buffer?
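
For concreteness, the predicate I have in mind would be roughly the
following (a sketch of the idea, not a patch): only open a new buffer
once the current one is completely full, and let the copy loop split
chunks at the buffer boundary as in the model above.

#include <stdio.h>
#include <stdbool.h>

#define MAX_BUFFER_OFFSET 4096u

/* sketch: never skip the tail of a buffer just to avoid splitting a chunk */
static bool start_new_rx_buffer(unsigned copy_off, unsigned size, bool head)
{
        (void)size;     /* no longer considered */
        (void)head;
        return copy_off == MAX_BUFFER_OFFSET;
}

int main(void)
{
        /* sample decisions for the example offsets above: a new buffer
         * is started only once the current one is exactly full */
        printf("%d %d %d\n",
               start_new_rx_buffer(0, 4032, true),
               start_new_rx_buffer(4032, 4096, false),
               start_new_rx_buffer(4096, 29, false));
        return 0;
}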

Matt



 

