
Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used



On Fri, 2012-12-14 at 18:53 +0000, Matt Wilson wrote:
> On Thu, Dec 13, 2012 at 11:12:50PM +0000, Palagummi, Siva wrote:
> > > -----Original Message-----
> > > From: Matt Wilson [mailto:msw@xxxxxxxxxx]
> > > Sent: Wednesday, December 12, 2012 3:05 AM
> > >
> > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote:
> > > >
> > > > You can clearly see below that copy_off is input to
> > > > start_new_rx_buffer while copying frags.
> > > 
> > > Yes, but that's the right thing to do. copy_off should be set to the
> > > destination offset after copying the last byte of linear data, which
> > > means "skb_headlen(skb) % PAGE_SIZE" is correct.
> > > 
> >
> > No, it is not correct, for two reasons. First, what if
> > skb_headlen(skb) is exactly a multiple of PAGE_SIZE? copy_off would
> > be set to ZERO, and if there is some data in the frags, ZERO will be
> > passed in as the copy_off value and start_new_rx_buffer will return
> > FALSE. The second reason is the obvious case in the current code
> > where an "offset_in_page(skb->data)"-sized hole will be left in the
> > first buffer after the first pass, whenever the remaining data to be
> > copied would overflow the first buffer.
> 
> Right, and I'm arguing that having the code leave a hole is less
> desirable than potentially increasing the number of copy
> operations. I'd like to hear from Ian and others whether using the
> buffers efficiently is more important than reducing copy operations.
> Intuitively, I think it's more important to use the ring efficiently.

Do you mean the ring or the actual buffers?

The current code tries to coalesce multiple small frags/heads because
doing so is usually trivial, but it doesn't try very hard with multiple
larger frags, since they take up most of a page by themselves anyway. I
suppose this does waste a bit of buffer space and therefore could take
more ring slots, but it's not clear to me how much this matters in
practice (it might be tricky to measure with any realistic workload).
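
For reference, the check in question looks roughly like this (a
paraphrase of start_new_rx_buffer() from memory rather than the exact
netback.c source; MAX_BUFFER_OFFSET is PAGE_SIZE):

    /*
     * copy_off is the destination offset within the current ring buffer,
     * size is the length of the chunk about to be copied, and head is
     * non-zero while the linear (head) area is still being copied.
     */
    static int start_new_rx_buffer(int copy_off, unsigned long size, int head)
    {
            /* Simple case: the current buffer is completely full. */
            if (copy_off == MAX_BUFFER_OFFSET)
                    return 1;

            /*
             * The chunk would overflow this buffer but would fit whole in
             * a fresh one, the buffer already holds some data and we are
             * past the head: start a new buffer (leaving a hole) rather
             * than split the copy.
             */
            if (copy_off + size > MAX_BUFFER_OFFSET &&
                size <= MAX_BUFFER_OFFSET && copy_off && !head)
                    return 1;

            return 0;
    }

With copy_off == 0 this never asks for a new buffer, which is the
corner case Siva describes when skb_headlen() is an exact multiple of
PAGE_SIZE; the second test is where the hole comes from.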

The cost of splitting a copy into two should be low, though: the copies
are already batched into a single hypercall, and I'd expect things to
be dominated mostly by the data copy itself rather than by the setup of
each individual op, which argues for splitting a copy in two if that
helps fill the buffers.
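
Concretely, the no-holes variant would be something like the sketch
below. pack_chunk() and queue_copy_op() are made-up names: the latter
stands in for appending one gnttab_copy entry to the batch that
netbk_gop_frag_copy() already builds and later flushes with a single
GNTTABOP_copy hypercall, and the chunk is assumed not to cross a
source page boundary (the caller already splits on those).

    /* Sketch only, not the real netback code. */
    static void pack_chunk(struct xenvif *vif,
                           struct netrx_pending_operations *npo,
                           struct page *page, unsigned long offset,
                           unsigned long size)
    {
            while (size > 0) {
                    unsigned long space = MAX_BUFFER_OFFSET - npo->copy_off;
                    unsigned long bytes = size;

                    if (space == 0) {
                            /* Buffer full: consume the next ring slot. */
                            get_next_rx_buffer(vif, npo);
                            npo->copy_off = 0;
                            space = MAX_BUFFER_OFFSET;
                    }

                    if (bytes > space)
                            bytes = space;  /* first half of a split copy */

                    queue_copy_op(npo, page, offset, npo->copy_off, bytes);

                    npo->copy_off += bytes;
                    offset += bytes;
                    size -= bytes;
            }
    }

Each split is therefore one extra entry in the same batch, not an
extra hypercall.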

The flip side is that once you get past the headers etc. the paged
frags likely tend to be either bits and bobs (fine) or mostly whole
pages. In the whole-pages case, trying to fill the buffers will result
in every copy getting split. My gut tells me that the whole-pages case
probably dominates, but I'm not sure what the real-world impact of
splitting all the copies would be.
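
To put a rough number on that (a user-space back-of-the-envelope, not
netback code; it assumes 4096-byte frags landing after a buffer that
already holds a few bytes of header residue):

    #include <stdio.h>

    #define PAGE_SIZE 4096UL

    int main(void)
    {
            unsigned long copy_off = 66;   /* bytes already in the current buffer */
            unsigned long nr_frags = 4;    /* four whole-page frags */

            /* Current code: fresh buffer per frag, one op each, hole left behind. */
            unsigned long ops_hole   = nr_frags;
            unsigned long slots_hole = 1 + nr_frags;

            /* Tightly packed: every frag straddles a buffer boundary, so 2 ops each. */
            unsigned long ops_packed   = 2 * nr_frags;
            unsigned long slots_packed =
                    (copy_off + nr_frags * PAGE_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;

            printf("hole:   %lu ops, %lu slots\n", ops_hole, slots_hole);
            printf("packed: %lu ops, %lu slots\n", ops_packed, slots_packed);
            return 0;
    }

That prints 4 ops/5 slots for the current behaviour versus 8 ops/5
slots when packing, i.e. for whole-page frags the packing doubles the
copy ops without saving a slot, because the leftover space in the
partially filled buffer never adds up to a whole frag.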

> 
> > If we fix it the way you are proposing, modifying
> > start_new_rx_buffer so as not to leave a hole in the first buffer,
> > then it will occupy 4 ring buffers but will require 8 copy
> > operations, per the existing logic in netbk_gop_skb, while copying
> > the head!
> 
> I think that we should optimize for more commonly seen lengths, which
> I'd think would be 2-3 pages max. The delta in copy operations is
> smaller in these cases, where we would use 4 copy operations for 2
> pages (as opposed to 3 with the current algorithm) and 6 copy
> operations for 3 pages (as opposed to 4 with the current algorithm).
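
Just to sanity-check those numbers (user-space arithmetic, not netback
code; it assumes a non-zero offset_in_page(skb->data), PAGE_SIZE-sized
buffers, and that the 4-buffer/8-op example above corresponds to a head
of roughly four pages):

    #include <stdio.h>

    #define PAGE_SIZE 4096UL

    /*
     * Head of "pages" * PAGE_SIZE bytes of linear data whose source
     * starts at a non-zero page offset, i.e. the source splits into
     * one partial page, (pages - 1) full pages and one small tail.
     */
    static void count(unsigned long pages)
    {
            unsigned long chunks = pages + 1;

            unsigned long ops_hole   = chunks;      /* one op per chunk     */
            unsigned long slots_hole = chunks;      /* one buffer per chunk */

            unsigned long ops_packed   = 2 * pages; /* full-page chunks split */
            unsigned long slots_packed = pages;     /* no holes, exact fit    */

            printf("%lu pages: hole %lu ops/%lu slots, packed %lu ops/%lu slots\n",
                   pages, ops_hole, slots_hole, ops_packed, slots_packed);
    }

    int main(void)
    {
            count(2);       /* 3 ops/3 slots vs 4 ops/2 slots */
            count(3);       /* 4 ops/4 slots vs 6 ops/3 slots */
            count(4);       /* 5 ops/5 slots vs 8 ops/4 slots */
            return 0;
    }

So for the head the packed scheme saves one slot per skb at the cost of
roughly one extra copy op per additional page, which looks like a
rather better trade than the whole-page frag case above.
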
> 
> IanC, Konrad - do you have any opinions on the best approach here?
> 
> Matt





 

