
Re: [Xen-devel] [PATCH net v2 1/3] xen-netback: remove pointless clause from if statement



> -----Original Message-----
> From: netdev-owner@xxxxxxxxxxxxxxx [mailto:netdev-
> owner@xxxxxxxxxxxxxxx] On Behalf Of David Laight
> Sent: 28 March 2014 10:01
> To: Paul Durrant; Sander Eikelenboom
> Cc: netdev@xxxxxxxxxxxxxxx; Wei Liu; Ian Campbell; xen-devel@xxxxxxxxxxxxx
> Subject: RE: [Xen-devel] [PATCH net v2 1/3] xen-netback: remove pointless
> clause from if statement
> 
> From: Paul Durrant
> ...
> > > The behaviour of the Windows frontend is different to netfront; it tries
> > > to keep the shared ring as full as possible, so the estimate could be as
> > > pessimistic as you like (as long as it doesn't exceed ring size ;-)) and
> > > you'd never see the lock-up. For some reason (best known to the
> > > originator of the code I suspect) the Linux netfront driver limits the
> > > number of requests it posts into the shared ring, leading to the
> > > possibility of lock-up in the case where the backend needs more slots
> > > than the frontend 'thinks' it should.
> > > But from what I read the ring size is determined by the frontend .. so
> > > that PV driver should be able to guarantee that itself ..
> > >
> >
> > The ring size is 256 - that's baked in. The number of pending requests
> > available to the backend *is* determined by the frontend.
> >
> > > Which begs the question .. was that change to the max_slots_needed
> > > calculation *needed* to prevent the problem you saw on "Windows Server
> > > 2008 R2", or was that just changed for correctness?
> > >
> >
> > It was changed for correctness. As I understand it, use of MAX_SKB_FRAGS
> > is incorrect if compound pages are in use, as the page size is no longer
> > the slot size. It's also wasteful to always wait for space for a maximal
> > packet if the packet you have is smaller, so the intention of the max
> > estimate was that it should be at least the number of slots required but
> > not excessive. I think you've proved that making such an estimate is just
> > too hard, and since we don't want to fall back to the old dry-run style of
> > slot counting (which meant you had two codepaths that *must* arrive at the
> > same number - and they didn't, which is why I was getting the lock-up with
> > Windows guests) I think we should just go with full-packing so that we
> > don't need to estimate.
> 
> A reasonable high estimate for the number of slots required for a specific
> message is 'frag_count + total_size/4096'.
> So if there are that many slots free it is definitely OK to add the message.
> 

Hmm, that may work. By total_size, I assume you mean skb->len, so that 
calculation is based on an overhead of 1 non-optimally packed slot per frag. 
There'd still need to be a +1 for the GSO 'extra' though.
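
To make that concrete, here is a rough sketch of the suggested estimate plus
the GSO extra. This is illustrative only, not the actual xen-netback code:
the function name is made up, and rounding the size term up (rather than
plain integer division) is my own assumption so that a small linear-only
packet still counts at least one slot.

#include <linux/kernel.h>
#include <linux/skbuff.h>

/*
 * Sketch of the pessimistic estimate discussed above:
 * 'frag_count + total_size/4096' plus one for the GSO 'extra' slot.
 * Not the real xen-netback code.
 */
static unsigned int estimate_tx_slots(const struct sk_buff *skb)
{
	unsigned int slots;

	/* One slot of packing overhead per frag, plus one per 4k of data
	 * (rounded up - an assumption, not part of the original formula). */
	slots = skb_shinfo(skb)->nr_frags + DIV_ROUND_UP(skb->len, 4096);

	/* The GSO 'extra' info slot still needs accounting for */
	if (skb_is_gso(skb))
		slots++;

	return slots;
}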

> I can see a more general problem for transmits.
> I believe a NAPI driver is supposed to indicate that it can't accept
> a tx packet in advance of being given a specific packet to transmit.
> This means it has to keep enough tx ring space for a worst case packet
> (which in some cases can be larger than 1+MAX_SKB_FRAGS) even though
> such a packet is unlikely.
> I would be tempted to save the skb that 'doesn't fit' within the driver
> rather than try to second guess the number of fragments the next packet
> will need.
> 

Well, we avoid that by having an internal queue and then only stopping the tx 
queue if the skb we were just handed will definitely not fit. TBH, though, I 
think this internal queue is problematic, as we require a context switch to 
get the skbs into the shared ring, and I think the extra latency caused by 
this is hurting performance. If we do get rid of it then we need to worry 
about the size of a maximal skb again.
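
For what it's worth, that scheme could be sketched roughly as below. The
struct and field names are invented for illustration (they are not the real
xenvif symbols), and it reuses the estimate_tx_slots() sketch from earlier.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Invented for illustration - not the real xenvif structure. */
struct backend_vif {
	struct sk_buff_head internal_queue;	/* skbs waiting to go on the ring */
	unsigned int free_ring_slots;		/* slots currently free in the shared ring */
};

static netdev_tx_t sketch_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct backend_vif *vif = netdev_priv(dev);

	/* Always accept the skb onto the internal queue... */
	skb_queue_tail(&vif->internal_queue, skb);

	/*
	 * ...but stop the tx queue only if this skb definitely will not fit,
	 * using the pessimistic estimate from the earlier sketch. A separate
	 * thread later drains internal_queue into the shared ring - that is
	 * the context switch whose latency is the concern above.
	 */
	if (estimate_tx_slots(skb) > vif->free_ring_slots)
		netif_stop_queue(dev);

	return NETDEV_TX_OK;
}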

  Paul

> FWIW the USB3 'bulk' driver has the same problem: fragments can't cross
> 64k boundaries.
> 
>       David
> 
