Re: [Xen-devel] [PATCH net-next] xen-netback: improve guest-receive-side flow control
> -----Original Message-----
> From: annie li [mailto:annie.li@xxxxxxxxxx]
> Sent: 01 December 2013 04:18
> To: Paul Durrant
> Cc: Wei Liu; xen-devel@xxxxxxxxxxxxx; David Vrabel; Ian Campbell
> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: improve
> guest-receive-side flow control
>
> On 2013-11-29 18:03, Paul Durrant wrote:
> >> Can you describe it in more detail? In what kind of scenario does
> >> this issue happen? I assume the current xenvif_count_skb_slots()
> >> returns an accurate value now.
> >>
> > Ok, I'll elaborate a bit for the record...
> >
> > The way that flow control works without this patch is that, in
> > start_xmit, the code uses xenvif_count_skb_slots() to predict how
> > many slots xenvif_gop_skb() will consume and then adds this to a
> > 'req_cons_peek' counter, which it then uses to determine whether the
> > shared ring has that amount of space available, by checking whether
> > 'req_prod' has passed that value. If the ring doesn't have space,
> > the tx queue is stopped. xenvif_gop_skb() will then consume slots,
> > updating 'req_cons', and issue responses, updating 'rsp_prod' as it
> > goes. The frontend will consume those responses and post new
> > requests, by updating req_prod. So, req_prod chases req_cons, which
> > chases rsp_prod, and can never exceed that value. Thus, if
> > xenvif_count_skb_slots() ever returns a number of slots greater than
> > xenvif_gop_skb() uses, req_cons_peek will get to a value that
> > req_prod cannot achieve (since it's limited by the 'real' req_cons)
> > and, if this happens enough times, req_cons_peek gets more than a
> > ring size ahead of req_cons and the tx queue then remains stopped
> > forever, waiting for an unachievable amount of space to become
> > available in the ring. It was fairly trivial to check that this was
> > happening by adding a BUG_ON if the value calculated by
> > xenvif_count_skb_slots() was ever different to that returned by
> > xenvif_gop_skb(). Starting a simple SMB transfer between a couple of
> > Windows VMs was enough to trigger it.
>
> So xenvif_count_skb_slots() not returning an accurate value causes
> this issue.

Yes, that's right.

> > Having two routines trying to calculate the same value is always
> > going to be fragile, so this patch does away with that. All we
> > essentially need to do is make sure that we have 'enough stuff' on
> > our internal queue without letting it build up uncontrollably. So
> > start_xmit makes a cheap optimistic check of how much space is
> > needed for an skb and only turns the queue off if that is
> > unachievable. net_rx_action() is the place where we could do with an
> > accurate prediction but, since that has proven tricky to calculate,
> > a cheap worst-case (but not too bad) estimate is all we really need,
> > since the only thing we *must* prevent is xenvif_gop_skb() consuming
> > more slots than are available.
> >
> > I also added some hysteresis, as the code was pretty dumb in that
> > respect and would wake the tx queue as soon as enough space for a
> > single skb became available, basically leading to a lot of thrashing
> > between the queue being stopped and active.
> >
> > Does that explain?
>
> This makes sense, thanks Paul.

Good :-)

  Paul
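For anyone who wants to see the failure mode concretely, here is a
minimal, self-contained C model of the pre-patch accounting described
above. It is a sketch, not the driver code: the counter names follow
the mail, while RING_SIZE, the predicted two slots per skb, and the
actual one slot consumed are made-up numbers chosen so that the
prediction over-estimates.

    /* Toy model (not driver code) of the pre-patch accounting.  The
     * four counters are free-running indices, as in the Xen shared
     * rings.  Build: cc -o ring-stall ring-stall.c && ./ring-stall */
    #include <stdio.h>
    #include <stdbool.h>

    #define RING_SIZE 256u

    static unsigned int req_prod;      /* frontend: requests posted */
    static unsigned int req_cons;      /* backend: requests actually used */
    static unsigned int rsp_prod;      /* backend: responses issued */
    static unsigned int req_cons_peek; /* backend: predicted consumption */

    /* The pre-patch space check: has the frontend posted requests up
     * to the peeked value yet? */
    static bool ring_has_space(void)
    {
        return (int)(req_prod - req_cons_peek) >= 0;
    }

    int main(void)
    {
        unsigned int pkt;

        for (pkt = 1; pkt <= 10 * RING_SIZE; pkt++) {
            /* start_xmit: the prediction charges 2 slots for this skb... */
            req_cons_peek += 2;

            /* ...but the xenvif_gop_skb() stage only uses 1, issuing
             * one response per slot consumed. */
            req_cons += 1;
            rsp_prod = req_cons;

            /* The frontend posts a new request for each response it
             * consumes, so req_prod never gets more than a ring's
             * worth ahead of rsp_prod. */
            req_prod = rsp_prod + RING_SIZE;

            if (!ring_has_space()) {
                printf("stalled after %u packets: req_cons_peek=%u but "
                       "req_prod cannot pass %u\n",
                       pkt, req_cons_peek, req_prod);
                return 0;
            }
        }
        printf("no stall\n");
        return 0;
    }

With a one-slot over-estimate per packet, this stalls after
RING_SIZE + 1 packets: exactly the "more than a ring size ahead"
condition from the mail, after which nothing can ever wake the queue.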
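The "cheap worst-case estimate" for net_rx_action() can be sketched
roughly as below. The function name, the page-size constant, and the
assumption that every fragment may straddle a page boundary are all
illustrative, not what the patch actually computes; the one property
that matters is the direction of the error.

    #define MY_PAGE_SIZE 4096u /* illustrative stand-in for PAGE_SIZE */

    /* Hypothetical worst-case slot count for an skb with a given
     * linear length and fragment count.  Over-estimating is safe here
     * (it only delays processing); under-estimating is the one thing
     * that must never happen, because xenvif_gop_skb() must not
     * consume more slots than are available. */
    static unsigned int worst_case_slots(unsigned int linear_len,
                                         unsigned int nr_frags)
    {
        /* One slot per page of linear data, rounded up, plus two per
         * fragment on the pessimistic assumption that each fragment
         * straddles a page boundary. */
        return (linear_len + MY_PAGE_SIZE - 1) / MY_PAGE_SIZE
               + 2 * nr_frags;
    }

For example, a 3000-byte linear area with two fragments is charged
1 + 4 = 5 slots, even if it ends up using fewer.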
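Finally, the hysteresis point, reusing the toy counters from the first
sketch: stop the queue the moment the next skb would not fit, but only
wake it once comfortably more than one skb's worth of slots is free.
The threshold and function names here are invented for illustration;
the patch picks its own values.

    /* Illustrative wake threshold: a quarter of the ring. */
    #define WAKE_THRESHOLD (RING_SIZE / 4)

    static bool queue_stopped;

    static void queue_maybe_stop(unsigned int slots_needed)
    {
        /* Stop as soon as the next skb could not be handled... */
        if (req_prod - req_cons < slots_needed)
            queue_stopped = true;
    }

    static void queue_maybe_wake(void)
    {
        /* ...but only wake once plenty of slots are free again, so the
         * queue does not flap between stopped and active every time a
         * single slot opens up. */
        if (queue_stopped && req_prod - req_cons >= WAKE_THRESHOLD)
            queue_stopped = false;
    }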