
Re: [Xen-devel] [PATCH net-next] xen-netback: improve guest-receive-side flow control



On Thu, Nov 28, 2013 at 01:11:12PM +0000, Paul Durrant wrote:
> The flow control code relies on a double pass of the skb: first to count
> the number of ring slots needed to supply sufficient grant references, and
> a second pass to actually consume those references and build the grant copy
> operations. It transpires that, in some scenarios, the initial count and the
> number of references consumed differ and, after this happens a number of
> times, flow control is completely broken.

Please add blank lines between paragraphs in later versions.

> This patch simplifies things by making a very quick static analysis of the
> minimal number of ring slots needed for an skb and then only stopping the
> queue if that minimal number of slots is not available. The worker thread
> does a worst-case analysis and only processes what will definitely fit,
> leaving remaining skbs on the internal queue. This patch also gets rid of the
> drop that occurs if an skb won't fit (because that doesn't matter - we always
> internally queue *at least* what we can put in the shared ring) and adds
> some hysteresis to waking the external queue. (It only gets woken when at
> least half the shared ring is available).

I think this is the right direction to go. However, we've tripped over
this several times, so is it possible to test it with XenRT?

> Without this patch I can trivially stall netback permanently by just doing
> a large guest to guest file copy between two Windows Server 2008R2 VMs on a
> single host.
> Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> Cc: Ian Campbell <ian.campbell@xxxxxxxxxx>
> Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> ---
>  drivers/net/xen-netback/common.h    |   26 ++---
>  drivers/net/xen-netback/interface.c |   45 ++++----
>  drivers/net/xen-netback/netback.c   |  202 ++++++++++-------------------------
>  3 files changed, 87 insertions(+), 186 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 08ae01b..d7888db 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -136,12 +136,7 @@ struct xenvif {
>       char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
>       struct xen_netif_rx_back_ring rx;
>       struct sk_buff_head rx_queue;
> -
> -     /* Allow xenvif_start_xmit() to peek ahead in the rx request
> -      * ring.  This is a prediction of what rx_req_cons will be
> -      * once all queued skbs are put on the ring.
> -      */
> -     RING_IDX rx_req_cons_peek;
> +     bool rx_event;
>  

What is this for? Reading through the code, my impression is that this
is a flag indicating that an RX interrupt has been triggered. Why is it
useful? Why is skb_queue_empty not sufficient?
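
If the RX queue state is what actually matters, the kthread could
presumably wait on it directly, along the lines of this untested
sketch (reusing the existing vif->wq and rx_queue fields):

    wait_event_interruptible(vif->wq,
                             !skb_queue_empty(&vif->rx_queue) ||
                             kthread_should_stop());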

>       /* Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
>        * head/fragment page uses 2 copy operations because it
> @@ -198,8 +193,6 @@ void xenvif_xenbus_fini(void);
>  
[...]
>  
> -void xenvif_notify_tx_completion(struct xenvif *vif)
> -{
> -     if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
> -             netif_wake_queue(vif->dev);
> -}
> -
>  static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
>  {
>       struct xenvif *vif = netdev_priv(dev);
> @@ -378,6 +378,8 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
>       if (err < 0)
>               goto err;
>  
> +     init_waitqueue_head(&vif->wq);
> +

I guess the reason behind this code movement is that, without it,
netback will crash if an interrupt arrives before the waitqueue is set
up?

>       if (tx_evtchn == rx_evtchn) {
>               /* feature-split-event-channels == 0 */
>               err = bind_interdomain_evtchn_to_irqhandler(
[...]
> @@ -583,23 +477,36 @@ void xenvif_rx_action(struct xenvif *vif)
>  
>       skb_queue_head_init(&rxq);
>  
> -     count = 0;
> -
>       while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> -             vif = netdev_priv(skb->dev);
> -             nr_frags = skb_shinfo(skb)->nr_frags;
> +             int max_slots_needed;
> +             int i;
>  
> -             sco = (struct skb_cb_overlay *)skb->cb;
> -             sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> +             /* We need a cheap worst-case estimate for the number of
> +              * slots we'll use.
> +              */
>  
> -             count += nr_frags + 1;
> +             max_slots_needed = 2;

I'm afraid this starting point is not correct. Consider an SKB with a
very large linear buffer: you might need more than 2 slots to fit that
in, right?
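
For instance, an estimate that scales with the linear area, e.g. this
untested sketch using the standard DIV_ROUND_UP() and skb_headlen()
helpers, would seem a safer starting point:

    max_slots_needed = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);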

> +             for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +                     struct page *page;
>  
> -             __skb_queue_tail(&rxq, skb);
> +                     page = skb_frag_page(&skb_shinfo(skb)->frags[i]);
> +                     max_slots_needed += 1 << compound_order(page);

This estimate might reserve more slots than necessary.

The removal of xenvif_count_skb_slots() is OK because you're now
estimating the minimum number of slots, but the actual number of slots
consumed needs to be accurate. Any reason you don't want to use
xenvif_gop_skb()?
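
If a cheap estimate is the aim, basing it on the frag size rather than
on the compound page order would be much tighter, e.g. this untested
sketch using the standard skb_frag_size() helper (still only an
estimate, of course):

    for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
            unsigned int size = skb_frag_size(&skb_shinfo(skb)->frags[i]);

            max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE);
    }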

> +             }
> +             if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4 ||
> +                 skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> +                     max_slots_needed++;
>  
> -             /* Filled the batch queue? */
> -             /* XXX FIXME: RX path dependent on MAX_SKB_FRAGS */
> -             if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> +             /* If the skb may not fit then bail out now */
> +             if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
> +                     skb_queue_head(&vif->rx_queue, skb);
>                       break;
> +             }
> +
> +             sco = (struct skb_cb_overlay *)skb->cb;
> +             sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> +             BUG_ON(sco->meta_slots_used > max_slots_needed);
> +
> +             __skb_queue_tail(&rxq, skb);
>       }
>  
[...]
> +
>  int xenvif_kthread(void *data)
>  {
>       struct xenvif *vif = data;
> +     const int threshold = XEN_NETIF_RX_RING_SIZE / 2;
>  
>       while (!kthread_should_stop()) {
> +

Stray blank line.


Wei.

>               wait_event_interruptible(vif->wq,
>                                        rx_work_todo(vif) ||
>                                        kthread_should_stop());
>               if (kthread_should_stop())
>                       break;
>  
> -             if (rx_work_todo(vif))
> +             if (!skb_queue_empty(&vif->rx_queue))
>                       xenvif_rx_action(vif);
>  
> +             vif->rx_event = false;
> +             if (xenvif_rx_ring_slots_available(vif, threshold))
> +                     xenvif_start_queue(vif);
> +
>               cond_resched();
>       }
>  
> -- 
> 1.7.10.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
