
[Xen-devel] [PATCH] Network Checksum Removal



Currently in Xen, interdomain communication wastes CPU cycles calculating
and verifying TCP/UDP checksums.  This is unnecessary, as the chance of
packet corruption between domains is minuscule (and corruption in memory
can be detected via ECC).  Also, domUs are unable to take advantage of any
adapter hardware checksum offload capabilities when transmitting packets
outside of the system.

This patch removes the interdomain network checksums by reusing the existing
Linux hardware checksum offload infrastructure.  Reusing that infrastructure
kept the patch small and made it easy to use hardware checksumming on the
physical devices.

Here is how the traffic flow now works (generically):
Traffic generated by dom0 skips the TCP/UDP checksum calculation, and dom0
notifies domU of this via the csum bit in netif_rx_response_t.  domU checks
the csum bit on each incoming packet; if the bit is not set, domU verifies
the checksum itself.
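
A minimal sketch of the domU side of that check, following the netfront hunk
further down (rx is the netif_rx_response_t being consumed; the else branch
is only illustrative, since a freshly allocated skb already defaults to
CHECKSUM_NONE):

        /* Backend set the csum bit: the payload was generated locally in
         * dom0 or already verified by the NIC, so skip software checksum
         * verification.  Otherwise leave it to the stack to verify. */
        if (rx->csum)
                skb->ip_summed = CHECKSUM_UNNECESSARY;
        else
                skb->ip_summed = CHECKSUM_NONE;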

For traffic generated externally, if receive hardware checksumming is
available and enabled, dom0 notifies domU that validating the checksum is
unnecessary (provided the checksum is in fact valid) by setting the csum
bit.  If domU is not told that validating the checksum is unnecessary, then
domU validates it itself.
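
A minimal sketch of how dom0 makes that decision, following the netback
hunk further down (resp stands for the netif_rx_response_t entry being
filled in for domU; in 2.6.11 CHECKSUM_NONE is 0, while CHECKSUM_HW and
CHECKSUM_UNNECESSARY are non-zero):

        /* ip_summed is CHECKSUM_UNNECESSARY when rx hardware checksumming
         * already verified the packet, or CHECKSUM_HW when dom0 generated
         * the packet locally and deferred the checksum; in either case
         * domU does not need to verify it in software. */
        if (skb->ip_summed > 0)
                resp->csum = 1;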

Traffic generated by domU skips the TCP/UDP checksum calculation, and domU
notifies dom0 of this via the csum bit in netif_tx_request_t.  dom0 checks
the csum bit on each incoming packet, and if it is set, dom0 fills in the
fields needed for hardware checksum offload (skb->csum, which is the offset
within the transport header at which to insert the checksum).  It also sets
skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->flags |= SKB_FDW_NO_CSUM;
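
For reference, a minimal sketch of that offset calculation
(csum_field_offset() is a hypothetical helper, not part of the patch; with
the 2.6.11 struct layouts the offsets come out to 16 for TCP and 6 for UDP):

        #include <linux/stddef.h>
        #include <linux/in.h>
        #include <linux/ip.h>
        #include <linux/tcp.h>
        #include <linux/udp.h>

        /* Offset of the checksum field within the transport header; this
         * is the value stored in skb->csum so the driver or
         * skb_checksum_help() knows where to write the finished checksum. */
        static unsigned int csum_field_offset(const struct iphdr *iph)
        {
                if (iph->protocol == IPPROTO_TCP)
                        return offsetof(struct tcphdr, check);  /* 16 */
                if (iph->protocol == IPPROTO_UDP)
                        return offsetof(struct udphdr, check);  /*  6 */
                return 0;       /* no checksum we handle */
        }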

ip_summed is set for the case where the packet is destined for dom0 itself,
and it prevents dom0 from checking the TCP/UDP checksum.  Unfortunately,
this flag is stomped on by both routing and bridging, so I added a new skb
flags field and a new flag, SKB_FDW_NO_CSUM.  The flag is checked on
transmission, and the fields modified by the bridging/routing code are
corrected.  Once those fields have been corrected, the adapter (if it is tx
checksum capable) or the stack (via skb_checksum_help()) calculates the
TCP/UDP checksum.
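
A minimal sketch of that transmit-time correction, following the
dev_queue_xmit() hunk further down (on transmit, CHECKSUM_HW means the
checksum still has to be written at skb->h.raw + skb->csum, which is exactly
what the driver or skb_checksum_help() expects):

        /* Bridging/routing clobbered ip_summed and h.raw; restore them so
         * the outgoing adapter (if tx checksum capable) or
         * skb_checksum_help() fills in the TCP/UDP checksum. */
        if (skb->flags & SKB_FDW_NO_CSUM) {
                skb->ip_summed = CHECKSUM_HW;
                skb->h.raw = (unsigned char *)skb->nh.iph + skb->nh.iph->ihl * 4;
        }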

Performance:
I ran the following test cases with netperf3 TCP_STREAM and got the
following throughput boosts (using bridging):
domU->dom0              500Mbps
dom0->domU              10Mbps
domU->remote host       none
domU->domU              70Mbps
Note: I have a small bridging patch which increases dom0 throughput.  I
am in the process of having it accepted into the Linux kernel.

I do not yet have CPU utilization numbers (where the real benefit of this
patch should show up), nor throughput numbers for routing/NAT.


Also, I added the ability to enable/disable checksum offload via the
ethtool command.  
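For example, inside the guest a command along the lines of "ethtool -K eth0
tx off" (device name illustrative) should disable transmit checksum offload
on the virtual interface again, and "ethtool -k eth0" shows the current
offload settings.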

Signed-off-by: Jon Mason <jdmason@xxxxxxxxxx>

--- ../xen-unstable-pristine/xen/include/public/io/netif.h      2005-05-04 22:20:10.000000000 -0500
+++ xen/include/public/io/netif.h       2005-05-18 12:05:41.000000000 -0500
@@ -12,7 +12,8 @@
 typedef struct {
     memory_t addr;   /*  0: Machine address of packet.  */
     MEMORY_PADDING;
-    u16      id;     /*  8: Echoed in response message. */
+    u16      csum:1;
+    u16      id:15;     /*  8: Echoed in response message. */
     u16      size;   /* 10: Packet size in bytes.       */
 } PACKED netif_tx_request_t; /* 12 bytes */
 
@@ -29,7 +30,8 @@ typedef struct {
 typedef struct {
     memory_t addr;   /*  0: Machine address of packet.              */
     MEMORY_PADDING;
-    u16      id;     /*  8:  */
+    u16      csum:1;
+    u16      id:15;     /*  8:  */
     s16      status; /* 10: -ve: BLKIF_RSP_* ; +ve: Rx'ed pkt size. */
 } PACKED netif_rx_response_t; /* 12 bytes */
 
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netback/netback.c      2005-05-04 22:20:01.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netback/netback.c       2005-05-19 13:25:50.000000000 -0500
@@ -13,6 +13,9 @@
 #include "common.h"
 #include <asm-xen/balloon.h>
 #include <asm-xen/evtchn.h>
+#include <net/ip.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
 #include <linux/delay.h>
@@ -154,10 +157,14 @@ int netif_be_start_xmit(struct sk_buff *
         __skb_put(nskb, skb->len);
         (void)skb_copy_bits(skb, -hlen, nskb->data - hlen, skb->len + hlen);
         nskb->dev = skb->dev;
+       nskb->ip_summed = skb->ip_summed;
         dev_kfree_skb(skb);
         skb = nskb;
     }
 
+    if (skb->ip_summed > 0)
+       netif->rx->ring[MASK_NETIF_RX_IDX(netif->rx_resp_prod)].resp.csum = 1;
+       
     netif->rx_req_cons++;
     netif_get(netif);
 
@@ -646,6 +653,18 @@ static void net_tx_action(unsigned long 
         skb->dev      = netif->dev;
         skb->protocol = eth_type_trans(skb, skb->dev);
 
+       skb->csum = 0;
+       if (txreq.csum) {
+               skb->ip_summed = CHECKSUM_UNNECESSARY;
+               skb->flags |= SKB_FDW_NO_CSUM;
+               skb->nh.iph = (struct iphdr *) skb->data;
+               if (skb->nh.iph->protocol == IPPROTO_TCP)
+                       skb->csum = offsetof(struct tcphdr, check);
+               if (skb->nh.iph->protocol == IPPROTO_UDP)
+                       skb->csum = offsetof(struct udphdr, check);
+       } else
+               skb->ip_summed = CHECKSUM_NONE;
+
         netif->stats.rx_bytes += txreq.size;
         netif->stats.rx_packets++;
 
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netback/interface.c    2005-05-04 22:20:09.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netback/interface.c     2005-05-20 10:36:14.000000000 -0500
@@ -159,6 +159,7 @@ void netif_create(netif_be_create_t *cre
     dev->get_stats       = netif_be_get_stats;
     dev->open            = net_open;
     dev->stop            = net_close;
+    dev->features        = NETIF_F_NO_CSUM;
 
     /* Disable queuing. */
     dev->tx_queue_len = 0;
--- ../xen-unstable-pristine/linux-2.6.11-xen-sparse/drivers/xen/netfront/netfront.c    2005-05-04 22:20:11.000000000 -0500
+++ linux-2.6.11-xen-sparse/drivers/xen/netfront/netfront.c     2005-05-20 13:15:39.000000000 -0500
@@ -40,6 +40,7 @@
 #include <linux/init.h>
 #include <linux/bitops.h>
 #include <linux/proc_fs.h>
+#include <linux/ethtool.h>
 #include <net/sock.h>
 #include <net/pkt_sched.h>
 #include <net/arp.h>
@@ -287,6 +288,11 @@ static int send_fake_arp(struct net_devi
     return dev_queue_xmit(skb);
 }
 
+static struct ethtool_ops network_ethtool_ops = {
+       .get_tx_csum = ethtool_op_get_tx_csum,
+       .set_tx_csum = ethtool_op_set_tx_csum,
+};
+
 static int network_open(struct net_device *dev)
 {
     struct net_private *np = netdev_priv(dev);
@@ -472,6 +478,7 @@ static int network_start_xmit(struct sk_
     tx->id   = id;
     tx->addr = virt_to_machine(skb->data);
     tx->size = skb->len;
+    tx->csum = (skb->ip_summed) ? 1 : 0;
 
     wmb(); /* Ensure that backend will see the request. */
     np->tx->req_prod = i + 1;
@@ -572,6 +579,9 @@ static int netif_poll(struct net_device 
         skb->len  = rx->status;
         skb->tail = skb->data + skb->len;
 
+       if (rx->csum)
+               skb->ip_summed = CHECKSUM_UNNECESSARY;
+               
         np->stats.rx_packets++;
         np->stats.rx_bytes += rx->status;
 
@@ -966,7 +976,9 @@ static int create_netdev(int handle, str
     dev->get_stats       = network_get_stats;
     dev->poll            = netif_poll;
     dev->weight          = 64;
-    
+    dev->features       = NETIF_F_IP_CSUM;
+    SET_ETHTOOL_OPS(dev, &network_ethtool_ops);
+
     if ((err = register_netdev(dev)) != 0) {
         printk(KERN_WARNING "%s> register_netdev err=%d\n", __FUNCTION__, err);
         goto exit;
--- ../xen-unstable-pristine/linux-2.6.11-xen0/include/linux/skbuff.h   2005-03-02 01:38:38.000000000 -0600
+++ linux-2.6.11-xen0/include/linux/skbuff.h    2005-05-18 12:05:41.000000000 -0500
@@ -37,6 +37,10 @@
 #define CHECKSUM_HW 1
 #define CHECKSUM_UNNECESSARY 2
 
+#define SKB_CLONED     1
+#define SKB_NOHDR      2
+#define SKB_FDW_NO_CSUM        4
+
 #define SKB_DATA_ALIGN(X)      (((X) + (SMP_CACHE_BYTES - 1)) & \
                                 ~(SMP_CACHE_BYTES - 1))
 #define SKB_MAX_ORDER(X, ORDER)        (((PAGE_SIZE << (ORDER)) - (X) - \
@@ -238,7 +242,7 @@ struct sk_buff {
                                mac_len,
                                csum;
        unsigned char           local_df,
-                               cloned,
+                               flags,
                                pkt_type,
                                ip_summed;
        __u32                   priority;
@@ -370,7 +374,7 @@ static inline void kfree_skb(struct sk_b
  */
 static inline int skb_cloned(const struct sk_buff *skb)
 {
-       return skb->cloned && atomic_read(&skb_shinfo(skb)->dataref) != 1;
+       return (skb->flags & SKB_CLONED) && atomic_read(&skb_shinfo(skb)->dataref) != 1;
 }
 
 /**
--- ../xen-unstable-pristine/linux-2.6.11-xen0/net/core/skbuff.c        2005-03-02 01:38:17.000000000 -0600
+++ linux-2.6.11-xen0/net/core/skbuff.c 2005-05-18 12:05:41.000000000 -0500
@@ -240,7 +240,7 @@ static void skb_clone_fraglist(struct sk
 
 void skb_release_data(struct sk_buff *skb)
 {
-       if (!skb->cloned ||
+       if (!(skb->flags & SKB_CLONED) ||
            atomic_dec_and_test(&(skb_shinfo(skb)->dataref))) {
                if (skb_shinfo(skb)->nr_frags) {
                        int i;
@@ -352,7 +352,7 @@ struct sk_buff *skb_clone(struct sk_buff
        C(data_len);
        C(csum);
        C(local_df);
-       n->cloned = 1;
+       n->flags = skb->flags | SKB_CLONED;
        C(pkt_type);
        C(ip_summed);
        C(priority);
@@ -395,7 +395,7 @@ struct sk_buff *skb_clone(struct sk_buff
        C(end);
 
        atomic_inc(&(skb_shinfo(skb)->dataref));
-       skb->cloned = 1;
+       skb->flags |= SKB_CLONED;
 
        return n;
 }
@@ -603,7 +603,7 @@ int pskb_expand_head(struct sk_buff *skb
        skb->mac.raw += off;
        skb->h.raw   += off;
        skb->nh.raw  += off;
-       skb->cloned   = 0;
+       skb->flags    &= ~SKB_CLONED;
        atomic_set(&skb_shinfo(skb)->dataref, 1);
        return 0;
 
--- ../xen-unstable-pristine/linux-2.6.11-xen0/net/core/dev.c   2005-03-02 01:38:09.000000000 -0600
+++ linux-2.6.11-xen0/net/core/dev.c    2005-05-20 10:20:36.000000000 -0500
@@ -98,6 +98,7 @@
 #include <linux/stat.h>
 #include <linux/if_bridge.h>
 #include <linux/divert.h>
+#include <net/ip.h> 
 #include <net/dst.h>
 #include <net/pkt_sched.h>
 #include <net/checksum.h>
@@ -1182,7 +1183,7 @@ int __skb_linearize(struct sk_buff *skb,
        skb->data    += offset;
 
        /* We are no longer a clone, even if we were. */
-       skb->cloned    = 0;
+       skb->flags    &= ~SKB_CLONED;
 
        skb->tail     += skb->data_len;
        skb->data_len  = 0;
@@ -1236,6 +1237,15 @@ int dev_queue_xmit(struct sk_buff *skb)
            __skb_linearize(skb, GFP_ATOMIC))
                goto out_kfree_skb;
 
+       /* If packet is forwarded to a device that needs a checksum and not 
+        * checksummed, correct the pointers and enable checksumming in the 
+        * next function.
+        */
+       if (skb->flags & SKB_FDW_NO_CSUM) {
+               skb->ip_summed = CHECKSUM_HW;
+               skb->h.raw = (void *)skb->nh.iph + (skb->nh.iph->ihl * 4);
+       }
+
        /* If packet is not checksummed and device does not support
         * checksumming for this protocol, complete checksumming here.
         */
