[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for IPv6 offloads



> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-
> bounces@xxxxxxxxxxxxx] On Behalf Of Paul Durrant
> Sent: 07 November 2013 11:02
> To: Ian Campbell
> Cc: Boris Ostrovsky; David Vrabel; xen-devel@xxxxxxxxxxxxx
> Subject: Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for
> IPv6 offloads
> 
> > -----Original Message-----
> > From: Ian Campbell
> > Sent: 07 November 2013 10:49
> > To: Paul Durrant
> > Cc: xen-devel@xxxxxxxxxxxxx; Boris Ostrovsky; David Vrabel
> > Subject: Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for
> > IPv6 offloads
> >
> > On Tue, 2013-11-05 at 11:59 +0000, Paul Durrant wrote:
> > > This patch adds support for IPv6 checksum offload and GSO when those
> > > fetaures are available in the backend.
> > >
> > > Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
> > > Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > > Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > > Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
> > > ---
> > >  drivers/net/xen-netfront.c |  263
> > ++++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 231 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > index dd1011e..27212d8 100644
> > > --- a/drivers/net/xen-netfront.c
> > > +++ b/drivers/net/xen-netfront.c
> > > @@ -616,7 +616,9 @@ static int xennet_start_xmit(struct sk_buff *skb,
> > struct net_device *dev)
> > >           tx->flags |= XEN_NETTXF_extra_info;
> > >
> > >           gso->u.gso.size = skb_shinfo(skb)->gso_size;
> > > -         gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
> > > +         gso->u.gso.type = (skb_shinfo(skb)->gso_type &
> > SKB_GSO_TCPV6) ?
> > > +                           XEN_NETIF_GSO_TYPE_TCPV6 :
> > > +                           XEN_NETIF_GSO_TYPE_TCPV4;
> > >           gso->u.gso.pad = 0;
> > >           gso->u.gso.features = 0;
> > >
> > > @@ -808,15 +810,18 @@ static int xennet_set_skb_gso(struct sk_buff
> > *skb,
> > >           return -EINVAL;
> > >   }
> > >
> > > - /* Currently only TCPv4 S.O. is supported. */
> > > - if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) {
> > > + if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4 &&
> > > +     gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV6) {
> > >           if (net_ratelimit())
> > >                   pr_warn("Bad GSO type %d\n", gso->u.gso.type);
> > >           return -EINVAL;
> > >   }
> > >
> > >   skb_shinfo(skb)->gso_size = gso->u.gso.size;
> > > - skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
> > > + skb_shinfo(skb)->gso_type =
> > > +         (gso->u.gso.type == XEN_NETIF_GSO_TYPE_TCPV4) ?
> > > +         SKB_GSO_TCPV4 :
> > > +         SKB_GSO_TCPV6;
> > >
> > >   /* Header must be checked, and gso_segs computed. */
> > >   skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
> > > @@ -856,62 +861,72 @@ static RING_IDX xennet_fill_frags(struct
> > netfront_info *np,
> > >   return cons;
> > >  }
> > >
> > > -static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
> > > +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
> > >  {
> > > - struct iphdr *iph;
> > > - int err = -EPROTO;
> > > - int recalculate_partial_csum = 0;
> > > -
> > > - /*
> > > -  * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
> > > -  * peers can fail to set NETRXF_csum_blank when sending a GSO
> > > -  * frame. In this case force the SKB to CHECKSUM_PARTIAL and
> > > -  * recalculate the partial checksum.
> > > -  */
> > > - if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
> > > -         struct netfront_info *np = netdev_priv(dev);
> > > -         np->rx_gso_checksum_fixup++;
> > > -         skb->ip_summed = CHECKSUM_PARTIAL;
> > > -         recalculate_partial_csum = 1;
> > > + if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> > > +         /* If we need to pullup then pullup to the max, so we
> > > +          * won't need to do it again.
> > > +          */
> > > +         int target = min_t(int, skb->len, MAX_TCP_HEADER);
> > > +         __pskb_pull_tail(skb, target - skb_headlen(skb));
> >
> > Should the functions "len" argument not be involved somewhere in this?
> > What if it is bigger than MAX_TCP_HEADER for example?
> 
> Hmm, good point.
> 

Actually, now I look again, this is correct. The if statement checks len to see 
if a pullup is necessary, but if one is necessary then it always pulls up to 
the max. I still need to add a failure return to indicate the pullup did not 
achieve the desired length though.

  Paul

> >
> > >   }
> > > +}
> > >
> > > - /* A non-CHECKSUM_PARTIAL SKB does not require setup. */
> > > - if (skb->ip_summed != CHECKSUM_PARTIAL)
> > > -         return 0;
> > > +static int checksum_setup_ip(struct sk_buff *skb, int
> > recalculate_partial_csum)
> > > +{
> >
> > > + struct iphdr *iph = (void *)skb->data;
> > > + unsigned int header_size;
> > > + unsigned int off;
> > > + int err = -EPROTO;
> > >
> > > - if (skb->protocol != htons(ETH_P_IP))
> > > -         goto out;
> > > + off = sizeof(struct iphdr);
> > >
> > > - iph = (void *)skb->data;
> > > + header_size = skb->network_header + off + MAX_IPOPTLEN;
> > > + maybe_pull_tail(skb, header_size);
> >
> > You never reuse header_size other than setting it and immediately
> > passing it to maybe_pull_tail, it could be inlined into the function
> > call?
> >
> 
> Yes, it could but I thought this was tidier. It's also the same as the code in
> netback.
> 
> > (I found it confusing because you reset it in the recalculate_partial
> > rather than extending it which is what I expected)
> >
> > maybe_pull_tail doesn't guarantee to pull up to the length of the given
> > parameter, e.g. if it is more then skb->len. Which I think means you
> > need some other explicit skb len checks around the place, don't you?
> > Otherwise you risk running past the end of the skb for a malformed
> > frame.
> >
> 
> Yes, that's true. Probably best to have maybe_pull_tail() return a failure if 
> it
> doesn't managed to pullup the requested amount then.
> 
> > > +
> > > + off = iph->ihl * 4;
> > >
> > >   switch (iph->protocol) {
> > >   case IPPROTO_TCP:
> > > -         if (!skb_partial_csum_set(skb, 4 * iph->ihl,
> > > +         if (!skb_partial_csum_set(skb, off,
> > >                                     offsetof(struct tcphdr, check)))
> > >                   goto out;
> > >
> > >           if (recalculate_partial_csum) {
> > >                   struct tcphdr *tcph = tcp_hdr(skb);
> > > +
> > > +                 header_size = skb->network_header +
> > > +                         off +
> > > +                         sizeof(struct tcphdr);
> > > +                 maybe_pull_tail(skb, header_size);
> > > +
> > >                   tcph->check = ~csum_tcpudp_magic(iph->saddr, iph-
> > >daddr,
> > > -                                                  skb->len - iph->ihl*4,
> > > +                                                  skb->len - off,
> > >                                                    IPPROTO_TCP, 0);
> > >           }
> > >           break;
> > >   case IPPROTO_UDP:
> > > -         if (!skb_partial_csum_set(skb, 4 * iph->ihl,
> > > +         if (!skb_partial_csum_set(skb, off,
> > >                                     offsetof(struct udphdr, check)))
> > >                   goto out;
> > >
> > >           if (recalculate_partial_csum) {
> > >                   struct udphdr *udph = udp_hdr(skb);
> > > +
> > > +                 header_size = skb->network_header +
> > > +                         off +
> > > +                         sizeof(struct udphdr);
> > > +                 maybe_pull_tail(skb, header_size);
> > > +
> > >                   udph->check = ~csum_tcpudp_magic(iph->saddr,
> > iph->daddr,
> > > -                                                  skb->len - iph->ihl*4,
> > > +                                                  skb->len - off,
> > >                                                    IPPROTO_UDP, 0);
> > >           }
> > >           break;
> > >   default:
> > >           if (net_ratelimit())
> > > -                 pr_err("Attempting to checksum a non-TCP/UDP
> > packet, dropping a protocol %d packet\n",
> > > +                 pr_err("Attempting to checksum a non-TCP/UDP
> > packet, "
> > > +                        "dropping a protocol %d packet\n",
> >
> > I think Linux's coding style explicitly relaxes the 80-column limit for
> > string literals.
> >
> 
> Ok. Good to know.
> 
> > >                          iph->protocol);
> > >           goto out;
> > >   }
> > > @@ -922,6 +937,158 @@ out:
> > >   return err;
> > >  }
> > >
> > > +static int checksum_setup_ipv6(struct sk_buff *skb, int
> > recalculate_partial_csum)
> > > +{
> > > + int err = -EPROTO;
> > > + struct ipv6hdr *ipv6h = (void *)skb->data;
> > > + u8 nexthdr;
> > > + unsigned int header_size;
> > > + unsigned int off;
> > > + bool fragment;
> > > + bool done;
> > > +
> > > + done = false;
> > > +
> > > + off = sizeof(struct ipv6hdr);
> > > +
> > > + header_size = skb->network_header + off;
> > > + maybe_pull_tail(skb, header_size);
> >
> > Same comment about skb lengths?
> >
> > > +
> > > + nexthdr = ipv6h->nexthdr;
> > > +
> > > + while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
> > &&
> > > +        !done) {
> > > +         switch (nexthdr) {
> > > +         case IPPROTO_DSTOPTS:
> > > +         case IPPROTO_HOPOPTS:
> > > +         case IPPROTO_ROUTING: {
> > > +                 struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> > > +
> > > +                 header_size = skb->network_header +
> > > +                         off +
> > > +                         sizeof(struct ipv6_opt_hdr);
> > > +                 maybe_pull_tail(skb, header_size);
> > > +
> > > +                 nexthdr = hp->nexthdr;
> > > +                 off += ipv6_optlen(hp);
> > > +                 break;
> > > +         }
> > > +         case IPPROTO_AH: {
> > > +                 struct ip_auth_hdr *hp = (void *)(skb->data + off);
> > > +
> > > +                 header_size = skb->network_header +
> > > +                         off +
> > > +                         sizeof(struct ip_auth_hdr);
> > > +                 maybe_pull_tail(skb, header_size);
> > > +
> > > +                 nexthdr = hp->nexthdr;
> > > +                 off += (hp->hdrlen+2)<<2;
> >
> > I wonder if the core header would welcome a ipv6_ahlen cf optlen?
> >
> 
> Yes, it would be nicer wouldn't it.
> 
> > > +                 break;
> > > +         }
> > > +         case IPPROTO_FRAGMENT:
> > > +                 fragment = true;
> > > +                 /* fall through */
> > > +         default:
> > > +                 done = true;
> > > +                 break;
> > > +         }
> > > + }
> > > +
> > > + if (!done) {
> > > +         if (net_ratelimit())
> > > +                 pr_err("Failed to parse packet header\n");
> >
> > Is it a requirement of the protocol that the list of IPPROTO_* we handle
> > must be followed by some other header? Do we really care or is it more
> > the upper stacks problem? Presumably it can get similarly malformed
> > packets off the wire?
> 
> I'd have to check but I'm pretty sure none of the protos we handle can be
> terminating so if done is not set then the packet is malformed.
> 
> >
> > Also you can use net_err_ratelimited.
> 
> Ok.
> 
> >
> > > +         goto out;
> > > + }
> > > +
> > > + if (fragment) {
> > > +         if (net_ratelimit())
> > > +                 pr_err("Packet is a fragment!\n");
> >
> > Is that a bad thing?
> >
> 
> Well, you can't do TCP or UDP checksum offload on a fragment, can you?
> 
> > > +         goto out;
> > > + }
> > > +
> > > + switch (nexthdr) {
> > > + case IPPROTO_TCP:
> > > +         if (!skb_partial_csum_set(skb, off,
> > > +                                   offsetof(struct tcphdr, check)))
> > > +                 goto out;
> > > +
> > > +         if (recalculate_partial_csum) {
> > > +                 struct tcphdr *tcph = tcp_hdr(skb);
> > > +
> > > +                 header_size = skb->network_header +
> > > +                         off +
> > > +                         sizeof(struct tcphdr);
> > > +                 maybe_pull_tail(skb, header_size);
> > > +
> > > +                 tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
> > > +                                                &ipv6h->daddr,
> > > +                                                skb->len - off,
> > > +                                                IPPROTO_TCP, 0);
> > > +         }
> > > +         break;
> > > + case IPPROTO_UDP:
> > > +         if (!skb_partial_csum_set(skb, off,
> > > +                                   offsetof(struct udphdr, check)))
> > > +                 goto out;
> > > +
> > > +         if (recalculate_partial_csum) {
> > > +                 struct udphdr *udph = udp_hdr(skb);
> > > +
> > > +                 header_size = skb->network_header +
> > > +                         off +
> > > +                         sizeof(struct udphdr);
> > > +                 maybe_pull_tail(skb, header_size);
> > > +
> > > +                 udph->check = ~csum_ipv6_magic(&ipv6h->saddr,
> > > +                                                &ipv6h->daddr,
> > > +                                                skb->len - off,
> > > +                                                IPPROTO_UDP, 0);
> > > +         }
> > > +         break;
> > > + default:
> > > +         if (net_ratelimit())
> > > +                 pr_err("Attempting to checksum a non-TCP/UDP
> > packet, "
> > > +                        "dropping a protocol %d packet\n",
> > > +                        nexthdr);
> > > +         goto out;
> > > + }
> > > +
> > > + err = 0;
> > > +
> > > +out:
> > > + return err;
> > > +}
> > > +
> > > +static int checksum_setup(struct net_device *dev, struct sk_buff *skb)
> > > +{
> > > + int err = -EPROTO;
> > > + int recalculate_partial_csum = 0;
> > > +
> > > + /*
> > > +  * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
> > > +  * peers can fail to set NETRXF_csum_blank when sending a GSO
> > > +  * frame. In this case force the SKB to CHECKSUM_PARTIAL and
> > > +  * recalculate the partial checksum.
> > > +  */
> > > + if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
> >
> > Is it worth making this ipv4 only. Assuming that anything which adds v6
> > support has already fixed this bug? That would save time in
> > checksum_setup_ipv6.
> >
> 
> That's true.
> 
> > Speaking of which can checksum_setup_* be shared with netfront
> > somehow?
> > Or maybe even put in common code? (Perhaps as a future cleanup)
> >
> 
> I think it really should be in common code, but I think that would be better
> handled as a separate patch to remove the duplication.
> 
> Thanks for the comprehensive review!
> 
>   Paul
> 
> > > +         struct netfront_info *np = netdev_priv(dev);
> > > +         np->rx_gso_checksum_fixup++;
> > > +         skb->ip_summed = CHECKSUM_PARTIAL;
> > > +         recalculate_partial_csum = 1;
> > > + }
> > > +
> > > + /* A non-CHECKSUM_PARTIAL SKB does not require setup. */
> > > + if (skb->ip_summed != CHECKSUM_PARTIAL)
> > > +         return 0;
> > > +
> > > + if (skb->protocol == htons(ETH_P_IP))
> > > +         err = checksum_setup_ip(skb, recalculate_partial_csum);
> > > + else if (skb->protocol == htons(ETH_P_IPV6))
> > > +         err = checksum_setup_ipv6(skb, recalculate_partial_csum);
> > > +
> > > + return err;
> > > +}
> > > +
> > >  static int handle_incoming_queue(struct net_device *dev,
> > [...]
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.