
Re: [Xen-devel] [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest



> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@xxxxxxxxxx]
> Sent: 08 October 2013 14:31
> To: Paul Durrant
> Cc: xen-devel@xxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6
> checksum offload from guest
> 
> On Tue, Oct 08, 2013 at 11:58:13AM +0100, Paul Durrant wrote:
> [...]
> > -/*
> > - * This is the amount of packet we copy rather than map, so that the
> > - * guest can't fiddle with the contents of the headers while we do
> > - * packet processing on them (netfilter, routing, etc).
> > +/* This is a miniumum size for the linear area to avoid lots of
> > + * calls to __pskb_pull_tail() as we set up checksum offsets.
> >   */
> > -#define PKT_PROT_LEN    (ETH_HLEN + \
> > -                    VLAN_HLEN + \
> > -                    sizeof(struct iphdr) + MAX_IPOPTLEN + \
> > -                    sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> > +#define PKT_PROT_LEN 128
> >
> 
> Where does 128 come from?
> 

It's an arbitrary power of 2, chosen because it covers the most likely v6 
header combinations and all common v4 headers.

> >  static u16 frag_get_pending_idx(skb_frag_t *frag)
> >  {
> > @@ -1118,61 +1113,72 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
> >     return 0;
> >  }
> >
> > -static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
> > +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
> >  {
> > -   struct iphdr *iph;
> > +   if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> > +           int target = min_t(int, skb->len, len);
> > +           __pskb_pull_tail(skb, target - skb_headlen(skb));
> > +   }
> > +}
> > +
> > +static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
> > +                        int recalculate_partial_csum)
> > +{
> [...]
> >
> >             if (recalculate_partial_csum) {
> >                     struct tcphdr *tcph = tcp_hdr(skb);
> > +
> > +                   header_size = skb->network_header +
> > +                           off +
> > +                           sizeof(struct tcphdr) +
> > +                           MAX_TCP_OPTION_SPACE;
> > +                   maybe_pull_tail(skb, header_size);
> > +
> 
> I presume this function is checksum_setup stripped down to handle IPv4
> packet. What's the purpose of changing its behaviour? Why the pull_tail
> here?
> 

We have to make sure that the TCP header is in the linear area because we are 
about to write to the checksum field. In practice the 128-byte pull should 
guarantee this, but in case that value is changed later I wanted to make sure 
things did not start failing in an odd way.

> > +static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
> > +                          int recalculate_partial_csum)
> > +{
> > +   int err = -EPROTO;
> > +   struct ipv6hdr *ipv6h = (void *)skb->data;
> > +   u8 nexthdr;
> > +   unsigned int header_size;
> > +   unsigned int off;
> > +   bool done;
> > +
> > +   done = false;
> > +   off = sizeof(struct ipv6hdr);
> > +
> > +   header_size = skb->network_header + off;
> > +   maybe_pull_tail(skb, header_size);
> > +
> > +   nexthdr = ipv6h->nexthdr;
> > +
> > +   while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
> > +          !done) {
> > +           /* We only access at most the first 2 bytes of any option header
> > +            * to determine the next one.
> > +            */
> > +           header_size = skb->network_header + off + 2;
> > +           maybe_pull_tail(skb, header_size);
> > +
> 
> Will this cause performance problem? Is it possible that you pull too
> many times?
> 

I guess it means we may get two pulls for the TCP/UDP headers rather than one, 
so I could push the pulls into the individual cases if you think it will 
affect performance that badly.

> > +           switch (nexthdr) {
> > +           case IPPROTO_FRAGMENT: {
> > +                   struct frag_hdr *hp = (void *)(skb->data + off);
> > +
> > +                   nexthdr = hp->nexthdr;
> > +                   off += 8;
> > +                   break;
> > +           }
> > +           case IPPROTO_DSTOPTS:
> > +           case IPPROTO_HOPOPTS:
> > +           case IPPROTO_ROUTING: {
> > +                   struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> > +
> > +                   nexthdr = hp->nexthdr;
> > +                   off += ipv6_optlen(hp);
> > +                   break;
> > +           }
> > +           case IPPROTO_AH: {
> > +                   struct ip_auth_hdr *hp = (void *)(skb->data + off);
> > +
> > +                   nexthdr = hp->nexthdr;
> > +                   off += (hp->hdrlen+2)<<2;
> > +                   break;
> > +           }
> > +           default:
> > +                   done = true;
> > +                   break;
> 
> Can you make the logic explicit here?
> 
>       case IPPROTO_TCP:
>       case IPPROTO_UDP:
>               done = true;
>               break;
>       default:
>               break;
> 
> Another minor suggestion: change "done" to "found", because you're
> trying to find one of the two types of header.
> 

Yes, I could code it that way if you prefer.

> > +   switch (nexthdr) {
> > +   case IPPROTO_TCP:
> > +           if (!skb_partial_csum_set(skb, off,
> > +                                     offsetof(struct tcphdr, check)))
> > +                   goto out;
> > +
> > +           if (recalculate_partial_csum) {
> > +                   struct tcphdr *tcph = tcp_hdr(skb);
> > +
> > +                   header_size = skb->network_header +
> > +                           off +
> > +                           sizeof(struct tcphdr) +
> > +                           MAX_TCP_OPTION_SPACE;
> > +                   maybe_pull_tail(skb, header_size);
> > +
> 
> Same question as IPv4 counterpart, why do you need to pull here?
> 

Same answer as before ;-)

  Paul

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

