Xen project Mailing List

Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for IPv6 offloads

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

From: Paul Durrant <Paul.Durrant@xxxxxxxxxx>

Date: Tue, 19 Nov 2013 11:24:04 +0000

Accept-language: en-GB, en-US

Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Tue, 19 Nov 2013 11:24:20 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: AQHO2h6EANDr8Zxuy0SBk1C5wcoKZ5oZiNYAgAASBBCADJmDwIAGKugAgAAdY0A=

Thread-topic: [Xen-devel] [PATCH net-next] xen-netfront: Add support for IPv6 offloads

> -----Original Message----- > From: Ian Campbell > Sent: 19 November 2013 10:29 > To: Paul Durrant > Cc: Boris Ostrovsky; David Vrabel; xen-devel@xxxxxxxxxxxxx > Subject: Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for > IPv6 offloads > > On Fri, 2013-11-15 at 11:19 +0000, Paul Durrant wrote: > > > -----Original Message----- > > > From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel- > > > bounces@xxxxxxxxxxxxx] On Behalf Of Paul Durrant > > > Sent: 07 November 2013 11:02 > > > To: Ian Campbell > > > Cc: Boris Ostrovsky; David Vrabel; xen-devel@xxxxxxxxxxxxx > > > Subject: Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support for > > > IPv6 offloads > > > > > > > -----Original Message----- > > > > From: Ian Campbell > > > > Sent: 07 November 2013 10:49 > > > > To: Paul Durrant > > > > Cc: xen-devel@xxxxxxxxxxxxx; Boris Ostrovsky; David Vrabel > > > > Subject: Re: [Xen-devel] [PATCH net-next] xen-netfront: Add support > for > > > > IPv6 offloads > > > > > > > > On Tue, 2013-11-05 at 11:59 +0000, Paul Durrant wrote: > > > > > This patch adds support for IPv6 checksum offload and GSO when > those > > > > > fetaures are available in the backend. > > > > > > > > > > Signed-off-by: Paul Durrant <paul.durrant@xxxxxxxxxx> > > > > > Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> > > > > > Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx> > > > > > Cc: David Vrabel <david.vrabel@xxxxxxxxxx> > > > > > --- > > > > > drivers/net/xen-netfront.c | 263 > > > > ++++++++++++++++++++++++++++++++++++++------ > > > > > 1 file changed, 231 insertions(+), 32 deletions(-) > > > > > > > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c > > > > > index dd1011e..27212d8 100644 > > > > > --- a/drivers/net/xen-netfront.c > > > > > +++ b/drivers/net/xen-netfront.c > > > > > @@ -616,7 +616,9 @@ static int xennet_start_xmit(struct sk_buff > *skb, > > > > struct net_device *dev) > > > > > tx->flags |= XEN_NETTXF_extra_info; > > > > > > > > > > gso->u.gso.size = skb_shinfo(skb)->gso_size; > > > > > - gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; > > > > > + gso->u.gso.type = (skb_shinfo(skb)->gso_type & > > > > SKB_GSO_TCPV6) ? > > > > > + XEN_NETIF_GSO_TYPE_TCPV6 : > > > > > + XEN_NETIF_GSO_TYPE_TCPV4; > > > > > gso->u.gso.pad = 0; > > > > > gso->u.gso.features = 0; > > > > > > > > > > @@ -808,15 +810,18 @@ static int xennet_set_skb_gso(struct > sk_buff > > > > *skb, > > > > > return -EINVAL; > > > > > } > > > > > > > > > > - /* Currently only TCPv4 S.O. is supported. */ > > > > > - if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) { > > > > > + if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4 && > > > > > + gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV6) { > > > > > if (net_ratelimit()) > > > > > pr_warn("Bad GSO type %d\n", gso- > >u.gso.type); > > > > > return -EINVAL; > > > > > } > > > > > > > > > > skb_shinfo(skb)->gso_size = gso->u.gso.size; > > > > > - skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4; > > > > > + skb_shinfo(skb)->gso_type = > > > > > + (gso->u.gso.type == XEN_NETIF_GSO_TYPE_TCPV4) > ? > > > > > + SKB_GSO_TCPV4 : > > > > > + SKB_GSO_TCPV6; > > > > > > > > > > /* Header must be checked, and gso_segs computed. */ > > > > > skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY; > > > > > @@ -856,62 +861,72 @@ static RING_IDX xennet_fill_frags(struct > > > > netfront_info *np, > > > > > return cons; > > > > > } > > > > > > > > > > -static int checksum_setup(struct net_device *dev, struct sk_buff > *skb) > > > > > +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int > len) > > > > > { > > > > > - struct iphdr *iph; > > > > > - int err = -EPROTO; > > > > > - int recalculate_partial_csum = 0; > > > > > - > > > > > - /* > > > > > - * A GSO SKB must be CHECKSUM_PARTIAL. However some > buggy > > > > > - * peers can fail to set NETRXF_csum_blank when sending a > GSO > > > > > - * frame. In this case force the SKB to CHECKSUM_PARTIAL > and > > > > > - * recalculate the partial checksum. > > > > > - */ > > > > > - if (skb->ip_summed != CHECKSUM_PARTIAL && > skb_is_gso(skb)) { > > > > > - struct netfront_info *np = netdev_priv(dev); > > > > > - np->rx_gso_checksum_fixup++; > > > > > - skb->ip_summed = CHECKSUM_PARTIAL; > > > > > - recalculate_partial_csum = 1; > > > > > + if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) { > > > > > + /* If we need to pullup then pullup to the max, so we > > > > > + * won't need to do it again. > > > > > + */ > > > > > + int target = min_t(int, skb->len, MAX_TCP_HEADER); > > > > > + __pskb_pull_tail(skb, target - skb_headlen(skb)); > > > > > > > > Should the functions "len" argument not be involved somewhere in > this? > > > > What if it is bigger than MAX_TCP_HEADER for example? > > > > > > Hmm, good point. > > > > > > > Actually, now I look again, this is correct. The if statement checks > > len to see if a pullup is necessary, but if one is necessary then it > > always pulls up to the max. > > So it is guaranteed that len <= MAX_TCP_HEADER in all circumstances? > That might surprise some callers, since the name doesn't really indicate > this has anything to do with TCP. How about passing max in as an > argument, where all callers would currently pass MAX_TCP_HEADER? > Ok, that sounds reasonable. It's unclear to me whether there is any upper bound on the size of an IPv6 header, but google hits a recent thread in the IETF mail archives (starting at http://www.ietf.org/mail-archive/web/ipv6/current/msg18199.html ) suggesting that 256 might be a good number to choose but I agree it's not directly related to 256, so maybe an argument and a local #define IPV6_MAX_HEADER of 256 would be better. Paul > > I still need to add a failure return to indicate the pullup did not > > achieve the desired length though. > > > > Paul > > > > > > > > > > > } > > > > > +} > > > > > > > > > > - /* A non-CHECKSUM_PARTIAL SKB does not require setup. > */ > > > > > - if (skb->ip_summed != CHECKSUM_PARTIAL) > > > > > - return 0; > > > > > +static int checksum_setup_ip(struct sk_buff *skb, int > > > > recalculate_partial_csum) > > > > > +{ > > > > > > > > > + struct iphdr *iph = (void *)skb->data; > > > > > + unsigned int header_size; > > > > > + unsigned int off; > > > > > + int err = -EPROTO; > > > > > > > > > > - if (skb->protocol != htons(ETH_P_IP)) > > > > > - goto out; > > > > > + off = sizeof(struct iphdr); > > > > > > > > > > - iph = (void *)skb->data; > > > > > + header_size = skb->network_header + off + > MAX_IPOPTLEN; > > > > > + maybe_pull_tail(skb, header_size); > > > > > > > > You never reuse header_size other than setting it and immediately > > > > passing it to maybe_pull_tail, it could be inlined into the function > > > > call? > > > > > > > > > > Yes, it could but I thought this was tidier. It's also the same as the > > > code in > > > netback. > > > > > > > (I found it confusing because you reset it in the recalculate_partial > > > > rather than extending it which is what I expected) > > > > > > > > maybe_pull_tail doesn't guarantee to pull up to the length of the given > > > > parameter, e.g. if it is more then skb->len. Which I think means you > > > > need some other explicit skb len checks around the place, don't you? > > > > Otherwise you risk running past the end of the skb for a malformed > > > > frame. > > > > > > > > > > Yes, that's true. Probably best to have maybe_pull_tail() return a > > > failure if > it > > > doesn't managed to pullup the requested amount then. > > > > > > > > + > > > > > + off = iph->ihl * 4; > > > > > > > > > > switch (iph->protocol) { > > > > > case IPPROTO_TCP: > > > > > - if (!skb_partial_csum_set(skb, 4 * iph->ihl, > > > > > + if (!skb_partial_csum_set(skb, off, > > > > > offsetof(struct tcphdr, > check))) > > > > > goto out; > > > > > > > > > > if (recalculate_partial_csum) { > > > > > struct tcphdr *tcph = tcp_hdr(skb); > > > > > + > > > > > + header_size = skb->network_header + > > > > > + off + > > > > > + sizeof(struct tcphdr); > > > > > + maybe_pull_tail(skb, header_size); > > > > > + > > > > > tcph->check = ~csum_tcpudp_magic(iph- > >saddr, iph- > > > > >daddr, > > > > > - skb->len - > iph->ihl*4, > > > > > + skb->len - > off, > > > > > > IPPROTO_TCP, 0); > > > > > } > > > > > break; > > > > > case IPPROTO_UDP: > > > > > - if (!skb_partial_csum_set(skb, 4 * iph->ihl, > > > > > + if (!skb_partial_csum_set(skb, off, > > > > > offsetof(struct udphdr, > check))) > > > > > goto out; > > > > > > > > > > if (recalculate_partial_csum) { > > > > > struct udphdr *udph = udp_hdr(skb); > > > > > + > > > > > + header_size = skb->network_header + > > > > > + off + > > > > > + sizeof(struct udphdr); > > > > > + maybe_pull_tail(skb, header_size); > > > > > + > > > > > udph->check = ~csum_tcpudp_magic(iph- > >saddr, > > > > iph->daddr, > > > > > - skb->len - > iph->ihl*4, > > > > > + skb->len - > off, > > > > > > IPPROTO_UDP, 0); > > > > > } > > > > > break; > > > > > default: > > > > > if (net_ratelimit()) > > > > > - pr_err("Attempting to checksum a non- > TCP/UDP > > > > packet, dropping a protocol %d packet\n", > > > > > + pr_err("Attempting to checksum a non- > TCP/UDP > > > > packet, " > > > > > + "dropping a protocol %d packet\n", > > > > > > > > I think Linux's coding style explicitly relaxes the 80-column limit for > > > > string literals. > > > > > > > > > > Ok. Good to know. > > > > > > > > iph->protocol); > > > > > goto out; > > > > > } > > > > > @@ -922,6 +937,158 @@ out: > > > > > return err; > > > > > } > > > > > > > > > > +static int checksum_setup_ipv6(struct sk_buff *skb, int > > > > recalculate_partial_csum) > > > > > +{ > > > > > + int err = -EPROTO; > > > > > + struct ipv6hdr *ipv6h = (void *)skb->data; > > > > > + u8 nexthdr; > > > > > + unsigned int header_size; > > > > > + unsigned int off; > > > > > + bool fragment; > > > > > + bool done; > > > > > + > > > > > + done = false; > > > > > + > > > > > + off = sizeof(struct ipv6hdr); > > > > > + > > > > > + header_size = skb->network_header + off; > > > > > + maybe_pull_tail(skb, header_size); > > > > > > > > Same comment about skb lengths? > > > > > > > > > + > > > > > + nexthdr = ipv6h->nexthdr; > > > > > + > > > > > + while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h- > >payload_len)) > > > > && > > > > > + !done) { > > > > > + switch (nexthdr) { > > > > > + case IPPROTO_DSTOPTS: > > > > > + case IPPROTO_HOPOPTS: > > > > > + case IPPROTO_ROUTING: { > > > > > + struct ipv6_opt_hdr *hp = (void *)(skb->data > + off); > > > > > + > > > > > + header_size = skb->network_header + > > > > > + off + > > > > > + sizeof(struct ipv6_opt_hdr); > > > > > + maybe_pull_tail(skb, header_size); > > > > > + > > > > > + nexthdr = hp->nexthdr; > > > > > + off += ipv6_optlen(hp); > > > > > + break; > > > > > + } > > > > > + case IPPROTO_AH: { > > > > > + struct ip_auth_hdr *hp = (void *)(skb->data > + off); > > > > > + > > > > > + header_size = skb->network_header + > > > > > + off + > > > > > + sizeof(struct ip_auth_hdr); > > > > > + maybe_pull_tail(skb, header_size); > > > > > + > > > > > + nexthdr = hp->nexthdr; > > > > > + off += (hp->hdrlen+2)<<2; > > > > > > > > I wonder if the core header would welcome a ipv6_ahlen cf optlen? > > > > > > > > > > Yes, it would be nicer wouldn't it. > > > > > > > > + break; > > > > > + } > > > > > + case IPPROTO_FRAGMENT: > > > > > + fragment = true; > > > > > + /* fall through */ > > > > > + default: > > > > > + done = true; > > > > > + break; > > > > > + } > > > > > + } > > > > > + > > > > > + if (!done) { > > > > > + if (net_ratelimit()) > > > > > + pr_err("Failed to parse packet header\n"); > > > > > > > > Is it a requirement of the protocol that the list of IPPROTO_* we handle > > > > must be followed by some other header? Do we really care or is it more > > > > the upper stacks problem? Presumably it can get similarly malformed > > > > packets off the wire? > > > > > > I'd have to check but I'm pretty sure none of the protos we handle can be > > > terminating so if done is not set then the packet is malformed. > > > > > > > > > > > Also you can use net_err_ratelimited. > > > > > > Ok. > > > > > > > > > > > > + goto out; > > > > > + } > > > > > + > > > > > + if (fragment) { > > > > > + if (net_ratelimit()) > > > > > + pr_err("Packet is a fragment!\n"); > > > > > > > > Is that a bad thing? > > > > > > > > > > Well, you can't do TCP or UDP checksum offload on a fragment, can you? > > > > > > > > + goto out; > > > > > + } > > > > > + > > > > > + switch (nexthdr) { > > > > > + case IPPROTO_TCP: > > > > > + if (!skb_partial_csum_set(skb, off, > > > > > + offsetof(struct tcphdr, > check))) > > > > > + goto out; > > > > > + > > > > > + if (recalculate_partial_csum) { > > > > > + struct tcphdr *tcph = tcp_hdr(skb); > > > > > + > > > > > + header_size = skb->network_header + > > > > > + off + > > > > > + sizeof(struct tcphdr); > > > > > + maybe_pull_tail(skb, header_size); > > > > > + > > > > > + tcph->check = ~csum_ipv6_magic(&ipv6h- > >saddr, > > > > > + &ipv6h->daddr, > > > > > + skb->len - off, > > > > > + IPPROTO_TCP, 0); > > > > > + } > > > > > + break; > > > > > + case IPPROTO_UDP: > > > > > + if (!skb_partial_csum_set(skb, off, > > > > > + offsetof(struct udphdr, > check))) > > > > > + goto out; > > > > > + > > > > > + if (recalculate_partial_csum) { > > > > > + struct udphdr *udph = udp_hdr(skb); > > > > > + > > > > > + header_size = skb->network_header + > > > > > + off + > > > > > + sizeof(struct udphdr); > > > > > + maybe_pull_tail(skb, header_size); > > > > > + > > > > > + udph->check = ~csum_ipv6_magic(&ipv6h- > >saddr, > > > > > + &ipv6h->daddr, > > > > > + skb->len - off, > > > > > + IPPROTO_UDP, 0); > > > > > + } > > > > > + break; > > > > > + default: > > > > > + if (net_ratelimit()) > > > > > + pr_err("Attempting to checksum a non- > TCP/UDP > > > > packet, " > > > > > + "dropping a protocol %d packet\n", > > > > > + nexthdr); > > > > > + goto out; > > > > > + } > > > > > + > > > > > + err = 0; > > > > > + > > > > > +out: > > > > > + return err; > > > > > +} > > > > > + > > > > > +static int checksum_setup(struct net_device *dev, struct sk_buff > *skb) > > > > > +{ > > > > > + int err = -EPROTO; > > > > > + int recalculate_partial_csum = 0; > > > > > + > > > > > + /* > > > > > + * A GSO SKB must be CHECKSUM_PARTIAL. However some > buggy > > > > > + * peers can fail to set NETRXF_csum_blank when sending a > GSO > > > > > + * frame. In this case force the SKB to CHECKSUM_PARTIAL > and > > > > > + * recalculate the partial checksum. > > > > > + */ > > > > > + if (skb->ip_summed != CHECKSUM_PARTIAL && > skb_is_gso(skb)) { > > > > > > > > Is it worth making this ipv4 only. Assuming that anything which adds v6 > > > > support has already fixed this bug? That would save time in > > > > checksum_setup_ipv6. > > > > > > > > > > That's true. > > > > > > > Speaking of which can checksum_setup_* be shared with netfront > > > > somehow? > > > > Or maybe even put in common code? (Perhaps as a future cleanup) > > > > > > > > > > I think it really should be in common code, but I think that would be > better > > > handled as a separate patch to remove the duplication. > > > > > > Thanks for the comprehensive review! > > > > > > Paul > > > > > > > > + struct netfront_info *np = netdev_priv(dev); > > > > > + np->rx_gso_checksum_fixup++; > > > > > + skb->ip_summed = CHECKSUM_PARTIAL; > > > > > + recalculate_partial_csum = 1; > > > > > + } > > > > > + > > > > > + /* A non-CHECKSUM_PARTIAL SKB does not require setup. > */ > > > > > + if (skb->ip_summed != CHECKSUM_PARTIAL) > > > > > + return 0; > > > > > + > > > > > + if (skb->protocol == htons(ETH_P_IP)) > > > > > + err = checksum_setup_ip(skb, > recalculate_partial_csum); > > > > > + else if (skb->protocol == htons(ETH_P_IPV6)) > > > > > + err = checksum_setup_ipv6(skb, > recalculate_partial_csum); > > > > > + > > > > > + return err; > > > > > +} > > > > > + > > > > > static int handle_incoming_queue(struct net_device *dev, > > > > [...] > > > > > > _______________________________________________ > > > Xen-devel mailing list > > > Xen-devel@xxxxxxxxxxxxx > > > http://lists.xen.org/xen-devel > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.