[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-API] Re: openvswitch problem with XCP 1.0 RC3
On Tuesday 15 March 2011 20:23:21 Ben Pfaff wrote: > Ben Pfaff <blp@xxxxxxxxxxxxxxx> writes: > > Testing is taking longer than I expected, but we're still > > planning to release this soon. > > Here's the tested version of the kernel patch that we already > passed along to Citrix. I have already posted the corresponding > OVS patch to ovs-dev here: > http://openvswitch.org/pipermail/dev/2011-March/028884.html > > --8<--------------------------cut here-------------------------->8-- > > From: Ben Pfaff <blp@xxxxxxxxxx> > Date: Fri, 25 Feb 2011 16:47:48 -0800 > Subject: [PATCH] 8021q: Add kluge to support 802.1Q VLANs on devices with > buggy drivers. > > Some Linux network drivers support a feature called "VLAN > acceleration", associated with a data structure called a "vlan_group". > A vlan_group is, abstractly, a dictionary that maps from a VLAN ID (in > the range 0...4095) to a VLAN device, that is, a Linux network device > associated with a particular VLAN, e.g. "eth0.9" for VLAN 9 on eth0. > > Some drivers that support VLAN acceleration have bugs that fall > roughly into the following categories: > > * Some NICs strip VLAN tags on receive if no vlan_group is > registered, so that the tag is completely lost. > > * Some drivers size their receive buffers based on whether a > vlan_group is enabled, meaning that a maximum size packet with a > VLAN tag will not fit if a vlan_group is not configured. > > * On transmit some drivers expect that VLAN acceleration will be > used if it is available (which can only be done if a vlan_group > is configured). In these cases, the driver may fail to parse > the packet and correctly setup checksum offloading and/or TSO. > > The correct long term solution is to fix these driver bugs. FYI: VLAN trunks work for us with igb, e1000, e1000e with all versions of XCP and openvswitch-1.1.0pre2. The limitation is the incorrect receive buffer size which must be adjusted manually. The drivers are patched to avoid transmit checksum errors, see http://patchwork.ozlabs.org/patch/24670/, but note that the patch is buggy in line 19. > > This commit implements a workaround, which has been tested to work with the > "igb" driver, which does not otherwise work with VLANs on Open vSwitch: > > * Implement a new VLAN device ioctl that sets up a vlan_group in > which all 4096 VLANs point back to the physical device. > > * In the code that handles received VLAN-accelerated packets, save > the VLAN on which the packet was received to a new sk_buff > member before clearing the VLAN number. > > * Check the value of the new member in the OVS kernel module > packet receive code. > > This requires only minimal, low-risk changes to the core of the kernel > network stack, as well as minimal changes to the Open vSwitch kernel > module. > > Driver Details > -------------- > > The following drivers in linux-2.6.32.12-0.7.1.xs1.0.0.311.170586 are > the ones that use "vlan_group" and are relevant to Open vSwitch on > XenServer. We have not tested any version of most of these drivers, > so we do not know whether they have a VLAN problem that needs to be > fixed: > > 8139cp > acenic > amd8111e > atl1c > atl1e > atl1 > atl2 > be2net > bna > bnx2 > bnx2x > cnic > cxgb > cxgb3 > e1000 > e1000e > enic > forcedeth > gianfar > igb > igbvf > ixgb > ixgbe > jme > mlx4_en > ns83820 > qlge > r8169 > s2io > sky2 > starfire > tehuti > tg3 > typhoon > via-velocity > vxge > > Of the drivers that use "vlan_group" listed above, the following > drivers inspect vlan devices in ways that imply that the VLAN > workaround would not work for them. These drivers have not yet been > tested, so we do not know whether they need to use the VLAN workaround > anyhow: > > cnic > cxgb3 > > Of the drivers that use "vlan_group" listed above, the following > drivers appear to have bugs that would limit them to VLANs with VIDs > in the range 0...63 with the kernel patch fix. This is based on quick > code inspection, so it is not necessarily an exhaustive list. The > drivers on this list have not yet been tested, so we do not know > whether they need to use the VLAN workaround anyhow: > > via-velocity > > The following drivers use "vlan_group" but are irrelevant to Open > vSwitch on XenServer: > > bonding (not used with Open vSwitch) > ehea (cannot be built on x86; IBM Power architecture only) > stmmac (cannot be built on x86; SH4 architecture) > vmxnet3 (not shipped with XenServer; for use inside VMware VMs only) > > Signed-off-by: Ben Pfaff <blp@xxxxxxxxxx> > --- > include/linux/if_vlan.h | 5 ++- > net/8021q/vlan.c | 131 > ++++++++++++++++++++++++++++++++++++++++++++-- net/8021q/vlan_core.c | > 8 +++- > net/core/dev.c | 1 + > 4 files changed, 137 insertions(+), 8 deletions(-) > > diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h > index 46e2bf4..a0260d7 100644 > --- a/include/linux/if_vlan.h > +++ b/include/linux/if_vlan.h > @@ -84,6 +84,7 @@ struct vlan_group { > struct hlist_node hlist; /* linked list */ > struct net_device **vlan_devices_arrays[VLAN_GROUP_ARRAY_SPLIT_PARTS]; > struct rcu_head rcu; > + bool loopback; > }; > > static inline struct net_device *vlan_group_get_device(struct vlan_group > *vg, @@ -325,7 +326,9 @@ enum vlan_ioctl_cmds { > SET_VLAN_NAME_TYPE_CMD, > SET_VLAN_FLAG_CMD, > GET_VLAN_REALDEV_NAME_CMD, /* If this works, you know it's a VLAN > device, > btw */ - GET_VLAN_VID_CMD /* Get the VID of this VLAN (specified by name) > */ + GET_VLAN_VID_CMD, /* Get the VID of this VLAN (specified by name) */ > + ADD_ALL_VLANS_CMD, > + DEL_ALL_VLANS_CMD > }; > > enum vlan_flags { > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c > index bd8a167..f74d10e 100644 > --- a/net/8021q/vlan.c > +++ b/net/8021q/vlan.c > @@ -101,7 +101,8 @@ static void vlan_group_free(struct vlan_group *grp) > kfree(grp); > } > > -static struct vlan_group *vlan_group_alloc(struct net_device *real_dev) > +static struct vlan_group *vlan_group_alloc(struct net_device *real_dev, > + bool loopback) > { > struct vlan_group *grp; > > @@ -110,6 +111,7 @@ static struct vlan_group *vlan_group_alloc(struct > net_device *real_dev) return NULL; > > grp->real_dev = real_dev; > + grp->loopback = loopback; > hlist_add_head_rcu(&grp->hlist, > &vlan_group_hash[vlan_grp_hashfn(real_dev->ifindex)]); > return grp; > @@ -242,7 +244,7 @@ int register_vlan_dev(struct net_device *dev) > > grp = __vlan_find_group(real_dev); > if (!grp) { > - ngrp = grp = vlan_group_alloc(real_dev); > + ngrp = grp = vlan_group_alloc(real_dev, false); > if (!grp) > return -ENOBUFS; > err = vlan_gvrp_init_applicant(real_dev); > @@ -363,6 +365,90 @@ out_free_newdev: > return err; > } > > +/* Attach a "loopback" VLAN device to every VLAN on dev. > + * Returns 0 if successful, a negative error code otherwise. > + */ > +static int register_all_vlans(struct net_device *dev) > +{ > + struct net_device **array; > + struct vlan_group *grp; > + unsigned int size; > + int err; > + int i; > + > + ASSERT_RTNL(); > + > + if (__vlan_find_group(dev)) > + return -EBUSY; > + > + err = vlan_check_real_dev(dev, 0); > + if (err < 0) > + return err; > + > + grp = vlan_group_alloc(dev, true); > + if (!grp) > + return -ENOBUFS; > + > + size = sizeof(struct net_device *) * VLAN_GROUP_ARRAY_PART_LEN; > + array = kmalloc(size, GFP_KERNEL); > + if (array == NULL) > + return -ENOBUFS; > + for (i = 0; i < VLAN_GROUP_ARRAY_PART_LEN; i++) > + array[i] = dev; > + for (i = 0; i < VLAN_GROUP_ARRAY_SPLIT_PARTS; i++) > + grp->vlan_devices_arrays[i] = array; > + > + /* Account for reference in struct vlan_dev_info */ > + dev_hold(dev); > + > + grp->nr_vlans = VLAN_GROUP_ARRAY_LEN; > + > + if (dev->features & NETIF_F_HW_VLAN_RX) > + dev->netdev_ops->ndo_vlan_rx_register(dev, grp); > + if (dev->features & NETIF_F_HW_VLAN_FILTER) > + for (i = 0; i < VLAN_GROUP_ARRAY_LEN; i++) > + dev->netdev_ops->ndo_vlan_rx_add_vid(dev, i); > + > + return 0; > +} > + > +static int unregister_all_vlans(struct net_device *dev) > +{ > + struct vlan_group *grp; > + int i; > + > + ASSERT_RTNL(); > + > + grp = __vlan_find_group(dev); > + if (!grp || !grp->loopback) > + return -EINVAL; > + > + /* Take it out of our own structures, but be sure to interlock with > + * HW accelerating devices or SW vlan input packet processing. > + */ > + if (dev->features & NETIF_F_HW_VLAN_FILTER) > + for (i = 0; i < VLAN_GROUP_ARRAY_LEN; i++) > + dev->netdev_ops->ndo_vlan_rx_kill_vid(dev, i); > + > + for (i = 0; i < VLAN_GROUP_ARRAY_PART_LEN; i++) > + vlan_group_set_device(grp, i, NULL); > + grp->nr_vlans = 0; > + > + if (dev->features & NETIF_F_HW_VLAN_RX) > + dev->netdev_ops->ndo_vlan_rx_register(dev, NULL); > + > + hlist_del_rcu(&grp->hlist); > + > + synchronize_net(); > + kfree(grp->vlan_devices_arrays[0]); > + kfree(grp); > + > + /* Get rid of the vlan's reference to dev */ > + dev_put(dev); > + > + return 0; > +} > + > static void vlan_sync_address(struct net_device *dev, > struct net_device *vlandev) > { > @@ -444,6 +530,12 @@ static int vlan_device_event(struct notifier_block > *unused, unsigned long event, * as we run under the RTNL lock. > */ > > + if (grp->loopback) { > + if (event == NETDEV_UNREGISTER) > + unregister_all_vlans(dev); > + goto out; > + } > + > switch (event) { > case NETDEV_CHANGE: > /* Propagate real device state to vlan devices */ > @@ -581,13 +673,18 @@ static int vlan_ioctl_handler(struct net *net, void > __user *arg) case DEL_VLAN_CMD: > case GET_VLAN_REALDEV_NAME_CMD: > case GET_VLAN_VID_CMD: > + case ADD_ALL_VLANS_CMD: > + case DEL_ALL_VLANS_CMD: > err = -ENODEV; > dev = __dev_get_by_name(net, args.device1); > if (!dev) > goto out; > > err = -EINVAL; > - if (args.cmd != ADD_VLAN_CMD && !is_vlan_dev(dev)) > + if (args.cmd != ADD_VLAN_CMD && > + args.cmd != ADD_ALL_VLANS_CMD && > + args.cmd != DEL_ALL_VLANS_CMD && > + !is_vlan_dev(dev)) > goto out; > } > > @@ -667,6 +764,20 @@ static int vlan_ioctl_handler(struct net *net, void > __user *arg) err = -EFAULT; > break; > > + case ADD_ALL_VLANS_CMD: > + err = -EPERM; > + if (!capable(CAP_NET_ADMIN)) > + break; > + err = register_all_vlans(dev); > + break; > + > + case DEL_ALL_VLANS_CMD: > + err = -EPERM; > + if (!capable(CAP_NET_ADMIN)) > + break; > + err = unregister_all_vlans(dev); > + break; > + > default: > err = -EOPNOTSUPP; > break; > @@ -769,9 +880,17 @@ static void __exit vlan_cleanup_module(void) > > dev_remove_pack(&vlan_packet_type); > > - /* This table must be empty if there are no module references left. */ > - for (i = 0; i < VLAN_GRP_HASH_SIZE; i++) > - BUG_ON(!hlist_empty(&vlan_group_hash[i])); > + for (i = 0; i < VLAN_GRP_HASH_SIZE; i++) { > + struct hlist_head *hlist = &vlan_group_hash[i]; > + while (!hlist_empty(hlist)) { > + struct vlan_group *grp; > + int err; > + > + grp = hlist_entry(hlist->first, struct vlan_group, > hlist); > + err = unregister_all_vlans(grp->real_dev); > + BUG_ON(err); > + } > + } > > unregister_pernet_gen_device(vlan_net_id, &vlan_net_ops); > rcu_barrier(); /* Wait for completion of call_rcu()'s */ > diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c > index 7f7de1a..8327334 100644 > --- a/net/8021q/vlan_core.c > +++ b/net/8021q/vlan_core.c > @@ -33,6 +33,11 @@ int vlan_hwaccel_do_receive(struct sk_buff *skb) > struct net_device *dev = skb->dev; > struct net_device_stats *stats; > > + if (!is_vlan_dev(dev)) { > + skb->pkt_type = PACKET_OTHERHOST; > + return 0; > + } > + > skb->dev = vlan_dev_info(dev)->real_dev; > netif_nit_deliver(skb); > > @@ -90,7 +95,8 @@ static int vlan_gro_common(struct napi_struct *napi, > struct vlan_group *grp, > > for (p = napi->gro_list; p; p = p->next) { > NAPI_GRO_CB(p)->same_flow = > - p->dev == skb->dev && !compare_ether_header( > + p->dev == skb->dev && p->vlan_tci == skb->vlan_tci && > + !compare_ether_header( > skb_mac_header(p), skb_gro_mac_header(skb)); > NAPI_GRO_CB(p)->flush = 0; > } > diff --git a/net/core/dev.c b/net/core/dev.c > index 0b5bf64..4c99b87 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2742,6 +2742,7 @@ void napi_reuse_skb(struct napi_struct *napi, struct > sk_buff *skb) { > __skb_pull(skb, skb_headlen(skb)); > skb_reserve(skb, NET_IP_ALIGN - skb_headroom(skb)); > + skb->vlan_tci = 0; > > napi->skb = skb; > } _______________________________________________ xen-api mailing list xen-api@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/mailman/listinfo/xen-api
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |