
[Xen-devel] [PATCH][RFC] tools: Add basic VEPA support



This patch adds basic Virtual Ethernet Port Aggregator (VEPA)
capabilities to the Xen tools code.

A Virtual Ethernet Port Aggregator (VEPA) is a capability within
a physical end station that collaborates with an adjacent, external
bridge to provide distributed bridging support between multiple
virtual end stations and external networks. The VEPA collaborates
by forwarding all station-originated frames to the adjacent bridge
for frame processing and frame relay (including so-called 'hairpin'
forwarding) and by steering and replicating frames received from
the VEPA uplink to the appropriate destinations. A VEPA may be
implemented in software or in conjunction with embedded hardware.
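
As a rough illustration of the forwarding rule described above (not code
from this patch), the decision can be sketched as a toy shell model. The
port names, MAC addresses and the helpers lookup_port and vepa_egress are
made up for the example, and the handling of unknown destinations is a
simplifying assumption. The model leaves out everything else a real
implementation needs.

```shell
#!/bin/bash
# Toy model of the VEPA forwarding decision. Illustrative only:
# the port names and MAC addresses below are assumptions.

# Map a destination MAC to a local virtual port; prints nothing if unknown.
lookup_port() {
    case "$1" in
        00:16:3e:00:00:01) echo vif1.0 ;;
        00:16:3e:00:00:02) echo vif2.0 ;;
    esac
}

# Usage: vepa_egress ingress_port dst_mac
# Prints the port(s) a frame is sent out of.
vepa_egress() {
    local ingress=$1
    local dst=$2
    local port

    if [ "$ingress" != "uplink" ]; then
        # Station-originated frames always go to the adjacent bridge,
        # even if the destination is another local guest ('hairpin').
        echo uplink
        return
    fi

    if [ "$dst" = "ff:ff:ff:ff:ff:ff" ]; then
        # Broadcast received on the uplink: replicate to all local ports.
        echo "vif1.0 vif2.0"
        return
    fi

    # Unicast received on the uplink: steer to the matching local port.
    # Unknown destinations produce no output in this simplified model.
    port=$(lookup_port "$dst")
    [ -n "$port" ] && echo "$port"
    return 0
}

vepa_egress vif1.0 00:16:3e:00:00:02   # guest-to-guest traffic leaves via the uplink
vepa_egress uplink 00:16:3e:00:00:02   # and comes back steered to the destination
```

The point of the model is the first branch: unlike a standard bridge, a
VEPA never forwards directly between two local virtual ports.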

In particular, the patch extends the Xen tools networking scripts
so that guest domains can be attached to a VEPA instead of the
standard Ethernet bridge. By default, the scripts create one or more
VEPAs on the system and configure the physical network interface(s)
as the VEPA uplink port(s).
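
To make the default behaviour concrete, here is a minimal dry-run sketch
of how each physical interface maps to its own VEPA. It follows the naming
used by network-vepas in this patch (ethN becomes the uplink of vepaN).
The helper print_vepa_invocations is hypothetical; it only prints the
calls and does not touch the network.

```shell
#!/bin/bash
# Dry-run sketch: print the network-vepa call for each ethN interface.
# print_vepa_invocations is a hypothetical helper, not part of the patch.

# Usage: print_vepa_invocations (start|stop|status) ifname...
print_vepa_invocations() {
    local op=$1
    shift
    local netdev devnum
    for netdev in "$@"; do
        devnum=${netdev#eth}
        # Skip names that do not start with "eth".
        [ "$devnum" = "$netdev" ] && continue
        # Skip names whose suffix is empty or not purely numeric.
        case "$devnum" in
            ''|*[!0-9]*) continue ;;
        esac
        echo "network-vepa $op netdev=$netdev bridge=vepa$devnum"
    done
}

print_vepa_invocations start eth0 lo eth1 vif1.0
```

With the interface names given above, only eth0 and eth1 produce a call;
lo and vif1.0 are skipped.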

This patch relies on VEPA capabilities in the Xen Dom0 kernel,
which are provided by the separate patch 'net/bridge: Add basic
VEPA support to Xen Dom0'.

Configuration of VEPA capabilities through Linux userspace bridge
utilities is provided by an additional patch 'bridge-utils: add
basic VEPA support'.
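
For orientation, the brctl calls that the scripts end up issuing for one
VEPA look roughly like the sequence printed by the sketch below. The
'vepa' and 'vepauplink' subcommands are the ones this patch calls and are
only available with the bridge-utils patch applied; stock brctl does not
have them. The helper vepa_setup_commands is hypothetical and prints the
commands rather than running them, and the sequence omits the address and
route handling done by network-vepa.

```shell
#!/bin/bash
# Print (do not execute) the brctl commands used to set up one VEPA.
# vepa_setup_commands is a hypothetical helper used only for illustration.

# Usage: vepa_setup_commands [bridge] [netdev]
vepa_setup_commands() {
    local bridge=${1:-vepa0}
    local netdev=${2:-eth0}

    # Create the bridge device and attach the physical interface.
    echo "brctl addbr $bridge"
    echo "brctl addif $bridge $netdev"
    # These two subcommands require the companion bridge-utils patch.
    echo "brctl vepa $bridge on"
    echo "brctl vepauplink $bridge $netdev"
}

vepa_setup_commands vepa0 eth0
```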

You can find additional information on VEPA here:
http://tech.groups.yahoo.com/group/evb/
http://www.ieee802.org/1/files/public/docs2009/new-hudson-vepa_seminar-20090514d.pdf

Signed-off-by: Paul Congdon <paul.congdon@xxxxxx>
Signed-off-by: Anna Fischer <anna.fischer@xxxxxx>

---

This patch follows some of the suggested changes introduced by 
Zhigang Wang's recently published patch 'Change default network 
schema in network-bridge'.

---

diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -10,6 +10,7 @@
 XEN_SCRIPTS = network-bridge vif-bridge
 XEN_SCRIPTS += network-route vif-route
 XEN_SCRIPTS += network-nat vif-nat
+XEN_SCRIPTS += network-vepa network-vepas
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS += vtpm vtpm-delete
diff --git a/tools/hotplug/Linux/network-vepa b/tools/hotplug/Linux/network-vepa
new file mode 100644
--- /dev/null
+++ b/tools/hotplug/Linux/network-vepa
@@ -0,0 +1,299 @@
+#!/bin/bash
+#============================================================================
+# Xen network start/stop script for VEPA mode.
+# Xend calls a network script when it starts.
+# The script name to use is defined in /etc/xen/xend-config.sxp
+# in the network-script field.
+#
+# This script creates a bridge in VEPA mode (default vepa0), adds a device
+# (defaults to the device on the default gateway route) to it and sets that
+# device as the VEPA uplink port, then copies the IP addresses from the
+# device to the VEPA and adjusts the routes accordingly.
+#
+# If all goes well, this should ensure that networking stays up.
+# However, some configurations are upset by this, especially
+# NFS roots. If the bridged setup does not meet your needs,
+# configure a different script, for example using routing instead.
+#
+# Usage:
+#
+# network-vepa (start|stop|status) {VAR=VAL}*
+#
+# Vars:
+#
+# bridge     The bridge to use in VEPA mode (default vepa0).
+# netdev     The interface to add to the VEPA (default gateway device or eth0).
+#
+# start:
+# Creates the bridge in VEPA mode
+# Copies the IP and MAC addresses from netdev to VEPA
+# Enslaves netdev to VEPA
+#
+# stop:
+# Removes netdev from the VEPA
+# Transfers addresses, routes from VEPA to netdev
+# Deletes VEPA
+#
+# status:
+# Print addresses, interfaces, routes
+#
+#============================================================================
+
+
+dir=$(dirname "$0")
+. "$dir/xen-script-common.sh"
+. "$dir/xen-network-common.sh"
+
+findCommand "$@"
+evalVariables "$@"
+
+bridge=${bridge:-vepa0}
+
+is_network_root () {
+    local rootfs=$(awk '{ if ($1 !~ /^[ \t]*#/ && $2 == "/") { print $3; }}' /etc/mtab)
+    local rootopts=$(awk '{ if ($1 !~ /^[ \t]*#/ && $2 == "/") { print $4; }}' /etc/mtab)
+
+    [[ "$rootfs" =~ ^nfs ]] || [[ "$rootopts" =~ "_netdev" ]] && has_nfsroot=1 || has_nfsroot=0
+    if [ $has_nfsroot -eq 1 ]; then
+        local bparms=$(cat /proc/cmdline)
+        for p in $bparms; do
+            local ipaddr=$(echo $p | awk /nfsroot=/'{ print substr($1,9,index($1,":")-9) }')
+            if [ "$ipaddr" != "" ]; then
+                local nfsdev=$(ip route get $ipaddr | awk /$ipaddr/'{ print $3 }')
+                [[ "$nfsdev" == "$netdev" ]] && return 0 || return 1
+            fi
+        done
+    fi
+    return 1
+}
+
+find_alt_device () {
+    local interf=$1
+    local prefix=${interf%[[:digit:]]}
+    local ifs=$(ip link show | grep " $prefix" |\
+                gawk '{ printf ("%s",substr($2,1,length($2)-1)) }' |\
+                sed s/$interf//)
+    echo "$ifs"
+}
+
+get_ip_info() {
+    addr_pfx=`ip addr show dev $1 | egrep '^ *inet' | sed -e 's/ *inet //' -e 's/ .*//'`
+    gateway=`ip route show dev $1 | fgrep default | sed 's/default via //'`
+}
+
+do_ifup() {
+    if [ $1 != "${netdev}" ] || ! ifup $1 ; then
+        if [ -n "$addr_pfx" ] ; then
+            # use the info from get_ip_info()
+            ip addr flush $1
+            ip addr add ${addr_pfx} dev $1
+        fi
+        ip link set dev $1 up
+        [ -n "$gateway" ] && ip route add default via ${gateway}
+    fi
+}
+
+# Usage: transfer_addrs src dst
+# Copy all IP addresses (including aliases) from device $src to device $dst.
+transfer_addrs () {
+    local src=$1
+    local dst=$2
+    # Don't bother if $dst already has IP addresses.
+    if ip addr show dev ${dst} | egrep -q '^ *inet ' ; then
+        return
+    fi
+    # Address lines start with 'inet' and have the device in them.
+    # Replace 'inet' with 'ip addr add' and change the device name $src
+    # to 'dev $dst label $dst'.
+    ip addr show dev ${src} | egrep '^ *inet ' | sed -e "
+s/inet/ip addr add/
+s@\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+/[0-9]\+\)@\1@
+s/${src}/dev ${dst} label ${dst}/
+s/secondary//
+" | sh -e
+    # Remove automatic routes on destination device
+    ip route list | sed -ne "
+/dev ${dst}\( \|$\)/ {
+  s/^/ip route del /
+  p
+}" | sh -e
+}
+
+# Usage: transfer_routes src dst
+# Get all IP routes to device $src, delete them, and
+# add the same routes to device $dst.
+# The original routes have to be deleted, otherwise adding them
+# for $dst fails (duplicate routes).
+transfer_routes () {
+    local src=$1
+    local dst=$2
+    # List all routes and grep the ones with $src in.
+    # Stick 'ip route del' on the front to delete.
+    # Change $src to $dst and use 'ip route add' to add.
+    ip route list | sed -ne "
+/dev ${src}\( \|$\)/ {
+  h
+  s/^/ip route del /
+  P
+  g
+  s/${src}/${dst}/
+  s/^/ip route add /
+  P
+  d
+}" | sh -e
+}
+
+
+##
+# link_exists interface
+#
+# Returns 0 if the interface named exists (whether up or down), 1 otherwise.
+#
+link_exists()
+{
+    if ip link show "$1" >/dev/null 2>/dev/null
+    then
+        return 0
+    else
+        return 1
+    fi
+}
+
+op_status () {
+    netdev=${netdev:-$(brctl show | awk /$bridge/'{print $4}')}
+    echo '============================================================'
+    if [ -n "${netdev}" ]; then
+        ip addr show ${netdev}
+    fi
+    ip addr show ${bridge}
+    echo ' '
+    brctl showstp ${bridge}
+    echo ' '
+    ip route list
+    echo ' '
+    route -n
+    echo '============================================================'
+}
+
+op_start () {
+    netdev=${netdev:-$(ip route list 0.0.0.0/0  | \
+                       sed 's/.*dev \([a-z]\+[0-9]\+\).*$/\1/')}
+    if is_network_root ; then
+        altdevs=$(find_alt_device $netdev)
+        for netdev in $altdevs; do break; done
+        if [ -z "$netdev" ]; then
+            [ -x /usr/bin/logger ] && /usr/bin/logger "network-vepa: bridging not supported on network root; not starting"
+            exit
+        fi
+    fi
+    netdev=${netdev:-eth0}
+
+    if [ "${bridge}" = "null" ] ; then
+       return
+    fi
+
+    if link_exists "$bridge"; then
+        # The device is already up.
+        return
+    fi
+
+    create_bridge ${bridge}
+
+    preiftransfer ${netdev}
+    transfer_addrs ${netdev} ${bridge}
+    # Remember slaves for bonding interface.
+    if [ -e /sys/class/net/${netdev}/bonding/slaves ]; then
+       slaves=`cat /sys/class/net/${netdev}/bonding/slaves`
+    fi
+    # Remember the IP details for do_ifup.
+    get_ip_info ${netdev}
+    if ! ifdown ${netdev}; then
+       ip link set ${netdev} down
+       ip addr flush ${netdev}
+    fi
+
+    setup_bridge_port ${netdev}
+
+    # Restore slaves
+    if [ -n "${slaves}" ]; then
+       ip link set ${netdev} up
+       ifenslave ${netdev} ${slaves}
+    fi
+    add_to_bridge2 ${bridge} ${netdev}
+
+    # VEPA configuration
+    config_bridge_vepa ${bridge} on
+    set_vepa_uplink ${bridge} ${netdev}
+
+    do_ifup ${bridge}
+}
+
+op_stop () {
+    if [ "${bridge}" = "null" ]; then
+       return
+    fi
+    if ! link_exists "$bridge"; then
+       return
+    fi
+
+    netdev=${netdev:-$(brctl show | awk /$bridge/'{print $4}')}
+    if [ -z "${netdev}" ]; then
+       return
+    fi
+
+    transfer_addrs ${bridge} ${netdev}
+    if ! ifdown ${bridge}; then
+       get_ip_info ${bridge}
+    fi
+    ip link set ${netdev} down
+    ip addr flush ${bridge}
+
+    brctl delif ${bridge} ${netdev}
+    ip link set ${bridge} down
+
+    do_ifup ${netdev}
+
+    brctl delbr ${bridge}
+}
+
+# adds $dev to $bridge but waits for $dev to be in running state first
+add_to_bridge2() {
+    local bridge=$1
+    local dev=$2
+    local maxtries=10
+
+    echo -n "Waiting for ${dev} to negotiate link."
+    ip link set ${dev} up
+    for i in `seq ${maxtries}` ; do
+       if ifconfig ${dev} | grep -q RUNNING ; then
+           break
+       else
+           echo -n '.'
+           sleep 1
+       fi
+    done
+
+    if [ ${i} -eq ${maxtries} ] ; then echo -n "(link isn't in running state)" ; fi
+    echo
+
+    add_to_bridge ${bridge} ${dev}
+}
+
+case "$command" in
+    start)
+       op_start
+       ;;
+
+    stop)
+       op_stop
+       ;;
+
+    status)
+       op_status
+       ;;
+
+    *)
+       echo "Unknown command: $command" >&2
+       echo 'Valid commands are: start, stop, status' >&2
+       exit 1
+esac
diff --git a/tools/hotplug/Linux/network-vepas b/tools/hotplug/Linux/network-vepas
new file mode 100644
--- /dev/null
+++ b/tools/hotplug/Linux/network-vepas
@@ -0,0 +1,19 @@
+#!/bin/bash
+#
+# Runs network-vepa against each ethernet card.
+#
+
+dir=$(dirname "$0")
+
+run_all_ethernets()
+{
+    for f in /sys/class/net/*; do
+        netdev=$(basename $f)
+        if [[ $netdev =~ ^eth[0-9]+$ ]]; then
+            devnum=${netdev:3}
+            $dir/network-vepa "$@" "netdev=${netdev}" "bridge=vepa${devnum}"
+        fi
+    done
+}
+
+run_all_ethernets "$@"
diff --git a/tools/hotplug/Linux/xen-network-common.sh b/tools/hotplug/Linux/xen-network-common.sh
--- a/tools/hotplug/Linux/xen-network-common.sh
+++ b/tools/hotplug/Linux/xen-network-common.sh
@@ -102,6 +102,24 @@
     fi
 }
 
+# Usage: config_bridge_vepa bridge [on|off]
+config_bridge_vepa() {
+    local bridge=$1
+    if [ -e "/sys/class/net/${bridge}/bridge" ]; then
+       brctl vepa  ${bridge} $2
+    fi
+}
+
+# Usage: set_vepa_uplink bridge port
+set_vepa_uplink() {
+    local bridge=$1
+    local port=$2
+
+    if [ -e "/sys/class/net/${bridge}/bridge" ]; then
+       brctl vepauplink ${bridge} ${port}
+    fi
+}
+
 # Usage: add_to_bridge bridge dev
 add_to_bridge () {
     local bridge=$1

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

