
Re: [Xen-devel] [PATCH v10 4/5] remus: implement remus network buffering for nic devices



On Wed, Jun 4, 2014 at 8:34 PM, Yang Hongyang <yanghy@xxxxxxxxxxxxxx> wrote:
1. Add two members in libxl_domain_remus_info:
   netbuf: whether netbuf is enabled
   netbufscript: the path of the script which will be run to set up
     and tear down the guest's interface.
2. Introduce the remus-netbuf-setup hotplug script, responsible for
 setting up and tearing down the infrastructure required for
 network output buffering in Remus. This script is intended to be invoked
 by libxl for each guest interface when starting or stopping Remus.

 Apart from returning a success/failure indication via the usual hotplug
 entries in xenstore, this script also writes to xenstore the name of
 the IFB device to be used to control the vif's network output.

 The script relies on libnl3 command line utilities to perform various
 setup/teardown functions. The script is confined to Linux platforms only
 since NetBSD does not seem to have libnl3.

 The following steps are taken during init:
   a) establish a dedicated remus context containing libnl related
      state (netlink sockets, qdisc caches, etc.)

 The following steps are taken for each vif during setup:
   a) call the hotplug script to set up its network buffer

   b) obtain handles to plug qdiscs installed on the IFB devices
      chosen by the hotplug scripts.

 And during teardown, the netlink resources are released, followed by
 invocation of hotplug scripts to remove the ifb devices.
3. Implement the remus device interface: setup, teardown, etc.

Signed-off-by: Shriram Rajagopalan <rshriram@xxxxxxxxx>
Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxx>
Reviewed-by: Wen Congyang <wency@xxxxxxxxxxxxxx>
---
 docs/misc/xenstore-paths.markdown      |   4 +
 tools/hotplug/Linux/Makefile           |   1 +
 tools/hotplug/Linux/remus-netbuf-setup | 183 ++++++++++++
 tools/libxl/libxl.c                    |  18 ++
 tools/libxl/libxl.h                    |  13 +
 tools/libxl/libxl_internal.h           |   3 +
 tools/libxl/libxl_netbuffer.c          | 519 +++++++++++++++++++++++++++++++++
 tools/libxl/libxl_nonetbuffer.c        |  67 +++++
 tools/libxl/libxl_remus_device.c       |  22 +-
 tools/libxl/libxl_types.idl            |   2 +
 10 files changed, 831 insertions(+), 1 deletion(-)
 create mode 100644 tools/hotplug/Linux/remus-netbuf-setup

diff --git a/docs/misc/xenstore-paths.markdown b/docs/misc/xenstore-paths.markdown
index 70ab7f4..039eaea 100644
--- a/docs/misc/xenstore-paths.markdown
+++ b/docs/misc/xenstore-paths.markdown
@@ -385,6 +385,10 @@ The guest's virtual time offset from UTC in seconds.

 The device model version for a domain.

+#### /libxl/$DOMID/remus/netbuf/$DEVID/ifb = STRING [n,INTERNAL]
+
+ifb device used by Remus to buffer network output from the associated vif.
+
 [BLKIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,blkif.h.html
 [FBIF]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,io,fbif.h.html
 [HVMPARAMS]: http://xenbits.xen.org/docs/unstable/hypercall/include,public,hvm,params.h.html
diff --git a/tools/hotplug/Linux/Makefile b/tools/hotplug/Linux/Makefile
index 4874ec5..13e1f5f 100644
--- a/tools/hotplug/Linux/Makefile
+++ b/tools/hotplug/Linux/Makefile
@@ -15,6 +15,7 @@ XEN_SCRIPTS += vif-nat
 XEN_SCRIPTS += vif-openvswitch
 XEN_SCRIPTS += vif2
 XEN_SCRIPTS += vif-setup
+XEN_SCRIPTS-$(CONFIG_REMUS_NETBUF) += remus-netbuf-setup
 XEN_SCRIPTS += block
 XEN_SCRIPTS += block-enbd block-nbd
 XEN_SCRIPTS-$(CONFIG_BLKTAP1) += blktap
diff --git a/tools/hotplug/Linux/remus-netbuf-setup b/tools/hotplug/Linux/remus-netbuf-setup
new file mode 100644
index 0000000..aed2583
--- /dev/null
+++ b/tools/hotplug/Linux/remus-netbuf-setup
@@ -0,0 +1,183 @@
+#!/bin/bash
+#============================================================================
+# ${XEN_SCRIPT_DIR}/remus-netbuf-setup
+#
+# Script for attaching a network buffer to the specified vif (in any mode).
+# The hotplugging system will call this script when starting remus via libxl
+# API, libxl_domain_remus_start.
+#
+# Usage:
+# remus-netbuf-setup (setup|teardown)
+#
+# Environment vars:
+# vifname     vif interface name (required).
+# XENBUS_PATH path in Xenstore, where the IFB device details will be stored
+#                      or read from (required).
+#             (libxl passes /libxl/<domid>/remus/netbuf/<devid>)
+# IFB         ifb interface to be cleaned up (required). [for teardown op only]
+
+# Written to the store: (setup operation)
+# XENBUS_PATH/ifb=<ifbdevName> the IFB device serving
+#  as the intermediate buffer through which the interface's network output
+#  can be controlled.
+#
+# To install a network buffer on a guest vif (vif1.0) using ifb (ifb0)
+# we need to do the following
+#
+#  ip link set dev ifb0 up
+#  tc qdisc add dev vif1.0 ingress
+#  tc filter add dev vif1.0 parent ffff: proto ip \
+#    prio 10 u32 match u32 0 0 action mirred egress redirect dev ifb0
+#  nl-qdisc-add --dev=ifb0 --parent root plug
+#  nl-qdisc-add --dev=ifb0 --parent root --update plug --limit=10000000
+#                                                (10MB limit on buffer)
+#
+# So order of operations when installing a network buffer on vif1.0
+# 1. find a free ifb and bring up the device
+# 2. redirect traffic from vif1.0 to ifb:
+#   2.1 add ingress qdisc to vif1.0 (to capture outgoing packets from guest)
+#   2.2 use tc filter command with actions mirred egress + redirect
+# 3. install plug_qdisc on ifb device, with which we can buffer/release
+#    guest's network output from vif1.0
+#
+#
+
+#============================================================================
+
+# Unlike other vif scripts, vif-common is not needed here as it executes vif
+#specific setup code such as renaming.
+dir=$(dirname "$0")
+. "$dir/xen-hotplug-common.sh"
+
+findCommand "$@"
+
+if [ "$command" != "setup" -a  "$command" != "teardown" ]
+then
+  echo "Invalid command: $command"
+  log err "Invalid command: $command"
+  exit 1
+fi
+
+evalVariables "$@"
+
+: ${vifname:?}
+: ${XENBUS_PATH:?}
+
+check_libnl_tools() {
+    if ! command -v nl-qdisc-list > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-list tool"
+    fi
+    if ! command -v nl-qdisc-add > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-add tool"
+    fi
+    if ! command -v nl-qdisc-delete > /dev/null 2>&1; then
+        fatal "Unable to find nl-qdisc-delete tool"
+    fi
+}
+
+# We only check for modules. We don't load them.
+# User/Admin is supposed to load ifb during boot time,
+# ensuring that there are enough free ifbs in the system.
+# Other modules will be loaded automatically by tc commands.
+check_modules() {
+    for m in ifb sch_plug sch_ingress act_mirred cls_u32
+    do
+        if ! modinfo $m > /dev/null 2>&1; then
+            fatal "Unable to find $m kernel module"
+        fi
+    done
+}
+
+setup_ifb() {
+
+    for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1`
+    do
+        local installed=`nl-qdisc-list -d $ifb`
+        [ -n "$installed" ] && continue
+        IFB="$ifb"
+        break
+    done
+
+    if [ -z "$IFB" ]
+    then
+        fatal "Unable to find a free IFB device for $vifname"
+    fi
+
+    do_or_die ip link set dev "$IFB" up
+}
+
+redirect_vif_traffic() {
+    local vif=$1
+    local ifb=$2
+
+    do_or_die tc qdisc add dev "$vif" ingress
+
+    tc filter add dev "$vif" parent ffff: proto ip prio 10 \
+        u32 match u32 0 0 action mirred egress redirect dev "$ifb" >/dev/null 2>&1
+
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to redirect traffic from $vif to $ifb"
+    fi
+}
+
+add_plug_qdisc() {
+    local vif=$1
+    local ifb=$2
+
+    nl-qdisc-add --dev="$ifb" --parent root plug >/dev/null 2>&1
+    if [ $? -ne 0 ]
+    then
+        do_without_error tc qdisc del dev "$vif" ingress
+        fatal "Failed to add plug qdisc to $ifb"
+    fi
+
+    #set ifb buffering limit in bytes. Its okay if this command fails
+    nl-qdisc-add --dev="$ifb" --parent root \
+        --update plug --limit=10000000 >/dev/null 2>&1 || true
+}
+
+teardown_netbuf() {
+    local vif=$1
+    local ifb=$2
+
+    if [ "$ifb" ]; then
+        do_without_error ip link set dev "$ifb" down
+        do_without_error nl-qdisc-delete --dev="$ifb" --parent root plug >/dev/null 2>&1
+        xenstore-rm -t "$XENBUS_PATH/ifb" 2>/dev/null || true
+    fi
+    do_without_error tc qdisc del dev "$vif" ingress
+    xenstore-rm -t "$XENBUS_PATH/hotplug-status" 2>/dev/null || true
+    xenstore-rm -t "$XENBUS_PATH/hotplug-error" 2>/dev/null || true
+}
+
+xs_write_failed() {
+    local vif=$1
+    local ifb=$2
+    teardown_netbuf "$vifname" "$IFB"
+    fatal "failed to write ifb name to xenstore"
+}
+
+case "$command" in
+    setup)
+        check_libnl_tools
+        check_modules
+
+        claim_lock "pickifb"
+        setup_ifb
+        redirect_vif_traffic "$vifname" "$IFB"
+        add_plug_qdisc "$vifname" "$IFB"
+        release_lock "pickifb"
+
+        #not using xenstore_write that automatically exits on error
+        #because we need to cleanup
+        _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB"
+        success
+        ;;
+    teardown)
+        teardown_netbuf "$vifname" "$IFB"
+        ;;
+esac
+
+log debug "Successful remus-netbuf-setup $command for $vifname, ifb $IFB."
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0cdf348..2701ebe 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -764,6 +764,24 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,

     /* Convenience aliases */
     libxl__remus_state *const rs = &dss->rs;
+
+    /* Setup network buffering */
+    if (info->netbuf) {
+        if (!libxl__netbuffer_enabled(gc)) {
+            LOG(ERROR, "Remus: No support for network buffering");
+            goto out;
+        }
+
+        if (info->netbufscript) {
+            rs->netbufscript =
+                libxl__strdup(gc, info->netbufscript);
+        } else {
+            rs->netbufscript =
+                GCSPRINTF("%s/remus-netbuf-setup",
+                libxl__xen_script_dir_path());
+        }
+    }
+
     rs->ao = ao;
     rs->domid = domid;
     rs->saved_rc = 0;
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 80947c3..db30a97 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -437,6 +437,19 @@
 #define LIBXL_HAVE_DRIVER_DOMAIN_CREATION 1

 /*
+ * LIBXL_HAVE_REMUS_NETBUF 1
+ *
+ * If this is defined, then the libxl_domain_remus_info structure will
+ * have a boolean field (netbuf) and a string field (netbufscript).
+ *
+ * netbuf, if true, indicates that network buffering should be enabled.
+ *
+ * netbufscript, if set, indicates the path to the hotplug script to
+ * setup or teardown network buffers.
+ */
+#define LIBXL_HAVE_REMUS_NETBUF 1
+
+/*
 * LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
 *
 * If this is defined:
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 20601b2..f221f97 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2517,6 +2517,7 @@ struct libxl__remus_device_state {
     /* devices that have been setuped */
     libxl__remus_device **dev;

+    libxl_device_nic *nics;
     int num_nics;
     int num_disks;

@@ -2555,6 +2556,8 @@ struct libxl__remus_state {
     libxl__ao *ao;
     uint32_t domid;
     libxl__remus_callback *callback;
+    /* Script to setup/teardown network buffers */
+    const char *netbufscript;

     /* private */
     int saved_rc;
diff --git a/tools/libxl/libxl_netbuffer.c b/tools/libxl/libxl_netbuffer.c
index 8e23d75..8729a3f 100644
--- a/tools/libxl/libxl_netbuffer.c
+++ b/tools/libxl/libxl_netbuffer.c
@@ -17,11 +17,530 @@

 #include "libxl_internal.h"

+#include <netlink/cache.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/route/link.h>
+#include <netlink/route/route.h>
+#include <netlink/route/qdisc.h>
+#include <netlink/route/qdisc/plug.h>
+
+typedef struct libxl__remus_netbuf_state {
+    libxl__ao *ao;
+    uint32_t domid;
+    const char *netbufscript;
+
+    struct nl_sock *nlsock;
+    struct nl_cache *qdisc_cache;
+} libxl__remus_netbuf_state;
+
+typedef struct libxl__remus_device_nic {
+    const char *vif;
+    const char *ifb;
+    struct rtnl_qdisc *qdisc;
+} libxl__remus_device_nic;
+
 int libxl__netbuffer_enabled(libxl__gc *gc)
 {
     return 1;
 }

+/* If the device has a vifname, then use that instead of
+ * the vifX.Y format.
+ */
+static const char *get_vifname(libxl__remus_device *dev,
+                               const libxl_device_nic *nic)
+{
+    libxl__remus_netbuf_state *netbuf_state = dev->ops->data;
+    const char *vifname = NULL;
+    const char *path;
+    int rc;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    /* Convenience aliases */
+    const uint32_t domid = netbuf_state->domid;
+
+    path = libxl__sprintf(gc, "%s/backend/vif/%d/%d/vifname",
+                          libxl__xs_get_dompath(gc, 0), domid, nic->devid);
+    rc = libxl__xs_read_checked(gc, XBT_NULL, path, &vifname);
+    if (!rc && !vifname) {
+        /* use the default name */
+        vifname = libxl__device_nic_devname(gc, domid,
+                                            nic->devid,
+                                            nic->nictype);
+    }
+
+    return vifname;
+}
+
+static void free_qdisc(libxl__remus_device_nic *remus_nic)
+{
+    /* free qdiscs */
+    if (remus_nic->qdisc == NULL)
+        return;
+
+    nl_object_put((struct nl_object *)(remus_nic->qdisc));
+    remus_nic->qdisc = NULL;
+}
+
+static int init_qdisc(libxl__remus_netbuf_state *netbuf_state,
+                      libxl__remus_device_nic *remus_nic)
+{
+    int ret, ifindex;
+    struct rtnl_link *ifb = NULL;
+    struct rtnl_qdisc *qdisc = NULL;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    /* Now that we have brought up IFB device with plug qdisc for
+     * this vif, so we need to refill the qdisc cache.
+     */
+    ret = nl_cache_refill(netbuf_state->nlsock, netbuf_state->qdisc_cache);
+    if (ret < 0) {
+        LOG(ERROR, "cannot refill qdisc cache");
+        goto out;
+    }
+
+    /* get a handle to the IFB interface */
+    ifb = NULL;
+    ret = rtnl_link_get_kernel(netbuf_state->nlsock, 0,
+                               remus_nic->ifb, &ifb);
+    if (ret) {
+        LOG(ERROR, "cannot obtain handle for %s: %s", remus_nic->ifb,
+            nl_geterror(ret));
+        ret = ERROR_FAIL;
+        goto out;
+    }
+
+    ret = ERROR_FAIL;
+    ifindex = rtnl_link_get_ifindex(ifb);
+    if (!ifindex) {
+        LOG(ERROR, "interface %s has no index", remus_nic->ifb);
+        goto out;
+    }
+
+    /* Get a reference to the root qdisc installed on the IFB, by
+     * querying the qdisc list we obtained earlier. The netbufscript
+     * sets up the plug qdisc as the root qdisc, so we don't have to
+     * search the entire qdisc tree on the IFB dev.
+
+     * There is no need to explicitly free this qdisc as its just a
+     * reference from the qdisc cache we allocated earlier.
+     */
+    qdisc = rtnl_qdisc_get_by_parent(netbuf_state->qdisc_cache, ifindex,
+                                     TC_H_ROOT);
+
+    if (qdisc) {
+        const char *tc_kind = rtnl_tc_get_kind(TC_CAST(qdisc));
+        /* Sanity check: Ensure that the root qdisc is a plug qdisc. */
+        if (!tc_kind || strcmp(tc_kind, "plug")) {
+            nl_object_put((struct nl_object *)qdisc);
+            LOG(ERROR, "plug qdisc is not installed on %s", remus_nic->ifb);
+            goto out;
+        }
+        remus_nic->qdisc = qdisc;
+        ret = 0;
+    } else {
+        LOG(ERROR, "Cannot get qdisc handle from ifb %s", remus_nic->ifb);
+    }
+
+out:
+    if (ifb)
+        rtnl_link_put(ifb);
+
+    return ret;
+}
+
+/*
+ * In return, the script writes the name of IFB device (during setup) to be
+ * used for output buffering into XENBUS_PATH/ifb
+ */
+static void netbuf_setup_script_cb(libxl__egc *egc,
+                                   libxl__async_exec_state *aes,
+                                   int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__remus_netbuf_state *netbuf_state = dev->ops->data;
+    const char *out_path_base, *hotplug_error = NULL;
+    int rc;
+
+    /* Convenience aliases */
+    const uint32_t domid = netbuf_state->domid;
+    const int devid = dev->devid;
+    const char *const vif = remus_nic->vif;
+    const char **const ifb = &remus_nic->ifb;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    if (status) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    out_path_base = GCSPRINTF("%s/remus/netbuf/%d",
+                              libxl__xs_libxl_path(gc, domid), devid);
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/hotplug-error", out_path_base),
+                                &hotplug_error);
+    if (rc) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (hotplug_error) {
+        LOG(ERROR, "netbuf script %s setup failed for vif %s: %s",
+            netbuf_state->netbufscript, vif, hotplug_error);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = libxl__xs_read_checked(gc, XBT_NULL,
+                                GCSPRINTF("%s/remus/netbuf/%d/ifb",
+                                          libxl__xs_libxl_path(gc, domid),
+                                          devid),
+                                ifb);
+    if (rc) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    if (!(*ifb)) {
+        LOG(ERROR, "Cannot get ifb dev name for domain %u dev %s",
+            domid, vif);
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    LOG(DEBUG, "%s will buffer packets from vif %s", *ifb, vif);
+    rc = init_qdisc(netbuf_state, remus_nic);
+
+out:
+    dev->callback(egc, dev, rc);
+}
+
+static void netbuf_teardown_script_cb(libxl__egc *egc,
+                                      libxl__async_exec_state *aes,
+                                      int status)
+{
+    int rc;
+    libxl__remus_device *dev = CONTAINER_OF(aes, *dev, aes);
+    libxl__remus_device_nic *remus_nic = dev->data;
+
+    if (status)
+        rc = ERROR_FAIL;
+    else
+        rc = 0;
+
+    free_qdisc(remus_nic);
+
+    dev->callback(egc, dev, rc);
+}
+
+/* the script needs the following env & args
+ * $vifname
+ * $XENBUS_PATH (/libxl/<domid>/remus/netbuf/<devid>/)
+ * $IFB (for teardown)
+ * setup/teardown as command line arg.
+ */
+static void setup_async_exec(libxl__async_exec_state *aes,
+                             char *op, libxl__remus_device *dev)
+{
+    int arraysize, nr = 0;
+    char **env = NULL, **args = NULL;
+    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    STATE_AO_GC(ns->ao);
+
+    /* Convenience aliases */
+    char *const script = libxl__strdup(gc, ns->netbufscript);
+    const uint32_t domid = ns->domid;
+    const int dev_id = dev->devid;
+    const char *const vif = remus_nic->vif;
+    const char *const ifb = remus_nic->ifb;
+
+    arraysize = 7;
+    GCNEW_ARRAY(env, arraysize);
+    env[nr++] = "vifname";
+    env[nr++] = libxl__strdup(gc, vif);
+    env[nr++] = "XENBUS_PATH";
+    env[nr++] = GCSPRINTF("%s/remus/netbuf/%d",
+                          libxl__xs_libxl_path(gc, domid), dev_id);
+    if (!strcmp(op, "teardown") && ifb) {
+        env[nr++] = "IFB";
+        env[nr++] = libxl__strdup(gc, ifb);
+    }
+    env[nr++] = NULL;
+    assert(nr <= arraysize);
+
+    arraysize = 3; nr = 0;
+    GCNEW_ARRAY(args, arraysize);
+    args[nr++] = script;
+    args[nr++] = op;
+    args[nr++] = NULL;
+    assert(nr == arraysize);
+
+    aes->ao = ns->ao;
+    aes->what = GCSPRINTF("%s %s", args[0], args[1]);
+    aes->env = env;
+    aes->args = args;
+    aes->timeout_ms = LIBXL_HOTPLUG_TIMEOUT * 1000;
+    aes->stdfds[0] = -1;
+    aes->stdfds[1] = -1;
+    aes->stdfds[2] = -1;
+
+    if (!strcmp(op, "teardown"))
+        aes->callback = netbuf_teardown_script_cb;
+    else
+        aes->callback = netbuf_setup_script_cb;
+}
+
+static int nic_init(libxl__remus_device_ops *self,
+                    libxl__remus_state *rs)
+{
+    int rc;
+    libxl__remus_netbuf_state *ns;
+
+    STATE_AO_GC(rs->ao);
+
+    GCNEW(ns);
+    self->data = ns;
+
+    ns->nlsock = nl_socket_alloc();
+    if (!ns->nlsock) {
+        LOG(ERROR, "cannot allocate nl socket");
+        goto out;
+    }
+
+    rc = nl_connect(ns->nlsock, NETLINK_ROUTE);
+    if (rc) {
+        LOG(ERROR, "failed to open netlink socket: %s",
+            nl_geterror(rc));
+        goto out;
+    }
+
+    /* get list of all qdiscs installed on network devs. */
+    rc = rtnl_qdisc_alloc_cache(ns->nlsock, &ns->qdisc_cache);
+    if (rc) {
+        LOG(ERROR, "failed to allocate qdisc cache: %s",
+            nl_geterror(rc));
+        goto out;
+    }
+
+    ns->ao = rs->ao;
+    ns->domid = rs->domid;
+    ns->netbufscript = rs->netbufscript;
+
+    return 0;
+
+out:
+    return ERROR_FAIL;
+}
+
+static void nic_destroy(libxl__remus_device_ops *self)
+{
+    libxl__remus_netbuf_state *ns = self->data;
+
+    if (!self->data)
+        return;
+
+    /* free qdisc cache */
+    if (ns->qdisc_cache) {
+        nl_cache_clear(ns->qdisc_cache);
+        nl_cache_free(ns->qdisc_cache);
+        ns->qdisc_cache = NULL;
+    }
+
+    /* close & free nlsock */
+    if (ns->nlsock) {
+        nl_close(ns->nlsock);
+        nl_socket_free(ns->nlsock);
+        ns->nlsock = NULL;
+    }
+}
+
+static void async_call_done(libxl__egc *egc,
+                            libxl__ev_child *child,
+                            pid_t pid, int status)
+{
+    libxl__remus_device *dev = CONTAINER_OF(child, *dev, child);
+    libxl__remus_device_state *rds = dev->rds;
+    STATE_AO_GC(rds->ao);
+
+    if (WIFEXITED(status)) {
+        dev->callback(egc, dev, -WEXITSTATUS(status));
+    } else {
+        dev->callback(egc, dev, ERROR_FAIL);
+    }
+}
+
+static void nic_match_async(const libxl__remus_device_ops *self,
+                            libxl__remus_device *dev)
+{
+    if (dev->kind == LIBXL__REMUS_DEVICE_NIC)
+        _exit(0);
+
+    _exit(-ERROR_NOT_MATCH);
+}
+
+static void nic_match(libxl__remus_device_ops *self,
+                      libxl__remus_device *dev)
+{
+    int pid = -1;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, async_call_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        nic_match_async(self, dev);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+static void nic_setup(libxl__remus_device *dev)
+{
+    libxl__remus_device_nic *remus_nic;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    const libxl_device_nic *nic = dev->backend_dev;
+
+    STATE_AO_GC(ns->ao);
+
+    GCNEW(remus_nic);
+    dev->data = remus_nic;
+    remus_nic->vif = get_vifname(dev, nic);
+
+    setup_async_exec(&dev->aes, "setup", dev);
+    if (libxl__async_exec_start(gc, &dev->aes)) {
+        goto out;
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+/*
+ * Note: This function will be called in the same gc context as
+ * libxl__remus_netbuf_setup, created during the libxl_domain_remus_start
+ * API call.
+ */
+static void nic_teardown(libxl__remus_device *dev)
+{
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+
+    STATE_AO_GC(ns->ao);
+
+    setup_async_exec(&dev->aes, "teardown", dev);
+
+    if (libxl__async_exec_start(gc, &dev->aes)) {
+        goto out;
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+/* The buffer_op's value, not the value passed to kernel */
+enum {
+    tc_buffer_start,
+    tc_buffer_release
+};
+
+static void remus_netbuf_op_async(libxl__remus_device_nic *remus_nic,
+                                  libxl__remus_netbuf_state *netbuf_state,
+                                  int buffer_op)
+{
+    int ret;
+
+    STATE_AO_GC(netbuf_state->ao);
+
+    if (buffer_op == tc_buffer_start)
+        ret = rtnl_qdisc_plug_buffer(remus_nic->qdisc);
+    else
+        ret = rtnl_qdisc_plug_release_one(remus_nic->qdisc);
+
+    if (!ret) {
+        ret = rtnl_qdisc_add(netbuf_state->nlsock,
+                             remus_nic->qdisc,
+                             NLM_F_REQUEST);
+        if (ret)
+            goto out;
+    }
+
+    _exit(0);
+
+out:
+    LOG(ERROR, "Remus: cannot do netbuf op %s on %s:%s",
+        ((buffer_op == tc_buffer_start) ?
+        "start_new_epoch" : "release_prev_epoch"),
+        remus_nic->ifb, nl_geterror(ret));
+    _exit(-ERROR_FAIL);
+}
+
+static void netbuf_epoch_op(libxl__remus_device *dev, int buffer_op)
+{
+    int pid = -1;
+    libxl__remus_device_nic *remus_nic = dev->data;
+    libxl__remus_netbuf_state *ns = dev->ops->data;
+    STATE_AO_GC(dev->rds->ao);
+
+    /* Fork and call */
+    pid = libxl__ev_child_fork(gc, &dev->child, async_call_done);
+    if (pid == -1) {
+        LOG(ERROR, "unable to fork");
+        goto out;
+    }
+
+    if (!pid) {
+        /* child */
+        remus_netbuf_op_async(remus_nic, ns, buffer_op);
+        /* notreached */
+        abort();
+    }
+
+    return;
+
+out:
+    dev->callback(dev->rds->egc, dev, ERROR_FAIL);
+}
+
+static void nic_postsuspend(libxl__remus_device *dev)
+{
+    netbuf_epoch_op(dev, tc_buffer_start);
+}
+
+static void nic_commit(libxl__remus_device *dev)
+{
+    netbuf_epoch_op(dev, tc_buffer_release);
+}
+

The async execution for each netlink call is overkill. These rtnl calls complete
in a few microseconds at most. This code structure, on the other hand,
fork/execs a new process for every checkpoint just to execute a single library call
(netbuf_epoch_op), which in turn issues just one syscall.

Correct me if I am wrong. I am assuming that the libxl__ev_child_fork eventually
leads to a fork() and exec() call.
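
For reference, in the hunks quoted above the child branch of nic_match() and netbuf_epoch_op() runs the op and then leaves through _exit(); no exec call appears in the quoted code, though what libxl__ev_child_fork does internally is not shown here. A minimal standalone sketch of that fork/_exit/reap round trip, assuming plain fork() and waitpid() in place of libxl__ev_child_fork and async_call_done. The names run_op_in_child, sketch_op_fn and SKETCH_ERROR_FAIL are hypothetical, and the error value is only illustrative:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a negative libxl-style error code. */
#define SKETCH_ERROR_FAIL (-3)

/* Hypothetical op type standing in for a single netlink library call. */
typedef int (*sketch_op_fn)(void);

static int sketch_op_ok(void)   { return 0; }
static int sketch_op_fail(void) { return SKETCH_ERROR_FAIL; }

/*
 * Run one op in a forked child and return its result in the parent.
 * The child never calls exec: it runs op() and leaves through _exit(),
 * passing the negated (hence non-negative) error code as exit status.
 */
static int run_op_in_child(sketch_op_fn op)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return SKETCH_ERROR_FAIL;

    if (pid == 0) {
        /* child: 0 on success, -error (a small positive value) on failure */
        _exit(-op());
    }

    /* parent: reap the child and negate the exit status again */
    if (waitpid(pid, &status, 0) != pid)
        return SKETCH_ERROR_FAIL;

    if (WIFEXITED(status))
        return -WEXITSTATUS(status);

    return SKETCH_ERROR_FAIL;
}
```

Calling run_op_in_child(sketch_op_fail) hands SKETCH_ERROR_FAIL back to the parent. Even without an exec, each op still costs one fork and one reap.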

Per Remus checkpoint:
 2 ops for netbuf, 2 for disk.
 1 fork & exec per op, for a total of 4 forks per checkpoint (based on this patch and the drbd patch).

 At 25 checkpoints per second, you are looking at roughly 100 fork/execs per second.
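
The estimate can be written out as a small sketch, assuming one NIC and one disk per guest and one fork per op as described. The function name forks_per_second is hypothetical:

```c
#include <assert.h>

/*
 * Forks per second = (ops per checkpoint) * (checkpoints per second),
 * where each op is assumed to cost exactly one fork.
 */
static unsigned forks_per_second(unsigned netbuf_ops,
                                 unsigned disk_ops,
                                 unsigned checkpoints_per_sec)
{
    return (netbuf_ops + disk_ops) * checkpoints_per_sec;
}
```

With the figures given here, forks_per_second(2, 2, 25) evaluates to 100. More NICs or disks per guest would scale the count accordingly.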




 

