[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-users] XEN: bug in vif-bridge script
On 05/03/14 03:00, Ian Campbell wrote: > (Roger, I've trimmed the quotes fairly aggressively, > https://bugs.gentoo.org/show_bug.cgi?id=502570 or > http://lists.xen.org/archives/html/xen-users/2014-03/msg00013.html for > the full thing but in brief the vif is gone by the time the hotplug > script runs and this results in errors from e.g. brctl delif, which are > correctly ignored but are also logged. I presume Atom2 is running a log > scanning tool or something and would like to avoid spurious log > messages, which seems fair) > > On Tue, 2014-03-04 at 17:06 +0100, Atom2 wrote: >> If this rather needs to go to the xen-devel ML, I am sure Ian Campbell >> (or somebody else) will shortly be around and move it or asks me to >> resend to the other list. > [...] >> Feb 26 22:14:29 vm-host logger: /etc/xen/scripts/vif-bridge: brctl delif >> xenbr0 vif1.0 failed >> Feb 26 22:14:29 vm-host logger: /etc/xen/scripts/vif-bridge: ifconfig vif1.0 >> down failed > [...] >> Upon investigating it seems that the problem is related to the fact that >> the network device (at least for paravirtualized guests using the >> netfront/netback device model) has already been destroyed by the dom0 >> kernel when the script is being run. > > This sounds very plausible to me. > > Are you using the xm or xl toolstack? The way the new xl toolstack > handles hotplug scripts ought to be a lot less prone to this sort of > race (but I don't know if it avoids this particular one). Roger, do you > have any thoughts? I've tried with xl and I can confirm this happens at least with xl (not tested xm). In libxl we wait for the backends to switch to state 6 (closed) before executing hotplug scripts. My guess is that netback removes the vif before switching to state 6, so by the time hotplug scripts are executed the vif is long gone. This messages appear on /var/log/debug on my system, which also contains other messages from hotplug scripts, that are part of the normal hotplug execution: Mar 5 09:33:06 loki royger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/2/51712 Mar 5 09:33:06 loki royger: /etc/xen/scripts/vif-bridge: online type_if=vif XENBUS_PATH=backend/vif/2/0 Mar 5 09:33:06 loki royger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif2.0, bridge bridge0. Mar 5 09:33:06 loki royger: /etc/xen/scripts/vif-bridge: Writing backend/vif/2/0/hotplug-status connected to xenstore. Mar 5 09:33:22 loki royger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/2/51712 Mar 5 09:33:22 loki royger: /etc/xen/scripts/vif-bridge: offline type_if=vif XENBUS_PATH=backend/vif/2/0 Mar 5 09:33:22 loki royger: /etc/xen/scripts/vif-bridge: brctl delif bridge0 vif2.0 failed Mar 5 09:33:22 loki royger: /etc/xen/scripts/vif-bridge: ifconfig vif2.0 down failed Mar 5 09:33:22 loki royger: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for vif2.0, bridge bridge0. >> Suggested fix: >> for brctl: check whether the interface still exists and is also still >> linked to the bridge prior to invoking the brctl command >> for ifconfig: check whether the interface still exists and is also still >> up prior to invoking the ifconfig command as follows: >> ------------------------------------------- >> case "$command" in >> online) >> setup_virtual_bridge_port "$dev" >> mtu="`ip link show $bridge | awk '/mtu/ { print $5 }'`" >> if [ -n "$mtu" ] && [ "$mtu" -gt 0 ] >> then >> ip link set $dev mtu $mtu || : >> fi >> add_to_bridge "$bridge" "$dev" >> ;; >> >> offline) >> if brctl show "$bridge" | grep "$dev" > /dev/null 2>&1 ; then >> do_without_error brctl delif "$bridge" "$dev" >> fi >> if ifconfig -s "$dev" > /dev/null 2>&1 ; then >> do_without_error ifconfig "$dev" down >> fi >> ;; >> >> add) >> setup_virtual_bridge_port "$dev" >> add_to_bridge "$bridge" "$dev" >> ;; >> esac > > If this issue does affect xl then I would like to see this fixed > upstream, preferably by fixing xl to not race hotplug scripts against > device tear down. If that is impossible (I don't think it should be, but > Roger?) then the script change which you propose seems like a very > reasonable fallback option. IMHO we are doing the right thing in libxl, wait for the backend to switch to state 6, and then execute hotplug scripts. I think your proposed fix is racy, by looking at netback code: set_backend_state(be, XenbusStateClosed); unregister_hotplug_status_watch(be); if (be->vif) { kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE); xenbus_rm(XBT_NIL, dev->nodename, "hotplug-status"); xenvif_free(be->vif); be->vif = NULL; } kfree(be); dev_set_drvdata(&dev->dev, NULL); Netback sets the "Closed" state before removing the vif, so there's a possible race between the gate that you propose to add to the hotplug script and the removal work done by netback. I think a suitable fix would be to either entirely remove those two lines from the hotplug script, or simply don't log any error in case they fail. Roger. _______________________________________________ Xen-users mailing list Xen-users@xxxxxxxxxxxxx http://lists.xen.org/xen-users
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |