
Re: [Xen-devel] Second regression due to libxl: Remove linux udev rules (2ba368d13893402b2f1fb3c283ddcc714659dd9b)



On Wed, 2015-07-29 at 11:45 -0400, Konrad Rzeszutek Wilk wrote:
> relying on the (stale) 4.5 rules file having the UDEV_CALL=1 in them.
> 
> I don't exactly understand how the hotplug scripts are invoked via 'xl'.

They are called when the backend gets to (or passes through)
XenbusStateInitWait (attach) or XenbusStateClosed (detach).
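
As a rough illustration of that trigger (a toy model, not libxl's actual code): the state numbers below are the ones from the XenbusState enum in Xen's public io/xenbus.h header, while `hotplug_action` and its "attach"/"detach" labels are names made up for this sketch.

```python
from enum import IntEnum
from typing import Optional


class XenbusState(IntEnum):
    # Values as defined in Xen's public header io/xenbus.h.
    Unknown = 0
    Initialising = 1
    InitWait = 2
    Initialised = 3
    Connected = 4
    Closing = 5
    Closed = 6


def hotplug_action(prev: XenbusState, new: XenbusState) -> Optional[str]:
    """Hypothetical helper: which hotplug step a backend state change implies."""
    if new == XenbusState.Closed and prev != XenbusState.Closed:
        return "detach"
    # "Gets to (or passes through)" InitWait: the watcher may only observe a
    # later state, so treat any step across InitWait as having reached it.
    if prev < XenbusState.InitWait <= new <= XenbusState.Connected:
        return "attach"
    return None
```

Under this model, a backend observed jumping straight from Initialising to Connected still counts as an attach, because it must have passed through InitWait on the way.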

> With udev it was pretty clear and easy to me.

It was also, unfortunately, racy.

In particular, on tear down there was no interlock between the scripts
(executed asynchronously by udev) and the toolstack. Some backends have an
interlock between the backend and the script, but that's not the
same/sufficient.

This race means that the xenstore dir could be removed before the script
runs, and the script may need information from xenstore in order to do the
tear down.

This was a particular problem for detaching a vif on a vswitch system,
since vswitch (unlike Linux bridge) does not automatically remove a port
when the device disappears, so we need xenstore info (specifically the
bridge node) to clean up.
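
The ordering problem can be sketched deterministically, without real threads or a real xenstore. Everything here is an assumption made for illustration: a dict stands in for xenstore, and the path, the bridge name `ovsbr0`, and both function names are hypothetical.

```python
def teardown_script(xenstore: dict, backend_path: str) -> str:
    """Hypothetical vif teardown: needs the bridge name to remove the port."""
    bridge = xenstore.get(backend_path + "/bridge")
    if bridge is None:
        return "leaked port: bridge unknown"
    return "removed port from " + bridge


def detach(script_first: bool) -> str:
    # A dict stands in for xenstore; the path and bridge name are made up.
    path = "/local/domain/0/backend/vif/1/0"
    xenstore = {path + "/bridge": "ovsbr0"}
    if script_first:
        # Interlocked: the toolstack waits for the script, then cleans up.
        result = teardown_script(xenstore, path)
        xenstore.pop(path + "/bridge", None)
    else:
        # Racy ordering: the directory is gone before the script gets to run.
        xenstore.pop(path + "/bridge", None)
        result = teardown_script(xenstore, path)
    return result
```

With `detach(True)` the script still finds the bridge node and can remove the port; with `detach(False)` the node has already been deleted, which is the outcome the udev path could not rule out.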

I believe there were also similar issues with block-iscsi (to log out of the
target) and even regular block devices where loopback was in use (to see
the type and know whether to losetup -d or not; this was the reason why we
didn't do loopback for file:// devices with libxl for quite a while).

It was also completely different for each backend platform (Linux,
BSD, etc.), which was problematic from a support PoV.

> Note that I see this problem regardless of me having 'xl devd' running or 
> not.

You definitely do _not_ want to run xl devd in dom0 (or more precisely in
your toolstack domain). Having both xl and devd doing these operations will
not result in anything you want.

Might be worth having some interlock on that, if we don't already.

> > Another option would be to install an empty xen-backend.rules for the
> > 4.6 release, and then remove it for 4.7.
> 
> Or trim down the udev rules ?

The udev scripts should have been unused since 4.5.0-rc1, where they were
by default gated from running in dom0 in favour of the libxl version. In
the default configuration the scripts detected when they were called via
udev and exited immediately without doing anything, leaving them to do the
real work when called directly from the toolstack.
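
A minimal model of that gating, written in Python rather than the shell the real scripts use. UDEV_CALL is the variable mentioned in the quoted text above; the xenstore key name `libxl/disable_udev` and the function name are assumptions for this sketch and should be checked against the 4.5 scripts.

```python
def script_should_run(env: dict, xenstore_keys: set) -> bool:
    """Hypothetical model of the early-exit check at the top of a hotplug script."""
    called_from_udev = bool(env.get("UDEV_CALL"))
    # Assumed marker that the toolstack, not udev, is handling hotplug.
    toolstack_handles_hotplug = "libxl/disable_udev" in xenstore_keys
    # Exit immediately only when udev invoked us AND libxl has claimed the job.
    return not (called_from_udev and toolstack_handles_hotplug)
```

The same script therefore does nothing when udev fires it in the default configuration, but does the real work when the toolstack calls it directly, since that call does not set UDEV_CALL.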

Have you been seeing this issue since then and "fixing" it by manually
reverting to the udev behaviour in /etc/xen/xl.conf (or elsewhere for other
libxl clients)?

If not then there is some unintentional change in 2ba368d138934 as well as
the unintentional removal of the udev scripts. There really should have
been no semantic change compared with the default behaviour from 4.5.0-rc1.

Putting back the udev rules (even a trimmed down version, whatever that
means) is just papering over the underlying issue, whatever that is. Only
once we have understood the underlying issue can we consider whether the
appropriate remedial action for 4.6 is to put udev back (i.e. if the real
fix is too intrusive, etc.).

I think the next thing to try should be to revert only the tools/libxl
portion of 2ba368d138934, i.e. return to the old toolstack code without
putting the udev scripts back (being careful to clear up any remnants of
the previous larger revert from the installed system). 

That should also be a change with no functional difference. So it will, I
think, help rule in/out any unintentional change in behaviour in (lib)xl as
opposed to some weird interaction with the inactive udev scripts.

Ian.
