[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/2] libxl/devd: fix a race with concurrent device addition/removal



On Wed, May 10, 2017 at 02:03:58PM +0100, Roger Pau Monne wrote:
> On Wed, May 10, 2017 at 11:47:31AM +0100, Wei Liu wrote:
> > On Wed, May 10, 2017 at 11:43:57AM +0100, Roger Pau Monne wrote:
> > > On Wed, May 10, 2017 at 11:32:45AM +0100, Ian Jackson wrote:
> > > > Roger Pau Monne writes ("[PATCH 1/2] libxl/devd: fix a race with 
> > > > concurrent device addition/removal"):
> > > > > Current code can free the libxl__device inside of the 
> > > > > libxl__ddomain_device
> > > > > before the addition has finished if a removal happens while an 
> > > > > addition is
> > > > > still in process:
> > > > > 
> > > > >   backend_watch_callback
> > > > >             |
> > > > >             v
> > > > >        add_device
> > > > >             |                 backend_watch_callback
> > > > >     (async operation)                   |
> > > > >             |                           v
> > > > >             |                     remove_device
> > > > >             |                           |
> > > > >             |                           V
> > > > >             |                    device_complete
> > > > >             |                 (free libxl__device)
> > > > >             v
> > > > >      device_complete
> > > > >   (deref libxl__device)
> > > > > 
> > > > > Fix this by creating a temporary copy of the libxl__device, that's
> > > > > tracked by the GC of the nested async operation. This ensures that
> > > > > the libxl__device used by the async operations cannot be freed while
> > > > > being used.
> > > > 
> > > > Doesn't this arrange that the remove hotplug script will be invoked
> > > > while the add hotplug script is still running ?
> > > 
> > > That's indeed possible (either with the current code or with this patch),
> > > although unlikely. The async code called by remove_device will wait for 
> > > the
> > > backend to switch to state 6, while the add_device code will wait for 
> > > state 2
> > > IIRC (one can change those states to make them clash probably).
> > > 
> > > > Is that really desirable (or allowed!) ?
> > > 
> > > Hm, no, I don't think it's desirable at all. I still think this is better 
> > > that
> > > the previous code (at lest it doesn't dereference libxl__device anymore), 
> > > but
> > > clearly needs further improvements.
> > > 
> > > Also, it seems to me the same can happen even without driver domains, if 
> > > a user
> > > executes concurrent block-{attach/detach} operations, but maybe I'm 
> > > missing
> > > something?
> > > 
> > 
> > There is a lot of locking for all the device add / remove code. See
> > libxl_internal.h L2588.
> 
> It's been a long time since I've played with libxl, last time none of this
> existed, sorry. Sadly devd completely bypasses all this, a simple way to fix
> this would be to call libxl__lock_domain_userdata from {add/remove}_device
> maybe? (and drop the lock at device_complete).

As discussed on IRC, this is not possible because in order to lock the user
data lock the CTX lock must be held, and leaving libxl with the CTX lock held
is not allowed (as would happen when waiting for an async operation to
finish).

I think we have concluded that this, although nice to have, is not worse than
the situation when running hotplug scripts from Dom0, hence the patch still
stands.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.