
Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility



Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
> BTW, I only see the crash when the save/restore script is running.  I
> stopped the other scripts and domains, running only save/restore on a
> single domain, and see the crash rather quickly (within 10 iterations).

I'll look at the libvirt code, but:

With a recurring timeout, how can you ever know it's been cancelled?
There may be threads out there, holding no locks, which are in the
middle of executing a callback for a timeout.  From the point of view
of the rest of the program, that callback can be arbitrarily delayed.

E.g.:

 Thread A                                             Thread B

   invoke some libxl operation
X    do some libxl stuff
X    register timeout (libxl)
XV     record timeout info
X    do some more libxl stuff
     ...
X    do some more libxl stuff
X    deregister timeout (libxl internal)
X     converted to request immediate timeout
XV     record new timeout info
X      release libvirt event loop lock
                                            entering libvirt event loop
                                       V     observe timeout is immediate
                                       V      need to do callback
                                               call libxl driver

      entering libvirt event loop
 V     observe timeout is immediate
 V      need to do callback
         call libxl driver
           call libxl
  X          libxl sees timeout is live
  X          libxl does libxl stuff
         libxl driver deregisters
 V         record lack of timeout
         free driver's timeout struct
                                               call libxl
                                      X          libxl sees timeout is dead
                                      X          libxl does nothing
                                             libxl driver deregisters
                                       V       CRASH due to deregistering
                                       V        already-deregistered timeout

If this is how things are, then I think there is no sane way to use
libvirt's timeouts (!)
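
For concreteness, the wiring in question looks roughly like this.  A
sketch only: libxl_osevent_hooks, libxl_osevent_occurred_timeout and
virEventAddTimeout are the real interfaces, but everything prefixed
my_ is invented for illustration.

  #include <stdlib.h>
  #include <sys/time.h>
  #include <libvirt/libvirt.h>      /* virEventAddTimeout et al */
  #include <libxl.h>
  #include <libxl_event.h>          /* libxl_osevent_* */

  struct my_timeout {
      int timer;                    /* libvirt timer id */
      void *for_libxl;              /* token for libxl_osevent_occurred_timeout */
  };

  static libxl_ctx *my_ctx;         /* global for brevity in this sketch */

  static void my_timeout_occurred(int timer, void *opaque)
  {
      struct my_timeout *t = opaque;
      /* libvirt timers recur until virEventRemoveTimeout: there is no
       * one-shot mode.  So this can run again, possibly on another
       * event loop thread, while deregistration is still in progress
       * -- which is the window the diagram above exploits. */
      libxl_osevent_occurred_timeout(my_ctx, t->for_libxl);
  }

  static int my_timeout_register(void *user, void **for_app_out,
                                 struct timeval abs, void *for_libxl)
  {
      struct my_timeout *t = calloc(1, sizeof(*t));
      struct timeval now;
      int ms;

      if (!t) return -1;
      gettimeofday(&now, NULL);
      ms = (abs.tv_sec - now.tv_sec) * 1000
         + (abs.tv_usec - now.tv_usec) / 1000;
      t->for_libxl = for_libxl;
      t->timer = virEventAddTimeout(ms < 0 ? 0 : ms,
                                    my_timeout_occurred, t, NULL);
      *for_app_out = t;
      return t->timer < 0 ? -1 : 0;
  }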

In principle I guess the driver could keep its per-timeout structs
around forever and remember whether they've been deregistered or not.
Each one would have to have a lock in it.
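
Something like this, say.  Again a sketch with invented names; it
extends the my_timeout struct above with a lock and a flag, and the
struct is deliberately never freed (the mutex would be initialised
with pthread_mutex_init at registration time):

  #include <pthread.h>
  #include <stdbool.h>

  struct my_timeout {
      pthread_mutex_t lock;         /* serialises callback vs. deregister */
      bool dead;                    /* set on deregistration; never freed */
      int timer;
      void *for_libxl;
  };

  static void my_timeout_occurred(int timer, void *opaque)
  {
      struct my_timeout *t = opaque;
      pthread_mutex_lock(&t->lock);
      if (!t->dead)
          /* Assumes libxl does not turn round and deregister this
           * same timeout from inside the callback; if it can, this
           * needs a recursive mutex instead. */
          libxl_osevent_occurred_timeout(my_ctx, t->for_libxl);
      pthread_mutex_unlock(&t->lock);
  }

  static void my_timeout_deregister(void *user, void *for_app)
  {
      struct my_timeout *t = for_app;
      pthread_mutex_lock(&t->lock);
      t->dead = true;               /* NB no free(t), ever */
      virEventRemoveTimeout(t->timer);
      pthread_mutex_unlock(&t->lock);
  }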

But if you think about it: if you have 10 threads all running the
event loop and you set a timeout to zero, doesn't that mean that every
thread's event loop should do the timeout callback as fast as it can?
That could be a lot of wasted effort.

The best solution would appear to be to provide a non-recurring
(one-shot) timeout callback.
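
I.e. something like this hypothetical addition to libvirt's event
API (it does not exist today; the name is invented):

  /* HYPOTHETICAL -- not in libvirt's API.  A timer whose callback
   * runs exactly once, on exactly one event loop thread, after which
   * the timer is dead and can simply be forgotten. */
  int virEventAddTimeoutOnce(int timeout_ms,
                             virEventTimeoutCallback cb,
                             void *opaque,
                             virFreeCallback ff);

In the meantime a driver can only approximate this on the existing
recurring API, e.g. by having the callback remove its own timer the
first time it fires; but that does not by itself close the
multi-threaded race described above.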

> I'm not so thrilled with the timeout handling code in the libvirt libxl
> driver.  The driver maintains a list of active timeouts because IIRC,
> there were cases when the driver received timeout deregistrations when
> calling libxl_ctx_free, at which point some of the associated structures
> were freed.  The idea was to call libxl_osevent_occurred_timeout on any
> active timeouts before freeing libxlDomainObjPrivate and its contents.

libxl does deregister fd callbacks in libxl_ctx_free.

But libxl doesn't currently "deregister" any timeouts in
libxl_ctx_free; indeed it would be a bit daft for it to do so, as by
the time of libxl_ctx_free there are no aos (asynchronous operations)
running, so there would be nothing to time out.

But there is a difficulty with timeouts which libxl has set to occur
immediately but which have not yet actually had their callback.  The
application cannot call libxl_ctx_free with such timeouts
outstanding, because that would imply later calling back into libxl
with a stale ctx.
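
So teardown has to flush these first, along the lines Jim describes
above.  Roughly (a sketch; "pending" and pending_pop are hypothetical
driver-side bookkeeping):

  struct my_timeout *t;

  /* Fire the callback for every outstanding timeout while the ctx
   * is still valid, removing our side of each as we go... */
  while ((t = pending_pop(&pending)) != NULL) {
      virEventRemoveTimeout(t->timer);
      libxl_osevent_occurred_timeout(my_ctx, t->for_libxl);
  }

  /* ...and only then is it safe to dispose of the ctx. */
  libxl_ctx_free(my_ctx);
  my_ctx = NULL;                    /* nothing may call into libxl now */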

(Looking at the code I see that the "nexi" are never actually freed.
Bah.)

Thanks,
Ian.
