[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility



[Adding libvirt list...]

Ian Jackson wrote:
> Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD 
> flexibility"):
>   
>> BTW, I only see the crash when the save/restore script is running.  I
>> stopped the other scripts and domains, running only save/restore on a
>> single domain, and see the crash rather quickly (within 10 iterations).
>>     
>
> I'll look at the libvirt code, but:
>
> With a recurring timeout, how can you ever know it's cancelled ?
> There might be threads out there, which don't hold any locks, which
> are in the process of executing a callback for a timeout.  That might
> be arbitrarily delayed from the pov of the rest of the program.
>
> E.g.:
>
>  Thread A                                             Thread B
>
>    invoke some libxl operation
> X    do some libxl stuff
> X    register timeout (libxl)
> XV     record timeout info
> X    do some more libxl stuff
>      ...
> X    do some more libxl stuff
> X    deregister timeout (libxl internal)
> X     converted to request immediate timeout
> XV     record new timeout info
> X      release libvirt event loop lock
>                                             entering libvirt event loop
>                                        V     observe timeout is immediate
>                                        V      need to do callback
>                                                call libxl driver
>
>       entering libvirt event loop
>  V     observe timeout is immediate
>  V      need to do callback
>          call libxl driver
>            call libxl
>   X          libxl sees timeout is live
>   X          libxl does libxl stuff
>          libxl driver deregisters
>  V         record lack of timeout
>          free driver's timeout struct
>                                                call libxl
>                                       X          libxl sees timeout is dead
>                                       X          libxl does nothing
>                                              libxl driver deregisters
>                                        V       CRASH due to deregistering
>                                        V        already-deregistered timeout
>
> If this is how things are, then I think there is no sane way to use
> libvirt's timeouts (!)
>   

Looking at the libvirt code again, it seems a single thread services the
event loop. See virNetServerRun() in src/util/virnetserver.c. Indeed, I
see the same thread ID in all the timer and fd callbacks. One of the
libvirt core devs can correct me if I'm wrong.

Regards,
Jim


> In principle I guess the driver could keep its per-timeout structs
> around forever and remember whether they've been deregistered or not.
> Each one would have to have a lock in it.
>
> But if you think about it, if you have 10 threads all running the
> event loop and you set a timeout to zero, doesn't that mean that every
> thread's event loop should do the timeout callback as fast as it can ?
> That could be a lot of wasted effort.
>
> The best solution would appear to be to provide a non-recurring
> callback.
>
>   
>> I'm not so thrilled with the timeout handling code in the libvirt libxl
>> driver.  The driver maintains a list of active timeouts because IIRC,
>> there were cases when the driver received timeout deregistrations when
>> calling libxl_ctx_free, at which point some of the associated structures
>> were freed.  The idea was to call libxl_osevent_occurred_timeout on any
>> active timeouts before freeing libxlDomainObjPrivate and its contents.
>>     
>
> libxl does deregister fd callbacks in libxl_ctx_free.
>
> But libxl doesn't currently "deregister" any timeouts in
> libxl_ctx_free; indeed it would be a bit daft for it to do so as at
> libxl_ctx_free there are no aos running so there would be nothing to
> time out.
>
> But there is a difficulty with timeouts which libxl has set to occur
> immediately but which have not yet actually had the callback.  The the
> application cannot call libxl_ctx_free with such timeouts outstanding,
> because that would imply later calling back into libxl with a stale
> ctx.
>
> (Looking at the code I see that the "nexi" are never actually freed.
> Bah.)
>
> Thanks,
> Ian.
>
>   

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.