Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility
Ian Jackson wrote:
> Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD
> flexibility"):
>
>> BTW, I only see the crash when the save/restore script is running. I
>> stopped the other scripts and domains, running only save/restore on a
>> single domain, and see the crash rather quickly (within 10 iterations).
>>
>
> I'll look at the libvirt code, but:
>
> With a recurring timeout, how can you ever know it's cancelled ?
> There might be threads out there, which don't hold any locks, which
> are in the process of executing a callback for a timeout. That might
> be arbitrarily delayed from the pov of the rest of the program.
>
> E.g.:
>
>   Thread A                                  Thread B
>
>    invoke some libxl operation
> X    do some libxl stuff
> X    register timeout (libxl)
> XV     record timeout info
> X    do some more libxl stuff
>      ...
> X    do some more libxl stuff
> X    deregister timeout (libxl internal)
> X     converted to request immediate timeout
> XV     record new timeout info
> X      release libvirt event loop lock
>                                             entering libvirt event loop
>                                        V     observe timeout is immediate
>                                        V      need to do callback
>                                                call libxl driver
>
>   entering libvirt event loop
> V  observe timeout is immediate
> V   need to do callback
>      call libxl driver
>        call libxl
> X        libxl sees timeout is live
> X        libxl does libxl stuff
>        libxl driver deregisters
> V        record lack of timeout
>        free driver's timeout struct
>                                                call libxl
>                                        X          libxl sees timeout is dead
>                                        X          libxl does nothing
>                                              libxl driver deregisters
>                                        V        CRASH due to deregistering
>                                        V         already-deregistered timeout
>
> If this is how things are, then I think there is no sane way to use
> libvirt's timeouts (!)
>
Looking at libvirt's default event loop impl, and the current libxl
driver code, I think this is how things are :-/. But maybe you have
just described a bug in the libxl driver. In the timer callback,
libxlDomainObjPrivate is locked, the timeout is disabled in the libvirt
event loop, libxlDomainObjPrivate is unlocked, and
libxl_osevent_occurred_timeout is called. Could the issue be solved by
checking if the timeout is still valid in the callback, while holding a
lock on libxlDomainObjPrivate? The first thread running the callback
could mark the timeout invalid before releasing the lock and calling
libxl_osevent_occurred_timeout. After acquiring the
libxlDomainObjPrivate lock, subsequent threads running the callback
would see the timer is invalid and return.
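
To make the idea concrete, here is a rough sketch of the check I have in mind. It is not driver code: struct timeout_info, fake_occurred_timeout, timer_callback and run_demo are all made-up names, the mutex stands in for the libxlDomainObjPrivate lock, and fake_occurred_timeout stands in for the real libxl_osevent_occurred_timeout call. It also assumes the timeout record outlives every callback that can still run, which is the part of your scenario this does not address by itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-timeout record kept by the driver. */
struct timeout_info {
    pthread_mutex_t lock;  /* stands in for the libxlDomainObjPrivate lock */
    bool valid;            /* cleared by the first callback that runs */
    atomic_int fired;      /* times the libxl stand-in was invoked */
};

/* Stand-in for libxl_osevent_occurred_timeout(); called without the lock. */
static void fake_occurred_timeout(struct timeout_info *t)
{
    int prev = atomic_fetch_add(&t->fired, 1);
    assert(prev == 0);     /* libxl must hear about the timeout at most once */
}

/* Timer callback as the event loop would invoke it, possibly from
 * several threads for the same timeout. */
static void timer_callback(void *opaque)
{
    struct timeout_info *t = opaque;
    bool run;

    pthread_mutex_lock(&t->lock);
    run = t->valid;        /* is the timeout still live? */
    t->valid = false;      /* mark it invalid before dropping the lock */
    pthread_mutex_unlock(&t->lock);

    if (run)
        fake_occurred_timeout(t);
    /* otherwise another thread already handled it: just return */
}

static void *event_loop_thread(void *opaque)
{
    timer_callback(opaque);
    return NULL;
}

/* Runs the callback from two threads; returns how often libxl was called. */
int run_demo(void)
{
    struct timeout_info t = { .valid = true };
    pthread_t a, b;

    pthread_mutex_init(&t.lock, NULL);
    atomic_init(&t.fired, 0);

    pthread_create(&a, NULL, event_loop_thread, &t);
    pthread_create(&b, NULL, event_loop_thread, &t);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    pthread_mutex_destroy(&t.lock);
    return atomic_load(&t.fired);
}
```

Whichever thread takes the lock first clears the flag and goes on to call into libxl; the other sees the cleared flag and returns without touching libxl or deregistering anything.
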
Regards,
Jim
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel