
Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility



Jim Fehlig wrote:
> Ian Jackson wrote:
>   
>> Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD 
>> flexibility"):
>>   
>>     
>>> I let this run over the weekend and today noticed libvirtd was deadlocked
>>>     
>>>       
>> I have just retested xl with:
>>   * my 3-patch 4.4 fixes series
>>   * v2 of my fork series
>>   * the extra mutex patch "libxl: fork: Fixup SIGCHLD sharing"
>>   * "13/12" and "14/12" just posted
>> and it WFM.
>>
>> Of course I don't have the same setup as Jim.
>>
>> Jim: if it's not too much trouble, I'd appreciate it if you could try
>> that combination.
>>
>> For your convenience you can find a git branch of it at
>>   
>> http://xenbits.xen.org/gitweb/?p=people/iwj/xen.git;a=shortlog;h=refs/tags/wip.enumerate-pids-v2.1
>> aka
>>   git://xenbits.xen.org/people/iwj/xen.git#wip.enumerate-pids-v2.1
>>   
>>     
>
> I've been testing this branch and notice an occasional libvirtd segfault
> that always occurs when calling libxl_domain_create_restore().  By
> occasional, I mean my save/restore script might cause the segfault after
> 2 iterations, or 20 iterations, or ...  But the segfault always occurs
> in libxl_domain_create_restore()
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffeef59700 (LWP 12083)]
> 0x00007ffff74577ef in virObjectIsClass (anyobj=0x2f302f6e69616d6f,
> klass=0x5555558a1310)
>     at util/virobject.c:362
> 362         return virClassIsDerivedFrom(obj->klass, klass);
> (gdb) bt
> #0  0x00007ffff74577ef in virObjectIsClass (anyobj=0x2f302f6e69616d6f,
> klass=0x5555558a1310)
>     at util/virobject.c:362
> #1  0x00007ffff745765b in virObjectLock (anyobj=0x2f302f6e69616d6f) at
> util/virobject.c:314
> #2  0x00007fffe993cc96 in libxlDomainObjTimeoutModifyEventHook
> (priv=0x5555558fc310,
>     hndp=0x5555559e5d88, abs_t=...) at libxl/libxl_domain.c:302
> #3  0x00007fffe96f8fed in time_deregister (gc=0x7fffeef58220,
> ev=0x5555559eee48)
>     at libxl_event.c:294
> #4  0x00007fffe96facfd in afterpoll_internal (egc=0x7fffeef58220,
> poller=0x5555559a4c70, nfds=3,
>     fds=0x5555559c09d0, now=...) at libxl_event.c:1008
> #5  0x00007fffe96fc312 in eventloop_iteration (egc=0x7fffeef58220,
> poller=0x5555559a4c70)
>     at libxl_event.c:1455
> #6  0x00007fffe96fce58 in libxl__ao_inprogress (ao=0x5555559e9690,
>     file=0x7fffe970fadb "libxl_create.c", line=1356,
>     func=0x7fffe97105f0 <__func__.16344> "do_domain_create") at
> libxl_event.c:1700
> #7  0x00007fffe96d711f in do_domain_create (ctx=0x5555559d9fa0,
> d_config=0x7fffeef58490,
>     domid=0x7fffeef5840c, restore_fd=89, checkpointed_stream=0,
> ao_how=0x0, aop_console_how=0x0)
>     at libxl_create.c:1356
> #8  0x00007fffe96d7238 in libxl_domain_create_restore
> (ctx=0x5555559d9fa0, d_config=0x7fffeef58490,
>     domid=0x7fffeef5840c, restore_fd=89, params=0x7fffeef58400,
> ao_how=0x0, aop_console_how=0x0)
>     at libxl_create.c:1387
> #...
> (gdb) f 2
> #2  0x00007fffe993cc96 in libxlDomainObjTimeoutModifyEventHook
> (priv=0x5555558fc310,
>     hndp=0x5555559e5d88, abs_t=...) at libxl/libxl_domain.c:302
> 302         virObjectLock(info->priv);
> (gdb) p info->priv
> $3 = (libxlDomainObjPrivatePtr) 0x2f302f6e69616d6f
> (gdb) f 9
> #9  0x00007fffe993f2c7 in libxlVmStart (driver=0x5555558c2e50,
> vm=0x5555558e6a50,
>     start_paused=false, restore_fd=89) at libxl/libxl_driver.c:635
> 635             res = libxl_domain_create_restore(priv->ctx, &d_config,
> &domid,
> (gdb) p priv
> $2 = (libxlDomainObjPrivatePtr) 0x5555558fc310
>
> It looks like the libxlDomainObjPrivatePtr, stashed as part of
> for_app_registration_out when registering the timeout, has been
> trampled.  Not sure if the problem is in libvirt or libxl, but it is
> late here and I'm calling it a night :).
>   
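
One detail about the trampled value is worth spelling out: 0x2f302f6e69616d6f is not a plausible heap address, but each of its bytes is printable ASCII. Read in little-endian memory order it spells "omain/0/", which looks like a fragment of a path-like string. That hints that the memory holding info->priv was reused for string data rather than randomly corrupted. The helper below is only an illustrative sketch of that decoding; pointer_bytes_as_text is a made-up name, not part of libvirt or libxl.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical debugging helper: render the 8 bytes of a 64-bit value as
 * text, lowest byte first (the order they occupy in memory on
 * little-endian x86-64). Non-printable bytes are shown as '.'.
 * 'out' must have room for 9 chars (8 bytes plus the terminating NUL).
 */
static void pointer_bytes_as_text(uint64_t value, char out[9])
{
    assert(out != NULL);
    memset(out, 0, 9);               /* guarantees NUL termination */
    for (int i = 0; i < 8; i++) {
        unsigned char c = (unsigned char)((value >> (8 * i)) & 0xff);
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
}
```

Passing 0x2f302f6e69616d6f yields "omain/0/", while the valid priv pointer 0x5555558fc310 seen in frame 9 yields "...UUU..", i.e. nothing string-like.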

It appears that the timeout_modify callback is being invoked on a timeout
that has already been deregistered.  I didn't hit the segfault when
running libvirtd under valgrind, but valgrind did report:

==14653== Invalid read of size 8
==14653==    at 0x134ACD1C: libxlDomainObjTimeoutModifyEventHook
(libxl_domain.c:309)
==14653==    by 0x13730FEC: time_deregister (libxl_event.c:294)
==14653==    by 0x13732CFC: afterpoll_internal (libxl_event.c:1008)
==14653==    by 0x13734311: eventloop_iteration (libxl_event.c:1455)
==14653==    by 0x13734E57: libxl__ao_inprogress (libxl_event.c:1700)
==14653==    by 0x1370F11E: do_domain_create (libxl_create.c:1356)
==14653==    by 0x1370F237: libxl_domain_create_restore
(libxl_create.c:1387)
==14653==    by 0x134AF332: libxlVmStart (libxl_driver.c:635)
==14653==    by 0x134B382A: libxlDomainRestoreFlags (libxl_driver.c:2047)
==14653==    by 0x134B3975: libxlDomainRestore (libxl_driver.c:2070)
==14653==    by 0x53B5AC7: virDomainRestore (libvirt.c:2678)
==14653==    by 0x130ADC: remoteDispatchDomainRestore
(remote_dispatch.h:6657)
==14653==  Address 0x18000178 is 8 bytes inside a block of size 32 free'd
==14653==    at 0x4C28ADC: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==14653==    by 0x529B08F: virFree (viralloc.c:580)
==14653==    by 0x134AC578: libxlDomainObjEventHookInfoFree
(libxl_domain.c:110)
==14653==    by 0x52BE3DB: virEventPollCleanupTimeouts (vireventpoll.c:535)
==14653==    by 0x52BEA4C: virEventPollRunOnce (vireventpoll.c:651)
==14653==    by 0x52BC960: virEventRunDefaultImpl (virevent.c:306)

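This is consistent with the gdb findings: the hook info freed via
libxlDomainObjEventHookInfoFree is later read in
libxlDomainObjTimeoutModifyEventHook.  I've audited the timeout handling
code in libvirt and didn't notice any problems.  I'll have some time
tomorrow to continue poking.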
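
To make the suspected sequence concrete (register, deregister, then a modify on the stale registration), here is a minimal sketch of one way such a call could be made detectable. It is not the libvirt or libxl code, and I'm not proposing it as the fix. All names (hook_slot, hook_handle, timeout_register and friends as written here) are hypothetical stand-ins. The idea is that a generation-counted handle lets a modify on a deregistered timeout fail cleanly instead of reading freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TIMEOUTS 8

/* Hypothetical per-timeout state, standing in for the hook info that the
 * application hands back to the library at registration time. */
struct hook_slot {
    bool registered;
    unsigned generation;   /* bumped on every deregister */
    int abs_ms;            /* stand-in for the absolute expiry time */
};

/* Handle given to the caller instead of a raw pointer. */
struct hook_handle {
    size_t index;
    unsigned generation;
};

static struct hook_slot slots[MAX_TIMEOUTS];

static bool handle_is_live(struct hook_handle h)
{
    return h.index < MAX_TIMEOUTS
        && slots[h.index].registered
        && slots[h.index].generation == h.generation;
}

/* Claim the first free slot; returns false if the table is full. */
static bool timeout_register(int abs_ms, struct hook_handle *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < MAX_TIMEOUTS; i++) {
        if (!slots[i].registered) {
            slots[i].registered = true;
            slots[i].abs_ms = abs_ms;
            out->index = i;
            out->generation = slots[i].generation;
            return true;
        }
    }
    return false;
}

/* Release the slot and invalidate every outstanding handle to it. */
static bool timeout_deregister(struct hook_handle h)
{
    if (!handle_is_live(h))
        return false;
    slots[h.index].registered = false;
    slots[h.index].generation++;
    return true;
}

/* A modify through a stale handle is refused rather than dereferenced. */
static bool timeout_modify(struct hook_handle h, int abs_ms)
{
    if (!handle_is_live(h))
        return false;
    slots[h.index].abs_ms = abs_ms;
    return true;
}
```

With this model, a modify issued after deregister returns false even if the slot has since been reused by a new registration, which is the situation the traces above seem to show.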

Regards,
Jim
