Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility
Jim Fehlig wrote:
> Ian Jackson wrote:
>
>> Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD
>> flexibility"):
>>
>>
>>> I let this run over the weekend and today noticed libvirtd was deadlocked
>>>
>>>
>> I have just retested xl with:
>> * my 3-patch 4.4 fixes series
>> * v2 of my fork series
>> * the extra mutex patch "libxl: fork: Fixup SIGCHLD sharing"
>> * "13/12" and "14/12" just posted
>> and it WFM.
>>
>> Of course I don't have the same setup as Jim.
>>
>> Jim: if it's not too much trouble, I'd appreciate it if you could try
>> that combination.
>>
>> For your convenience you can find a git branch of it at
>>
>> http://xenbits.xen.org/gitweb/?p=people/iwj/xen.git;a=shortlog;h=refs/tags/wip.enumerate-pids-v2.1
>> aka
>> git://xenbits.xen.org/people/iwj/xen.git#wip.enumerate-pids-v2.1
>>
>>
>
> I've been testing this branch and notice an occasional libvirtd segfault
> that always occurs when calling libxl_domain_create_restore(). By
> occasional, I mean my save/restore script might cause the segfault after
> 2 iterations, or 20 iterations, or ... But the segfault always occurs
> in libxl_domain_create_restore()
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffeef59700 (LWP 12083)]
> 0x00007ffff74577ef in virObjectIsClass (anyobj=0x2f302f6e69616d6f,
> klass=0x5555558a1310)
> at util/virobject.c:362
> 362 return virClassIsDerivedFrom(obj->klass, klass);
> (gdb) bt
> #0 0x00007ffff74577ef in virObjectIsClass (anyobj=0x2f302f6e69616d6f,
> klass=0x5555558a1310)
> at util/virobject.c:362
> #1 0x00007ffff745765b in virObjectLock (anyobj=0x2f302f6e69616d6f) at
> util/virobject.c:314
> #2 0x00007fffe993cc96 in libxlDomainObjTimeoutModifyEventHook
> (priv=0x5555558fc310,
> hndp=0x5555559e5d88, abs_t=...) at libxl/libxl_domain.c:302
> #3 0x00007fffe96f8fed in time_deregister (gc=0x7fffeef58220,
> ev=0x5555559eee48)
> at libxl_event.c:294
> #4 0x00007fffe96facfd in afterpoll_internal (egc=0x7fffeef58220,
> poller=0x5555559a4c70, nfds=3,
> fds=0x5555559c09d0, now=...) at libxl_event.c:1008
> #5 0x00007fffe96fc312 in eventloop_iteration (egc=0x7fffeef58220,
> poller=0x5555559a4c70)
> at libxl_event.c:1455
> #6 0x00007fffe96fce58 in libxl__ao_inprogress (ao=0x5555559e9690,
> file=0x7fffe970fadb "libxl_create.c", line=1356,
> func=0x7fffe97105f0 <__func__.16344> "do_domain_create") at
> libxl_event.c:1700
> #7 0x00007fffe96d711f in do_domain_create (ctx=0x5555559d9fa0,
> d_config=0x7fffeef58490,
> domid=0x7fffeef5840c, restore_fd=89, checkpointed_stream=0,
> ao_how=0x0, aop_console_how=0x0)
> at libxl_create.c:1356
> #8 0x00007fffe96d7238 in libxl_domain_create_restore
> (ctx=0x5555559d9fa0, d_config=0x7fffeef58490,
> domid=0x7fffeef5840c, restore_fd=89, params=0x7fffeef58400,
> ao_how=0x0, aop_console_how=0x0)
> at libxl_create.c:1387
> #...
> (gdb) f 2
> #2 0x00007fffe993cc96 in libxlDomainObjTimeoutModifyEventHook
> (priv=0x5555558fc310,
> hndp=0x5555559e5d88, abs_t=...) at libxl/libxl_domain.c:302
> 302 virObjectLock(info->priv);
> (gdb) p info->priv
> $3 = (libxlDomainObjPrivatePtr) 0x2f302f6e69616d6f
> (gdb) f 9
> #9 0x00007fffe993f2c7 in libxlVmStart (driver=0x5555558c2e50,
> vm=0x5555558e6a50,
> start_paused=false, restore_fd=89) at libxl/libxl_driver.c:635
> 635 res = libxl_domain_create_restore(priv->ctx, &d_config,
> &domid,
> (gdb) p priv
> $2 = (libxlDomainObjPrivatePtr) 0x5555558fc310
>
> It looks like the libxlDomainObjPrivatePtr, stashed as part of
> for_app_registration_out when registering the timeout, has been
> trampled. Not sure if the problem is in libvirt or libxl, but it is
> late here and I'm calling it a night :).
>
It appears the timeout_modify callback is invoked on a previously
deregistered timeout. I didn't notice the segfault when running
libvirtd under valgrind, but did see
==14653== Invalid read of size 8
==14653== at 0x134ACD1C: libxlDomainObjTimeoutModifyEventHook
(libxl_domain.c:309)
==14653== by 0x13730FEC: time_deregister (libxl_event.c:294)
==14653== by 0x13732CFC: afterpoll_internal (libxl_event.c:1008)
==14653== by 0x13734311: eventloop_iteration (libxl_event.c:1455)
==14653== by 0x13734E57: libxl__ao_inprogress (libxl_event.c:1700)
==14653== by 0x1370F11E: do_domain_create (libxl_create.c:1356)
==14653== by 0x1370F237: libxl_domain_create_restore
(libxl_create.c:1387)
==14653== by 0x134AF332: libxlVmStart (libxl_driver.c:635)
==14653== by 0x134B382A: libxlDomainRestoreFlags (libxl_driver.c:2047)
==14653== by 0x134B3975: libxlDomainRestore (libxl_driver.c:2070)
==14653== by 0x53B5AC7: virDomainRestore (libvirt.c:2678)
==14653== by 0x130ADC: remoteDispatchDomainRestore
(remote_dispatch.h:6657)
==14653== Address 0x18000178 is 8 bytes inside a block of size 32 free'd
==14653== at 0x4C28ADC: free (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==14653== by 0x529B08F: virFree (viralloc.c:580)
==14653== by 0x134AC578: libxlDomainObjEventHookInfoFree
(libxl_domain.c:110)
==14653== by 0x52BE3DB: virEventPollCleanupTimeouts (vireventpoll.c:535)
==14653== by 0x52BEA4C: virEventPollRunOnce (vireventpoll.c:651)
==14653== by 0x52BC960: virEventRunDefaultImpl (virevent.c:306)
which is consistent with the gdb findings. I've audited the timeout
handling code in libvirt and didn't notice any problems. I'll have some
time tomorrow to continue poking.
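
A minimal sketch of the suspected lifecycle problem, for illustration only. All names here (hook_info, timeout_register, eventloop_cleanup, and so on) are hypothetical and are not the libvirt or libxl code. It assumes the per-timeout info is shared between the library's handle and the application's event loop, and shows one way a modify request arriving after the event loop has already dropped the timeout could be rejected instead of touching freed memory:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-timeout info an application stashes
 * via for_app_registration_out. Not the real libvirt structure. */
typedef struct hook_info {
    bool registered; /* true until either side tears the timeout down */
    int  refs;       /* one ref for the library handle, one for the event loop */
    long abs_ms;     /* current deadline */
} hook_info;

static void hook_info_unref(hook_info *info)
{
    assert(info->refs > 0);
    if (--info->refs == 0)
        free(info);
}

/* Register a timeout; both the caller's handle and the event loop hold a ref. */
hook_info *timeout_register(long abs_ms)
{
    hook_info *info = calloc(1, sizeof(*info));
    if (info == NULL)
        return NULL;
    info->registered = true;
    info->refs = 2;
    info->abs_ms = abs_ms;
    return info;
}

/* The event loop drops the timeout. With a plain free() here, a later
 * modify through the library's handle would read freed memory. */
void eventloop_cleanup(hook_info *info)
{
    info->registered = false;
    hook_info_unref(info);
}

/* Modify the deadline; returns -1 if the timeout is no longer registered. */
int timeout_modify(hook_info *info, long abs_ms)
{
    if (!info->registered)
        return -1;
    info->abs_ms = abs_ms;
    return 0;
}

/* The library releases its handle; the last unref frees the block. */
void timeout_deregister(hook_info *info)
{
    info->registered = false;
    hook_info_unref(info);
}
```

In this sketch the block stays valid until both sides have let go of it, so a late timeout_modify call fails cleanly. Whether the actual fix belongs in libvirt or libxl is still open.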
Regards,
Jim