[Xen-devel] error handling in libxl_domain_suspend
Ian, Wei,
we got a report about a crash from libxl_domain_suspend like this, from 'virsh
migrate --live xen+ssh://host':
#1 helper_done (egc=0x7fc0284aa6c0, shs=0x7fc0180256c8) at
libxl_save_callout.c:371
helper_failed
helper_stop
libxl__save_helper_abort
#2 check_all_finished (egc=0x7fc0284aa6c0, stream=0x7fc018025698, rc=-3) at
libxl_stream_write.c:671
stream_done
stream_complete
write_done
dc->callback == write_done
efd->func == datacopier_writable
#3 afterpoll_internal (egc=egc@entry=0x7fc0284aa6c0,
poller=poller@entry=0x7fc018003f20, nfds=4, fds=0x7fc018002d00, now=...) at
libxl_event.c:1269
I inserted the extra call trace manually for better understanding.
The issue is that a failed poll crashes libxl. The actual error was:
libxl_aoutils.c:328:datacopier_writable: unexpected poll event 0x1c on fd 37
(should be POLLOUT) writing libxc header during copy of save v2 stream
In this case revents in datacopier_writable is POLLHUP|POLLERR|POLLOUT, which
triggers datacopier_callback.
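
For reference, the event mask 0x1c from the log can be decoded with a small helper. This is only an illustrative sketch: describe_revents and only_writable are hypothetical names, not libxl functions, and the numeric values (POLLOUT 0x04, POLLERR 0x08, POLLHUP 0x10) are the Linux ones.

```c
#include <poll.h>
#include <stdio.h>
#include <string.h>

/* Flag table in ascending bit order (on Linux). */
static const struct { short bit; const char *name; } poll_flags[] = {
    { POLLIN,   "POLLIN"   }, { POLLPRI, "POLLPRI" },
    { POLLOUT,  "POLLOUT"  }, { POLLERR, "POLLERR" },
    { POLLHUP,  "POLLHUP"  }, { POLLNVAL, "POLLNVAL" },
};

/* Hypothetical helper: render a revents mask as "A|B|C" into buf. */
static char *describe_revents(short revents, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(poll_flags) / sizeof(poll_flags[0]); i++) {
        if (!(revents & poll_flags[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", poll_flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;              /* output truncated, stop appending */
        used += (size_t)n;
    }
    return buf;
}

/* Hypothetical check: true only if POLLOUT is the sole event reported. */
static int only_writable(short revents)
{
    return revents == POLLOUT;
}
```

Passing POLLOUT|POLLERR|POLLHUP to describe_revents yields the three flag names, and only_writable rejects that mask, which matches the "should be POLLOUT" complaint in the log.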
In helper_done, shs->completion_callback is still NULL (0x0):
(gdb) p stream.shs
$32 = {ao = 0x7f3fa4002d10, domid = 0, callbacks = {
save = {a = {suspend = 0x7f3f99c8e220 <libxl__domain_suspend_callback>,
postcopy = 0x0, checkpoint = 0x0, wait_checkpoint = 0x0, switch_qemu_logdirty =
0x7f3f99c8eca0 <libxl__domain_suspend_common_switch_qemu_logdirty>}},
restore = {a = {suspend = 0x7f3f99c8e220 <libxl__domain_suspend_callback>,
postcopy = 0x0, checkpoint = 0x0, wait_checkpoint = 0x0, restore_results =
0x7f3f99c8eca0 <libxl__domain_suspend_common_switch_qemu_logdirty>}}},
recv_callback = 0x0, completion_callback = 0x0,
caller_state = 0x0, need_results = 0, rc = 0, completed = 0, retval = 0,
errnoval = 0, abrt = {ao = 0x0, callback = 0x0, registered = false,
entry = { le_next = 0x0, le_prev = 0x0}}, pipes = {0x0, 0x0}, readable = {fd =
-1, events = 0, func = 0x0, entry = {le_next = 0x0, le_prev = 0x0}, nexus =
0x0},
child = {pid = -1, callback = 0x0, entry = {le_next = 0x0, le_prev = 0x0}},
stdin_what = 0x0, stdout_what = 0x0, egc = 0x0}
Even if helper_done checked whether shs->completion_callback is valid,
check_all_finished would apparently loop forever:
(gdb) p stream.completion_callback
$35 = (void (*)(libxl__egc *, libxl__stream_write_state *, int)) 0x7f3f99c8e890
<stream_done>
stream_done would call check_all_finished again.
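
The mutual recursion described above can be modelled in isolation. The following is a toy sketch with hypothetical names (toy_stream, toy_check_all_finished, toy_stream_done), not the libxl code; it assumes a simple "completed" flag is enough to make the completion path run only once.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_stream;
typedef void (*toy_done_fn)(struct toy_stream *s, int rc);

/* Toy stand-in for the stream state; all fields are hypothetical. */
struct toy_stream {
    toy_done_fn completion_callback;
    bool completed;      /* set once the completion path has run */
    int rc;              /* first error seen, 0 if none */
    int callback_runs;   /* counts invocations, for illustration */
};

static void toy_check_all_finished(struct toy_stream *s, int rc)
{
    if (s->rc == 0)
        s->rc = rc;                 /* remember only the first error */
    if (s->completed)
        return;                     /* re-entered from the callback: stop */
    s->completed = true;
    if (s->completion_callback)
        s->completion_callback(s, s->rc);
}

/* Mirrors the reported shape: the callback calls the checker again. */
static void toy_stream_done(struct toy_stream *s, int rc)
{
    s->callback_runs++;
    toy_check_all_finished(s, rc);
}
```

Without the completed flag, toy_check_all_finished and toy_stream_done would call each other without end, which is the cycle suspected above.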
My understanding of the code is that libxl__xc_domain_save fills dss.sws.shs.
But that function is only called after stream_header_done. Any error before
that will leave dss partly uninitialized.
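
To illustrate the ordering problem, here is a minimal sketch of one conceivable pattern: initialise the helper state up front and let abort be a no-op until the helper has actually been started. All names (toy_helper, toy_helper_init, toy_helper_start, toy_helper_abort) are hypothetical, and this is not a claim about how libxl does or should handle it.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_helper;
typedef void (*toy_helper_cb)(struct toy_helper *h, int rc);

/* Toy stand-in for the save helper state; fields are hypothetical. */
struct toy_helper {
    toy_helper_cb completion_callback;
    bool inuse;          /* true only between start and completion */
    int completions;     /* counts callback runs, for illustration */
};

/* Called early, before anything that can fail. */
static void toy_helper_init(struct toy_helper *h)
{
    h->completion_callback = NULL;
    h->inuse = false;
    h->completions = 0;
}

/* Called later, once the stream header has been written. */
static void toy_helper_start(struct toy_helper *h, toy_helper_cb cb)
{
    h->completion_callback = cb;
    h->inuse = true;
}

/* Safe at any point after init: does nothing if never started. */
static void toy_helper_abort(struct toy_helper *h, int rc)
{
    if (!h->inuse)
        return;
    h->inuse = false;
    h->completion_callback(h, rc);
}

/* Example callback that just counts completions. */
static void toy_count_completion(struct toy_helper *h, int rc)
{
    (void)rc;
    h->completions++;
}
```

With this shape, an error that arrives before toy_helper_start never dereferences a NULL completion_callback, and a second abort after completion is ignored.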
How is this supposed to be fixed?
Olaf