
Migration vmdesc and xen-save-devices-state



Hi,

At some point, QEMU started appending a JSON VM description (vmdesc)
after the migration data.  The vmdesc is not needed to restore the
migration data, but qemu_loadvm_state() will read and discard the
vmdesc to drain the stream when should_send_vmdesc() is true.
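
For reference, the discard path at the end of qemu_loadvm_state() in
migration/savevm.c looks roughly like this (paraphrased from memory,
so details may differ between QEMU versions):

    if (ret == 0 && should_send_vmdesc()) {
        uint8_t section_type = qemu_get_byte(f);

        if (section_type != QEMU_VM_VMDESCRIPTION) {
            /* The message seen when no vmdesc follows the device data. */
            error_report("Expected vmdescription section, but got %d",
                         section_type);
            /* Not fatal - the VM state itself already loaded fine. */
        } else {
            uint8_t *buf = g_malloc(0x1000);
            uint32_t size = qemu_get_be32(f);

            /* Read and throw away the JSON blob to drain the stream. */
            while (size > 0) {
                uint32_t read_chunk = MIN(size, 0x1000);
                qemu_get_buffer(f, buf, read_chunk);
                size -= read_chunk;
            }
            g_free(buf);
        }
    }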

xen-save-devices-state generates its migration data without a vmdesc.
xen-load-devices-state in turn calls qemu_loadvm_state(), which tries
to load a vmdesc since should_send_vmdesc() is true for Xen.  When
restoring from a file, this is harmless: the read hits EOF, QEMU
prints "Expected vmdescription section, but got 0", and the restore
completes successfully.

Linux stubdoms load their migration data over a console, so they
never see an EOF and instead sit waiting for vmdesc data that will
never arrive.  There does seem to be a timeout, and the restore
continues after a delay, but we'd like to eliminate that delay.

There are two options to address this:
1) Set suppress_vmdesc for the Xen machines to bypass the
   should_send_vmdesc() check.
2) Actually send the vmdesc data.

Since the vmdesc is just discarded anyway, option #1 seems
preferable.  A sketch of what that might look like follows.
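
This assumes the xenfv machine options function in hw/i386/pc_piix.c
(the xenpv machine would presumably want the same treatment), and it
relies on the existing MachineClass suppress_vmdesc field that
should_send_vmdesc() ultimately consults:

    static void xenfv_machine_options(MachineClass *m)
    {
        /* ... existing xenfv options unchanged ... */

        /* Don't write a vmdesc on save, and don't expect one on load. */
        m->suppress_vmdesc = true;
    }

(For testing, the same effect should already be reachable from the
command line with -machine suppress-vmdesc=on.)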

If going with #2, qemu_save_device_state() needs to generate the
vmdesc data; a rough sketch follows.  Looking at
qemu_save_device_state() and
qemu_savevm_state_complete_precopy_non_iterable(), the two are very
similar and could seemingly be merged.  qmp_xen_save_devices_state()
could even leverage the bdrv_inactivate_all() call in
qemu_savevm_state_complete_precopy_non_iterable().
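
As a starting point before any merge, something along these lines,
modeled on the non-iterable precopy path (paraphrased and untested;
it assumes vmstate_save() accepts the JSONWriter the same way the
precopy completion path passes it):

    int qemu_save_device_state(QEMUFile *f)
    {
        JSONWriter *vmdesc = json_writer_new(false);
        SaveStateEntry *se;

        if (!migration_in_colo_state()) {
            qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
            qemu_put_be32(f, QEMU_VM_FILE_VERSION);
        }
        cpu_synchronize_all_states();

        json_writer_start_object(vmdesc, NULL);
        json_writer_int64(vmdesc, "page_size", qemu_target_page_size());
        json_writer_start_array(vmdesc, "devices");

        QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
            int ret;

            if (se->is_ram) {
                continue;
            }
            /* Passing vmdesc instead of NULL lets vmstate_save()
             * record this device's description in the JSON. */
            ret = vmstate_save(f, se, vmdesc);
            if (ret) {
                json_writer_free(vmdesc);
                return ret;
            }
        }

        json_writer_end_array(vmdesc);
        json_writer_end_object(vmdesc);

        qemu_put_byte(f, QEMU_VM_EOF);

        /* New: append the vmdesc section like the regular path does. */
        uint32_t vmdesc_len = strlen(json_writer_get(vmdesc));
        qemu_put_byte(f, QEMU_VM_VMDESCRIPTION);
        qemu_put_be32(f, vmdesc_len);
        qemu_put_buffer(f, (uint8_t *)json_writer_get(vmdesc), vmdesc_len);
        json_writer_free(vmdesc);

        return qemu_file_get_error(f);
    }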

That would make qemu_save_device_state() a little more heavyweight,
which could impact COLO.  I'm not sure how performance-sensitive the
COLO code is, and I haven't measured anything.

Does anyone have thoughts or opinions on the subject?

Thanks,
Jason



 

