Re: [PATCH V8 00/12] fix migration of suspended runstate

On 12/18/2023 12:14 AM, Peter Xu wrote:
> On Wed, Dec 13, 2023 at 10:35:33AM -0500, Steven Sistare wrote:
>> Hi Peter, all have RB's, with all i's dotted and t's crossed - steve
>
> Yes this seems to be more migration related so maybe good candidate for a
> pull from migration submodule.
>
> But since this is still solving a generic issue, I'm copying a few more
> people from get_maintainers.pl that this series touches, just in case
> they'll have something to say before dev cycle starts.

The key aspects are summarized by the cover letter and the commit messages
pasted below for the first 6 patches:

https://lore.kernel.org/qemu-devel/1702481421-375368-1-git-send-email-steven.sistare@xxxxxxxxxx

---------------------------------------------------------------------------
[PATCH V8 00/12] fix migration of suspended runstate

Migration of a guest in the suspended runstate is broken. The incoming
migration code automatically tries to wake the guest, which is wrong; the
guest should end migration in the same runstate it started. Further, after
saving a snapshot in the suspended state and loading it, the vm_start fails.
The runstate is RUNNING, but the guest is not.

---------------------------------------------------------------------------
[PATCH V8 01/12] cpus: vm_was_suspended

Add a state variable to remember if a vm previously transitioned into a
suspended state.

---------------------------------------------------------------------------
[PATCH V8 02/12] cpus: stop vm in suspended runstate

Currently, a vm in the suspended state is not completely stopped. The VCPUs
have been paused, but the cpu clock still runs, and runstate notifiers for
the transition to stopped have not been called. This causes problems for
live migration. Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.

Modify vm_stop to completely stop the vm if the current state is suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
Modify vm_start to restore the suspended state.

This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands. For example:

    (qemu) info status
    VM status: paused (suspended)

    (qemu) stop
    (qemu) info status
    VM status: paused

    (qemu) system_wakeup
    Error: Unable to wake up: guest is not in suspended state

    (qemu) cont
    (qemu) info status
    VM status: paused (suspended)

    (qemu) system_wakeup
    (qemu) info status
    VM status: running

---------------------------------------------------------------------------
[PATCH V8 03/12] cpus: check running not RUN_STATE_RUNNING

When a vm transitions from running to suspended, runstate notifiers are
not called, so the notifiers still think the vm is running. Hence, when we
call vm_start to restore the suspended state, we call vm_state_notify with
running=1. However, some notifiers check for RUN_STATE_RUNNING. They must
check the running boolean instead.

No functional change.

---------------------------------------------------------------------------
[PATCH V8 04/12] cpus: vm_resume

Define the vm_resume helper, for use in subsequent patches.

---------------------------------------------------------------------------
[PATCH V8 05/12] migration: propagate suspended runstate

If the outgoing machine was previously suspended, propagate that to the
incoming side via global_state, so a subsequent vm_start restores the
suspended state. To maintain backward and forward compatibility, reclaim
some space from the runstate member.

---------------------------------------------------------------------------
[PATCH V8 06/12] migration: preserve suspended runstate

A guest that is migrated in the suspended state automatically wakes and
continues execution. This is wrong; the guest should end migration in
the same state it started.

The root cause is that the outgoing migration code automatically wakes the
guest, then saves the RUNNING runstate in global_state_store(), hence the
incoming migration code thinks the guest is running and continues the guest
if autostart is true.

On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.

On the incoming side, call vm_start if the pre-migration state was running
or suspended. For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.

---------------------------------------------------------------------------
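
To make the stop/cont/system_wakeup behavior from patch 02 and the
incoming-side rule from patch 06 concrete, here is a toy Python model.
It is only a sketch and not QEMU code: ToyVM, RunState, and migrate are
hypothetical names invented for illustration, and the was_suspended flag
merely stands in for the vm_was_suspended state introduced in patch 01.

```python
from enum import Enum


class RunState(Enum):
    # Values mirror the "info status" strings shown in patch 02.
    RUNNING = "running"
    PAUSED = "paused"
    SUSPENDED = "paused (suspended)"


class ToyVM:
    """Toy model of the described semantics; hypothetical, not QEMU code."""

    def __init__(self, state=RunState.RUNNING):
        self.state = state
        # Assumption: one boolean is enough to model vm_was_suspended.
        self.was_suspended = False

    def info_status(self):
        return f"VM status: {self.state.value}"

    def stop(self):
        # Stopping a suspended vm fully pauses it and remembers the suspend.
        if self.state is RunState.SUSPENDED:
            self.was_suspended = True
        self.state = RunState.PAUSED

    def cont(self):
        # Only a paused vm can be continued; restore suspended if remembered.
        if self.state is not RunState.PAUSED:
            return
        self.state = (RunState.SUSPENDED if self.was_suspended
                      else RunState.RUNNING)
        self.was_suspended = False

    def system_wakeup(self):
        if self.state is not RunState.SUSPENDED:
            raise RuntimeError(
                "Unable to wake up: guest is not in suspended state")
        self.state = RunState.RUNNING


def migrate(src):
    """Model of patches 05/06: stop the source, carry the flag, restart."""
    premig = src.state
    src.stop()                              # no automatic wakeup on the source
    dst = ToyVM(RunState.PAUSED)
    dst.was_suspended = src.was_suspended   # propagated, as via global_state
    if premig in (RunState.RUNNING, RunState.SUSPENDED):
        dst.cont()                          # restores running or suspended
    return dst
```

Replaying the patch 02 transcript against this model: starting from
ToyVM(RunState.SUSPENDED), stop() reports "VM status: paused", a
system_wakeup() at that point raises the "not in suspended state" error,
cont() returns to "VM status: paused (suspended)", and a final
system_wakeup() reports "VM status: running". Likewise,
migrate(ToyVM(RunState.SUSPENDED)) yields a destination that is still
suspended rather than running.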