
Re: [PATCH v5 04/34] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration



On Mon, 2026-06-15 at 23:47 -0700, Dongli Zhang wrote:
> I tested patches 02, 03, 04, and 26 by customizing QEMU to support kexec live
> updates (LUO and KHO), preserving the memfd across kexec.

Thank you.

> For my use case, I used KVM_[GS]ET_CLOCK_GUEST instead of the existing
> KVM_[GS]ET_CLOCK. I didn't account for the downtime in my QEMU code, although
> the host TSC never resets across kexec.
> 
> Clock drift was zero, and I did not observe any unnecessary master clock
> updates after KVM_SET_CLOCK_GUEST completed.

The kvmclock drift won't have been *zero*; it will have been a
nanosecond or two. Which most people won't notice, but is annoying me.

I believe it comes from both pvclock_update_vm_gtod_copy() and
kvm_vcpu_ioctl_set_clock_guest() rounding *down*. I think we should
tweak the latter to round *up* so they're at least not biasing in the
same direction.

We could also do better at picking a snapshot cycle count which
*doesn't* lose in the rounding. But those are definitely improvements
for another day; this series is long and complex enough and has already
gained a dependency on fixes in core timekeeping snapshots.

> Another interesting observation from my experiments is that tsc_khz changes
> across kexec. Since the TSC value itself does not reset across kexec, I'm
> wondering whether there is any reason to switch to the new tsc_khz value after
> the kexec.

This is the host timekeeping, yes?

We really ought to pass over *all* the NTP synchronization data across
KHO — not just the frequency. There's no excuse for the new kernel not
reporting *precisely* the same time that the old kernel would, for a
given TSC reading.

The work I've been doing at 
https://git.infradead.org/?p=users/dwmw2/linux.git;a=shortlog;h=refs/heads/ffclock
lays the groundwork for exporting and importing the full reference
data, and maybe I should use KHO as the example use case while we
continue to bikeshed the userspace and vmclock parts.

> While live migration involves two different machines, kexec is performed on
> the same machine. Given that the TSC value itself is preserved across kexec,
> would it make sense to reuse the pre-kexec tsc_khz value instead of using the
> new tsc_khz after kexec?
> 
> I tested this by using LUO to preserve tsc_khz across kexec, and the results
> looked good.

Of course, what we should really be doing is exporting the timekeeping
reference to see what frequency the source host TSC is *actually*
running at, at the time of migration. That gives us a function of guest
TSC to TAI. Then we can restore the TSC on the destination host as if
it had been running at precisely that frequency during the migration.

The TSC might be running at a slightly different frequency on the new
host, but we provide vmclock and the guest can clamp its timekeeping to
that almost immediately (see the QEMU patch I've been posting with the
ffclock/timekeeping series).
