Re: [Xen-devel] Ever increasing time offset for HVM domain / Huge amounts of drift


At 13:37 +0000 on 14 Jan (1358170621), Phil Evans wrote:
> I am currently running Xen 4.2.1 (this has also happened in 4.2.0).
> We have been having a major problem with sometimes huge
> amounts of clock drift in Windows VMs.  Sometimes the clock on a VM
> could suddenly jump by over a week (usually forwards, however time has
> been known to go backwards as well).
> The way to reproduce this (for me at least) is simply to do a
> manual NTP sync on a Windows VM.  Monitoring the qemu-dm log file
> for the VM, I see something similar to the following:

Which version of Windows are you using for this?
Did you see this on older (4.1.x) Xen versions?

> Time offset set 489, added offset 480
> Time offset set 436, added offset -53
> Time offset set 496, added offset 60
> Time offset set 494, added offset -2
> Time offset set 554, added offset 60
> Time offset set 565, added offset 11
> Time offset set 606, added offset 41
> Time offset set -1974, added offset -2580
> Time offset set 1626, added offset 3600
> Time offset set 1579, added offset -47
> Time offset set 1639, added offset 60
> It seems to add the same number of seconds to the offset as has passed
> since the last sync.

This printout is from some code that gets given a _change_ in time
offset from Xen; it prints out the new value and the change, so they
should always add up like that.
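
As a quick sanity check of that claim, here is a minimal Python sketch
that parses the quoted lines and confirms each new offset is the
previous offset plus the printed change.  The regular expression and
the helper names are my own, written against the excerpt above; this
is not code from qemu-dm.

```python
import re

# Log lines copied from the qemu-dm excerpt quoted above.
LOG = """\
Time offset set 489, added offset 480
Time offset set 436, added offset -53
Time offset set 496, added offset 60
Time offset set 494, added offset -2
Time offset set 554, added offset 60
Time offset set 565, added offset 11
Time offset set 606, added offset 41
Time offset set -1974, added offset -2580
Time offset set 1626, added offset 3600
Time offset set 1579, added offset -47
Time offset set 1639, added offset 60
"""

# Pattern is an assumption based on the excerpt, not on the qemu-dm source.
LINE_RE = re.compile(r"Time offset set (-?\d+), added offset (-?\d+)")


def parse(log):
    """Return a list of (new_offset, delta) pairs, one per matching line."""
    return [(int(m.group(1)), int(m.group(2))) for m in LINE_RE.finditer(log)]


def find_mismatches(entries):
    """Return every (prev, delta, new) where prev + delta != new."""
    bad = []
    for (prev, _), (new, delta) in zip(entries, entries[1:]):
        if prev + delta != new:
            bad.append((prev, delta, new))
    return bad


entries = parse(LOG)
mismatches = find_mismatches(entries)
# Net change in offset between the first and last printed values.
net_change = entries[-1][0] - entries[0][0]

print("mismatches:", mismatches)
print("net change in seconds:", net_change)
```

An empty mismatch list means the log is self-consistent, so the
interesting quantity is the net change rather than any single line.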

But yes, it's striking that the VM is (mostly) drifting forward over time.

>  The offset just keeps on increasing, eventually
> resulting in huge numbers equating to days.  Occasionally the offset
> may jump a bit and go down but the general trend is up.  Although this
> does not affect the VM immediately, at some point I am guessing it
> syncs itself with the CMOS clock (which is now a large number of
> seconds offset from the actual time), resulting in a huge jump in
> time.  A reboot is a guaranteed way to get the new, incorrect time.

That makes sense; if the RTC is being set to the wrong time, a reboot
will copy the error into the new OS time. 

> Although I do not understand all of the underlying code, I presume the
> correct way this should work is it should be comparing the CMOS time
> that's just been set with the hardware clock on the physical machine,
> resulting in an offset between the two.

More or less.  In fact IIRC it compares the newly set time with the
current CMOS time, and propagates the difference into an offset from
the hardware clock.

It's possible that the code to calculate 'current CMOS time' for a VM is
buggy -- that was changed in 4.2.  Cc'ing the people who touched that
code in 4.2 for their opinions.
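
To make the mechanism concrete, here is a toy Python model of the
scheme described above: the guest's RTC is the host clock plus an
offset, and a write to the RTC adjusts the offset by (new time minus
current CMOS time).  Everything in it (the ToyRtc class, the
stale_reads flag) is hypothetical and is not Xen code.  The
"stale read" case is just one imagined failure mode, chosen because
it would produce the pattern reported above, where each sync adds
the seconds elapsed since the previous one; it is not a diagnosis.

```python
class ToyRtc:
    """Toy model: guest RTC = host clock + offset (all values in seconds)."""

    def __init__(self, start, stale_reads=False):
        self.offset = 0
        # Hypothetical bug switch: if set, 'current CMOS time' does not
        # advance between writes.
        self.stale_reads = stale_reads
        self.last_set_value = start  # RTC value at the most recent write

    def current_time(self, host_now):
        """Return what the model believes the guest RTC reads right now."""
        if self.stale_reads:
            return self.last_set_value
        return host_now + self.offset

    def set_time(self, host_now, new_time):
        """Guest writes new_time to the RTC; fold the change into offset."""
        delta = new_time - self.current_time(host_now)
        self.offset += delta
        self.last_set_value = new_time
        return delta


def run(stale_reads):
    """Sync the guest to the true (host) time three times, 60s apart."""
    rtc = ToyRtc(start=0, stale_reads=stale_reads)
    for host_now in (60, 120, 180):
        rtc.set_time(host_now, new_time=host_now)
    return rtc.offset


good = run(stale_reads=False)
bad = run(stale_reads=True)
print("offset with correct reads:", good)
print("offset with stale reads:", bad)
```

With correct reads, syncing to the right time leaves the offset
alone; with stale reads, every sync adds the elapsed interval, so the
offset grows without bound even though the guest only ever asks for
the correct time.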


