
Re: [XEN PATCH] x86/vhpet: Add option to always fire hpet timer on resume



 
27.08.2025, 12:09, "Roger Pau Monné" <roger.pau@xxxxxxxxxx>:

On Wed, Aug 27, 2025 at 09:01:08AM +0300, Vyacheslav Legoshin wrote:

 The following issue was observed on Windows 10 21H2 x64+: when the domain state
 is saved while all cores are executing the 'halt' instruction,


I assume this is when executing `xl save` or an equivalent command from a
different toolstack?

Yes, snapshot was taken with 'xl save'.

IIRC in that case the guest would be paused while the memory dump to
disk is done, and hence guest vCPUs won't be executing the `halt`
instruction, they wouldn't be executing at all.

Yes, the domain was paused, I made a mistake in wording.
 and the memory
 save takes a relatively long time (tens of seconds), the HPET counter may
 overflow as follows:
 counter = 11243f3e4a
 comparator = 910cb70f
 
 In such cases, the fix implemented in commit
 b144cf45d50b603c2909fc32c6abf7359f86f1aa does not work (because the 'diff' is
 not negative), resulting in the guest VM becoming unresponsive for
 approximately 30 seconds.
 
 This patch adds an option to always adjust the HPET timer to fire immediately
 after restore.
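
A minimal sketch of the arithmetic behind the values quoted above. It assumes the timer is compared in 32-bit mode (so only the low 32 bits of the counter matter) and that one vHPET tick is 16 ns; neither is stated in this thread, and the helper names are hypothetical, not Xen functions.

```c
#include <stdint.h>

/* Assumption: 16 ns per vHPET tick (62.5 MHz). Not stated in this thread. */
#define ASSUMED_NS_PER_HPET_TICK 16ULL

/*
 * Hypothetical helper: signed distance from the counter to the comparator
 * as seen by a timer operating in 32-bit mode. Both values are truncated
 * to their low 32 bits before the subtraction.
 */
int64_t hpet_diff32(uint64_t counter, uint64_t comparator)
{
    uint32_t cur = (uint32_t)counter;
    uint32_t cmp = (uint32_t)comparator;

    return (int64_t)cmp - (int64_t)cur;
}

/* Hypothetical helper: convert ticks to nanoseconds at the assumed rate. */
uint64_t hpet_ticks_to_ns(uint64_t ticks)
{
    return ticks * ASSUMED_NS_PER_HPET_TICK;
}
```

With counter = 0x11243f3e4a and comparator = 0x910cb70f, the truncated counter is 0x243f3e4a and the difference is 0x6ccd78c5, which is positive. Under the 16 ns assumption that is about 29 seconds, in line with the roughly 30 seconds of unresponsiveness described above.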


What happens if the guest is left running after the save? I assume
that using `xl save -c <domain>` would leave the domain in a similarly
wedged state, and your proposed workaround won't be effective there,
since there's no restoring process? Or that's not the case there
because Xen is still keeping track of the internal timer, and would
set an interrupt as pending anyway?

I've seen broken snapshots, but was not paying attention to the state of the
running domain after the save. So unfortunately I can't answer this question.
 
I'll try to reproduce it, but it happens in roughly one snapshot per month or less.
Also, if there is no 'halt' - then the domain is restored without any issues
(even if I break the saved HPET state intentionally).

Should we maybe store the last guest time at context save, and check
against that to see whether comparators should have triggered, instead
of playing these games?

At first I tried to change the order of the memory save and the HVM save
(currently save() from tools/libs/guest/xg_sr_save.c first saves the memory
and only after that saves the HVM state, HPET included, which was counting
forward all that time).
But since pausing the domain and saving the HPET state can't be done atomically
(at least AFAIK), there will always be a case like:
1) Windows sets comparator to 0xffffffff
2) Counter reaches 0x1fffffffe
3) Xen pauses the domain
4) Xen saves HPET state
 
I think there is no way to say for sure whether it is an overflow or just
a long wait. Such a long wait is almost impossible (at least on Windows),
but nothing prevents another guest OS from doing it.
Also, if the VM has a huge memory space and the disk device is very slow
(or busy), the HPET counter may reach any value.
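
One way to read the ambiguity described above, as a small sketch: if the comparison only looks at the low 32 bits of the counter, two counter values that differ by a full 2^32 wrap give the same distance to the comparator. The helper name is hypothetical and the 32-bit comparison is an assumption, not something taken from the Xen sources.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when two 64-bit counter values are
 * indistinguishable to a comparison done on the low 32 bits only
 * (assumed 32-bit timer mode).
 */
bool same_in_32bit_view(uint64_t counter_a, uint64_t counter_b,
                        uint32_t comparator)
{
    /* Unsigned 32-bit subtraction wraps modulo 2^32, like the counter. */
    uint32_t diff_a = comparator - (uint32_t)counter_a;
    uint32_t diff_b = comparator - (uint32_t)counter_b;

    return diff_a == diff_b;
}
```

For example, with the comparator at 0xffffffff, a counter of 0xfffffffe (comparator not yet reached) and a counter of 0x1fffffffe (comparator passed one wrap ago) both yield a difference of 1, so the truncated values alone cannot tell the two cases apart.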

 

