[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 1/1] x86/arch: VM resume: avoid RDTSC emulation due to host clock drift
On 03/09/2019 08:54, Jan Beulich wrote: > On 02.09.2019 20:27, Edwin Török wrote: >> On a Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz the host frequency drifts: >> ``` >> (XEN) [ 6.607693] Detected 2600.004 MHz processor. >> (XEN) [ 2674.213081] dom1(hvm): >> mode=0,ofs=0xfffee6f70b7faa48,khz=2600018,inc=3 >> (XEN) [ 2674.213087] dom2(hvm): >> mode=0,ofs=0xfffee6fd499835c0,khz=2600018,inc=2 >> ``` >> >> The 2 domains were suspended prior to rebooting the host and applying a >> xen/microcode patch. After the reboot the frequency of the host was deemed to >> be slightly different, and therefore switching on RDTSC emulation for the >> Linux >> HVM guest, even though the difference was only 5 ppm. This CPU doesn't >> support >> TSC scaling. >> >> Therefore we should either measure the standard deviation of our calibration >> and have a range of acceptable frequencies as "same", or have a static >> tolerance value. >> >> The platform timer's clock frequency accuracy is: >> * IA-PC HPET Specification 1.0a sections 2.2 and 2.4.1: 500 ppm or better >> * ACPI PM timer, and PIT timer do not have defined accuracies >> * Intel 300 Series datasheet section 25.6: 24 MHz crystal 100 ppm or better >> * NTP FAQ section 3.3 Clock Quality: 11 ppm drift due to temperature >> * section 2.2.2 claims that PIT/ACPI PM timer share the same crystal as HPET >> and >> thus 500 ppm as an upper bound, "the real drift is usually smaller than >> 30ppm" >> >> For simplicity and determinism opted for a static tolerance value of 100 ppm >> here, such that the any error would be well within the error you would get >> with >> HPET/Linux's calibration. NTP can cope with a drift < 500 ppm. >> Most importantly this should stop Xen from claiming that the clock frequency >> on >> the same host is different across reboots. Specifications do not currently >> mandate an accuracy higher than 100 ppm, therefore OSes should already be >> able >> to cope with such drift on real hardware. Any improvements in accuracy from >> future specifications/motherboards wouldn't be applicable, because they would >> also come with newer CPUs that support TSC scaling. >> >> If the CPU does support TSC scaling Xen will of course still attempt to match >> the exact frequency value it thinks the guest had when it was suspended. >> See below for `if ( hvm_tsc_scaling_supported && !d->arch.vtsc )` (not >> visible >> in patch context). >> >> llabs() doesn't appear to be available when building xen, hence the 2 >> comparisons. >> >> After this patch when suspending a VM, and rebooting the host I get this >> output: >> ``` >> (XEN) [ 6.614703] Detected 2600.010 MHz processor. >> (XEN) [ 138.924342] TSC marked as reliable, warp = 0 (count=2) >> (XEN) [ 138.924346] dom1(hvm): >> mode=0,ofs=0xfffed01901016d18,khz=2600012,inc=2 >> ``` >> >> Signed-off-by: Edwin Török <edvin.torok@xxxxxxxxxx> > > First of all - are you aware that there had been multiple iterations > of a patch (by Olaf Hering) making this a command line and/or guest > config controllable setting? No, I'll take a look at them and the associated discussion. Found a '[PATCH v10] new config option vtsc_tolerance_khz to avoid TSC emulation' in the archives from 9 months ago. > If so, it would have been nice if at > least the cover letter identified the correlation. If not, please > take a look. After all it hasn't gone in so far because of objections > by Andrew. > > Using a hardcoded tolerance value in any event raises the question > of how you know whether a particular guest OS can actually cope. > >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -2171,6 +2171,12 @@ void tsc_get_info(struct domain *d, uint32_t >> *tsc_mode, >> *elapsed_nsec = 0; >> } >> >> +static inline int frequency_same_with_tolerance(int64_t khz1, int64_t khz2) > > The return type wants to be bool. And there wants to be an explaining > comment ahead of the function, (re-)stating some of what you have in > the description. > >> +{ >> + int64_t ppm = (khz2 - khz1) * 1000000 / khz2; >> + return -100 < ppm && ppm < 100; > > While we have no llabs(), you should imo use either ABS() or > __builtin_labs() / __builtin_llabs(). > > Furthermore, could we limit this behavior to the case of there not > being TSC scaling available (due to running on old hardware, or due > to it being a PV guest)? Yes, that'd make sense. Best regards, --Edwin _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |