[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: xen 4.14.3 incorrect (~3x) cpu frequency reported
On 10/01/2022 13:11, Jan Beulich wrote: > On 10.01.2022 13:37, Roger Pau Monné wrote: >> On Fri, Jan 07, 2022 at 12:39:04PM +0100, Jan Beulich wrote: >>> @@ -415,16 +416,35 @@ static int64_t __init init_hpet(struct p >>> >>> pts->frequency = hpet_rate; >>> >>> +for(i = 0; i < 16; ++i) {//temp >>> count = hpet_read32(HPET_COUNTER); >>> start = rdtsc_ordered(); >>> target = count + CALIBRATE_VALUE(hpet_rate); >>> if ( target < count ) >>> while ( hpet_read32(HPET_COUNTER) >= count ) >>> continue; >>> - while ( hpet_read32(HPET_COUNTER) < target ) >>> + while ( (count = hpet_read32(HPET_COUNTER)) < target ) >>> continue; >>> >>> - return (rdtsc_ordered() - start) * CALIBRATE_FRAC; >>> + expired = rdtsc_ordered() - start; >> There's also a window between the HPET read and the TSC read where an >> SMI/NMI could cause a wrong frequency detection. > Right, as said in an earlier reply I did notice this myself (actually > on the way home on Friday). As also said there, for now I can't see > any real (i.e. complete) solution to this and the similar instances > of the issue elsewhere. Calibration loops like this cannot be made robust. This is specifically why frequency information is being made available via architectural means, and is available via non-architectural means in other cases. There's a whole bunch of situations (#STOPCLK, TERM and PSMI off the top of my head, but I bet HDC will screw with things too) which will mess with any calibration loop, even if you could disable SMIs. While, mechanically, we can disable SMIs on AMD with CLGI, we absolutely shouldn't run a calibration loop like this with SMIs disabled. Much as I hate to suggest it, we should parse the frequency out of the long CPUID string, and compare it to the calibration time. (This technique is mandated in the Intel BWG during very early setup.) If it is different by a large margin, rerun the calibration, and if it is still different, complain loudly into the logs. This will fix a one-off-spurious event, whereas if e.g. the system is thermally throttling due to a bad heat sync, there is nothing software can do to recover the system. ~Andrew
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |