[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/time: further improve TSC / CPU freq calibration accuracy
On 18.01.2022 13:45, Roger Pau Monné wrote: > On Thu, Jan 13, 2022 at 02:41:31PM +0100, Jan Beulich wrote: >> Calibration logic assumes that the platform timer (HPET or ACPI PM >> timer) and the TSC are read at about the same time. This assumption may >> not hold when a long latency event (e.g. SMI or NMI) occurs between the >> two reads. Reduce the risk of reading uncorrelated values by doing at >> least four pairs of reads, using the tuple where the delta between the >> enclosing TSC reads was smallest. From the fourth iteration onwards bail >> if the new TSC delta isn't better (smaller) than the best earlier one. > > Do you have some measurements as to whether this makes the frequency > detection any more accurate? In the normal case (no SMI or alike) I haven't observed any further improvement on top of the earlier patch. >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >> --- >> Obviously (I think) instead of having both read_{hpet,pmtmr}_and_tsc() >> the calibration logic could be folded between HPET and PMTMR, at the > > As an intermediate solution you could have a read_counter_and_tsc I > would think? Not sure in which way you view this as "intermediate". >> expense of a couple more indirect calls (which may not be that much of a >> problem as this is all boot-time only). Really such folding would have >> been possible already before, just that the amount of duplicate code >> hasn't been this large so far. IOW if so desired, then perhaps the >> folding should be a separate prereq patch. > > You could even pass a platform_timesource as a parameter to that > generic read counter and TSC helper. This was the plan, yes, in case I was asked to follow that route (or if I decided myself that I'd prefer that significantly over the redundancy). >> --- a/xen/arch/x86/time.c >> +++ b/xen/arch/x86/time.c >> @@ -392,9 +392,36 @@ static u64 read_hpet_count(void) >> return hpet_read32(HPET_COUNTER); >> } >> >> +static uint32_t __init read_hpet_and_tsc(uint64_t *tsc) >> +{ >> + uint64_t tsc_prev = *tsc = rdtsc_ordered(), tsc_min = ~0; >> + uint32_t best = best; >> + unsigned int i; >> + >> + for ( i = 0; ; ++i ) >> + { >> + uint32_t hpet = hpet_read32(HPET_COUNTER); >> + uint64_t tsc_cur = rdtsc_ordered(); >> + uint64_t tsc_delta = tsc_cur - tsc_prev; >> + >> + if ( tsc_delta < tsc_min ) >> + { >> + tsc_min = tsc_delta; >> + *tsc = tsc_cur; >> + best = hpet; >> + } >> + else if ( i > 2 ) >> + break; >> + >> + tsc_prev = tsc_cur; > > Would it be better to set tsc_prev to the current TSC value here in > order to minimize the window you are measuring? Or else tsc_delta will > also account for the time in the loop code, and there's more > likeliness from being interrupted? I did consider this (or some variant thereof), but did for the moment conclude that it wouldn't make enough of a difference. If you think it would be meaningfully better that way, I'll switch. Both here and for the aspect above, please express you preference (i.e. not in the form of a question). > I guess being interrupted four times so that all loops are bad is very > unlikely. > > Since this loop could in theory run multiple times, do we need to > check that the counter doesn't overflow? Or else the elapsed counter > value would be miscalculated. I don't think I see any issue with overflow here. It's the callers who need to care about that. And I don't think this function will run for long enough such that a counter would not only wrap, but also reach its initial count again. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |