Re: [Xen-devel] BUG: NOW() seems to (sometimes) go backwards!
On 06/08/2016 02:43 PM, Dario Faggioli wrote:
> On Wed, 2016-06-08 at 04:42 -0600, Jan Beulich wrote:
>>>>> On 07.06.16 at 17:54, <dario.faggioli@xxxxxxxxxx> wrote:
>>> So, it looks to me that the TSC is actually ok, and it could be the
>>> local_tsc_stamp and scale_delta() magic done in get_s_time_fixed()
>>> (which, AFAICUI, is there to convert cycles to time) that does not
>>> guarantee the result to be monotonic, even if the input is...
>>> Thoughts?
>> Indeed this smells like an issue in the scaling. However, the scaling
>> values vary only when !CONSTANT_TSC. Can you check that this
>> flag is actually set on that system?
>>
> Checked. I do have it. I instrumented the code to print stuff if it
> finds it, and it prints.
>
> Also:
> root@Zhaman:~# xl debug-keys s
> (XEN) [  406.719464] TSC marked as reliable, warp = 0 (count=3)
> (XEN) [  406.719467] dom1(hvm):
> mode=0,ofs=0xffffd9279716c276,khz=2394069,inc=4,vtsc count: 195367 kernel, 0
> user
>
>> (I hope you're not running a
>> strange Dom0 setup with FREQCTL_dom0_kernel in effect.)
>>
> I've no idea what this is. I've been running 4.1.0, built myself, and
> stock Debian unstable 4.5.0, and I'm seeing the issue in both cases.
>
> Looking FREQCTL_dom0_kernel up, I guess you mean what happens when one
> passes cpufreq="dom0-kernel" to xen on boot command line. In which
> case, no, I'm not doing that.
>
>> And
>> at the same time that it's time_calibration_tsc_rendezvous() that
>> is in use?
>>
> The code you're referring to should be this:
>
>     /* If we have constant-rate TSCs then scale factor can be shared. */
>     if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
>     {
>         /* If TSCs are not marked as 'reliable', re-sync during rendezvous. */
>         if ( !boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
>             time_calibration_rendezvous_fn = time_calibration_tsc_rendezvous;
>     }
>
> And I have both X86_FEATURE_CONSTANT_TSC and X86_FEATURE_TSC_RELIABLE.
>
> I've again instrumented the code in order to check whether it is
> time_calibration_tsc_rendezvous() or time_calibration_std_rendezvous()
> that is being used, and it's the latter:
>
> (XEN) [    1.795916] TSC reliable. Yay!! Using ffff82d080198362 for
> rendevousez
>
> [dario@Solace xen.git] $ objdump -D xen/xen-syms |grep ffff82d080198362
> ffff82d080198362 <time_calibration_std_rendezvous>:
>
> which looks correct to me.

One other code path you might be interested in is local_time_calibration():
try commenting out the chunk under this conditional:

    if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
    {
        ...
    }

With that you rarely see time going backwards, since the scaling would be
adjusted to each CPU's calibration error - but that is a (very, very close)
approximation done in software, not something exactly guaranteed the way the
hardware TSC is. Trying this out would be just for experimentation, though -
it doesn't look like a correct way of addressing this.

>
>> Yet when the scaling values get set only once at boot, monotonic
>> (cross-CPU) TSC means monotonic (cross-CPU) returns from NOW().
>>
> Yep. And at this point, this is what needs to be verified, I guess...

I think get_s_time_fixed() doesn't guarantee monotonicity across CPUs, be
they on different sockets or (SMT) siblings. local_tsc_stamp is seeded
differently on each CPU, i.e. from rdtsc() right after the platform time
read in the calibration rendezvous.
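Roughly, what every CPU computes is something like the sketch below
(simplified, with made-up names approximating the real time.c fields; not
verbatim Xen code):

    #include <stdint.h>

    /* Sketch of how each CPU turns the TSC into system time: every CPU
     * starts from its *own* (tsc, stime) stamp pair and scale factor,
     * captured at its last calibration, so two CPUs reading a perfectly
     * monotonic TSC at the same instant can still compute slightly
     * different values of NOW(). */

    struct time_scale {
        int shift;
        uint32_t mul_frac;           /* cycles -> ns multiplier, 32.32 fixed point */
    };

    struct cpu_time {
        uint64_t local_tsc_stamp;    /* rdtsc() at this CPU's last calibration */
        int64_t  stime_local_stamp;  /* platform-timer-derived stime at that point */
        struct time_scale tsc_scale; /* this CPU's cycles -> ns scale */
    };

    /* Stand-in for the cycles -> ns scaling helper. */
    static uint64_t scale_delta(uint64_t delta, const struct time_scale *s)
    {
        if ( s->shift < 0 )
            delta >>= -s->shift;
        else
            delta <<= s->shift;
        return (uint64_t)(((unsigned __int128)delta * s->mul_frac) >> 32);
    }

    /* What NOW() boils down to: per-CPU stamps plus the scaled TSC delta. */
    static int64_t get_s_time_sketch(const struct cpu_time *t, uint64_t tsc)
    {
        return t->stime_local_stamp +
               (int64_t)scale_delta(tsc - t->local_tsc_stamp, &t->tsc_scale);
    }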
Plus, stime_local_stamp is seeded with values taken from the platform timer
(HPET, ACPI, PIT) in local_time_calibration(), which means that get_s_time()
isn't based solely on the TSC and that there will always be a gap between
stime_local_stamp and local_tsc_stamp. The TSC is indeed monotonic on a
TSC-invariant box, but the delta that gets computed differs from CPU to CPU
according to when time calibration happens on each CPU - thus the desired
monotonicity property isn't guaranteed. Having stime_local_stamp be based on
the same timestamp as local_tsc_stamp, plus having a single local_tsc_stamp
as reference, would address this behaviour - see also below.

>
>> In any event - could you try to exclude C- and P-state effects? Of
>> course that makes sense only if you can reasonably repro the
>> problem situation (and hence can tell from its absence over a certain
>> period of time that whatever change was done did have some
>> positive effect).
>>
> It's actually quite hard *NOT* to reproduce the problem... it happens
> all the time, and if the load is high enough, I see the "Time went
> backwards?" printk several times per second!
>
> So, trying to do what you suggest in an online fashion, i.e., issueing:
>
> # xenpm set-max-cstate 0
> # xenpm set-scaling-governor all performance
>
> did not change the situation (I still see the printks).
>
> I've then tried passing both cpufreq="none" and max_cstate=0 to xen at
> boot, but they made no difference at all either.
>
> Most of the time, we're speaking of very small skews, e.g.:
>
> (XEN) [   59.999959] WARNING: __update_runq_load: Time went backwards? now
> 59999946079 llu 59999946085
> (XEN) [  117.595508] WARNING: __update_runq_load: Time went backwards? now
> 117595495295 llu 117595495310
>
> i.e., 6 nanoseconds or 15 nanoseconds!
>
> Then there are instances where the difference is bigger (microseconds
> time scale, like in the first email of the thread).

This does sound similar to the cross-CPU calibration error I was observing,
just in a place other than the guests' PV time info updates. My pvclock TSC
stable bit series could address your issue - but only for the clocksource=tsc
mode added there, which would be quite limiting for the other cases, i.e.
clocksource=hpet|acpi|etc. One patch in the RFC of that series could fix
your issue, but it was more of a hacky patch for discussion [0]. It takes a
value returned from the platform timer (which monotonically increases) and
scales it to the equivalent in TSC, assuming the TSC is always ahead of the
other clocksources.

But probably the correct fix would be to do what Linux does with pvclock:
keep an accumulator representing the latest system time seen across CPUs and
atomically update it if now > last, thus always returning the most up-to-date
value. But then NOW() would take a hit in performance.
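Something along these lines (a hypothetical sketch of that approach, not
actual Linux or Xen code; monotonic_now() and last_stime are made-up names):

    #include <stdint.h>

    /* "Latest value wins": keep a single last-returned timestamp and never
     * let a reader go below it, at the cost of an atomic cmpxchg per read. */
    static uint64_t last_stime;

    static uint64_t monotonic_now(uint64_t now /* this CPU's NOW() value */)
    {
        uint64_t last;

        for ( ; ; )
        {
            last = __atomic_load_n(&last_stime, __ATOMIC_RELAXED);
            if ( now <= last )
                return last;    /* another CPU already returned a later time */
            if ( __atomic_compare_exchange_n(&last_stime, &last, now, 0,
                                             __ATOMIC_RELAXED, __ATOMIC_RELAXED) )
                return now;
            /* lost the race: retry against the freshly observed last_stime */
        }
    }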
[0] http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg02832.html

Cheers,
Joao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel