RE: [Xen-devel] RE: [PATCH] record max stime skew (was RE: [PATCH] strictly increasing hvm guest time)
> > Well one suspicion I had was that very long hpet reads were
> > getting serialized, but I tried clocksource=acpi and
> > clocksource=pit and got similar skew range results.
> > In fact pit shows a max of >17000 vs hpet and acpi closer
> > to 11000. (OTOH, I suppose it IS possible that this is
> > roughly how long it takes to read each of these platform
> > timers.)
>
> That ought to be easy to check. I would expect that the PIT,
> for example, could take a couple of microseconds to access.
>
> -- Keir

(I haven't seen the patch applied... since it just collects data, it would be nice if it were applied so others could try it.)

To follow up on this, I tried a number of tests but wasn't able to identify the problem and have given up (for now). In case someone else starts looking at this (or in case any of my tests suggest a solution to someone), I thought I'd document what I tried.

PROBLEM: Xen system time skew between a processor's local time and platform time is generally "small" but "sometimes" gets quite "large". This matters because the larger the skew, the more likely an hvm guest is to experience time stopping or (in some cases) time going backwards. On my box, "small" is under 1 usec, "large" is 9-18 usec, and "sometimes" is about one out of 500 measurements. Note that my box is a recent-vintage Intel single-socket dual-core ("Conroe"). I suspect some lock is periodically being waited on for a long time, or maybe an unexpected interrupt is occurring, but I didn't find anything through code reading or experiments.

TEST METHOD: The patch I sent earlier on this thread collects data whenever local_time_calibration() runs (which is 1Hz on each processor), and "xm debug-key t" prints that data so it can be seen with "xm dmesg". To see the problem, one need only boot dom0 and run xm debug-key and xm dmesg.
1) CONJECTURE: Related to how long it takes to read the platform timer.
The max skew (and its distribution) definitely differ depending on whether clocksource=hpet or clocksource=pit: with hpet I almost always see a max skew of 11000+, and with pit 17000+. ONCE (over many hours of runs) I saw a skew with hpet of 15000. However, I added code in the platform timer read routine (inside all locks, but NOT with interrupts off) to artificially lengthen a platform timer read, and it made no difference in the measurements.

2) CONJECTURE: Max skew only occurs on some processors (e.g. not on the one that does the platform calibration).
Nope; if you wait long enough, the max skew is fairly close on all processors (though in some cases it seems to take a long time... perhaps because of unbalanced load?).

3) CONJECTURE: Max skew occurs on platform timer overflow.
Possibly, but there is certainly not a 1-1 correspondence. Sometimes there are more large skews than overflows, and sometimes fewer.

4) CONJECTURE: Artifact of ntpd running.
Nope; the skews are the same whether or not ntpd is running in dom0.

5) CONJECTURE: Related to frequency changes or suspends.
Nope; neither is happening on my box.

6) CONJECTURE: The "Weirdness can happen" comment in time.c.
Nope; that path isn't getting executed.

7) CONJECTURE: Result of natural skew between the platform timer and the TSC, plus jitter; unfixable.
Possible, but untested, and I'm not sure how to test it.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel