Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy
Dan Magenheimer wrote:

In EL5u1-32 however it looks like the fractions are accounted for. Indeed the EL5u1-32 "lost tick handling" code resembles the Linux/ia64 code which is what I've always assumed was the "missed tick" model. In this case, I think no policy is necessary and the measured skew should be identical to any physical hpet skew. I'll have to test this hypothesis though.

I've tested this hypothesis and it seems to hold true. This means the existing (unpatched) hpet code works fine on EL5-32bit (vcpus=1) when hpet is the clocksource, even when the machine is overcommitted. A second hypothesis still needs to be tested that Dave's patch will not make this worse.

Interesting, thanks for pointing this out and confirming.

(Note that per previous discussion, my EL5u1-32bit guest running on an Intel dual-core physical box chose tsc as the best clocksource and I had to override it with clock=hpet in the kernel command line.)

Is there one setting for all Linux guests that makes them choose hpet? Is it "clock=hpet clocksource=hpet"? I know you wrote at length about this before.

Yes, that makes sense and concurs with what I remember from the EL4u5-32 code. If this is true, one would expect the default "no missed tick" policy to see time moving faster than an external source -- the first missed tick delivered after a long sleep would "catch up" and then the remainder would each add another tick.

Indeed with the existing (unpatched) hpet code, time is running faster on EL4u5-32 (vcpus=1, when overcommitted). So Dave's patch is definitely needed here.

It's good to get the verification of this.

thanks,
Dave

Will try 64-bit next.

Dan

-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
Sent: Monday, June 09, 2008 9:21 PM
To: 'Dave Winchell'; 'Keir Fraser'
Cc: 'xen-devel'; 'Ben Guthro'
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

I'll tell you what I recall about this. Tomorrow I'll check the guest code to verify.
I think that Linux declares a full tick, even if the interrupt is early. That's the problem.

Yes, that makes sense and concurs with what I remember from the EL4u5-32 code. If this is true, one would expect the default "no missed tick" policy to see time moving faster than an external source -- the first missed tick delivered after a long sleep would "catch up" and then the remainder would each add another tick.

On the other hand, if the interrupt is late it in effect declares a tick plus fraction. If it just declared the fraction in the first place, we could deliver the interrupts whenever we wanted.

My read of the EL4u5-32 code is that the fraction is discarded and a new tick period commences at "now", so the fractions eventually accumulate as lost time. In EL5u1-32 however it looks like the fractions are accounted for. Indeed the EL5u1-32 "lost tick handling" code resembles the Linux/ia64 code which is what I've always assumed was the "missed tick" model. In this case, I think no policy is necessary and the measured skew should be identical to any physical hpet skew. I'll have to test this hypothesis though.

-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Dave Winchell
Sent: Monday, June 09, 2008 5:35 PM
To: dan.magenheimer@xxxxxxxxxx; Keir Fraser
Cc: Dave Winchell; xen-devel; Ben Guthro
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

The Linux policy is more subtle, but is required to go from .1% to .03%.

Thanks for the good documentation which I hadn't thoroughly read until now. I now understand that the essence of your hpet missed ticks policy is to ensure that ticks are never delivered too close together. But I'm trying to understand WHY your patch works, in other words, what problem it is countering.

I'll tell you what I recall about this. Tomorrow I'll check the guest code to verify. I think that Linux declares a full tick, even if the interrupt is early. That's the problem.
On the other hand, if the interrupt is late it in effect declares a tick plus fraction. If it just declared the fraction in the first place, we could deliver the interrupts whenever we wanted. It's really not that different from the missed ticks policy in vpt.c except that there the period in vpt.c is based on start of interrupt and I have improved that with end-of-interrupt as described in the patch note. I don't recall what prompted me to try end-of-interrupt, but I saw a significant improvement. I may have been running a monotonicity test at the same time to explain the lock contention mentioned in the write-up.

I care about this for more reasons than just because it is interesting: (1) I'd like to feel confident that it is fixing a bug rather than just a symptom of a bug; and (2) I wonder how universally it is applicable.

It's worked well with my small set of guests. You and our QA are going to tell us about the wider set. It doesn't matter if guest A handles interrupts closely spaced or not, just whether it handles them far apart. So it should be pretty universal with guests that really handle missed ticks. I think it's interesting that some 32bit Linux guests handle missed ticks for hpet.

I see from code examination in mark_offset_hpet() in RHEL4u5/arch/i386/kernel/timers/timer_hpet.c, that the correction for lost ticks is just plain wrong in a virtual environment. (Suppose for example that a virtual tick was delivered every 1.999*hpet_tick... I think the clock would be off by 50%!) Is this the bug that is being "countered" by your policy?

I haven't looked at that code, perhaps. I'll check it tomorrow.

However, the lost tick handling in RHEL5u1/kernel/timer.c (which I think is used also for hpet) is much better so I am eager to find out if your policy works there too. If the hpet missed tick policy works for both, though, I should be happy, though I wonder about upstream kernels (e.g. the trend toward tickless).

I wasn't aware of this trend.
If it's robust, however, it should handle late interrupts ...

That said, I'd rather see this get into Xen 3.3 and worry about upstream kernels later :-)

Regards,
Dave

-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
Sent: Mon 6/9/2008 6:02 PM
To: Dave Winchell; Keir Fraser
Cc: Ben Guthro; xen-devel
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

The Linux policy is more subtle, but is required to go from .1% to .03%.

Thanks for the good documentation which I hadn't thoroughly read until now. I now understand that the essence of your hpet missed ticks policy is to ensure that ticks are never delivered too close together. But I'm trying to understand WHY your patch works, in other words, what problem it is countering.

I care about this for more reasons than just because it is interesting: (1) I'd like to feel confident that it is fixing a bug rather than just a symptom of a bug; and (2) I wonder how universally it is applicable.

I see from code examination in mark_offset_hpet() in RHEL4u5/arch/i386/kernel/timers/timer_hpet.c, that the correction for lost ticks is just plain wrong in a virtual environment. (Suppose for example that a virtual tick was delivered every 1.999*hpet_tick... I think the clock would be off by 50%!) Is this the bug that is being "countered" by your policy?

However, the lost tick handling in RHEL5u1/kernel/timer.c (which I think is used also for hpet) is much better so I am eager to find out if your policy works there too. If the hpet missed tick policy works for both, though, I should be happy, though I wonder about upstream kernels (e.g. the trend toward tickless).
That said, I'd rather see this get into Xen 3.3 and worry about upstream kernels later :-)

-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Sunday, June 08, 2008 2:32 PM
To: dan.magenheimer@xxxxxxxxxx; Keir Fraser
Cc: Ben Guthro; xen-devel; Dave Winchell
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Hi Dan,

While I am fully supportive of offering hardware hpet as an option for hvm guests (let's call it hwhpet=1 for shorthand), I am very surprised by your preliminary results; the most obvious conclusion is that Xen system time is losing time at the rate of 1000 PPM though it's possible there's a bug somewhere else in the "time stack". Your Windows result is jaw-dropping and inexplicable, though I have to admit ignorance of how Windows manages time.

I think xen system time is fine. You have to add the interrupt delivery policies described in the write-up for the patch to get accurate timekeeping in the guest. The windows policy is obvious and results in a large improvement in accuracy. The Linux policy is more subtle, but is required to go from .1% to .03%.

I think with my recent patch and hpet=1 (essentially the same as your emulated hpet), hvm guest time should track Xen system time. I wonder if domain0 (which if I understand correctly is directly using Xen system time) is also seeing an error of .1%? Also I wonder for the skew you are seeing (in both hvm guests and domain0) is time moving too fast or too slow?

I don't recall the direction. I can look it up in my notes at work tomorrow.

Although hwhpet=1 is a fine alternative in many cases, it may be unavailable on some systems and may cause significant performance issues on others.
So I think we will still need to track down the poor accuracy when hwhpet=0.

Our patch is accurate to < .03% using the physical hpet mode or the simulated mode.

And if for some reason Xen system time can't be made accurate enough (< 0.05%), then I think we should consider building Xen system time itself on top of hardware hpet instead of TSC... at least when Xen discovers a capable hpet.

In our experience, Xen system time is accurate enough now.

One more thought... do you know the accuracy of the TSC crystals on your test systems? I posted a patch a while ago that was intended to test that, though I guess it was only testing skew of different TSCs on the same system, not TSCs against an external time source.

I do not know the tsc accuracy.

Or maybe there's a computation error somewhere in the hvm hpet scaling code? Hmmm...

Regards,
Dave

-----Original Message-----
From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx]
Sent: Fri 6/6/2008 4:29 PM
To: Dave Winchell; Keir Fraser
Cc: Ben Guthro; xen-devel
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Dave --

Thanks much for posting the preliminary results!

While I am fully supportive of offering hardware hpet as an option for hvm guests (let's call it hwhpet=1 for shorthand), I am very surprised by your preliminary results; the most obvious conclusion is that Xen system time is losing time at the rate of 1000 PPM though it's possible there's a bug somewhere else in the "time stack". Your Windows result is jaw-dropping and inexplicable, though I have to admit ignorance of how Windows manages time.

I think with my recent patch and hpet=1 (essentially the same as your emulated hpet), hvm guest time should track Xen system time. I wonder if domain0 (which if I understand correctly is directly using Xen system time) is also seeing an error of .1%? Also I wonder for the skew you are seeing (in both hvm guests and domain0) is time moving too fast or too slow?
Although hwhpet=1 is a fine alternative in many cases, it may be unavailable on some systems and may cause significant performance issues on others. So I think we will still need to track down the poor accuracy when hwhpet=0.

And if for some reason Xen system time can't be made accurate enough (< 0.05%), then I think we should consider building Xen system time itself on top of hardware hpet instead of TSC... at least when Xen discovers a capable hpet.

One more thought... do you know the accuracy of the TSC crystals on your test systems? I posted a patch a while ago that was intended to test that, though I guess it was only testing skew of different TSCs on the same system, not TSCs against an external time source.

Or maybe there's a computation error somewhere in the hvm hpet scaling code? Hmmm...

Thanks,
Dan

-----Original Message-----
From: Dave Winchell [mailto:dwinchell@xxxxxxxxxxxxxxx]
Sent: Friday, June 06, 2008 1:33 PM
To: dan.magenheimer@xxxxxxxxxx; Keir Fraser
Cc: Ben Guthro; xen-devel; Dave Winchell
Subject: Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Dan, Keir:

Preliminary test results indicate an error of .1% for Linux 64 bit guests configured for hpet with xen-unstable as is. As we have discussed many times, the ntp requirement is .05%. Tests on the patch we just submitted for hpet have indicated errors of .0012% on this platform under similar test conditions and .03% on other platforms.

Windows vista64 has an error of 11% using hpet with the xen-unstable bits. In an overnight test with our hpet patch, the Windows vista error was .008%.

The tests are with two or three guests on a physical node, all under load, and with the ratio of vcpus to phys cpus > 1.

I will continue to run tests over the next few days.
thanks,
Dave

Dan Magenheimer wrote:

Hi Dave and Ben --

When running tests on xen-unstable (without your patch), please ensure that hpet=1 is set in the hvm config and also I think that when hpet is the clocksource on RHEL4-32, the clock IS resilient to missed ticks so timer_mode should be 2 (vs when pit is the clocksource on RHEL4-32, all clock ticks must be delivered and so timer_mode should be 0). Per http://lists.xensource.com/archives/html/xen-devel/2008-06/msg00098.html it's my intent to clean this up, but I won't get to it until next week.

Thanks,
Dan

-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Dave Winchell
Sent: Friday, June 06, 2008 4:46 AM
To: Keir Fraser; Ben Guthro; xen-devel
Cc: dan.magenheimer@xxxxxxxxxx; Dave Winchell
Subject: RE: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Keir,

I think the changes are required. We'll run some tests today so that we have some data to talk about.

-Dave

-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx on behalf of Keir Fraser
Sent: Fri 6/6/2008 4:58 AM
To: Ben Guthro; xen-devel
Cc: dan.magenheimer@xxxxxxxxxx
Subject: Re: [Xen-devel] [PATCH 0/2] Improve hpet accuracy

Are these patches needed now the timers are built on Xen system time rather than host TSC? Dan has reported much better time-keeping with his patch checked in, and it's for sure a lot less invasive than this patchset.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
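
The two guest behaviours described in this thread (a guest that discards the leftover fraction and restarts its tick period at "now", versus a guest that accounts for the fraction) can be illustrated with a toy model. What follows is only a minimal sketch under stated assumptions: it is not taken from the RHEL4u5 or RHEL5u1 kernel sources, the function names and the 1 ms tick period are hypothetical, and the "count at least one tick per interrupt" rule is an assumption based on the remark that Linux declares a full tick even if the interrupt is early.

```python
# Toy model of guest tick accounting. Illustrative only: not kernel code.
TICK_NS = 1_000_000  # assumed tick period (1 ms); hypothetical value


def discard_fraction(irq_times_ns):
    """Count whole elapsed ticks (at least one) and restart the period at 'now'.

    Models a guest where the fraction of a tick left over at each
    interrupt is thrown away, and an early interrupt still counts as a tick.
    """
    guest_ns = 0
    last_ns = 0
    for now_ns in irq_times_ns:
        ticks = max(1, (now_ns - last_ns) // TICK_NS)
        guest_ns += ticks * TICK_NS
        last_ns = now_ns  # leftover fraction is discarded here
    return guest_ns


def carry_fraction(irq_times_ns):
    """Count whole elapsed ticks and keep the leftover fraction pending."""
    guest_ns = 0
    last_ns = 0
    for now_ns in irq_times_ns:
        ticks = (now_ns - last_ns) // TICK_NS
        guest_ns += ticks * TICK_NS
        last_ns += ticks * TICK_NS  # only whole ticks are consumed
    return guest_ns


if __name__ == "__main__":
    # Case 1: every interrupt arrives late, 1.999 tick periods apart.
    late = [i * 1_999_000 for i in range(1, 1001)]
    real_ns = late[-1]
    print("late, discard:", discard_fraction(late) / real_ns)  # about half speed
    print("late, carry:  ", carry_fraction(late) / real_ns)

    # Case 2: guest is descheduled for 10 ticks, then all 10 missed
    # interrupts are delivered 1 microsecond apart.
    burst = [10 * TICK_NS + k * 1_000 for k in range(10)]
    print("burst, discard ticks:", discard_fraction(burst) // TICK_NS)  # 19
    print("burst, carry ticks:  ", carry_fraction(burst) // TICK_NS)    # 10
```

In this model the 1.999*tick spacing makes the fraction-discarding guest run at roughly half speed, which matches the "off by 50%" estimate above, while delivering a backlog of missed ticks back to back makes the same guest run fast (19 ticks counted for about 10 elapsed). The fraction-carrying guest tracks elapsed time in both cases, which is consistent with the observation that no delivery policy appears necessary for EL5u1-32.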