RE: [Xen-devel] Re: [PATCH] CPUIDLE: revise tsc-save/restore to avoid big tsc skew between cpus
>From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
>Sent: Friday, December 05, 2008 6:13 PM
>
>On 05/12/2008 10:05, "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
>>> From: Tian, Kevin
>>> Sent: Friday, December 05, 2008 6:00 PM
>>>
>>> Then if we agree to always align the TSC to the absolute platform timer
>>> counter, it makes no difference whether we use cpu_khz or the local
>>> tsc_scale, since both are scale factors calculated over a small period
>>> to represent the underlying crystal frequency.
>>>
>>
>> Let me take back the above words. As you said, cpu_khz has lower accuracy
>> because it cuts off the lowest bits.
>
>Yes. Also bear in mind that absolute ongoing synchronisation between TSCs
>*does not matter*. Xen will happily synchronise system time on top of
>(slowly enough, constantly enough) diverging TSCs, and of course HVM VCPUs
>re-set their guest TSC offset when moving between host CPUs.

We measured the following cases (4 idle UP HVM RHEL5 guests on 2 cores):

a) deep C-states disabled
b) deep C-states enabled, with the original TSC save/restore at each
   C-state entry/exit
c) deep C-states enabled, TSC restored from the local calibration stamp
   and tsc scale
d) deep C-states enabled, TSC restored from the monotonic platform stime
   and cpu_khz

        system time skew        TSC skew
   a)   hundreds of ns          several us
   b)   keeps accumulating      keeps accumulating
   c)   dozens of us            keeps accumulating
   d)   hundreds of ns          several us

Large system time skew hurts both PV and HVM domains. A PV domain complains
that time went backwards when it migrates to a CPU with a slower NOW(); an
HVM domain sees delayed vpt expiration when it migrates to a slower CPU,
or, going the other way, missed ticks that Xen then has to account for in
some timer modes. Both c) and d) keep this skew within a stable range.

Large TSC skew is normally fine for a PV domain, since Xen time stamps are
re-synced at gettimeofday and at timer interrupts inside the PV guest; only
user processes that use rdtsc directly may be affected. However, large TSC
skew is really bad for an HVM guest, especially when the guest TSC offset
is never adjusted at vCPU migration: the guest ends up catching up its
missed ticks in a batch, which leads to soft-lockup warnings or DMA
timeouts. So with c) we can still observe guest complaints after the guest
has run long enough. I'm not sure the guest TSC offset can be adjusted
accurately, since you would first have to measure the TSC skew among cores,
which may require issuing IPIs and adds extra overhead. It simply gets
messy to handle an accumulating TSC skew for an HVM guest. That's why we
went with option d), which exposes the same level of constraints as the
disabled case. It is not a perfect solution, but it shows a more stable
result than the others.

>What *does* matter is the possibility of warping a host TSC value on wake
>from deep sleep, compared with its value if the sleep had never happened.
>In this case, system time will be wrong (since we haven't been through a
>calibration step since waking up) and HVM timers will be wrong. And using
>a start-of-day timestamp plus cpu_khz makes this more likely. The correct
>thing to do is obey the most recent set of local calibration values.

I assume you mean S3 by "deep sleep"? If so, I don't think it is an issue.
A sane dom0 S3 flow only happens after the other domains have been notified
with a virtual S3 event, so after waking up from dom0 S3, every domain will
resume its own timekeeping sub-system.
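
To make option d) concrete, below is a minimal sketch (not the actual
patch) of what the restore on deep C-state exit amounts to: the local TSC
is reloaded from the monotonic platform system time scaled by cpu_khz, so
every CPU restores from the same reference and the skew stays bounded
instead of accumulating. The helper names read_platform_stime() and
write_tsc() and the exact conversion are assumptions made for illustration:

    #include <stdint.h>

    /* Declared here only so the sketch is self-contained; these stand in
     * for the platform-time and TSC helpers assumed in the discussion. */
    extern uint64_t read_platform_stime(void); /* monotonic platform time, ns */
    extern void write_tsc(uint64_t tsc);       /* write the local TSC MSR     */
    extern unsigned int cpu_khz;               /* boot-time TSC frequency, kHz */

    /* Option d): on deep C-state exit, derive the local TSC from the
     * monotonic platform stime and cpu_khz, so all CPUs restore from one
     * shared reference and per-CPU skew stays bounded. */
    static void restore_tsc_from_platform_stime(void)
    {
        uint64_t stime = read_platform_stime();

        /* ticks = stime_ns * cpu_khz / 10^6, split to avoid overflow */
        uint64_t tsc = (stime / 1000000ULL) * cpu_khz
                       + (stime % 1000000ULL) * cpu_khz / 1000000ULL;

        write_tsc(tsc);
    }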
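
On the related point about adjusting the guest TSC offset at vCPU
migration: the arithmetic itself is simple, but it needs the two host TSC
values sampled at effectively the same instant, which is exactly where the
cross-core IPI and its overhead come in. A rough sketch, with
sample_tsc_on_cpu() as a purely hypothetical helper:

    #include <stdint.h>

    /* Hypothetical helper: read the TSC of a given host CPU, e.g. by
     * sending it an IPI; this is the extra cost mentioned above. */
    extern uint64_t sample_tsc_on_cpu(unsigned int cpu);

    /* Guest-visible TSC = host TSC + per-vCPU offset.  To keep the guest
     * TSC continuous when a vCPU moves from host CPU 'from' to host CPU
     * 'to', the offset must absorb the skew between the two host TSCs. */
    static uint64_t adjust_tsc_offset(uint64_t old_offset,
                                      unsigned int from, unsigned int to)
    {
        uint64_t tsc_from = sample_tsc_on_cpu(from);
        uint64_t tsc_to   = sample_tsc_on_cpu(to);

        /* Any delay between the two samples leaks straight into the guest
         * TSC, which is why accurate adjustment is hard once the skew
         * keeps growing. */
        return old_offset + (tsc_from - tsc_to);
    }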
Thanks,
Kevin