Xen project Mailing List

Re: [Xen-devel] [PATCH RFC 0/6] x86/time: PVCLOCK_TSC_STABLE_BIT support

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx

From: Joao Martins <joao.m.martins@xxxxxxxxxx>

Date: Tue, 29 Dec 2015 17:37:12 +0000

Cc: Haozhong Zhang <haozhong.zhang@xxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>

Delivery-date: Tue, 29 Dec 2015 17:37:45 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 12/29/2015 02:58 PM, Andrew Cooper wrote: > On 28/12/2015 16:59, Joao Martins wrote: >> Hey! >> >> I've been working on pvclock vdso support on Linux guests, and came >> across Xen lacking support for PVCLOCK_TSC_STABLE_BIT flag which is >> required for vdso as of the latest pvclock ABI shared with KVM. > > , and originally borrowed from Xen. > > Please be aware that all of this was originally the Xen ABI (c/s > 1271b793, 2005) and was added to Linux (c/s 7af192c, 2008) for join use > with KVM. In particular, Linux c/s 424c32f1a (which introduces 'flags') > and every subsequent change in pvclock-abi.h violates the comment at the > top, reminding people that the hypervisors must be kept in sync. > OK, I indeed was aware that this came originally from Xen. Should perhaps include a comment similar to what you state above? > By the looks of things, the structures are still compatible, and having > the two in sync is in everyones best interest. The first steps here need > to be Linux upstreaming their local modifications, and further efforts > made to ensuring that ABI changes don't go unnoticed as far as Xen is > concerned (entry in the maintainers file with xen-devel listed?) Good point. Fortunately the changed part was the padding region (which was always zeroed out on update_vcpu_system_time) so compatibility was kept. > >> In addition, I've found some problems which aren't necessarily visible >> to kernel as the pvclock algorithm in linux keeps the highest pvti >> time read among all cpus. But as is, a process using vdso gettimeofday >> observes a significant amount of warps (i.e. time going backwards) and >> it could be due to 1) time calibration skew in xen rendezvous >> algorithm, 2) xen clocksource not in sync with TSC and 3) in >> situations when guests unaware of VCPUS moving to a different PCPU. >> The warps are seen more frequently on PV guests (potentially because >> vcpu time infos are only updated when guest is in kernel mode, and >> perhaps lack of tsc_offset?), and in rare ocasions on HVM guests. And >> it is worth noting that with guests VCPUs pinned, only PV guests see >> these warps. But on HVM guests specifically: such warps only occur >> when one of guest VCPUs is pinned to CPU0. > > These are all concerning findings (especially the pinning on cpu0). > Which version of Xen have you been developing on? > When I started working on this it was based on xen 4.5 (linux <= 4.4), but lately (~ 1 month) I have been working on xen-unstable. > Haozhong Zhang (CC'd) found and fixed several timing related bugs as > part of his VMX TSC Scaling support series (Message root at > <1449435529-12989-1-git-send-email-haozhong.zhang@xxxxxxxxx>). I would > be surprised if your identified bugs and his identified bugs didn't at > least partially overlap. (Note that most of the series has yet to be > applied). I am afraid this might be a different bug: I have been basing my patches on top of his work and also testing with a non-native vtsc_khz. (as part of the testing requested from Boris). But I think these sort of issues might not affect in kernel since pvclock in linux takes care of ensuring monotonicity even when pvti's aren't guaranteed to be. > > As to the specific options you identify, the time calibration rendezvous > is (or should be) a nop on modern boxes with constant/invarient/etc > TSCs. I have a low priority TODO item to investigating conditionally > disabling it, as it is a scalability concern on larger boxes. Option 2 > seems more likely. Hm, from what I understood it's not a nop: it's just that there is a different rendezvous function depending on whether TSC is safe or not. And I believe there to be where it lies the problem. The tsc_timestamp written in the pvti that is used to calculate the delta on the guest takes into account the time that the slave CPU rendezvous with master, thus could mark the TSC at a later instant depending on the CPU. And this would lead to a situation such as: CPU 3 writing an earlier tsc than say CPU 2 (thus leading to a smaller delta between both as seen by the guest, even though TSC is guaranteed to be monotonic) and guest would see a warp when doing a read on CPU 3 first right before CPU 2 and see time going backwards. Patch 5 also extends on that and I followed a similar approach to KVM. But you suggestion of basically being a nop could makes things simpler (and perhaps cleaner?), and eliminates potential scalability issues when host has a large number of CPUs: on a system with a safe TSC every EPOCH it would read platform time and "request" the vCPU to update the pvti next time it's schedule() or continue_running() is called? This way we could remove the CPUS spinning for master to write stime and tsc_timestamp. > > As for option 3, on a modern system it shouldn't make any difference. > On an older system, it can presumably only be fixed by a guest > performing its rdtsc between the two version checks, to ensure that it > sees a consistent timestamp and scale, along with the hypervisor bumping > version on deschedule, and against on schedule. However, this might be > contrary to proposed plan to have one global pv wallclock. > Indeed, and vdso gettimeofday/clock_gettime already do that i.e. versions checks around rdtsc/pvti reads. >> >> PVCLOCK_TSC_STABLE_BIT is the flag telling the guest that the >> vcpu_time_info (pvti) are monotonic as seen by any CPU, >> and supporting it could help fixing the issue mentioned above. > > Surely fixing some of these bugs are prerequisites to supporting > TSC_STABLE_BIT ? Either way, we should see about doing both. > I think so too. >> This >> series aims to propose a solution to that and it's divided as >> following: >> >> * Patch 1: Adds the missing flag field to vcpu_time_info. >> * Patch 2: Adds a new clocksource based on TSC >> * Patch 3, 4: Cleanups for patch 5 >> * Patch 5: Implements the PVCLOCK_TSC_STABLE_BIT. >> * Patch 6: Same as 5 before but for other platform timers >> >> I have some doubts on the correctness of Patch 6 but was the only >> solution I found for supporting PVCLOCK_TSC_STABLE_BIT when using >> other clocksources (i.e. != tsc). The test was running time-warp-test, >> that constantly calls clock_gettime/gettimeofday on every CPU. It >> measures a delta with the previous returned value and mark a warp if >> it's negative. I measured it during periods of 1h and 6h and check how >> many warps and their values (alongside the amount of time skew). >> Measurements are included in individual patches. >> >> Note that most of the testing has been done with Linux 4.4 in which >> these warps/skew could be easily observed as vdso would use each vCPU >> pvti. Though Linux 4.5 will change this behaviour and use only the >> vCPU0 pvti though still requiring PVCLOCK_TSC_STABLE_BIT usage. >> >> I've been using it for a while in the past weeks with no issues so far >> though still requires testing on bad TSC machines. But I would like to >> get folks comments/suggestions if this is the correct approach in >> implementing this flag. > > If you have some test instructions, I can probably find some bad TSC > machines amongst the selection of older hardware in the XenServer test pool. Thanks! I have a bad TSC machine for testing already, and these tests would be to make sure there aren't regressions, since with a bad TSC PVCLOCK_TSC_STABLE_BIT wouldn't be set. Most of my tests were on a Broadwell CPU single socket. Joao > > ~Andrew > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.