Xen project Mailing List

[Xen-devel] VPMU interrupt unreliability

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx

Date: Sat, 22 Jul 2017 13:16:20 -0700

Cc: Robert O'Callahan <robert@xxxxxxxxxxxxx>

Delivery-date: Sat, 22 Jul 2017 20:16:37 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Last year I reported[0] seeing occasional instability in performance counter values when running rr[1], which depends on completely deterministic counts of retired conditional branches of userspace programs. I recently identified the cause of this problem.

Xen's VPMU code contains a workaround for an alleged Nehalem bug that was added in 2010[2]. Supposedly if a hardware performance counter reaches 0 exactly during a PMI another PMI is generated potentially causing an endless loop. The workaround is to set the counter to 1. In 2013 the original bug was believed to affect more than just Nehalem and the workaround was enabled for all family 6 CPUs.[3] This workaround unfortunately disturbs the counter value in non-deterministic ways (since the value the counter has in the irq handler depends on interrupt latency), which is fatal to rr.

I've verified that the discrepancies we see in the counted values are entirely accounted for by the number of times the workaround is used in any given run. Furthermore, patching Xen not to use this workaround makes the discrepancies in the counts vanish. I've added code[4] to rr that reliably detects this problem from guest userspace.

Even with the workaround removed in Xen I see some additional issues (but not disturbed counter values) with the PMI, such as interrupts occasionally not being delivered to the guest. I haven't done much work to track these down, but my working theory is that interrupts that "skid" out of the guest that requested them and into Xen itself or perhaps even another guest are not being delivered.

Our current plan is to stop depending on the PMI during rr's recording phase (which we use for timeslicing tracees primarily because it's convenient) to enable producing correct recordings in Xen guests. Accurate replay will not be possible under virtualization because of the PMI issues; that will require transferring the recording to another machine.
But that will be sufficient to enable the use cases we care about (e.g. record an automated process on a cloud computing provider and have an engineer download and replay a failing recording later to debug it).

I can think of several possible ways to fix the overcount problem, including:

1. Restricting the workaround to apply only to older CPUs and not all family 6 Intel CPUs forever.
2. Intercepting MSR loads for counters that have the workaround applied and giving the guest the correct counter value.
3. Or perhaps even changing the workaround to disable the PMI on that counter until the guest acks via GLOBAL_OVF_CTRL, assuming that works on the relevant hardware.

Since I don't have the relevant hardware to test changes to this workaround on and rr can avoid these bugs through other means I don't expect to work on this myself, but I wanted to apprise you of what we've learned.

- Kyle

[0] https://lists.xen.org/archives/html/xen-devel/2016-10/msg01288.html
[1] http://rr-project.org/
[2] https://xenbits.xen.org/gitweb/?p=xen.git;a=blobdiff;f=xen/arch/x86/hvm/vmx/vpmu_core2.c;h=44aa8e3c47fc02e401f5c382d89b97eef0cd2019;hp=ce4fd2d43e04db5e9b042344dd294cfa11e1f405;hb=3ed6a063d2a5f6197306b030e8c27c36d5f31aa1;hpb=566f83823996cf9c95f9a0562488f6b1215a1052
[3] https://xenbits.xen.org/gitweb/?p=xen.git;a=blobdiff;f=xen/arch/x86/hvm/vmx/vpmu_core2.c;h=15b2036c8db1e56d8865ee34c363e7f23aa75e33;hp=9f152b48c26dfeedb6f94189a5fe4a5f7a772d83;hb=75a92f551ade530ebab73a0c3d4934dfb28149b5;hpb=71fc4da1306cec55a42787310b01a1cb52489abc
[4] See https://github.com/mozilla/rr/blob/a5d23728cd7d01c6be0c79852af26c68160d4405/src/PerfCounters.cc#L313, which sets up a counter and then does some pointless math in a loop to reach exactly 500 conditional branches. Xen will report 501 branches because of this bug.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
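
For illustration only, here is a toy Python model of how the "set the counter to 1" workaround described above can perturb a guest's counts. It is not Xen's code and not rr's detection test. It assumes a counter that wraps to exactly 0 every `period` events, a PMI handler that reads the counter after `latency` further events have retired, and a guest that simply sums what it reads. The function name, its parameters, and the numbers are all hypothetical; 500 is chosen only to echo the 500-versus-501 discrepancy mentioned in [4].

```python
def simulate(period, latencies, workaround=True):
    """Toy model of a counter that raises a PMI every `period` events.

    `latencies` holds, for each PMI, how many extra events retire before
    the handler reads the counter (the interrupt latency or "skid").
    Returns (true_count, observed_count, workaround_hits).
    """
    true_count = 0   # events that really retired
    observed = 0     # events the guest ends up counting
    hits = 0         # times the workaround rewrote the counter

    for latency in latencies:
        # `period` events retire; the counter wraps to exactly 0 and a PMI fires.
        true_count += period
        observed += period
        counter = 0

        # More events retire before the handler actually runs.
        counter += latency
        true_count += latency

        # Modelled workaround: a counter still at 0 in the handler is set to 1.
        if workaround and counter == 0:
            counter = 1
            hits += 1

        # The guest folds the (possibly perturbed) counter into its total.
        observed += counter

    return true_count, observed, hits


if __name__ == "__main__":
    true_count, observed, hits = simulate(500, [0])
    print(true_count, observed, hits)
```

In this model a single zero-latency PMI on a 500-event period yields an observed count of 501, and the surplus (observed minus true) always equals the number of times the modelled workaround fired, which is the same relationship reported above for real runs.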
