
Re: [Xen-devel] VPMU interrupt unreliability

On 22/07/17 21:16, Kyle Huey wrote:
> Last year I reported[0] seeing occasional instability in performance
> counter values when running rr[1], which depends on completely
> deterministic counts of retired conditional branches of userspace
> programs.
>
> I recently identified the cause of this problem.  Xen's VPMU code
> contains a workaround for an alleged Nehalem bug that was added in
> 2010[2].  Supposedly if a hardware performance counter reaches 0
> exactly during a PMI another PMI is generated potentially causing an
> endless loop.  The workaround is to set the counter to 1.  In 2013 the
> original bug was believed to affect more than just Nehalem and the
> workaround was enabled for all family 6 CPUs.[3]  This workaround
> unfortunately disturbs the counter value in non-deterministic ways
> (since the value the counter has in the irq handler depends on
> interrupt latency), which is fatal to rr.
>
> I've verified that the discrepancies we see in the counted values are
> entirely accounted for by the number of times the workaround is used
> in any given run.  Furthermore, patching Xen not to use this
> workaround makes the discrepancies in the counts vanish.  I've added
> code[4] to rr that reliably detects this problem from guest userspace.
>
> Even with the workaround removed in Xen I see some additional issues
> (but not disturbed counter values) with the PMI, such as interrupts
> occasionally not being delivered to the guest.  I haven't done much
> work to track these down, but my working theory is that interrupts
> that "skid" out of the guest that requested them and into Xen itself
> or perhaps even another guest are not being delivered.
>
> Our current plan is to stop depending on the PMI during rr's recording
> phase (which we use for timeslicing tracees primarily because it's
> convenient) to enable producing correct recordings in Xen guests.
> Accurate replay will not be possible under virtualization because of
> the PMI issues; that will require transferring the recording to
> another machine.  But that will be sufficient to enable the use cases
> we care about (e.g. record an automated process on a cloud computing
> provider and have an engineer download and replay a failing recording
> later to debug it).
>
> I can think of several possible ways to fix the overcount problem, including:
> 1. Restricting the workaround to apply only to older CPUs and not all
> family 6 Intel CPUs forever.
> 2. Intercepting MSR loads for counters that have the workaround
> applied and giving the guest the correct counter value.
> 3. Or perhaps even changing the workaround to disable the PMI on that
> counter until the guest acks via GLOBAL_OVF_CTRL, assuming that works
> on the relevant hardware.
>
> Since I don't have the relevant hardware to test changes to this
> workaround on and rr can avoid these bugs through other means I don't
> expect to work on this myself, but I wanted to apprise you of what
> we've learned.

Thank you for this investigation and analysis.

I think the first action is to try and identify what this mysterious
erratum is.  Despite the plethora of perf errata, the best I can find is
AAK135 "Multiple Performance Monitor Interrupts are Possible on Overflow
of IA32_FIXED_CTR2", which still doesn't obviously match the described
behaviour.

CC'ing Dietmar who was the author of the original workaround.  Do you
recall any other information which might be helpful in tracking this
down?  I also don't see any similar workaround in the Linux event
infrastructure, which makes me wonder whether the observed behaviour was
a side effect of something else Xen specific.

Having Xen perturb the counters behind a guest's back (in a way contrary
to architectural or errata behaviour) is obviously a bad thing, and we
should fix that.  I do have access to hardware, but am lacking vPMU
expertise.


Xen-devel mailing list