
Re: [Xen-devel] [PATCH v3 2/2] x86/Intel: virtualize support for cpuid faulting



On Fri, Oct 21, 2016 at 8:52 AM, Kyle Huey <me@xxxxxxxxxxxx> wrote:
> On Thu, Oct 20, 2016 at 7:40 AM, Boris Ostrovsky
> <boris.ostrovsky@xxxxxxxxxx> wrote:
>> On 10/20/2016 10:11 AM, Andrew Cooper wrote:
>>> On 20/10/16 14:55, Kyle Huey wrote:
>>>>>> That said, rr currently does not work in Xen guests due to some PMU
>>>>>> issues that we haven't tracked down yet.
>>>>> Is this RR trying to use vPMU and it not functioning, or not
>>>>> specifically trying to use PMU facilities and getting stuck anyway?
>>>> The latter.  rr relies on the values returned by the PMU (the retired
>>>> conditional branches counter in particular) being exactly the same
>>>> during the recording and replay phases.  This is true when running on
>>>> bare metal, and when running inside a KVM guest, but when running in a
>>>> Xen HVM guest we see values that are off by a branch or two on a small
>>>> fraction of our tests.  Since it works in KVM I suspect this is some
>>>> sort of issue with how Xen multiplexes the real PMU and events are
>>>> "leaking" between guests (or perhaps from Xen itself, though I don't
>>>> think the Xen kernel executes any ring 3 code).  Even if that's
>>>> correct we're a long way from tracking it down and patching it though.
>>> Hmm.  That is unfortunate, and does point towards a bug in Xen.  Are
>>> these tests which notice the problem easy to run?
>>>
>>> Boris (CC'd) is the maintainer of that code.  It has undergone quite a
>>> few changes recently.
>>
>> I am actually not the maintainer, I just break this code more often than
>> others.
>>
>> But yes, having a test case would make it much easier to understand what
>> and why is not working.
>>
>> Would something like
>>
>>     wrmsr(PERFCTR, 0);
>>     wrmsr(EVNTSEL, XXX); // enable counter
>>     // do something simple, with branches
>>     wrmsr(EVNTSEL, YYY); // disable counter
>>
>> demonstrate the problem? (I assume we are talking about HVM guest)
>>
>> -boris
>>
>
> That is a good question.  I'll see if I can reduce the problem down
> from "run Linux and run our tests inside it".

The anomalies we see appear to be related to, or at least triggerable
by, the performance monitoring interrupt (PMI).  The following program
runs a loop that executes roughly 2^25 conditional branches.  It takes
one optional argument: the sampling period, i.e. the number of
conditional branches between PMIs.  The default is 50,000, and if you
run the program with that it'll produce the same value every time.  If
you drop it to 5000 or so you'll probably see occasional off-by-one
discrepancies.  If you drop it to 500 the performance counter values
fluctuate wildly.

I'm not yet sure whether this is specific to the PMI, or whether any
interrupt can cause it and only the frequency of interrupts matters.
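To put the three periods above on one scale, here is a rough back-of-the-envelope sketch of how many PMIs the loop takes. It assumes, as stated above, that the loop retires about 2^25 conditional branches in total (one loop back-edge per iteration); the real count depends on how the compiler lowers the loop. `approx_pmis` is a hypothetical helper, not part of the test program.

```c
#include <stdint.h>

/* Hypothetical helper: rough number of counter overflows (PMIs) taken
   while the test loop runs, i.e. total retired conditional branches
   divided by the sampling period. Assumes ~2^25 branches in total. */
static uint64_t approx_pmis(uint64_t total_branches, uint64_t period)
{
  /* Guard against a zero period, which would never overflow. */
  return period ? total_branches / period : 0;
}
```

With 2^25 = 33554432 branches, periods of 50000, 5000 and 500 give about 671, 6710 and 67108 interrupts respectively, so each tenfold drop in the period means roughly ten times as many chances for an interrupt-related miscount.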

- Kyle

#define _GNU_SOURCE 1

#include <assert.h>
#include <fcntl.h>
#include <inttypes.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static struct perf_event_attr rcb_attr;
static uint64_t period;
static int fd;

/* Reset the count, set the overflow (PMI) period, and start counting. */
static void counter_on(uint64_t ticks)
{
  int ret = ioctl(fd, PERF_EVENT_IOC_RESET, 0);
  assert(!ret);
  ret = ioctl(fd, PERF_EVENT_IOC_PERIOD, &ticks);
  assert(!ret);
  ret = ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
  assert(!ret);
}

/* Stop counting.  PERF_EVENT_IOC_ENABLE with a 0 argument does not
   disable the counter; PERF_EVENT_IOC_DISABLE does. */
static void counter_off(void)
{
  int ret = ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
  assert(!ret);
}

/* With read_format == 0, read() returns just the 64-bit count. */
static int64_t read_counter(void)
{
  int64_t val;
  ssize_t nread = read(fd, &val, sizeof(val));
  assert(nread == sizeof(val));
  return val;
}

static void do_test(void)
{
  int64_t counts;
  int i;
  /* volatile so the compiler cannot delete the loop; unsigned so the
     running sum wraps instead of overflowing (undefined for int). */
  volatile unsigned int dummy = 0;

  counter_on(period);
  for (i = 0; i < (1 << 25); i++) {
    dummy += i % (1 << 10);
    dummy += i % (79 * (1 << 10));
  }

  counter_off();
  counts = read_counter();
  printf("Counted %" PRId64 " conditional branches\n", counts);
}

int main(int argc, char* argv[])
{
  memset(&rcb_attr, 0, sizeof(rcb_attr));
  rcb_attr.size = sizeof(rcb_attr);
  rcb_attr.type = PERF_TYPE_RAW;
  /* Intel retired conditional branches counter, ring 3 only */
  rcb_attr.config = 0x5101c4;
  rcb_attr.exclude_kernel = 1;
  rcb_attr.exclude_guest = 1;
  /* Replaced via PERF_EVENT_IOC_PERIOD in counter_on() */
  rcb_attr.sample_period = 0xffffffff;

  /* Open the counter for this process (pid 0) on any CPU (-1). */
  fd = (int)syscall(__NR_perf_event_open, &rcb_attr, 0, -1, -1, 0);
  if (fd < 0) {
    printf("Failed to initialize counter\n");
    return -1;
  }

  /* Ignore SIGALRM so that any overflow signal is harmless. */
  signal(SIGALRM, SIG_IGN);

  if (fcntl(fd, F_SETFL, O_ASYNC) || fcntl(fd, F_SETSIG, SIGALRM)) {
    printf("Failed to make counter async\n");
    return -1;
  }

  /* The counter starts enabled on open; stop it until do_test(). */
  counter_off();

  period = 50000;
  if (argc > 1) {
    sscanf(argv[1], "%" SCNu64, &period);
  }

  printf("Period is %" PRIu64 "\n", period);

  do_test();

  return 0;
}
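The raw config value 0x5101c4 above is an Intel event-select encoding. The sketch below decomposes it, assuming the architectural IA32_PERFEVTSELx bit layout from the Intel SDM (event code in bits 0-7, unit mask in bits 8-15, USR bit 16, OS bit 17, INT bit 20, EN bit 22). Event 0xc4 with unit mask 0x01 is the retired conditional branches event the comment refers to. The macro names and the `evtsel` helper are hypothetical, and perf may derive some of these bits itself from fields such as `exclude_kernel`; this only shows how the number breaks down.

```c
#include <stdint.h>

/* Flag bits of IA32_PERFEVTSELx (hypothetical names). */
#define EVTSEL_USR (UINT64_C(1) << 16) /* count in ring 3 */
#define EVTSEL_OS  (UINT64_C(1) << 17) /* count in ring 0 */
#define EVTSEL_INT (UINT64_C(1) << 20) /* raise a PMI on counter overflow */
#define EVTSEL_EN  (UINT64_C(1) << 22) /* enable the counter */

/* Hypothetical helper: combine an event code, unit mask and flag bits
   into an event-select value. */
static uint64_t evtsel(uint8_t event, uint8_t umask, uint64_t flags)
{
  return (uint64_t)event | ((uint64_t)umask << 8) | flags;
}
```

`evtsel(0xc4, 0x01, EVTSEL_USR | EVTSEL_INT | EVTSEL_EN)` yields 0x5101c4. Clearing `EVTSEL_EN` from that value gives 0x1101c4, which is the kind of value the "XXX"/"YYY" enable/disable pair in the quoted pseudocode would use.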

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
