
Re: [Xen-devel] Question about running a program(Intel PCM) in ring 0 on Xen

Hi Boris,

2014-02-18 11:16 GMT-05:00 Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>:
On 02/18/2014 10:24 AM, Meng Xu wrote:
Hi Dario,

Thank you so much for your detailed reply! It is really helpful! I'm looking at the vPMU and perf on Xen, and will try it. :-)

You will need the Xen patches that Dario pointed you to (thanks Dario) plus Linux kernel and toolstack changes that I can send you in a separate email (they still need some cleanup but should be usable).

Thank you so much for pointing this out! :)


BTW, you mentioned in the earlier email that you wrote some code to directly access PMU registers and didn't think the code was particularly useful because of portability concerns. I believe basic counters (such as those for cache misses) and controls are common across pretty much all recent Intel processors.

Yes, the counters are there. But when I looked at the event and umask numbers, they differ slightly among the 2nd, 3rd and 4th generations of Intel's CPUs, and some events are missing on the earlier ones. (If I coded those differences into the Xen tool I wrote, it would be like rewriting part of Intel's PCM. That's why I hope to use the existing work to run in Xen. :-) )
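To make the event/umask discussion concrete, here is a minimal sketch of how an event-select value for one of the architectural events could be composed. The helper name `perfevtsel_encode` and the macro names are made up for illustration; the numeric values (event 0x2E with umask 0x41 for LLC misses and 0x4F for LLC references, and the USR/OS/EN bit positions) are taken from the architectural performance monitoring description in the Intel SDM. The sketch only builds the value. Actually writing it to IA32_PERFEVTSEL0 needs ring 0 (or, on Xen, vPMU support), and model-specific events would need per-generation tables, which is exactly the portability problem described above.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural MSR addresses (Intel SDM). */
#define IA32_PMC0        0x0C1u
#define IA32_PERFEVTSEL0 0x186u

/* Architectural last-level-cache events: same event code, different umask. */
#define EVT_LLC        0x2Eu
#define UMASK_LLC_MISS 0x41u
#define UMASK_LLC_REF  0x4Fu

/* Control bits of IA32_PERFEVTSELx. */
#define PERFEVTSEL_USR (1u << 16)  /* count while in user mode (CPL > 0) */
#define PERFEVTSEL_OS  (1u << 17)  /* count while in kernel mode (CPL 0) */
#define PERFEVTSEL_EN  (1u << 22)  /* enable the counter */

/* Hypothetical helper: event select goes in bits 0-7, umask in bits 8-15,
 * control flags in the bits above. */
static uint64_t perfevtsel_encode(uint8_t event, uint8_t umask, uint32_t flags)
{
    assert((flags & 0xFFFFu) == 0);  /* flags must not overlap event/umask */
    return (uint64_t)event | ((uint64_t)umask << 8) | (uint64_t)flags;
}
```

With USR, OS and EN set, the LLC-miss encoding comes out as 0x43412E. That value would go into IA32_PERFEVTSEL0, and the count would then be read back from IA32_PMC0.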

The reason I want this information from the hardware performance counters is that I want to measure the interference among domains while they are running.

In addition, when we measure the latency of accessing a large array, the result does not match our expectation. We increase the array size from 1KB to 12MB, which covers the L1 (32KB), L2 (256KB) and L3 (12MB) cache sizes. We expect the latency of accessing the whole array to show clear steps at around 32KB, 256KB and 12MB, because the L1, L2 and L3 latencies differ by several times.

However, we saw that the latency does not increase much when the array size grows beyond the size of L1, L2 and L3. It's weird, because when we run the same task in Linux on a bare machine, we get the expected result.
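For reference, a measurement of this kind is often written as a pointer chase. The following is only a sketch of that general technique, not the benchmark used for the numbers above; the names `build_chain`, `chase` and `ns_per_access`, and the 64-byte line size, are assumptions. Each node occupies one cache line and the nodes are linked in a random cycle, so every load depends on the previous one and the hardware prefetcher cannot hide the latency of a sequential walk.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

struct node {
    struct node *next;
    char pad[CACHE_LINE - sizeof(struct node *)];
};

/* Link nodes[0..n-1] into a single random cycle, so one lap touches every
 * node exactly once. rand() is assumed to have a range larger than n. */
static int build_chain(struct node *nodes, size_t n)
{
    assert(n > 0);
    size_t *perm = malloc(n * sizeof *perm);
    if (!perm)
        return -1;
    for (size_t i = 0; i < n; i++)
        perm[i] = i;
    for (size_t i = n - 1; i > 0; i--) {  /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        nodes[perm[i]].next = &nodes[perm[(i + 1) % n]];
    free(perm);
    return 0;
}

/* Follow the chain for `steps` dependent loads. */
static struct node *chase(struct node *p, size_t steps)
{
    while (steps--)
        p = p->next;
    return p;
}

/* Average nanoseconds per load for a working set of `bytes`;
 * returns a negative value on error. */
static double ns_per_access(size_t bytes, size_t steps)
{
    size_t n = bytes / sizeof(struct node);
    if (n == 0 || steps == 0)
        return -1.0;
    struct node *nodes = malloc(n * sizeof *nodes);
    if (!nodes)
        return -1.0;
    if (build_chain(nodes, n) != 0) {
        free(nodes);
        return -1.0;
    }
    struct node *volatile sink = chase(nodes, n);  /* warm-up lap */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = chase(nodes, steps);  /* volatile store keeps the loop alive */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    free(nodes);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Calling `ns_per_access(size, steps)` for sizes from 1KB up to 12MB and beyond, with a few million steps each, should show the steps at the cache boundaries on bare metal. If a sequential or fixed-stride walk was used instead, prefetching alone can flatten the curve, which would be worth ruling out before blaming virtualization.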

Although most likely your vcpus are not migrating you should still make sure that they are pinned (and not oversubscribed to physical processors).

Thanks for pointing this out!

And (as with any performance measurements) disable power management and turbo mode. These things often mess up your timing.

Thank you very much for your help!





We are not sure whether this is because of virtualization overhead or cache misses; that's why we want to know the cache access rate of each domain.

I would really appreciate it if you could share some of your insight on this. :-)

Thank you very much for your time!



2014-02-18 4:14 GMT-05:00 Dario Faggioli <dario.faggioli@xxxxxxxxxx>:
On lun, 2014-02-17 at 17:32 -0500, Meng Xu wrote:
> Hi,

> I'm a PhD student, working on real-time systems.
Cool. There really seems to be a lot of interest in Real-Time
virtualization these days. :-D

> [My goal]
> I want to measure the cache hit/miss rate of each guest domain in Xen.
> I may also want to measure some other events, say memory access rate,
> for each program in each guest domain in Xen.
Ok. Can I, out of curiosity, ask you to detail a bit more what your
*final* goal is (I mean, you're interested in these measurements for a
reason, not just for the sake of having them, right?).

> [The problem I'm encountering]
> I tried Intel's Performance Counter Monitor (PCM) in Linux on a bare
> machine to get the machine's cache access rate for each level of
> cache, and it works very well.
> However, when I try to use PCM in Xen and run it in dom0, it
> does not work. I think PCM needs to run in ring 0 to read/write the
> MSRs. Because dom0 is running in ring 1, PCM running in dom0 cannot
> work.

> So my question is:
> How can I run a program (say PCM) in ring 0 on Xen?
Running "a program" in there is going to be terribly difficult. What I
think you're better off doing is trying to access the counters from dom0
and/or (para)virtualize them.

In fact, there is work going on already on this, although I don't have
all the details about what's the current status.

> What's in my mind is:
> Writing a hypercall to call the PCM in Xen's kernel space, then the
> PCM will run in ring 0?
> But the problem I'm concerned about is that some of PCM's calls,
> say printf(), may not be able to run in kernel space?
Well, Xen can print, e.g., on a serial console, but again, that's not
what you want. I'm adding the links to a few conversations about virtual
PMU. These are just the very first Google results, so there may well be


Boris (whom I'm Cc-ing) gave a presentation about this at the latest Xen
Developers Summit:


<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



