
Re: [Xen-devel] [PATCH v1 00/13] x86/PMU: Xen PMU PV support

To: George Dunlap <george.dunlap@xxxxxxxxxxxxx>
From: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
Date: Thu, 12 Sep 2013 10:58:33 -0400
Cc: suravee.suthikulpanit@xxxxxxx, jacob.shin@xxxxxxx, eddie.dong@xxxxxxxxx, dietmar.hahn@xxxxxxxxxxxxxx, Jan Beulich <JBeulich@xxxxxxxx>, jun.nakajima@xxxxxxxxx, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 12 Sep 2013 14:57:12 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 09/12/2013 05:39 AM, George Dunlap wrote:

On 11/09/13 19:22, Boris Ostrovsky wrote:
On 09/11/2013 01:01 PM, George Dunlap wrote:
On 10/09/13 16:47, Boris Ostrovsky wrote:
On 09/10/2013 11:34 AM, Jan Beulich wrote:
On 10.09.13 at 17:20, Boris Ostrovsky<boris.ostrovsky@xxxxxxxxxx> wrote:
This version has the following limitations:
* For accurate profiling of dom0/Xen, dom0 VCPUs should be pinned.
* Hypervisor code is only profiled on processors that have running dom0 VCPUs
on them.
With that I assume this is an RFC rather than full-fledged submission?
I was thinking that this would be something like stage 1 implementation (and
probably should have mentioned this in the cover letter).
For this stage I wanted to confine all changes on Linux side to xen subtrees. Properly addressing the above limitation would likely require changes in non-xen
sources (change in perf file format, remote MSR access etc.).
I think having the vpmu stuff for PV guests is a great idea, and from a quick skim through I don't have any problems with the general approach. (Obviously some more detailed review will be needed.)
However, I'm not a fan of this method of collecting perf stuff for Xen and other VMs together in the cpu buffers for dom0. I think it's ugly, fragile, and non-scalable, and I would prefer to see if we could implement the same feature (allowing perf to analyze Xen and other vcpus) some other way. And I would rather not use it as a "stage 1", for fear that it would become entrenched.
I can see how collecting samples for other domains may be questionable now (DOM0_PRIV mode) since at this stage there is no way to distinguish between samples for non-privileged domains.
But why do you think that getting data for both dom0 and Xen is problematic? Someone has to process Xen's samples and who would do this if not dom0? We could store samples in separate files (e.g. perf.data.dom0 and perf.data.xen) but that's toolstack's job.
It's not so much about dom0 collecting the samples and passing them on to the analysis tools; this is already what xenalyze does, in essence. It's about the requirement of having the dom0 vcpus pinned 1-1 to physical cpus: both limiting the flexibility for scheduling, and limiting the configuration flexibility wrt having dom0 vcpus < pcpus. That is what seems an ugly hack to me -- having dom0 sort of try to do something that requires hypervisor-level privileges and making a bit of a mess of it.


I probably should have explained the limitations better in the
original message.

Pinning:

The only reason this version requires pinning is because I haven't
provided hooks in Linux perf code to store both PCPU and VCPU of the
sample in the perf_sample_data. And I didn't do so because this
would need to be done outside of arch/x86/xen and I decided not to go
there for this stage. So for now perf still only knows about CPUs, not
PCPUs or VCPUs.

Note that the hypervisor already provides information about both P/VCPUs to
dom0 (*) so when I fix what I described above in Linux (kernel and perf
toolstack) the right association of P/VCPUs will start working.

And pinning is not really *required*. If you don't pin you will not
get an accurate distribution of hypervisor samples in perf.
For instance, if Xen's foo() was sampled on PCPU0 and then PCPU1 while
dom0's VCPU0 was running on each of them, perf will assume that both
samples were taken on CPU0. (Note again: CPU0, not P- or VCPU0.)
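
To make the foo() example concrete, here is a toy userspace sketch in C. It is
not kernel or perf code: the struct and both function names are made up for
illustration, and it only assumes that each sample could carry both a VCPU and
a PCPU id.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2

/* Hypothetical sample record carrying both ids (not a real perf structure). */
struct xen_sample {
    unsigned int vcpu;  /* dom0 VCPU that was running when sampled */
    unsigned int pcpu;  /* physical CPU the sample was taken on */
};

/* What perf effectively does today: it only knows one "CPU" per sample,
 * which is the dom0 VCPU. */
static void histogram_by_vcpu(const struct xen_sample *s, size_t n,
                              unsigned int *hist)
{
    for (size_t i = 0; i < n; i++) {
        assert(s[i].vcpu < NR_CPUS);
        hist[s[i].vcpu]++;
    }
}

/* What becomes possible once the PCPU is stored with the sample. */
static void histogram_by_pcpu(const struct xen_sample *s, size_t n,
                              unsigned int *hist)
{
    for (size_t i = 0; i < n; i++) {
        assert(s[i].pcpu < NR_CPUS);
        hist[s[i].pcpu]++;
    }
}
```

With two samples of foo(), both taken while VCPU0 was running but on PCPU0 and
PCPU1 respectively, the first histogram puts both on CPU0 while the second
splits them one per physical processor.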

#VCPUs < #PCPUs

This is different from pinning. The issue here is that tools (e.g. perf) need to
access the PMU's MSR. And they do it with something like wrmsr(msr, value),
and they assume that they are programming PMU on current processor. So
if a dom0's VCPU never runs on some PCPU it currently cannot program the
PMU there. One way to address this could be to have wrmsr_cpu(cpu, msr,
value). And presumably on bare metal this will be patched over with regular
wrmsr.
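
A rough model of that idea, as a userspace C sketch. Everything here is an
assumption for illustration: the MSR "file" is just an array, the MSR indices
are toy values rather than real MSR addresses, and wrmsr_local()/wrmsr_cpu()
are hypothetical names. A real version would need a hypercall (or similar) on
the remote path.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PCPUS 4
#define NR_MSRS  8  /* toy MSR space; indices are not real MSR addresses */

static uint64_t msr_file[NR_PCPUS][NR_MSRS]; /* simulated per-PCPU MSRs */
static unsigned int current_pcpu;            /* PCPU we are "running" on */

/* Models plain wrmsr: always programs the processor we are running on. */
static void wrmsr_local(unsigned int msr, uint64_t value)
{
    assert(msr < NR_MSRS);
    msr_file[current_pcpu][msr] = value;
}

/* Hypothetical wrmsr_cpu(cpu, msr, value): program the MSR on a given PCPU. */
static void wrmsr_cpu(unsigned int cpu, unsigned int msr, uint64_t value)
{
    assert(cpu < NR_PCPUS && msr < NR_MSRS);

    if (cpu == current_pcpu) {
        /* Local case degenerates to a regular wrmsr. */
        wrmsr_local(msr, value);
        return;
    }

    /* Remote case: stands in for asking the hypervisor to do the write. */
    msr_file[cpu][msr] = value;
}
```

With current_pcpu set to 0, wrmsr_local(1, v) can only ever touch PCPU0's
state, whereas wrmsr_cpu(3, 1, v) reaches a processor that no dom0 VCPU runs on.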

(*) Well, it doesn't. Because I forgot to add this to the code (it's one line, really)
but I will in the next version.

I'm unfortunately not familiar enough with the perf system to know exactly what it is that Linux needs to do (why, for example, you think it would need remote MSR access if dom0 weren't pinned),

Remote MSR access is needed not because of pinning but because the tool (perf, or any other tool for that matter) needs to program the PMU on non-dom0 processors.

and how hard would be for Xen just to do that work, and provide an "adapter" that would translate Xen-specific stuff into something perf could consume. Would it be possible, for example, for dom0 to specify what needed to be collected, for Xen to generate the samples in a Xen-specific format, and then have something in dom0 that would separate the samples into one file per domain that look similar enough to a trace file that the perf system could consume it?

Perf calculates the sampling period on each sample and writes the resulting value into the
counter MSR (I haven't looked yet at how it uses other performance facilities such as
PEBS, IBS and such).
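
For reference, a small C sketch of what "writes the value into the counter MSR"
amounts to, assuming the usual scheme where the counter counts up and
interrupts on overflow. The function names and the 48-bit width used in the
example are assumptions for illustration, not taken from the perf sources.

```c
#include <assert.h>
#include <stdint.h>

/* Value to load into an up-counting counter of `width` bits so that it
 * overflows after `period` events (i.e. the negated period, truncated). */
static uint64_t counter_start_value(uint64_t period, unsigned int width)
{
    assert(width > 0 && width < 64);
    uint64_t mask = (UINT64_C(1) << width) - 1;
    assert(period > 0 && period <= mask);
    return (0 - period) & mask;
}

/* Number of events the counter will count before it wraps to zero. */
static uint64_t events_until_overflow(uint64_t start, unsigned int width)
{
    assert(width > 0 && width < 64);
    uint64_t mask = (UINT64_C(1) << width) - 1;
    assert(start <= mask);
    return mask - start + 1;
}
```

For a period of 1000 on a 48-bit counter the start value is 0xFFFFFFFFFC18.
Since the period is recomputed on each sample, this write has to happen on the
processor that took the sample, which is why it has to reach the right PMU.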

Processing sample data is done by the toolstack and is relatively easy; we don't need a Xen-specific format (once we fix the pinning issue so we know to whom a sample belongs).

Programming PMU HW from existing perf code is the challenge.


Thanks.
-boris



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
