
Re: [Xen-devel] Xen 4.3 development update



On 04/04/13 18:14, Suravee Suthikulpanit wrote:
On 4/3/2013 5:51 AM, George Dunlap wrote:
On 03/04/13 00:48, Suravee Suthikulpanit wrote:
On 4/2/2013 12:06 PM, Suravee Suthikulpanit wrote:
On 4/2/2013 11:34 AM, Tim Deegan wrote:
At 16:42 +0100 on 02 Apr (1364920927), Jan Beulich wrote:
On 02.04.13 at 16:07, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
wrote:
* AMD NPT performance regression after c/s 24770:7f79475d3de7
     owner: ?
     Reference: http://marc.info/?l=xen-devel&m=135075376805215
This is supposedly fixed with the RTC changes Tim committed the
other day. Suravee, is that correct?
This is a separate problem.  IIRC the AMD XP perf issue is caused by the
emulation of LAPIC TPR accesses slowing down with Andres's p2m locking
patches.  XP doesn't have 'lazy IRQL' or support for CR8, so it takes a
_lot_ of vmexits for IRQL reads and writes.
Are there any tools or good ways to count the number of VMEXITs in Xen?

Tim/Jan,

I have used iperf benchmark to compare network performance (bandwidth)
between the two versions of the hypervisor:
1. good: 24769:730f6ed72d70
2. bad: 24770:7f79475d3de7

In the "bad" case, I am seeing that the network bandwidth has dropped
about 13-15%.

However, when I use the xentrace utility to count VMEXITs, I actually
see about 25% more VMEXITs in the good case.  This is inconsistent with
the statement that Tim made above.
I was going to say: what I remember from my bit of investigation back
in November was that it had all the earmarks of micro-architectural
"drag", which happens when the TLB or the caches can't be effective.

Suravee, if you look at xenalyze, a micro-architectural "drag" looks like:
* fewer VMEXITs, but
* time for each vmexit takes longer

If you post the results of "xenalyze --svm-mode -s" for both traces, I
can tell you what I see.

  -George

Here's another version of the outputs from xenalyze with only VMEXITs.
In this case, I pinned all four VCPUs and pinned my application process
to VCPU 3.

NOTE: This measurement is without the RTC bug.

BAD:
-- v3 --
   VMEXIT_CR0_WRITE          305  0.00s  0.00%  1660 cyc { 1158| 1461| 2507}
   VMEXIT_CR4_WRITE            6  0.00s  0.00% 19771 cyc { 1738| 5031|79600}
[snip]
   VMEXIT_IOIO              5581  0.19s  0.85% 82514 cyc { 4250|81909|146439}
   VMEXIT_NPF             108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}

GOOD:
-- v3 --
   VMEXIT_CR0_WRITE         3099  0.00s  0.01%  1541 cyc { 1157| 1420| 2151}
   VMEXIT_CR4_WRITE           12  0.00s  0.00%  4105 cyc { 1885| 4380| 5515}
[snip]
   VMEXIT_IOIO             53835  1.97s  8.74% 87959 cyc { 4996|82423|144207}
   VMEXIT_NPF             855101  2.06s  9.13%  5787 cyc { 4903| 5328| 8572}
[snip]

So in the good run, we have 855k NPF exits, each of which takes about 5.7k cycles. In the bad run, we have only 108k NPF exits, each of which takes an average of 15k cycles. (Although the 50th percentile is still only 6.8k cycles -- so most are about the same, but a few take a lot longer.)
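As a sanity check, the counts and averages in those summaries are self-consistent with the reported wall times, if we assume the TSC runs at roughly 2.4 GHz (the frequency is not stated in the trace; it's inferred from the numbers). A rough sketch of the arithmetic:

```python
# Cross-check of the VMEXIT_NPF rows quoted above: count * average
# cycles / clock should give the total time xenalyze reports.
# The ~2.4 GHz TSC frequency is an assumption inferred from the data.
TSC_HZ = 2.4e9

def total_seconds(exit_count, avg_cycles, tsc_hz=TSC_HZ):
    """Total time spent in one exit type: count * avg cycles / clock."""
    return exit_count * avg_cycles / tsc_hz

bad_npf  = total_seconds(108072, 15702)   # summary reports 0.71s
good_npf = total_seconds(855101, 5787)    # summary reports 2.06s
print(f"bad NPF: {bad_npf:.2f}s, good NPF: {good_npf:.2f}s")
```

Both come out within a hundredth of a second of the times in the tables, so the summaries hang together and the difference really is in per-exit cost, not accounting.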

It's a bit strange -- the reduced number of NPF exits is consistent with the idea of some micro-architectural thing slowing down the processing of the guest. However, in my experience usually this also has an effect on other processing as well -- i.e., the time to process an IOIO would also go up, because dom0 would be slowed down as well; and time to process any random VMEXIT (say, the CR0 writes) would also go up.

But maybe it only has an effect inside the guest, because of the tagged TLBs or something?

Suravee, could you run this one again, but:
* Trace everything, not just vmexits
* Send me the trace files somehow (FTP or Dropbox), and/or add "--with-interrupt-eip-enumeration=249 --with-mmio-enumeration" when you run the summary?

That will give us an idea where the guest is spending its time statistically, and what kinds of MMIO it is doing, which may give us a clearer picture of what's going on.
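For anyone wanting to compare several runs, the per-exit counts in a summary like the ones above can also be tallied programmatically. A minimal sketch; the line format is taken from the "xenalyze --svm-mode -s" output quoted earlier, and the parsing is illustrative, not part of xenalyze itself:

```python
import re

# Illustrative parser for VMEXIT rows as they appear in the
# "xenalyze --svm-mode -s" output quoted above (name, count, ...).
ROW = re.compile(r"^\s*(VMEXIT_\w+)\s+(\d+)\s")

def exit_counts(summary_text):
    """Map each VMEXIT reason to its count from a summary dump."""
    return {m.group(1): int(m.group(2))
            for m in map(ROW.match, summary_text.splitlines()) if m}

sample = """
   VMEXIT_IOIO              5581  0.19s  0.85% 82514 cyc { 4250|81909|146439}
   VMEXIT_NPF             108072  0.71s  3.14% 15702 cyc { 6362| 6865|37280}
"""
print(exit_counts(sample))  # {'VMEXIT_IOIO': 5581, 'VMEXIT_NPF': 108072}
```

Diffing two such dicts (good vs. bad run) makes the shift in exit mix obvious at a glance.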

Thanks,
 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

