
Re: [Xen-devel] GPU passthrough performance regression in >4GB vms due to XSA-60 changes

To: Jan Beulich <JBeulich@xxxxxxxx>
From: Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxx>
Date: Thu, 15 May 2014 14:10:37 +0200
Cc: jinsong.liu@xxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
Delivery-date: Thu, 15 May 2014 13:12:25 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>


On 05/15/2014 02:32 PM, Jan Beulich wrote:

On 15.05.14 at 11:11, <tomasz.wroblewski@xxxxxxxxx> wrote:

We've recently updated from Xen 4.3.1 to 4.3.2 and found a major
regression in GPU passthrough performance in VMs using >4GB of memory.
When using GPU pt (some Radeon cards, also integrated Intel GPU pt),
load on the CPU is constantly near maximum and the screen is slow to update. The
machines are Intel Haswell/Ivy Bridge laptops/desktops, the guests are
Windows 7 64-bit HVMs.

I've bisected the failure to be due to XSA-60 changes, specifically:

commit e81d0ac25464825b3828cff5dc9e8285612992c4
Author: Liu Jinsong <jinsong.liu@xxxxxxxxx>
Date:   Mon Dec 9 14:26:03 2013 +0100

      VMX: remove the problematic set_uc_mode logic

This commit seems to have removed a bit of logic which, when the guest set
the cache disable bit in CR0 for a brief time, iterated over all
mapped pfns and reset the memory type in the EPTs to be consistent with the
result of the mtrr.c:epte_get_entry_emt() call. I believe my tracing
indicates this used to return the WRITEBACK caching strategy for the 64-bit
memory areas where the BARs of the GPU seem to be located.

This code no longer runs, so my speculation is that the PCI
BAR area stays uncached, which causes the general slowness.

But of course you would have to ask yourself whether it was correct
for that range to be anything but UC. And as you may be aware,
there has been more work in this area recently, so further UC-ness
may be there in 4.5...

Yes, indeed it is very possible it has only been working properly by accident before. Thanks, I wasn't aware of the further work in this area in 4.5, will look.

Note that
I'm not talking about slow performance during the window when CR0 has
caching disabled; it stays slow even after the guest re-enables caching
shortly afterwards, since the problem seems to be a side effect of the removed
loop, which set some default EPT policies on all pfns. Reintroducing the
removed loop fixes the problem.

Doing so is clearly not going to be an option.

Yes, was merely stating how things are.

To at least harden your suspicion, did you look at the 'D' debug key
output with and without the change? (For that to be useful here
you may want to add memory type dumping, as was added in
-unstable.)

Yeah, I have dumped EPT memory types on the affected ranges before and after the change. Before the change we were getting writeback, and from debugging, that value ultimately originated from the mtrr.c:get_mtrr_type() function (called by epte_get_entry_emt()), specifically the "return m->def_type" statement in there, so it seems it was just falling back to some default in case the range was not specified in the MTRRs. After the change, it stays as UC.
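To illustrate the default-type fallback described above, here is a minimal sketch in Python. It is not Xen's get_mtrr_type(): the names VariableRange and lookup_type are hypothetical, the ranges use base/size instead of the real base/mask registers, and the overlap precedence rules are ignored. The example ranges and addresses are made up as well. It only shows how an address that no range covers ends up with the default type.

```python
from dataclasses import dataclass

# x86 memory type encodings (UC = 0, WB = 6)
UC, WB = 0, 6


@dataclass
class VariableRange:
    """Hypothetical, simplified stand-in for one variable-range MTRR."""
    base: int
    size: int
    mem_type: int


def lookup_type(addr, ranges, def_type):
    """Return the type of the first range covering addr, else the default."""
    for r in ranges:
        if r.base <= addr < r.base + r.size:
            return r.mem_type
    # No range matched: fall back to the default type, which is the
    # "return m->def_type" path mentioned above.
    return def_type


# Assumed setup: only the low MMIO hole below 4GB is marked UC,
# and the default type is WB.
ranges = [VariableRange(base=0xE0000000, size=0x20000000, mem_type=UC)]

low_mmio = lookup_type(0xE0001000, ranges, def_type=WB)    # covered by the range
high_addr = lookup_type(0x200000000, ranges, def_type=WB)  # above 4GB, not covered
```

With this assumed setup, an address above 4GB that no range describes gets the default (WB here), which would be consistent with the writeback result seen before the change.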

Not really sure why it only affects 64bit vms but I've just noticed the pci BARs for the card are being relocated by hvmloader as per some logs:


(XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled

(XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for lowmem MMIO hole

(XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for lowmem MMIO hole


So it might also be related to that.
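As a quick sanity check on the two relocation lines in the log, the arithmetic can be sketched like this, assuming 4 KiB pages. The helper relocation_span is a hypothetical name for illustration and not part of hvmloader.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages
FOUR_GB = 1 << 32


def relocation_span(pages, src, dst):
    """Return ((src_start, src_end), (dst_start, dst_end)) in bytes."""
    length = pages * PAGE_SIZE
    return (src, src + length), (dst, dst + length)


# Values taken from the two "Relocating ..." log lines.
first = relocation_span(0xFFFF, 0x0E0001000, 0x14DC00000)
second = relocation_span(0x1, 0x0E0000000, 0x15DBFF000)

for name, (src, dst) in (("first", first), ("second", second)):
    print(f"{name}: {src[0]:#x}-{src[1]:#x} -> {dst[0]:#x}-{dst[1]:#x}, "
          f"above 4GB: {dst[0] >= FOUR_GB}")
```

Under that assumption, the two moves together take the 256 MiB of guest memory at 0xe0000000-0xf0000000 and place it at 0x14dc00000-0x15dc00000, which is entirely above 4GB.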

Jan



