
Re: [Xen-devel] GPU passthrough performance regression in >4GB vms due to XSA-60 changes

To: Jan Beulich <JBeulich@xxxxxxxx>
From: Tomasz Wroblewski <tomasz.wroblewski@xxxxxxxxx>
Date: Mon, 19 May 2014 12:47:37 +0200
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
Delivery-date: Mon, 19 May 2014 10:48:00 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>


On 05/19/2014 12:38 PM, Jan Beulich wrote:

On 19.05.14 at 12:29, <tomasz.wroblewski@xxxxxxxxx> wrote:

On 05/16/2014 04:36 PM, Jan Beulich wrote:

On 16.05.14 at 13:38, <JBeulich@xxxxxxxx> wrote:

On 16.05.14 at 13:18, <tomasz.wroblewski@xxxxxxxxx> wrote:

If I coded up a patch to deal with this on -unstable, would you be
able to test that?

Willing to give it a go (Xen major version updates are often problematic
to do, though, so I can't promise success). What would your patch be doing?
Adding entries to the MTRRs for the relocated regions?

This and properly declare the region in ACPI's _CRS. For starters I'll
probably try keeping the WB default overlaid with UC variable ranges,
as that's going to be the less intrusive change.
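For readers less familiar with variable MTRRs: a single UC range overlaid on a WB default can only describe a naturally aligned, power-of-two-sized region. The sketch below encodes one such range using the IA32_MTRR_PHYSBASEn/PHYSMASKn layout from the Intel SDM. It is not taken from Jan's patch or from hvmloader; the helper names are hypothetical and `phys_bits` (the CPU's physical address width) is an assumed input.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory-type encodings (Intel SDM, IA32_MTRR_PHYSBASEn bits 7:0). */
#define MTRR_TYPE_UC 0x00u
#define MTRR_TYPE_WB 0x06u
/* IA32_MTRR_PHYSMASKn bit 11: this variable range is valid. */
#define MTRR_PHYSMASK_V (1ull << 11)

/* Hypothetical container for one variable-range MSR pair. */
struct mtrr_var_range {
    uint64_t physbase; /* base address bits | memory type */
    uint64_t physmask; /* mask bits | valid bit */
};

/* Encode [base, base + size) as one variable MTRR. size must be a power
 * of two >= 4 KiB and base must be aligned to size; phys_bits is the
 * physical address width (assumed input). Returns false otherwise. */
static bool mtrr_encode_var(uint64_t base, uint64_t size, uint8_t type,
                            unsigned int phys_bits,
                            struct mtrr_var_range *out)
{
    uint64_t addr_bits;

    if (phys_bits < 32 || phys_bits > 52)
        return false;
    if (size < 0x1000 || (size & (size - 1)) != 0 || (base & (size - 1)) != 0)
        return false;

    /* Address bits 12 .. phys_bits-1 are the only ones the MSRs hold. */
    addr_bits = ((1ull << phys_bits) - 1) & ~0xfffull;
    out->physbase = (base & addr_bits) | type;
    out->physmask = (~(size - 1) & addr_bits) | MTRR_PHYSMASK_V;
    return true;
}

/* True if addr falls inside the valid variable range r. */
static bool mtrr_var_matches(const struct mtrr_var_range *r, uint64_t addr)
{
    uint64_t mask = r->physmask & ~0xfffull; /* drop valid bit */

    if (!(r->physmask & MTRR_PHYSMASK_V))
        return false;
    return (addr & mask) == (r->physbase & mask);
}
```

With a 36-bit address width, a UC range at 0xe0000000 of size 512 MiB encodes as PHYSBASE 0xe0000000 and PHYSMASK 0xfe0000800; it covers the enlarged hole 0xe0000000-0xfc000000 (plus 0xfc000000-0xffffffff) but not RAM relocated to 0x14dc00000, which would keep the WB default.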

Okay here are two patches - the first to deal with the above mentioned
items, and the second to further increase correctness and at once
shrink the number of MTRR regions needed.
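Why the number of MTRR regions matters: a region that is not a naturally aligned power of two has to be split into several variable ranges, and only a handful of those exist. The greedy split below is a hypothetical illustration of that cost; it is not what Jan's second patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* One naturally aligned power-of-two chunk (one variable MTRR). */
struct mtrr_chunk {
    uint64_t base;
    uint64_t size;
};

/* Greedily split [start, end) into naturally aligned power-of-two
 * chunks. Returns the number of chunks needed; at most max_out are
 * stored in out (out may be NULL with max_out 0 to only count). */
static size_t mtrr_split(uint64_t start, uint64_t end,
                         struct mtrr_chunk *out, size_t max_out)
{
    size_t n = 0;

    while (start < end) {
        /* Largest power of two dividing start (0 is aligned to all). */
        uint64_t size = start ? (start & -start) : (1ull << 63);

        /* Shrink until the chunk fits in what is left of the range. */
        while (size > end - start)
            size >>= 1;

        if (out && n < max_out) {
            out[n].base = start;
            out[n].size = size;
        }
        n++;
        start += size;
    }
    return n;
}
```

For the enlarged hole 0xe0000000-0xfc000000 this yields three ranges (256 MiB at 0xe0000000, 128 MiB at 0xf0000000, 64 MiB at 0xf8000000), while the original hole 0xf0000000-0xfc000000 needs two. Covering 0xe0000000-0xffffffff with one 512 MiB range needs only one, if making that whole span UC is acceptable.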

Afaict they apply equally well to stable-4.3, master, and staging.

But to be honest I don't expect any performance improvement; all
I'd expect is that BARs relocated above 4GB would now get treated
the same as those below 4GB: UC in all cases.

Thanks Jan. I've tried the patches and you're correct: putting UC in the
MTRRs for the relocated region didn't help the issue. However, I had to
hack that in manually; the codepaths that do it in your hvmloader patch
were not activating. hvmloader is not programming guest PCI BARs into
64-bit regions at all, but still programs them into 32-bit regions.
On closer look this seems to be because the using_64bar condition, as
well as bar64_relocate in hvmloader/pci.c, is always false.

I'm confused - iirc this started out because you saw the graphics
card BARs being put above 4GB. And now you say they aren't being
put there. But ...

So BAR relocation to 64-bit is not happening, but RAM relocation, as per
the code tagged /* Relocate RAM that overlaps PCI space (in 64k-page
chunks). */, is happening. This may be correct (?), although I think
the fact that RAM is relocated but the BARs are not causes the tools
(i.e. qemu) to lose sight of which memory is used for MMIO, and, as you
mentioned in one of the previous posts, the calls which would set it to
mmio_direct in the p2m table are not happening. Our qemu is pretty
ancient and doesn't support 64-bit BARs, so it's not trivial to verify
whether relocating BARs to 64-bit would help. Still trying to make
sense of this...

... indeed I was apparently mis-interpreting what you said - all that
really was to be concluded from the log messages you quoted was
that RAM pages got relocated. But according to

(XEN) HVM3: Relocating guest memory for lowmem MMIO space enabled
(XEN) HVM3: Relocating 0xffff pages from 0e0001000 to 14dc00000 for lowmem MMIO hole
(XEN) HVM3: Relocating 0x1 pages from 0e0000000 to 15dbff000 for lowmem MMIO hole

and assuming that these were all related messages, this really isn't
a sign of using 64-bit BARs yet. All it tells us is that the PCI region
gets extended from 0xf0000000-0xfc000000 to 0xe0000000-0xfc000000.
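The two "Relocating ... pages" lines are consistent with RAM being moved from the top of low memory in chunks of at most 0xffff pages (presumably what "64k-page chunks" refers to): 0xffff + 0x1 = 0x10000 pages of 4 KiB is exactly 256 MiB, i.e. 0xe0000000-0xf0000000, the part added to the hole. The sketch below is a hypothetical re-creation of that arithmetic only (no hypercall is made); the starting values, low RAM ending at page 0xf0000 and high RAM ending at page 0x14dc00, are inferred from the log, not read from hvmloader.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* One relocation step as it would be logged. */
struct reloc_chunk {
    uint32_t nr_pages; /* pages moved in this step */
    uint64_t from;     /* guest-physical source address */
    uint64_t to;       /* guest-physical destination above 4 GiB */
};

/* Model of moving low RAM overlapping [pci_mem_start, ...) above 4 GiB,
 * top-down, in chunks of at most 0xffff pages. low_pgend and high_pgend
 * are page frame numbers (assumed inputs). Computes the steps only and
 * returns how many there are; at most max_out are stored in out. */
static size_t plan_ram_relocation(uint64_t pci_mem_start, uint64_t low_pgend,
                                  uint64_t high_pgend,
                                  struct reloc_chunk *out, size_t max_out)
{
    size_t n = 0;

    while ((pci_mem_start >> PAGE_SHIFT) < low_pgend) {
        uint64_t left = low_pgend - (pci_mem_start >> PAGE_SHIFT);
        uint32_t nr = left < 0xffff ? (uint32_t)left : 0xffff;

        low_pgend -= nr; /* take pages from the top of low RAM */
        if (out && n < max_out) {
            out[n].nr_pages = nr;
            out[n].from = low_pgend << PAGE_SHIFT;
            out[n].to = high_pgend << PAGE_SHIFT;
        }
        high_pgend += nr; /* append them to the end of high RAM */
        n++;
    }
    return n;
}
```

Called as `plan_ram_relocation(0xe0000000, 0xf0000, 0x14dc00, buf, 4)` it produces two steps: 0xffff pages from 0xe0001000 to 0x14dc00000, then 0x1 page from 0xe0000000 to 0x15dbff000, matching the quoted log.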

So perhaps time for sending complete logs, plus suitable information
from inside the guest of how things (RAM, MMIO, MTRRs) end up being
set up?

Could be, though please first read the explanation I came up with in
my other post and see whether it's enough; I think it makes sense.
64-bit guest BARs are indeed not in use (confirmed from inside the
guest). The MTRRs are set up such that only the low region is UC,
which is correct.

But the RAM relocation code causes the caching on the relocated region
to be UC instead of WB, due to the timing at which it runs (very early,
with MTRRs disabled), which is incorrect. I am thinking that enabling
MTRRs during that relocation would probably fix it on 4.3.
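The architectural rule behind this hypothesis is that while IA32_MTRR_DEF_TYPE.E (bit 11) is clear, all MTRRs are disabled and UC applies to all of physical memory. The lookup below is a simplified sketch of that rule for variable ranges only (fixed ranges, PAT and non-UC overlap rules such as WT over WB are ignored); the names are hypothetical. Whether Xen actually derives the relocated region's type from the guest's MTRR state at the moment of relocation is the hypothesis above, not something this sketch shows.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_TYPE_UC 0x00u
#define MTRR_TYPE_WB 0x06u
#define MTRR_DEF_TYPE_E (1ull << 11) /* IA32_MTRR_DEF_TYPE: MTRRs enabled */
#define MTRR_PHYSMASK_V (1ull << 11) /* IA32_MTRR_PHYSMASKn: range valid */

/* Hypothetical container for one PHYSBASEn/PHYSMASKn pair. */
struct mtrr_var_range { uint64_t physbase, physmask; };

/* Simplified effective memory type of addr from the default-type MSR
 * and n variable ranges. */
static uint8_t mtrr_effective_type(uint64_t def_type_msr,
                                   const struct mtrr_var_range *vr, int n,
                                   uint64_t addr)
{
    bool matched = false;
    uint8_t type = 0;
    int i;

    /* E clear: MTRRs disabled, everything is UC. */
    if (!(def_type_msr & MTRR_DEF_TYPE_E))
        return MTRR_TYPE_UC;

    for (i = 0; i < n; i++) {
        uint64_t mask = vr[i].physmask & ~0xfffull;

        if (!(vr[i].physmask & MTRR_PHYSMASK_V))
            continue;
        if ((addr & mask) != (vr[i].physbase & mask))
            continue;
        /* UC wins over any other overlapping type. */
        if ((uint8_t)(vr[i].physbase & 0xff) == MTRR_TYPE_UC)
            return MTRR_TYPE_UC;
        if (!matched) {
            type = (uint8_t)(vr[i].physbase & 0xff);
            matched = true;
        }
    }
    /* No range matched: fall back to the default type (bits 7:0). */
    return matched ? type : (uint8_t)(def_type_msr & 0xff);
}
```

With a WB default and a single UC range over 0xe0000000-0xffffffff, relocated RAM at 0x14dc00000 comes out WB when E is set, but UC when the same default-type value is used with E clear.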

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
