
Re: [Xen-devel] [RFC][PATCH]Large Page Support for HAP

To: Keir Fraser <Keir.Fraser@xxxxxxxxxxxx>
From: John Byrne <john.l.byrne@xxxxxx>
Date: Thu, 06 Dec 2007 17:43:12 -0800
Cc: Tim Deegan <Tim.Deegan@xxxxxxxxxxxxx>, "Huang2, Wei" <Wei.Huang2@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>
Delivery-date: Thu, 06 Dec 2007 17:44:12 -0800
List-id: Xen developer discussion <xen-devel.lists.xensource.com>


Keir,

I'm very late replying to this. I wanted to make sure I had something that
worked before continuing the discussion, and things took longer than I'd
hoped. Wei has asked me to send along my patch (against 16256) for
discussion. (Maybe just to make his look good.) Mine is less complete: it
doesn't handle page shattering when pages are removed, but it works well
enough to start Linux HAP guests with 1G super-pages, which was my primary
interest.


My reasoning for modifying only populate_physmap() to opportunistically
use super-pages was that my try_larger_extents() function in memory.c
could be made mode-specific, and that the hypervisor was the easiest place
to put this kind of policy. (Will IOMMU DMA support for PV guests benefit
from super-page allocations?)
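As an illustration only, the kind of policy a function like
try_larger_extents() could apply is sketched below in C. It assumes the
policy simply picks the largest super-page order (1G, then 2M) whose
alignment and remaining length both fit, falling back to 4KB;
choose_extent_order() and the ORDER_* constants are hypothetical names, not
code from the patch.

```c
#include <assert.h>

/* Page orders relative to 4KB frames (x86-64 / PAE paging). */
#define ORDER_4K 0u
#define ORDER_2M 9u   /* 2MB = 2^9  x 4KB */
#define ORDER_1G 18u  /* 1GB = 2^18 x 4KB */

/*
 * Hypothetical helper: return the largest order usable for the next
 * extent starting at guest frame 'gfn' when 'nr_left' 4KB frames still
 * need populating. A super-page is only possible if 'gfn' is aligned to
 * its size and the remaining range covers it completely. 'max_order'
 * lets a mode-specific caller cap the size (e.g. ORDER_2M).
 */
static unsigned int choose_extent_order(unsigned long gfn,
                                        unsigned long nr_left,
                                        unsigned int max_order)
{
    static const unsigned int orders[] = { ORDER_1G, ORDER_2M };

    assert(nr_left > 0);
    for (unsigned int i = 0; i < sizeof(orders) / sizeof(orders[0]); i++) {
        unsigned long nr = 1UL << orders[i];

        if (orders[i] <= max_order &&
            (gfn & (nr - 1)) == 0 &&
            nr_left >= nr)
            return orders[i];
    }
    return ORDER_4K; /* fall back to ordinary 4KB extents */
}
```

For example, a range starting at the 1G boundary (gfn 0x40000) with at
least 2^18 frames left gets a 1G extent, while one starting at gfn 0x200 can
get at most a 2M extent. Keeping the decision in one helper is what would
let it be made mode-specific.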

I did end up modifying xc_hvm_build, because I wanted to optimize the
guest to use 1G pages by placing as little memory below 1G as possible.
So the memsize_low variable I define is meant to become a parameter that
lets the domain config specify a low memory size (I'm using 32MB for
now), with the rest of the memory allocated starting at the 1G boundary.
Perhaps a general method of specifying the guest memory layout could
be developed.
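To make the benefit of this layout concrete, here is a rough C sketch that
counts the extents a guest would need. It assumes memsize_low bytes sit at
guest physical address 0 (below 1G, so at most 2M extents), the addresses
between memsize_low and 1G are left unpopulated, and the remainder starts at
the 1G boundary and is covered greedily by 1G, then 2M, then 4KB extents.
struct extent_counts, count_extents() and plan_layout() are hypothetical,
and the sketch ignores holes such as the VGA range that a real domain
builder has to skip.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1ULL << 20)
#define GB (1ULL << 30)

/* Hypothetical summary of how many extents of each size a layout needs. */
struct extent_counts {
    uint64_t nr_1g, nr_2m, nr_4k;
};

/* Greedily cover 'bytes' of memory that starts at a 1GB-aligned address. */
static void count_extents(uint64_t bytes, struct extent_counts *c)
{
    c->nr_1g += bytes / GB;        bytes %= GB;
    c->nr_2m += bytes / (2 * MB);  bytes %= 2 * MB;
    c->nr_4k += bytes / 4096;
}

/*
 * Split 'memsize' bytes into a low region of 'memsize_low' bytes at guest
 * physical address 0 and a high region that starts at 1GB.
 */
static struct extent_counts plan_layout(uint64_t memsize, uint64_t memsize_low)
{
    struct extent_counts c = { 0, 0, 0 };

    assert((memsize & 0xFFF) == 0 && (memsize_low & 0xFFF) == 0);
    assert(memsize_low <= memsize && memsize_low < GB);
    count_extents(memsize_low, &c);           /* below 1GB: no 1G extents */
    count_extents(memsize - memsize_low, &c); /* from the 1GB boundary up */
    return c;
}
```

With the 32MB low region used here, a guest configured with 4GB plus 32MB
needs four 1G extents and sixteen 2M extents; a plain 4GB guest needs three
1G extents and 512 2M extents, because its high region is 32MB short of a
fourth 1G page.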

For p2m, I assumed that gfn_to_mfn_current() was an infrequent operation
under HAP and that it was not worth directly mapping the L2/L3 page tables
to support it. So gfn_to_mfn_current() in HAP mode just calls
gfn_to_mfn_foreign() (modified to note PSE pages), which walks the HAP
pagetable.
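A toy C model of that kind of walk is shown below. It assumes x86-64 style
4-level tables with 512 entries each, the present bit in bit 0, and the PSE
bit (bit 7) marking a 1G leaf at L3 or a 2M leaf at L2. Real Xen code
reaches each table through its MFN; here the tables live in a hypothetical
pool[] array and non-leaf entries hold a pool index instead, so walk_gfn()
and its helpers are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define PT_ENTRIES  512
#define PTE_PRESENT 0x001ULL
#define PTE_PSE     0x080ULL
#define INVALID_MFN (~0ULL)

/* Toy "machine memory": a few page tables; table 0 is the L4 root. */
#define NR_TABLES 8
static uint64_t pool[NR_TABLES][PT_ENTRIES];

/* Payload (MFN for a leaf, pool index for a table) lives in bits 12+. */
static uint64_t entry_frame(uint64_t e) { return e >> 12; }

/*
 * Translate a guest frame number by walking L4 -> L3 -> L2 -> L1.
 * A PSE entry at L3 maps 2^18 frames (1GB); at L2 it maps 2^9 (2MB).
 */
static uint64_t walk_gfn(uint64_t gfn)
{
    uint64_t table = 0; /* start at the root */

    for (int level = 4; level >= 1; level--) {
        unsigned int shift = 9 * (level - 1); /* log2(frames per entry) */
        uint64_t e;

        assert(table < NR_TABLES);
        e = pool[table][(gfn >> shift) & (PT_ENTRIES - 1)];
        if (!(e & PTE_PRESENT))
            return INVALID_MFN;
        if (level == 1 || ((level == 2 || level == 3) && (e & PTE_PSE)))
            /* Leaf: base frame plus the gfn's offset inside the page. */
            return entry_frame(e) + (gfn & ((1ULL << shift) - 1));
        table = entry_frame(e); /* descend to the next-level table */
    }
    return INVALID_MFN; /* not reached: level 1 always returns */
}
```

The offset added at a PSE leaf is just the low 18 (1G) or 9 (2M) bits of
the gfn, which is why noting the PSE bit during the walk is enough to
translate through a super-page in this model.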

Perhaps there is a useful idea in this that could be used with Wei's changes.


John Byrne


Keir Fraser wrote:

To my mind populate_physmap() should do what it is told w.r.t. extent sizes. I 
don't mind some modification of xc_hvm_build to support this feature.

 -- Keir

On 16/11/07 17:53, "Huang2, Wei" <Wei.Huang2@xxxxxxx> wrote:

John,

If you have a better design, please share it with us and I will be happy to
work with you. :-) I agree that xc_hvm_build.c does not have to be modified
if memory.c is smart enough to scan all the page_array information. But one
concern is that sometimes the Xen tools really want to create mappings at 4KB
granularity instead of using large pages. That requires extra information to
be passed from the tools (e.g., xc_hvm_build.c) to memory.c.

-Wei

________________________________
From: Byrne, John (HP Labs) [mailto:john.l.byrne@xxxxxx]
Sent: Friday, November 16, 2007 11:41 AM
To: Huang2, Wei; xen-devel@xxxxxxxxxxxxxxxxxxx
Cc: Tim Deegan
Subject: RE: [Xen-devel] [RFC][PATCH]Large Page Support for HAP

Wei,

I have been hacking at this, too, since I am interested in trying 1GB pages to
see what they can do. After I dug myself into a hole, I restarted from the
beginning and am trying a different approach from modifying xc_hvm_build.c:
modifying populate_physmap() to opportunistically allocate large pages where
possible. I just thought I'd mention it.

John Byrne


________________________________
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Huang2, Wei
Sent: Thursday, November 15, 2007 8:26 AM
To: xen-devel@xxxxxxxxxxxxxxxxxxx
Cc: Tim Deegan
Subject: [Xen-devel] [RFC][PATCH]Large Page Support for HAP

I implemented a preliminary version of HAP large page support. My testing
showed that 32bit PAE and 64bit worked well. I also saw a decent performance
improvement on certain benchmarks.

So before I go too far, I am sending this patch to the community for review
and comments. This patch applies to xen-unstable changeset 16281. I will redo
it after collecting all ideas.

Thanks,

-Wei

============
DESIGN IDEAS:
1. Large page requests
- xc_hvm_build.c requests large page (2MB for now) while starting guests
- memory.c handles large page requests. If it cannot handle one, it falls back
to 4KB pages.

2. P2M table
- The P2M code takes the page size order as a parameter and builds the P2M
table (setting the PSE bit, etc.) according to the page size.
- Other related functions (such as p2m_audit()) handle the table based on page
size too.
- Page split/merge
** A large page will be split into 4KB pages in the P2M table if needed. For
instance, if set_p2m_entry() is asked to map a 4KB page but finds the
PSE/PRESENT bits set, it first splits the large page into 4KB pages.
** There is NO merging of 4KB pages back into a large page. Since large pages
are only used at the very beginning (guest_physmap_add()), this is OK for now.

3. HAP
- To access the PSE bit, the L2 pages of the P2M table are installed in a
linear mapping at SH_LINEAR_PT_VIRT_START. We borrow this address space since
it was otherwise unused.

4. gfn_to_mfn translation (P2M)
- gfn_to_mfn_foreign() traverses P2M table and handles address translation 
correctly based on PSE bit.
- gfn_to_mfn_current() accesses SH_LINEAR_PT_VIRT_START to check the PSE bit.
If it is on, we handle the translation as a large page. Otherwise, it falls
back to the normal RO_MPT_VIRT_START address space to access P2M L1 pages.

5. M2P translation
- Same as before, M2P translation still happens on 4KB level.
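The split described under item 2 can be sketched in C as follows. This
assumes 64-bit/PAE-format entries with the frame number in bits 12 and up,
PSE in bit 7 and no NX or other high attribute bits set, plus a
caller-provided L1 table; split_superpage() is a hypothetical name, not the
function in the patch. Each of the 512 new L1 entries maps one 4KB frame of
the original 2MB region with the same low flag bits, and the L2 entry is
then pointed at the L1 table with PSE cleared.

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES  512
#define PTE_PRESENT 0x001ULL
#define PTE_RW      0x002ULL
#define PTE_PSE     0x080ULL
#define FLAGS_MASK  0xFFFULL

/*
 * Replace the 2MB PSE mapping in *l2e with a pointer to 'l1' (whose
 * machine frame is 'l1_mfn'), filling 'l1' so that every 4KB frame keeps
 * the machine frame and permission flags it had under the super-page.
 */
static void split_superpage(uint64_t *l2e, uint64_t l1[L1_ENTRIES],
                            uint64_t l1_mfn)
{
    uint64_t old = *l2e;
    uint64_t base_mfn = old >> 12;
    /* Drop PSE: bit 7 is the PAT bit in a 4KB entry. */
    uint64_t flags = (old & FLAGS_MASK) & ~PTE_PSE;

    assert((old & (PTE_PRESENT | PTE_PSE)) == (PTE_PRESENT | PTE_PSE));
    assert((base_mfn & (L1_ENTRIES - 1)) == 0); /* 2MB-aligned super-page */

    for (unsigned int i = 0; i < L1_ENTRIES; i++)
        l1[i] = ((base_mfn + i) << 12) | flags;

    /* Only after the L1 table is complete, switch the L2 entry over to it. */
    *l2e = (l1_mfn << 12) | flags;
}
```

Because every 4KB frame keeps its original MFN, the split itself changes no
translation; it only makes it possible to then change a single 4KB entry,
as set_p2m_entry() needs to.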

AREAS NEEDING COMMENTS:
1. Large page for 32bit mode
- 32bit uses 4MB large pages. This is very annoying for xc_hvm_build.c; I
don't want to create another 4MB page_array for it.
- Because of this, this area has not been tested very well. I expect changes
soon.

2. Shadow paging
- This implementation will affect shadow mode, especially in xc_hvm_build.c
and memory.c.
- Where and how to avoid affecting shadow?

3. Turn it on/off
- Do we want to turn this feature on/off through an option (kernel option or
anything else)?

4. Other missing areas?
===========

________________________________
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

Attachment: 1gpages.patch
Description: Text Data

