
Re: [Xen-ia64-devel][PATCH][RFC] Task: support huge page RE: [Xen-ia64-devel] Xen/IA64 Healthiness Report -Cset#11460



On Thursday 28 September 2006 10:07, Isaku Yamahata wrote:
> Hi Anthony.
>
> On Wed, Sep 27, 2006 at 09:56:11PM +0800, Xu, Anthony wrote:
> > Currently, memory allocated for domU and the VTI-domain is 16K
> > contiguous. That means every huge page TLB entry must be broken into
> > 16K TLB entries, which definitely impacts overall performance. For
> > instance, Linux uses a 16M page size in region 7. IA64 is meant for
> > high-end servers, and many services running on IA64 use huge pages;
> > Oracle, for example, uses 256M pages in region 4. If Xen/IA64 keeps
> > using 16K contiguous physical pages, we can imagine this hurting
> > performance dramatically. So domU, the VTI-domain and dom0 all need to
> > support huge pages.
> >
> > The attached patch is an experiment that uses 16M pages for a
> > VTI-domain. A very tricky way is used to allocate 16M contiguous memory
> > for the VTI-domain, so it is for reference only.
> > With this patch applied, I see 2%~3% performance gains when running KB
> > on a VTI-domain (UP). As you may know, KB performance on a VTI-domain
> > is not bad :-), so the improvement is fairly significant.
> > As we know, KB doesn't use 256M pages, so the gain comes from the 16M
> > pages in region 7; if we run applications that use 256M huge pages, we
> > may see an even bigger improvement.
>
> I agree with you that supporting TLB inserts with large page sizes and
> hugetlbfs would be a big gain.
>
> > In my mind, we need to do the following (there may be more) if we want
> > to support huge pages.
> > 1. Add an "order" option to the configuration file vtiexample.vti: if
> > order=0, Xen/IA64 allocates 16K contiguous memory for the domain; if
> > order=1, it allocates 32K, and so on. Thus the user can choose the page
> > size for the domain.
>
> A fallback path should be implemented in case a large page allocation
> fails. Or do you propose introducing a new page allocator with a very
> large chunk size? With the order option, page fragmentation also needs
> to be taken care of. A minimal sketch of the fallback idea follows.
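>
> Something like this (untested; alloc_domheap_pages() is the existing
> Xen domheap allocator, everything else is a made-up name):
>
>   static struct page_info *alloc_domain_extent(struct domain *d,
>                                                unsigned int order,
>                                                unsigned int *order_out)
>   {
>       struct page_info *page;
>
>       for ( ; ; order-- )
>       {
>           page = alloc_domheap_pages(d, order, 0);
>           if ( page != NULL )
>           {
>               /* tell the caller which order was actually obtained so
>                * it can map with the right page size and allocate the
>                * remainder separately */
>               *order_out = order;
>               return page;
>           }
>           if ( order == 0 )
>               return NULL;    /* genuinely out of memory */
>       }
>   }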
>
> > 2. This order option will be passed to increase_reservation() as the
> > extent_order argument, and increase_reservation() will allocate
> > contiguous memory for the domain (a sketch of the caller side follows
> > this list).
> >
> > 3. There are other memory blocks that we also want
> > increase_reservation() to allocate for us, such as the shared page or
> > the firmware memory of a VTI domain. So we may need to call
> > increase_reservation() several times to allocate memory with different
> > page sizes.
> >
> > 4. The per-LP VHPT may need to be modified to support huge pages.
>
> Do you mean hash collision?
>
> > 5. VBD/VNIF may need to be modified to use a copy mechanism instead of
> > page flipping.
> >
> > 6. The balloon driver may need to be modified to increase or decrease
> > domain memory in units of the chosen page size, not 16K.
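> >
> > A rough sketch of the caller side, using the existing
> > XENMEM_increase_reservation interface via libxc (the helper name, the
> > HPAGE_ORDER value and the exact libxc prototype are assumptions from
> > memory; error handling is simplified):
> >
> >   #include <xenctrl.h>
> >
> >   #define HPAGE_ORDER 10   /* 16KB << 10 == 16MB */
> >
> >   static int populate_domain(int xc_handle, uint32_t domid,
> >                              unsigned long nr_hpages)
> >   {
> >       unsigned long i;
> >
> >       for ( i = 0; i < nr_hpages; i++ )
> >       {
> >           /* one extent of 2^HPAGE_ORDER contiguous machine pages;
> >            * NULL: we don't need the machine frame numbers back */
> >           if ( xc_domain_memory_increase_reservation(xc_handle, domid,
> >                                                      1, HPAGE_ORDER,
> >                                                      0, NULL) == 0 )
> >               continue;
> >
> >           /* fall back: build this 16MB chunk from normal 16KB pages */
> >           if ( xc_domain_memory_increase_reservation(xc_handle, domid,
> >                                                      1UL << HPAGE_ORDER,
> >                                                      0, 0, NULL) != 0 )
> >               return -1;
> >       }
> >       return 0;
> >   }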
> >
> > Magnus, would you like to take this task?
> >
> > Comments are always welcome.
>
> Here are some random thoughts.
>
> * Presumably there are two goals:
>   - Support one large page size (e.g. 16MB) to map the kernel.
>   - Support hugetlbfs, whose page size might be different from 16MB.
>
>   I.e. support three page sizes: the normal 16KB page size, the 16MB
>   kernel mapping page size and the 256MB hugetlbfs page size.
>   I think hugetlbfs support can be addressed in a specialized way.
>
> hugetlbfs
> * Some specialized path can be implemented to support hugetlbfs.
>   - For domU
>     Paravirtualize hugetlbfs for domU: hook alloc_fresh_huge_page() in
>     Linux so that Xen/IA64 becomes aware of large pages (a rough sketch
>     follows below).
>     Probably a new flag in the p2m entry, or some other data structure,
>     would have to be introduced.
>     For XenLinux, the region number RGN_HPAGE can be checked before
>     entering the hugetlbfs-specialized path.
>   - For domVTI
>     Can the use of hugetlbfs be detected somehow?
>     Probably some Linux-specific heuristic can be used,
>     e.g. checking the region number RGN_HPAGE.
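>
> The domU hook could look roughly like this (the hypercall number and
> the notify structure are invented for illustration; only
> alloc_fresh_huge_page(), HUGETLB_PAGE_ORDER and HYPERVISOR_memory_op()
> exist today):
>
>   /* called from alloc_fresh_huge_page() after the huge page has been
>    * allocated: tell Xen that this gpfn range forms one contiguous
>    * large page so it can install a single large p2m mapping */
>   static int xen_register_huge_page(struct page *page)
>   {
>       struct xen_hpage_notify notify = {     /* hypothetical struct   */
>           .gpfn  = page_to_pfn(page),
>           .order = HUGETLB_PAGE_ORDER,
>       };
>
>       /* XENMEM_mark_hpage is a hypothetical new memory_op command */
>       return HYPERVISOR_memory_op(XENMEM_mark_hpage, &notify);
>   }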
>
> kernel mapping with large page size
> * Page fragmentation should be addressed.
>   Both 16KB and 16MB pages should be able to co-exist in the same domain.
>   - Allocating a large contiguous region might fail,
>     so a fallback path should be implemented.
>   - A domain should be able to hold pages of both sizes (16KB and 16MB)
>     to allow a smooth code merge.
>   Probably a new bit in the p2m entry, something like _PAGE_HPAGE, would
>   be introduced to distinguish large pages from normal pages (sketch
>   below).
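>
> For illustration only (_PAGE_HPAGE and the helper are invented names):
> the p2m entry would tell the TLB miss handler which page size to put
> into itir.ps, so a guest 16MB mapping is backed by one machine TLB
> entry instead of 1024 16KB entries:
>
>   #define _PAGE_HPAGE       (1UL << 62)  /* hypothetical p2m flag    */
>   #define PAGE_SHIFT_16K    14
>   #define HPAGE_SHIFT_16M   24
>
>   static void make_guest_tlb_entry(unsigned long p2m_entry,
>                                    unsigned long *pte,
>                                    unsigned long *itir)
>   {
>       unsigned long ps = (p2m_entry & _PAGE_HPAGE) ? HPAGE_SHIFT_16M
>                                                    : PAGE_SHIFT_16K;
>
>       *itir = ps << 2;                   /* itir.ps lives in bits 7:2 */
>       *pte  = p2m_entry & ~_PAGE_HPAGE;  /* strip software-only bit   */
>   }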
>
> * paravirtualized drivers (VBD/VNIF)
>   This is a real issue.
>   For a first prototype it is reasonable not to support page flipping,
>   resorting to grant-table memory copy instead (see the sketch below).
>
>   There are two kinds of page flipping, page mapping and page transfer.
>   I guess page mapping should be supported somehow, assuming only dom0
>   (or a driver domain) maps.
>   We should measure page flipping and memory copy before giving it a try.
>   I have no figures about it, so I'm not sure which performs better.
>   (I'm biased: I know of a VNIF analysis on Xen/x86 which said memory
>   copy was cheaper than page flipping there...)
>   If dom0 does only DMA, an I/O request can be completed without copy or
>   TLB flush for VBD with the TLB tracking patch.
>   Page transfer is difficult. I'm not sure it is worthwhile to support,
>   because I'm already suffering trying to optimize it.
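>
>   What "grant-table memory copy" could look like on the backend side,
>   using the generic GNTTABOP_copy operation (whether that operation is
>   available in the tree in question is an assumption; buffer handling
>   is simplified):
>
>   /* copy len bytes from a page granted by the frontend into a local,
>    * page-aligned buffer instead of mapping or flipping the page */
>   static int copy_from_frontend(domid_t frontend, grant_ref_t gref,
>                                 void *local_buf, unsigned int len)
>   {
>       struct gnttab_copy op = {
>           .source.u.ref  = gref,          /* frontend's granted page */
>           .source.domid  = frontend,
>           .source.offset = 0,
>           .dest.u.gmfn   = virt_to_mfn(local_buf),
>           .dest.domid    = DOMID_SELF,
>           .dest.offset   = 0,
>           .len           = len,
>           .flags         = GNTCOPY_source_gref,
>       };
>
>       if ( HYPERVISOR_grant_table_op(GNTTABOP_copy, &op, 1) )
>           return -EFAULT;
>
>       return (op.status == GNTST_okay) ? 0 : -EIO;
>   }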
>
> Another approach is to
> * increase the Xen page size.
>   Probably simply increasing the page size wouldn't work well.
>   In that case, increase only the domheap page size,
>   or introduce a new zone like MEMZONE_HPAGE,
>   or introduce a specialized page allocator for it (a small sketch
>   follows).
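>
>   To make the "specialized page allocator" option concrete (all names
>   except alloc_domheap_pages() are invented): a pool of 16MB chunks
>   could be carved out of the domheap at boot so that ordinary 16KB
>   allocations can never fragment it, e.g.
>
>   #define HPAGE_ORDER  10                /* 16KB << 10 == 16MB */
>   #define MAX_HPAGES   1024
>
>   static struct page_info *hpage_pool[MAX_HPAGES];
>   static unsigned int hpage_pool_cnt;
>
>   void __init hpage_pool_init(unsigned int nr_chunks)
>   {
>       while ( nr_chunks-- && hpage_pool_cnt < MAX_HPAGES )
>       {
>           struct page_info *pg = alloc_domheap_pages(NULL, HPAGE_ORDER, 0);
>           if ( pg == NULL )
>               break;                     /* pool just stays smaller */
>           hpage_pool[hpage_pool_cnt++] = pg;
>       }
>   }
>
>   struct page_info *alloc_hpage(void)
>   {
>       return hpage_pool_cnt ? hpage_pool[--hpage_pool_cnt] : NULL;
>   }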

Hi,

thank you for your thoughts.

If we want to support a Linux page size different from the Xen page size,
similar issues are encountered.
I therefore suppose both features should be worked on together.

Tristan.

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel


 

