
Re: [Xen-devel] Design question for PV superpage support



Mick Jordan wrote:
On 03/03/09 06:33, Dan Magenheimer wrote:
In general, I think the guest should assume that large page mappings are merely an optimization that (a) might not be possible on domain start due to machine memory fragmentation and (b) that this condition might also occur on restore. Given these, it must always be prepared to function with 4K pages, which implies that it would need to preserve enough page table frame memory to be able to revert from large to small pages.

Mick

Do you disagree with my assertion that use of 2MB pages is
almost always an attempt to eke out a performance improvement,
that emulating 2MB pages with fragmented 4KB pages is likely
slower than just using 4KB pages to start with, and thus
that "must always be prepared to function with 4KB pages"
should NOT occur silently (if at all)?
I agree with the first statement. I'm not sure what you mean by "emulate 2MB pages with fragmented 4K pages" unless you assume nested page table support, or you just mean falling back to 4K pages. As for whether a change should be silent, I'm less clear on that. I certainly wouldn't consider it a fatal condition requiring domain termination; that position is consistent with the "optimization, not correctness" view of using large pages. However, a guest might want to indicate in some way that it has downgraded.
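
For what it's worth, the indication could be as simple as a one-shot flag plus a warning. A purely hypothetical sketch (none of these names are existing Linux or Xen guest code):

    /* Hypothetical sketch -- not existing Linux/Xen guest code.  A guest
     * could record the downgrade once and warn, rather than treating it
     * as fatal. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool superpages_downgraded;

    static void note_superpage_downgrade(void)
    {
        if (!superpages_downgraded) {
            superpages_downgraded = true;
            /* A real guest would use printk and perhaps expose a flag. */
            fprintf(stderr, "2M mappings unavailable; falling back to 4K pages\n");
        }
    }

    int main(void)
    {
        note_superpage_downgrade();     /* e.g. on the restore fallback path */
        return 0;
    }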

The tradeoff is between the performance gain one might get from using large pages and the intrusiveness of the changes to a PV kernel. Given that we're going to paravirtualize this by making small changes to the kernel's existing large-page support, rather than adding a new, separate large-page mechanism, we need to make sure that as many of the guest's existing assumptions as possible are satisfied.

The requirement that a guest be able to come up with enough L1 pagetable pages to map all the shattered 2M mappings at any time definitely doesn't fall into that category. You'd need to:

  1. Have an interface for Xen to tell the guest which pages need to be
     remapped.  Presumably this would be in terms of once-contiguous
     pfn ranges which are now backed by discontiguous mfns.
  2. Get the guest to remap those pfns to the new mfns, which will
     require walking every pagetable of every process searching for
     those pfns and allocating memory for the new pagetable level
     (see the sketch below).

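Purely to illustrate the shape of that work (all names below are hypothetical; none of this exists in Xen or Linux), the interface might reduce to one record per shattered range, and the guest side to filling a freshly allocated L1 page per 2M mapping (512 PTEs x 4K = 2M) before rewiring every pagetable that referenced it:

    /* Hypothetical sketch only -- these names do not exist in Xen or
     * Linux; they just illustrate the shape of the work above. */
    #include <stdint.h>
    #include <stdlib.h>

    #define PTES_PER_L1 512

    /* One record per formerly contiguous 2M pfn range that could not be
     * kept machine-contiguous across save/restore. */
    struct shattered_2m_range {
        uint64_t base_pfn;                  /* first pfn of the old 2M mapping */
        uint64_t new_mfn[PTES_PER_L1];      /* now-discontiguous backing mfns  */
    };

    /* Build the replacement L1 table for one shattered range; the guest
     * would then have to hook it into every L2 slot, in every process,
     * that used to hold the 2M entry. */
    static void fill_l1_for_range(const struct shattered_2m_range *r,
                                  uint64_t *l1)
    {
        for (int i = 0; i < PTES_PER_L1; i++)
            l1[i] = r->new_mfn[i] << 12;    /* PTE = mfn << PAGE_SHIFT; flags omitted */
    }

    int main(void)
    {
        struct shattered_2m_range r = { .base_pfn = 0x100000 };
        for (int i = 0; i < PTES_PER_L1; i++)
            r.new_mfn[i] = 0x200000 + 2 * i;          /* fake, scattered mfns */

        uint64_t *l1 = calloc(PTES_PER_L1, sizeof(*l1)); /* the reserved L1 page */
        if (!l1)
            return 1;
        fill_l1_for_range(&r, l1);
        free(l1);
        return 0;
    }
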
However, the main use of 2M mappings in Linux is to map the kernel text and data. That's clearly not going to be possible if we need to run kernel code to put things back together after a restore. Hm, given that, I guess we could just kludge it into hugetlbfs, but that really does restrict it to a very narrow set of users.

BTW, thinking ahead to ballooning with 2MB pages, are we prepared
to assume that a relinquished 2MB page can't be fragmented?
While this may be appealing for systems where nearly all
guests are using 2MB pages, systems where the 2MB guest is
an odd duck might suffer substantially by making that
assumption.
Agreed. All of this really only becomes an issue when memory is overcommitted. Unfortunately, that is precisely when 2MB machine-contiguous pages are likely to be difficult to find.

If 2M pages are becoming more important, then we should change Xen to do all domain allocations in 2M units, while reserving separate superpages specifically for fragmenting into 4k allocations. It's certainly sensible to always round a domain's initial size up to 2M (most will already be a 2M multiple, I suspect). Ballooning is the obvious exception, but I would argue that ballooning in less than 2M units is a lot of fiddly makework. The difference between giving a domain 128MB and 126MB is already pretty trivial; dealing with 4k changes in domain size is laughably small.
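
A minimal sketch of the arithmetic being argued here, with made-up names (not Xen code): rounding an initial allocation up to a 2M boundary, and counting balloon operations in 2M units.

    /* Hypothetical sketch, not Xen code: rounding and ballooning in 2M units. */
    #include <stdint.h>
    #include <stdio.h>

    #define SUPERPAGE_BYTES (2ULL << 20)    /* 2M */

    /* Round a domain's initial allocation up to a whole number of 2M units. */
    static uint64_t round_up_to_2m(uint64_t bytes)
    {
        return (bytes + SUPERPAGE_BYTES - 1) & ~(SUPERPAGE_BYTES - 1);
    }

    int main(void)
    {
        /* A 127MB request rounds up to 128MB -- a trivial difference. */
        printf("127MB -> %lluMB\n",
               (unsigned long long)(round_up_to_2m(127ULL << 20) >> 20));

        /* Ballooning 1GB in 2M units is only 512 operations. */
        printf("1GB balloon = %llu x 2M units\n",
               (unsigned long long)((1ULL << 30) / SUPERPAGE_BYTES));
        return 0;
    }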

(Now Keir brings up all the difficulties...)

   J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
