
[Xen-ia64-devel] [PATCH][RFC]discontig memory support



Hi xen/ia64 developers.

The attached patch adds support for discontiguous memory.
It also makes memory above 4GB available.
Please review and comment.

Signed-off-by: Kouya Shimura <kouya@xxxxxxxxxxxxxx>

Here is a design memo.
--------------------------------------------------------------------
VIRTUAL FRAME TABLE DESIGN MEMO
(supporting discontig memory)

[PURPOSE] The IA64 Xen hypervisor currently limits the maximum size of
available memory to 4GB (half of that range is used for MMIO, so only
2GB is actually available).  The purpose of this patch is to remove
this restriction.

The restriction comes from the huge linear array of struct page_info
named 'frame_table'.  A page_info struct is accessed by a linear
index, so frame_table requires a contiguous memory area. If there is a
large hole in the physical memory space, the block of page_info
entries corresponding to the hole is wasted.
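
As a rough illustration of the cost, the sketch below works out how
much of a linear frame_table is spent on a hole. The 16KB page size is
the one used later in this memo; the 56-byte entry size and the helper
names are assumptions made for the example, not values taken from the
patch.

```c
#include <stdint.h>

#define PAGE_SHIFT      14      /* 16KB pages, as in the DESIGN section */
#define PAGE_INFO_SIZE  56ULL   /* ASSUMED sizeof(struct page_info) */

/* Bytes needed by a linear frame_table covering physical addresses
 * [0, max_paddr): one entry per page frame, whether or not the frame
 * is backed by real memory. */
static uint64_t linear_frame_table_bytes(uint64_t max_paddr)
{
    return (max_paddr >> PAGE_SHIFT) * PAGE_INFO_SIZE;
}

/* Bytes of that array which only describe the hole
 * [hole_start, hole_end) and are therefore wasted. */
static uint64_t wasted_bytes(uint64_t hole_start, uint64_t hole_end)
{
    return ((hole_end - hole_start) >> PAGE_SHIFT) * PAGE_INFO_SIZE;
}
```

Under these assumptions, half of the table for a 4GB address space with
a 2GB hole would describe frames that do not exist.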

Older versions of the IA64 Xen hypervisor allocated the memory for
frame_table from the xen heap, whose size is at most 64MB. The xen
heap had various other uses and occasionally ran out. I think this is
why the available memory was restricted.

FYI, when I simply removed that restriction, the latest Xen worked
well on a Tiger4 with 8GB of memory. The xen heap seems to have plenty
of margin now.

[SOLUTION]
Linux has the same problem and offers two solutions.

(1) VIRTUAL_MEM_MAP
  mem_map is Linux's name for frame_table. This approach places
  frame_table in virtual memory space and doesn't allocate physical
  memory for memory holes. The advantage is that the impact on
  existing code is small.

(2) SPARSEMEM
  This approach prepares multi-level tables and traverses the tree in
  order to access page_info. It requires more modification and has
  more influence on the existing code, but NUMA and memory hotplug
  support on Linux rely on SPARSEMEM.

I don't know which one performs better. The attached patch takes the
same approach as VIRTUAL_MEM_MAP because it is easier. But I think
SPARSEMEM should be implemented in the future because it is likely to
become the mainstream in Linux. In that case, common code might need
to be modified.

[DESIGN]
* Location of the virtual address of frame_table
  I set the frame_table start at 0xf300000000000000, which is in
  region 7. Region 7 is used by the Xen hypervisor. The page size is
  16KB. VHPT is disabled in this region, and 16KB/16MB/64MB page sizes
  are mixed. Virtual aliasing might occur, but the performance
  shouldn't be affected.
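
  For reference, a minimal sketch of how the region number falls out
  of the start address: on IA64 the top three bits (63..61) of a
  virtual address select the region. The macro and function names
  here are illustrative assumptions, not necessarily those used in
  the patch.

```c
#include <stdint.h>

/* Start of the virtual frame_table as given in this memo; the macro
 * name is an assumption for this example. */
#define VIRT_FRAME_TABLE_ADDR 0xf300000000000000ULL

/* Bits 63..61 of an IA64 virtual address select one of 8 regions. */
static unsigned region_number(uint64_t vaddr)
{
    return (unsigned)(vaddr >> 61);
}
```

  With the address above, region_number(VIRT_FRAME_TABLE_ADDR) gives 7.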

* Configuration of address translation table
  This inherits Linux's approach and adopts a 3-level configuration.
+----------------------------------------------------------------+
|6 6             4          3          2          1             0|
|3 1             7          6          5          4             0|
+----------------------------------------------------------------+
 +-+10011         +-PGDoff--++-PMDoff--++-PTEoff--++---offset---+    
  region=7          (no PUD)      |          |            |
                       |   PGD    |   PMD    |   PTE      |
                       |  +---+ +--->+---+ +--->+---+     |
                       |  |   | | |  |   | | |  |   |     |
                       |  |   | | +->|   |-+ |  |   |     v
                       +->|   |-+    |   |   +->|   |--->OR->physical addr
                          +---+      +---+      +---+
  - Location of PGD.
    The page (swapper_pg_dir) pointed to by init_mm.pgd seems never
    to be used, so I use this page as the PGD for the virtual
    frame_table. If someone uses this page, please tell me.

  - The other tables (PMD, PTE) are dynamically allocated from boot
    pages.

This structure inherits Linux's address translation table, but it is
not efficient for a small data set such as frame_table. I expected
that an existing table walker would be usable, but in the end I wrote
a new walker. We should redesign the structure to improve performance.
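
The bit boundaries in the diagram above (47, 36, 25, 14) can be
expressed as the following sketch. It assumes 16KB table pages holding
8-byte entries, which gives 11 index bits per level and matches the
diagram; the macro and function names are illustrative and are not
taken from the patch.

```c
#include <stdint.h>

#define PAGE_SHIFT  14                        /* 16KB pages */
#define INDEX_BITS  (PAGE_SHIFT - 3)          /* ASSUMED 8-byte entries: 2048 per page */
#define PTE_SHIFT   PAGE_SHIFT                /* PTE index starts at bit 14 */
#define PMD_SHIFT   (PTE_SHIFT + INDEX_BITS)  /* PMD index starts at bit 25 */
#define PGD_SHIFT   (PMD_SHIFT + INDEX_BITS)  /* PGD index starts at bit 36 */
#define INDEX_MASK  ((1ULL << INDEX_BITS) - 1)

/* Index into the PGD page (bits 46..36). */
static unsigned pgd_index(uint64_t va)
{
    return (unsigned)((va >> PGD_SHIFT) & INDEX_MASK);
}

/* Index into the PMD page (bits 35..25). */
static unsigned pmd_index(uint64_t va)
{
    return (unsigned)((va >> PMD_SHIFT) & INDEX_MASK);
}

/* Index into the PTE page (bits 24..14). */
static unsigned pte_index(uint64_t va)
{
    return (unsigned)((va >> PTE_SHIFT) & INDEX_MASK);
}

/* Byte offset inside the 16KB page (bits 13..0). */
static uint64_t page_offset(uint64_t va)
{
    return va & ((1ULL << PAGE_SHIFT) - 1);
}
```

The offset is ORed with the physical page address found in the PTE to
form the final physical address, as shown at the right of the diagram.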

* Alternate data TLB miss handler
When a TLB miss occurs on an access to the frame_table, the alternate
data TLB miss handler is called because the VHPT is disabled in region
7. Xen's original alt_dtlb_miss handler has a mechanism that maps a
granule page (16MB) in order to access physical memory data in
virtual mode.

I added code that checks whether the fault address is inside
frame_table, traverses the address translation table, and inserts a
translation cache entry. To avoid a nested data TLB fault, data
address translation is disabled during the table walk. If the table
walk fails, ia64_fault() is called.
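
The decision flow described above can be modelled in plain C as
follows. This is only a sketch of the logic, not the handler itself:
the real code runs in interruption context with data translation
disabled, while this model uses ordinary pointers. The window size,
the present bit, and all type and function names are assumptions made
for the example.

```c
#include <stdint.h>
#include <stddef.h>

#define INDEX_BITS   11
#define ENTRIES      (1u << INDEX_BITS)
#define INDEX_MASK   ((uint64_t)ENTRIES - 1)
#define PTE_PRESENT  1ULL                       /* ASSUMED present bit */
#define FT_START     0xf300000000000000ULL      /* from this memo */
#define FT_END       (FT_START + (1ULL << 47))  /* ASSUMED: span of one PGD page */

typedef struct { uint64_t     pte[ENTRIES]; } pte_table_t;
typedef struct { pte_table_t *pmd[ENTRIES]; } pmd_table_t;
typedef struct { pmd_table_t *pgd[ENTRIES]; } pgd_table_t;

enum miss_action {
    MISS_GRANULE_MAP,  /* not a frame_table address: existing 16MB mapping path */
    MISS_INSERT_PTE,   /* walk succeeded: insert *pte_out as a translation */
    MISS_FAULT         /* walk failed: the handler would call ia64_fault() */
};

static enum miss_action
alt_dtlb_miss_model(const pgd_table_t *pgd, uint64_t addr, uint64_t *pte_out)
{
    const pmd_table_t *pmd;
    const pte_table_t *pt;
    uint64_t pte;

    /* Step 1: is the fault address inside the virtual frame_table? */
    if (addr < FT_START || addr >= FT_END)
        return MISS_GRANULE_MAP;

    /* Step 2: walk PGD -> PMD -> PTE, failing on any missing level. */
    pmd = pgd->pgd[(addr >> 36) & INDEX_MASK];
    if (pmd == NULL)
        return MISS_FAULT;
    pt = pmd->pmd[(addr >> 25) & INDEX_MASK];
    if (pt == NULL)
        return MISS_FAULT;
    pte = pt->pte[(addr >> 14) & INDEX_MASK];
    if (!(pte & PTE_PRESENT))
        return MISS_FAULT;

    /* Step 3: hand the PTE back so it can be inserted. */
    *pte_out = pte;
    return MISS_INSERT_PTE;
}
```

An access to a part of frame_table that corresponds to a memory hole
has no PTE behind it, so it ends in the MISS_FAULT case.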

[STATUS]
  * Xen, domU and domVTI (cset:9484:ddc279c91502) work well on
    Tiger4 with 8GB physical memory.

[TODO]
  * mpt_table (defined in xen/arch/ia64/xen/xenmem.c) has the same
    problem. We have to fix it.
  * redesign the address translation table
  * performance evaluation
  * clean up the CONFIG_VIRTUAL_MEM_MAP #ifdefs
--------------------------------------------------------------------

Attachment: virt_frame_table.patch
Description: Binary data

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
