[Xen-devel] Proposed new "memory capacity claim" hypercall/feature
Keir, Jan (et al) --

In a recent long thread [1], there was a great deal of discussion about the possible need for a "memory reservation" hypercall. While there was some confusion due to the two worldviews of static vs dynamic management of physical memory capacity, one worldview definitely has a requirement for this new capability. It is still uncertain whether the other worldview will benefit as well, though I believe it eventually will, especially when page sharing is fully deployed.

Note that to avoid confusion with existing usages of various terms (such as "reservation"), I am now using the distinct word "claim" as in a "land claim" or "mining claim":

http://dictionary.cambridge.org/dictionary/british/stake-a-claim

When a toolstack creates a domain, it can first "stake a claim" to the amount of memory capacity necessary to ensure the domain launch will succeed.

In order to explore feasibility, I wanted to propose a possible hypervisor design and would very much appreciate feedback!

The objective of the design is to ensure that a multi-threaded toolstack can atomically claim a specific amount of RAM capacity for a domain, especially in the presence of independent dynamic memory demand (such as tmem and selfballooning) which the toolstack is not able to track.

"Claim X 50G" means that, on completion of the call, either (A) 50G of capacity has been claimed for use by domain X and the call returns success or (B) the call returns failure. Note that in the above, "claim" explicitly does NOT mean that specific physical RAM pages have been assigned, only that the 50G of RAM capacity is not available either to a subsequent "claim" or for most[2] independent dynamic memory demands.

I think the underlying hypervisor issue is that the current process of "reserving" memory capacity (which currently does assign specific physical RAM pages) is, by necessity when used for large quantities of RAM, batched and slow and, consequently, can NOT be atomic.

One way to think of the newly proposed "claim" is as "lazy reserving": The capacity is set aside even though specific physical RAM pages have not been assigned. Put another way, claiming is really just an accounting illusion, similar to how an accountant must "accrue" future liabilities.

Hypervisor design/implementation overview:

A domain currently does RAM accounting with two primary counters, "tot_pages" and "max_pages". (For now, let's ignore shr_pages, paged_pages, and xenheap_pages, and I hope Olaf/Andre/others can provide further expertise and input.)

Tot_pages is a "struct domain" element in the hypervisor that tracks the number of physical RAM pageframes "owned" by the domain. The hypervisor enforces that tot_pages is never allowed to exceed another "struct domain" element called max_pages.

I would like to introduce a new counter, which records how much capacity is claimed for a domain, whether or not that capacity is yet mapped to physical RAM pageframes. To do so, I'd like to split the concept of tot_pages into two variables, tot_phys_pages and tot_claimed_pages, and require the hypervisor to also enforce:

    d.tot_phys_pages <= d.tot_claimed_pages[3] <= d.max_pages

I'd also split the hypervisor global "total_avail_pages" into "total_free_pages" and "total_unclaimed_pages". (I'm definitely going to need to study more the two-dimensional array "avail"...) The hypervisor must now do additional accounting to keep track of the sum of claims across all domains and also enforce the global:

    total_unclaimed_pages <= total_free_pages

I think the memory_op hypercall can be extended to add two additional subops, XENMEM_claim and XENMEM_release. (Note: To support tmem, there will need to be two variations of XENMEM_claim, "hard claim" and "soft claim" [3].)

The XENMEM_claim subop atomically evaluates total_unclaimed_pages against the new claim, claims the pages for the domain if possible, and returns success or failure.
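
To make the claim accounting concrete, here is a toy user-space model in Python. It is only a sketch of the counters described above, not hypervisor code: the class and method names (Domain, Hypervisor, claim, alloc), the lock standing in for whatever serialization the hypervisor would use, the "one claim at a time" rule, and the use of zero to mean "no claim" are all assumptions made for illustration. Release and claim expiry are not modeled.

```python
import threading
from dataclasses import dataclass


@dataclass
class Domain:
    max_pages: int
    tot_phys_pages: int = 0     # pageframes actually owned by the domain
    tot_claimed_pages: int = 0  # capacity set aside; 0 means "no claim" (assumption)


class Hypervisor:
    def __init__(self, total_free_pages: int):
        self.total_free_pages = total_free_pages
        self.total_unclaimed_pages = total_free_pages
        # Hypothetical stand-in for hypervisor-side serialization.
        self._lock = threading.Lock()

    def claim(self, d: Domain, nr_pages: int) -> bool:
        """Toy XENMEM_claim: all-or-nothing, assigns no pageframes."""
        with self._lock:
            # Only capacity not already backed by RAM needs to be set aside.
            outstanding = nr_pages - d.tot_phys_pages
            if d.tot_claimed_pages != 0:  # assumption: one claim at a time
                return False
            if nr_pages > d.max_pages or outstanding < 0:
                return False
            if outstanding > self.total_unclaimed_pages:
                return False
            d.tot_claimed_pages = nr_pages
            self.total_unclaimed_pages -= outstanding
            return True

    def alloc(self, d: Domain, nr_pages: int) -> bool:
        """Assign physical pageframes against an existing claim."""
        with self._lock:
            if d.tot_phys_pages + nr_pages > d.tot_claimed_pages:
                return False
            # The claim already removed this capacity from the unclaimed
            # pool, so only the free count changes here.
            d.tot_phys_pages += nr_pages
            self.total_free_pages -= nr_pages
            return True


hv = Hypervisor(total_free_pages=100)
x = Domain(max_pages=80)
y = Domain(max_pages=80)
print(hv.claim(x, 60))  # succeeds: 60 <= 100 unclaimed pages
print(hv.claim(y, 50))  # fails: only 40 unclaimed pages remain
print(hv.alloc(x, 10))  # lowers total_free_pages, not total_unclaimed_pages
```

In this model total_unclaimed_pages always equals total_free_pages minus the sum of outstanding (claimed but not yet allocated) pages, which is what keeps total_unclaimed_pages <= total_free_pages true after every step.
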

The XENMEM_release subop "unsets" the domain's tot_claimed_pages (to an "illegal" value such as zero or MINUS_ONE).

The hypervisor must also enforce some semantics: If an allocation occurs such that a domain's tot_phys_pages would equal or exceed d.tot_claimed_pages, then d.tot_claimed_pages becomes "unset". This enforces the temporary nature of a claim: Once a domain fully "occupies" its claim, the claim silently expires.

In the case of a dying domain, a XENMEM_release operation is implied and must be executed by the hypervisor.

Ideally, the quantity of unclaimed memory for each domain and for the system should be query-able. This may require additional memory_op hypercalls.

I'd very much appreciate feedback on this proposed design!

Thanks,
Dan

[1] http://lists.xen.org/archives/html/xen-devel/2012-09/msg02229.html
    and continued in October (the archives don't thread across months)
    http://lists.xen.org/archives/html/xen-devel/2012-10/msg00080.html

[2] Pages used to store tmem "ephemeral" data may be an exception because those pages are "free-on-demand".

[3] I'd be happy to explain the minor additional work necessary to support tmem but have mostly left it out of the proposal for clarity.