
Re: [Xen-devel] [PATCH v8 1/2] hypervisor: XENMEM_claim_pages (subop of existing) hypercall



> From: David Vrabel [mailto:david.vrabel@xxxxxxxxxx]

Hi David --

Thanks for your reply!
 
> On 28/11/12 15:50, Dan Magenheimer wrote:
> > This is patch 1 of 2 of the eighth cut of the patch for the proposed
> > XENMEM_claim_pages hypercall/subop, taking into account review
> > feedback from Jan and Keir and IanC and Matthew Daley, plus some
> > fixes found via runtime debugging (using printk and privcmd only).
> >
> [...]
> >
> > Proposed:
> > - call claim for mem=N amount of memory
> > - if claim succeeds:
> >     call populate_physmap repeatedly to achieve mem=N memory (failsafe)
> >   else
> >     report -ENOMEM up the stack
> > - claim is held until mem=N is achieved or the domain dies or
> >    the toolstack changes it to 0
> > - memory is held until domain dies or the toolstack decreases it
> 
> There is no mechanism for per-NUMA node claim.  Isn't this needed?

It would be a useful extension, but not a necessary one: IIUC,
domain creation currently succeeds even if optimal NUMA placement
is not available, and the proposed XENMEM_claim_pages patch behaves
exactly the same way.  If there is a domain-creation option that
forces creation to fail when optimal NUMA placement is _not_
available, XENMEM_claim_pages has flag fields that could carry the
same requirement, so a per-node extension should be easy to add.
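
For concreteness, here is a minimal sketch of the toolstack-side flow
quoted above (claim, then populate, else -ENOMEM).  The helper names
claim_pages() and populate_physmap_batch() are illustrative stand-ins,
not the actual libxc interface in the patch:

#include <errno.h>
#include <stdint.h>

/* Hypothetical wrappers for the XENMEM_claim_pages and
 * XENMEM_populate_physmap subops; assume 0 on success, -ENOMEM on
 * failure.  See the patch itself for the real names and signatures. */
int claim_pages(uint32_t domid, unsigned long nr_pages);
int populate_physmap_batch(uint32_t domid, unsigned long nr_pages);

/* Build a domain's memory per the proposed flow: claim mem=N up front,
 * then populate until N pages have been allocated. */
int build_domain_memory(uint32_t domid, unsigned long nr_pages)
{
    /* Stake the claim first: the hypervisor answers immediately, so a
     * creation that cannot succeed fails before any allocation work. */
    if (claim_pages(domid, nr_pages))
        return -ENOMEM;                 /* report -ENOMEM up the stack */

    /* Claim held: populate in chunks until mem=N is achieved (failsafe). */
    while (nr_pages) {
        unsigned long chunk = nr_pages < 1024 ? nr_pages : 1024;

        if (populate_physmap_batch(domid, chunk)) {
            claim_pages(domid, 0);      /* toolstack resets the claim to 0 */
            return -ENOMEM;
        }
        nr_pages -= chunk;
    }

    /* Once mem=N is reached the claim has done its job; it also goes
     * away if the domain dies or the toolstack sets it back to 0. */
    return 0;
}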

> More fundamentally, doesn't this approach result in a worse user
> experience?  It's guaranteeing that a new VM can be started but at the
> expense of existing VMs on that node.

Well, we are talking about a race, and somebody has to win.
Traditionally, software races are decided first-come-first-served,
and that is exactly how the proposed XENMEM_claim_pages hypercall works.
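
To make that concrete, here is a tiny sketch (reusing the hypothetical
claim_pages() helper from the sketch above) of two creations racing
for the last nr_pages of free host memory:

#include <assert.h>
#include <stdint.h>

int claim_pages(uint32_t domid, unsigned long nr_pages);  /* as above */

/* Whichever claim reaches the hypervisor first wins; the loser learns
 * its fate from one cheap hypercall, before any pages have actually
 * been allocated on its behalf. */
void race_for_last_pages(uint32_t domA, uint32_t domB,
                         unsigned long nr_pages)
{
    /* First claim wins: nr_pages are reserved for domA, so its
     * (possibly lengthy) populate phase cannot be starved later. */
    int rcA = claim_pages(domA, nr_pages);

    /* Second claim loses at once if too little unclaimed memory
     * remains; the toolstack can retry domB on another host rather
     * than tear down a half-populated domain. */
    int rcB = claim_pages(domB, nr_pages);

    /* Expected outcome when nr_pages exceeds what remains unclaimed. */
    assert(rcA == 0 && rcB != 0);       /* first-come-first-served */
}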

If you have a chance, please read the document I just posted
(Proposed XENMEM_claim_pages hypercall: Analysis of problems
and alternate solutions).

> When making a VM placement decision, the toolstack needs to consider the
> future memory requirements of the new and existing VMs on the host and
> not just the current (or, more correctly, the recent) memory usage.
> 
> It seems more useful to me to have the toolstack (for example) track
> historical memory usage of a VM to allow it to make better predictions
> about memory usage.  With a better prediction, the number of failed VM
> creates due to memory shortage will be minimized.  Then, combined with
> reducing the cost of a VM create by optimizing the allocator, the cost
> of occasionally failing a create will be minimal.
> 
> For example, Sally starts her CAD application at 9am, tripling her
> desktop VM instance's memory usage.  If at 0858 the toolstack claimed
> most of the remaining memory for a new VM, then Sally's VM is going to
> grind to a halt as it swaps to death.
> 
> If the toolstack could predict that that desktop instance's memory usage
> was about to spike (because it had historical data showing this), it
> could have selected a different host and Sally's VM would perform as
> expected.

You are drifting the thread a bit here, but...

The last 4+ years of my life have been built on the fundamental
assumption that nobody, not even the guest kernel itself,
can adequately predict when memory usage is going to spike.
Accurate inference by an external entity across potentially dozens
of VMs is, IMHO... well... um... unlikely.  I could be wrong,
but I believe that even in academia no realistic research
solution has been proposed for this.  (If I'm wrong, please send a pointer.)

If one accepts this assumption as true, one must instead
plan to be able to adapt very dynamically when spikes
occur.  That's what tmem does to solve Sally's problem,
though admittedly tmem doesn't work for proprietary guest
kernels. +1 for open source. ;-)

Thanks,
Dan

P.S. If you'd like to learn more about tmem, please let me know,
as it is now available in Fedora and Ubuntu guests as well as
Oracle Linux (and, of course, Xen itself).



 

