
Re: [Xen-devel] [PATCH v8 1/2] hypervisor: XENMEM_claim_pages (subop of existing) hypercall



> From: David Vrabel [mailto:david.vrabel@xxxxxxxxxx]

Hi David --

Thanks for your reply!
 
> On 28/11/12 15:50, Dan Magenheimer wrote:
> > This is patch 1 of 2 of the eighth cut of the patch for the proposed
> > XENMEM_claim_pages hypercall/subop, taking into account review
> > feedback from Jan and Keir and IanC and Matthew Daley, plus some
> > fixes found via runtime debugging (using printk and privcmd only).
> >
> [...]
> >
> > Proposed:
> > - call claim for mem=N amount of memory
> > - if claim succeeds:
> >     call populate_physmap repeatedly to achieve mem=N memory (failsafe)
> >   else
> >     report -ENOMEM up the stack
> > - claim is held until mem=N is achieved or the domain dies or
> >    the toolstack changes it to 0
> > - memory is held until domain dies or the toolstack decreases it
> 
> There is no mechanism for per-NUMA node claim.  Isn't this needed?

It would be a useful extension, but not a necessary one: IIUC,
domain creation currently succeeds even if optimal NUMA placement
is not available, and the proposed XENMEM_claim_pages patch behaves
exactly the same way.  If there is a domain-creation option that
forces creation to fail when optimal NUMA placement is _not_
available, XENMEM_claim_pages has flag fields that could carry the
same requirement, so a per-node extension should be easy to add.
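
For concreteness, here is a minimal sketch of the toolstack-side flow
quoted above (claim, then populate, else -ENOMEM).  The helper names
claim_pages() and populate_physmap_batch() are illustrative stand-ins,
not the actual libxc interface in the patch:

#include <errno.h>
#include <stdint.h>

/* Hypothetical wrappers for the XENMEM_claim_pages and
 * XENMEM_populate_physmap subops; assume 0 on success, -ENOMEM on
 * failure.  See the patch itself for the real names and signatures. */
int claim_pages(uint32_t domid, unsigned long nr_pages);
int populate_physmap_batch(uint32_t domid, unsigned long nr_pages);

/* Build a domain's memory per the proposed flow: claim mem=N up front,
 * then populate until N pages have been allocated. */
int build_domain_memory(uint32_t domid, unsigned long nr_pages)
{
    /* Stake the claim first: the hypervisor answers immediately, so a
     * creation that cannot succeed fails before any allocation work. */
    if (claim_pages(domid, nr_pages))
        return -ENOMEM;                 /* report -ENOMEM up the stack */

    /* Claim held: populate in chunks until mem=N is achieved (failsafe). */
    while (nr_pages) {
        unsigned long chunk = nr_pages < 1024 ? nr_pages : 1024;

        if (populate_physmap_batch(domid, chunk)) {
            claim_pages(domid, 0);      /* toolstack resets the claim to 0 */
            return -ENOMEM;
        }
        nr_pages -= chunk;
    }

    /* Once mem=N is reached the claim has done its job; it also goes
     * away if the domain dies or the toolstack sets it back to 0. */
    return 0;
}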

> More fundamentally, doesn't this approach result in a worse user
> experience?  It's guaranteeing that a new VM can be started but at the
> expense of existing VMs on that node.

Well, we are talking about a race, and somebody has to win.
Traditionally, software races are decided first-come-first-served,
and that is exactly how the proposed XENMEM_claim_pages hypercall works.
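
To make that concrete, here is a tiny sketch (reusing the hypothetical
claim_pages() helper from the sketch above) of two creations racing
for the last nr_pages of free host memory:

#include <assert.h>
#include <stdint.h>

int claim_pages(uint32_t domid, unsigned long nr_pages);  /* as above */

/* Whichever claim reaches the hypervisor first wins; the loser learns
 * its fate from one cheap hypercall, before any pages have actually
 * been allocated on its behalf. */
void race_for_last_pages(uint32_t domA, uint32_t domB,
                         unsigned long nr_pages)
{
    /* First claim wins: nr_pages are reserved for domA, so its
     * (possibly lengthy) populate phase cannot be starved later. */
    int rcA = claim_pages(domA, nr_pages);

    /* Second claim loses at once if too little unclaimed memory
     * remains; the toolstack can retry domB on another host rather
     * than tear down a half-populated domain. */
    int rcB = claim_pages(domB, nr_pages);

    /* Expected outcome when nr_pages exceeds what remains unclaimed. */
    assert(rcA == 0 && rcB != 0);       /* first-come-first-served */
}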

If you have a chance, please read the document I just posted
(Proposed XENMEM_claim_pages hypercall: Analysis of problems
and alternate solutions).

> When making a VM placement decision, the toolstack needs to consider the
> future memory requirements of the new and existing VMs on the host and
> not just the current (or, more correctly, the recent) memory usage.
> 
> It seems more useful to me to have the toolstack (for example) track
> historical memory usage of a VM to allow it to make better predictions
> about memory usage.  With a better prediction, the number of failed VM
> creates due to memory shortage will be minimized.  Then, combined with
> reducing the cost of a VM create by optimizing the allocator, the cost
> of occasionally failing a create will be minimal.
> 
> For example, Sally starts her CAD application at 9am, tripling her
> desktop VM instance's memory usage.  If at 0858 the toolstack claimed
> most of the remaining memory for a new VM, then Sally's VM is going to
> grind to a halt as it swaps to death.
> 
> If the toolstack could predict that that desktop instance's memory usage
> was about to spike (because it had historical data showing this), it
> could have selected a different host and Sally's VM would perform as
> expected.

You are drifting the thread a bit here, but...

The last 4+ years of my life have been built on the fundamental
assumption that nobody, not even the guest kernel itself,
can adequately predict when memory usage is going to spike.
Accurate inference by an external entity across potentially dozens
of VMs is, IMHO... well... um... unlikely.  I could be wrong,
but I believe that even in academia no realistic research
solution has been proposed for this.  (If I'm wrong, please send a pointer.)

If one accepts this assumption as true, one must instead
plan to be able to adapt very dynamically when spikes
occur.  That's what tmem does to solve Sally's problem,
though admittedly tmem doesn't work for proprietary guest
kernels. +1 for open source. ;-)

Thanks,
Dan

P.S. If you'd like to learn more about tmem, please let me know,
as it is now available in Fedora and Ubuntu guests as well as
Oracle Linux (and, of course, Xen itself).



 

