
Re: [Xen-devel] Proposed new "memory capacity claim" hypercall/feature

To: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>
From: Tim Deegan <tim@xxxxxxx>
Date: Sun, 4 Nov 2012 20:35:32 +0000
Cc: Olaf Hering <olaf@xxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Konrad Wilk <konrad.wilk@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, George Shuklin <george.shuklin@xxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxx, Dario Faggioli <raistlin@xxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Kurt Hackel <kurt.hackel@xxxxxxxxxx>, Zhigang Wang <zhigang.x.wang@xxxxxxxxxx>
Delivery-date: Sun, 04 Nov 2012 20:36:24 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

At 11:43 -0800 on 04 Nov (1352029386), Dan Magenheimer wrote:
> > From: Keir Fraser [mailto:keir@xxxxxxx]
> > Sent: Friday, November 02, 2012 3:30 AM
> > To: Jan Beulich; Dan Magenheimer
> > Cc: Olaf Hering; Ian Campbell; George Dunlap; Ian Jackson; George Shuklin;
> > Dario Faggioli; xen-devel@xxxxxxxxxxxxx; Konrad Rzeszutek Wilk; Kurt Hackel;
> > Mukesh Rathor; Zhigang Wang; Tim Deegan
> > Subject: Re: Proposed new "memory capacity claim" hypercall/feature
> > 
> > On 02/11/2012 09:01, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
> > 
> > > Plus, if necessary, that loop could be broken up so that only the
> > > initial part of it gets run with the lock held (see c/s
> > > 22135:69e8bb164683 for why the unlock was moved past the
> > > loop). That would make for a shorter lock hold time, but for a
> > > higher allocation latency on large order allocations (due to worse
> > > cache locality).
> > 
> > In fact I believe only the first page needs to have its count_info set to !=
> > PGC_state_free, while the lock is held. That is sufficient to defeat the
> > buddy merging in free_heap_pages(). Similarly, we could hoist most of the
> > first loop in free_heap_pages() outside the lock. There's a lot of scope for
> > optimisation here.
> 
> (sorry for the delayed response)
> 
> Aren't we getting a little sidetracked here?  (Maybe my fault for
> looking at whether this specific loop is fast enough...)
> 
> This loop handles only a single order-N chunk of RAM.  Speeding up
> this loop and holding the heap_lock here for a shorter period only
> helps with the TOCTOU race if the entire domain can be allocated as a
> single order-N allocation.
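To make the lock-narrowing idea in the quoted exchange concrete, here is a heavily simplified C model. Only the head page of the chunk changes state while the lock is held, which (per Keir's message) is enough to stop free_heap_pages() from merging a buddy into it; the per-page initialisation of the remaining pages runs after the lock is dropped. frame_table, the page_state values, the spinlock and alloc_chunk() are hypothetical stand-ins rather than Xen's real structures, and the caller is assumed to have already taken the chunk off a free list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified page states; not Xen's count_info encoding. */
enum page_state { PAGE_FREE, PAGE_INUSE };

struct page {
    enum page_state state;
    unsigned int owner;   /* stands in for the real per-page setup work */
};

#define NR_PAGES 16
static struct page frame_table[NR_PAGES];          /* all PAGE_FREE (0) */
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;

static void lock_heap(void)   { while (atomic_flag_test_and_set(&heap_lock)) { } }
static void unlock_heap(void) { atomic_flag_clear(&heap_lock); }

/*
 * Allocate the 2^order chunk starting at 'head'. The caller is assumed to
 * have chosen 'head' from a free list (free-list handling is omitted).
 */
static int alloc_chunk(size_t head, unsigned int order, unsigned int owner)
{
    size_t n = (size_t)1 << order;

    if (head + n > NR_PAGES)
        return -1;

    lock_heap();
    if (frame_table[head].state != PAGE_FREE) {
        unlock_heap();
        return -1;
    }
    /* Only the head page changes state under the lock; per Keir's
     * message, that alone is enough to defeat buddy merging. */
    frame_table[head].state = PAGE_INUSE;
    unlock_heap();

    /* The O(2^order) per-page loop runs after the lock is dropped. */
    frame_table[head].owner = owner;
    for (size_t i = 1; i < n; i++) {
        frame_table[head + i].state = PAGE_INUSE;
        frame_table[head + i].owner = owner;
    }
    return 0;
}
```

The lock hold time no longer grows with the chunk size, at the cost Jan mentions: the per-page work now happens outside the critical section.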

I think the idea is to speed up allocation so that, even for a large VM,
you can just allocate memory instead of needing a reservation hypercall
(whose only purpose, AIUI, is to give you an immediate answer).
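A minimal sketch of the "just allocate" approach, assuming a single host-wide free-page counter in place of the real heap: the allocation itself is the check, so there is no separately queried answer that can go stale, and a failed build hands back everything it took. free_pages, alloc_one_page() and populate_domain() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical host-wide free page count standing in for the heap. */
static atomic_long free_pages = 1000;

/* Take one page if available; check and decrement are one atomic step. */
static bool alloc_one_page(void)
{
    long cur = atomic_load(&free_pages);
    while (cur > 0) {
        if (atomic_compare_exchange_weak(&free_pages, &cur, cur - 1))
            return true;
        /* on failure, cur now holds the latest value; re-check it */
    }
    return false;
}

static void free_one_page(void) { atomic_fetch_add(&free_pages, 1); }

/*
 * Build a domain by allocating directly. If the host runs out part way,
 * roll back so the caller gets a definite "no" and nothing is leaked.
 */
static bool populate_domain(long nr_pages)
{
    long got = 0;

    while (got < nr_pages && alloc_one_page())
        got++;
    if (got == nr_pages)
        return true;
    while (got-- > 0)
        free_one_page();
    return false;
}
```

The trade-off Dan raises is visible here: a failure is only discovered after part of the domain has been allocated, which is why allocation speed matters to this approach.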

> So unless the code for the _entire_ memory allocation path can
> be optimized so that the heap_lock can be held across _all_ the
> allocations necessary to create an arbitrary-sized domain, for
> any arbitrary state of memory fragmentation, the original
> problem has not been solved.
> 
> Or am I misunderstanding?
> 
> I _think_ the claim hypercall/subop should resolve this, though
> admittedly I have yet to prove (and code) it.
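One possible reading of the proposed claim subop, sketched in C under stated assumptions (the struct, counters and function names are hypothetical, not a real Xen interface): staking a claim checks free memory against all outstanding claims under the heap lock and records the claim, and later allocations for the claiming domain draw down that claim while other domains may only use unclaimed memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: names and fields are illustrative only. */
struct domain { long outstanding_claim; };

static atomic_flag heap_lock = ATOMIC_FLAG_INIT;
static long total_free = 1000;   /* free pages on the host */
static long total_claimed = 0;   /* sum of all outstanding claims */

static void lock_heap(void)   { while (atomic_flag_test_and_set(&heap_lock)) { } }
static void unlock_heap(void) { atomic_flag_clear(&heap_lock); }

/* Stake a claim: succeeds only if enough unclaimed memory exists now. */
static bool stake_claim(struct domain *d, long pages)
{
    bool ok;

    lock_heap();
    ok = d->outstanding_claim == 0 && total_free - total_claimed >= pages;
    if (ok) {
        d->outstanding_claim = pages;
        total_claimed += pages;
    }
    unlock_heap();
    return ok;
}

/* Allocate for d: its own claim is used first, then unclaimed memory. */
static bool alloc_for_domain(struct domain *d, long pages)
{
    bool ok;

    lock_heap();
    long from_claim = pages < d->outstanding_claim ? pages : d->outstanding_claim;
    long unclaimed = total_free - total_claimed;
    ok = pages - from_claim <= unclaimed;
    if (ok) {
        total_free -= pages;
        d->outstanding_claim -= from_claim;
        total_claimed -= from_claim;
    }
    unlock_heap();
    return ok;
}
```

The model counts pages only, so it says nothing about allocations that need address-width limits or contiguous pages.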

I don't think it solves it - or rather it might solve this _particular_
instance of it but it doesn't solve the bigger problem.  If you have a
set of overcommitted hosts and you want to start a new VM, you need to:

 - (a) decide which of your hosts is the least overcommitted;
 - (b) free up enough memory on that host to build the VM; and
 - (c) build the VM.

The claim hypercall _might_ fix (c) (if it could handle allocations that
need address-width limits or contiguous pages).  But (b) and (a) have
exactly the same problem, unless there is a central arbiter of memory
allocation (or equivalent distributed system).  If you try to start 2
VMs at once,

 - (a) the toolstack will choose to start them both on the same machine,
       even if that's not optimal, or even when one of the two creations
       is _bound_ to fail after some delay.
 - (b) the other VMs (and perhaps tmem) start ballooning out enough
       memory to start the new VM.  This can take even longer than
       allocating it since it depends on guest behaviour.  It can fail
       after an arbitrary delay (ditto).

If you have a toolstack with enough knowledge and control over memory
allocation to sort out stages (a) and (b) in such a way that there are
no delayed failures, (c) should be trivial.
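A minimal sketch of the central arbiter described above, assuming the toolstack keeps a per-host view of how much memory could be made available (including by ballooning) and how much is already promised. Placement picks the host with the most uncommitted memory and commits the VM's pages before releasing the arbiter lock, so two VMs started at once cannot both be counted against the same headroom. struct host, arbiter_lock and place_vm() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toolstack-side view of one host. */
struct host {
    long capacity;    /* pages that can be made available, incl. ballooning */
    long committed;   /* pages already promised to running or starting VMs */
};

static atomic_flag arbiter_lock = ATOMIC_FLAG_INIT;

/*
 * Stage (a) and the bookkeeping for (b) as one serialised decision: pick
 * the host with the most uncommitted memory that fits the VM and commit
 * the pages before returning. Returns the host index, or -1 if none fits.
 */
static int place_vm(struct host *hosts, size_t nr_hosts, long pages)
{
    int best = -1;
    long best_room = -1;

    while (atomic_flag_test_and_set(&arbiter_lock)) { }
    for (size_t i = 0; i < nr_hosts; i++) {
        long room = hosts[i].capacity - hosts[i].committed;
        if (room >= pages && room > best_room) {
            best = (int)i;
            best_room = room;
        }
    }
    if (best >= 0)
        hosts[best].committed += pages;
    atomic_flag_clear(&arbiter_lock);
    return best;
}
```

A real arbiter would also have to release the commitment if ballooning or the build later fails; the sketch omits that path.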

Tim.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
