Re: [Xen-devel] Proposed new "memory capacity claim" hypercall/feature
> From: Tim Deegan [mailto:tim@xxxxxxx]
> Subject: Re: Proposed new "memory capacity claim" hypercall/feature

Hi Tim --

> At 11:43 -0800 on 04 Nov (1352029386), Dan Magenheimer wrote:
> > > From: Keir Fraser [mailto:keir@xxxxxxx]
> > > Sent: Friday, November 02, 2012 3:30 AM
> > > To: Jan Beulich; Dan Magenheimer
> > > Cc: Olaf Hering; Ian Campbell; George Dunlap; Ian Jackson;
> > > George Shuklin; Dario Faggioli; xen-devel@xxxxxxxxxxxxx;
> > > Konrad Rzeszutek Wilk; Kurt Hackel; Mukesh Rathor; Zhigang Wang;
> > > Tim Deegan
> > > Subject: Re: Proposed new "memory capacity claim" hypercall/feature
> > >
> > > On 02/11/2012 09:01, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
> > >
> > > > Plus, if necessary, that loop could be broken up so that only the
> > > > initial part of it gets run with the lock held (see c/s
> > > > 22135:69e8bb164683 for why the unlock was moved past the
> > > > loop). That would make for a shorter lock hold time, but for a
> > > > higher allocation latency on large order allocations (due to
> > > > worse cache locality).
> > >
> > > In fact I believe only the first page needs to have its count_info
> > > set to != PGC_state_free, while the lock is held. That is
> > > sufficient to defeat the buddy merging in free_heap_pages().
> > > Similarly, we could hoist most of the first loop in
> > > free_heap_pages() outside the lock. There's a lot of scope for
> > > optimisation here.
> >
> > (sorry for the delayed response)
> >
> > Aren't we getting a little sidetracked here? (Maybe my fault for
> > looking at whether this specific loop is fast enough...)
> >
> > This loop handles only order=N chunks of RAM. Speeding up this
> > loop and holding the heap_lock here for a shorter period only helps
> > the TOCTOU race if the entire domain can be allocated as a
> > single order-N allocation.
>
> I think the idea is to speed up allocation so that, even for a large
> VM, you can just allocate memory instead of needing a reservation
> hypercall (whose only purpose, AIUI, is to give you an immediate
> answer).

Its purpose is to give an immediate answer on whether sufficient space
is available for allocation AND to (atomically) claim it, so that no
other call to the allocator can race in and steal some or all of it
away. So unless allocation can be sped up enough (for an
arbitrary-sized domain and an arbitrary state of memory fragmentation)
that the heap_lock can be held for the whole of it, speeding up
allocation doesn't solve the problem.
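To make that concrete, here is a rough sketch of the check-and-claim
primitive I have in mind. The names are placeholders for illustration
(total_avail_pages standing for free+freeable pages, plus a
hypothetical new per-domain claimed_pages field); this is a sketch of
the atomicity argument, not a patch:

/*
 * Sketch only: assumes a new 'unsigned long claimed_pages' field in
 * struct domain, and a total_avail_pages counter tracking free plus
 * freeable pages. heap_lock stands in for the allocator's existing
 * lock in page_alloc.c.
 */
static DEFINE_SPINLOCK(heap_lock);
static unsigned long total_avail_pages;    /* free + freeable pages */
static unsigned long total_claimed_pages;  /* sum of outstanding claims */

int domain_claim_pages(struct domain *d, unsigned long nr_pages)
{
    int rc = -ENOMEM;

    spin_lock(&heap_lock);

    /*
     * The test and the reservation are one critical section, so no
     * concurrent allocation can slip in between them: this is what
     * closes the TOCTOU window.
     */
    if ( nr_pages <= total_avail_pages - total_claimed_pages )
    {
        d->claimed_pages = nr_pages;   /* assume no prior claim held */
        total_claimed_pages += nr_pages;
        rc = 0;
    }

    spin_unlock(&heap_lock);
    return rc;
}

The toolstack gets its immediate yes-or-no, and a "yes" cannot later
be invalidated by a racing allocation, because every allocation would
be accounted against the same counters under the same lock.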
> > So unless the code for the _entire_ memory allocation path can
> > be optimized so that the heap_lock can be held across _all_ the
> > allocations necessary to create an arbitrary-sized domain, for
> > any arbitrary state of memory fragmentation, the original
> > problem has not been solved.
> >
> > Or am I misunderstanding?
> >
> > I _think_ the claim hypercall/subop should resolve this, though
> > admittedly I have yet to prove (and code) it.
>
> I don't think it solves it - or rather it might solve this
> _particular_ instance of it but it doesn't solve the bigger problem.
> If you have a set of overcommitted hosts and you want to start a new
> VM, you need to:
>
>  - (a) decide which of your hosts is the least overcommitted;
>  - (b) free up enough memory on that host to build the VM; and
>  - (c) build the VM.
>
> The claim hypercall _might_ fix (c) (if it could handle allocations
> that need address-width limits or contiguous pages). But (b) and (a)
> have exactly the same problem, unless there is a central arbiter of
> memory allocation (or equivalent distributed system). If you try to
> start 2 VMs at once,
>
>  - (a) the toolstack will choose to start them both on the same
>        machine, even if that's not optimal, or in the case where one
>        creation is _bound_ to fail after some delay.
>  - (b) the other VMs (and perhaps tmem) start ballooning out enough
>        memory to start the new VM. This can take even longer than
>        allocating it since it depends on guest behaviour. It can
>        fail after an arbitrary delay (ditto).
>
> If you have a toolstack with enough knowledge and control over memory
> allocation to sort out stages (a) and (b) in such a way that there
> are no delayed failures, (c) should be trivial.

(You've used the labels (a) and (b) twice, so I'm not quite sure I
understand... but in any case:)

Sigh. No, you are missing the beauty of tmem and dynamic allocation;
you are thinking in the old static paradigm, where the toolstack
controls how much memory is available. There is no central arbiter of
memory, any more than there is a central toolstack (other than the
hypervisor in a one-server Xen environment) that decides exactly when
to assign vcpus to pcpus. There is no "free up enough memory on that
host". Tmem doesn't start ballooning out enough memory to start the
VM... the guests are responsible for doing the ballooning, and it is
_already done_. The machine either has sufficient free+freeable
memory or it does not; and it is _that_ determination that needs to
be done atomically, because many threads are micro-allocating, and
possibly multiple toolstack threads are macro-allocating,
simultaneously. Everything is handled dynamically.

And just as a CPU scheduler built into the hypervisor, dynamically
assigning vcpus to pcpus, has proven more effective than statically
partitioning pcpus among domains, dynamic memory management should
prove more effective than some bossy toolstack trying to control
memory statically.

I understand that you can solve "my" problem in your paradigm without
a claim hypercall and/or by speeding up allocations. I _don't_ see
that you can solve "my" problem in _my_ paradigm without a claim
hypercall... speeding up allocations doesn't solve the TOCTOU race,
so allocating sufficient space for a domain must be atomic.
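Spelled out, the allocation side of the same sketch: every allocation,
micro or macro, does its accounting under the same lock, so capacity a
domain has claimed cannot be stolen between the claim and the build.
Same placeholder names as above; the real alloc_heap_pages() has a
different signature, and the buddy-list manipulation is elided:

int claim_aware_alloc_check(struct domain *d, unsigned long nr_pages)
{
    spin_lock(&heap_lock);

    if ( d->claimed_pages >= nr_pages )
    {
        /* A pre-claimed allocation is charged against the claim. */
        d->claimed_pages -= nr_pages;
        total_claimed_pages -= nr_pages;
    }
    else if ( nr_pages > total_avail_pages - total_claimed_pages )
    {
        /*
         * An unclaimed allocation may not eat into capacity that
         * some other domain has already claimed.
         */
        spin_unlock(&heap_lock);
        return -ENOMEM;
    }

    total_avail_pages -= nr_pages;
    /* ... the actual buddy-list allocation would happen here ... */

    spin_unlock(&heap_lock);
    return 0;
}

Note that nothing here requires speeding up the buddy loops: only the
counter accounting has to stay inside the lock for the scheme to be
race-free.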
Sigh.

Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel