Re: [Xen-devel] Proposed new "memory capacity claim" hypercall/feature
> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Subject: RE: Proposed new "memory capacity claim" hypercall/feature
>
> >>> On 30.10.12 at 18:13, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> wrote:
> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]

(NOTE TO KEIR: Input from you is requested in the first stanza below.)

Hi Jan --

Thanks for the continued feedback!  I've slightly re-ordered the email
to focus on the problem (and moved the tmem-specific discussion to the
end).

> As long as the allocation times can get brought down to an
> acceptable level, I continue to not see a need for the extra
> "claim" approach you're proposing.  So working on that one (or
> showing that without unreasonable effort this cannot be
> further improved) would be a higher priority thing from my pov
> (without anyone arguing about its usefulness).

Fair enough.  I will do some measurement and analysis of this code.
However, let me ask something of you and Keir as well: please estimate
how long (in usec) you think it is acceptable to hold the heap_lock.
If your limit is very small (as I expect), doing anything "N" times in
a loop with the lock held (for N==2^26, which is a 256GB domain) may
make the analysis moot.  (Some back-of-the-envelope arithmetic on this
is sketched in the P.S. below.)

> But yes, with all the factors you mention brought in, there is
> certainly some improvement needed (whether your "claim"
> proposal is the right thing is another question, not to mention
> that I currently don't see how this would get implemented in
> a consistent way taking several orders of magnitude less time
> to carry out).

OK, I will start on the next step... a proof-of-concept.  I'm
envisioning simple arithmetic, but maybe you are right and arithmetic
will not be sufficient.  (A rough sketch of the arithmetic I have in
mind is also in the P.S. below.)

> > Suppose you have a huge 256GB machine and you have already launched
> > a 64GB tmem guest "A".  The guest is idle for now, so it slowly
> > selfballoons down to maybe 4GB.  You start to launch another 64GB
> > guest "B" which, as we know, is going to take some time to complete.
> > In the middle of launching "B", "A" suddenly gets very active and
> > needs to balloon up as quickly as possible, but it can't balloon
> > fast enough (or at all if "frozen" as suggested), so it starts
> > swapping (and, thanks to Linux frontswap, the swapping tries to go
> > to hypervisor/tmem memory).  But ballooning and tmem are both
> > blocked, and so the guest swaps its poor little butt off even
> > though there's >100GB of free physical memory available.
>
> That's only one side of the overcommit situation you're striving
> to get working right here: that same self-ballooning guest, after
> sufficiently many more guests got started that the rest of the memory
> got absorbed by them, would suffer the very same problems in
> the described situation, so it has to be prepared for this case
> anyway.

The tmem design does ensure the guest is prepared for this case
anyway... the guest swaps.  And, unlike page-sharing, the guest
determines which pages to swap, not the host, and there is no
possibility of double-paging.

In your scenario, the host memory is truly oversubscribed.  This
scenario is ultimately a weakness of virtualization in general: trying
to statistically share an oversubscribed fixed resource among a number
of guests will sometimes cause a performance degradation, whether the
resource is CPU, LAN bandwidth, or, in this case, physical memory.
That very generic problem is, I think, not one any of us can solve.
Toolstacks need to be able to recognize the problem (whether CPU, LAN,
or memory) and act accordingly (report, or auto-migrate).
In my scenario, guest performance is hammered only because of the
unfortunate deficiency in the existing hypervisor memory allocation
mechanisms, namely that small allocations must be artificially "frozen"
until a large allocation can complete.  That specific problem is one I
am trying to solve.

BTW, with tmem, some future toolstack might monitor various available
tmem statistics and predict/avoid your scenario.

Dan
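P.S. Two rough sketches, to make the above a bit more concrete.  First,
the back-of-the-envelope arithmetic behind my heap_lock question: 256GB
at 4KB per page is 2^26 pages, so even a few tens of nanoseconds of
list manipulation per page adds up to seconds with the lock held.  The
snippet below is nothing more than that arithmetic written out as a
standalone C program; the 50ns/page figure is a guess, not a
measurement.

    #include <stdio.h>

    int main(void)
    {
        unsigned long pages = 1UL << 26;  /* 256GB / 4KB pages = 2^26 */
        double ns_per_page = 50.0;        /* guessed per-page cost with the lock held */
        double total_us = pages * ns_per_page / 1000.0;

        printf("%.0f usec (~%.2f sec) with the heap_lock held\n",
               total_us, total_us / 1e6);
        return 0;
    }

Even at 10ns per page the total is still roughly 670,000 usec, which is
why I suspect a per-page loop under the lock can never meet a
usec-scale budget no matter how much the loop body is optimized.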
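Second, the kind of "simple arithmetic" I have in mind for the claim
itself, as a sketch rather than a real patch: a claim succeeds or fails
in O(1) by comparing the request against free memory minus the sum of
outstanding claims, under one brief lock acquisition, instead of
walking page lists.  None of the names below (total_free_pages,
outstanding_claims, claim_lock, claim_pages) exist in the hypervisor
today (they are placeholders for illustration), and the per-domain
bookkeeping is omitted, as are releasing a claim as its allocations are
actually satisfied and the allocator-side check against
outstanding_claims.

    #include <xen/errno.h>
    #include <xen/spinlock.h>

    static unsigned long total_free_pages;    /* maintained by the page allocator */
    static unsigned long outstanding_claims;  /* sum of all unfulfilled claims    */
    static DEFINE_SPINLOCK(claim_lock);

    /* Reserve capacity for nr_pages without allocating any pages yet. */
    int claim_pages(unsigned long nr_pages)
    {
        int rc = -ENOMEM;

        spin_lock(&claim_lock);
        if ( outstanding_claims + nr_pages <= total_free_pages )
        {
            /* Enough unclaimed capacity: record the claim, succeed at once. */
            outstanding_claims += nr_pages;
            rc = 0;
        }
        spin_unlock(&claim_lock);

        return rc;
    }

The toolstack would issue the claim before it starts building domain
"B"; if the claim succeeds (and the allocator also respects
outstanding_claims when satisfying other requests), "B" is guaranteed
to find the memory it needs, so ballooning and tmem allocations for "A"
never have to be frozen while the slow per-page allocation for "B"
proceeds.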
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel