Xen project Mailing List

Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions

To: Dan Magenheimer <dan.magenheimer@xxxxxxxxxx>

From: Andres Lagar-Cavilla <andreslc@xxxxxxxxxxxxxx>

Date: Thu, 3 Jan 2013 11:24:58 -0500

Cc: "Keir (Xen.org)" <keir@xxxxxxx>, Ian Campbell <ian.campbell@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andres Lagar-Cavilla <andreslc@xxxxxxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, xen-devel@xxxxxxxxxxxxx, Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>

Delivery-date: Thu, 03 Jan 2013 16:25:23 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Jan 2, 2013, at 4:38 PM, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> wrote:

>> From: Tim Deegan [mailto:tim@xxxxxxx]
>> Subject: Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of
>> problem and alternate solutions
>>
>> Hi,
>
> Happy New Year Tim, and thanks for trying to add some clarity to the
> discussion.
>
>> The question of starting VMs in parallel seems like a red herring to me:
>> - TTBOMK Xapi already can start VMs in parallel. Since it knows what
>>   constraints it's placed on existing VMs and what VMs it's currently
>>   building, there is nothing stopping it. Indeed, AFAICS any toolstack
>>   that can guarantee enough RAM to build one VM at a time could do the
>>   same for multiple parallel builds with a bit of bookkeeping.
>> - Dan's stated problem (failure during VM build in the presence of
>>   unconstrained guest-controlled allocations) happens even if there is
>>   only one VM being created.
>
> Agreed. The parallel VM discussion was simply trying to point out
> that races can occur even without guest-controlled allocations,
> so is distracting from the actual issue (which is, according to
> wikipedia, one of the definitions of "red herring").
>
> (As an aside, your use of the word "unconstrained" is a red herring. ;-)
>
>>>>> Andres Lagar-Cavilla says "... this is because of shortcomings in the
>>>>> [Xen] mm layer and its interaction with wait queues, documented
>>>>> elsewhere." In other words, this batching proposal requires
>>>>> significant changes to the hypervisor, which I think we
>>>>> all agreed we were trying to avoid.
>>>>
>>>> Let me nip this at the bud. I use page sharing and other techniques in an
>>>> environment that doesn't use Citrix's DMC, nor is focused only on
>>>> proprietary kernels...
>>>
>>> I believe Dan is saying is that it is not enabled by default.
>>> Meaning it does not get executed in by /etc/init.d/xencommons and
>>> as such it never gets run (or does it now?) - unless one knows
>>> about it - or it is enabled by default in a product. But perhaps
>>> we are both mistaken? Is it enabled by default now on xen-unstable?
>>
>> I think the point Dan was trying to make is that if you use page-sharing
>> to do overcommit, you can end up with the same problem that self-balloon
>> has: guest activity might consume all your RAM while you're trying to
>> build a new VM.
>>
>> That could be fixed by a 'further hypervisor change' (constraining the
>> total amount of free memory that CoW unsharing can consume). I suspect
>> that it can also be resolved by using d->max_pages on each shared-memory
>> VM to put a limit on how much memory they can (severally) consume.
>
> (I will respond to this in the context of Andres' response shortly...)
>
>>> Just as a summary as this is getting to be a long thread - my
>>> understanding has been that the hypervisor is suppose to toolstack
>>> independent.
>>
>> Let's keep calm. If people were arguing "xl (or xapi) doesn't need this
>> so we shouldn't do it"
>
> Well Tim, I think this is approximately what some people ARE arguing.
> AFAICT, "people" _are_ arguing that "the toolstack" must have knowledge
> of and control over all memory allocation. Since the primary toolstack
> is "xl", even though xl does not currently have this knowledge/control
> (and, IMHO, never can or should), I think people _are_ arguing:
>
> "xl (or xapi) SHOULDn't need this so we shouldn't do it".
>
>> that would certainly be wrong, but I don't think
>> that's the case. At least I certainly hope not!
>
> I agree that would certainly be wrong, but it seems to be happening
> anyway. :-( Indeed, some are saying that we should disable existing
> working functionality (eg. in-guest ballooning) so that the toolstack
> CAN have complete knowledge and control.

If you refer to my opinion on the bizarre-ness of the balloon, what you say is not at all what I mean. Note that I took great care to not break balloon functionality in the face of paging or sharing, and vice-versa.

Andres

>
> So let me check, Tim, do you agree that some entity, either the toolstack
> or the hypervisor, must have knowledge of and control over all memory
> allocation, or the allocation race condition is present?
>
>> The discussion ought to be around the actual problem, which is (as far
>> as I can see) that in a system where guests are ballooning without
>> limits, VM creation failure can happen after a long delay. In
>> particular it is the delay that is the problem, rather than the failure.
>> Some solutions that have been proposed so far:
>> - don't do that, it's silly (possibly true but not helpful);
>> - this reservation hypercall, to pull the failure forward;
>> - make allocation faster to avoid the delay (a good idea anyway,
>>   but can it be made fast enough?);
>> - use max_pages or similar to stop other VMs using all of RAM.
>
> Good summary. So, would you agree that the solution selection
> comes down to: "Can max_pages or similar be used effectively to
> stop other VMs using all of RAM? If so, who is implementing that?
> Else the reservation hypercall is a good solution." ?
>
>> My own position remains that I can live with the reservation hypercall,
>> as long as it's properly done - including handling PV 32-bit and PV
>> superpage guests.
>
> Tim, would you at least agree that "properly" is a red herring?
> Solving 100% of a problem is clearly preferable and I would gladly
> change my loyalty to someone else's 100% solution. But solving 98%*
> of a problem while not making the other 2% any worse is not "improper",
> just IMHO sensible engineering.
>
> * I'm approximating the total number of PV 32-bit and PV superpage
>   guests as 2%. Substitute a different number if you like, but
>   the number is certainly getting smaller over time, not growing.
>
> Tim, thanks again for your useful input.
>
> Thanks,
> Dan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
