
Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions

On Jan 2, 2013, at 4:38 PM, Dan Magenheimer <dan.magenheimer@xxxxxxxxxx> wrote:

>> From: Tim Deegan [mailto:tim@xxxxxxx]
>> Subject: Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions
>> Hi,
> Happy New Year Tim, and thanks for trying to add some clarity to the
> discussion.
>> The question of starting VMs in parallel seems like a red herring to me:
>> - TTBOMK Xapi already can start VMs in parallel.  Since it knows what
>>  constraints it's placed on existing VMs and what VMs it's currently
>>  building, there is nothing stopping it.  Indeed, AFAICS any toolstack
>>  that can guarantee enough RAM to build one VM at a time could do the
>>  same for multiple parallel builds with a bit of bookkeeping.
>> - Dan's stated problem (failure during VM build in the presence of
>>  unconstrained guest-controlled allocations) happens even if there is
>>  only one VM being created.
> Agreed.  The parallel VM discussion was simply trying to point out
> that races can occur even without guest-controlled allocations,
> so is distracting from the actual issue (which is, according to
> wikipedia, one of the definitions of "red herring").
> (As an aside, your use of the word "unconstrained" is a red herring. ;-)
>>>>> Andres Lagar-Cavilla says "... this is because of shortcomings in the
>>>>> [Xen] mm layer and its interaction with wait queues, documented
>>>>> elsewhere."  In other words, this batching proposal requires
>>>>> significant changes to the hypervisor, which I think we
>>>>> all agreed we were trying to avoid.
>>>> Let me nip this in the bud. I use page sharing and other techniques in an
>>>> environment that doesn't use Citrix's DMC, nor is focused only on
>>>> proprietary kernels...
>>> I believe what Dan is saying is that it is not enabled by default.
>>> Meaning it does not get executed by /etc/init.d/xencommons and
>>> as such it never gets run (or does it now?) - unless one knows
>>> about it - or it is enabled by default in a product. But perhaps
>>> we are both mistaken? Is it enabled by default now on xen-unstable?
>> I think the point Dan was trying to make is that if you use page-sharing
>> to do overcommit, you can end up with the same problem that self-balloon
>> has: guest activity might consume all your RAM while you're trying to
>> build a new VM.
>> That could be fixed by a 'further hypervisor change' (constraining the
>> total amount of free memory that CoW unsharing can consume).  I suspect
>> that it can also be resolved by using d->max_pages on each shared-memory
>> VM to put a limit on how much memory they can (severally) consume.
> (I will respond to this in the context of Andres' response shortly...)
>>> Just as a summary as this is getting to be a long thread - my
>>> understanding has been that the hypervisor is supposed to be toolstack
>>> independent.
>> Let's keep calm.  If people were arguing "xl (or xapi) doesn't need this
>> so we shouldn't do it"
> Well Tim, I think this is approximately what some people ARE arguing.
> AFAICT, "people" _are_ arguing that "the toolstack" must have knowledge
> of and control over all memory allocation.  Since the primary toolstack
> is "xl", even though xl does not currently have this knowledge/control
> (and, IMHO, never can or should), I think people _are_ arguing:
> "xl (or xapi) SHOULDn't need this so we shouldn't do it".
>> that would certainly be wrong, but I don't think
>> that's the case.  At least I certainly hope not!
> I agree that would certainly be wrong, but it seems to be happening
> anyway. :-(  Indeed, some are saying that we should disable existing
> working functionality (e.g. in-guest ballooning) so that the toolstack
> CAN have complete knowledge and control.

If you refer to my opinion on the bizarre-ness of the balloon, what you say is 
not at all what I mean. Note that I took great care to not break balloon 
functionality in the face of paging or sharing, and vice-versa.

> So let me check, Tim, do you agree that some entity, either the toolstack
> or the hypervisor, must have knowledge of and control over all memory
> allocation, or the allocation race condition is present?
>> The discussion ought to be around the actual problem, which is (as far
>> as I can see) that in a system where guests are ballooning without
>> limits, VM creation failure can happen after a long delay.  In
>> particular it is the delay that is the problem, rather than the failure.
>> Some solutions that have been proposed so far:
>> - don't do that, it's silly (possibly true but not helpful);
>> - this reservation hypercall, to pull the failure forward;
>> - make allocation faster to avoid the delay (a good idea anyway,
>>   but can it be made fast enough?);
>> - use max_pages or similar to stop other VMs using all of RAM.
> Good summary.  So, would you agree that the solution selection
> comes down to: "Can max_pages or similar be used effectively to
> stop other VMs using all of RAM? If so, who is implementing that?
> Else the reservation hypercall is a good solution." ?
>> My own position remains that I can live with the reservation hypercall,
>> as long as it's properly done - including handling PV 32-bit and PV
>> superpage guests.
> Tim, would you at least agree that "properly" is a red herring?
> Solving 100% of a problem is clearly preferable and I would gladly
> change my loyalty to someone else's 100% solution.  But solving 98%*
> of a problem while not making the other 2% any worse is not "improper",
> just IMHO sensible engineering.
> * I'm approximating the total number of PV 32-bit and PV superpage
> guests as 2%.  Substitute a different number if you like, but
> the number is certainly getting smaller over time, not growing.
> Tim, thanks again for your useful input.
> Thanks,
> Dan
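
To make the "pull the failure forward" option in the list of proposed solutions concrete, here is a minimal toy model of free-page accounting with and without an up-front reservation. It is only a sketch under stated assumptions: `Host`, `claim`, `alloc` and `build` are hypothetical names invented for illustration, the page counts are arbitrary, and none of this is the actual XENMEM_claim_pages interface or Xen allocator code.

```python
class Host:
    """Toy model of a host's free-page accounting (hypothetical, not Xen code)."""

    def __init__(self, free_pages):
        self.free_pages = free_pages  # pages not owned by any domain
        self.claims = {}              # domain id -> pages still reserved for it

    def unclaimed(self):
        """Free pages that are not promised to any in-progress build."""
        return self.free_pages - sum(self.claims.values())

    def claim(self, domid, pages):
        """Reserve pages up front; fail immediately if they are not available."""
        if pages > self.unclaimed():
            return False
        self.claims[domid] = self.claims.get(domid, 0) + pages
        return True

    def alloc(self, domid, pages):
        """Allocate pages; a domain may use its own claim but nobody else's."""
        own = self.claims.get(domid, 0)
        if pages > self.unclaimed() + own:
            return False
        self.free_pages -= pages
        if own:
            # Allocations by a claimant consume its outstanding claim first.
            self.claims[domid] = own - min(own, pages)
        return True


def build(host, domid, pages, chunk=100):
    """Populate a domain chunk by chunk; return how many pages it obtained."""
    got = 0
    while got < pages:
        step = min(chunk, pages - got)
        if not host.alloc(domid, step):
            break  # failure discovered only after part of the work is done
        got += step
    return got


# No claim: a guest-controlled allocation lands first, and the build only
# discovers the shortfall after most of its (slow) allocation has happened.
late = Host(1000)
late.alloc("guest", 300)
pages_before_failure = build(late, "new", 800)

# With a claim: the same shortfall is reported before any build work starts.
early = Host(1000)
early.alloc("guest", 300)
claim_ok = early.claim("new", 800)
```

In this model the claim does not make the build succeed when memory is short; it only moves the failure to the start, and once a claim is granted, guest-controlled allocations can no longer eat into it.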
