
Re: [Xen-devel] Proposed new "memory capacity claim" hypercall/feature

> From: Keir Fraser [mailto:keir@xxxxxxx]
> Subject: Re: Proposed new "memory capacity claim" hypercall/feature
> On 29/10/2012 21:08, "Dan Magenheimer" <dan.magenheimer@xxxxxxxxxx> wrote:
> > The core issue is that, in the hypervisor, every current method of
> > "allocating RAM" is slow enough that if you want to allocate millions
> > of pages (e.g. for a large domain), the total RAM can't be allocated
> > atomically.  In fact, it may even take minutes, so currently a large
> > allocation is explicitly preemptible, not atomic.
> >
> > The problems the proposal solves are (1) some toolstacks (including
> > Oracle's "cloud orchestration layer") want to launch domains in parallel;
> > currently xl/xapi require launches to be serialized which isn't very
> > scalable in a large data center;
> Well it does depend how scalable domain creation actually is as an
> operation. If it is spending most of its time allocating memory then it is
> quite likely that parallel creations will spend a lot of time competing for
> the heap spinlock, and actually there will be little/no speedup compared
> with serialising the creations. Further, if domain creation can take
> minutes, it may be that we simply need to go optimise that -- we already
> found one stupid thing in the heap allocator recently that was burning
> loads of time during large-memory domain creations, and fixed it for a
> massive speedup in that particular case.

I suppose ultimately it is a scalability question.  But Oracle's
measure of success here is based on how long a human or a tool
has to wait for confirmation to ensure that a domain will
successfully launch.  If two domains are launched in parallel
AND an indication is given that both will succeed, spinning on
the heaplock a bit just makes for a longer "boot" time, which is
just a cost of virtualization.  If they are launched in parallel
and, minutes later (or maybe even 20 seconds later), one or
both say "oops, I was wrong, there wasn't enough memory, so
try again", that's not OK for data center operations, especially if
there really was enough RAM for one, but not for both. Remember,
in the Oracle environment, we are talking about an administrator/automation
overseeing possibly hundreds of physical servers, not just a single

Does that make more sense?

The "claim" approach immediately guarantees success or failure.
Unless enough "stupid things/optimisations" are found that you
would be comfortable doing all of a domain's memory allocation
inside a hypervisor spinlock, there will be a race unless an
atomic mechanism such as "claiming" exists, where only simple
arithmetic must be done within a hypervisor lock.

Do you disagree?

> > and (2) tmem and/or other dynamic
> > memory mechanisms may be asynchronously absorbing small-but-significant
> > portions of RAM for other purposes during an attempted domain launch.
> This is an argument against allocate-rather-than-reserve? I don't think that
> makes sense -- so is this instead an argument against
> reservation-as-a-toolstack-only-mechanism? I'm not actually convinced yet we
> need reservations *at all*, before we get down to where it should be
> implemented.

I'm not sure we are defining terms the same way, so that's hard
to answer.  If you define "allocation" as "a physical RAM page frame
number is selected (and possibly the physical page is zeroed)",
then I'm not sure how your definition of "reservation" differs
(because that's how increase/decrease_reservation are implemented
in the hypervisor, right?).

Or did you mean "allocate-rather-than-claim" (where "allocate" means
selecting a specific physical page frame and "claim" means doing
accounting only)?  If so, see the atomicity argument above.

I'm not just arguing against reservation-as-a-toolstack-mechanism,
I'm stating I believe unequivocally that reservation-as-a-toolstack-
only-mechanism and tmem are incompatible.  (Well, not _totally_
incompatible... the existing workaround, tmem freeze/thaw, works
but is also single-threaded and has fairly severe unnecessary
performance repercussions.  So I'd like to solve both problems
at the same time.)


Xen-devel mailing list