
Re: [Xen-devel] Proposed new "memory capacity claim" hypercall/feature

> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> Subject: RE: Proposed new "memory capacity claim" hypercall/feature
> > Aren't we getting a little sidetracked here?  (Maybe my fault for
> > looking at whether this specific loop is fast enough...)
> >
> > This loop handles only order=N chunks of RAM.  Speeding up this
> > loop and holding the heap_lock here for a shorter period only helps
> > the TOCTOU race if the entire domain can be allocated as a
> > single order-N allocation.
> >
> > Domain creation is supposed to succeed as long as there is
> > sufficient RAM, _regardless_ of the state of memory fragmentation,
> > correct?
> >
> > So unless the code for the _entire_ memory allocation path can
> > be optimized so that the heap_lock can be held across _all_ the
> > allocations necessary to create an arbitrary-sized domain, for
> > any arbitrary state of memory fragmentation, the original
> > problem has not been solved.
> >
> > Or am I misunderstanding?
> I think we got here via questioning whether suppressing certain
> activities (like tmem changing the allocator-visible amount of
> available memory) for a brief period of time would be acceptable,
> and while that indeed depends on the overall latency of memory
> allocation for the domain as a whole, I would be somewhat
> tolerant of it involving a longer suspension period on a highly
> fragmented system.
> But of course, if this can be made to work uniformly, that would be
> preferred.

Hi Jan and Keir --

OK, here's a status update.  Sorry for the delay, but it took a while
for me to refamiliarize myself with the code paths.

It appears that the attempt to use 2MB and 1GB pages is done in
the toolstack; if the hypervisor rejects a large-page allocation,
the toolstack falls back to smaller pages.  Thus, if physical memory
is highly fragmented (few or no order>=9 allocations available),
this results in one hypercall per 4k page, so a 256GB domain would
require 64 million hypercalls.  And since, AFAICT, there is no sane
way to hold the heap_lock across even two hypercalls, speeding
up the in-hypervisor allocation path, by itself, will not solve
the TOCTOU race.

One option to avoid the 64M hypercalls is to change the Xen ABI to
add a new memory hypercall/subop to populate_physmap an arbitrary
amount of physical RAM, and have Xen (optionally) try order==18 (1GB),
then order==9 (2MB), then order==0 (4k).  I suspect that, even with the overhead
of hypercalls removed, the steps required to allocate 64 million pages
(including, for example, removing a page from a xen list
and adding it to the domain's page list) will consume enough time
that holding the heap_lock and/or suppressing micro-allocations for
the entire macro-allocation on a fragmented system will still be
unacceptable (e.g. at least tens of seconds).  However, I am
speculating, and I think I can measure it if you (Jan or Keir)
feel a measurement is necessary to fully convince you.

I think this brings us back to the proposed "claim" hypercall/subop.
Unless there are further objections or suggestions for different
approaches, I'll commence prototyping it, OK?


Xen-devel mailing list


