Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions

> From: Ian Campbell [mailto:Ian.Campbell@xxxxxxxxxx]
> Subject: Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions
>
> Putting aside any bias or fixed mindedness the maintainers are not
> especially happy with the proposed fix, even within the constraints of
> the dynamic model. (It omits to cover various use cases and I think
> strikes many as something of a sticking plaster).

Sticking plaster: FWIW I agree.  But the wound it is covering
is that a buddy allocator is not well suited to atomically allocating
large quantities of potentially discontiguous memory, which is what
we need Xen to do to allocate all the memory to create a domain without
a race.  The whole concept of capacity allocation is a hack to work
around that limitation.  Maybe we could overhaul the allocator to
handle this better or maybe we could replace the whole allocator,
but IMHO, compared to those alternatives (especially considering a
likely bug tail), a plaster is far preferable.
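
To make the "capacity allocation" idea concrete, here is a toy sketch
in Python. It is not the proposed Xen patch and does not model the buddy
allocator; the class ToyHeap and its methods are hypothetical names
invented for illustration. It assumes a claim simply reserves a page
count under the allocator's lock, so that later allocations by other
domains cannot eat into it, and that a domain's own allocations are
drawn from its claim first.

```python
import threading


class ToyHeap:
    """Toy page pool with per-domain claims (illustrative only, not Xen code)."""

    def __init__(self, total_pages):
        self.free_pages = total_pages
        self.claimed = {}              # domain id -> pages still reserved
        self.lock = threading.Lock()

    def _unclaimed(self):
        # Caller must hold self.lock.
        return self.free_pages - sum(self.claimed.values())

    def claim(self, domid, pages):
        """Reserve capacity atomically; no pages are actually allocated yet."""
        with self.lock:
            if pages > self._unclaimed():
                return False           # fail immediately, not minutes later
            self.claimed[domid] = self.claimed.get(domid, 0) + pages
            return True

    def alloc(self, domid, pages):
        """Allocate pages; a domain may draw on its own claim but not on others'."""
        with self.lock:
            own = self.claimed.get(domid, 0)
            if pages > self._unclaimed() + own:
                return False
            self.free_pages -= pages
            self.claimed[domid] = max(0, own - pages)
            return True


heap = ToyHeap(100)
first_claim = heap.claim(1, 80)        # succeeds: 100 pages are unclaimed
second_claim = heap.claim(2, 30)       # fails at once: only 20 are unclaimed
racing_alloc = heap.alloc(2, 30)       # fails: would eat into domain 1's claim
small_alloc = heap.alloc(2, 20)        # succeeds: fits in the unclaimed pages
build_domain = heap.alloc(1, 80)       # succeeds: backed by the earlier claim
```

The point of the sketch is only that the check and the reservation happen
under one lock, so a domain build that has claimed its pages cannot fail
later because another allocation raced in.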

Omits use cases:  I've stated my opinion on this several times
("prefer to fix 98% of a bug and not make the other 2%[1] worse
than fix 0%") and nobody has argued the point.  It's not uncommon
for a proposed Xen fix to solve an HVM problem and not a similar PV
problem, or vice-versa.  Maybe one should think of claim_pages as
a complete solution to the _HVM_ "delayed failure race problem"
that, coincidentally, also solves nearly all of the _PV_ "delayed
failure race problem".  Does that help? ;-)  So I see this not
as a reason to block the proposed hypercall, but as an indication
that some corner cases need to be put on someone's "to-do" list.
And, IMHO, prioritized very low on that person's to-do list.

[1] BTW, to clarify, the "2%" is PV domains (not HVM) with superpages=1
manually set in vm.cfg, plus 32-bit PV domains _only_ on systems
with >64GB physical RAM. So 2% is probably way too high.

> Given that, I've been trying to suggest an alternative solution which
> works within the constraints of your model and happens to have the nice
> property of not requiring hypervisor changes. I genuinely think there is
> a workable solution to your problem in there, although you are correct
> that it is mostly just an idea right now.

Let me also summarize my argument:

It's very hard to argue against ideas and I certainly don't
want to challenge anyone's genuineness (or extremely hard work as
a maintainer), but the hypervisor really does have very easy
atomic access to certain information and locks, and the toolstack
simply doesn't.  So the toolstack has to guess and/or create
unnecessary (and potentially dangerous) constraints to ensure
the information it collects doesn't race against changes to that
information (TOCTOU races).  And even if the toolstack could safely
create those constraints, it must create them severally against
multiple domains whereas the hypervisor can choose to enforce
only the total system constraint (i.e. max-of-sums is better than
sum-of-maxes).

So, I think we all agree with the goal:

"Any functionality which can be reasonably provided outside the
  hypervisor should be excluded from it."

Ian, I believe I have clearly proven that providing the claim
functionality outside the hypervisor can be done only by
taking away other functionality (e.g. unnecessarily constraining
guests which are doing dynamic allocation and requiring sum-of-maxes
rather than max-of-sums).
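
A small worked example of the sum-of-maxes versus max-of-sums distinction,
in Python. The usage numbers are made up for illustration and do not come
from any measurement. Each row is a point in time and each column is one
domain's page usage at that time.

```python
# Hypothetical page usage: rows are points in time, columns are domains.
usage = [
    [300, 100, 200],
    [100, 350, 150],
    [150, 100, 400],
]

# Per-domain limits set independently must cover each domain's own peak,
# so the system must be provisioned for the sum of those peaks.
sum_of_maxes = sum(max(per_domain) for per_domain in zip(*usage))

# A single system-wide constraint only has to cover the largest total
# that ever occurs at any one time.
max_of_sums = max(sum(at_one_time) for at_one_time in usage)

print(sum_of_maxes, max_of_sums)
```

With these numbers the per-domain approach needs 1050 pages (300 + 350 +
400) while the system-wide constraint needs only 650 (the largest row
total). Because the domains peak at different times, max-of-sums can never
exceed sum-of-maxes, and the gap is capacity that per-domain constraints
leave stranded.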

I hope you can finally agree and ack the hypervisor patch.

But first...

> That said the best suggestion for a solution I've seen so far was Tim's
> suggestion that tmem be more tightly integrated with memory allocation
> as another step towards the "memory scheduler" idea. So I wouldn't
> bother pursuing the maxmem approach further unless the tmem-integration
> idea doesn't pan out for some reason.

Please excuse my frustration and if I sound like a broken record,
but tmem, as it sits today (and has sat in the hypervisor for nearly
four years now) _is_ already a huge step towards the memory scheduler
idea, and _is_ already tightly integrated with the hypervisor
memory allocator.  In fact, one could say it is exactly because
this tight integration already exists that claim_pages needs to
be implemented as a hypercall.

I've repeatedly invited all Xen maintainers [2] to take some time to
truly understand how tmem works and why, but still have had no takers.
It's a very clever solution to a very hard problem, and it's all open
source and all shipping today; but it is not simple so unfortunately
can't be explained in a couple of paragraphs or a 10-minute call.
Please let me know if you want to know more.

[2] I think Jan and Keir fully understand the Xen mechanisms
  but perhaps not the guest-side or how tmem all works together
  and why.
