Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of problem and alternate solutions
> From: Ian Campbell [mailto:Ian.Campbell@xxxxxxxxxx]
> Subject: Re: [Xen-devel] Proposed XENMEM_claim_pages hypercall: Analysis of
> problem and alternate solutions
>
> Putting aside any bias or fixed mindedness the maintainers are not
> especially happy with the proposed fix, even within the constraints of
> the dynamic model. (It omits to cover various use cases and I think
> strikes many as something of a sticking plaster).

Sticking plaster: FWIW I agree.  But the wound it is covering is that a
buddy allocator is not well suited to atomically allocating large
quantities of potentially discontiguous memory, which is what we need
Xen to do in order to allocate, without a race, all the memory needed
to create a domain.  The whole concept of capacity allocation is a hack
to work around that limitation.  Maybe we could overhaul the allocator
to handle this better, or maybe we could replace the whole allocator,
but IMHO, compared to those alternatives (especially considering a
likely bug tail), a plaster is far preferable.

Omits use cases: I've stated my opinion on this several times ("prefer
to fix 98% of a bug and not make the other 2%[1] worse, rather than fix
0%") and nobody has argued the point.  It's not uncommon for a proposed
Xen fix to solve an HVM problem and not a similar PV problem, or vice
versa.  Maybe one should think of claim_pages as a complete solution to
the _HVM_ "delayed failure race problem" that, coincidentally, also
solves nearly all of the _PV_ "delayed failure race problem".  Does
that help? ;-)  So I see this not as a reason to block the proposed
hypercall, but as an indication that some corner cases need to be put
on someone's "to-do" list.  And, IMHO, prioritized very low on that
person's to-do list.

[1] BTW, to clarify, the "2%" is PV domains (not HVM) with superpages=1
manually set in vm.cfg, plus 32-bit PV domains _only_ on systems with
>64GB physical RAM.  So 2% is probably way too high.

> Given that I've been trying to suggest an alternative solution which
> works within the constraints of your model and happens to have the nice
> property of not requiring hypervisor changes. I genuinely think there is
> a workable solution to your problem in there, although you are correct
> that it is mostly just an idea right now.

Let me also summarize my argument: It's very hard to argue against
ideas, and I certainly don't want to challenge anyone's genuineness (or
extremely hard work as a maintainer), but the hypervisor really does
have very easy atomic access to certain information and locks, and the
toolstack simply doesn't.  So the toolstack has to guess and/or create
unnecessary (and potentially dangerous) constraints to ensure the
information it collects doesn't race against changes to that
information (TOCTOU races).  And even if the toolstack could safely
create those constraints, it would have to create them severally,
against multiple domains, whereas the hypervisor can choose to enforce
only the total system constraint (i.e. max-of-sums is better than
sum-of-maxes).  (I've appended a rough sketch below of what the
hypercall boils down to, to make this concrete.)

So, I think we all agree with the goal: "Any functionality which can be
reasonably provided outside the hypervisor should be excluded from it."
Ian, I believe I have clearly proven that providing the claim
functionality outside the hypervisor can be done only by taking away
other functionality (e.g. unnecessarily constraining guests which are
doing dynamic allocation, and requiring sum-of-maxes rather than
max-of-sums).  I hope you can finally agree and ack the hypervisor
patch.  But first...
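(Aside, as promised above: here is roughly what the claim hypercall
boils down to, in a purely illustrative C sketch of my own.  This is
not the actual patch; every name in it (heap_lock, total_free_pages,
outstanding_claims, claimed, claim_pages, consume_claim) is invented
for the example, and the locking is shown only as comments.  The point
is that the capacity check and the reservation happen atomically under
the same lock the allocator already uses, so there is no TOCTOU window,
and only one system-wide constraint is ever enforced.)

#include <stdbool.h>

struct domain {
    unsigned long claimed;   /* pages claimed but not yet allocated */
};

/* Book-keeping the allocator maintains (illustrative names only). */
static unsigned long total_free_pages;
static unsigned long outstanding_claims;  /* summed over all domains */

/* Atomically reserve capacity for 'pages' pages for domain d, or fail
 * immediately.  spin_lock()/spin_unlock() on the allocator's heap lock
 * are shown as comments to keep the sketch self-contained. */
bool claim_pages(struct domain *d, unsigned long pages)
{
    bool ok = false;

    /* spin_lock(&heap_lock); */
    if ( outstanding_claims + pages <= total_free_pages )
    {
        /* Check and reservation are one atomic step: no ballooning,
         * tmem activity, or parallel domain build can sneak in
         * between. */
        outstanding_claims += pages;
        d->claimed = pages;
        ok = true;
    }
    /* spin_unlock(&heap_lock); */

    return ok;  /* failure is reported up front, not partway through a
                 * long domain build */
}

/* Called by the page allocator whenever it hands 'pages' pages to
 * domain d; the outstanding claim shrinks as it is fulfilled. */
void consume_claim(struct domain *d, unsigned long pages)
{
    /* spin_lock(&heap_lock); */
    if ( d->claimed )
    {
        unsigned long used = (pages < d->claimed) ? pages : d->claimed;

        d->claimed -= used;
        outstanding_claims -= used;
    }
    /* spin_unlock(&heap_lock); */
}

For the toolstack to get the same guarantee it would have to pin down
every domain's ballooning with a per-domain maximum while it does its
check (sum-of-maxes); the sketch above only ever enforces the single
global constraint (max-of-sums).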
> That said the best suggestion for a solution I've seen so far was Tim's
> suggestion that tmem be more tightly integrated with memory allocation
> as another step towards the "memory scheduler" idea. So I wouldn't
> bother pursuing the maxmem approach further unless the tmem-integration
> idea doesn't pan out for some reason.

Please excuse my frustration, and forgive me if I sound like a broken
record, but tmem, as it sits today (and as it has sat in the hypervisor
for nearly four years now), _is_ already a huge step towards the memory
scheduler idea, and _is_ already tightly integrated with the hypervisor
memory allocator.  In fact, one could say it is exactly because this
tight integration already exists that claim_pages needs to be
implemented as a hypercall.

I've repeatedly invited all Xen maintainers [2] to take some time to
truly understand how tmem works and why, but have still had no takers.
It's a very clever solution to a very hard problem, and it's all open
source and all shipping today; but it is not simple, so unfortunately
it can't be explained in a couple of paragraphs or a 10-minute call.
Please let me know if you want to know more.

[2] I think Jan and Keir fully understand the Xen mechanisms, but
perhaps not the guest side or how tmem all works together and why.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel