
Re: [Xen-devel] [RFC v2][PATCH 1/3] docs: design and intended usage for NUMA-aware ballooning

From: David Vrabel
>On 16/08/13 05:13, Yechen Li wrote:
>> +### nodemask VNODE\_TO\_PNODE(int vnode) ###
>> +
>> +This service is provided by the hypervisor (and wired, if necessary, all the
>> +way up to the proper toolstack layer or guest kernel), since it is only Xen
>> +that knows both the virtual and the physical topologies.
>The physical NUMA topology must not be exposed to guests that have a
>virtual NUMA topology -- only the toolstack and Xen should know the
>mapping between the two.
>A guest cannot make sensible use of a machine topology as it may be
>migrated to a host with a different topology.
See my other e-mail (my reply to George) about something like this being
necessary for ballooning up, although, yes, it can probably all happen
inside Xen, if that's considered better.
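To make the quoted service concrete, here is a minimal sketch of the per-guest mapping that, following David's point, only Xen and the toolstack would hold. It assumes a nodemask can be modeled as an integer bitmask (bit p set means pnode p backs the vnode). The names `nodemask`, `vnode_to_pnode` and `pnodes_in` are hypothetical and are not Xen's actual interface.

```python
from typing import Dict, List


def nodemask(*pnodes: int) -> int:
    """Build a nodemask (hypothetical int bitmask) with one bit per pnode."""
    mask = 0
    for p in pnodes:
        mask |= 1 << p
    return mask


def vnode_to_pnode(vmap: Dict[int, int], vnode: int) -> int:
    """Return the nodemask of pnodes backing `vnode` (0 if unknown)."""
    return vmap.get(vnode, 0)


def pnodes_in(mask: int) -> List[int]:
    """Decode a nodemask into a sorted list of pnode ids."""
    return [p for p in range(mask.bit_length()) if (mask >> p) & 1]


# Guest G from the design doc: vnode #0 backed by pnode #0, vnode #1 by pnode #2.
g_map = {0: nodemask(0), 1: nodemask(2)}
```

Keeping this table on the Xen/toolstack side means it can be rebuilt after a migration without the guest ever seeing machine node ids.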

>> +## Description of the problem ##
>> +
>> +Let us use an example. Let's assume that guest _G_ has 2 vnodes,
>> +and that the memory for vnode #0 and #1 comes from pnode #0 and pnode #2,
>> +respectively.
>> +
>> +Now, the user wants to create a new guest, but the system is under high memory
>> +pressure, so he decides to try ballooning _G_ down. He sees that pnode #2 has
>> +the best chances to accommodate all the memory for the new guest, which would
>> +be really good for performance, if only he can make space there. _G_ is the
>> +only domain eating some memory from pnode #2, but, as said above, not all of
>> +its memory comes from there.
>It is not clear to me that this is the optimal decision.  What
>tools/information will be available that the user can use to make
>sensible decisions here?  e.g., is the current layout available to tools?
Well, the whole "free pages from pnode #2" thing is more a tool than a decision.
It's a tool that will make it easier to enact decisions made at some upper
level (i.e., the admin, or the toolstack). Information on how much memory each
guest occupies on each node is definitely something we should have
in place (even independently from this feature/series, I think). It's already
available via a Xen debug key, so it's just a matter of wiring it up (I think).
I'll give it a try as soon as I'm back at work.

>Remember that the "user" in this example is most often some automated
>process and not a human.
Exactly. :-D
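The arithmetic behind the quoted example can be sketched as follows. It assumes each vnode is backed by exactly one pnode and uses a made-up figure of 1000 pages; the helper names are hypothetical. It shows why NUMA-oblivious ballooning is wasteful here: only about half of what _G_ gives up lands on pnode #2.

```python
from typing import Dict


def freed_per_pnode(vnode_to_pnode: Dict[int, int],
                    released: Dict[int, int]) -> Dict[int, int]:
    """Sum the pages released by each vnode into the pnode backing it.

    Simplification: each vnode is backed by exactly one pnode.
    """
    freed: Dict[int, int] = {}
    for vnode, pages in released.items():
        pnode = vnode_to_pnode[vnode]
        freed[pnode] = freed.get(pnode, 0) + pages
    return freed


def even_split(total: int, nr_vnodes: int) -> Dict[int, int]:
    """Spread `total` pages as evenly as possible across the vnodes."""
    base, extra = divmod(total, nr_vnodes)
    return {v: base + (1 if v < extra else 0) for v in range(nr_vnodes)}


G = {0: 0, 1: 2}   # vnode -> pnode, as in the example
want = 1000        # pages the new guest needs on pnode #2 (made-up figure)

# NUMA-oblivious ballooning of `want` pages: only half land on pnode #2.
oblivious = freed_per_pnode(G, even_split(want, 2))
# Ballooning only from vnode #1: all of it lands on pnode #2.
targeted = freed_per_pnode(G, {1: want})
```

Put differently, without node-aware ballooning _G_ has to give up twice the needed amount to free `want` pages on pnode #2.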

>I would like to see some real world examples of cases where this is
>In general, I'm not keen on adding ABIs or interfaces that don't solve
>real world problems, particularly if they're easy to misuse and end up
>with something that is very suboptimal.
I see what you mean, and I certainly don't disagree. It's a bit of a
chicken-and-egg problem, since I can't find real examples of something that
does not exist yet, but yes, I think we can investigate a bit more whether
or not something like this would be useful.

The reason I think it would be useful is that we have an automatic initial
placement algorithm for VMs that tries to find the smallest set of nodes to
place a VM on every time we create one, and I think it would be nice to give
the admin (or some advanced toolstack) all the tools to maximize the
chances of that algorithm finding a suitable, well-performing
solution... Right now the only such tool is "kill or migrate some VM
somewhere else", which is not much... :-P
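A simplified sketch of the idea behind such a placement search: try node subsets of increasing size and take the first size at which some subset has enough free memory, preferring the subset with the most free memory. This is an illustration only. It is not libxl's actual placement code, which also weighs vCPUs and other factors; the function name and the MiB figures are made up.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple


def smallest_fitting_nodes(free_mem: Dict[int, int],
                           need: int) -> Optional[Tuple[int, ...]]:
    """Return the smallest set of nodes whose combined free memory >= need.

    Among candidates of the same size, prefer the one with the most free
    memory. Return None if even all nodes together cannot fit the VM.
    """
    nodes = sorted(free_mem)
    for k in range(1, len(nodes) + 1):
        fitting = [c for c in combinations(nodes, k)
                   if sum(free_mem[n] for n in c) >= need]
        if fitting:
            return max(fitting, key=lambda c: sum(free_mem[n] for n in c))
    return None


# Free memory per pnode, in MiB (made-up figures).
free = {0: 1024, 1: 512, 2: 1536, 3: 256}
```

With these numbers a 2048 MiB VM needs two nodes, (0, 2). If ballooning could free 512 MiB more on pnode #2 specifically, the same VM would fit on pnode #2 alone. That is the kind of outcome the proposed interface is meant to enable.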

>If we decide we do need such control, I think the xenstore interface
>should look more like:
>  as before
>  target for virtual node 0
>  target for virtual node N
>I think this better reflects the goal which is an adjusted NUMA layout
>for the guest rather than the steps required to reach it (release P
>pages from node N).
Oh, cool, I really like this. Yechen, what do you think?

>The balloon driver attempts to reach target, whilst simultaneously trying
>to reach the individual node targets.  It should prefer to balloon
>up/down on the node that is furthest from its node target.
And this is an interesting idea too.
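David's selection rule can be sketched roughly as below. When ballooning down, release from the vnode furthest above its node target; when ballooning up, populate the vnode furthest below it. This is an assumption-laden illustration: targets are in pages, the driver moves one page per step for clarity (a real driver would work in batches), and all function names are hypothetical.

```python
from typing import Dict, Optional


def pick_node(current: Dict[int, int], node_target: Dict[int, int],
              up: bool) -> int:
    """Return the vnode furthest from its target in the direction we move:
    most below target when ballooning up, most above when ballooning down."""
    def gap(n: int) -> int:
        return node_target[n] - current[n] if up else current[n] - node_target[n]
    return max(current, key=gap)


def balloon_step(current: Dict[int, int], node_target: Dict[int, int],
                 target: int) -> Optional[int]:
    """Move one page towards the overall `target`; return the vnode used,
    or None once the overall target has been reached."""
    total = sum(current.values())
    if total == target:
        return None
    up = total < target
    n = pick_node(current, node_target, up)
    current[n] += 1 if up else -1
    return n


def balloon_to(current: Dict[int, int], node_target: Dict[int, int],
               target: int) -> Dict[int, int]:
    """Repeat balloon_step until the overall target is met (mutates `current`)."""
    while balloon_step(current, node_target, target) is not None:
        pass
    return current
```

Note that the overall target always wins. If the node targets disagree with it, the loop still stops at `target` and ends up spreading the difference across nodes. For example, `balloon_to({0: 600, 1: 600}, {0: 600, 1: 600}, 1000)` ends at 500 pages per node.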

>In cases where target and the sum of target-by-nid/N don't agree (or are
>not present) the balloon driver should use target, and balloon up/down
>evenly across all NUMA nodes.
And that would be fine too... As I said in another e-mail, what I propose to
Yechen is to start by dealing with this latter case, i.e., get rid of the new
(or pretend they're not there) and implement the even distribution of
ballooned pages across virtual NUMA nodes.
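The fallback rule quoted above can be sketched as follows. Per-node targets are honoured only if one is present for every vnode and they add up to the overall target; otherwise the overall target is split evenly. The key names `target` and `target-by-nid/N` come from David's proposal. How they would be read from xenstore is deliberately left out, and the function name and page units are assumptions.

```python
from typing import Dict, Optional


def node_targets(target: int, nr_vnodes: int,
                 by_nid: Optional[Dict[int, int]] = None) -> Dict[int, int]:
    """Compute per-vnode targets (in pages) for the balloon driver.

    Use the per-node targets (hypothetically from target-by-nid/N) only if
    one is given for every vnode and they add up to the overall target;
    otherwise fall back to an even split of `target`.
    """
    if (by_nid is not None
            and set(by_nid) == set(range(nr_vnodes))
            and sum(by_nid.values()) == target):
        return dict(by_nid)
    # Missing or inconsistent per-node targets: spread `target` evenly,
    # giving the first `extra` vnodes one page more than the others.
    base, extra = divmod(target, nr_vnodes)
    return {v: base + (1 if v < extra else 0) for v in range(nr_vnodes)}
```

Starting with only the `else` branch (even split) is the first step proposed here. The per-node branch can be added later without changing how the driver consumes the result.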

After that, we will move to a more advanced interface, if we deem it

>Finally a style comment, please avoid the use of single gender-specific
>pronouns in documentation/comments (i.e., don't always use
>he/his etc.).  I prefer to use a singular "they" but you could consider
>"he or she" or using "he" for some examples and "she" in others.
Good point. Personally, I think I prefer the "they" form, but I'm fine with

Thanks and Regards,

Xen-devel mailing list


