
Re: [Xen-devel] [RFC v2][PATCH 1/3] docs: design and intended usage for NUMA-aware ballooning



Hi Jan,

On Fri, Aug 16, 2013 at 5:09 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
This looks conceptually wrong: The balloon driver should have no
need to know about pNID-s; it should be the tool stack doing the
translation prior to writing the xenstore node.

Further, the new xenstore node would presumably better be a mask
than a single vNID, since in order to e.g. balloon up another guest
already spanning multiple nodes, giving the tool stack a way to ask
for memory on any of the spanned nodes.

Yeah, you are right. I also keep telling myself that it's not a good idea to
let the guest OS know the physical node IDs.
The two translations between p-nid and v-nid could be put either
inside Xen or inside the balloon driver, which is where they currently are.
Anyway, the interfaces for the guest NUMA topology have not been
implemented yet. I'll mark this as a to-do item and move it into Xen
in the future.

And finally, coming back to what Tim had already pointed out - doing
things the way you propose can cause an imbalance in the
ballooned down guest, penalizing it in favor of not penalizing the
intended consumer of the recovered memory. Therefore I wonder
whether, without any new xenstore node, it wouldn't be better to
simply require conforming balloon drivers to balloon out memory
evenly across the domain's virtual nodes.

I'm sorry, but I don't quite understand the "whether" part.
The "new xenstore node" just stores the request from the user, so that
the balloon driver can read it. It's similar to ~/memory/target. This new node
could store either a p-nodeid or a v-nodeid, depending on whether the
translation we talked about above is placed inside Xen or inside the guest OS.
Do you have a better way to pass this request to the balloon driver than
creating a new xenstore node? I'd be very happy if you have one, since
I don't like the way I have done it (creating a new node) either!
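
To make the two options in this part of the thread concrete, here is a minimal
sketch in plain C. It is not code from the patch: the vnode_to_pnode table, the
function names and MAX_VNODES are all hypothetical, since the guest NUMA
topology interfaces do not exist yet. The first function shows the p-nid to
v-nid translation producing a mask of vnodes (as Jan suggests) rather than a
single vNID; the second shows the alternative of ballooning out evenly across
the domain's virtual nodes, which needs no new xenstore node at all.

```c
#include <assert.h>

#define MAX_VNODES 8  /* hypothetical limit, for illustration only */

/*
 * Hypothetical translation: vnode_to_pnode[v] is the physical node that
 * backs virtual node v. Returns a bitmask with bit v set for every vnode
 * backed by physical node pnid (0 if the domain has no memory there).
 */
static unsigned long pnid_to_vnode_mask(const int *vnode_to_pnode,
                                        int nr_vnodes, int pnid)
{
    unsigned long mask = 0;

    assert(nr_vnodes > 0 && nr_vnodes <= MAX_VNODES);
    for (int v = 0; v < nr_vnodes; v++)
        if (vnode_to_pnode[v] == pnid)
            mask |= 1UL << v;
    return mask;
}

/*
 * Alternative without any new xenstore node: spread nr_pages evenly over
 * all vnodes. Any remainder goes to the lowest-numbered vnodes, one page
 * each, so no two vnodes differ by more than one page.
 */
static void split_evenly(unsigned long nr_pages, int nr_vnodes,
                         unsigned long *per_vnode)
{
    assert(nr_vnodes > 0 && nr_vnodes <= MAX_VNODES);

    unsigned long base = nr_pages / (unsigned long)nr_vnodes;
    unsigned long rem = nr_pages % (unsigned long)nr_vnodes;

    for (int v = 0; v < nr_vnodes; v++)
        per_vnode[v] = base + ((unsigned long)v < rem ? 1 : 0);
}
```

Whether the first function would live in the tool stack, in Xen or in the guest
is exactly the open question above; the sketch only shows what is computed, not
where.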

> +The biggest difference between current and NUMA-aware ballooning is that the
> +latter needs to keep multiple lists of the ballooned pages in an array, with
> +one element for each virtual node. This way, it is always evident, at any
> +given time, what ballooned pages belong to what vnode.

That's wrong afaict: ballooned out pages aren't associated with any
memory, and hence can't be associated with any vNID. Once they
get re-populated, which vNID the memory belongs to is an attribute
of the memory coming in, not the control structure that it's to be
associated with.

I believe this thinking of yours stems from the fact that in Linux the
page control structures are associated with nodes by way of the
physical memory map being split into larger pieces, each coming from
a particular node. But other OSes don't need to follow this model,
and what you propose would also exclude extending the spanned
nodes set if memory gets ballooned in that's not associated with
any node the domain so far was "knowing" of.

You are exactly right again: this design is only for the Linux balloon driver.
On Linux, the balloon driver can choose which pages to balloon in/out, so we
can associate the pages with a v-nodeid.
For other kinds of architectures, please forgive me, I haven't thought
that far yet...
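
For the Linux case only, the per-vnode bookkeeping that the patch text
describes could look roughly like the sketch below. It is a userspace model
with hypothetical names (struct page_stub, numa_balloon_push/pop, NR_VNODES),
not the actual driver code, and it bakes in the very assumption Jan objects to:
that a ballooned-out page can still be tied to a vnode.

```c
#include <assert.h>
#include <stddef.h>

#define NR_VNODES 4  /* hypothetical number of virtual nodes */

/* Stand-in for a page control structure (struct page in Linux). */
struct page_stub {
    unsigned long pfn;
    struct page_stub *next;
};

/* One singly linked list of ballooned-out pages per virtual node. */
struct numa_balloon {
    struct page_stub *pages[NR_VNODES];
    unsigned long count[NR_VNODES];
};

/* Record that pg was ballooned out of virtual node vnid. */
static void numa_balloon_push(struct numa_balloon *b, int vnid,
                              struct page_stub *pg)
{
    assert(vnid >= 0 && vnid < NR_VNODES);
    pg->next = b->pages[vnid];
    b->pages[vnid] = pg;
    b->count[vnid]++;
}

/* Take back one page belonging to vnid, or NULL if that list is empty. */
static struct page_stub *numa_balloon_pop(struct numa_balloon *b, int vnid)
{
    assert(vnid >= 0 && vnid < NR_VNODES);

    struct page_stub *pg = b->pages[vnid];

    if (pg == NULL)
        return NULL;
    b->pages[vnid] = pg->next;
    pg->next = NULL;
    b->count[vnid]--;
    return pg;
}
```

An OS whose page control structures are not split per node would have nothing
meaningful to pass as vnid here, which is the limitation acknowledged above.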

> +Regarding the stealing a page from the OS part, it is enough to use the Linux
> +function alloc_page_node(), in place of alloc\_page().

Such statement seems to confirm that you're thinking Linux centric
instead of defining a generic model.

Jan

Yes.

And thank you again for spending your valuable time reviewing my patch!
I hope my answers address your questions. If not, please point it out
to me!


On Fri, Aug 16, 2013 at 5:09 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>> On 16.08.13 at 06:13, Yechen Li <lccycc123@xxxxxxxxx> wrote:
> +So, in NUMA aware ballooning, ballooning down and up works as follows:
> +
> +* target < current usage -- first of all, the ballooning driver uses the
> +  PNODE\_TO\_VNODE() service (provided by the virtual topology implementation,
> +  as explained above) to translate _pnid_ (that it reads from xenstore) to
> +  the id(s) of the corresponding set of vnode IDs, say _{vnids}_ (which will

This looks conceptually wrong: The balloon driver should have no
need to know about pNID-s; it should be the tool stack doing the
translation prior to writing the xenstore node.

Further, the new xenstore node would presumably better be a mask
than a single vNID, since in order to e.g. balloon up another guest
already spanning multiple nodes, giving the tool stack a way to ask
for memory on any of the spanned nodes.

And finally, coming back to what Tim had already pointed out - doing
things the way you propose can cause an imbalance in the
ballooned down guest, penalizing it in favor of not penalizing the
intended consumer of the recovered memory. Therefore I wonder
whether, without any new xenstore node, it wouldn't be better to
simply require conforming balloon drivers to balloon out memory
evenly across the domain's virtual nodes.

> +The biggest difference between current and NUMA-aware ballooning is that the
> +latter needs to keep multiple lists of the ballooned pages in an array, with
> +one element for each virtual node. This way, it is always evident, at any
> +given time, what ballooned pages belong to what vnode.

That's wrong afaict: ballooned out pages aren't associated with any
memory, and hence can't be associated with any vNID. Once they
get re-populated, which vNID the memory belongs to is an attribute
of the memory coming in, not the control structure that it's to be
associated with.

I believe this thinking of yours stems from the fact that in Linux the
page control structures are associated with nodes by way of the
physical memory map being split into larger pieces, each coming from
a particular node. But other OSes don't need to follow this model,
and what you propose would also exclude extending the spanned
nodes set if memory gets ballooned in that's not associated with
any node the domain so far was "knowing" of.

> +Regarding the stealing a page from the OS part, it is enough to use the Linux
> +function alloc_page_node(), in place of alloc\_page().

Such statement seems to confirm that you're thinking Linux centric
instead of defining a generic model.

Jan




--
Yechen Li

Team of System Virtualization and Cloud Computing 
School of Electronic Engineering  and Computer Science

Peking University, China

Nothing is impossible because impossible itself  says: " I'm possible "
lccycc From PKU

 

