
Re: [Xen-devel] Claim mode and HVM PoD interact badly



On 01/10/2014 04:05 PM, Konrad Rzeszutek Wilk wrote:
On Fri, Jan 10, 2014 at 03:56:13PM +0000, Ian Campbell wrote:
On Fri, 2014-01-10 at 10:28 -0500, Konrad Rzeszutek Wilk wrote:
On Fri, Jan 10, 2014 at 03:16:25PM +0000, Ian Campbell wrote:
On Fri, 2014-01-10 at 09:58 -0500, Konrad Rzeszutek Wilk wrote:
Which implies to me that we _need_ the 'maxmem' amount of memory at boot time.
And then it is the responsibility of the balloon driver to give the memory
back (and this is where 'static-max' et al come into play to tell the
balloon driver to balloon out).
PoD exists purely so that we don't need the 'maxmem' amount of memory at
boot time. It is basically there in order to let the guest get booted
far enough to load the balloon driver to give the memory back.

It's basically a boot time zero-page sharing mechanism AIUI.
But it does look like it gulps up hypervisor memory and returns it during
allocation of memory for the guest.
It should be less than the maxmem-memory amount though. Perhaps because
Wei is using relatively small sizes the pod cache ends up being a
similar size to the saving?

Or maybe I just don't understand PoD, since the code you quote does seem
to contradict that.

Or maybe libxl's calculation of pod_target is wrong?

From reading the code the patch seems correct - we will _need_ that
extra 128MB 'claim' to allocate/free the 128MB of extra pages. They
are temporary, as we do free them.
It does make sense that the PoD cache should be included in the claim,
I just don't get why the cache is so big...
I think it expands and shrinks to make sure that the memory is present
in the hypervisor. If there is not enough memory it would return -ENOMEM and
the toolstack would know immediately.

But that seems silly - that memory might in the future be used
by other guests, and then you won't be able to use said cache. But since
it is a "cache" I guess that is OK.

Sorry, "cache" is a bit of a misnomer.  It really should be "pool".

The basic idea is to allocate memory to the guest *without assigning it to specific p2m entries*. Then the PoD mechanism will shuffle the memory around behind the p2m entries as needed until the balloon driver comes up.
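
To make that concrete, here is a minimal, self-contained sketch of the
demand-populate step. It is not the actual Xen implementation (the real
logic lives in xen/arch/x86/mm/p2m-pod.c and also handles zero-page
reclaim, superpages and locking); every type and helper name below is
invented for illustration.

    /*
     * Minimal sketch of "populate on demand" -- NOT the real Xen code.
     * All names here are invented.
     */
    struct page;                        /* stand-in for a machine frame       */

    struct pod_pool {
        struct page **pages;            /* memory allocated to the domain but
                                         * not yet behind any p2m entry       */
        unsigned long count;            /* pages currently in the pool        */
        unsigned long entry_count;      /* outstanding PoD entries in the p2m */
    };

    /* Invented stand-in for the p2m plumbing. */
    void map_gfn_to_page(unsigned long gfn, struct page *pg);

    /* Guest touched a gfn whose p2m entry is still a PoD placeholder. */
    static int pod_demand_populate(struct pod_pool *pool, unsigned long gfn)
    {
        if (pool->count == 0)
            return -1;                  /* the real code tries to reclaim
                                         * zeroed guest pages before failing  */

        struct page *pg = pool->pages[--pool->count];
        map_gfn_to_page(gfn, pg);       /* this page now backs the p2m entry  */
        pool->entry_count--;            /* one fewer PoD entry outstanding    */
        return 0;
    }

The point to take away is that populating an entry never allocates fresh
host memory; it just moves a page from the pool behind a p2m entry, so the
domain's total allocation stays constant while the guest boots.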

In the simple case, memory should only ever be allocated, not freed; for example:
* Admin sets target=1GiB, maxmem=2GiB (see the config sketch after this list)
* Domain creation:
 - makes 2GiB of p2m, filling it with PoD entries rather than memory
 - allocates 1GiB of RAM for the PoD "cache"
* PoD shuffles memory around to allow guest to boot
* Balloon driver comes up, and balloons down to target.
 - In theory, at this point #PoD entries in p2m == #pages in the "cache"
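
For reference, that scenario corresponds roughly to an xl guest config like
the one below (values in MiB; illustrative, not taken from this thread). For
an HVM guest with maxmem above memory, the whole 2GiB p2m starts out as PoD
entries, backed by a 1GiB pool, until the balloon driver reaches the target.

    memory = 1024    # boot target
    maxmem = 2048    # static maximum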

The basic complication is that there is no point at which we can be *certain* that all PoD entries have been eliminated. If the guest just doesn't touch some of its memory, there may be PoD entries outstanding (with corresponding memory in the "cache") indefinitely. Also, the admin may want to be able to change the target before the balloon driver hits it. So every time you change the target, you need to tell the PoD code that's what you're doing; it has carefully thought-out logic inside it to free or allocate more memory appropriately.
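
As a rough sketch of that "tell the PoD code about the new target" step
(the real entry point is p2m_pod_set_mem_target() in Xen, which handles more
cases than this; the code below reuses the invented pod_pool type from the
earlier sketch, and pool_grow/pool_shrink are likewise invented):

    /* Invented helpers: move pages between Xen's free memory and the pool. */
    void pool_grow(struct pod_pool *pool, unsigned long nr_pages);
    void pool_shrink(struct pod_pool *pool, unsigned long nr_pages);

    /* Called whenever the toolstack changes the domain's memory target. */
    static void pod_set_target(struct pod_pool *pool,
                               unsigned long populated_pages,
                               unsigned long target_pages)
    {
        /* The pool should hold whatever part of the target is not already
         * covered by populated pages...                                    */
        long want = (long)target_pages - (long)populated_pages;

        if (want < 0)
            want = 0;
        /* ...but never more pages than there are PoD entries left to back. */
        if ((unsigned long)want > pool->entry_count)
            want = (long)pool->entry_count;

        if ((unsigned long)want > pool->count)
            pool_grow(pool, (unsigned long)want - pool->count);   /* allocate from Xen */
        else
            pool_shrink(pool, pool->count - (unsigned long)want); /* free back to Xen  */
    }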

For example, in the example above, while the balloon driver has only ballooned down to 1.2 GiB, the admin may want to set the target to 1.5GiB. In that case, the PoD code would allocate an additional 0.2GiB (to cover the outstanding 0.2GiB of PoD entries in the p2m).

Anyway, if I understand correctly, the problem was that the memory allocated to the PoD "cache" wasn't being counted in the claim mode. That's the basic problem: memory in the PoD "cache" should be considered basically the same as memory in the p2m table for claim purposes.
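
In other words, the claim has to cover both kinds of allocation. As a purely
illustrative identity (the exact accounting in libxl/libxc differs in detail,
and the names here are invented):

    /* Host pages a PoD guest needs covered by its claim at build time. */
    unsigned long pages_to_claim(unsigned long populated_pages,
                                 unsigned long pod_pool_pages)
    {
        /* Pool pages are real host memory, exactly like pages already
         * behind p2m entries, so both count against the claim.          */
        return populated_pages + pod_pool_pages;
    }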

 -George



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

