
Re: [Xen-devel] QEMU bumping memory bug analysis



On 06/08/2015 02:22 PM, Stefano Stabellini wrote:
>>      3. A group of entities which operate in isolation by only ever
>>         increasing or decreasing the max pages according to their own
>>         requirements, without reference to anyone else. When QEMU
>>         entered the fray, and with the various libxl fixes since, you
>>         might think we are implementing this model, but we aren't
>>         because the hypervisor interface is only "set", not
>>         "increment/decrement" and so there is a racy read/modify/write
>>         cycle in every entity now.
> 
> I don't think this is true: QEMU only sets maxmem at domain creation,
> before "xl create" even returns. I think that is safe. We had an email
> exchange at the time, I explained this behaviour and the general opinion
> was that it is acceptable. I don't understand why it is not anymore.

Well for one, nobody on the hypervisor side seems to have been brought
in -- I definitely would have objected, and it sounds like AndyC would
have objected too.

I think we need to go back one level further.

So first, let's make a distinction between *pages*, which are actual
host RAM assigned to the guest and put in its p2m table, and *memory*,
which is a virtualization construct (i.e., virtual RAM and video memory
for virtual graphics cards).

The hypervisor only cares about *pages*.  It allocates pages to a
domain, it puts them in the p2m.  That's all it knows.

The purpose of max_pages in the hypervisor is to make sure that no guest
can allocate more host memory (pages) than it is allowed to have.

How many pages is a particular guest allowed to have?

Well pages are used for a number of purposes:
* To implement virtual RAM in the guest
* To implement video ram for virtual devices in qemu
* To implement virtual ROMs
* For magic "shared pages" used behind-the-scenes (not visible to the
guest)

(Feel free to add anything I missed.)

max_pages in the hypervisor must be set to the sum of all the pages the
domain is allowed to have.

So the first point is this: Xen doesn't have a clue about any of those.
 It doesn't know how much virtual RAM a guest has, how much video RAM it
has, how many virtual ROMs, how many magic shared pages, or anything.
All it knows is which pages are in the p2m table.

So although Xen certainly *enforces* max_pages, it is (at the moment) in
no position to *decide* what max_pages should be.

At the moment, in fact, nobody is.  There is no single place that has a
clear picture of how virtual RAM, guest devices, guest ROMs, and
"magic pages" convert into an actual number of pages.  I think that's a bug.

And at the moment, pages in the p2m are allocated by a number of entities:
* In the libxc domain builder.
* In the guest balloon driver
* And now, in qemu, to allocate extra memory for virtual ROMs.

Did I miss anything?

For the first two, it's libxl that sets maxmem, based on its calculation
of the size of virtual RAM plus various other bits that will be needed.
 Having qemu *also* set maxmem was always the wrong thing to do, IMHO.
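
To illustrate why two independent setters are a problem when the
interface is "set" only, here is a minimal sketch of the lost update.
The Hypervisor class and its method names are hypothetical stand-ins,
not real Xen or libxc calls, and the page counts are made up.

```python
class Hypervisor:
    """Toy stand-in for a set-only max_pages interface (hypothetical)."""

    def __init__(self, max_pages):
        self._max_pages = max_pages

    def get_max_pages(self):
        return self._max_pages

    def set_max_pages(self, value):
        self._max_pages = value


def interleaved_bump(hv, toolstack_delta, qemu_delta):
    """Both entities read the limit before either one writes it back."""
    seen_by_toolstack = hv.get_max_pages()
    seen_by_qemu = hv.get_max_pages()
    hv.set_max_pages(seen_by_toolstack + toolstack_delta)
    # This write is based on a stale read, so it discards the bump above.
    hv.set_max_pages(seen_by_qemu + qemu_delta)
    return hv.get_max_pages()


hv = Hypervisor(max_pages=1000)
result = interleaved_bump(hv, toolstack_delta=256, qemu_delta=16)
expected = 1000 + 256 + 16
lost_pages = expected - result
print(result, expected, lost_pages)
```

With a single owner of maxmem there is only one writer, so this
interleaving cannot happen.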

In theory, from the interface perspective, what libxl promises to
provide is virtual RAM.  When you say "memory=8192" in a domain config,
that means (or should mean) 8192MiB of virtual RAM, exclusive of video
RAM, virtual ROMs, and magic pages.  Then when you say "xl mem-set
4096", it should again be aiming at giving the VM the equivalent of
4096MiB of virtual RAM, exclusive of video RAM, &c &c.

We already have the problem that the balloon driver at the moment
doesn't actually know how big the guest RAM is, but is being told
to make a balloon exactly big enough to bring the total RAM down to a
specific target.
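
As a rough sketch of the arithmetic involved, assuming 4KiB pages and
assuming that the size of virtual RAM is known to whoever does the
calculation, the balloon size for the "memory=8192" then "xl mem-set
4096" case above would be worked out as follows.  The function names are
hypothetical.

```python
PAGE_SIZE_KIB = 4  # assumption: 4KiB pages


def pages(mib):
    """Convert a size in MiB to a number of pages."""
    return mib * 1024 // PAGE_SIZE_KIB


def balloon_pages(virtual_ram_mib, target_mib):
    """Pages the balloon must hold to bring virtual RAM down to the target.

    Video RAM, virtual ROMs and magic pages are deliberately left out:
    the target is expressed purely in terms of virtual RAM.
    """
    if target_mib > virtual_ram_mib:
        raise ValueError("target exceeds the size of virtual RAM")
    return pages(virtual_ram_mib) - pages(target_mib)


balloon = balloon_pages(virtual_ram_mib=8192, target_mib=4096)
print(balloon)
```

The point is that the subtraction needs the size of virtual RAM as an
input, which the balloon driver does not reliably have today.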

I think we do need to have some place in the middle that actually knows
how much memory is actually needed for the different sub-systems, so it
can calculate and set maxmem appropriately.  libxl is the obvious place.

What about this:
* Libxl has a maximum amount of RAM that qemu is *allowed* to use to set
up virtual ROMs, video ram for virtual devices, &c
* At start-of-day, it sets maxpages to PAGES(virtual RAM)+PAGES(magic) +
max_qemu_pages
* Qemu allocates as many pages as it needs for option ROMs, and writes
the amount that it actually did use into a special node in xenstore.
* When the domain is unpaused, libxl will set maxpages to PAGES(virtual
RAM) + PAGES(magic) + actual_qemu_pages that it gets from xenstore.
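
The steps above can be sketched roughly as follows.  This is only an
illustration under stated assumptions: 4KiB pages, a plain dict standing
in for xenstore, a made-up node name, and made-up page counts.  None of
the names are existing libxl, qemu or xenstore interfaces.

```python
PAGE_SIZE_KIB = 4  # assumption: 4KiB pages
QEMU_PAGES_NODE = "hypothetical/qemu-pages"  # made-up xenstore node name


def pages(mib):
    """Convert a size in MiB to a number of pages."""
    return mib * 1024 // PAGE_SIZE_KIB


def initial_max_pages(virtual_ram_mib, magic_pages, max_qemu_pages):
    """Start-of-day limit: leave room for everything qemu is allowed to use."""
    return pages(virtual_ram_mib) + magic_pages + max_qemu_pages


def final_max_pages(virtual_ram_mib, magic_pages, max_qemu_pages, xenstore):
    """Limit at unpause: shrink to what qemu reported it actually used."""
    actual_qemu_pages = int(xenstore[QEMU_PAGES_NODE])
    if actual_qemu_pages > max_qemu_pages:
        raise ValueError("qemu reported more pages than it was allowed")
    return pages(virtual_ram_mib) + magic_pages + actual_qemu_pages


xenstore = {}  # stand-in for the real xenstore
at_start = initial_max_pages(8192, magic_pages=8, max_qemu_pages=4096)
xenstore[QEMU_PAGES_NODE] = "1024"  # qemu writes what it actually used
at_unpause = final_max_pages(8192, magic_pages=8, max_qemu_pages=4096,
                             xenstore=xenstore)
print(at_start, at_unpause)
```

In this sketch only the toolstack side ever computes the limit; qemu
just reports a number.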

I think also that probably libxl, rather than setting a target amount of
memory the balloon driver is supposed to aim at, should set the target
size of the balloon.  Once qemu tells it how many pages are actually
being used for virtual devices,

We could, in theory, expose all this information in xenstore such that
*either* libxl or qemu would be able to calculate max_pages based on the
numbers that were written there.  And that would work if we could
enforce a lock-step between the toolstack and qemu, as we can between
Xen and the toolstack.  But I think setting anything like this in stone
is a really bad idea; which unfortunately excludes the idea of putting
it in qemu.

 -George

