Re: [Xen-devel] QEMU bumping memory bug analysis
On 06/08/2015 02:22 PM, Stefano Stabellini wrote:
>> 3. A group of entities which operate in isolation by only ever
>> increasing or decreasing the max pages according to their own
>> requirements, without reference to anyone else. When QEMU
>> entered the fray, and with the various libxl fixes since, you
>> might think we are implementing this model, but we aren't
>> because the hypervisor interface is only "set", not
>> "increment/decrement" and so there is a racy read/modify/write
>> cycle in every entity now.
>
> I don't think this is true: QEMU only sets maxmem at domain creation,
> before "xl create" even returns. I think that is safe. We had an email
> exchange at the time, I explained this behaviour and the general opinion
> was that it is acceptable. I don't understand why it is not anymore.

Well for one, nobody on the hypervisor side seems to have been brought in -- I definitely would have objected, and it sounds like AndyC would have objected too.

I think we need to go back one level further.

So first, let's make a distinction between *pages*, which are actual host RAM assigned to the guest and put in its p2m table, and *memory*, which is a virtualization construct (i.e., virtual RAM and video memory for virtual graphics cards).

The hypervisor only cares about *pages*. It allocates pages to a domain, it puts them in the p2m. That's all it knows.

The purpose of max_pages in the hypervisor is to make sure that no guest can allocate more host memory (pages) than it is allowed to have.

How many pages is a particular guest allowed to have? Well, pages are used for a number of purposes:

* To implement virtual RAM in the guest
* To implement video RAM for virtual devices in qemu
* To implement virtual ROMs
* For magic "shared pages" used behind the scenes (not visible to the guest)

(Feel free to add anything I missed.)

max_pages in the hypervisor must be set to the sum of all the pages the domain is allowed to have.
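To make that sum concrete, here is a minimal sketch in Python. It is illustration only: the class, the field names and the 4 KiB page size are assumptions made for the example, not anything that exists in Xen or libxl.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption for the example: 4 KiB pages


def pages(nbytes: int) -> int:
    """Round a byte count up to a whole number of pages."""
    return (nbytes + PAGE_SIZE - 1) // PAGE_SIZE


@dataclass
class PageBudget:
    """Hypothetical per-domain page budget, one field per use listed above."""
    virtual_ram: int    # pages backing guest-visible RAM
    video_ram: int      # pages backing video RAM for virtual devices
    virtual_roms: int   # pages backing virtual ROMs
    magic: int          # "shared pages" not visible to the guest

    def max_pages(self) -> int:
        # max_pages is the sum of everything the domain may hold.
        return (self.virtual_ram + self.video_ram
                + self.virtual_roms + self.magic)


budget = PageBudget(
    virtual_ram=pages(8192 << 20),   # 8192 MiB of virtual RAM
    video_ram=pages(16 << 20),       # 16 MiB of video RAM (made-up figure)
    virtual_roms=pages(256 << 10),   # 256 KiB of ROMs (made-up figure)
    magic=8,                         # made-up figure
)
print(budget.max_pages())
```

The figures other than the 8192 MiB of RAM are invented; the point is only that somebody has to know every term in the sum.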
So the first point is this: Xen doesn't have a clue about any of those. It doesn't know how much virtual RAM a guest has, how much video RAM it has, how many virtual ROMs, how many magic shared pages, or anything. All it knows is which pages are in the p2m table.

So although Xen certainly *enforces* max_pages, it is (at the moment) in no position to *decide* what max_pages should be.

At the moment, in fact, nobody is. There is no single place that has a clear picture of how virtual RAM, guest devices, guest ROMs, and "magic pages" convert into an actual number of pages. I think that's a bug.

And at the moment, pages in the p2m are allocated by a number of entities:

* In the libxc domain builder
* In the guest balloon driver
* And now, in qemu, to allocate extra memory for virtual ROMs

Did I miss anything?

For the first two, it's libxl that sets maxmem, based on its calculation of the size of virtual RAM plus various other bits that will be needed. Having qemu *also* set maxmem was always the wrong thing to do, IMHO.

In theory, from the interface perspective, what libxl promises to provide is virtual RAM. When you say "memory=8192" in a domain config, that means (or should mean) 8192MiB of virtual RAM, exclusive of video RAM, virtual ROMs, and magic pages. Then when you say "xl mem-set 4096", it should again be aiming at giving the VM the equivalent of 4096MiB of virtual RAM, exclusive of video RAM, &c &c.

We already have the problem that the balloon driver at the moment doesn't actually know how big the guest RAM is, but is being told to make a balloon exactly big enough to bring the total RAM down to a specific target.

I think we do need to have some place in the middle that actually knows how much memory is actually needed for the different sub-systems, so it can calculate and set maxmem appropriately. libxl is the obvious place.
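The objection to having more than one entity set maxmem comes down to the set-only interface mentioned in the quoted text. A toy model of the lost update, in Python and purely illustrative (the class and function names are made up, and nothing here talks to a real hypervisor):

```python
class ToyHypervisor:
    """Stand-in for a set-only max_pages interface (hypothetical)."""

    def __init__(self, max_pages: int) -> None:
        self._max_pages = max_pages

    def get_max_pages(self) -> int:
        return self._max_pages

    def set_max_pages(self, value: int) -> None:
        # There is no increment/decrement: callers can only overwrite.
        self._max_pages = value


def interleaved_bump(hyp: ToyHypervisor, a_extra: int, b_extra: int) -> int:
    """Two entities each try to raise the limit by their own amount.

    Both read before either writes, which is the interleaving that a
    read/modify/write cycle permits when nothing serialises the callers.
    """
    a_seen = hyp.get_max_pages()          # entity A reads
    b_seen = hyp.get_max_pages()          # entity B reads the same value
    hyp.set_max_pages(a_seen + a_extra)   # A writes its result
    hyp.set_max_pages(b_seen + b_extra)   # B overwrites A's result
    return hyp.get_max_pages()


hyp = ToyHypervisor(1000)
print(interleaved_bump(hyp, 64, 32))  # A's 64 pages are lost
```

With a single owner of the value (libxl, as argued above) the interleaving cannot happen, whatever the interface looks like.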
What about this:

* Libxl has a maximum amount of RAM that qemu is *allowed* to use to set up virtual ROMs, video RAM for virtual devices, &c

* At start-of-day, it sets maxpages to PAGES(virtual RAM) + PAGES(magic) + max_qemu_pages

* Qemu allocates as many pages as it needs for option ROMs, and writes the amount that it actually did use into a special node in xenstore.

* When the domain is unpaused, libxl will set maxpages to PAGES(virtual RAM) + PAGES(magic) + actual_qemu_pages that it gets from xenstore.

I think also that probably libxl, rather than setting a target amount of memory the balloon driver is supposed to aim at, should set the target size of the balloon. Once qemu tells it how many pages are actually being used for virtual devices,

We could, in theory, expose all this information in xenstore such that *either* libxl or qemu would be able to calculate max_pages based on the numbers that were written there. And that would work if we could enforce a lock-step between the toolstack and qemu, as we can between Xen and the toolstack. But I think setting anything like this in stone is a really bad idea; which unfortunately excludes the idea of putting it in qemu.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
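For reference, the two-step maxpages proposal in the message above could be sketched roughly as follows. This is Python for illustration only: xenstore is modelled as a plain dict, the key name is invented, and the function names are hypothetical rather than libxl API.

```python
# Hypothetical key; the real xenstore node is not named in the proposal.
QEMU_PAGES_KEY = "hypothetical/qemu-pages-used"


def start_of_day_max_pages(ram_pages: int, magic_pages: int,
                           max_qemu_pages: int) -> int:
    """Limit set before qemu runs: leave room for qemu's full allowance."""
    return ram_pages + magic_pages + max_qemu_pages


def unpause_max_pages(ram_pages: int, magic_pages: int,
                      max_qemu_pages: int, store: dict) -> int:
    """Limit set at unpause: shrink to what qemu reported actually using."""
    actual_qemu_pages = int(store[QEMU_PAGES_KEY])
    if actual_qemu_pages > max_qemu_pages:
        # qemu should not have been able to exceed its allowance.
        raise ValueError("qemu reported more pages than it was allowed")
    return ram_pages + magic_pages + actual_qemu_pages


# Made-up numbers: 8192 MiB of RAM in 4 KiB pages, 8 magic pages,
# an allowance of 8192 pages for qemu, of which it used 4160.
store = {}
initial = start_of_day_max_pages(2097152, 8, 8192)
store[QEMU_PAGES_KEY] = "4160"   # written by qemu in the proposal
final = unpause_max_pages(2097152, 8, 8192, store)
print(initial, final)
```

Only one party ever writes the limit in this scheme; qemu merely reports a number.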