
Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore



On 03/06/15 14:22, George Dunlap wrote:
> On Tue, Jun 2, 2015 at 3:05 PM, Wei Liu <wei.liu2@xxxxxxxxxx> wrote:
>> Previous discussion at [0].
>>
>> For the benefit of discussion, we refer to max_memkb inside the hypervisor
>> as hv_max_memkb (name subject to improvement). That's the maximum amount
>> of memory a domain can use.
> Why don't we try to use "memory" for virtual RAM that we report to the
> guest, and "pages" for what exists inside the hypervisor?  "Pages" is
> the term the hypervisor itself uses internally (i.e., set_max_mem()
> actually changes a domain's max_pages value).
>
> So in this case both guest memory and option roms are created using
> hypervisor pages.
>
>> Libxl doesn't know the hv_max_memkb a domain needs prior to QEMU start-up
>> because of option ROMs etc.
> So a translation of this using "memory/pages" terminology would be:
>
> QEMU may need extra pages from Xen to implement option ROMs, and so at
> the moment it calls set_max_mem() to increase max_pages so that it can
> allocate more pages to the guest.  libxl doesn't know what max_pages a
> domain needs prior to qemu start-up.
>
>> Libxl doesn't know the hv_max_memkb even after QEMU start-up, because
>> there is no mechanism to communicate between QEMU and libxl. This is an
>> area that needs improvement; we've encountered problems in this area
>> before.
> [translating]
> Libxl doesn't know max_pages even after qemu start-up, because there
> is no mechanism to communicate between qemu and libxl.
>
>> QEMU calls xc_domain_setmaxmem to increase hv_max_memkb by N pages. Those
>> pages are only accounted in hypervisor. During migration, libxl
>> (currently) doesn't extract that value from hypervisor.
> [translating]
> qemu calls xc_domain_setmaxmem to increase max_pages by N pages.
> Those pages are only accounted for in the hypervisor.  libxl
> (currently) does not extract that value from the hypervisor.
>
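
For illustration, the bump described above boils down to something like the
sketch below (a minimal sketch, not the literal QEMU code; the helper name,
the way the extra page count is obtained, and the error handling are made up):

    #include <stdint.h>
    #include <xenctrl.h>

    /* Sketch only: grow the domain's limit by "extra_pages" so that option
     * ROM pages can be allocated.  Assumes 4 KiB pages. */
    static int bump_max_pages(xc_interface *xch, uint32_t domid,
                              uint64_t extra_pages)
    {
        xc_dominfo_t info;

        /* Read the current limit (max_memkb) from the hypervisor. */
        if (xc_domain_getinfo(xch, domid, 1, &info) != 1 || info.domid != domid)
            return -1;

        /* Raise it by N pages.  Only the hypervisor accounts for this;
         * libxl is never told about the new value. */
        return xc_domain_setmaxmem(xch, domid, info.max_memkb + extra_pages * 4);
    }
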
>> So now the problem is on the remote end:
>>
>> 1. Libxl indicates domain needs X pages.
>> 2. Domain actually needs X + N pages.
>> 3. Remote end tries to write N more pages and fails.
>>
>> This behaviour currently doesn't affect normal migration (where you
>> transfer the libxl JSON to the remote end, construct a domain, then start
>> QEMU) because QEMU won't bump hv_max_memkb again. This is by design and
>> reflected in the QEMU code.
> I don't understand this paragraph -- does the remote domain actually
> need X+N pages or not?  If it does, in what way does this behavior
> "not affect normal migration"?
>
>> This behaviour affects COLO and becomes a bug in that case, because the
>> secondary VM's QEMU doesn't go through the same start-of-day
>> initialisation (Hongyang, correct me if I'm wrong), i.e. there is no
>> bumping of hv_max_memkb inside QEMU.
>>
>> Andrew plans to embed JSON inside migration v2 and COLO is based on
>> migration v2. The bug is fixed if JSON is correct in the first place.
>>
>> As COLO is not yet upstream, this bug is not a blocker for 4.6. But it
>> should be fixed for the benefit of COLO.
>>
>> So here is a proof-of-concept patch to record and honour that value
>> during migration.  A new field is added in the IDL. Note that we don't
>> provide an xl-level config option for it, and we mandate that it keep its
>> default value during domain creation. This is to prevent libxl users from
>> setting it directly, which could have unforeseen repercussions.
>>
>> This patch is compile-tested only. If we agree this is the way to go I
>> will test and submit a proper patch.
> Reading max_pages from Xen and setting it on the far side seems like a
> reasonable option.
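
Concretely, that suggestion amounts to roughly the following on top of libxc
(the helper names are illustrative, not the actual proof-of-concept patch):
the sender records the hypervisor's current limit, and the receiver applies
it before any pages are written.

    #include <stdint.h>
    #include <xenctrl.h>

    /* Sending side: capture the hypervisor's view of the limit, which may be
     * larger than what libxl believes because of QEMU's bump. */
    static int record_max_memkb(xc_interface *xch, uint32_t domid,
                                uint64_t *out_max_memkb)
    {
        xc_dominfo_t info;

        if (xc_domain_getinfo(xch, domid, 1, &info) != 1 || info.domid != domid)
            return -1;

        *out_max_memkb = info.max_memkb;
        return 0;
    }

    /* Receiving side: apply the recorded limit before restoring pages, so
     * the extra N pages no longer fail to be written. */
    static int apply_max_memkb(xc_interface *xch, uint32_t domid,
                               uint64_t max_memkb)
    {
        return xc_domain_setmaxmem(xch, domid, max_memkb);
    }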

It is the wrong level to fix the bug.  Yes - it will (and does) fix one
manifestation of the bug, but does not solve the problem.

>   Is there a reason we can't add a magic XC_SAVE_ID
> for v1, like we do for other parameters?
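
For context, the legacy (v1) stream carries such parameters as negative
"chunk" identifiers that the restore side switches on. Very roughly, a new
one would look like the sketch below; XC_SAVE_ID_MAX_PAGES and its value are
hypothetical, write_exact()/read_exact() are libxc-internal helpers, and the
framing is simplified compared to the real xc_domain_save()/xc_domain_restore()
code.

    #include "xc_private.h"

    /* Hypothetical chunk id -- not an existing XC_SAVE_ID_* value. */
    #define XC_SAVE_ID_MAX_PAGES (-20)

    /* Save side (simplified): emit the marker followed by the payload. */
    static int send_max_pages(int io_fd, uint64_t max_pages)
    {
        int id = XC_SAVE_ID_MAX_PAGES;

        if (write_exact(io_fd, &id, sizeof(id)) ||
            write_exact(io_fd, &max_pages, sizeof(max_pages)))
            return -1;
        return 0;
    }

    /* Restore side (simplified): called from the switch over negative ids. */
    static int handle_max_pages(xc_interface *xch, uint32_t dom, int io_fd)
    {
        uint64_t max_pages;

        if (read_exact(io_fd, &max_pages, sizeof(max_pages)))
            return -1;
        /* 4 KiB pages -> KiB for xc_domain_setmaxmem(). */
        return xc_domain_setmaxmem(xch, dom, max_pages * 4);
    }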

Amongst other things, playing with xc_domain_setmaxmem() is liable to
cause a PoD domain to be shot by Xen because the PoD cache was not
adjusted at the same time that maxmem was.
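
In other words, on a PoD domain the limit and the PoD target have to move
together. A sketch of that pairing, using the libxc calls involved and
assuming 4 KiB pages (roughly what a libxl-level adjustment takes care of,
and what a bare xc_domain_setmaxmem() caller skips):

    #include <stdint.h>
    #include <xenctrl.h>

    /* Sketch: grow a PoD domain's memory safely.  Raising maxmem alone
     * leaves the PoD cache/target behind, and the domain risks being shot
     * by Xen when it touches a page the PoD pool cannot back. */
    static int safe_grow(xc_interface *xch, uint32_t domid,
                         uint64_t new_max_kb, uint64_t new_target_pages)
    {
        int rc = xc_domain_setmaxmem(xch, domid, new_max_kb);

        if (rc)
            return rc;

        /* Keep the PoD target in step with the new limit. */
        return xc_domain_set_pod_target(xch, domid, new_target_pages,
                                        NULL, NULL, NULL);
    }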

Only libxl is in a position to safely adjust domain memory.
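
That is, external callers should go through the toolstack-level API rather
than poke libxc directly. For example (illustrative domid and size, ctx being
an initialised libxl_ctx):

    /* Ask libxl to enforce a 2 GiB target for domain 5; libxl keeps maxmem,
     * the PoD target and its own accounting consistent. */
    libxl_set_memory_target(ctx, 5, 2 * 1024 * 1024,
                            /* relative */ 0, /* enforce */ 1);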

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

