
Re: [Xen-devel] [PATCH 1/1] xc_domain_restore: Allow QEMU to increase memory





On 04/14/2015 05:28 PM, Andrew Cooper wrote:
On 14/04/15 10:22, Hongyang Yang wrote:


On 04/14/2015 04:53 PM, Andrew Cooper wrote:
On 14/04/15 00:51, Don Slutz wrote:
On 04/13/15 12:25, Andrew Cooper wrote:
On 13/04/15 17:09, Don Slutz wrote:
If QEMU has called on xc_domain_setmaxmem to add more memory for
option ROMs, domain restore needs to also increase the memory.

Signed-off-by: Don Slutz <dslutz@xxxxxxxxxxx>
hvmloader has no interaction with xc_domain_restore().
I did not mean to imply that it did.  Somehow I lost the fact that this
is a bug in master:

[root@dcs-xen-52 don]# xl cre -p /home/don/e1000x8.xfg
Parsing config from /home/don/e1000x8.xfg
got a tsc mode string: "native_paravirt"
[root@dcs-xen-52 don]# xl save e1000x8 e1000x8.save
Saving to e1000x8.save new xl format (info 0x1/0x0/3506)
xc: Saving memory: iter 0 (last sent 0 skipped 0): 1044481/1044481
100%
[root@dcs-xen-52 don]# xl restore e1000x8.save
Loading new save file e1000x8.save (new xl fmt info 0x1/0x0/3506)
   Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: error: Failed to allocate memory for batch.!: Internal error
libxl: error: libxl_create.c:1057:libxl__xc_domain_restore_done:
restoring domain: Cannot allocate memory
libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot
(re-)build domain: -3
libxl: error: libxl.c:1576:libxl__destroy_domid: non-existant domain 2
libxl: error: libxl.c:1540:domain_destroy_callback: unable to destroy
guest with domid 2
libxl: error: libxl_create.c:1490:domcreate_destruction_cb: unable to
destroy domain 2 following failed creation
[root@dcs-xen-52 don]#

The hvmloader part turns out to be a "warning message" that is ok to
ignore.  However "xl save" followed by "xl restore" is currently broken
without some fix.

It is xl's job to propagate the current memory as part of
migration.  If
qemu has had to bump it up, this should be reflected in the domain
config file.
I am not at all sure how QEMU (either in dom0 or a driver domain) is
expected to change a file (the domain config file).  My guess is that
you are saying that "xl save" (before xc_domain_save) is the correct
place to record (or add) the extra info.  However, it looks to me that
all the data needed is already in the save file, but I could be wrong.

The page data is correctly saved into the save file (migration stream).
However, when the new domain is created, its size is "wrong".  You
cannot adjust any of the config info to fix the size, because the option
ROM space to reserve is not an existing config option.  So if I am
following you correctly, libxl needs to add and process more info to
handle this case.

The migration path should not be hacked up to fix a higher level
toolstack bug.
I do not see how QEMU is part of the "higher level toolstack".  If you
mean xl vs xc, then I can see "xl save" somehow doing the work.  This
patch works for me, but I am happy to explore other ways to fix this
bug.

If I understand correctly, the steps are this:

* 'xl create' makes a VM of size $FOO
* qemu bumps the size to $FOO+$N
* 'xl save' writes $FOO+$N of page data, but the xl config file at the
start of the image still says $FOO
* 'xl restore' creates a VM of size $FOO, then instructs
xc_domain_restore() to put $FOO+$N pages into it.

I would argue first, that qemu should not play in this area to start
with.

However, the real bug here is that the domain configuration written by
xl save is inaccurate.

There's a case like COLO:
1. Both Primary and Secondary VMs are created first from the same config
    file, which makes a VM of size $FOO
2. qemu bumps the size to $FOO+$N
3. 'save' writes $FOO+$N of page data
4. 'restore' puts $FOO+$N pages into a $FOO-sized domain, which causes
    an error

Even if you fix the configuration, the bug still exists, because the
config is only transferred from Primary to Secondary once, at the very
beginning, when you execute the 'xl remus' command.  The problem is how
to let the secondary (restore) side know about the size change: through
a migration command (which is easier in v2 migration) or some other
solution?  From this point of view, I think Don's solution is one way to
solve the problem.

Funny you should ask that.  Migration v2 for libxl moves the JSON config
blob into the libxl stream, rather than it being a single-shot action at
the very start.  From that point, it becomes plausible to send a new
JSON blob when an update occurs on the source side.

That should solve the problem, but Migration v2 for libxl won't be
upstream for a while, so the bug still exists.  Is there a way to solve
the problem now, under v1 migration?


However, I am still firmly of the opinion that this is an xl/libxl bug
to be fixed at that level, not hacked around in the restore code.

~Andrew


--
Thanks,
Yang.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

