
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time





On 06/09/2015 03:30 PM, Andrew Cooper wrote:
On 09/06/2015 01:59, Yang Hongyang wrote:


On 06/08/2015 06:15 PM, Andrew Cooper wrote:
On 08/06/15 10:58, Yang Hongyang wrote:


On 06/08/2015 05:46 PM, Andrew Cooper wrote:
On 08/06/15 04:43, Yang Hongyang wrote:
The ioreq page contains an evtchn (event channel), which is set the
first time we resume the secondary VM. The hypervisor checks whether
the evtchn is corrupted, so we cannot zero the ioreq page more than
once.

ioreq->state is always STATE_IOREQ_NONE after the VM is suspended,
so it is OK to zero the page only once.

Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
Signed-off-by: Wen congyang <wency@xxxxxxxxxxxxxx>
CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
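
For illustration only (this is not the code from the patch): a minimal
sketch of the "zero only once" idea described in the commit message
above. The struct, the flag, the function name and the page size are all
hypothetical and do not correspond to real libxc identifiers; the sketch
only assumes that the restore side keeps some per-restore state across
checkpoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096  /* assumed page size for this sketch */

/* Hypothetical per-restore state; not a real libxc structure. */
struct restore_state {
    bool ioreq_page_zeroed;   /* set once the page has been cleared */
};

/*
 * Clear the ioreq page on the first call only. Later calls (one per
 * checkpoint) leave the page alone, so an event channel written there
 * after the first resume is preserved.
 * Returns true if the page was zeroed by this call.
 */
static bool zero_ioreq_page_once(struct restore_state *st, uint8_t *page)
{
    assert(st != NULL && page != NULL);

    if (st->ioreq_page_zeroed)
        return false;

    memset(page, 0, PAGE_SIZE_BYTES);
    st->ioreq_page_zeroed = true;
    return true;
}
```

With this shape, the first checkpoint clears the page and every later
checkpoint is a no-op, which is the behaviour the description asks for.
Whether skipping the later zeroing is actually safe is the question
discussed in the rest of this thread.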

The issue here is that we are running the restore algorithm over a
domain which has already been running in Xen for a while.  This is a
brand new usecase, as far as I am aware.

Exactly.


Does the qemu process associated with this domain get frozen while the
secondary is being reset, or does the process get destroyed and
recreated?

What do you mean by reset? Do you mean the secondary is suspended at
the checkpoint?

Well - at the point that the buffered records are being processed, we
are in the process of resetting the state of the secondary to match the
primary.

Yes, at this point, the qemu process associated with this domain is
frozen. The suspend callback calls libxl__qmp_stop (vm_stop() in qemu)
to pause qemu. After we have processed all records, qemu is restored
with the received state; that is why we add a libxl__qmp_restore
(qemu_load_vmstate() in qemu) API to restore qemu with the received
state. Currently in libxl, qemu can only start with the received state;
there is no API to load received state into a qemu that has already
been running for a while.

Now I consider this more, it is absolutely wrong to not zero the page
here.  The event channel in the page is not guaranteed to be the same
between the primary and secondary,

That's why we don't zero it on secondary.

and we don't want to unexpectedly
find a pending/in-flight ioreq.

ioreq->state is always STATE_IOREQ_NONE after the VM is suspended, so
there should be no pending/in-flight ioreq at a checkpoint.


Either qemu needs to take care of re-initialising the event channels
back to appropriate values, or Xen should tolerate the channels
disappearing.

~Andrew


--
Thanks,
Yang.
