Re: [Xen-devel] [PATCH] libxc: succeed silently on restore



On Thu, 2010-09-02 at 19:16 +0100, Brendan Cully wrote:
> On Thursday, 02 September 2010 at 18:01, Ian Campbell wrote:
> > So it turns out that there is a similar issue on migration:
> >         xc: Saving memory: iter 3 (last sent 37 skipped 0): 0/32768    0%
> >         xc: error: rdexact failed (select returned 0): Internal error
> >         xc: error: Error when reading batch size (110 = Connection timed out): Internal error
> >         xc: error: error when buffering batch, finishing (110 = Connection timed out): Internal error
> > 
> > I'm not so sure what can be done about this case: the way
> > xc_domain_restore is (currently) designed, it relies on the saving end
> > closing its FD when it is done in order to generate an EOF at the
> > receiving end, which signals the end of the migration.
> > 
> > The xl migration protocol has a postamble which prevents us from
> > closing the FD, so instead the sender finishes the save and then sits
> > waiting for the ACK from the receiver, while the receiver waits until
> > the Remus heartbeat timeout fires, which is what causes us to
> > continue. This isn't ideal from the downtime point of view, nor from
> > a general design POV.
> > 
> > Perhaps we should insert an explicit "done" marker into the xc save
> > protocol, which would be appended in the non-checkpoint case? Only the
> > saving end knows whether the migration is a checkpoint or not (and
> > only implicitly, via callbacks->checkpoint != NULL), but that is OK, I
> > think.
> 
> I think this can be done trivially. We can just add another
> negative-length record at the end of memory copying (like the debug
> flag, tmem, hvm extensions, etc.) if we're running the new xl
> migration protocol and expect restore to exit after receiving the
> first full checkpoint. Or, if you're not as worried about preserving
> the existing semantics, make the minus flag indicate that
> callbacks->checkpoint is not NULL, and only continue reading past the
> first complete checkpoint if you see that minus flag on the receive
> side.
> 
> Isn't that sufficient?
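
For concreteness, a minimal sketch of the first variant. This is purely
illustrative: the chunk name XC_SAVE_ID_LAST_CHECKPOINT and its value
are invented here, and write_exact() is written out locally to mirror
the libxc helper of the same name.

#include <errno.h>
#include <unistd.h>

/*
 * Invented for this sketch: negative "batch size" values already
 * select extension chunks in the save stream (debug flag, tmem, hvm
 * extensions), so a new ID could mark "no further checkpoints follow".
 */
#define XC_SAVE_ID_LAST_CHECKPOINT (-9)

/* Write the whole buffer, retrying on EINTR and short writes. */
static int write_exact(int fd, const void *data, size_t size)
{
    size_t off = 0;

    while (off < size) {
        ssize_t len = write(fd, (const char *)data + off, size - off);
        if (len == -1 && errno == EINTR)
            continue;
        if (len <= 0)
            return -1;
        off += len;
    }
    return 0;
}

/*
 * Saver side: append the marker after the final memory iteration of a
 * non-checkpointed (plain xl migration) save, instead of relying on
 * closing the FD to generate an EOF at the receiver.
 */
static int send_last_checkpoint_marker(int io_fd)
{
    int marker = XC_SAVE_ID_LAST_CHECKPOINT;
    return write_exact(io_fd, &marker, sizeof(marker));
}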

It would probably work, but isn't there a benefit to having the
receiver know that it is partaking in a multiple-checkpoint restore,
being told how many iterations there were, etc.?
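
For instance (again purely illustrative, nothing below exists in libxc
today), the marker could carry a small payload announcing the
checkpoint parameters up front, and the restore batch loop could
dispatch on it like any other negative chunk ID:

#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Invented chunk ID: announces a multi-checkpoint stream up front. */
#define XC_SAVE_ID_CHECKPOINTED (-10)

struct checkpoint_announce {
    uint32_t flags;       /* e.g. Remus vs. plain migration */
    uint32_t iterations;  /* expected checkpoint count, 0 = unbounded */
};

/* Full-read analogue of write_exact() from the previous sketch. */
static int read_exact(int fd, void *data, size_t size)
{
    size_t off = 0;

    while (off < size) {
        ssize_t len = read(fd, (char *)data + off, size - off);
        if (len == -1 && errno == EINTR)
            continue;
        if (len <= 0)
            return -1;
        off += len;
    }
    return 0;
}

/*
 * Restore side (sketch): in the batch loop, a negative "batch size"
 * selects a chunk handler rather than a page batch.
 */
static int handle_chunk(int io_fd, int count, int *checkpointed)
{
    if (count == XC_SAVE_ID_CHECKPOINTED) {
        struct checkpoint_announce a;

        if (read_exact(io_fd, &a, sizeof(a)))
            return -1;
        *checkpointed = 1;
        /* stash a.flags / a.iterations for the restore loop */
        return 0;
    }
    /* ... existing IDs: debug flag, tmem, hvm extensions ... */
    return 0;
}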

Ian.

