
Re: [Xen-devel] Domain Save Image Format proposal (draft B)

On Tue, Feb 11, 2014 at 5:58 AM, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
On 10/02/14 20:00, Shriram Rajagopalan wrote:
> On Mon, Feb 10, 2014 at 9:20 AM, David Vrabel <david.vrabel@xxxxxxxxxx
> <mailto:david.vrabel@xxxxxxxxxx>> wrote:
> It's tempting to adopt all the TCP-style madness for transferring a set of
> structured data.  Why this endianness mess?  Am I missing something here?
> I am assuming that the lion's share of Xen's deployments are on x86
> (not including Amazon). So that leaves ARM.  Why not let those
> processors take the hit of the endianness conversion?

I'm not sure I would characterize a spec being precise about byte
ordering as "endianness mess".

I think it would be a pretty poor specification if it didn't specify
byte ordering -- we can't have the tools having to make assumptions
about the ordering.

Totally agree. But as someone else put it (and you did as well), my point was
that it's sufficient to specify it once, somewhere in the image header, and to
make sure (as you put it below) that the current use cases don't have to go
through needless endian conversion.

However, I do think it can be specified in such a way that all the
current use cases don't have to do any byte swapping (except for the
minimal header).
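To make the idea concrete, here is a rough sketch of how a reader could use a marker field in the minimal header to decide whether byte swapping is needed at all. The marker value and field layout here are assumptions for illustration, not what draft B actually specifies:

```c
#include <stdint.h>

/* Hypothetical marker value -- the actual magic in the spec may differ. */
#define IHDR_MARKER 0x58454E46u  /* "XENF", an assumed value for illustration */

/* Byte-swap helper for 32-bit fields. */
static uint32_t bswap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) <<  8) |
           ((v & 0x00ff0000u) >>  8) |
           ((v & 0xff000000u) >> 24);
}

/*
 * Inspect the marker field of the image header.  If it matches as-is,
 * the image was written in the reader's native byte order and no
 * swapping is needed; if it matches after swapping, every multi-byte
 * field must be swapped on read.  Returns 1 if swapping is required,
 * 0 if not, and -1 if the marker is unrecognised.
 */
static int needs_byte_swap(uint32_t marker)
{
    if (marker == IHDR_MARKER)
        return 0;
    if (bswap32(marker) == IHDR_MARKER)
        return 1;
    return -1;
}
```

With this, a save image written on x86 and restored on x86 never swaps anything beyond this one check.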

>         +-----------------------+-------------------------+
>         | checksum              | (reserved)              |
>         +-----------------------+-------------------------+
> I am assuming that the checksum field is present only
> for debugging purposes? Otherwise, I see no reason for the
> computational overhead, given that we are already sending data
> over a reliable channel and, IIRC, we already have an image-wide checksum
> when saving the image to disk.

I'm not aware of any image wide checksum.

Yep. I was mistaken.

The checksum seems like a potentially useful feature but I don't have a
requirement for it so if no one else thinks it is useful it can be removed.

My suggestion is that when saving the image to disk, why not have a single
image-wide checksum to ensure that the image being restored from disk is still valid?
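As an illustration of the single trailing-checksum idea: compute a checksum over the whole image as it is written, store it in a trailer, and recompute on restore. The algorithm below is Adler-32, chosen purely for brevity of the sketch; the spec would presumably pick something like CRC32:

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Illustrative image-wide checksum (Adler-32, used here only because
 * it is short to write down).  The writer feeds every byte of the
 * image through this and appends the result as a trailer record;
 * restore recomputes it and compares before accepting the image.
 */
static uint32_t image_checksum(const uint8_t *data, size_t len)
{
    uint32_t a = 1, b = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + data[i]) % 65521;  /* largest prime below 2^16 */
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}
```

In a real implementation this would be computed incrementally as records are written, rather than requiring the whole image in memory.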
>     --------------------------------------------------------------------
>     Field       Description
>     ----------- --------------------------------------------------------
>     count       Number of pages described in this record.
>     pfn         An array of count PFNs. Bits 63-60 contain
>                 the XEN_DOMCTL_PFINFO_* value for that PFN.
>     page_data   page_size octets of uncompressed page contents for each
>                 page set as present in the pfn array.
>     --------------------------------------------------------------------
> s/uncompressed/(compressed/uncompressed)/
> (Remus sends compressed data)

No.  I think compressed page data should have its own record type. The
current scheme of mode flipping records seems crazy to me.

What mode flipping? For page compression, Remus basically sends a simple
XOR+RLE-encoded sequence of bytes, preceded by a 4-byte length field.
Instead of sending the usual 4K of page_data per page, this compressed chunk is sent.
The additional code on the remote side is a single "if" block that uses
xc_uncompress instead of memcpy to obtain the uncompressed page.

It would not change the way the PAGE_DATA record would be transmitted.
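The scheme can be sketched roughly as follows. This is a simplified illustration only: the real encoder/decoder in tools/libxc use a different on-wire encoding and different helper names:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/*
 * Toy XOR+RLE delta encoder: XOR the current copy of a page against
 * the previously sent copy, then run-length encode the result as
 * (run_len, byte) pairs with run_len in 1..255.  Unchanged regions
 * XOR to zero and collapse into long runs, so pages that barely
 * changed between checkpoints compress very well.
 */
static size_t xor_rle_encode(const uint8_t *prev, const uint8_t *curr,
                             uint8_t *out)
{
    size_t n = 0, i = 0;

    while (i < PAGE_SIZE) {
        uint8_t d = prev[i] ^ curr[i];
        size_t run = 1;

        while (i + run < PAGE_SIZE && run < 255 &&
               (uint8_t)(prev[i + run] ^ curr[i + run]) == d)
            run++;
        out[n++] = (uint8_t)run;
        out[n++] = d;
        i += run;
    }
    return n;  /* compressed length; a 4-byte length field precedes it on the wire */
}

/*
 * Toy decoder: XOR the decoded delta over the receiver's previous
 * copy of the page, reconstructing the current contents in place.
 */
static void xor_rle_decode(uint8_t *page, const uint8_t *in, size_t len)
{
    size_t i = 0, p = 0;

    while (i < len) {
        uint8_t run = in[i++], d = in[i++];

        while (run--)
            page[p++] ^= d;
    }
}
```

The receiver-side "if" block amounts to calling the decoder over its previous copy of the page instead of memcpy'ing 4K of raw page_data.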

Though, one potentially cooler addition could be to use the options field of the record header
to indicate whether the data is compressed or not. Given that we have 64 bits, we could even
go as far as specifying the type of compression used (e.g., none, remus, gzip, etc.).
This might be really helpful when one wants to save/restore large images (an 8GB VM, for example)
to/from disk. Is this better or worse than simply gzipping the entire saved image? I don't know yet.
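Encoding-wise this could be as simple as the sketch below. The field names, bit assignments, and enum values are assumptions for the sake of illustration, not part of draft B:

```c
#include <stdint.h>

/*
 * Hypothetical use of the record header's 64-bit options field to
 * carry a compression type in its low byte, leaving the remaining
 * bits reserved.  All names and values here are illustrative.
 */
enum compression_type {
    COMPRESS_NONE  = 0,
    COMPRESS_REMUS = 1,   /* XOR+RLE against the previous page version */
    COMPRESS_GZIP  = 2,
};

struct record_header {
    uint32_t type;
    uint32_t length;
    uint64_t options;     /* bits 0-7: compression type; bits 8-63 reserved */
};

/* Extract the compression type from a record header. */
static enum compression_type record_compression(const struct record_header *h)
{
    return (enum compression_type)(h->options & 0xff);
}
```

A receiver that does not recognise the compression type could then fail the restore cleanly instead of misinterpreting the payload.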

However, for live migration this would be pretty helpful (especially when migrating over high-latency
networks).  Remus' compression technique cannot be used for live migration, as it requires a previous
version of the pages for XOR+RLE compression.  However, gzip and similar compression algorithms
would be pretty handy in the live-migration case, over a WAN or even a clogged LAN where there
are tons of VMs being moved back and forth.

Feel free to shoot down this idea if it seems unfeasible.



