Re: [Xen-devel] Domain Save Image Format proposal (draft B)
On Mon, Feb 10, 2014 at 7:35 PM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
True. But why should we explicitly convert the application-level data to network byte order and then convert it back to host byte order, when it's already going to be done by the underlying stack, as you put it?
Maybe I got it wrong. I vaguely recall some sort of CRC checksum being stored along with the saved memory snapshots. But that could have been someone else's research code. Sorry.
Let's see. Am I certain that all migration is happening over TCP? Yes. Worst case, reliable UDP. By reliable, I just mean no bit errors or the like. I am not talking about security.
Absolutely not. Which is why I was under the impression that the image-wide checksum would detect a corrupt image.
Nope. But I am fairly certain that the good old TCP and IP checksums, plus the Ethernet checksum, have been put in place to detect these errors and recover transparently to the application. Are you implying that there is some remote corner case that allows corrupt data to escape all three of these checks in the network stack and percolate to the application layer? I don't think so.
If you are implying that DRAM bit errors flip bits here and there, wreaking havoc, then probably yes, checksums make sense. However, with ECC memory modules being the norm (please correct me if I am wrong about this), why start bothering now, if we didn't over the last 3 years? What has changed? My point here is that checksums seem like unnecessary compute overhead when doing live migration or Remus. One can simply set this field to 0 when doing live migration/Remus. And, as you said later in this mail, the data transmission overhead is not that much.
However, as far as storing snapshots on disk is concerned, I totally agree that there needs to be some form of checksum to ensure that the data has not been corrupted. But why have record-level checksums? It is not as if we can recover the corrupted records. The majority of the use cases are, IMO, do or die: if the checksum is correct, start the restore process; else abort. So why not have an image-wide checksum?
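To make the do-or-die, image-wide check concrete, here is a rough sketch in Python. It is purely illustrative: the 4-byte big-endian CRC32 trailer, the chunk size, and the names write_image and verify_image are my own assumptions, not anything defined by the draft format.

```python
import io
import struct
import zlib

CHUNK = 64 * 1024  # read size in bytes; arbitrary choice for this sketch


def write_image(out, payload: bytes) -> None:
    """Write the payload followed by a CRC32 trailer (hypothetical layout)."""
    out.write(payload)
    out.write(struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF))


def verify_image(stream, total_len: int) -> bool:
    """Return True only if the CRC32 of the payload matches the trailer.

    total_len is the full image length including the 4-byte trailer.
    Any mismatch or truncation means "abort the restore".
    """
    remaining = total_len - 4
    if remaining < 0:
        return False
    crc = 0
    while remaining > 0:
        chunk = stream.read(min(CHUNK, remaining))
        if not chunk:
            return False  # image is shorter than advertised
        crc = zlib.crc32(chunk, crc)  # running CRC over the whole image
        remaining -= len(chunk)
    trailer = stream.read(4)
    if len(trailer) != 4:
        return False
    (expected,) = struct.unpack(">I", trailer)
    return (crc & 0xFFFFFFFF) == expected
```

The restore side would call verify_image once, before touching any record, and refuse to start if it returns False. Nothing here attempts per-record recovery, which is the point of the question above.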
thanks
shriram