[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [PATCH v2 12/27] tools/python: Other migration infrastructure



Contains:
 * Reverse-engineered notes of the legacy format from xg_save_restore.h
 * Python implementation of the legacy format
 * Public HVM Params used in the legacy stream
 * XL header format

Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
CC: Ian Campbell <Ian.Campbell@xxxxxxxxxx>
CC: Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
CC: Wei Liu <wei.liu2@xxxxxxxxxx>

---
New in v2 - removes various many magic numbers from subsequent scripts
---
 tools/python/xen/migration/legacy.py |  279 ++++++++++++++++++++++++++++++++++
 tools/python/xen/migration/public.py |   21 +++
 tools/python/xen/migration/xl.py     |   12 ++
 3 files changed, 312 insertions(+)
 create mode 100644 tools/python/xen/migration/legacy.py
 create mode 100644 tools/python/xen/migration/public.py
 create mode 100644 tools/python/xen/migration/xl.py

diff --git a/tools/python/xen/migration/legacy.py 
b/tools/python/xen/migration/legacy.py
new file mode 100644
index 0000000..2f2240a
--- /dev/null
+++ b/tools/python/xen/migration/legacy.py
@@ -0,0 +1,279 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Libxc legacy migration streams
+
+Documentation and record structures for legacy migration
+"""
+
+"""
+SAVE/RESTORE/MIGRATE PROTOCOL
+=============================
+
+The general form of a stream of chunks is a header followed by a
+body consisting of a variable number of chunks (terminated by a
+chunk with type 0) followed by a trailer.
+
+For a rolling/checkpoint (e.g. remus) migration then the body and
+trailer phases can be repeated until an external event
+(e.g. failure) causes the process to terminate and commit to the
+most recent complete checkpoint.
+
+HEADER
+------
+
+unsigned long        : p2m_size
+
+extended-info (PV-only, optional):
+
+  If first unsigned long == ~0UL then extended info is present,
+  otherwise unsigned long is part of p2m. Note that p2m_size above
+  does not include the length of the extended info.
+
+  extended-info:
+
+    unsigned long    : signature == ~0UL
+    uint32_t           : number of bytes remaining in extended-info
+
+    1 or more extended-info blocks of form:
+    char[4]          : block identifier
+    uint32_t         : block data size
+    bytes            : block data
+
+    defined extended-info blocks:
+    "vcpu"             : VCPU context info containing vcpu_guest_context_t.
+                       The precise variant of the context structure
+                       (e.g. 32 vs 64 bit) is distinguished by
+                       the block size.
+    "extv"           : Presence indicates use of extended VCPU context in
+                       tail, data size is 0.
+
+p2m (PV-only):
+
+  consists of p2m_size bytes comprising an array of xen_pfn_t sized entries.
+
+BODY PHASE - Format A (for live migration or Remus without compression)
+----------
+
+A series of chunks with a common header:
+  int              : chunk type
+
+If the chunk type is +ve then chunk contains guest memory data, and the
+type contains the number of pages in the batch:
+
+    unsigned long[]  : PFN array, length == number of pages in batch
+                       Each entry consists of XEN_DOMCTL_PFINFO_*
+                       in bits 31-28 and the PFN number in bits 27-0.
+    page data        : PAGE_SIZE bytes for each page marked present in PFN
+                       array
+
+If the chunk type is -ve then chunk consists of one of a number of
+metadata types.  See definitions of XC_SAVE_ID_* below.
+
+If chunk type is 0 then body phase is complete.
+
+
+BODY PHASE - Format B (for Remus with compression)
+----------
+
+A series of chunks with a common header:
+  int              : chunk type
+
+If the chunk type is +ve then chunk contains array of PFNs corresponding
+to guest memory and type contains the number of PFNs in the batch:
+
+    unsigned long[]  : PFN array, length == number of pages in batch
+                       Each entry consists of XEN_DOMCTL_PFINFO_*
+                       in bits 31-28 and the PFN number in bits 27-0.
+
+If the chunk type is -ve then chunk consists of one of a number of
+metadata types.  See definitions of XC_SAVE_ID_* below.
+
+If the chunk type is -ve and equals XC_SAVE_ID_COMPRESSED_DATA, then the
+chunk consists of compressed page data, in the following format:
+
+    unsigned long        : Size of the compressed chunk to follow
+    compressed data :      variable length data of size indicated above.
+                           This chunk consists of compressed page data.
+                           The number of pages in one chunk depends on
+                           the amount of space available in the sender's
+                           output buffer.
+
+Format of compressed data:
+  compressed_data = <deltas>*
+  delta           = <marker, run*>
+  marker          = (RUNFLAG|SKIPFLAG) bitwise-or RUNLEN [1 byte marker]
+  RUNFLAG         = 0
+  SKIPFLAG        = 1 << 7
+  RUNLEN          = 7-bit unsigned value indicating number of WORDS in the run
+  run             = string of bytes of length sizeof(WORD) * RUNLEN
+
+   If marker contains RUNFLAG, then RUNLEN * sizeof(WORD) bytes of data 
following
+  the marker is copied into the target page at the appropriate offset 
indicated by
+  the offset_ptr
+   If marker contains SKIPFLAG, then the offset_ptr is advanced
+  by RUNLEN * sizeof(WORD).
+
+If chunk type is 0 then body phase is complete.
+
+There can be one or more chunks with type XC_SAVE_ID_COMPRESSED_DATA,
+containing compressed pages. The compressed chunks are collated to form
+one single compressed chunk for the entire iteration. The number of pages
+present in this final compressed chunk will be equal to the total number
+of valid PFNs specified by the +ve chunks.
+
+At the sender side, compressed pages are inserted into the output stream
+in the same order as they would have been if compression logic was absent.
+
+Until last iteration, the BODY is sent in Format A, to maintain live
+migration compatibility with receivers of older Xen versions.
+At the last iteration, if Remus compression was enabled, the sender sends
+a trigger, XC_SAVE_ID_ENABLE_COMPRESSION to tell the receiver to parse the
+BODY in Format B from the next iteration onwards.
+
+An example sequence of chunks received in Format B:
+    +16                              +ve chunk
+    unsigned long[16]                PFN array
+    +100                             +ve chunk
+    unsigned long[100]               PFN array
+    +50                              +ve chunk
+    unsigned long[50]                PFN array
+
+    XC_SAVE_ID_COMPRESSED_DATA       TAG
+      N                              Length of compressed data
+      N bytes of DATA                Decompresses to 166 pages
+
+    XC_SAVE_ID_*                     other xc save chunks
+    0                                END BODY TAG
+
+Corner case with checkpoint compression:
+    At sender side, after pausing the domain, dirty pages are usually
+  copied out to a temporary buffer. After the domain is resumed,
+  compression is done and the compressed chunk(s) are sent, followed by
+  other XC_SAVE_ID_* chunks.
+    If the temporary buffer gets full while scanning for dirty pages,
+  the sender stops buffering of dirty pages, compresses the temporary
+  buffer and sends the compressed data with XC_SAVE_ID_COMPRESSED_DATA.
+  The sender then resumes the buffering of dirty pages and continues
+  scanning for the dirty pages.
+    For e.g., assume that the temporary buffer can hold 4096 pages and
+  there are 5000 dirty pages. The following is the sequence of chunks
+  that the receiver will see:
+
+    +1024                       +ve chunk
+    unsigned long[1024]         PFN array
+    +1024                       +ve chunk
+    unsigned long[1024]         PFN array
+    +1024                       +ve chunk
+    unsigned long[1024]         PFN array
+    +1024                       +ve chunk
+    unsigned long[1024]         PFN array
+
+    XC_SAVE_ID_COMPRESSED_DATA  TAG
+     N                          Length of compressed data
+     N bytes of DATA            Decompresses to 4096 pages
+
+    +4                          +ve chunk
+    unsigned long[4]            PFN array
+
+    XC_SAVE_ID_COMPRESSED_DATA  TAG
+     M                          Length of compressed data
+     M bytes of DATA            Decompresses to 4 pages
+
+    XC_SAVE_ID_*                other xc save chunks
+    0                           END BODY TAG
+
+    In other words, XC_SAVE_ID_COMPRESSED_DATA can be interleaved with
+  +ve chunks arbitrarily. But at the receiver end, the following condition
+  always holds true until the end of BODY PHASE:
+   num(PFN entries +ve chunks) >= num(pages received in compressed form)
+
+TAIL PHASE
+----------
+
+Content differs for PV and HVM guests.
+
+HVM TAIL:
+
+ "Magic" pages:
+    uint64_t         : I/O req PFN
+    uint64_t         : Buffered I/O req PFN
+    uint64_t         : Store PFN
+ Xen HVM Context:
+    uint32_t         : Length of context in bytes
+    bytes            : Context data
+ Qemu context:
+    char[21]         : Signature:
+      "QemuDeviceModelRecord" : Read Qemu save data until EOF
+      "DeviceModelRecord0002" : uint32_t length field followed by that many
+                                bytes of Qemu save data
+      "RemusDeviceModelState" : Currently the same as "DeviceModelRecord0002".
+
+PV TAIL:
+
+ Unmapped PFN list   : list of all the PFNs that were not in map at the close
+    unsigned int     : Number of unmapped pages
+    unsigned long[]  : PFNs of unmapped pages
+
+ VCPU context data   : A series of VCPU records, one per present VCPU
+                       Maximum and present map supplied in XC_SAVE_ID_VCPUINFO
+    bytes:           : VCPU context structure. Size is determined by size
+                       provided in extended-info header
+    bytes[128]       : Extended VCPU context (present IFF "extv" block
+                       present in extended-info header)
+
+ Shared Info Page    : 4096 bytes of shared info page
+"""
+
+CHUNK_end                       = 0
+CHUNK_enable_verify_mode        = -1
+CHUNK_vcpu_info                 = -2
+CHUNK_hvm_ident_pt              = -3
+CHUNK_hvm_vm86_tss              = -4
+CHUNK_tmem                      = -5
+CHUNK_tmem_extra                = -6
+CHUNK_tsc_info                  = -7
+CHUNK_hvm_console_pfn           = -8
+CHUNK_last_checkpoint           = -9
+CHUNK_hvm_acpi_ioports_location = -10
+CHUNK_hvm_viridian              = -11
+CHUNK_compressed_data           = -12
+CHUNK_enable_compression        = -13
+CHUNK_hvm_generation_id_addr    = -14
+CHUNK_hvm_paging_ring_pfn       = -15
+CHUNK_hvm_monitor_ring_pfn      = -16
+CHUNK_hvm_sharing_ring_pfn      = -17
+CHUNK_toolstack                 = -18
+CHUNK_hvm_ioreq_server_pfn      = -19
+CHUNK_hvm_nr_ioreq_server_pages = -20
+
+chunk_type_to_str = {
+    CHUNK_end                       : "end",
+    CHUNK_enable_verify_mode        : "enable_verify_mode",
+    CHUNK_vcpu_info                 : "vcpu_info",
+    CHUNK_hvm_ident_pt              : "hvm_ident_pt",
+    CHUNK_hvm_vm86_tss              : "hvm_vm86_tss",
+    CHUNK_tmem                      : "tmem",
+    CHUNK_tmem_extra                : "tmem_extra",
+    CHUNK_tsc_info                  : "tsc_info",
+    CHUNK_hvm_console_pfn           : "hvm_console_pfn",
+    CHUNK_last_checkpoint           : "last_checkpoint",
+    CHUNK_hvm_acpi_ioports_location : "hvm_acpi_ioports_location",
+    CHUNK_hvm_viridian              : "hvm_viridian",
+    CHUNK_compressed_data           : "compressed_data",
+    CHUNK_enable_compression        : "enable_compression",
+    CHUNK_hvm_generation_id_addr    : "hvm_generation_id_addr",
+    CHUNK_hvm_paging_ring_pfn       : "hvm_paging_ring_pfn",
+    CHUNK_hvm_monitor_ring_pfn      : "hvm_monitor_ring_pfn",
+    CHUNK_hvm_sharing_ring_pfn      : "hvm_sharing_ring_pfn",
+    CHUNK_toolstack                 : "toolstack",
+    CHUNK_hvm_ioreq_server_pfn      : "hvm_ioreq_server_pfn",
+    CHUNK_hvm_nr_ioreq_server_pages : "hvm_nr_ioreq_server_pages",
+}
+
+# Up to 1024 pages (4MB) at a time
+MAX_BATCH = 1024
+
+# Maximum #VCPUs currently supported for save/restore
+MAX_VCPU_ID = 4095
diff --git a/tools/python/xen/migration/public.py 
b/tools/python/xen/migration/public.py
new file mode 100644
index 0000000..fab2f84
--- /dev/null
+++ b/tools/python/xen/migration/public.py
@@ -0,0 +1,21 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+Xen public ABI constants, used in migration
+"""
+
+HVM_PARAM_STORE_PFN             = 1
+HVM_PARAM_IOREQ_PFN             = 5
+HVM_PARAM_BUFIOREQ_PFN          = 6
+HVM_PARAM_VIRIDIAN              = 9
+HVM_PARAM_IDENT_PT              = 12
+HVM_PARAM_VM86_TSS              = 15
+HVM_PARAM_CONSOLE_PFN           = 17
+HVM_PARAM_ACPI_IOPORTS_LOCATION = 19
+HVM_PARAM_PAGING_RING_PFN       = 27
+HVM_PARAM_MONITOR_RING_PFN      = 28
+HVM_PARAM_SHARING_RING_PFN      = 29
+HVM_PARAM_IOREQ_SERVER_PFN      = 32
+HVM_PARAM_NR_IOREQ_SERVER_PAGES = 33
+HVM_PARAM_VM_GENERATION_ID_ADDR = 34
diff --git a/tools/python/xen/migration/xl.py b/tools/python/xen/migration/xl.py
new file mode 100644
index 0000000..978e744
--- /dev/null
+++ b/tools/python/xen/migration/xl.py
@@ -0,0 +1,12 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+"""
+XL migration stream format
+"""
+
+MAGIC = "Xen saved domain, xl format\n \0 \r"
+
+HEADER_FORMAT = "=IIII"
+
+MANDATORY_FLAG_STREAMV2 = 2
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.