diff --git a/docs/misc/xsplice.markdown b/docs/misc/xsplice.markdown index 8ff51d2..29cd238 100644 --- a/docs/misc/xsplice.markdown +++ b/docs/misc/xsplice.markdown @@ -1,4 +1,4 @@ -# xSplice Design v1 (EXTERNAL RFC v2) +# xSplice Design v1 (EXTERNAL RFC v3) ## Rationale @@ -8,6 +8,13 @@ opcodes that have come about due to primarily security updates. This document describes the design of the API that would allow us to upload to the hypervisor binary patches. +The document is split in four sections: + - Detailed descriptions of the problem statement. + - Design of the data structures. + - Design of the hypercalls. + - Implementation notes that should be taken into consideration. + + ## Glossary * splice - patch in the binary code with new opcodes @@ -24,7 +31,7 @@ no gaps - so we have no luxury of 'moving' existing code and must either insert a trampoline to the new code to be executed - or only modify in-place the code if there is sufficient space. The placement of new code has to be done by hypervisor and the virtual address for the new code is allocated dynamically. -i + This implies that the hypervisor must compute the new offsets when splicing in the new trampoline code. Where the trampoline is added (inside the function we are patching or just the callers?) is also important. @@ -198,16 +205,16 @@ As such the hypercall **MUST** support an XSM policy to limit the what guest is allowed. If the system is booted with signature checking the signature checking will be enforced. -## Payload format +## Design of payload format The payload **MUST** contain enough data to allow us to apply the update and also safely reverse it. As such we **MUST** know: - * What the old code is expected to be. We **MUST** verify it against the - runtime code. + * What the old code is expected to be. We **MUST** be able verify it + against the runtime code. * The locations in memory to be patched. This can be determined dynamically via symbols or via virtual addresses. - * The new code to be used. + * The new code (or data) to will be patched in. * Signature to verify the payload. This binary format can be constructed using an custom binary format but @@ -221,216 +228,389 @@ there are severe disadvantages of it: As such having the payload in an ELF file is the sensible way. We would be carrying the various set of structures (and data) in the ELF sections under different names and with definitions. The prefix for the ELF section name -would always be: *.xsplice_* +would always be: *.xsplice* to match up to the names of the structures. Note that every structure has padding. This is added so that the hypervisor can re-use those fields as it sees fit. -There are five sections *.xsplice_* sections: +Earlier design attempted to ineptly explain the relations of the ELF sections +to each other without using proper ELF mechanism (sh_info, sh_link, data +structures using Elf_* types, etc). This design will explain in detail +the structures and how they are used together and not dig in the ELF +format - except mention that the section names should match the +structure names. + +### ASCII art of structures. + +The diagram below is ommiting some entries to easy the relationship explanation. + +
+ /---------------------\ + +->| xsplice_reloc_howto | + / \---------------------/ + /---------------\ 1:1/ + +->| xsplice_reloc | / + / | - howto +--/ 1:1 /----------------\ + / | - symbol +-------->| xsplice_symbol | + 1:N / \---------------/ / \----------------/ +/----------\ /--------------\ / / +| xsplice | 1:1 | xsplice_code | / 1:1/ +| - new +------->| - relocs +---/ 1:N /-----------------\ / +| - old +------->| - sections +----------->| xsplice_section | / +\----------/ | - patches +--\ | - symbol +/ 1:1 /----------------\ + \--------------/ \ | - addr +------->| .text or .data | + \ \----------------/ \----------------/ + \ + 1:N \ + \ /----------------\ + +-->| xsplice_patch | 1:1 /----------------\ + | - content +------>| binary code or | + \----------------/ | data | + \----------------/ + +- * `.xsplice_symbols` and `.xsplice_str`. The array of symbols to be referenced - during the update. This can contain the symbols (functions) that will be - patched, or the list of symbols (functions) to be checked pre-patching which - may not be on the stack. +### xsplice structures -* `.xsplice_reloc` and `.xsplice_reloc_howto`. The howto properly construct - trampolines for an patch. We can have multiple locations for which we - need to insert an trampoline for a payload and each location might require - a different way of handling it. This would naturally reference the `.text` - section and its proper offset. The `.xsplice_reloc` is not directly concerned - with patches but rather is an ELF relocation - describing the target - of a relocation and how that is performed. They're also used for where - the new code references the run code too. +From the top (or left in the above diagram) the structures are: - * `.xsplice_sections`. The safety data for the old code and new code. - This contains an array of symbols (pointing to `.xsplice_symbols` to - and `.text`) which are to be used during safety and dependency checking. + * `xsplice`. The top most structure - contains the the name of the update, + the id to match against the hypervisor, the pointer to the metadata for + the new code and optionally the metadata for the old code. + * `xsplice_code`. The structure that ties all of this together and defines + the payload. Contains arrays of `xsplice_reloc`, `xsplice_section`, and + `xsplice_patch`. - * `.xsplice_patches`: The description of the new functions to be patched - in (size, type, pointer to code, etc.). + * `xsplice_reloc` contains telemtry used for patching - which describes the + targets to be patched and how to do it. - * `.xsplice_change`. The structure that ties all of this together and defines - the payload. + * `xsplice_section` - the safety data for the code. Contains pointer to the + symbol (`xsplice_symbols`) and pointer to the code (`.text`) or data (`.data`), + which are to be used during safety and dependency checking. -Additionally the ELF file would contain: + * `xsplice_patch`: the description of the new function to be patched in + along with the binary code or data. - * `.text` section for the new and old code (function). - * `.rela.text` relocation data for the `.text` (both new and old). - * `.rela.xsplice_patches` relocation data for `.xsplice_patches` (such as offset - to the `.text` ,`.xsplice_symbols`, or `.xsplice_reloc` section). - * `.bss` section for the new code (function) - * `.data` and `.data.read_mostly` section for the new and old code (function) - * `.rodata` section for the new and old code (function). + * ` xsplice_reloc_howto`: the howto properly construct trampolines for an patch. + We may have multiple locations for which we need to insert an trampoline for a + payload and each location might require a different way of handling it. + + * `xsplice_symbols `. The symbol that will be patched. + +In short the *.xsplice* sections (with `xsplice` being the top) represent +various structures to define the new code and safety checks for the old +code (optional). The ELF provides the mechanism to glue it all together when +loaded in memory. -In short the *.xsplice_* sections represent various structures and the -ELF provides the mechanism to glue it all together when loaded in memory. Note that a lot of these ideas are borrowed from kSplice which is available at: https://github.com/jirislaby/ksplice -For ELF understanding the best starting point is the OSDev Wiki -(http://wiki.osdev.org/ELF). Furthermore the ELF specification is -at http://www.skyfree.org/linux/references/ELF_Format.pdf and -at Oracle's web site: -http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-46512.html#scrolltoc +### struct xsplice -### ASCII art of the ELF structures +The top most structure is quite simple. It defines the name, the id +of the hypervisor, pointer to the new code and an pointer to +the old code (optional). -*TODO*: Include an ASCII art of how the sections are tied together. +The new code uses all of the `xsplice_*` structures while the +old code does not use the `xsplice_reloc` structures. -### xsplice_symbols +The sections defining the structures will explicitly state +when they are not used. -The section contains an array of an structure that outlines the name -of the symbol to be patched (or checked against). The structure is -as follow: +
+struct xsplice { + const char *name; /* A sensible name for the patch. Up to 40 characters. */ + const char *id; /* ID of the hypervisor this binary was built against. */ + struct xsplice_code *new; /* Pointer to the new code to be patched. */ + struct xsplice_code *old; /* Pointer to the old code to be checked against. */ + uint8_t pad[32]; /* Must be zero. */ +}; ++ +The size of this structure should be 64 bytes. + +### xsplice_code + +The structure embedded within this section ties the other +structures together. It has the pointers with an start and end +address for each set of structures. This means that an update +can be split in multiple changes - for example to accomodate +an update that contains both code and data and will need patching +in both .text and .data sections.
-struct xsplice_symbol { - const char *name; /* The ELF name of the symbol. */ - const char *label; /* A unique xSplice name for the symbol. */ - uint8_t pad[16]; /* Must be zero. */ -}; +struct xsplice_code { + struct xsplice_reloc *relocs, *relocs_end; /* How to patch it. */ + struct xsplice_section *sections, *sections_end; /* Safety data. */ + struct xsplice_patch *patches, *patches_end; /* Patch code and data */ + uint8_t pad[16]; /* Must be zero. */ +};-The structures may be in the section in any order and in any amount -(duplicate entries are permitted). -Both `name` and `label` would be pointing to entries in `.xsplice_str`. +The size of this structure is 64 bytes. -The `label` is used for diagnostic purposes - such as including the -name and the offset. +There can be at most two of those structures in the payload. +One for the new code and another for the old code (optional). -### xsplice_reloc and xsplice_reloc_howto +If it is for the old code the relocs, and relocs_end values will be ignored. -The section contains an array of a structure that outlines the different -locations (and howto) for which an trampoline is to be inserted. -The howto defines in the detail the change. It contains the type, -whether the relocation is relative, the size of the relocation, -bitmask for which parts of the instruction or data are to be replaced, -amount of final relocation is shifted by (to drop unwanted data), and -whether the replacement should be interpreted as signed value. +### xsplice_reloc -The structure is as follow: +The `xsplice_code` defines an array of these structures. As such +an singular structure defines an singular point where to patch the +hypervisor. -
-#define XSPLICE_HOWTO_RELOC_INLINE 0 /* Inline replacement. */ -#define XSPLICE_HOWTO_RELOC_PATCH 1 /* Add trampoline. */ -#define XSPLICE_HOWTO_RELOC_DATA 2 /* __DATE__ type change. */ -#define XSPLICE_HOWTO_RELOC_TIME 3 /* __TIME__ type chnage. */ -#define XSPLICE_HOWTO_BUG 4 /* BUG_ON being replaced.*/ -#define XSPLICE_HOWTO_EXTABLE 5 /* exception_table change. */ -#define XSPLICE_HOWTO_SYMBOL 6 /* change in symbol table. */ +The structure contains the address of the hypervisor (if known), +the symbol associated with this address, how the patching is to +be done, and platform specific details. -#define XSPLICE_HOWTO_FLAG_PC_REL 0x00000001 /* Is PC relative. */ -#define XSPLICE_HOWOT_FLAG_SIGN 0x00000002 /* Should the new value be treated as signed value. */ +The `isns_added` is an value to be used to compute the new offset +due to the quirks of the operands of the instruction. For example +to patch in an jump operation to the new code - the offset is relative +to the program counter of the next instruction - hence the offset +value has to be subtracted by four bytes - hence this would contain -4 . -struct xsplice_reloc_howto { - uint32_t type; /* XSPLICE_HOWTO_* */ - uint32_t flag; /* XSPLICE_HOWTO_FLAG_* */ - uint32_t size; /* Size, in bytes, of the item to be relocated. */ - uint32_t r_shift; /* The value the final relocation is shifted right by; used to drop unwanted data from the relocation. */ - uint64_t mask; /* Bitmask for which parts of the instruction or data are replaced with the relocated value. */ - uint8_t pad[8]; /* Must be zero. */ -}; +The `isns_target` is the offset against the symbol. -+The relation of this structure with `xsplice_patch` is 1:1, even +for inline patches. See the section detailing the structure +`xsplice_reloc_howto`. -This structure is used in: +The relation of this structure with `xsplice_section` is 1:1. + +This structure is as follow:
struct xsplice_reloc { uint64_t addr; /* The address of the relocation (if known). */ struct xsplice_symbol *symbol; /* Symbol for this relocation. */ + int64_t isns_target; /* rest of the ELF addend. This is equal to the offset against the symbol that the relocation refers to. */ struct xsplice_reloc_howto *howto; /* Pointer to the above structure. */ - uint64_t isns_added; /* ELF addend resulting from quirks of instruction one of whose operands is the relocation. For example, this is -4 on x86 pc-relative jumps. */ - uint64_t isns_target; /* rest of the ELF addend. This is equal to the offset against the symbol that the relocation refers to. */ - uint8_t pad[8]; /* Must be zero. */ + int64_t isns_added; /* ELF addend resulting from quirks of instruction one of whose operands is the relocation. For example, this is -4 on x86 pc-relative jumps. */ + uint8_t pad[24]; /* Must be zero. */ }; +-### xsplice_sections +The size of this structure is 64 bytes. + +### xsplice_section -The structure defined in this section is used to verify that it is safe -to update with the new changes. It can contain safety data on the old code +The structure defined in this section is used during pre-patching and +during patching. Pre-patching it is used to verify that it is safe +to update with the new changes - and contains safety data on the old code and what kind of matching we are to expect. -It also can contain safety date of what to check when about to patch. -That is whether any of the addresses (either provided or resolved -when payload is loaded by referencing the symbols) are in memory +That is whether the address (either provided or resolved when payload is +loaded by referencing the symbols) is: + + * in memory, + * correct size, + * in it's proper ELF section, + * has been already patched (or not), + * is expected not to be the CPU stack - (or it is OK for it be on the CPU stack). + with what we expect it to be. -As such the flags can be or-ed together: +Some of the checks can be relaxed, as such the `flag` values +can be or-ed together.
+ #define XSPLICE_SECTION_TEXT 0x00000001 /* Section is in .text */ -#define XSPLICE_SECTION_RODATA 0x00000002 /* Section is in .ro */ -#define XSPLICE_SECTION_DATA 0x00000004 /* Section is in .rodata */ +#define XSPLICE_SECTION_RODATA 0x00000002 /* Section is in .rodata */ +#define XSPLICE_SECTION_DATA 0x00000004 /* Section is in .data */ #define XSPLICE_SECTION_STRING 0x00000008 /* Section is in .str */ -#define XSPLICE_SECTION_ALTINSTRUCTIONS 0x00000010 /* Section has .altinstructions. */ -#define XSPLICE_SECTION_TEXT_INPLACE 0x00000200 /* Change is in place. */ -#dekine XSPLICE_SECTION_MATCH_EXACT 0x00000400 /* Must match exactly. */ + +#define XSPLICE_SECTION_TEXT_INLINE 0x00000200 /* Change is to be inline. */ +#define XSPLICE_SECTION_MATCH_EXACT 0x00000400 /* Must match exactly. */ #define XSPLICE_SECTION_NO_STACKCHECK 0x00000800 /* Do not check the stack. */ struct xsplice_section { struct xsplice_symbol *symbol; /* The symbol associated with this change. */ uint64_t address; /* The address of the section (if known). */ - uint64_t size; /* The size of the section. */ - uint64_t flags; /* Various XSPLICE_SECTION_* flags. */ - uint8_t pad[16]; /* To be zero. */ + uint32_t size; /* The size of the section. */ + uint32_t flags; /* Various XSPLICE_SECTION_* flags. */ + uint8_t pad[12]; /* To be zero. */ };-### xsplice_patches +The size of this structure is 32 bytes. -Within this section we have an array of a structure defining the new code (patch). +### xsplice_patch -This structure consist of an pointer to the new code (which in ELF ends up -pointing to an offset in `.text` or `.data` section); the type of patch: -inline - either text or data, or requesting an trampoline; and size of patch. +This structure has the binary code (or data) to be patched. Depending on the +type it can either an inline patch (data or text) or require an relocation +change (which requires an trampoline). Naturally it also points to a blob +of the binary data to patch in, and the size of the patch. -The structure is as follow: +The `addr` is used when the patch is for inline change. If it is an relocation +(requiring an trampoline), the `addr` should be zero. + +There must be an corresponding ` struct xsplice_reloc` and +`struct xsplice_section` describing this patch.
-#define XSPLICE_PATCH_INLINE_TEXT 0 -#define XSPLICE_PATCH_INLINE_DATA 1 -#define XSPLICE_PATCH_RELOC_TEXT 2 +#define XSPLICE_PATCH_INLINE_TEXT 0x1 +#define XSPLICE_PATCH_INLINE_DATA 0x2 +#define XSPLICE_PATCH_RELOC_TEXT 0x3 struct xsplice_patch { uint32_t type; /* XSPLICE_PATCH_* .*/ uint32_t size; /* Size of patch. */ - uint64_t addr; /* The address of the new code (or data). */ + uint64_t addr; /* The address of the inline new code (or data). */ void *content; /* The bytes to be installed. */ - uint8_t pad[16]; /* Must be zero. */ + uint8_t pad[40]; /* Must be zero. */ };-### xsplice_code +The size of this structure is 64 bytes. -The structure embedded within this section ties it all together. -It has the name of the patch, and pointers to all the above -mentioned structures (the start and end addresses). +### xsplice_symbols + +The structure contains an pointer to the name of the ELF symbol +to be patched and as well an unique name for the symbol. + +The `label` is used for diagnostic purposes - such as including the +name and the offset. The structure is as follow:
-struct xsplice_code { - const char *name; /* A sensible name for the patch. Up to 40 characters. */ - struct xsplice_reloc *relocs, *relocs_end; /* How to patch it */ - struct xsplice_section *sections, *sections_end; /* Safety data */ - struct xsplice_patch *patches, *patches_end; /* Patch code & data */ - uint8_t pad[32]; /* Must be zero. */ -}; +struct xsplice_symbol { + const char *name; /* The ELF name of the symbol. */ + const char *label; /* A unique xSplice name for the symbol. */ + uint8_t pad[16]; /* Must be zero. */ +};-There should only be one such structure in the section. +The size of this structure is 32 bytes. + + +### xsplice_reloc_howto + +The howto defines in the detail the change. It contains the type, +whether the relocation is relative, the size of the relocation, +bitmask for which parts of the instruction or data are to be replaced, +amount the final relocation is shifted by (to drop unwanted data), and +whether the replacement should be interpreted as signed value. + +The structure is as follow: + +
+#define XSPLICE_HOWTO_RELOC_INLINE 0x1 /* It is an inline replacement. */ +#define XSPLICE_HOWTO_RELOC_PATCH 0x2 /* Add an trampoline. */ + +#define XSPLICE_HOWTO_FLAG_PC_REL 0x1 /* Is PC relative. */ +#define XSPLICE_HOWOT_FLAG_SIGN 0x2 /* Should the new value be treated as signed value. */ + +struct xsplice_reloc_howto { + uint32_t type; /* XSPLICE_HOWTO_* */ + uint32_t flag; /* XSPLICE_HOWTO_FLAG_* */ + uint32_t size; /* Size, in bytes, of the item to be relocated. */ + uint32_t r_shift; /* The value the final relocation is shifted right by; used to drop unwanted data from the relocation. */ + uint64_t mask; /* Bitmask for which parts of the instruction or data are replaced with the relocated value. */ + uint8_t pad[8]; /* Must be zero. */ +}; + ++ +The size of this structure is 32 bytes. ### Example -*TODO*: Include an objdump of how the ELF would look like for the XSA -mentioned earlier. +There is a wealth of information that the payload must have to define a simple +patch. For this example we will assume that the hypervisor has not been compiled +with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures +for certain HYPERVISOR_xen_version sub-ops*) which mem-sets an structure +in `xen_version` hypercall. This function is not called **anywhere** in +the hypervisor (it is called by the guest) but referenced in the +`compat_hypercall_table` and `hypercall_table` (and indirectly called +from that). There are two ways to patch this: +inline patch `hvm_hypercall64_table` and `hvm_hypercall` with a new +address for the new `do_xen_version` , or insert +trampoline in `do_xen_version` code. The example will focus on the later. + +The `do_xen_version` code is located at virtual address ffff82d080112f9e. + +
+struct xsplice_code xsplice_xsa125; +struct xsplice_reloc relocs[1]; +struct xsplice_section sections[1]; +struct xsplice_patch patches[1]; +struct xsplice_symbol do_xen_version_symbol; +struct xsplice_reloc_howto do_xen_version_howto; +char do_xen_version_new_code[1728]; + +#ifndef HYPERVISOR_ID +#define HYPERVISOR_ID "92dd05a61556c554155b1508c9cf67d993336d28" +#endif + +struct xsplice xsa125 = { + .name = "xsa125", + .id = HYPERVISOR_ID, + .old = NULL, + .new = &xsplice_xsa125, +}; + +struct xsplice_code xsplice_xsa125 = { + .relocs = &relocs[0], + .relocs_end = &relocs[0], + .sections = §ions[0], + .sections_end = §ions[0], + .patches = &patches[0], + .patches_end = &patches[0], +}; + +struct xsplice_reloc relocs[1] = { + { + .addr = 0xffff82d080112f9e, + .symbol = &do_xen_version_symbol, + .isns_target = 0, + .howto = &do_xen_version_howto, + .isns_added = -4, + }, +}; + +struct xsplice_symbol do_xen_version_symbol = { + .name = "do_xen_version", + .label = "do_xen_version+<0x0>", +}; + +struct xsplice_reloc_howto do_xen_version_howto = { + .type = XSPLICE_HOWTO_RELOC_PATCH, + .flag = XSPLICE_HOWTO_FLAG_PC_REL, + .r_shift = 0, + .mask = (-1ULL), +}; + + +struct xsplice_section sections[1] = { + { + .symbol = &do_xen_version_symbol, + .address = 0xffff82d080112f9e, + .size = 1728, + .flags = XSPLICE_SECTION_TEXT, + }, +}; + +struct xsplice_patch patches[1] = { + { + .type = XSPLICE_PATCH_RELOC_TEXT, + .size = 1728, + .addr = 0, + .content = &do_xen_version_new_code, + }, +}; + +char do_xen_version_new_code[1728] = { 0x83, 0xff, 0x09, /* And more code. */}; ++ ## Signature checking requirements. @@ -497,7 +677,7 @@ There are to be four sub-operations: * getting an particular payload summary and its state. * command to apply, delete, or revert the payload. -The patching is asynchronous therefore the caller is responsible +The actions are asynchronous therefore the caller is responsible to verify that it has been applied properly by retrieving the summary of it and verifying that there are no error codes associated with the payload. @@ -506,6 +686,8 @@ every physical CPU to be lock-step with each other. The patching mechanism while an implementation detail, is not an short operation and as such the design **MUST** assume it will be an long-running operation. +The sub-operations will spell out how preemption is to be handled (if at all). + Furthermore it is possible to have multiple different payloads for the same function. As such an unique id has to be visible to allow proper manipulation. @@ -523,7 +705,6 @@ struct xen_sysctl_xsplice_op { while the rest of hypercall specific structures are part of the this structure. - ### XEN_SYSCTL_XSPLICE_UPLOAD (0) Upload a payload to the hypervisor. The payload is verified and if there @@ -541,6 +722,10 @@ Duplicate `id` are not supported. The `payload` is the ELF payload as mentioned in the `Payload format` section. +This operation can be preempted by the hypercall returning EAGAIN. +This is due to the nature of signature verification - which may require +SecureBoot firmware calls which are unbounded. + The structure is as follow:
@@ -557,7 +742,6 @@ Retrieve an summary of an specific payload. This caller provides: * `id` the unique id. * `status` *MUST* be set to zero. - * `rc` *MUST* be set to zero. The `summary` structure contains an summary of payload which includes: @@ -568,8 +752,10 @@ The `summary` structure contains an summary of payload which includes: 3. *XSPLICE_STATUS_CHECKED* (2) the ELF payload safety checks passed. 4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied. 5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also reverted. - 6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult `rc` for details. - * `rc` - its error state if any. + 6. Negative values is an error. The error would be of EXX format. + +The return value is zero on success and EXX on failure. This operation +is synchronous and does not require preemption. The structure is as follow: @@ -579,12 +765,10 @@ The structure is as follow: #define XSPLICE_STATUS_CHECKED 2 #define XSPLICE_STATUS_APPLIED 3 #define XSPLICE_STATUS_REVERTED 4 -#define XSPLICE_STATUS_IN_ERROR 5 struct xen_sysctl_xsplice_summary { char id[40]; /* IN/OUT, name of the patch. */ - uint32_t status; /* OUT */ - int32_t rc; /* OUT */ + int32_t status; /* OUT */ };@@ -595,37 +779,44 @@ hypervisor. The caller provides: + * `version`. Initially it *MUST* be zero. * `idx` index iterator. Initially it *MUST* be zero. * `count` the max number of entries to populate. * `summary` virtual address of where to write payload summaries. The hypercall returns zero on success and updates the `idx` (index) iterator with the number of payloads returned, `count` to the number of remaining -payloads, and `summary` with an number of payload summaries. +payloads, and `summary` with an number of payload summaries. The `version` +is updated on every hypercall - if it varies from one hypercall to another +the data is stale and further calls could fail. If the hypercall returns E2BIG the `count` is too big and should be lowered. Note that due to the asynchronous nature of hypercalls the domain might have added or removed the number of payloads making this information stale. It is -the responsibility of the domain to provide proper accounting. +the responsibility of the toolstack to use the `version` field to check +between each invocation. + +This operation is synchronous and does not require preemption. The `summary` structure contains an summary of payload which includes: + * `version` version of the data. * `id` unique id. * `status` - whether it has been: 1. *XSPLICE_STATUS_LOADED* (0) has been loaded. 2. *XSPLICE_STATUS_PROGRESS* (1) acting on the **XEN_SYSCTL_XSPLICE_ACTION** command. - 3. *XSPLICE_STATUS_CHECKED* (2) the payload `old` and `addr` match with the hypervisor. + 3. *XSPLICE_STATUS_CHECKED* (2) the ELF payload safety checks passed. 4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied. 5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also reverted. - 6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult `rc` for details. - * `rc` - its error state if any. + 6. Any negative values means there has been error. The value is in EXX format. The structure is as follow:
struct xen_sysctl_xsplice_list { + uint32_t version; /* OUT */ uint32_t idx; /* IN/OUT */ uint32_t count; /* IN/OUT */ XEN_GUEST_HANDLE_64(xen_sysctl_xsplice_summary) summary; /* OUT */ @@ -633,8 +824,7 @@ struct xen_sysctl_xsplice_list { struct xen_sysctl_xsplice_summary { char id[40]; /* OUT, name of the patch. */ - uint32_t status; /* OUT */ - int32_t rc; /* OUT */ + int32_t status; /* OUT */ };@@ -644,33 +834,94 @@ Perform an operation on the payload structure referenced by the `id` field. The operation request is asynchronous and the status should be retrieved by using either **XEN_SYSCTL_XSPLICE_GET** or **XEN_SYSCTL_XSPLICE_LIST** hypercall. -The caller provides: +There are two ways about doing preemption. Either via returning back EBUSY +or the mechanism outlined here. + +Doing it in userland would remove any tracking of states in +the hypervisor - except the simple commands apply, unload, and revert. + +However we would not be able to patch all the code that is invoked while +this hypercall is in progress. That is - the do_domctl, the spinlocks, +anything put on the stack, etc. + +The disadvantage of the mechanism outlined here is that the hypervisor +code has to keep the state atomic and have an upper bound of time +on actions. If within the time the operation does not succeed the +operation would go in error state. * `id` the unique id. + * `time` the upper bound of time the cmd should take. Zero means infinite. * `cmd` the command requested: - 1. *XSPLICE_ACTION_CHECK* (0) check that the payload will apply properly. - 2. *XSPLICE_ACTION_UNLOAD* (1) unload the payload. - 3. *XSPLICE_ACTION_REVERT* (2) revert the payload. - 4. *XSPLICE_ACTION_APPLY* (3) apply the payload. - + 1. *XSPLICE_ACTION_CHECK* (1) check that the payload will apply properly. + 2. *XSPLICE_ACTION_UNLOAD* (2) unload the payload. + Any further hypercalls against the `id` will result in failure unless + **XEN_SYSCTL_XSPLICE_UPLOAD** hypercall is perfomed with same `id`. + 3. *XSPLICE_ACTION_REVERT* (3) revert the payload. If the operation takes + more time than the upper bound of time the `status` will EBUSY. + 4. *XSPLICE_ACTION_APPLY* (4) apply the payload. If the operation takes + more time than the upper bound of time the `status` will be EBUSY. + 5. *XSPLICE_ACTION_LOADED* is an initial state and cannot be requested. The return value will be zero unless the provided fields are incorrect. The structure is as follow:
-#define XSPLICE_ACTION_CHECK 0 -#define XSPLICE_ACTION_UNLOAD 1 -#define XSPLICE_ACTION_REVERT 2 -#define XSPLICE_ACTION_APPLY 3 +#define XSPLICE_ACTION_LOADED 0 +#define XSPLICE_ACTION_CHECK 1 +#define XSPLICE_ACTION_UNLOAD 2 +#define XSPLICE_ACTION_REVERT 3 +#define XSPLICE_ACTION_APPLY 4 struct xen_sysctl_xsplice_action { char id[40]; /* IN, name of the patch. */ + uint64_t time; /* IN, upper bound of time (ms) for the operation to take. */ uint32_t cmd; /* IN */ };+## State diagrams of XSPLICE_ACTION values. + +There is a strict ordering state of what the commands can be. +The XSPLICE_ACTION prefix has been dropped to easy reading: + +
+ /->\ + \ / + /-------< CHECK <--------\ + | | | + | + / + | +--->UNLOAD<--\ / + | / \ / + | / \/ + /-> APPLY -----------> REVERT --\ + | | + \-------------------------------/ + ++Or an state transition table of valid states: +
++-------+-------+--------+--------+---------+-------+------------------+ +| CHECK | APPLY | REVERT | UNLOAD | Current | Next | Result | ++-------+-------+--------+--------+---------+-------+------------------+ +| x | | | | LOADED | CHECK | Check payload. | ++-------+-------+--------+--------+---------+-------+------------------+ +| x | | | | CHECK | CHECK | Check payload. | ++-------+-------+--------+--------+---------+-------+------------------+ +| | x | | | CHECK | APPLY | Apply payload. | ++-------+-------+--------+--------+---------+-------+------------------+ +| | | | x | CHECK | UNLOAD| Unload payload. | ++-------+-------+--------+--------+---------+-------+------------------+ +| | | x | | APPLY | REVERT| Revert payload. | ++-------+-------+--------+--------+---------+-------+------------------+ +| | | | x | APPLY | UNLOAD| unload payload. | ++-------+-------+--------+--------+---------+-------+------------------+ +| | x | | | REVERT | APPLY | Apply payload. | ++-------+-------+--------+--------+---------+-------+------------------+ ++All the other states are invalid. + ## Sequence of events. The normal sequence of events is to: @@ -701,14 +952,8 @@ expand the code to place a most generic code in place - emit a special ELF .section header to tag this location. During run-time the hypervisor can leave the areas alone or patch them with an better suited opcodes. -As we might be patching the alternative assembler sections as well - by -providing a new better suited op-codes or perhaps with nops - we need to -also re-run the alternative assembler patching after we have done our -patching. - -Also when we are doing safety checks the code we are checking might be -utilizing alternative assembler. As such we should relax out checks to -accomodate that. +However these sections are part of .init. and as such can't reasonably be +subject to patching. ### .rodata sections @@ -735,7 +980,7 @@ If are doing trampoline patching we would need to: offset to the string. * mark the region RO when we are done. -### .bss sections +### .bss and .data sections. Patching writable data is not suitable as it is unclear what should be done depending on the current state of data. As such it should not be attempted. @@ -756,13 +1001,131 @@ a very small footprint. However if we need - we can always add two trampolines. One at the 2GB limit that calls the next trampoline. -### Time rendezvous code instead of stop_machine for patching +### When to patch + +During the discussion on the design two candidates bubbled where +the call stack for each CPU would be deterministic. This would +minimize the chance of the patch not being applied due to safety +checks failing. + +#### Rendezvous code instead of stop_machine for patching The hypervisor's time rendezvous code runs synchronously across all CPUs every second. Using the stop_machine to patch can stall the time rendezvous code and result in NMI. As such having the patching be done at the tail of rendezvous code should avoid this problem. +#### Before entering the guest code. + +Before we call VMXResume we check whether any soft IRQs need to be executed. +This is a good spot because all Xen stacks are effectively empty at +that point. + +To randezvous all the CPUs an barrier with an maximum timeout (which +could be adjusted), combined with forcing all other CPUs through the +hypervisor with IPIs, can be utilized to have all the CPUs be lockstep. + +The approach is similar in concept to stop_machine and the time rendezvous +but is time-bound. + +### Compiling the hypervisor code + +Hotpatch generation often requires support for compiling the target +with -ffunction-sections / -fdata-sections. Changes would have to +be done to the linker scripts to support this. + + +### Generation of xSplice ELF payloads + +The design of that is not discussed in this design. + +The author of this design envisions objdump and objcopy along +with special GCC parameters (see above) to create .o.xsplice files +which can be used to splice an ELF with the new payload. + +### Exception tables and symbol tables growth + +We may need support for adapting or augmenting exception tables if +patching such code. Hotpatches may need to bring their own small +exception tables (similar to how Linux modules support this). + +If supporting hotpatches that introduce additional exception-locations +is not important, one could also change the exception table in-place +and reorder it afterwards. + + +### xSplice interdependencies + +xSplice patches interdependencies are tricky. + +There are the ways this can be addressed: + * A single large patch that subsumes and replaces all previous ones. + Over the life-time of patching the hypervisor this large patch + grows to accumulate all the code changes. + * Hotpatch stack - where an mechanism exists that loads the hotpatches + in the same order they were built in. We would need an build-id + of the hypevisor to make sure the hot-patches are build against the + correct build. + * Payload containing the old code to check against that. That allows + the hotpatches to be loaded indepedently (if they don't overlap) - or + if the old code also containst previously patched code - even if they + overlap. + +The disadvantage of the first large patch is that it can grow over +time and not provide an bisection mechanism to identify faulty patches. + +The hot-patch stack puts stricts requirements on the order of the patches +being loaded and requires an hypervisor build-id to match against. + +The old code allows much more flexibility and an additional guard, +but is more complex to implement. + +### Hypervisor ID (buid-id) + +The build-id can help with: + + * Prevent loading of wrong hotpatches (intended for other builds) + + * Allow to identify suitable hotpatches on disk and help with runtime + tooling (if laid out using build ID) + +The build-id (aka hypervisor id) can be easily obtained by utilizing +the ld --build-id operatin which (copied from ld): + +
+--build-id + --build-id=style + Request creation of ".note.gnu.build-id" ELF note section. The contents of the note are unique bits identifying this + linked file. style can be "uuid" to use 128 random bits, "sha1" to use a 160-bit SHA1 hash on the normative parts of the + output contents, "md5" to use a 128-bit MD5 hash on the normative parts of the output contents, or "0xhexstring" to use a + chosen bit string specified as an even number of hexadecimal digits ("-" and ":" characters between digit pairs are + ignored). If style is omitted, "sha1" is used. + + The "md5" and "sha1" styles produces an identifier that is always the same in an identical output file, but will be + unique among all nonidentical output files. It is not intended to be compared as a checksum for the file's contents. A + linked file may be changed later by other tools, but the build ID bit string identifying the original linked file does + not change. + + Passing "none" for style disables the setting from any "--build-id" options earlier on the command line. + ++ +### Symbol names + + +Xen as it is now, has a couple of non-unique symbol names which will +make runtime symbol identification hard. Sometimes, static symbols +simply have the same name in C files, sometimes such symbols get +included via header files, and some C files are also compiled +multiple times and linked under different names (guest_walk.c). + +As such we need to modify the linker to make sure that the symbol +table qualifies also symbols by their source file name. + +For the awkward situations in which C-files are compiled multiple +times patches we would need to some modification in the Xen code. + ### Security Only the privileged domain should be allowed to do this operation. +