
[Xen-devel] [RFC PATCH v2 00/17] RFC: SGX Virtualization design and draft patches



Hi all,

This is v2 of the RFC SGX virtualization design and draft patches; you
can find v1 at:

    https://lists.gt.net/xen/devel/483404

In this new version, I fixed a few things according to the feedback on
the previous version (mostly cleanups and code movement).

Besides, Kai and I redesigned the SGX MSR setup part and introduced the
new XL parameters 'lehash' and 'lewr'.

Another big change is that I modified the EPC management to fit EPC
pages into 'struct page_info'; in patches #6 and #7, unscrubbable pages,
'PGC_epc', 'MEMF_epc' and 'XENZONE_EPC' are introduced, so that EPC
management is fully integrated into Xen's existing memory management.
This might be the controversial bit, so patches 6~8 are mainly meant to
show the idea and drive deeper discussion.

Detailed changes since v1 (modifications tagged "[New]" are totally new
in this series; reviews and comments are highly welcome for those
parts):

*   Make the SGX code mostly common for x86 by: 1) moving sgx.[ch] to
    arch/x86/ and include/asm-x86/ and 2) renaming EPC related functions
    with a domain_* prefix.

*   Rename ioremap_cache() to ioremap_wb() and make it x86-specific as
    suggested by Jan Beulich.

*   Remove the percpu sgx_cpudata; during boot, secondary CPUs now check
    whether they read a different value than the boot CPU, and if so SGX
    is disabled.

*   Remove domain_has_sgx_{,launch_control}, and rely on the domain's
    arch.cpuid->feat.sgx{_lc} for the relevant checks.

*   Clean up the CPUID handling code as suggested by Andrew Cooper.

*   Adjust to the msr_policy framework for SGX MSR handling, and remove
    unnecessary fields like 'readable' and 'writable'.

*   Use 'struct page_info' to maintain EPC pages, and [New] add a draft
    implementation employing the xenheap for EPC page management. Please
    see patches 6~8.

*   [New] Modify the XL parameter for SGX; please see section 2.1.1 in
    the updated design doc.

*   [New] Use the _set_vcpu_msrs hypercall in the toolstack to set the
    SGX related MSRs. Please see patch #17.

*   ACPI related tool changes are temporarily dropped in this patchset,
    as I need more time to resolve the comments and do related tests.

The updated design doc is as follows. As in the previous version, there
are some particular points in the design where we don't know which
implementation is better; for those, a question mark (?) is put at the
right of the menu entry. For SGX live migration, thanks to Wei Liu for
commenting in the previous review that it would be nice to support if we
can, but we'd like to hear more from you, so we still put a question
mark for this item. Your comments on those "question mark (?)" parts
(and other comments as well, of course) are highly appreciated.

===================================================================
1. SGX Introduction
    1.1 Overview
        1.1.1 Enclave
        1.1.2 EPC (Enclave Page Cache)
        1.1.3 ENCLS and ENCLU
    1.2 Discovering SGX Capability
        1.2.1 Enumerate SGX via CPUID
        1.2.2 Intel SGX Opt-in Configuration
    1.3 Enclave Life Cycle
        1.3.1 Constructing & Destroying Enclave
        1.3.2 Enclave Entry and Exit
            1.3.2.1 Synchronous Entry and Exit
            1.3.2.2 Asynchronous Enclave Exit
        1.3.3 EPC Eviction and Reload
    1.4 SGX Launch Control
    1.5 SGX Interaction with IA32 and Intel 64 Architecture
2. SGX Virtualization Design
    2.1 High Level Toolstack Changes
        2.1.1 New 'sgx' XL configure file parameter
        2.1.2 New XL commands (?)
        2.1.3 Notify domain's virtual EPC base and size to Xen
    2.2 High Level Hypervisor Changes
        2.2.1 EPC Management
        2.2.2 EPC Virtualization
        2.2.3 Populate EPC for Guest
        2.2.4 Launch Control Support
        2.2.5 CPUID Emulation
        2.2.6 EPT Violation & ENCLS Trapping Handling
        2.2.7 Guest Suspend & Resume
        2.2.8 Destroying Domain
    2.3 Additional Point: Live Migration, Snapshot Support (?)
3. Reference

1. SGX Introduction

1.1 Overview

1.1.1 Enclave

Intel Software Guard Extensions (SGX) is a set of instructions and memory
access mechanisms that provide secure accesses for sensitive applications and
data. SGX allows an application to use a particular part of its address space
as an *enclave*, a protected area that provides confidentiality and integrity
even in the presence of privileged malware. Accesses to the enclave memory
area from any software not resident in the enclave are prevented, including
those from privileged software. The diagram below illustrates an enclave
within an application.

        |-----------------------|
        |                       |
        |   |---------------|   |
        |   |   OS kernel   |   |       |-----------------------|
        |   |---------------|   |       |                       |
        |   |               |   |       |   |---------------|   |
        |   |---------------|   |       |   | Entry table   |   |
        |   |   Enclave     |---|-----> |   |---------------|   |
        |   |---------------|   |       |   | Enclave stack |   |
        |   |   App code    |   |       |   |---------------|   |
        |   |---------------|   |       |   | Enclave heap  |   |
        |   |   Enclave     |   |       |   |---------------|   |
        |   |---------------|   |       |   | Enclave code  |   |
        |   |   App code    |   |       |   |---------------|   |
        |   |---------------|   |       |                       |
        |           |           |       |-----------------------|
        |-----------------------|

SGX comprises the SGX1 and SGX2 extensions. SGX1 provides basic enclave
support, and SGX2 allows additional flexibility in runtime management of
enclave resources and thread execution within an enclave.

1.1.2 EPC (Enclave Page Cache)

Just like normal application memory management, enclave memory management can
be divided into two parts: address space allocation and memory commitment.
Address space allocation is allocating a particular range of linear address
space for the enclave. Memory commitment is assigning actual resources to the
enclave.

Enclave Page Cache (EPC) is the physical resource committed to enclaves. EPC
is divided into 4K pages: an EPC page is 4K in size and always aligned to a
4K boundary. Hardware performs additional access control checks to restrict
access to EPC pages. The Enclave Page Cache Map (EPCM) is a secure structure
that holds one entry for each EPC page and is used by hardware to track the
status of each EPC page (it is invisible to software). Typically EPC and EPCM
are reserved by BIOS as Processor Reserved Memory, but the actual amount,
size, and layout of EPC are model-specific and dependent on BIOS settings.
EPC is enumerated via the new SGX CPUID leaf, and is reported as reserved
memory.

EPC pages can either be invalid or valid. There are 4 valid EPC page types in
SGX1: regular EPC page, SGX Enclave Control Structure (SECS) page, Thread
Control Structure (TCS) page, and Version Array (VA) page. SGX2 adds the
Trimmed EPC page. Each enclave is associated with one SECS page. Each thread
in an enclave is associated with one TCS page. VA pages are used in EPC page
eviction and reload. The Trimmed EPC page is introduced in SGX2 for when a
particular 4K page in an enclave is going to be freed (trimmed) at runtime
after the enclave is initialized.

1.1.3 ENCLS and ENCLU

Two new instructions, ENCLS and ENCLU, are introduced to manage enclaves and
EPC. ENCLS can only run in ring 0, while ENCLU can only run in ring 3. Both
ENCLS and ENCLU have multiple leaf functions, with EAX indicating the
specific leaf function.

SGX1 supports the below ENCLS and ENCLU leaves:

    ENCLS:
    - ECREATE, EADD, EEXTEND, EINIT, EREMOVE (Enclave build and destroy)
    - EPA, EBLOCK, ETRACK, EWB, ELDU/ELDB (EPC eviction & reload)

    ENCLU:
    - EENTER, EEXIT, ERESUME (Enclave entry, exit, re-enter)
    - EGETKEY, EREPORT (SGX key derivation, attestation)

Additionally, SGX2 supports the below ENCLS and ENCLU leaves for adding and
removing EPC pages to/from an enclave at runtime after the enclave is
initialized, along with permission changes.

    ENCLS:
    - EAUG, EMODT, EMODPR
    
    ENCLU:
    - EACCEPT, EACCEPTCOPY, EMODPE

A VMM is able to interfere with ENCLS running in a guest (see 1.5.1 VMX
Changes for Supporting SGX Virtualization) but is unable to interfere with
ENCLU.

1.2 Discovering SGX Capability

1.2.1 Enumerate SGX via CPUID

If CPUID.0x7.0x0:EBX.SGX (bit 2) is 1, then the processor supports SGX, and
SGX capabilities and resources can be enumerated via the new SGX CPUID leaf
(0x12). CPUID.0x12.0x0 reports SGX capabilities, such as the presence of SGX1
and SGX2 and the enclave's maximum size for both 32-bit and 64-bit
applications. CPUID.0x12.0x1 reports the availability of bits that can be set
in SECS.ATTRIBUTES. CPUID.0x12.0x2 reports the EPC resource's base and size.
A platform may support multiple EPC sections, and CPUID.0x12.0x3 and further
sub-leaves can be used to detect the existence of multiple EPC sections
(until CPUID reports an invalid EPC).

Refer to SDM chapter 37.7.2 Intel SGX Resource Enumeration Leaves for a full
description of SGX CPUID leaf 0x12.
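
For illustration, below is a minimal, self-contained sketch of the
enumeration loop (not code from the patches; bit layouts per SDM, with the
cpuid_count() helper written out inline):

    #include <stdint.h>

    static void cpuid_count(uint32_t leaf, uint32_t subleaf, uint32_t *a,
                            uint32_t *b, uint32_t *c, uint32_t *d)
    {
        asm volatile ("cpuid"
                      : "=a" (*a), "=b" (*b), "=c" (*c), "=d" (*d)
                      : "0" (leaf), "2" (subleaf));
    }

    static void enumerate_epc(void)
    {
        uint32_t eax, ebx, ecx, edx, subleaf;

        for ( subleaf = 2; ; subleaf++ )
        {
            cpuid_count(0x12, subleaf, &eax, &ebx, &ecx, &edx);
            if ( (eax & 0xf) != 1 )   /* EAX[3:0] == 1: valid EPC section */
                break;
            /* base/size are split: low bits in EAX/ECX[31:12], high bits
             * in EBX/EDX[19:0]. */
            uint64_t base = ((uint64_t)(ebx & 0xfffff) << 32) |
                            (eax & 0xfffff000);
            uint64_t size = ((uint64_t)(edx & 0xfffff) << 32) |
                            (ecx & 0xfffff000);
            register_epc_section(base, size);   /* hypothetical consumer */
        }
    }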

1.2.2 Intel SGX Opt-in Configuration

On processors that support Intel SGX, IA32_FEATURE_CONTROL provides the
SGX_ENABLE bit (bit 18) to turn SGX on/off. Before system software can enable
and use SGX, BIOS is required to set IA32_FEATURE_CONTROL.SGX_ENABLE = 1 to
opt in to SGX.

Setting SGX_ENABLE follows the rules of IA32_FEATURE_CONTROL.LOCK (bit 0).
Software is considered to have opted into Intel SGX if and only if
IA32_FEATURE_CONTROL.SGX_ENABLE and IA32_FEATURE_CONTROL.LOCK are set to 1.

The setting of IA32_FEATURE_CONTROL.SGX_ENABLE (bit 18) is not reflected by
the SGX CPUID. Enclave instructions will behave differently according to the
value of CPUID.0x7.0x0:EBX.SGX and whether BIOS has opted in to SGX.

Refer to 37.7.1 Intel SGX Opt-in Configuration for more information.
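
As a sketch, the opt-in state can be tested as below (bit layout per SDM;
the rdmsr() helper is written out inline and is not Xen's):

    #include <stdbool.h>
    #include <stdint.h>

    #define MSR_IA32_FEATURE_CONTROL    0x0000003a
    #define FEATURE_CONTROL_LOCK        (1ULL << 0)
    #define FEATURE_CONTROL_SGX_ENABLE  (1ULL << 18)

    static uint64_t rdmsr(uint32_t msr)
    {
        uint32_t lo, hi;

        asm volatile ("rdmsr" : "=a" (lo), "=d" (hi) : "c" (msr));
        return ((uint64_t)hi << 32) | lo;
    }

    static bool sgx_opted_in(void)
    {
        uint64_t fc = rdmsr(MSR_IA32_FEATURE_CONTROL);

        /* Opted in iff both LOCK (bit 0) and SGX_ENABLE (bit 18) are set. */
        return (fc & FEATURE_CONTROL_LOCK) &&
               (fc & FEATURE_CONTROL_SGX_ENABLE);
    }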

1.3 Enclave Life Cycle

1.3.1 Constructing & Destroying Enclave

An enclave is created via the ENCLS[ECREATE] leaf by privileged software.
Basically, ECREATE converts an invalid EPC page into a SECS page according to
a source SECS structure residing in normal memory. The source SECS contains
the enclave's info such as its base (linear) address, size, attributes,
measurement, etc.

After ECREATE, for each 4K page of the enclave's linear address space,
privileged software uses EADD and EEXTEND to add one EPC page to it. Enclave
code/data (residing in normal memory) is loaded into the enclave during EADD
for each of the enclave's 4K pages. After all EPC pages are added to the
enclave, privileged software calls EINIT to initialize the enclave, and then
the enclave is ready to run.

While the enclave is being constructed, the enclave measurement, which is a
SHA256 hash value, is also built according to the enclave's size, its
code/data and their location in the enclave, etc. The measurement can be used
to uniquely identify the enclave. The SIGSTRUCT passed to the EINIT leaf also
contains the measurement specified by untrusted software, via MRENCLAVE.
EINIT will check the two measurements and will only succeed when the two
match.

An enclave is destroyed by running EREMOVE on all of the enclave's EPC pages,
and then on the enclave's SECS. EREMOVE will report an SGX_CHILD_PRESENT
error if it is called on the SECS while there are still regular EPC pages
that haven't been removed from the enclave.

Please refer to SDM chapter 39.1 Constructing an Enclave for more information.
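
For illustration, the encls() wrapper below is a stand-in sketch (leaf
numbers and register usage per SDM), not code from the patches; the build
flow described above reduces to the comment at the end:

    #include <stdint.h>

    enum encls_leaf {
        ECREATE = 0x0, EADD = 0x1, EINIT = 0x2, EREMOVE = 0x3, EEXTEND = 0x6,
    };

    /* RAX selects the leaf; RBX/RCX/RDX carry leaf-specific parameters,
     * and the error code comes back in EAX. */
    static inline uint32_t encls(uint32_t leaf, uint64_t rbx, uint64_t rcx,
                                 uint64_t rdx)
    {
        uint32_t ret;

        asm volatile (".byte 0x0f, 0x01, 0xcf"   /* ENCLS */
                      : "=a" (ret)
                      : "a" (leaf), "b" (rbx), "c" (rcx), "d" (rdx)
                      : "memory");
        return ret;
    }

    /* Build flow: ECREATE(pageinfo, secs) once; then, for each 4K page,
     * EADD(pageinfo, epc_page) followed by EEXTEND over each 256-byte
     * chunk to be measured; finally EINIT(sigstruct, secs, einittoken). */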

1.3.2 Enclave Entry and Exit

1.3.2.1 Synchronous Entry and Exit

After the enclave is constructed, non-privileged software uses ENCLU[EENTER]
to enter the enclave. While the process runs in the enclave, non-privileged
software can use ENCLU[EEXIT] to exit from the enclave and return to normal
mode.

1.3.2.2 Asynchronous Enclave Exit

Asynchronous and synchronous events, such as exceptions, interrupts, traps,
SMIs, and VM exits may occur while executing inside an enclave. These events
are referred to as Enclave Exiting Events (EEE). Upon an EEE, the processor
state is securely saved inside the enclave and then replaced by a synthetic
state to prevent leakage of secrets. The process of securely saving state and
establishing the synthetic state is called an Asynchronous Enclave Exit (AEX).

After an AEX, non-privileged software uses ENCLU[ERESUME] to re-enter the
enclave. The SGX userspace software maintains a small piece of code (residing
in normal memory) which basically calls ERESUME to re-enter the enclave. The
address of this piece of code is called the Asynchronous Exit Pointer (AEP).
The AEP is specified as a parameter to EENTER and is kept internally in the
enclave. Upon AEX, the AEP is pushed onto the stack, and upon return from the
EEE handling, such as via IRET, the AEP is loaded into RIP and ERESUME is
subsequently called to re-enter the enclave.

During AEX the processor does the context saving and restoring automatically,
so no change to the interrupt handling of the OS kernel or VMM is required.
It is the SGX userspace software's responsibility to set up the AEP
correctly.

Please refer to SDM chapter 39.2 Enclave Entry and Exit for more information.

1.3.3 EPC Eviction and Reload

SGX also allows privileged software to evict any EPC pages that are used by
an enclave. The idea is the same as normal memory swapping.

Below is the sequence to evict a regular EPC page:

        1) Select one or multiple regular EPC pages from one enclave
        2) Remove EPT/PT mapping for selected EPC pages
        3) Send IPIs to remote CPUs to flush TLB of selected EPC pages
        4) EBLOCK on selected EPC pages
        5) ETRACK on enclave's SECS page
        6) allocate one available slot (8-byte) in VA page
        7) EWB on selected EPC pages

With EWB taking:

        - VA slot, to store the eviction version info.
        - one normal 4K page in memory, to store encrypted content of EPC page.
        - one struct PCMD in memory, to store meta data.

    (A VA slot is an 8-byte slot in a VA page, which is a particular type of
    EPC page.)

And below is the sequence to evict an SECS page or VA page:

        1) locate the SECS (or VA) page
        2) remove EPT/PT mapping for the SECS (or VA) page
        3) Send IPIs to remote CPUs
        4) allocate one available slot (8-byte) in a VA page
        5) EWB on the SECS (or VA) page

And for evicting an SECS page, all regular EPC pages that belong to that SECS
must be evicted first, otherwise EWB returns the SGX_CHILD_PRESENT error.

And to reload an EPC page:

        1) ELDU/ELDB on EPC page
        2) setup EPT/PT mapping

With ELDU/ELDB taking:

        - location of SECS page
        - linear address of enclave's 4K page (that we are going to reload to)
        - VA slot (used in EWB)
        - 4K page in memory (used in EWB)
        - struct PCMD in memory (used in EWB)

Please refer to SDM chapter 39.5 EPC and Management of EPC pages for more
information.
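
As an illustration, a sketch of steps 4)-7) of the regular-page eviction
sequence above, reusing the encls() wrapper sketched in 1.3.1 (leaf numbers
per SDM; the caller has already removed the mappings and picked a VA slot):

    enum { EBLOCK = 0x9, EPA = 0xa, EWB = 0xb, ETRACK = 0xc };

    static int evict_one_page(uint64_t page_va, uint64_t secs_va,
                              void *pageinfo,   /* -> PCMD + backing page */
                              uint64_t va_slot)
    {
        encls(EBLOCK, 0, page_va, 0);   /* 4) block new TLB mappings      */
        encls(ETRACK, 0, secs_va, 0);   /* 5) track on the enclave's SECS */
        /* ... IPIs so no logical processor still runs in the enclave ... */
        return encls(EWB, (uint64_t)pageinfo, page_va, va_slot);   /* 7) */
    }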

1.4 SGX Launch Control

SGX requires running the "Launch Enclave" (LE) before running any other
enclave. This is because the LE is the only enclave that does not require an
EINITTOKEN in EINIT. Running any other enclave requires a valid EINITTOKEN,
which contains a MAC of (the first 192 bytes of) the EINITTOKEN calculated
with the EINITTOKEN key. EINIT verifies the MAC by internally deriving the
EINITTOKEN key, and only an EINITTOKEN with a matching MAC is accepted by
EINIT. The EINITTOKEN key derivation depends on some info from the LE. The
typical process is that the LE generates the EINITTOKEN for another enclave
according to the LE itself and the target enclave, and calculates the MAC
using the EINITTOKEN key obtained via ENCLU[EGETKEY]. Only the LE is able to
get the EINITTOKEN key.

Running an LE requires the SHA256 hash of the LE signer's RSA public key
(SHA256 of sigstruct->modulus) to equal IA32_SGXLEPUBKEYHASH[0-3] (the 4 MSRs
together make up the 256-bit SHA256 hash value).

If CPUID.0x7.0x0:EBX.SGX and CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL (bit 30)
are set, then the IA32_FEATURE_CONTROL MSR has the SGX_LAUNCH_CONTROL_ENABLE
bit (bit 17) available. 1-setting of the SGX_LAUNCH_CONTROL_ENABLE bit
enables runtime change of IA32_SGXLEPUBKEYHASHn after IA32_FEATURE_CONTROL is
locked. Otherwise, IA32_SGXLEPUBKEYHASHn are read-only after
IA32_FEATURE_CONTROL is locked. After reset, IA32_SGXLEPUBKEYHASHn are set to
the hash of Intel's default key. On systems that have only
CPUID.0x7.0x0:EBX.SGX set, IA32_SGXLEPUBKEYHASHn are not available. On such
systems EINIT will always treat IA32_SGXLEPUBKEYHASHn as having Intel's
default value, thus only Intel's LE is able to run.

On systems with IA32_SGXLEPUBKEYHASHn available, it is up to the BIOS
implementation whether to provide configuration options for the user to set
IA32_SGXLEPUBKEYHASHn in *locked* mode (IA32_SGXLEPUBKEYHASHn are read-only
after IA32_FEATURE_CONTROL is locked) or *unlocked* mode
(IA32_SGXLEPUBKEYHASHn are writable by the kernel at runtime). The BIOS may
or may not provide configuration options to allow the user to set a custom
value of IA32_SGXLEPUBKEYHASHn.
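
Conceptually, the check EINIT performs on the LE boils down to the sketch
below (sha256() is a stand-in for any SHA256 implementation, rdmsr() is as
sketched in 1.2.2, and MSRs 0x8c-0x8f are IA32_SGXLEPUBKEYHASH0..3):

    static bool le_pubkey_hash_matches(const uint8_t modulus[384])
    {
        uint64_t hash[4];   /* 256-bit SHA256 as 4 x 64-bit words */
        unsigned int i;

        sha256(modulus, 384, (uint8_t *)hash);
        for ( i = 0; i < 4; i++ )
            if ( hash[i] != rdmsr(0x8c + i) )
                return false;   /* LE signed by a non-matching key */
        return true;
    }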

1.5 SGX Interaction with IA32 and Intel 64 Architecture

SDM Chapter 42 describes SGX interaction with various features of the IA32
and Intel 64 architecture. Below outlines the major ones. Refer to Chapter 42
for the full description of SGX interaction with the various IA32 and Intel
64 features.

1.5.1 VMX Changes for Supporting SGX Virtualization

A new 64-bit ENCLS-exiting bitmap control field is added to the VMCS
(encoding 0202EH) to control VMEXIT on ENCLS leaf functions. And a new
"Enable ENCLS exiting" control bit (bit 15) is defined in the secondary
processor-based VM-execution controls. 1-setting of "Enable ENCLS exiting"
enables the ENCLS-exiting bitmap control. The ENCLS-exiting bitmap controls
which ENCLS leaves will trigger VMEXIT.
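
In a VMM this would look roughly like the fragment below (a sketch only:
field encodings per SDM, __vmwrite() standing for the usual VMCS write
helper, and 'ctls' for the cached secondary controls):

    #define SECONDARY_EXEC_ENABLE_ENCLS_EXITING  (1U << 15)
    #define ENCLS_EXITING_BITMAP                 0x0000202eUL

    ctls |= SECONDARY_EXEC_ENABLE_ENCLS_EXITING;
    __vmwrite(SECONDARY_VM_EXEC_CONTROL, ctls);

    /* e.g. trap only EINIT (leaf 2) and let all other leaves run natively: */
    __vmwrite(ENCLS_EXITING_BITMAP, 1ULL << 2);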

Additionally, two new bits are added to indicate whether a VMEXIT (of any
kind) is from an enclave. The below two bits will be set if the VMEXIT is
from an enclave:
    - Bit 27 in the Exit Reason field of the Basic VM-exit Information.
    - Bit 4 in the Interruptibility State of the Guest Non-Register State of
      the VMCS.

Refer to 42.5 Interactions with VMX, 27.2.1 Basic VM-Exit Information, and
27.3.4 Saving Non-Register State.

1.5.2 Interaction with XSAVE

SGX defines a sub-field called X-Feature Request Mask (XFRM) in the
attributes field of the SECS. On enclave entry, SGX hardware verifies that
the features set in SECS.ATTRIBUTES.XFRM are already enabled in XCR0.

Upon AEX, SGX saves the processor extended state and miscellaneous state to
the enclave's State Save Area (SSA), and clears the secrets from any
processor extended state used by the enclave (to keep them from leaking).

Refer to 42.7 Interaction with Processor Extended State and Miscellaneous
State.

1.5.3 Interaction with S States

When the processor goes into S3-S5 state, EPC is destroyed, and thus all
enclaves are destroyed as well.

Refer to 42.14 Interaction with S States.

2. SGX Virtualization Design

2.1 High Level Toolstack Changes:

2.1.1 New 'sgx' XL configure file parameter

EPC is a limited resource. In order to use EPC efficiently among all domains,
the administrator should be able to specify a domain's virtual EPC size when
creating the guest, and should also be able to get every domain's virtual EPC
size.

For SGX Launch Control virtualization, we should allow the admin to create a
VM with the VM's virtual IA32_SGXLEPUBKEYHASHn either locked or unlocked, and
we should also allow the admin to create a VM with a custom
IA32_SGXLEPUBKEYHASHn value.

For the above purposes, the below new 'sgx' XL configure file parameter is
added:

        sgx = 'epc=<size>,lehash=<sha256-hash>,lewr=<0|1>'

Here 'epc' specifies the VM's EPC size in MB; it is mandatory.

When the physical machine is in *locked* mode, neither 'lehash' nor 'lewr'
can be specified, as the physical machine is unable to change
IA32_SGXLEPUBKEYHASHn at runtime. Adding either 'lehash' or 'lewr' will cause
VM creation to fail in that case, and the VM's initial IA32_SGXLEPUBKEYHASHn
value will be set to the value of the physical MSRs.

When the physical machine is in *unlocked* mode, the VM's initial
IA32_SGXLEPUBKEYHASHn value will be set to 'lehash' if specified, or to
Intel's default value otherwise. The VM's SGX_LAUNCH_CONTROL_ENABLE bit in
IA32_FEATURE_CONTROL will be set or cleared depending on whether 'lewr' is
specified (or explicitly set to true or false).
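
For example (hypothetical values; the 'lehash' value is the SHA256 hash of
the LE signer's public key, as in the syntax above):

    sgx = 'epc=64'                             # 64MB EPC, defaults otherwise
    sgx = 'epc=64,lewr=1'                      # hash MSRs writable at runtime
    sgx = 'epc=64,lehash=<sha256-hash>,lewr=0' # custom hash, not writable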

Please also refer to 2.2.4 Launch Control Support.

2.1.2 New XL commands (?)

The administrator should be able to get the physical EPC size, and all
domains' virtual EPC sizes. For this purpose, we can introduce 2 additional
commands:

    # xl sgxinfo

Which will print out the physical EPC size, and other SGX info (such as SGX1,
SGX2, etc) if necessary.

    # xl sgxlist <did>

Which will print out a particular domain's virtual EPC size, or list the
virtual EPC sizes of all supported domains.

Alternatively, we can also extend existing XL commands by adding new options:

    # xl info -sgx

Which will print out the physical EPC size along with other physinfo. And

    # xl list <did> -sgx

Which will print out the domain's virtual EPC size.

Comments?

In this RFC the two new commands are not implemented yet.

2.1.3 Notify domain's virtual EPC base and size to Xen

Xen needs to know the guest's EPC base and size in order to populate EPC
pages for it. The toolstack notifies Xen of the EPC base and size via
XEN_DOMCTL_set_cpuid.

2.2 High Level Xen Hypervisor Changes:

2.2.1 EPC Management

The Xen hypervisor needs to detect SGX, discover EPC, and manage EPC before
exposing SGX to guests. EPC is detected via SGX CPUID 0x12.0x2. It's possible
that there are multiple EPC sections (enumerated via sub-leaves 0x3 and so
on, until an invalid EPC is reported), but this is typically on multi-socket
servers, on which each package would have its own EPC.

EPC is reported as reserved memory (so it is not reported as normal memory).
EPC must be managed in 4K pages. The CPU hardware uses the EPCM to track the
status of each EPC page. Xen needs to manage EPC and provide functions to,
e.g., allocate and free EPC pages for guests.

Although typically on physical machines (at least existing ones) EPC is
~100MB in size at maximum, we cannot assume the EPC size, so in terms of EPC
management it's better to integrate EPC management into Xen's memory
management framework to take advantage of Xen's existing memory management
algorithms.

Specifically, one 'struct page_info' will be created for each EPC page, just
like normal memory, and a new flag will be defined to identify whether a
'struct page_info' is EPC or normal memory. The existing memory allocation
API alloc_domheap_pages will be reused to allocate EPC pages, by adding a new
memflag 'MEMF_epc' to indicate EPC allocation rather than memory allocation.
The new 'MEMF_epc' can also be used for EPC ballooning (if required in the
future), as with the new flag the existing
XENMEM_increase{decrease}_reservation and XENMEM_populate_physmap can be
reused for EPC as well.
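
As a sketch of what allocation would then look like (MEMF_epc as proposed in
patches 6~8; alloc_domheap_pages() is the existing Xen allocator):

    static struct page_info *alloc_epc_page(struct domain *d)
    {
        /* Same allocator as normal memory; MEMF_epc routes the request
         * to the EPC zone instead of a normal heap zone. */
        struct page_info *pg = alloc_domheap_pages(d, 0, MEMF_epc);

        if ( !pg )
            return NULL;    /* no free EPC pages left */
        return pg;          /* backed by a 4K EPC page (PGC_epc set) */
    }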

2.2.2 EPC Virtualization

This part covers how to populate EPC for guests. We have 3 choices:
    - Static Partitioning
    - Oversubscription
    - Ballooning

Static Partitioning means all EPC pages will be allocated and mapped to the
guest when it is created, and there's no runtime change of page table
mappings for EPC pages. Oversubscription means the Xen hypervisor supports
EPC page swapping between domains: Xen is able to evict an EPC page from one
domain and assign it to a domain that needs EPC. With oversubscription, EPC
can be assigned to a domain on demand, when an EPT violation happens.
Ballooning is similar to memory ballooning; it is basically "Static
Partitioning" + a balloon driver in the guest.

Static Partitioning is the easiest in terms of implementation, and there will
be no hypervisor overhead (except EPT overhead of course), because with
static partitioning there are no EPT violations for EPC, and Xen doesn't need
to turn on ENCLS VMEXIT for the guest, as ENCLS runs perfectly in non-root
mode.

Ballooning is "Static Partitioning" + a balloon driver in the guest. Like
static partitioning, ballooning doesn't need to turn on ENCLS VMEXIT, and
doesn't have EPT violations for EPC either. To support ballooning, we need a
balloon driver in the guest to issue hypercalls to give up or reclaim EPC
pages. In terms of the hypercall, we have two choices: 1) add a new hypercall
for EPC ballooning; 2) use the existing
XENMEM_{increase/decrease}_reservation with a new memory flag, i.e.,
XENMEMF_epc. I'll discuss adding a dedicated hypercall (or not) more later.

Oversubscription looks nice but requires a more complicated implementation.
Firstly, as explained in 1.3.3 EPC Eviction and Reload, we need to follow
specific steps to evict EPC pages, and in order to do that, Xen basically
needs to trap ENCLS from the guest and keep track of EPC page status and
enclave info from all guests. This is because:
    - To evict a regular EPC page, Xen needs to know the SECS location.
    - Xen needs to know the EPC page type: evicting a regular EPC page and
      evicting a SECS or VA page take different steps.
    - Xen needs to know the EPC page status: whether the page is blocked or
      not.

This information can only be obtained by trapping ENCLS from the guest and
parsing its parameters (to identify the SECS page, etc). Parsing ENCLS
parameters means we need to know which ENCLS leaf is being trapped, and we
need to translate the guest's virtual addresses to get the physical addresses
in order to locate the EPC pages. And once ENCLS is trapped, we have to
emulate ENCLS in Xen, which means we need to reconstruct the ENCLS parameters
by remapping all of the guest's virtual addresses to Xen's virtual addresses
(gva->gpa->pa->xen_va), as ENCLS always uses *effective addresses* which are
translated by the processor when running ENCLS.

    --------------------------------------------------------------
                |   ENCLS   |
    --------------------------------------------------------------
                |          /|\
    ENCLS VMEXIT|           | VMENTRY
                |           |
               \|/          |

                1) parse ENCLS parameters
                2) reconstruct(remap) guest's ENCLS parameters
                3) run ENCLS on behalf of guest (and skip ENCLS)
                4) on success, update EPC/enclave info, or inject error

And Xen needs to maintain each EPC page's status (type, blocked or not, in an
enclave or not, etc). Xen also needs to maintain all enclaves' info from all
guests, in order to find the correct SECS for a regular EPC page, as well as
the enclave's linear address.
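
Put together, the emulation path in the diagram above would have roughly the
shape below (a skeleton only; vmx_handle_encls_vmexit() and
remap_guest_param() are hypothetical names, and encls() is the wrapper
sketched in 1.3.1):

    static void vmx_handle_encls_vmexit(struct cpu_user_regs *regs)
    {
        uint32_t leaf = regs->eax;

        /* 1)+2) parse and remap each guest effective address the leaf
         * uses: gva -> gpa -> pa -> xen_va */
        uint64_t rbx = remap_guest_param(regs->rbx);
        uint64_t rcx = remap_guest_param(regs->rcx);
        uint64_t rdx = remap_guest_param(regs->rdx);

        /* 3) run the same leaf on behalf of the guest ... */
        regs->eax = encls(leaf, rbx, rcx, rdx);

        /* 4) ... then update the tracked EPC/enclave state on success
         * (or inject the error) and advance guest RIP past the ENCLS. */
    }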

So in general, "Static Partitioning" has the simplest implementation but is
obviously not the best way to use EPC efficiently; "Ballooning" has all the
pros of Static Partitioning but requires a guest balloon driver;
"Oversubscription" is the best in terms of flexibility but requires a
complicated hypervisor implementation.

We will start with "Static Partitioning". If "Ballooning" is required in the
future, we will support it. "Oversubscription" should not be needed in the
foreseeable future.

2.2.3 Populate EPC for Guest

The toolstack notifies Xen about the domain's EPC base and size via
XEN_DOMCTL_set_cpuid, so currently Xen populates all EPC pages for the guest
in XEN_DOMCTL_set_cpuid, specifically when handling XEN_DOMCTL_set_cpuid for
CPUID.0x12.0x2. Once Xen has checked that the values passed from the
toolstack are valid, Xen will allocate all EPC pages and set up EPT mappings
for the guest.
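
A sketch of that populate path (domain_populate_epc() is a hypothetical name;
guest_physmap_add_entry() and the new p2m_epc type are used as in the
patches, with the exact signatures treated as illustrative, and error
unwinding omitted):

    static int domain_populate_epc(struct domain *d, unsigned long gfn_base,
                                   unsigned long nr_pages)
    {
        unsigned long i;

        for ( i = 0; i < nr_pages; i++ )
        {
            struct page_info *pg = alloc_domheap_pages(d, 0, MEMF_epc);

            if ( !pg )
                return -ENOMEM;     /* caller unwinds what was mapped */

            /* Map the EPC page into the guest with the new p2m_epc type. */
            guest_physmap_add_entry(d, _gfn(gfn_base + i),
                                    _mfn(page_to_mfn(pg)), 0, p2m_epc);
        }

        return 0;
    }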

2.2.4 Launch Control Support

To support running multiple domains, each running its own LE signed by a
different owner, the physical machine's BIOS must leave
IA32_SGXLEPUBKEYHASHn *unlocked* before handing over to Xen. Xen will trap
the domain's writes to IA32_SGXLEPUBKEYHASHn, keep the values per-vcpu
internally, and update the values to the physical MSRs when the vcpu is
scheduled in. This guarantees that when EINIT runs in the guest, the guest's
virtual IA32_SGXLEPUBKEYHASHn have been written to the physical MSRs.

The SGX_LAUNCH_CONTROL_ENABLE bit in the guest's IA32_FEATURE_CONTROL is
controlled by the newly added 'lewr' XL parameter (see 2.1.1 New 'sgx' XL
configure file parameter).

If the physical IA32_SGXLEPUBKEYHASHn are *locked* by the machine's BIOS,
then only MSR reads are allowed from the guest, and Xen will inject an error
for the guest's MSR writes.

In addition, if the physical IA32_SGXLEPUBKEYHASHn are *locked*, then
creating a guest with the 'lehash' parameter or 'lewr' will fail, as in such
a case Xen is not able to update the guest's virtual IA32_SGXLEPUBKEYHASHn to
the physical MSRs.

If the physical IA32_SGXLEPUBKEYHASHn are not available
(CPUID.0x7.0x0:ECX.SGX_LAUNCH_CONTROL is not present), then creating a VM
with 'lehash' or 'lewr' will also fail. In addition, any MSR read/write of
IA32_SGXLEPUBKEYHASHn from the guest is invalid and Xen will inject an error
in such a case.
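
A sketch of the resulting WRMSR handling (all names besides the X86EMUL_*
return codes are hypothetical; MSRs 0x8c..0x8f are IA32_SGXLEPUBKEYHASH0..3):

    static int sgx_wrmsr_lepubkeyhash(struct vcpu *v, uint32_t msr,
                                      uint64_t val)
    {
        /* Writable only if the machine is unlocked *and* 'lewr' was set. */
        if ( !v->domain->arch.sgx_lewr )
            return X86EMUL_EXCEPTION;               /* inject #GP */

        v->arch.sgx_lepubkeyhash[msr - 0x8c] = val; /* kept per-vcpu; */
        return X86EMUL_OKAY;    /* flushed to the physical MSRs when the
                                   vcpu is scheduled in, before any EINIT
                                   can run in the guest */
    }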

2.2.5 CPUID Emulation

Most of the native SGX CPUID info can be exposed to the guest, except the
below two parts:
    - Sub-leaf 0x2 needs to report the domain's virtual EPC base and size,
      instead of the physical EPC info.
    - Sub-leaf 0x1 needs to be consistent with the guest's XCR0. For the
      reason behind this, please refer to 1.5.2 Interaction with XSAVE.

2.2.6 EPT Violation & ENCLS Trapping Handling

This is only needed when Xen supports EPC oversubscription, as explained
above.

2.2.7 Guest Suspend & Resume

On hardware, EPC is destroyed when power goes to S3-S5. So Xen will destroy
the guest's EPC when the guest's power goes into S3-S5. Currently Xen is
notified by QEMU of S state changes via HVM_PARAM_ACPI_S_STATE, at which
point Xen will destroy the EPC if the S state is S3-S5.

Specifically, Xen will run EREMOVE on each of the guest's EPC pages, as the
guest may not handle EPC suspend & resume correctly, in which case the
guest's EPC pages may physically still be valid; Xen needs to run EREMOVE to
make sure all EPC pages become invalid. Otherwise further operations on EPC
in the guest may fault, as the guest assumes all EPC pages are invalid after
it is resumed.

For SECS pages, EREMOVE may fail with SGX_CHILD_PRESENT, in which case Xen
will put the SECS page on a list and run EREMOVE on it again after all other
EPC pages have had EREMOVE run on them. This time EREMOVE on the SECS will
succeed, as all children (regular EPC pages) have already been removed.
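
A sketch of that two-pass teardown (for_each_epc_page(), epc_page_va() and
defer_secs()/for_each_deferred_secs() are hypothetical helpers; encls() and
the EREMOVE leaf are as sketched in 1.3.1, and SGX_CHILD_PRESENT is the SDM
error code):

    #define SGX_CHILD_PRESENT 13

    static void domain_reset_epc(struct domain *d)
    {
        struct page_info *pg;

        /* Pass 1: EREMOVE everything; a SECS that still has children
         * fails with SGX_CHILD_PRESENT and is deferred. */
        for_each_epc_page ( d, pg )
            if ( encls(EREMOVE, 0, epc_page_va(pg), 0) == SGX_CHILD_PRESENT )
                defer_secs(d, pg);

        /* Pass 2: all children are gone now, so SECS removal succeeds. */
        for_each_deferred_secs ( d, pg )
            encls(EREMOVE, 0, epc_page_va(pg), 0);
    }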

2.2.8 Destroying Domain

Normally Xen just frees all EPC pages for the domain when it is destroyed.
But Xen will also do EREMOVE on all of the guest's EPC pages (as described in
2.2.7 above) before freeing them, as the guest may shut down unexpectedly
(e.g., the user kills the guest), in which case the guest's EPC may still be
valid.

2.3 Additional Point: Live Migration, Snapshot Support (?)

Actually, from the hardware's point of view, SGX is not migratable. There are
two reasons:

    - SGX key architecture cannot be virtualized.

    For example, some keys are bound to the CPU, such as the Sealing key and
    the EREPORT key. If a VM is migrated to another machine, the same enclave
    will derive different keys. Taking the Sealing key as an example: the
    Sealing key is typically used by an enclave (which can obtain it via
    EGETKEY) to *seal* its secrets to the outside (e.g., persistent storage)
    for later use. If the Sealing key changes after VM migration, then the
    enclave can never get the sealed secrets back, as the key has changed and
    the old sealing key cannot be recovered.

    - There's no ENCLS leaf to evict an EPC page to normal memory while at
    the same time keeping its content in EPC. Currently, once an EPC page is
    evicted, the EPC page becomes invalid. So technically, we are unable to
    implement live migration (or checkpointing, or snapshot) for enclaves.

But, with some workarounds, and given some facts about existing SGX drivers,
we are technically able to support live migration (or even checkpointing and
snapshot). This is because:

    - Changing a key (which is bound to the CPU) is not a problem in reality.

    Take the Sealing key as an example. Losing sealed data is not a problem,
    because the sealing key is only supposed to encrypt secrets that can be
    provisioned again. The typical working model is: the enclave gets secrets
    provisioned from a remote party (the service provider), and uses the
    sealing key to store them for later use. When the enclave tries to
    *unseal* using the sealing key, if the sealing key has changed, the
    enclave will find the data is corrupted (integrity check failure), so it
    will ask for the secrets to be provisioned again from the remote party.
    Another reason is that in a data center VMs typically share lots of data,
    and as the sealing key is bound to the CPU, data encrypted by one enclave
    on one machine cannot be shared by another enclave on another machine. So
    from the SGX app writer's point of view, the developer should treat the
    Sealing key as a changeable key, and should handle loss of sealed data
    anyway. The Sealing key should only be used to seal secrets that can be
    easily provisioned again.

    For other keys, such as the EREPORT key and the provisioning key, which
    are used for local and remote attestation, losing them is not a problem
    either, due to the second reason below.

    - Sudden loss of EPC is not a problem.

    On hardware, EPC is lost if the system goes to S3-S5, or resets, or shuts
    down, and the SGX driver needs to handle loss of EPC due to such power
    transitions. This is done by cooperation between the SGX driver and the
    userspace SGX SDK/apps. However, during live migration there may not be a
    power transition in the guest, so there may not be an EPC loss either,
    and technically we cannot *really* live migrate an enclave (explained
    above), so it looks infeasible. But the fact is that both the Linux SGX
    driver and the Windows SGX driver already support *sudden* loss of EPC
    (EPC loss not caused by a power transition), which means both drivers are
    able to recover in case EPC is lost at any point at runtime. With this,
    technically we are able to support live migration by simply ignoring EPC.
    After the VM is migrated, the destination VM will only suffer a *sudden*
    loss of EPC, which both the Windows SGX driver and the Linux SGX driver
    are already able to handle.

    But we must point out that such *sudden* loss of EPC is not hardware
    behavior, and SGX drivers for other OSes (such as FreeBSD) may not
    implement this, so for those guests, the destination VM will behave in an
    unexpected manner. But I am not sure we need to care about other OSes.

For the same reason, we are able to support checkpointing for SGX guests
(only Linux and Windows).

For snapshot, we can support snapshotting an SGX guest by either:

    - Suspending the guest before the snapshot (S3-S5). This works for all
      guests but requires the user to manually suspend the guest.
    - Issuing a hypercall to destroy the guest's EPC in save_vm. This only
      works for Linux and Windows but doesn't require user intervention.

What are your comments?

3. Reference

    - Intel SGX Homepage
    https://software.intel.com/en-us/sgx

    - Linux SGX SDK
    https://01.org/intel-software-guard-extensions

    - Linux SGX driver for upstreaming
    https://github.com/01org/linux-sgx

    - Intel SGX Specification (SDM Vol 3D)
    https://software.intel.com/sites/default/files/managed/7c/f1/332831-sdm-vol-3d.pdf

    - Paper: Intel SGX Explained
    https://eprint.iacr.org/2016/086.pdf

    - ISCA 2015 tutorial slides for Intel® SGX
    https://software.intel.com/sites/default/files/332680-002.pdf

Boqun Feng (5):
  xen: mm: introduce non-scrubbable pages
  xen: mm: manage EPC pages in Xen heaps
  xen: x86/mm: add SGX EPC management
  xen: x86: add functions to populate and destroy EPC for domain
  xen: tools: add SGX to applying MSR policy

Kai Huang (12):
  xen: x86: expose SGX to HVM domain in CPU featureset
  xen: x86: add early stage SGX feature detection
  xen: vmx: detect ENCLS VMEXIT
  xen: x86/mm: introduce ioremap_wb()
  xen: p2m: new 'p2m_epc' type for EPC mapping
  xen: x86: add SGX cpuid handling support.
  xen: vmx: handle SGX related MSRs
  xen: vmx: handle ENCLS VMEXIT
  xen: vmx: handle VMEXIT from SGX enclave
  xen: x86: reset EPC when guest got suspended.
  xen: tools: add new 'sgx' parameter support
  xen: tools: add SGX to applying CPUID policy

 docs/misc/xen-command-line.markdown         |   8 +
 tools/libxc/Makefile                        |   1 +
 tools/libxc/include/xc_dom.h                |   4 +
 tools/libxc/include/xenctrl.h               |  16 +
 tools/libxc/xc_cpuid_x86.c                  |  68 ++-
 tools/libxc/xc_msr_x86.h                    |  10 +
 tools/libxc/xc_sgx.c                        |  82 +++
 tools/libxl/libxl.h                         |   3 +-
 tools/libxl/libxl_cpuid.c                   |  15 +-
 tools/libxl/libxl_create.c                  |  10 +
 tools/libxl/libxl_dom.c                     |  65 ++-
 tools/libxl/libxl_internal.h                |   2 +
 tools/libxl/libxl_nocpuid.c                 |   4 +-
 tools/libxl/libxl_types.idl                 |  11 +
 tools/libxl/libxl_x86.c                     |  12 +
 tools/ocaml/libs/xc/xenctrl_stubs.c         |  11 +-
 tools/python/xen/lowlevel/xc/xc.c           |  11 +-
 tools/xl/xl_parse.c                         |  86 +++
 tools/xl/xl_parse.h                         |   1 +
 xen/arch/x86/Makefile                       |   1 +
 xen/arch/x86/cpu/common.c                   |  15 +
 xen/arch/x86/cpuid.c                        |  62 ++-
 xen/arch/x86/domctl.c                       |  87 ++-
 xen/arch/x86/hvm/hvm.c                      |   3 +
 xen/arch/x86/hvm/vmx/vmcs.c                 |  16 +-
 xen/arch/x86/hvm/vmx/vmx.c                  |  68 +++
 xen/arch/x86/hvm/vmx/vvmx.c                 |  11 +
 xen/arch/x86/mm.c                           |   9 +-
 xen/arch/x86/mm/p2m-ept.c                   |   3 +
 xen/arch/x86/mm/p2m.c                       |  41 ++
 xen/arch/x86/msr.c                          |   6 +-
 xen/arch/x86/sgx.c                          | 815 ++++++++++++++++++++++++++++
 xen/common/page_alloc.c                     |  39 +-
 xen/include/asm-arm/mm.h                    |   9 +
 xen/include/asm-x86/cpufeature.h            |   4 +
 xen/include/asm-x86/cpuid.h                 |  29 +-
 xen/include/asm-x86/hvm/hvm.h               |   3 +
 xen/include/asm-x86/hvm/vmx/vmcs.h          |   8 +
 xen/include/asm-x86/hvm/vmx/vmx.h           |   3 +
 xen/include/asm-x86/mm.h                    |  19 +-
 xen/include/asm-x86/msr-index.h             |   6 +
 xen/include/asm-x86/msr.h                   |   5 +
 xen/include/asm-x86/p2m.h                   |  12 +-
 xen/include/asm-x86/sgx.h                   |  86 +++
 xen/include/public/arch-x86/cpufeatureset.h |   3 +-
 xen/include/xen/mm.h                        |   2 +
 xen/tools/gen-cpuid.py                      |   3 +
 47 files changed, 1757 insertions(+), 31 deletions(-)
 create mode 100644 tools/libxc/xc_sgx.c
 create mode 100644 xen/arch/x86/sgx.c
 create mode 100644 xen/include/asm-x86/sgx.h

-- 
2.15.0

