Re: [Xen-devel] [RFC PATCH 00/15] RFC: SGX virtualization design and draft patches
On 7/17/2017 6:08 PM, Huang, Kai wrote:

> Hi Andrew,
>
> Thank you very much for the comments. Sorry for the late reply, and please see my reply below.

> On 7/12/2017 2:13 AM, Andrew Cooper wrote:

>> On 09/07/17 09:03, Kai Huang wrote:

>>> Hi all,
>>>
>>> This series is the RFC Xen SGX virtualization support design and RFC draft patches.

>> Thank you very much for this design doc.

>>> 2. SGX Virtualization Design
>>>
>>> 2.1 High Level Toolstack Changes:
>>>
>>> 2.1.1 New 'epc' parameter
>>>
>>> EPC is a limited resource. In order to use EPC efficiently among all domains, when creating a guest, the administrator should be able to specify the domain's virtual EPC size. And the admin should also be able to get all domains' virtual EPC sizes.
>>>
>>> For this purpose, a new 'epc = <size>' parameter is added to the XL configuration file. This parameter specifies the guest's virtual EPC size. The EPC base address will be calculated by the toolstack internally, according to the guest's memory size, MMIO size, etc. 'epc' is in MB units and any 1MB-aligned value will be accepted.

>> How will this interact with multi-package servers? Even though it's fine to implement the single-package support first, the design should be extensible to the multi-package case.
>>
>> First of all, what are the implications of multi-package SGX?
>>
>> (Somewhere) you mention changes to scheduling. I presume this is because a guest with EPC mappings in EPT must be scheduled on the same package, or ENCLU[EENTER] will fail. I presume also that each package will have separate, unrelated private keys?

> ENCLU[EENTER] will continue to work on multi-package servers. Actually I was told all existing ISA behavior documented in the SDM won't change for servers, as otherwise this would be a bad design :)
>
> Unfortunately I was told I cannot talk about MP server SGX a lot now. Basically I can only talk about things already documented in the SDM (sorry :( ). But I guess multiple EPC sections in CPUID are designed to cover MP servers, at least mainly (we can make a reasonable guess).
>
> In terms of the design, I think we can follow the XL config file parameters for memory. The 'epc' parameter will always specify the total EPC size that the domain has. And we can use the existing NUMA related parameters, such as setting cpus='...' to physically pin vcpus to specific pCPUs, so that EPC will mostly be allocated from the related node. If that node runs out of EPC, we can decide whether to allocate EPC from other nodes, or fail to create the domain. I know Linux supports a NUMA policy which can specify whether to allow allocating memory from other nodes; does Xen have such a policy? Sorry I haven't checked this. If Xen has such a policy, we need to choose whether to use the memory policy, or to introduce a new policy for EPC.
>
> If we are going to support vNUMA EPC in the future, we can also use a similar way to configure vNUMA EPC in the XL config.
>
> Sorry I mentioned scheduling. I should say *potentially* :). My thinking was that as SGX is per-thread, the SGX info reported by different CPU packages may be different (e.g., whether SGX2 is supported), so we may need the scheduler to be aware of SGX. But I think we don't have to consider this now.
>
> What are your comments?
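Just to illustrate the intended usage (not final syntax -- the consolidated sgx=[...] form discussed further below may replace the standalone parameter), a guest config combining 'epc' with the existing NUMA pinning parameters could look like:

    # illustrative xl domain configuration fragment
    memory = 4096
    vcpus  = 4
    # pin vcpus to pCPUs of one package/node so EPC is mostly
    # allocated from that node's EPC section
    cpus   = "0-3"
    # proposed parameter: total virtual EPC size in MB (1MB aligned)
    epc    = 128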
>> I presume there is no sensible way (even on native) for a single logical process to use multiple different enclaves? By extension, does it make sense to try and offer parts of multiple enclaves to a single VM?

> The native machine allows running multiple enclaves, even signed by multiple authors. SGX only has the limitation that before launching any other enclave, the Launch Enclave (LE) must be launched. The LE is the only enclave that doesn't require an EINITTOKEN in EINIT. For the LE, its signer (SHA256(sigstruct->modulus)) must be equal to the value in the IA32_SGXLEPUBKEYHASHn MSRs. The LE generates EINITTOKENs for other enclaves (EINIT for other enclaves requires an EINITTOKEN). For other enclaves there's no such limitation that the enclave's signer must match IA32_SGXLEPUBKEYHASHn, so the signer can be anybody. But for other enclaves, before running EINIT, the LE's signer (which is equal to IA32_SGXLEPUBKEYHASHn as explained above) needs to be written to IA32_SGXLEPUBKEYHASHn (the MSRs can be changed, for example, when there are multiple LEs running in the OS). This is because EINIT needs to perform an EINITTOKEN integrity check (the EINITTOKEN contains a MAC calculated by the LE, and EINIT needs the LE's IA32_SGXLEPUBKEYHASHn to derive the key to verify the MAC).
>
> SGX in a VM doesn't change those behaviors, so in a VM the enclaves can also be signed by anyone, but Xen needs to emulate IA32_SGXLEPUBKEYHASHn so that when one VM is running, the correct IA32_SGXLEPUBKEYHASHn values are already in the physical MSRs.

>>> 2.1.3 Notify domain's virtual EPC base and size to Xen
>>>
>>> Xen needs to know the guest's EPC base and size in order to populate EPC pages for it. The toolstack notifies the EPC base and size to Xen via XEN_DOMCTL_set_cpuid.

>> I am currently in the process of reworking the Xen/Toolstack interface when it comes to CPUID handling. The latest design is available here: https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00378.html but the end result will be the toolstack expressing its CPUID policy in terms of the architectural layout.
>>
>> Therefore, I would expect that, however the setting is represented in the configuration file, xl/libxl would configure it with the hypervisor by setting CPUID.0x12[2] with the appropriate base and size.

> I agree. I saw you are planning to introduce a new XEN_DOMCTL_get{set}_cpuid_policy, which will allow the toolstack to query/set the cpuid policy in a single hypercall (if I understand correctly), so I think we should definitely use the new hypercalls.
>
> I also saw you are planning to introduce a new hypercall to query the raw/host/pv_max/hvm_max cpuid policy (not just the featureset), so I think 'xl sgxinfo' (or xl info -sgx) can certainly use that to get the physical SGX info (EPC info). And 'xl sgxlist' (or xl list -sgx) can use XEN_DOMCTL_get{set}_cpuid_policy to display a domain's SGX info (EPC info).
>
> Btw, do you think we need 'xl sgxinfo' and 'xl sgxlist'? If we do, which is better? New 'xl sgxinfo' and 'xl sgxlist', or extending the existing 'xl info' and 'xl list' to support SGX, such as 'xl info -sgx' and 'xl list -sgx' above?
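For reference, this is roughly how I read the EPC sub-leaf encoding in the SDM; whichever toolstack interface is used, the guest's EPC base/size would end up in CPUID.0x12 sub-leaf 2 in this layout (an illustrative decode only, not the actual patch code):

    #include <stdint.h>

    struct epc_section {
        uint64_t base;
        uint64_t size;
    };

    /*
     * Decode CPUID.(EAX=0x12, ECX=2) -- and further sub-leaves for
     * additional sections -- per my reading of the SDM.
     */
    static int decode_epc_subleaf(uint32_t eax, uint32_t ebx,
                                  uint32_t ecx, uint32_t edx,
                                  struct epc_section *sec)
    {
        /* EAX[3:0] == 1 means this sub-leaf describes a valid EPC section. */
        if ( (eax & 0xf) != 1 )
            return 0;

        /* EAX[31:12] = base bits 31:12, EBX[19:0] = base bits 51:32. */
        sec->base = (uint64_t)(eax & 0xfffff000) |
                    ((uint64_t)(ebx & 0xfffff) << 32);

        /* ECX[31:12] = size bits 31:12, EDX[19:0] = size bits 51:32. */
        sec->size = (uint64_t)(ecx & 0xfffff000) |
                    ((uint64_t)(edx & 0xfffff) << 32);

        return 1;
    }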
>>> 2.1.4 Launch Control Support (?)
>>>
>>> Xen Launch Control support is about supporting running multiple domains, each running its own LE signed by a different owner (if HW allows, explained below). As explained in 1.4 SGX Launch Control, EINIT for the LE (Launch Enclave) only succeeds when SHA256(SIGSTRUCT.modulus) matches IA32_SGXLEPUBKEYHASHn, and EINIT for other enclaves will derive the EINITTOKEN key according to IA32_SGXLEPUBKEYHASHn. Therefore, to support this, the guest's virtual IA32_SGXLEPUBKEYHASHn must be written to the physical MSRs before EINIT (which also means the physical IA32_SGXLEPUBKEYHASHn needs to be *unlocked* in the BIOS before booting to the OS).
>>>
>>> For a physical machine, it is the BIOS writer's decision whether the BIOS provides an interface for the user to specify a customized IA32_SGXLEPUBKEYHASHn (it defaults to the digest of Intel's signing key after reset). In reality, the OS's SGX driver may require the BIOS to leave the MSRs *unlocked* and actively write the hash value to the MSRs in order to run EINIT successfully, as in this case the driver will not depend on the BIOS's capability (whether it allows the user to customize the IA32_SGXLEPUBKEYHASHn value).
>>>
>>> The problem for Xen is, do we need a new parameter, such as 'lehash=<SHA256>', to specify the default value of the guest's virtual IA32_SGXLEPUBKEYHASHn? And do we need a new parameter, such as 'lewr', to specify whether the guest's virtual MSRs are locked or not before handing over to the guest's OS?
>>>
>>> I tend to not introduce 'lehash', as it seems the SGX driver would actively update the MSRs, and a new parameter would add additional changes for upper layer software (such as OpenStack). And 'lewr' is not needed either, as Xen can always *unlock* the MSRs for the guest. Please give comments?
>>>
>>> Currently in my RFC patches the above two parameters are not implemented. The Xen hypervisor will always *unlock* the MSRs. Whether there is a 'lehash' parameter or not doesn't impact the Xen hypervisor's emulation of IA32_SGXLEPUBKEYHASHn. See the Xen hypervisor changes below for details.

>> Reading around, am I correct with the following?
>>
>> 1) Some processors have no launch control. There is no restriction on which enclaves can boot.

> Yes, some processors have no launch control. However it doesn't mean there's no restriction on which enclaves can boot. On the contrary, on those machines only Intel's Launch Enclave (LE) can run, as on those machines IA32_SGXLEPUBKEYHASHn either doesn't exist, or is equal to the digest of Intel's signing RSA pubkey. However, although only Intel's LE can be run, we can still run other enclaves from other signers. Please see my reply above.

>> 2) Some Skylake client processors claim to have launch control, but the MSRs are unavailable (is this an erratum?). These are limited to booting enclaves matching the Intel public key.

> Sorry I don't know whether this is an erratum. I will get back to you after confirming internally.

Hi Andrew,

I raised this internally, and it turns out that in the latest SDM Intel has fixed the statement, so that the IA32_SGXLEPUBKEYHASHn MSRs are only available when both SGX and SGX_LC are present in CPUID. When I was writing the design and patches I was referring to the old SDM, and the old one doesn't mention SGX_LC in CPUID as a condition. So it is my fault, and this statement has been fixed in the latest SDM (41.2.2 Intel SGX Launch Control Configuration): https://software.intel.com/sites/default/files/managed/7c/f1/332831-sdm-vol-3d.pdf

However in the latest SDM volume 4: Model-Specific Registers: https://software.intel.com/sites/default/files/managed/22/0d/335592-sdm-vol-4.pdf you can still see that for IA32_SGXLEPUBKEYHASHn (table 2-2, register address 8CH): "Read permitted If CPUID.(EAX=12H,ECX=0H):EAX[0]=1". So there's still an error in the SDM. I don't think this will be an erratum. Intel will fix the error in vol 4 in the next version of the SDM. We should refer to 41.2.2 as it has the accurate description.
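In other words, software should key the availability of those MSRs off both CPUID bits. Something like the quick check below (illustrative only, using GCC's <cpuid.h> just to show the two bits involved; the hypervisor of course has its own CPUID plumbing):

    #include <stdbool.h>
    #include <cpuid.h>

    /*
     * Per SDM 41.2.2, only rely on IA32_SGXLEPUBKEYHASHn when both
     * SGX (CPUID.7.0:EBX[2]) and SGX_LC (CPUID.7.0:ECX[30]) are set.
     */
    static bool sgx_lc_msrs_available(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if ( !__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) )
            return false;

        return (ebx & (1u << 2)) && (ecx & (1u << 30));
    }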
>> 3) Launch control may be locked by the BIOS. There may be a custom hash, or it might be the Intel default. Xen can't adjust it at all, but can support running any number of VMs with matching enclaves.

> Yes, Launch Control may be locked by the BIOS, although this depends on whether the BIOS provides an interface for the user to configure it. I was told that typically the BIOS will unlock Launch Control, as the SGX driver is expecting such behavior. But I am not sure we can always assume this.
>
> Whether there will be a custom hash also depends on the BIOS. The BIOS may or may not provide an interface for the user to configure a custom hash. So on physical machines, I think we need to consider all the cases. On a machine with Launch Control *unlocked*, Xen is able to dynamically change IA32_SGXLEPUBKEYHASHn so that Xen can run multiple VMs, each running an LE from a different signer. However if Launch Control is *locked* in the BIOS, then Xen is still able to run multiple VMs, but all VMs can only run an LE from the signer that matches IA32_SGXLEPUBKEYHASHn (which in most cases should be the Intel default, but can be a custom hash if the BIOS allows the user to configure one).
>
> Sorry I am not quite sure about the typical implementation of BIOS. I think I can reach out internally and get back to you if I have something.

I also reached out internally to find the typical BIOS implementation in terms of SGX LC. Typically the BIOS will neither provide configuration options for the user to set a custom hash, nor to select whether the MSRs are locked or not. Typically for client machines the MSRs are locked with the Intel default, and for server machines the MSRs are unlocked. But we cannot rule out 3rd parties providing a different BIOS that may offer options for the user to choose locked/unlocked mode, and/or to specify a custom hash. Custom hash + locked mode may be useful for some special purposes (e.g., IT management) as it provides the most secure option -- even the kernel/VMM can only launch an LE signed by the particular signer. In the case of a VM, custom hash + locked mode may be even more useful than on bare metal, as a VM is usually supposed to run some particular-purpose appliance.

So I think it is better to keep the 'lehash' and 'lewr' XL parameters. They are both optional -- the former provides a custom hash, and the latter sets the VM to unlocked mode. If neither is specified, then the VM will be in locked mode, and the VM's virtual IA32_SGXLEPUBKEYHASHn will either have Intel's default value (when the physical machine is unlocked), or the machine's MSR values (when the machine is in locked mode). And when the physical machine is in locked mode, specifying either 'lehash' or 'lewr' will result in VM creation failure.

So we have 3 XL parameters for SGX: 'epc', 'lehash' and 'lewr'. Probably we should consolidate them into one XL parameter, such as sgx=['epc=<size>', 'lehash=<sha256>', 'lewr=[on|off]'] ?

Thanks,
-Kai

>> 4) Launch control may be unlocked by the BIOS. In this case, Xen can context switch a hash per domain, and run all enclaves.

> Yes. With enclave == LE I think you meant.

>> The eventual plans for CPUID and MSR levelling should allow all of these to be expressed in sensible ways, and I don't foresee any issues with supporting all of these scenarios.

> So do you think we should have 'lehash' and 'lewr' parameters in the XL config file? The former provides a custom hash, and the latter controls whether the guest's Launch Control is unlocked.
>
> My thinking is the SGX driver needs to *actively* write the LE's pubkey hash to IA32_SGXLEPUBKEYHASHn in *unlocked* mode, so 'lehash' alone is not needed. 'lehash' only has meaning when 'lewr' is needed, to provide a default hash value in locked mode; if we always use *unlocked* mode for the guest, 'lehash' is not necessary.
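Just to make the "context switch a hash per domain" point concrete, the emulation could look roughly like the sketch below (names are illustrative and don't necessarily match the RFC patches):

    #include <stdint.h>

    #define MSR_IA32_SGXLEPUBKEYHASH0   0x0000008c

    struct sgx_domain {
        /* Guest's virtual IA32_SGXLEPUBKEYHASH0..3. */
        uint64_t lepubkeyhash[4];
    };

    /* Stand-in declaration for this sketch; real code uses Xen's MSR accessors. */
    void wrmsrl(unsigned int msr, uint64_t val);

    /*
     * Before the guest can run ENCLS[EINIT], make sure the physical MSRs
     * hold this domain's virtual hash values (only possible when the
     * platform left the MSRs unlocked).
     */
    static void sgx_sync_lepubkeyhash(const struct sgx_domain *sgx)
    {
        unsigned int i;

        for ( i = 0; i < 4; i++ )
            wrmsrl(MSR_IA32_SGXLEPUBKEYHASH0 + i, sgx->lepubkeyhash[i]);
    }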
>>> 2.2 High Level Xen Hypervisor Changes:
>>>
>>> 2.2.1 EPC Management (?)
>>>
>>> The Xen hypervisor needs to detect SGX, discover EPC, and manage EPC before supporting SGX for guests. EPC is detected via SGX CPUID 0x12.0x2. It's possible that there are multiple EPC sections (enumerated via sub-leaves 0x3 and so on, until an invalid EPC section is reported), but this is only true on multiple-socket server machines. For server machines there are additional things that also need to be done, such as NUMA EPC, scheduling, etc. We will support server machines in the future, but currently we only support one EPC section.
>>>
>>> EPC is reported as reserved memory (so it is not reported as normal memory). EPC must be managed in 4K pages. The CPU hardware uses the EPCM to track the status of each EPC page. Xen needs to manage EPC and provide functions to, e.g., alloc and free EPC pages for guests.
>>>
>>> There are two ways to manage EPC: manage EPC separately; or integrate it into the existing memory management framework.
>>>
>>> It is easy to manage EPC separately, as currently EPC is pretty small (~100MB), and we can even put the pages in a single list. However it is not flexible; for example, you will have to write new algorithms when EPC becomes larger, e.g., GBs. And you have to write new code to support NUMA EPC (although this will not come in a short time).
>>>
>>> Integrating EPC into the existing memory management framework seems more reasonable, as in this way we can reuse the memory management data structures/algorithms, and it will be more flexible to support larger EPC and potentially NUMA EPC. But modifying the MM framework has a higher risk of breaking existing memory management code (potentially more bugs).
>>>
>>> In my RFC patches we currently choose to manage EPC separately. A new structure epc_page is added to represent a single 4K EPC page. A whole array of struct epc_page will be allocated during EPC initialization, so that given one of the PFN of an EPC page and its 'struct epc_page', the other can be obtained by adding an offset. But maybe integrating EPC into the MM framework is more reasonable. Comments?
>>>
>>> 2.2.2 EPC Virtualization (?)

>> It looks like managing the EPC is very similar to managing the NVDIMM ranges. We have a (set of) physical address ranges which need 4k ownership granularity to different domains.
>>
>> I think integrating this into struct page_struct is the better way to go.

> Will do. So I assume we will introduce a new MEMF_epc, and use the existing alloc_domheap/xenheap_pages to allocate EPC? MEMF_epc can also be used if we need to support ballooning in the future (using the existing XENMEM_{decrease/increase}_reservation).
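For reference, the "manage EPC separately" scheme in the current RFC conceptually looks like the sketch below (illustrative; the actual names in the patches may differ). Converting between an EPC PFN and its struct epc_page is just array indexing, which is also the part that would become redundant if we integrate EPC into the existing page management structure as suggested:

    /*
     * Conceptual sketch of the "manage EPC separately" approach
     * (names are illustrative, not necessarily the RFC patch names).
     */
    struct epc_page {
        struct epc_page *next;   /* free list linkage */
        /* owner domain, flags, etc. would go here */
    };

    /* Filled in during EPC initialization from CPUID.0x12 sub-leaf 2. */
    static unsigned long epc_base_pfn;       /* first EPC page frame number */
    static unsigned long epc_nr_pages;       /* number of 4K EPC pages */
    static struct epc_page *epc_page_array;  /* one entry per EPC page */

    static struct epc_page *epc_pfn_to_page(unsigned long pfn)
    {
        return &epc_page_array[pfn - epc_base_pfn];
    }

    static unsigned long epc_page_to_pfn(const struct epc_page *page)
    {
        return epc_base_pfn + (page - epc_page_array);
    }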
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel