[Xen-devel] Cache Allocation Technology (CAT) design for Xen
Hi all, we plan to bring Intel CAT into Xen. This is the initial design for that. Comments/suggestions are welcome.

Background
==========

Traditionally, all Virtual Machines ("VMs") share the same set of system cache resources. There is no hardware support to control the allocation or availability of cache resources to individual VMs, and the lack of such a partitioning mechanism makes cache utilization inefficient across VMs of different types, even as more and more cache becomes available on modern server platforms. With the introduction of Intel Cache Allocation Technology ("CAT"), the Virtual Machine Monitor ("VMM") now has the ability to partition the cache allocation per VM, based on the priority of the VM.

CAT Introduction
================

Generally speaking, CAT introduces a mechanism for software to enable cache allocation based on application priority or Class of Service ("COS"). Cache allocation for the respective applications is then restricted based on the COS with which they are associated. Each COS can be configured using capacity bitmasks ("CBMs"), which represent cache capacity and indicate the degree of overlap and isolation between classes.

For each logical processor there is an exposed register (the IA32_PQR_ASSOC MSR) that allows the OS/VMM to specify a COS when an application, thread or VM is scheduled. Cache allocation for the indicated application/thread/VM is then controlled automatically by the hardware, based on the COS and the CBM associated with that class. Hardware initializes the COS of each logical processor to 0 and sets the corresponding CBM to all ones, meaning the whole system cache can be used by every application.

For more information please refer to Section 17.15 in the Intel SDM [1].
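To make the hardware interface concrete, below is a minimal user-space sketch of the enumeration just described, using GCC's cpuid.h (in the hypervisor the same CPUID leaf 0x10 data would be read at boot). The leaf/subleaf layout and MSR numbers follow SDM Section 17.15; the MSR constants are shown only for reference, since programming them is the hypervisor's job:

    #include <stdio.h>
    #include <cpuid.h>

    /* MSR numbers from the SDM: IA32_PQR_ASSOC selects the active COS
     * (bits 63:32); IA32_L3_MASK_0..n hold the per-COS CBMs. */
    #define MSR_IA32_PQR_ASSOC  0x0c8f
    #define MSR_IA32_L3_MASK_0  0x0c90

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 0x10, subleaf 0: EBX bit 1 set => L3 CAT present. */
        if ( !__get_cpuid_count(0x10, 0, &eax, &ebx, &ecx, &edx) ||
             !(ebx & (1u << 1)) )
        {
            printf("L3 CAT not enumerated\n");
            return 0;
        }

        /* Subleaf 1 describes the L3 allocation capability:
         * EAX[4:0] = CBM length - 1, EDX[15:0] = highest COS number. */
        __get_cpuid_count(0x10, 1, &eax, &ebx, &ecx, &edx);
        printf("CBM length: %u bits, max COS: %u\n",
               (eax & 0x1f) + 1, edx & 0xffff);
        return 0;
    }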
Design Overview
===============

- Domain COS/CBM association

When enforcing cache allocation for VMs, the minimum granularity is the domain. All Virtual CPUs ("VCPUs") of a domain have the same COS and therefore correspond to the same CBM. COS is used only inside the hypervisor and is transparent to the tool stack/user. The system administrator can specify the initial CBM for each domain, or change it at runtime, via the tool stack. The hypervisor then either chooses a free COS to associate with that CBM or finds an existing COS which already has the same CBM (a sketch of this allocation scheme follows the interface list below).

- VCPU schedule

When a VCPU is scheduled on a physical CPU ("PCPU"), its COS value is written to the IA32_PQR_ASSOC MSR of that PCPU to tell the hardware to use the new COS. The cache allocation is then enforced by hardware.

- Multi-socket

In a multi-socket environment, each VCPU may be scheduled on different sockets, and the hardware CAT capability (such as the maximum supported COS and the length of the CBM) may differ between sockets. For such systems, the COS/CBM configuration of a domain is specified per socket, and the hypervisor uses this per-socket CBM information when scheduling VCPUs.

Implementation Description
==========================

One principle of this design is to implement only the cache enforcement mechanism in the hypervisor and leave the cache allocation policy to the user space tool stack, so that more complex governors can be implemented in the tool stack. In summary, the hypervisor changes include:

1) A new field "cat_info" in the domain structure to hold the CAT information for each socket. It points to an array of:

    struct domain_socket_cat_info {
        unsigned int cbm; /* CBM specified by toolstack */
        unsigned int cos; /* COS allocated by hypervisor */
    };

2) A new SYSCTL to expose the CAT information to the tool stack:
   * whether CAT is enabled;
   * the maximum COS supported;
   * the length of the CBM;
   * other needed information from the host CPUID.
3) A new DOMCTL to allow the tool stack to set/get the CBM of a specified domain for each socket.
4) Context switch: write the COS of the domain to the IA32_PQR_ASSOC MSR of the PCPU.
5) An XSM policy to restrict these functions to the control domain only.

Hypervisor interfaces:
1) Boot line parameter "psr=cat" to enable the feature.
2) SYSCTL: XEN_SYSCTL_psr_cat_op
   - XEN_SYSCTL_PSR_CAT_INFO_GET: get system CAT information.
3) DOMCTL: XEN_DOMCTL_psr_cat_op
   - XEN_DOMCTL_PSR_CAT_OP_CBM_SET: set the CBM value for a domain.
   - XEN_DOMCTL_PSR_CAT_OP_CBM_GET: get the CBM value for a domain.

xl interfaces:
1) psr-cat-show: show system/runtime CAT information.
   => XEN_SYSCTL_PSR_CAT_INFO_GET / XEN_DOMCTL_PSR_CAT_OP_CBM_GET
2) psr-cat-cbm-set [dom] [cbm] [socket]: set the CBM for a domain.
   => XEN_DOMCTL_PSR_CAT_OP_CBM_SET
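As a usage illustration of the xl interfaces above (the CBM value 0x3f is hypothetical; hardware typically requires the mask to be a contiguous run of set bits, per the SDM):

    xl psr-cat-show
    xl psr-cat-cbm-set 1 0x3f 0

This would restrict domain 1 to the cache ways covered by mask 0x3f on socket 0.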
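The COS management in the CBM_SET path could follow the scheme described in the Design Overview: reuse an existing COS whose CBM already matches, otherwise claim a free one. A per-socket sketch follows; all names and the reference-counting detail are illustrative, not final:

    #include <stdint.h>

    /* Hypothetical per-socket COS table. COS 0 is the hardware default
     * (all-ones CBM) and is never handed out. */
    struct cos_entry {
        uint64_t cbm;       /* CBM programmed into IA32_L3_MASK_n */
        unsigned int ref;   /* number of domains using this COS */
    };

    static int pick_cos(struct cos_entry *table, unsigned int cos_max,
                        uint64_t cbm)
    {
        unsigned int cos, free_cos = 0;

        for ( cos = 1; cos <= cos_max; cos++ )
        {
            if ( table[cos].ref && table[cos].cbm == cbm )
            {
                table[cos].ref++;       /* share an identical CBM */
                return cos;
            }
            if ( !table[cos].ref && !free_cos )
                free_cos = cos;         /* remember the first free slot */
        }

        if ( !free_cos )
            return -1;                  /* no COS left; CBM_SET fails */

        /* The caller would also program IA32_L3_MASK_<free_cos> with cbm. */
        table[free_cos].cbm = cbm;
        table[free_cos].ref = 1;
        return free_cos;
    }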
Hardware Limitation & Performance Improvement
=============================================

The COS of a PCPU in IA32_PQR_ASSOC is changed on each VCPU context switch. If the change is frequent, the hardware may fail to strictly enforce the cache allocation for the specified COS, so the strict placement characteristic softens if a VCPU migrates between PCPUs frequently. For this reason, the update of IA32_PQR_ASSOC is done lazily, as sketched below.
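A sketch of this lazy update at context-switch time, written against the structures proposed above; DEFINE_PER_CPU, this_cpu() and wrmsrl() stand in for the corresponding Xen primitives, and the bits 63:32 COS field layout of IA32_PQR_ASSOC comes from the SDM:

    #define MSR_IA32_PQR_ASSOC 0x0c8f

    /* COS currently programmed on this PCPU; the MSR is rewritten only
     * when the incoming VCPU's domain actually needs a different COS. */
    static DEFINE_PER_CPU(uint32_t, pqr_cos);

    static void psr_ctxt_switch_to(struct domain *d, unsigned int socket)
    {
        uint32_t cos = d->cat_info ? d->cat_info[socket].cos : 0;

        if ( this_cpu(pqr_cos) != cos )
        {
            /* COS lives in bits 63:32 of IA32_PQR_ASSOC; bits 9:0 hold
             * the RMID used by cache monitoring and stay zero here. */
            wrmsrl(MSR_IA32_PQR_ASSOC, (uint64_t)cos << 32);
            this_cpu(pqr_cos) = cos;
        }
    }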
This design also allows CAT to run in two modes:
1) Non-affinitized mode: each VM can be freely scheduled on any PCPU, with its COS assigned as it goes.
2) Affinitized mode: each PCPU is assigned a fixed COS and a VM can be scheduled on a PCPU only when it has the same COS. This is less flexible, but can be an option for those who must have strict COS placement, or in cases where problems have arisen because of the less strict nature of non-affinitized mode.

However, no additional code is needed to support these two modes. CAT already runs in non-affinitized mode by default. If affinitized mode is desired, the existing "xl vcpu-pin" command can be used to pin all the VCPUs that have the same COS to certain fixed PCPUs, so that those PCPUs always have the same COS set.

[1] http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

Chao