
[Xen-devel] Cache Allocation Technology (CAT) design for XEN



Hi all, we plan to bring Intel CAT into XEN. This is the initial
design for that. Comments/suggestions are welcome.

Background
==========
Traditionally, all Virtual Machines ("VMs") share the same set of system
cache resources. There is no hardware support to control the allocation
or availability of cache resources to individual VMs. The lack of such a
partitioning mechanism makes cache utilization inefficient across
different types of VMs, while at the same time more and more cache
resources become available on modern server platforms.

With the introduction of Intel Cache Allocation Technology ("CAT"), the
Virtual Machine Monitor ("VMM") now has the ability to partition the
cache allocation per VM, based on the priority of each VM.


CAT Introduction
================
Generally speaking, CAT introduces a mechanism for software to control
cache allocation based on application priority or Class of Service
("COS"). Cache allocation for the respective applications is then
restricted based on the COS with which they are associated. Each COS can
be configured using a capacity bitmask ("CBM") which represents cache
capacity and indicates the degree of overlap and isolation between
classes. For each logical processor a register is exposed (the
IA32_PQR_ASSOC MSR) to allow the OS/VMM to specify a COS when an
application, thread or VM is scheduled. Cache allocation for the
indicated application/thread/VM is then controlled automatically by the
hardware based on the COS and the CBM associated with that class.
Hardware initializes the COS of each logical processor to 0 and sets the
corresponding CBM to all ones, meaning every application can use the
whole system cache.
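
For illustration only (the values below are an example, not part of the
design), a possible COS -> CBM assignment on a cache with an 8-bit CBM
could look like this:

    /*
     * Example only: COS -> CBM table for a cache with an 8-bit CBM (the
     * real CBM length and maximum COS are enumerated via CPUID).  Each
     * set bit grants a portion of the cache; overlapping bits mean
     * shared capacity, disjoint bits mean isolation.
     */
    static const unsigned int example_cbm[] = {
        [0] = 0xff, /* COS0: default, may use the whole cache          */
        [1] = 0x0f, /* COS1: limited to the lower half                 */
        [2] = 0xf0, /* COS2: limited to the upper half, isolated from
                       COS1 but overlapping COS0                       */
    };

Here COS1 and COS2 are fully isolated from each other, while COS0
overlaps both.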

For more information please refer to Section 17.15 in the Intel SDM [1].


Design Overview
===============
- Domain COS/CBM association
When enforcing cache allocation for VMs, the minimum granularity is the
domain. All Virtual CPUs ("VCPUs") of a domain have the same COS and
therefore correspond to the same CBM. The COS is used only inside the
hypervisor and is transparent to the tool stack/user. The system
administrator can specify the initial CBM for each domain, or change it
at runtime, through the tool stack. The hypervisor then either chooses a
free COS to associate with that CBM or finds an existing COS which
already has the same CBM.
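
A rough sketch of that association step is shown below; the table
layout and all names (cos_cbm_map, pick_cos_for_cbm, ...) are
assumptions made for illustration, not the actual implementation:

    /*
     * Sketch only: reuse an existing COS whose CBM already equals the
     * requested one, otherwise claim a free COS.
     */
    struct cos_cbm_entry {
        unsigned int cbm;   /* CBM currently programmed for this COS */
        unsigned int ref;   /* number of domains using this COS      */
    };

    static struct cos_cbm_entry cos_cbm_map[16]; /* sized from CPUID in practice */

    static int pick_cos_for_cbm(unsigned int cbm, unsigned int cos_max)
    {
        unsigned int cos, free_cos = 0;

        for ( cos = 1; cos <= cos_max; cos++ )  /* COS0 stays the default */
        {
            if ( cos_cbm_map[cos].ref && cos_cbm_map[cos].cbm == cbm )
            {
                cos_cbm_map[cos].ref++;         /* reuse existing COS */
                return cos;
            }
            if ( !cos_cbm_map[cos].ref && !free_cos )
                free_cos = cos;                 /* remember first free COS */
        }

        if ( !free_cos )
            return -1;                          /* no COS left */

        cos_cbm_map[free_cos].cbm = cbm;
        cos_cbm_map[free_cos].ref = 1;
        return free_cos;
    }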

- VCPU Schedule
When a VCPU is scheduled on a physical CPU ("PCPU"), its COS value is
written to the PCPU's IA32_PQR_ASSOC MSR to tell the hardware to use the
new COS. The cache allocation is then enforced by the hardware.
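
A minimal sketch of this step, assuming the COS field occupies bits
[63:32] of IA32_PQR_ASSOC as described in the SDM (the function name is
made up for this example):

    #define MSR_IA32_PQR_ASSOC 0x00000c8f

    /*
     * Sketch only: on context switch, point the scheduling PCPU's
     * IA32_PQR_ASSOC at the COS of the incoming domain.  Bits [63:32]
     * hold the COS, bits [9:0] the RMID (left as 0 here).
     */
    static void cat_ctxt_switch_to(unsigned int cos)
    {
        wrmsrl(MSR_IA32_PQR_ASSOC, (uint64_t)cos << 32);
    }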

- Multi-Socket
In a multi-socket environment, each VCPU may be scheduled on different
sockets. The hardware CAT capability (such as the maximum supported COS
and the length of the CBM) may differ between sockets. For such systems,
a per-socket COS/CBM configuration is specified for each domain. The
hypervisor then uses this per-socket CBM information when scheduling
VCPUs.
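
As a sketch, at schedule time the COS written to the MSR would come from
the per-socket array described in the next section, assuming a
cpu_to_socket() style helper that maps a PCPU to its socket (the helper
name below is made up):

    /*
     * Sketch only: pick the COS for domain d when one of its VCPUs runs
     * on PCPU cpu, using the per-socket cat_info array introduced below.
     */
    static unsigned int domain_cos_on_cpu(const struct domain *d,
                                          unsigned int cpu)
    {
        return d->cat_info[cpu_to_socket(cpu)].cos;
    }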


Implementation Description
==========================
One principle of this design is to implement only the cache enforcement
mechanism in the hypervisor and leave the cache allocation policy to the
user-space tool stack. In this way, more complex governors can be
implemented in the tool stack.

In summary, hypervisor changes include:
1) A new field "cat_info" in the domain structure to hold the per-socket
   CAT information. It points to an array of:
   struct domain_socket_cat_info {
       unsigned int cbm; /* CBM specified by toolstack  */
       unsigned int cos; /* COS allocated by Hypervisor */
   };
2) A new SYSCTL to expose the CAT information to the tool stack:
     * Whether CAT is enabled;
     * Maximum COS supported;
     * Length of the CBM;
     * Other needed information from the host CPUID;
3) A new DOMCTL to allow the tool stack to set/get the CBM of a specified
   domain for each socket.
4) Context switch: write the COS of the domain to the PCPU's
   IA32_PQR_ASSOC MSR.
5) An XSM policy to restrict visibility of these functions to the control
   domain only.

Hypervisor interfaces:
1) Boot line param: "psr=cat" to enable the feature.
2) SYSCTL: XEN_SYSCTL_psr_cat_op
     - XEN_SYSCTL_PSR_CAT_INFO_GET: Get system CAT information;
3) DOMCTL: XEN_DOMCTL_psr_cat_op
     - XEN_DOMCTL_PSR_CAT_OP_CBM_SET: Set CBM value for a domain.
     - XEN_DOMCTL_PSR_CAT_OP_CBM_GET: Get CBM value for a domain.
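
The payloads carried by these hypercalls could look roughly like the
sketch below; every structure and field name here is an assumption for
illustration, not the final public ABI:

    /* Sketch only: possible SYSCTL/DOMCTL payloads, not the final ABI. */
    struct xen_sysctl_psr_cat_op {
        uint32_t cmd;          /* XEN_SYSCTL_PSR_CAT_INFO_GET           */
        uint32_t target;       /* socket to query                       */
        struct {
            uint32_t enabled;  /* is CAT enabled on this socket?        */
            uint32_t cos_max;  /* maximum COS supported                 */
            uint32_t cbm_len;  /* length of the CBM in bits             */
        } info;
    };

    struct xen_domctl_psr_cat_op {
        uint32_t cmd;          /* XEN_DOMCTL_PSR_CAT_OP_CBM_SET / _GET  */
        uint32_t target;       /* socket the CBM applies to             */
        uint64_t data;         /* CBM value: in for SET, out for GET    */
    };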

xl interfaces:
1) psr-cat-show: Show system/runtime CAT information.
     => XEN_SYSCTL_PSR_CAT_INFO_GET/XEN_DOMCTL_PSR_CAT_OP_CBM_GET
2) psr-cat-cbm-set [dom] [cbm] [socket]: Set CBM for a domain.
     => XEN_DOMCTL_PSR_CAT_OP_CBM_SET
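
For example (domain ID, mask and socket number chosen arbitrarily),
restricting domain 1 to the cache portions covered by mask 0x3f on
socket 0 would be:

     xl psr-cat-cbm-set 1 0x3f 0

and the result can then be inspected with psr-cat-show.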


Hardware Limitation & Performance Improvement
=============================================
The COS in a PCPU's IA32_PQR_ASSOC is changed on each VCPU context
switch. If these changes are frequent, the hardware may fail to strictly
enforce cache allocation based on the specified COS. Because of this,
the strict placement characteristic softens if a VCPU is migrated
between PCPUs frequently.

For this reason, IA32_PQR_ASSOC will be updated lazily (a sketch is
given at the end of this section). In addition, this design allows CAT
to run in two modes:

1) Non-affinitized mode: each VM can be freely scheduled on any PCPU,
and its COS is written to that PCPU as it is scheduled.

2) Affinitized mode: each PCPU is assigned a fixed COS and a VM can be
scheduled on a PCPU only when it has the same COS. This is less flexible
but can be an option for those who must have strict COS placement, or
for cases where problems arise from the less strict nature of
non-affinitized mode.

However, no additional code is needed to support these two modes. CAT
already runs in non-affinitized mode by default. If affinitized mode is
desirable, the existing "xl vcpu-pin" command can be used to pin all
VCPUs that share a COS to certain fixed PCPUs, so that those PCPUs
always have the same COS set.
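
The lazy IA32_PQR_ASSOC update mentioned above could look roughly like
this, caching the last written COS per PCPU; apart from the Xen per-CPU
helpers, the names are made up for the example (MSR_IA32_PQR_ASSOC as in
the earlier sketch):

    /*
     * Sketch only: remember the COS last written to this PCPU's
     * IA32_PQR_ASSOC and skip the MSR write when the incoming domain
     * already uses that COS.
     */
    static DEFINE_PER_CPU(unsigned int, pqr_cos);

    static void cat_ctxt_switch_to_lazy(unsigned int cos)
    {
        if ( this_cpu(pqr_cos) == cos )
            return;                     /* unchanged: no MSR write needed */

        wrmsrl(MSR_IA32_PQR_ASSOC, (uint64_t)cos << 32);
        this_cpu(pqr_cos) = cos;
    }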

[1] 
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

Chao
