Re: [PATCH v6 01/15] xen/common: add cache coloring common code
On 03.02.2024 11:57, Carlo Nonato wrote:
> On Wed, Jan 31, 2024 at 4:57 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> On 29.01.2024 18:17, Carlo Nonato wrote:
>>> +Background
>>> +**********
>>> +
>>> +Cache hierarchy of a modern multi-core CPU typically has first levels dedicated
>>> +to each core (hence using multiple cache units), while the last level is shared
>>> +among all of them. Such a configuration implies that memory operations on one
>>> +core (e.g. running a DomU) are able to generate interference on another core
>>> +(e.g. hosting another DomU). Cache coloring allows eliminating this
>>> +mutual interference, and thus guaranteeing higher and more predictable
>>> +performance for memory accesses.
>>> +The key concept underlying cache coloring is a fragmentation of the memory
>>> +space into a set of sub-spaces called colors that are mapped to disjoint cache
>>> +partitions. Technically, the whole memory space is first divided into a number
>>> +of subsequent regions. Then each region is in turn divided into a number of
>>> +subsequent sub-colors. The generic i-th color is then obtained by all the
>>> +i-th sub-colors in each region.
>>> +
>>> +::
>>> +
>>> +                     Region j                    Region j+1
>>> +                .....................   ............
>>> +                .                     .            .
>>> +                .                      .
>>> +            _ _ _______________ _ _____________________ _ _
>>> +              |     |     |     |     |     |     |
>>> +              | c_0 | c_1 |     | c_n | c_0 | c_1 |
>>> +            _ _ _|_____|_____|_ _ _|_____|_____|_____|_ _ _
>>> +                  :                       :
>>> +                  :                       :...           ... .
>>> +                  :                            color 0
>>> +                  :...........................           ... .
>>> +                  :
>>> +                  . . ..................................:
>>> +
>>> +There are two pragmatic lessons to be learnt.
>>> +
>>> +1. If one wants to avoid cache interference between two domains, different
>>> +   colors need to be used for their memory.
>>> +
>>> +2. Color assignment must privilege contiguity in the partitioning. E.g.,
>>> +   assigning colors (0,1) to domain I and (2,3) to domain J is better than
>>> +   assigning colors (0,2) to I and (1,3) to J.
>>
>> I can't connect this 2nd point with any of what was said above.
>
> If colors are contiguous then a greater spatial locality is achievable. You
> mean we should better explain this?

Yes, but not just that. See how your using "must" in the text contradicts you
now suggesting this is merely an optimization.

>>> +How to compute the number of colors
>>> +***********************************
>>> +
>>> +To compute the number of available colors for a specific platform, the size of
>>> +an LLC way and the page size used by Xen must be known. The first parameter can
>>> +be found in the processor manual or can also be computed by dividing the total
>>> +cache size by the number of its ways. The second parameter is the minimum
>>> +amount of memory that can be mapped by the hypervisor,
>>
>> I find "amount of memory that can be mapped" quite confusing here. Don't you
>> really mean the granularity at which memory can be mapped?
>
> Yes that's what I wanted to describe. I'll change it.
>
>>> thus dividing the way
>>> +size by the page size, the number of total cache partitions is found. So for
>>> +example, an Arm Cortex-A53 with a 16-way associative 1 MiB LLC can isolate up
>>> +to 16 colors when pages are 4 KiB in size.
>>
>> I guess it's a matter of what one's used to, but to me talking of "way size"
>> and how the calculation is described is, well, unusual. What I would start
>> from is the smallest entity, i.e. a cache line.
>> Then it would be relevant to describe how, after removing the low so many
>> bits to cover for cache line size, the remaining address bits are used to
>> map to a particular set. It looks to me as if you're assuming that this
>> mapping is linear, using the next so many bits from the address. Afaik this
>> isn't true on various modern CPUs; instead hash functions are used. Without
>> knowing at least certain properties of such a hash function, I'm afraid your
>> mapping from address to color isn't necessarily guaranteeing the promised
>> isolation. The guarantee may hold for processors you specifically target,
>> but then I think in this description it would help if you would fully spell
>> out any assumptions you make on how hardware maps addresses to elements of
>> the cache.
>
> You're right, we are assuming a linear mapping. We are going to review and
> extend the documentation in order to fully specify when coloring can be
> applied.
>
> About the "way size": it's a way of summarizing all the parameters into one.
> We could ask for different cache parameters as you said, but in the end what
> we are interested in is how many partitions the cache is capable of isolating
> and how big they are. The answer is, in theory, as many partitions as the
> number of sets, each one as big as a cache line, because we can't have
> isolation inside a set.
> Then memory mapping comes into play and the minimum granularity at which
> mapping can happen actually lowers the number of partitions.
> To recap, we can isolate:
> nr_sets * line_size / page_size
> Then we simply named:
> way_size = nr_sets * line_size
> Another way of computing it:
> way_size = cache_size / nr_ways
>
> We are ok with having two parameters: cache_size and nr_ways, which are even
> easier and more intuitive to find for a normal user.

Right, that's the aspect I was actually after.

Jan
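
For readers following the arithmetic above, here is a minimal C sketch of the
computation the thread converges on. It assumes the linear (non-hashed)
set-index mapping that Jan points out is only an assumption, and it uses the
Cortex-A53 figures from the quoted document; the identifiers (way_size,
nr_colors, and the address-to-color expression) are illustrative only and are
not taken from Xen's actual code.

/*
 * Sketch of the color arithmetic discussed in the thread, assuming a linear
 * (non-hashed) set-index function. Constants follow the Cortex-A53 example:
 * 1 MiB, 16-way LLC, 4 KiB pages.
 */
#include <stdio.h>

#define PAGE_SIZE   4096UL          /* mapping granularity used by Xen */
#define CACHE_SIZE  (1UL << 20)     /* total LLC size: 1 MiB */
#define NR_WAYS     16UL            /* 16-way set associative */

int main(void)
{
    /* way_size = cache_size / nr_ways = nr_sets * line_size */
    unsigned long way_size  = CACHE_SIZE / NR_WAYS;

    /* Number of isolatable partitions once the page-sized mapping
     * granularity is taken into account. */
    unsigned long nr_colors = way_size / PAGE_SIZE;

    /* With a linear index function, a page's color is just its page frame
     * number modulo the number of colors. */
    unsigned long addr  = 0x40123000UL;   /* arbitrary physical address */
    unsigned long color = (addr / PAGE_SIZE) % nr_colors;

    printf("way_size = %lu KiB, nr_colors = %lu, color(%#lx) = %lu\n",
           way_size >> 10, nr_colors, addr, color);   /* 64 KiB, 16, 3 */

    return 0;
}

With these numbers the sketch yields 16 colors, matching the Cortex-A53 example
in the quoted document; on CPUs that hash the set index, the modulo expression
above would not be valid, which is exactly the caveat raised in the review.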