Re: [Xen-devel] Question about partitioning shared cache in Xen
Hi Andrew,

Thank you very much for your quick reply!

2015-01-14 7:20 GMT-05:00 Andrew Cooper <andrew.cooper3@xxxxxxxxxx>:
> On 14/01/15 00:41, Meng Xu wrote:
>> Hi,
>>
>> [Goal]
>> I want to investigate the impact of the shared cache on the performance of workloads in guest domains. I also want to partition the shared cache via a page coloring mechanism, so that guest domains use different cache colors of the shared cache and do not interfere with each other in it.
>>
>> [Motivation: Why do I want to partition the shared cache?]
>> Because the shared cache is shared among all guest domains (I assume the machine has multiple cores sharing the same LLC; for example, the Intel(R) Xeon(R) CPU E5-1650 v2 has 6 physical cores sharing a 12MB L3 cache), the workload in one domU can interfere with another domU's memory-intensive workload on the same machine via the shared cache. This shared-cache interference makes the execution time of a domU's workload non-deterministic and can increase it substantially. (If we assume the worst case, the worst-case execution time of the workload becomes far too pessimistic.) A stable execution time is very important in real-time computing, where a real-time program, such as a control program in an automobile, has to produce its result within a deadline.
>>
>> I did some quick measurements to show how the shared cache can be used by a hostile domain to interfere with the execution time of another domain's workload. I pin the VCPUs of two domains to different physical cores and use one domain to pollute the shared cache. The result shows that shared-cache interference can slow down the execution of the other domain's workload by 4x. The full experiment result can be found at https://github.com/PennPanda/cis601/blob/master/project/data/boxplot_cache_v2.pdf . (The workload in the figure is a program that reads a large array. I run the program 100 times and draw the latency of accessing the array in a box plot. The first column, named "alone-d1v1", is the box-plot latency when the program in dom1 runs alone. The fourth column, "d1v1d2v1-pindiffcore", is the box-plot latency when the program in dom1 runs alongside another program in dom2, with the two domains pinned to different cores. dom1 and dom2 each have 1 VCPU with budget equal to period. The scheduler is the credit scheduler.)
>>
>> [Idea of how to partition the shared cache]
>> When a PV guest domain is created, it calls xc_dom_boot_mem_init() to allocate memory for the domain, which finally calls xc_domain_populate_physmap_exact() to allocate memory pages from the domheap in Xen. The idea of partitioning the shared cache is as follows:
>> 1) xl tool change: add an option to the domain's configuration file which specifies which cache colors this domain should use. (I have done this; when I run xl create --dry-run, I can see the parameters are passed through to the build information.)
>> 2) Hypervisor change: add another hypercall, xc_domain_populate_physmap_exact_ca(), which takes one more parameter, i.e., the cache colors this domain should use. I also need to reserve a memory pool which sorts the reserved memory pages by their cache color.
>>
>> When a PV domain is created, I can specify the cache colors it uses. The xl tool will then call xc_domain_populate_physmap_exact_ca() to allocate only memory pages with the specified cache colors to this domain.
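>>
>> To illustrate (the option name and the exact signature are tentative; this is just a sketch of what my draft patch does), the guest config would gain a line like:
>>
>>     cache_colors = [ "2", "3" ]   # this domain may only use pages of colors 2 and 3
>>
>> and the new libxc wrapper would mirror xc_domain_populate_physmap_exact(), plus the list of allowed colors:
>>
>>     /* Sketch: same parameters as xc_domain_populate_physmap_exact(),
>>      * with an extra array of cache colors this domain may use. */
>>     int xc_domain_populate_physmap_exact_ca(xc_interface *xch,
>>                                             uint32_t domid,
>>                                             unsigned long nr_extents,
>>                                             unsigned int extent_order,
>>                                             unsigned int mem_flags,
>>                                             xen_pfn_t *extent_start,
>>                                             uint32_t *cache_colors,
>>                                             uint32_t nr_colors);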
>>
>> [Quick implementation]
>> I attached my quick implementation patch at the end of this email.
>>
>> [Issues and Questions]
>> After I applied the patch at Xen commit 36174af3fbeb1b662c0eadbfa193e77f68cc955b and ran it on my machine, dom0 cannot boot up. :-(
>> The error message from dom0 is:
>>
>> [ 0.000000] Kernel panic - not syncing: Failed to get contiguous memory for DMA from Xen!
>> [ 0.000000] You either: don't have the permissions, do not have enough free memory under 4GB, or the hypervisor memory is too fragmented! (rc:-12)
>>
>> I tried to print a message in every function I touched in order to figure out where it goes wrong, but failed. :-(
>> The thing I cannot understand is this: my implementation does not reserve any memory pages in the cache-aware memory pool before the system boots up. Basically, none of the functions I modified is called before the system boots up, yet the system crashes. :-( (The system boots up and works perfectly before my patch is applied.)
>>
>> I would really appreciate it if any of you could point out the part I missed or misunderstood. :-)

> The error message is quite clear. I presume that your cache partitioning algorithm has prevented dom0 from getting any machine-contiguous pages for DMA. This prevents dom0 from using any hardware, such as its disks or the network.

Actually, I didn't partition the shared cache for dom0; dom0 should have contiguous memory as before. I didn't modify any function in the existing buddy allocator (so other functionality in Xen that depends on those functions won't be affected). I only copied those memory allocation functions and modified the copies to allocate memory pages of a specific cache color. The reason I don't think dom0 has non-contiguous memory is that none of the memory allocation functions I added (except the function that initializes the memory heaps for the buddy allocator) is ever called. I added a printk to every function I added, and none of them fires. So I think dom0 won't have non-contiguous memory. Did I miss something here?

> What I don't see is how you plan to isolate different colours in a shared cache. I am guessing (seeing as the patch is full of debugging and hard to follow) that you are using the low order bits in the physical address to identify the colour, which will indeed prevent any continuous allocations from happening. Is this what you are attempting to do?

Yes. I try to use bits [A16, A12] to isolate different colors in the shared cache. A 2MB 16-way set-associative shared cache uses bits [A16, A6] to index its cache sets. Because the page size is 4KB, the page frame number's bits [A16, A12] overlap with the bits used to index the cache sets, so we can control those [A16, A12] bits to control where a page is placed in the cache; a minimal sketch of this computation is at the end of this email. (The Wikipedia page about page coloring is here: http://en.wikipedia.org/wiki/Cache_coloring)

However, I won't partition the shared cache for dom0, which is usually the administrative domain. I only partition the shared cache for guest domains whose configuration file (passed to 'xl create') has the option cache_colors = ["color id 1", "color id 2"], which indicates which cache colors of the shared cache this domain should use.

If the patch contains too much debug information and that makes it hard to read, I can attach a version of the patch without the annoying debug output. Is that OK?

Thank you very much again!
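P.S. To make the color computation concrete, here is a minimal sketch (the names are made up for the example; it assumes 4KB pages and the 32 colors implied by bits [A16, A12], whereas a real patch would derive the color count from the probed cache geometry rather than hard-coding it):

    /* Bits [A16, A12] of a machine address select the cache color,
     * i.e. the low 5 bits of the machine frame number (mfn). */
    #define CA_PAGE_SHIFT  12
    #define CA_COLOR_BITS  5                      /* A16..A12 */
    #define CA_NR_COLORS   (1u << CA_COLOR_BITS)  /* 32 colors */

    static inline unsigned int mfn_to_color(unsigned long mfn)
    {
        return mfn & (CA_NR_COLORS - 1);
    }

    /* Equivalently, starting from a machine address: */
    static inline unsigned int maddr_to_color(unsigned long maddr)
    {
        return (maddr >> CA_PAGE_SHIFT) & (CA_NR_COLORS - 1);
    }

The cache-aware memory pool would then keep one free list per color and serve xc_domain_populate_physmap_exact_ca() requests only from the lists matching the domain's cache_colors option.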
Best,

Meng

> ~Andrew

--
-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel