
Re: [Xen-devel] Question about partitioning shared cache in Xen



Hi Andrew,

Thank you very much for your quick reply!

2015-01-14 7:20 GMT-05:00 Andrew Cooper <andrew.cooper3@xxxxxxxxxx>:
> On 14/01/15 00:41, Meng Xu wrote:
>> Hi,
>>
>> [Goal]
>> I want to investigate the impact of the shared cache on the
>> performance of workloads in guest domains.
>> I also want to partition the shared cache via a page coloring
>> mechanism so that guest domains use different cache colors of the
>> shared cache and do not interfere with each other in the shared
>> cache.
>>
>> [Motivation: Why do I want to partition the shared cache?]
>> Because the shared cache is shared among all guest domains (I assume
>> the machine has multiple cores sharing the same LLC; for example, an
>> Intel(R) Xeon(R) CPU E5-1650 v2 has 6 physical cores sharing a 12MB
>> L3 cache), the workload in one domU can interfere with another
>> domU's memory-intensive workload on the same machine via the shared
>> cache. This shared-cache interference makes the execution time of
>> the workload in a domU non-deterministic and much longer. (If we
>> assume the worst case, the worst-case execution time of the workload
>> becomes too pessimistic.) A stable execution time is very important
>> in real-time computing, where a real-time program, such as the
>> control program in an automobile, has to produce its result within a
>> deadline.
>>
>> I did some quick measurements to show how the shared cache can be
>> used by a hostile domain to interfere with the execution time of
>> another domain's workload. I pinned the VCPUs of two domains to
>> different physical cores and used one domain to pollute the shared
>> cache. The result shows that shared-cache interference can slow down
>> the execution time of another domain's workload by 4x. The full
>> experiment result can be found at
>> https://github.com/PennPanda/cis601/blob/master/project/data/boxplot_cache_v2.pdf
>>  . (The workload in the figure is a program reading a large array. I
>> ran the program 100 times and plotted the latency of accessing the
>> array as a box plot. The first column, named "alone-d1v1", is the
>> latency when the program in dom1 runs alone. The fourth column,
>> "d1v1d2v1-pindiffcore", is the latency when the program in dom1 runs
>> alongside another program in dom2, with the two domains pinned to
>> different cores. dom1 and dom2 each have 1 VCPU with budget equal to
>> period. The scheduler is the credit scheduler.)
>>
>> [Idea of how to partition the shared cache]
>> When a PV guest domain is created, it calls xc_dom_boot_mem_init()
>> to allocate memory for the domain, which finally calls
>> xc_domain_populate_physmap_exact() to allocate memory pages from the
>> domheap in Xen.
>> The idea of partitioning the shared cache is as follows:
>> 1) xl tool change: Add an option to the domain's configuration file
>> which specifies which cache colors this domain should use. (I have
>> done this; when I use xl create --dry-run, I can see the parameters
>> are parsed into the build information.)
>> 2) hypervisor change: Add another hypercall,
>> xc_domain_populate_physmap_exact_ca(), which has one more parameter,
>> i.e., the cache colors this domain should use. I also need to
>> reserve a memory pool which sorts the reserved memory pages by their
>> cache color.
>>
>> When a PV domain is created, I can specify the cache colors it uses.
>> The xl tool will then call xc_domain_populate_physmap_exact_ca() to
>> allocate to this domain only memory pages with the specified cache
>> colors.
>>
>> [Quick implementation]
>> I attached my quick implementation patch at the end of this email.
>>
>> [Issues and Questions]
>> After I applied the patch on top of Xen commit
>> 36174af3fbeb1b662c0eadbfa193e77f68cc955b and ran it on my machine,
>> dom0 cannot boot up. :-(
>> The error message from dom0 is:
>> [    0.000000] Kernel panic - not syncing: Failed to get contiguous
>> memory for DMA from Xen!
>>
>> [    0.000000] You either: don't have the permissions, do not have
>> enough free memory under 4GB, or the hypervisor memory is too
>> fragmented! (rc:-12)
>>
>> I tried to print a message in every function I touched in order to
>> figure out where it goes wrong, but failed. :-(
>> The thing I cannot understand is this: my implementation hasn't
>> reserved any memory pages in the cache-aware memory pool before the
>> system boots up. Basically, none of the functions I modified are
>> called before the system boots up, yet the system still crashes. :-(
>> (The system boots up and works perfectly before applying my patch.)
>>
>> I would really appreciate it if any of you could point out the part
>> I missed or misunderstood. :-)
>
> The error message is quite clear.  I presume that your cache
> partitioning algorithm has prevented dom0 from getting any
> machine-contiguous pages for DMA.  This prevents dom0 from using any
> hardware, such as its disks or the network.

Actually, I didn't partition the shared cache for dom0. dom0 should
have contiguous memory as before.

I didn't modify any function in the existing buddy allocator (so that
other functionality in Xen that depends on those functions won't be
affected). I just copied those memory allocation functions and
modified the copies to allocate memory pages of a specific cache
color.

The reason why I don't think dom0 has non-contiguous memory is that
none of the memory allocation functions I added (except the function
that initializes the memory heaps for the buddy allocator) is ever
called. I added a printk in every function I added, but none of them
is triggered. So I think dom0 should not end up with non-contiguous
memory.
Did I miss something here?
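
To make the shape of the change clearer, here is a simplified,
self-contained sketch of the idea (this is not the code in my patch;
the structure and function names are made up for illustration). The
existing buddy allocator stays untouched; a separate pool keeps the
reserved pages on per-color free lists and only serves domains that
asked for specific colors:

#include <stddef.h>
#include <stdint.h>

#define NR_COLORS 32            /* from page-frame bits [A16, A12], see below */

/* Illustrative stand-in for a page kept in the cache-aware pool. */
struct colored_page {
    unsigned long mfn;          /* machine frame number of the page */
    struct colored_page *next;
};

/* One free list per cache color. */
static struct colored_page *color_heap[NR_COLORS];

/* Color of a machine frame = the low 5 bits of the MFN (the cache
 * set-index bits that lie above the 4KB page offset). */
static unsigned int mfn_to_color(unsigned long mfn)
{
    return mfn & (NR_COLORS - 1);
}

/* Put a reserved page into the cache-aware pool. */
static void free_colored_page(struct colored_page *pg)
{
    unsigned int c = mfn_to_color(pg->mfn);

    pg->next = color_heap[c];
    color_heap[c] = pg;
}

/* Hand out one page whose color is allowed by the domain's color mask.
 * Domains without a mask (e.g. dom0) never take this path and keep
 * using the unmodified buddy allocator. */
static struct colored_page *alloc_colored_page(uint32_t color_mask)
{
    unsigned int c;

    for ( c = 0; c < NR_COLORS; c++ )
    {
        if ( !(color_mask & (1u << c)) || color_heap[c] == NULL )
            continue;

        struct colored_page *pg = color_heap[c];
        color_heap[c] = pg->next;
        return pg;
    }

    return NULL;   /* no free page of an allowed color */
}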


>
> What I don't see is how you plan to isolate different colours in a
> shared cache.  I am guessing (seeing as the patch is full of debugging
> and hard to follow) that you are using the low order bits in the
> physical address to identify the colour, which will indeed prevent any
> contiguous allocations from happening.  Is this what you are attempting
> to do?

Yes. I try to use bits [A16, A12] to isolate different colors in the
shared cache. A 2MB, 16-way set-associative shared cache uses bits
[A16, A6] to index its cache sets. Because the page size is 4KB, the
page frame number's bits [A16, A12] overlap with the bits used to
index the shared cache's sets, so we can control those [A16, A12]
bits to control where a page is placed in the cache. (The Wikipedia
page about page coloring is here:
http://en.wikipedia.org/wiki/Cache_coloring)
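
To spell out the arithmetic (a self-contained illustration, not code
from my patch): with 64-byte cache lines, a 2MB, 16-way cache has
2MB / 16 / 64B = 2048 sets, indexed by bits [A16, A6]; the 5 set-index
bits above the 4KB page offset, i.e. [A16, A12], give 2^5 = 32 colors,
each mapping to 64KB of the cache.

#include <stdio.h>

/* Illustration of the page-coloring arithmetic for a 2MB, 16-way
 * cache with 64-byte lines and 4KB pages (not code from the patch). */
#define CACHE_SIZE    (2u << 20)                      /* 2MB            */
#define WAYS          16u
#define LINE_SIZE     64u
#define PAGE_SIZE     4096u

#define NR_SETS       (CACHE_SIZE / WAYS / LINE_SIZE) /* 2048 sets      */
#define SETS_PER_PAGE (PAGE_SIZE / LINE_SIZE)         /* 64 sets/page   */
#define NR_COLORS     (NR_SETS / SETS_PER_PAGE)       /* 32 colors      */

/* The color of a page is given by the set-index bits above the page
 * offset, i.e. bits [A16, A12] of the physical address, which are the
 * low 5 bits of the page frame number. */
static unsigned int page_color(unsigned long paddr)
{
    return (paddr / PAGE_SIZE) % NR_COLORS;
}

int main(void)
{
    printf("%u sets, %u colors, %u KB of cache per color\n",
           NR_SETS, NR_COLORS, CACHE_SIZE / NR_COLORS / 1024u);
    printf("color of the page at 0x12345000 = %u\n",
           page_color(0x12345000ul));
    return 0;
}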

However, I won't partition the shared cache for dom0, which is usually
the administrative domain. I only partition the shared cache for guest
domains whose configuration file (passed to 'xl create') has the
option cache_colors = ["color id 1", "color id 2"], which indicates
which cache colors of the shared cache this domain should use.
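
For example (purely illustrative; the exact value format is simply
what my prototype's parser expects, and the color IDs below are made
up), the guest's config file would contain something like:

    vcpus = 1
    memory = 512
    # restrict this domain to cache colors 0-3 of the shared LLC
    cache_colors = [ "0", "1", "2", "3" ]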


If the patch has too much debug information and is hard to read, I can
attach a version of the patch without the annoying debug output. Is
that OK?

Thank you very much again!

Best,

Meng

>
> ~Andrew
>



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
