
Re: [Xen-devel] [RFC PATCH 0/7] Intel Cache Monitoring: Current Status and Future Opportunities



On Sat, Apr 04, 2015 at 04:14:15AM +0200, Dario Faggioli wrote:
> Hi Everyone,
> 
> This RFC series is the outcome of an investigation I've been doing about
> whether we can take better advantage of features like Intel CMT (and of PSR
> features in general). By "take better advantage of" them I mean, for example,
> use the data obtained from monitoring within the scheduler and/or within
> libxl's automatic NUMA placement algorithm, or similar.
> 
> I'm putting here in the cover letter a markdown document I wrote to better
> describe my findings and ideas (sorry if it's a bit long! :-D). You can also
> fetch it at the following links:
> 
>  * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.pdf
>  * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.markdown
> 
> See the document itself and the changelog of the various patches for details.

Very good summary and analysis of the possible usages. Most of the problems
do exist; some of them may be partially solved, but some look unavoidable.

> 
> The series includes one of Chao's patches, as I found it convenient to build
> on top of it. The series itself is available here:
> 
>   git://xenbits.xen.org/people/dariof/xen.git  wip/sched/icachemon
>   
> http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/wip/sched/icachemon
> 
> Thanks a lot to everyone that will read and reply! :-)
> 
> Regards,
> Dario
> ---
> 
> This is exactly what happens in the current implementation. The result looks
> as follows:
> 
>     [root@redbrick ~]# xl psr-cmt-attach 0
>     [root@redbrick ~]# xl psr-cmt-attach 1
>     [root@redbrick ~]# xl psr-cmt-show cache_occupancy
>     Total RMID: 71
>     Name                  ID    Socket 0    Socket 1    Socket 2    Socket 3
>     Total L3 Cache Size         46080 KB    46080 KB    46080 KB    46080 KB
>     Domain-0               0     6768 KB        0 KB        0 KB        0 KB
>     wheezy64               1        0 KB      144 KB      144 KB        0 KB
> 
> Let's assume that RMID 1 (RMID 0 is reserved) is used for Domain-0 and RMID 2
> is used for wheezy64. Then:
> 
>     [root@redbrick ~]# xl psr-cmt-detach 0
>     [root@redbrick ~]# xl psr-cmt-detach 1
> 
> So now both RMID 1 and 2 are free to be reused. Now, let's issue the following
> commands:
> 
>     [root@redbrick ~]# xl psr-cmt-attach 1
>     [root@redbrick ~]# xl psr-cmt-attach 0
> 
> Which means that RMID 1 is now assigned to wheezy64, and RMID 2 is given to
> Domain-0. Here's the effect:
> 
>     [root@redbrick ~]# xl psr-cmt-show cache_occupancy
>     Total RMID: 71
>     Name                  ID    Socket 0    Socket 1    Socket 2    Socket 3
>     Total L3 Cache Size         46080 KB    46080 KB    46080 KB    46080 KB
>     Domain-0               0      216 KB      144 KB      144 KB        0 KB
>     wheezy64               1     7416 KB        0 KB     1872 KB        0 KB
> 
> It looks quite likely that the 144 KB of occupancy on sockets 1 and 2, now
> being accounted to Domain-0, is really what domain wheezy64 had allocated
> before the RMID "switch". The same applies to the 7416 KB on socket 0 now
> accounted to wheezy64: most of it was actually allocated there by Domain-0,
> so the attribution is simply not accurate.
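> 
> To make it clearer why this happens, here is a minimal sketch (the helper
> name is made up, and this is not the actual Xen code) of how the L3
> occupancy of an RMID is sampled, using the IA32_QM_EVTSEL / IA32_QM_CTR
> MSRs described in Intel's SDM. The hardware only reports how many cache
> lines are currently tagged with the given RMID; it keeps no record of
> which domain actually allocated them, so whatever a domain left behind
> keeps being charged to whoever holds that RMID next:
> 
>     #include <xen/types.h>
>     #include <asm/msr.h>
> 
>     /* MSR numbers and the event ID are from the SDM. */
>     #define MSR_IA32_QM_EVTSEL   0x0c8d
>     #define MSR_IA32_QM_CTR      0x0c8e
>     #define QM_EVT_L3_OCCUPANCY  0x1
> 
>     /*
>      * Illustrative sketch, not the in-tree implementation. 'factor' is
>      * the upscaling factor reported by CPUID leaf 0xF, sub-leaf 1.
>      */
>     static uint64_t l3_occupancy_bytes(unsigned int rmid, uint64_t factor)
>     {
>         uint64_t cnt;
> 
>         /* Select <event, RMID>: event ID in bits 7:0, RMID in bits 41:32. */
>         wrmsrl(MSR_IA32_QM_EVTSEL,
>                QM_EVT_L3_OCCUPANCY | ((uint64_t)rmid << 32));
>         rdmsrl(MSR_IA32_QM_CTR, cnt);
> 
>         /* Bits 63:62 flag an error or data being unavailable. */
>         if ( cnt & (3ULL << 62) )
>             return 0;
> 
>         return cnt * factor;
>     }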
> 
> This is only a simple example. Others have been performed, restricting the
> affinity of the domains involved in order to control on which sockets cache
> load was to be expected, and they all confirm the above reasoning.
> 
> It is rather easy to appreciate that any kind of 'flushing' mechanism to be
> triggered when reusing an RMID (if anything like that even exists!) would
> impact system performance (e.g., it is not an option in hot paths). Still,
> the situation outlined above needs to be fixed before the mechanism can be
> considered usable and reliable enough to build anything on top of it.

As far as I know, no such 'flushing' mechanism is available at present. One
possible software solution to lighten this issue is to rotate the RMIDs with
an algorithm like 'use the oldest unused RMID first'.
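
Just to make the idea more concrete, below is a rough sketch of such an
'oldest unused RMID first' allocator (the names, the locking and the FIFO
size are made up for illustration; this is not what the series implements).
Freed RMIDs are queued at the tail and handed out from the head, so the one
that has been idle longest, and whose stale cache lines have therefore had
the most time to be evicted, is reused first. Of course this only lightens
the problem, it does not guarantee the stale lines are all gone:

    #include <xen/lib.h>
    #include <xen/spinlock.h>

    /* Purely illustrative; 71 matches the 'Total RMID' of the box above. */
    #define NR_RMIDS 71

    static unsigned int rmid_fifo[NR_RMIDS];  /* freed RMIDs, oldest first */
    static unsigned int fifo_head, fifo_count;
    static DEFINE_SPINLOCK(rmid_fifo_lock);

    /* Hand out the RMID unused for the longest time (0 = none available). */
    static unsigned int alloc_oldest_rmid(void)
    {
        unsigned int rmid = 0;

        spin_lock(&rmid_fifo_lock);
        if ( fifo_count )
        {
            rmid = rmid_fifo[fifo_head];
            fifo_head = (fifo_head + 1) % NR_RMIDS;
            fifo_count--;
        }
        spin_unlock(&rmid_fifo_lock);

        return rmid;
    }

    /* Return an RMID to the pool, so that it gets reused as late as possible. */
    static void free_rmid_to_fifo(unsigned int rmid)
    {
        spin_lock(&rmid_fifo_lock);
        ASSERT(fifo_count < NR_RMIDS);
        rmid_fifo[(fifo_head + fifo_count) % NR_RMIDS] = rmid;
        fifo_count++;
        spin_unlock(&rmid_fifo_lock);
    }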

Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

