
Re: [Xen-devel] Question about the CAT and CMT in Xen



On Tue, Sep 01, 2015 at 09:51:57PM -0400, Meng Xu wrote:
> Hi Andrew and Chao,
> 
> [Important things go first] It turns out my machine (Intel E5-2618L
> v3) does have CAT capability!
> Xen gives the false alarm that my machine does not have it. This
> should be a bug, IMO. :-)

Some Haswell servers do support CAT, but the support is model-specific
and the feature is not enumerated in the standard way described in the
Intel SDM. Xen currently handles only the standard enumeration, so your
case is not detected.

This could be fixed by adding a CPU model check to the Xen code, or by
using updated firmware, which is what I would prefer.
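
As a rough illustration of the difference between the two approaches,
here is a minimal sketch in Python (not Xen's actual C code). The
constants follow the standard enumeration in the Intel SDM (feature bit
15 of CPUID.(EAX=07H,ECX=0):EBX, and CPUID leaf 0x10). The quirk table,
the function names, and the family/model pair 6/0x3F for Haswell servers
are assumptions made for the example.

```python
# Standard enumeration, per the Intel SDM.
PSR_CPUID_LEVEL_CAT = 0x10  # CPUID leaf that describes CAT resources
CAT_FEATURE_BIT = 15        # CPUID.(EAX=07H,ECX=0):EBX bit 15

# Hypothetical quirk table of (family, model) pairs that implement CAT
# without enumerating it. 6/0x3F (Haswell server) is an assumption here.
CAT_QUIRK_MODELS = {(6, 0x3F)}


def cat_enumerated(cpuid_level, leaf7_ebx):
    """Return True if CAT is advertised through standard CPUID enumeration."""
    has_feature = (leaf7_ebx >> CAT_FEATURE_BIT) & 1 == 1
    return has_feature and cpuid_level >= PSR_CPUID_LEVEL_CAT


def cat_supported(cpuid_level, leaf7_ebx, family, model):
    """Try the standard enumeration first, then fall back to the quirk table."""
    if cat_enumerated(cpuid_level, leaf7_ebx):
        return True
    return (family, model) in CAT_QUIRK_MODELS
```

With a maximum CPUID leaf below 0x10 and the feature bit clear, which is
what was reported for cat_cpu_init(), cat_enumerated() is false and only
the model fallback would report CAT.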

> 
> 2015-09-01 10:42 GMT-04:00 Meng Xu <xumengpanda@xxxxxxxxx>:
> > 2015-09-01 10:30 GMT-04:00 Andrew Cooper <andrew.cooper3@xxxxxxxxxx>:
> >> On 01/09/15 15:20, Meng Xu wrote:
> >>> 2015-09-01 9:04 GMT-04:00 Andrew Cooper <andrew.cooper3@xxxxxxxxxx>:
> >>>> On 01/09/15 13:55, Meng Xu wrote:
> >>>>> 2015-09-01 1:47 GMT-04:00 Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>:
> >>>>>> On Mon, Aug 31, 2015 at 04:09:31PM -0400, Meng Xu wrote:
> >>>>>>> I looked into the xen/arch/x86/psr.c and found that the function
> >>>>>>> cat_cpu_init() just returned without initializing the variable
> >>>>>>> "cat_socket_enable".
> >>>>>>>
> >>>>>>> Both  !cpu_has(c, X86_FEATURE_CAT) and c->cpuid_level <
> >>>>>>> PSR_CPUID_LEVEL_CAT are evaluated as 1 inside the function
> >>>>>>> cat_cpu_init().
> 
> I think this check could be wrong for the Intel E5-2618L v3. It
> should work on Chao's machine but not on mine. There is probably a
> better way to check this. :-)
> 
> ---
> 
> I used another way to check the CAT capability, as suggested by Priya
> (cc.ed) from Intel.
> I did the following steps as Priya suggested:
> 1. Install the msr-tools utility on your Linux distribution to perform
> MSR read/write operations; if you already have it installed, run
> modprobe msr
> 2. rdmsr 0xc91
> which returns 0xfffff
> 3. wrmsr -p 1 0xc91 0xf
> which does not return anything
> 4. wrmsr -p 1 0xc8f 0x100000000
> which does not return anything
> 5. rdmsr 0xc91
> which returns 0xf
> 
> This shows that the CPU does have the MSRs that are used for CAT.
> 
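
For reference, the values in steps 2 to 5 can be decoded as follows.
This is a minimal sketch assuming the MSR layout described in the Intel
SDM: 0xc8f selects the class of service (COS) for a logical CPU in bits
63:32, and 0xc90 + n holds the L3 capacity bitmask for COS n. The helper
and constant names are illustrative only.

```python
IA32_PQR_ASSOC = 0xC8F  # per-logical-CPU COS (and RMID) selection
IA32_L3_MASK_0 = 0xC90  # capacity bitmask for COS 0; COS n is at base + n


def l3_mask_msr(cos):
    """MSR address of the L3 capacity bitmask for a given COS."""
    return IA32_L3_MASK_0 + cos


def pqr_assoc_value(cos, rmid=0):
    """Value to write to IA32_PQR_ASSOC: COS in bits 63:32, RMID in bits 9:0."""
    return (cos << 32) | (rmid & 0x3FF)


def cbm_ways(mask):
    """Number of cache ways enabled by a capacity bitmask."""
    return bin(mask).count("1")


def cbm_is_contiguous(mask):
    """True if the set bits of the mask form one contiguous run."""
    if mask == 0:
        return False
    lowest = mask & -mask      # isolate the lowest set bit
    run = mask // lowest       # shift out the trailing zeros
    return run & (run + 1) == 0  # the rest must look like 0b11...1
```

With these helpers, 0xfffff decodes to 20 ways and 0xf to 4 ways,
l3_mask_msr(1) is 0xc91, and pqr_assoc_value(1) is the 0x100000000
written in step 4.
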
> I also ran the CAT tools for Linux provided by Intel, which can be
> downloaded at 
> https://01.org/packet-processing/cache-monitoring-technology-memory-bandwidth-monitoring-cache-allocation-technology.
> They show that I can set up different cache partitions for different COS.
> Basically, the pre-configured cache settings provided by the CAT tools
> work perfectly on my machine. :-)
> 
> ------
> OK. That only checks that the registers are there and that the tools do
> not return an error. It may still not work, right? :-)
> Well, I also did some performance evaluation by running a small, simple
> benchmark I wrote.
> 
> The benchmark task sequentially accesses a 6MB array.
> 
> I ran the benchmark on core 0 in the following scenarios:
> 
> Scenario 1): Core 0 is allocated 8MB of cache with CAT; the latency
> of accessing the 6MB array is around 5.5M cycles.
> Scenario 2): Core 0 is allocated 4MB of cache with CAT; the latency
> of accessing the 6MB array is around 16.9M cycles.
> The slowdown in scenario 2) is 16.9M / 5.5M ~= 3x.
> 
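
The arithmetic above can be turned into a per-cache-line cost, which is
easier to compare against known LLC and DRAM latencies. This is a sketch
under several assumptions that are not stated in the mail: "M" means
10^6 cycles, "MB" means 2^20 bytes, the benchmark makes a single pass
over the array, and cache lines are 64 bytes.

```python
CACHE_LINE_BYTES = 64          # assumption: 64-byte cache lines
ARRAY_BYTES = 6 * 1024 * 1024  # the 6MB array from the benchmark


def slowdown(slow_cycles, fast_cycles):
    """Ratio between the slow and the fast measurement."""
    return slow_cycles / fast_cycles


def cycles_per_line(total_cycles, array_bytes, line_bytes=CACHE_LINE_BYTES):
    """Average cycles spent per cache line for one pass over the array."""
    lines = array_bytes // line_bytes
    return total_cycles / lines


print(f"slowdown: {slowdown(16.9e6, 5.5e6):.2f}x")
print(f"8MB allocation: {cycles_per_line(5.5e6, ARRAY_BYTES):.1f} cycles/line")
print(f"4MB allocation: {cycles_per_line(16.9e6, ARRAY_BYTES):.1f} cycles/line")
```
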
> ------ISSUES-------
> I tried to run some noisy neighbors on another core to see how much
> LLC isolation CAT can provide, but found some *weird results*.
> I ran the benchmark task on core 0 and a noisy neighbor that accesses
> a 20MB array on core 1.
> These two cores are configured to have *different* cache areas: core 0
> has 8MB of cache, core 1 has 4MB of cache.
> These two cores are in two isolated cpusets. No other tasks run on
> these two cores.
> If I run the benchmark alone, the latency is around 5.5M cycles,
> but if I run the benchmark along with the noisy neighbor, the latency
> *decreases* to 4.9M cycles.
> 
> I double-checked that Turbo Boost is disabled by checking the MSR
> value with the following command:
>     rdmsr -pi 0x1a0 -f 38:38
>     1=disabled
>     0=enabled
>     it returns 1.
> I also disabled the cache prefetch in BIOS.
> 
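
The "-f 38:38" option above extracts a single bit from the 64-bit MSR
value. The same extraction is sketched below for a raw value read from
MSR 0x1a0 (IA32_MISC_ENABLE), where bit 38 set means Turbo is disabled.
The function names are made up for the example.

```python
IA32_MISC_ENABLE = 0x1A0
TURBO_DISABLE_BIT = 38  # 1 = Turbo disabled, 0 = Turbo enabled


def bit_field(value, high, low):
    """Extract bits high:low (inclusive) from value, like rdmsr -f high:low."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def turbo_disabled(misc_enable):
    """True if the Turbo disable bit is set in an IA32_MISC_ENABLE value."""
    return bit_field(misc_enable, TURBO_DISABLE_BIT, TURBO_DISABLE_BIT) == 1
```
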
> Now I'm very confused. How come the latency decreases when a noisy
> neighbor is running? It seems that the noisy neighbor may help some
> hardware/software prefetcher to prefetch the data for the benchmark.
> But right now, I can't think of any other prefetcher that might
> cause this...
> The benchmark and the noisy neighbor are independent and don't share
> the array data.

Did you reboot the machine between your two tests and are the two cores
you used in the same socket?
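
On Linux, the socket of each core can be read from sysfs. A minimal
sketch, assuming the usual topology files under /sys/devices/system/cpu
are present (the function names are made up for the example):

```python
from pathlib import Path

SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def package_id(cpu, sysfs_root=SYSFS_CPU_ROOT):
    """Return the physical package (socket) id of a logical CPU."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "physical_package_id"
    return int(path.read_text().strip())


def same_socket(cpu_a, cpu_b, sysfs_root=SYSFS_CPU_ROOT):
    """True if both logical CPUs report the same physical package id."""
    return package_id(cpu_a, sysfs_root) == package_id(cpu_b, sysfs_root)
```

For the test described above, same_socket(0, 1) would tell whether the
benchmark and the noisy neighbor share one L3.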

Thanks,
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

