The reason I want this information from the hardware
performance counters is that I want to measure the
interference among domains while they are running.
In addition, when we measure the latency of accessing a large
array, the result is not what we expect. We increase the size
of the array from 1KB to 12MB, which covers the L1 (32KB), L2
(256KB) and L3 (12MB) cache sizes. We expect the latency of
accessing the whole array to show clear steps at around 32KB,
256KB and 12MB, because the latencies of L1, L2 and L3 differ
by a factor of several.
However, we see that the latency does not increase much when
the array size exceeds the size of L1, L2, or L3. This is
strange because when we run the same task in Linux on the bare
machine, we get the expected result.
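
For reference, this kind of measurement can be sketched roughly as
below. This is only an illustration, not the exact benchmark we ran:
it assumes a pointer-chasing loop over a randomly permuted array, and
the names build_chain, chase_ns and sink are hypothetical. The random
single-cycle order is there so that each load depends on the previous
one and the hardware prefetcher cannot hide the cache misses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Written once per run so the compiler cannot drop the chase loop. */
static volatile size_t sink;

/* Small xorshift PRNG; the state must be nonzero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Fill next[0..n-1] with a random permutation that forms one single
 * cycle (Sattolo's algorithm), so following next[] from index 0
 * visits every element exactly once before returning to 0. */
void build_chain(size_t *next, size_t n, uint64_t seed)
{
    assert(n > 0 && seed != 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&seed) % i);  /* j in [0, i-1] */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` dependent loads and return the
 * average time per load in nanoseconds. */
double chase_ns(const size_t *next, size_t steps)
{
    struct timespec t0, t1;
    size_t idx = 0;

    assert(steps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        idx = next[idx];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = idx;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Calling build_chain and then chase_ns for each array size from 1KB to
12MB (with n = size / sizeof(size_t)) and printing the result per size
gives the latency curve discussed above. Note that this sketch packs
several elements into one cache line; padding each element to the
cache line size is a common refinement.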