Re: [PATCH 5/5] x86/ioapic: Drop function pointers from __ioapic_{read,write}_entry()
On 12/11/2021 10:43, Jan Beulich wrote:
> On 11.11.2021 18:57, Andrew Cooper wrote:
>> Function pointers are expensive, and the raw parameter is a constant from all callers, meaning that it predicts very well with local branch history.

The way the compiler lays out the code is unrelated to why this form is an improvement.

Branch history is a function of "the $N most recently taken branches". This is because "how you got here" is typically relevant to "where you should go next". Trivial schemes maintain a shift register of taken / not-taken results. Less trivial schemes maintain a rolling hash of (src addr, dst addr) tuples of all taken branches (direct and indirect). In both cases, the instantaneous branch history is an input into the final prediction, and is commonly used to select which saturating counter (or bank of counters) is used.
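To make the "trivial scheme" concrete, here is a minimal sketch in C of a shift register of taken / not-taken results selecting a 2-bit saturating counter. Everything in it (the toy_* names, the 8-bit history limit, the 2-bit counter width) is an assumption made for illustration; it is not how any particular CPU implements its predictor.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_HISTORY_BITS 8   /* arbitrary limit for this toy model */

/* Toy predictor: a shift register of outcomes selects a 2-bit counter. */
struct toy_predictor {
    unsigned int history;       /* recent outcomes, newest in bit 0 */
    unsigned int history_bits;  /* how many outcomes are remembered */
    unsigned char counters[1u << MAX_HISTORY_BITS]; /* 0..3, >= 2 is "taken" */
};

static void toy_init(struct toy_predictor *p, unsigned int history_bits)
{
    assert(history_bits <= MAX_HISTORY_BITS);
    memset(p, 0, sizeof(*p));
    p->history_bits = history_bits;
}

/* Predict one branch, then train on the real outcome.  True on a hit. */
static bool toy_predict_and_update(struct toy_predictor *p, bool taken)
{
    unsigned int mask = (1u << p->history_bits) - 1;
    unsigned char *ctr = &p->counters[p->history & mask];
    bool predicted = *ctr >= 2;

    /* Saturating update: towards 3 if taken, towards 0 if not. */
    if ( taken && *ctr < 3 )
        ++*ctr;
    else if ( !taken && *ctr > 0 )
        --*ctr;

    /* Shift the real outcome into the history register. */
    p->history = ((p->history << 1) | taken) & mask;

    return predicted == taken;
}

/* Count mispredictions for a branch which alternates not-taken / taken. */
static unsigned int toy_mispredicts_alternating(unsigned int history_bits,
                                                unsigned int iterations)
{
    struct toy_predictor p;
    unsigned int i, misses = 0;

    toy_init(&p, history_bits);

    for ( i = 0; i < iterations; i++ )
        misses += !toy_predict_and_update(&p, i & 1);

    return misses;
}
```

In this model, toy_mispredicts_alternating(0, n) has a single counter and keeps mispredicting the taken case, whereas one bit of history is already enough for the counters to settle after a short warm-up.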
Consider something like

    while ( cond )
    {
        memcpy(dst1, src1, 64);
        memcpy(dst2, src2, 7);
    }

Here, the conditional jump inside memcpy() coping with the tail of the
copy flips result 50% of the time, which is fiendish to predict for.
However, because the branch history differs (by memcpy()'s return address, which was accumulated by the call instruction), the predictor can actually use two different taken/not-taken counters for the two different "instances" of the tail jump. After a few iterations to warm up, the predictor will get every jump perfect despite the fact that memcpy() is a library call and the branches would otherwise alias.

Bringing it back to the code in question: the "raw" parameter is an explicit true or false at the top of all call paths leading into these functions. Therefore, an individual branch history has a high correlation with said true or false, irrespective of the absolute code layout. As a consequence, the correct result of the prediction is highly correlated with the branch history, and it will predict perfectly[1] after a few times the path has been used.

~Andrew

[1] Obviously, it's not actually perfect outside of a synthetic example. Aliasing in the predictor is a necessary property of keeping the logic small enough to provide an answer fast, but the less accidental aliasing there is, the faster the CPU performance in benchmarks, so incentives are in our favour here.
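For anyone wanting to play with the memcpy() example, here is a rough C sketch of the "less trivial" scheme: a rolling hash of (src addr, dst addr) tuples of taken branches, used together with the branch address to pick a 2-bit saturating counter. All addresses, the hash function, the table size and the function names are made up for illustration, and the model assumes the tail jump is not-taken for the 64 byte copy and taken for the 7 byte copy. Real predictors are considerably more involved.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024u            /* arbitrary, power of two */

/* Hypothetical addresses, purely for illustration. */
#define LOOP_TOP     0x401000u
#define CALL_SITE_1  0x401005u      /* call memcpy(dst1, src1, 64) */
#define RET_TO_1     0x40100au
#define CALL_SITE_2  0x401013u      /* call memcpy(dst2, src2, 7) */
#define RET_TO_2     0x401018u
#define LOOP_BRANCH  0x401020u
#define MEMCPY_ENTRY 0x402000u
#define TAIL_BRANCH  0x402040u      /* conditional jump handling the tail */
#define TAIL_TARGET  0x402060u
#define MEMCPY_RET   0x402080u

struct path_predictor {
    uint32_t history;               /* rolling hash of taken branches */
    bool use_history;               /* false: index by address only */
    unsigned char counters[TABLE_SIZE]; /* 0..3, >= 2 is "taken" */
};

/* Fold one taken branch (src -> dst) into the history; old entries age out. */
static void path_record_taken(struct path_predictor *p,
                              uint32_t src, uint32_t dst)
{
    p->history = (p->history << 3) ^ src ^ dst;
}

/* Predict the branch at addr, then train on the outcome.  True on a hit. */
static bool path_predict_and_update(struct path_predictor *p,
                                    uint32_t addr, bool taken)
{
    uint32_t hist = p->use_history ? p->history : 0;
    unsigned char *ctr = &p->counters[(addr ^ hist) & (TABLE_SIZE - 1)];
    bool predicted = *ctr >= 2;

    if ( taken && *ctr < 3 )
        ++*ctr;
    else if ( !taken && *ctr > 0 )
        --*ctr;

    return predicted == taken;
}

/* One memcpy() call: the call, the tail jump, the return.  Returns misses. */
static unsigned int model_memcpy(struct path_predictor *p, uint32_t call_site,
                                 uint32_t ret_to, bool has_tail)
{
    unsigned int miss;

    path_record_taken(p, call_site, MEMCPY_ENTRY);
    miss = !path_predict_and_update(p, TAIL_BRANCH, has_tail);
    if ( has_tail )
        path_record_taken(p, TAIL_BRANCH, TAIL_TARGET);
    path_record_taken(p, MEMCPY_RET, ret_to);

    return miss;
}

/* Mispredictions of the tail jump over a number of loop iterations. */
static unsigned int tail_mispredicts(bool use_history, unsigned int iterations)
{
    struct path_predictor p;
    unsigned int i, misses = 0;

    memset(&p, 0, sizeof(p));
    p.use_history = use_history;

    for ( i = 0; i < iterations; i++ )
    {
        misses += model_memcpy(&p, CALL_SITE_1, RET_TO_1, false); /* 64 */
        misses += model_memcpy(&p, CALL_SITE_2, RET_TO_2, true);  /* 7 */
        path_record_taken(&p, LOOP_BRANCH, LOOP_TOP);
    }

    return misses;
}
```

With use_history false, both calls share one counter and the toy model mispredicts the tail jump once per loop iteration. With use_history true, the two call sites land on different counters and the mispredictions stop after a handful of warm-up iterations.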