I saw something in linux kernel "if(likely( .... ))" when I search on it I found some thing called "Branch prediction".
So, how can I make my jumps more "predicable" for cpu ?
Posted on 2007-08-17 17:48:33 by Dite
From C, there are a few guidelines as far as branch prediction performance:

  • reduce the number of branches

  • for branches that may occur repeatedly (e.g. in or part of a loop), try to avoid having these branches taken close to 50% of the time

  • when using if, including chains of else if, put the most likely conditions first

  • use switch for long chains of comparing the same variable against constant integers, especially consecutive integers

In the first case, removing a branch means that it can't be mispredicted, avoiding the chance of the penalty for mispredicting.  :lol:  It also increases the chance of a branch being in the Branch Target Buffer (BTB, see second case), since it has a limited number of entries.

In the second case, the BTB keeps track of a (very rough) percentage of times that a branch is taken, and if the percentage is above 50%, it's predicted as being taken, and if below, it's predicted as being not taken.  There are also two or more levels of prediction (strongly predicted vs. weakly predicted) depending on the percentage.

In the third case, if a branch is not in the BTB, there is a specific prediction scheme used (neglecting speculative execution): If a branch is forward (an "if" statement), it is predicted as not being taken (going inside the "if" statement).  If a branch is backward (a loop), it is predicted as being taken (looping back to the beginning of the loop).  Of course, unconditional branches are predicted to be taken.  Variable jumps (not present in C, except for some "switch" implementations) aren't predicted at all.  This suggests that if the most likely case in a chain of "if"/"else if" is first, it will predict that the case is true.  It also suggests that small loops (just a few iterations) should be unrolled.

In the last case, although variable jumps are difficult to predict, having a variable jump based on a table lookup is better than having a large number of mispredictions from a chain of "if"/"else if", so switch usually indicates to the compiler to try to do a table or tree lookup for the jump instead of trying each condition sequentially.  When the code is not something that works with switch but that still contains a big string of "if"/"else if", you can still often make a binary search tree in the code of if/else, however, this will usually make the branches be taken close to 50% of the time, so it's not always a silver bullet.

From an assembly point of view it's a bit easier to see some of these things (well, it's at least easier to follow), but luckily this is one thing at which you can often trick the compiler into doing a good job.

I hope that helps explain it. Cheers! :)

Edit: minor clarification on last full paragraph
Posted on 2007-08-18 01:00:58 by hackulous
There's some "branch hint" instructions added in recent CPUs - in fact they're constructed with segment overrides, so they have no effect on older CPUs. Google :)

I wonder if the "if(likely(...))" is some GCC extension?
Posted on 2007-08-18 10:19:20 by f0dder

There's some "branch hint" instructions added in recent CPUs - in fact they're constructed with segment overrides, so they have no effect on older CPUs. Google :)

I never really looked into them much since they seemed kind of cryptic.  Are they easier to use than just reorganizing the jump directions for the BTB initial guess?

I wonder if the "if(likely(...))" is some GCC extension?

That's a neat idea.  It would be a neat way to indicate to the compiler whether the condition is most likely true so that it can try to do these optimizations a bit.

It might also just be some function somewhere else in the code.  Putting such an "if" around a very long check for whether the "real" condition is true could save a lot of time.
Posted on 2007-08-18 14:13:11 by hackulous
Thanks guys.

There's some "branch hint" instructions added in recent CPUs - in fact they're constructed with segment overrides, so they have no effect on older CPUs. Google :)

I wonder if the "if(likely(...))" is some GCC extension?

I've rechecked that I found this.
#define likely(x) __builtin_expect(!! ( x ), 1)
#define unlikely(x) __builtin_expect(!! ( x ), 0)
It's definetly a gcc ext.

And I googled "branch hint" and found some implemention.
2Eh - hint that the branch will not occur most of the time.
3Eh - hint that the branch will occur most of the time.

Posted on 2007-08-19 10:18:59 by Dite
Intel provides branch-prediction instructions for its Pentium-class processors, though there are no mnemonics for them. To indicate a likely branch in your code, precede it with this:

db 0x3E ; (or "db 03Eh", however you prefer to write hex numbers)

To indicate an unlikely taken branch, precede it with this:

db 0x2E ; (or "db 02Eh")

Basically, you are just declaring anonymous bytes with the afforementioned values. Not exactly obvious in a quick scan over code. It would probably promote clarity to #define those as macros... something like NEXT_JMP_LIKELY and NEXT_JMP_UNLIKELY, or something more cryptic, but assembly-ish, like NJL and NJU.

Modern Intel processors always assume that backward jumps will be taken and forward jumps not, so you really only need include these when the reverse is true for a particular section of code. In other words, if you have a backward jump that is likely to be taken, just let the processor handle it. Preceding it with "db 0x3E" will only force the processor to read that extra byte when it would have predicted correctly anyway.

If you are programming in C, there is a good chance that your compiler allows for inline assembly. Consult your compiler's manual on how to do that, and then the rest is essentially the same. The only caveat there is now you need to learn what assembly your C compiler emits so you know where exactly to place the inline assembly. Do do that, of course, consult the compiler's manual.
Posted on 2007-08-31 15:16:53 by TheAbysmal
The branch prediction hints are done as segment overrides, so they decode as harmless no-ops on older processors... so you might be able to use them directly in your current assembler without DBs, depending on how flexible it is wrt. segment override syntax :)

Inline assembly == yuck... and for MS 64bit compilers, it's totally unavailable. If you have routines worthy of optimizing, better write them using an external assembler anyway.
Posted on 2007-08-31 15:42:17 by f0dder