I remember seeing some code on how to branch execution without using any of the jump instructions (neither conditional nor unconditional).

I remember that the code used some sort of instruction that loaded the zero flag into the EAX reg, I think... it worked like lahf, but for only one flag (I'm not even sure it was the zero flag)...

anyway instead of doing
cmp bla, bla
jnz bla

he did
cmp bla, bla
;code that sets EAX/AX/AH to the value of one of the flags...
;from here on there are no jumps whatsoever, and yet he managed to branch

Do you guys know what I'm talking about?
I really need to find that code again... and whoever was explaining that was doing so very nonchalantly, like it was nothing really mysterious, just a convoluted way to avoid jumps.

Anyway I really need to know that, and if I could find that same page it'd be even better.
It was some sort of tutorial...
I googled a lot and couldn't find it.
Can anyone help?
Posted on 2009-09-22 18:57:22 by Cyborg
The setXX instructions are what you're looking for

.data
  funcTable dd func1,func2
.code
xor eax,eax
cmp var1,7
setle al
call funcTable[eax*4]

There's also the "push funcAddr | retn" trick to just jump.
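The same idea in C, as a rough sketch (func1/func2/var1 are hypothetical stand-ins for the names in the snippet above): the comparison result becomes an array index instead of a condition.

```c
/* Hypothetical handlers standing in for func1/func2 above */
static int func1(void) { return 1; }   /* chosen when var1 > 7  (index 0) */
static int func2(void) { return 2; }   /* chosen when var1 <= 7 (index 1) */

static int dispatch(int var1) {
    int (*funcTable[2])(void) = { func1, func2 };
    /* (var1 <= 7) evaluates to 0 or 1, playing the role of
       xor eax,eax / cmp var1,7 / setle al */
    return funcTable[var1 <= 7]();
}
```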

Keep in mind that these tricks destroy any branch prediction that could have made the code run faster (and that you could have helped along, to run even faster).
Posted on 2009-09-22 20:59:06 by Ultrano
Ultrano, thank you so much!
That was the exact instruction I read about!
You seem knowledgeable on that subject, can you direct me to where I can learn more of those tricks?
And does not using jump really speed things up significantly?
Isn't "call" just a push and a jump, and therefore it has a jump "embedded" in it?
Maybe I have to learn more about how the CPU tries to predict the branches in order to understand how using setle really saves time.

I want to code a regular expressions machine (DFA) and I was thinking that if I could get rid of all the jumps I could probably make it even faster...

(by the way, the push / retn was funny, I never thought about abusing ret like that :))
Posted on 2009-09-22 21:30:07 by Cyborg
Iirc, it's only these two tricks, or some slight variations of them.

Branch prediction is detailed in the CPU whitepapers (there's a different document for each x86-based CPU: P4, C2D, Sempron, Phenom, ...). SSE2 (or was it SSE3?) adds prediction-hinting opcodes.

What you need in your regexp is probably just a table of function pointers - that is, if there would otherwise be a chain of 6+ cmp/jmp. You can also look into the cmovXX instructions and SSE comparison masks.
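Roughly, in C (a hypothetical toy machine for the regex "a+"): the function-pointer table often collapses into a plain state-transition table, one lookup per character instead of a cmp/jcc chain.

```c
/* Toy DFA for "a+" (hypothetical): state 0 = start, 1 = accepting, 2 = dead.
   The inner loop branches only for the loop itself - the per-character
   decision is a table lookup, not a compare-and-jump chain. */
static const int next_state[3][2] = {
    /*            'a'  other */
    /* start  */ { 1,   2 },
    /* accept */ { 1,   2 },
    /* dead   */ { 2,   2 },
};

static int matches(const char *s) {
    int state = 0;
    for (; *s; ++s)
        state = next_state[state][*s != 'a'];
    return state == 1;
}
```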
Posted on 2009-09-22 22:07:53 by Ultrano
Great, thanks again!
Posted on 2009-09-23 01:48:12 by Cyborg
It sounds like what you're interested in is "conditional processing without branching", rather than "branching without jumping" :) - you can't fully avoid branching, but as Ultrano has shown, there are some tricks for common situations, including arithmetic tricks that don't (directly) use the SETxx instructions.

You'll want to avoid push/ret, since some CPUs do call/ret matching as part of branch prediction, and push/ret defeats that.

There's plenty of ways to branch without doing call/jmp/jcc (not that it helps anything wrt. branch prediction penalties) - dividing by zero is one, memory access is another (if you get a page fault), interrupt calls, ...
Posted on 2009-09-23 11:00:19 by f0dder
Yeah those spring to mind as the most common malware-oriented means by which to subvert heuristic analysis of execution path.
Lamentably, if it's off-beat, if it's not standard, and it does what you want, it smells like malware.
What's your agenda? Any reasoning behind your query aside from simple curiosity?
Posted on 2009-09-24 01:06:14 by Homer
Homer: if what he wants (as I think) is "conditional processing without branching", then the reason is most likely performance.
Posted on 2009-09-24 01:31:33 by f0dder
If you're clever, you can carefully mask EIP directly.
There's no reason to set up page faults, or trap other kinds of exceptions.
The most clever methods I've seen have all involved SMC.
My personal favorite is to modify the return address of a call on the stack directly, manipulating EIP when we RET. I've used this in delta-driven (PC relative) code, without referring to any actual procedure.
And the more clever you are about it, the more it smells like malware.
"I was experimenting to see if it was faster than a jumptable" didn't get me far :P


Posted on 2009-09-24 02:24:27 by Homer
SMC = bad for performance these days, unless you do "generate once, run several".
Posted on 2009-09-24 02:27:31 by f0dder
As you can see, we don't need to write to the code segment to achieve it... so no performance penalty.
Posted on 2009-09-24 02:28:50 by Homer
"I was experimenting to see if it was faster than a jumptable" didn't get me far :P
I've used jump tables for things as diverse as windows message handling, character stream processing, virtual machine opcode processing and more - all non-malicious. Just saying :)


As you can see, we don't need to write to the code segment to achieve it... so no performance penalty.
I wouldn't call modifying EIP on stack "SMC" - and while it doesn't have the performance penalty of writing to your code section, it does have the penalty of disrupting CALL/RET branch prediction :)

Posted on 2009-09-24 02:36:03 by f0dder
There are tons of tricks you can do with clever bitmasking or making use of rollover and all that.
Check out this trick for example:
http://www.df.lth.se/~john_e/gems/gem0001.html
Or this:
http://www.df.lth.se/~john_e/gems/gem000f.html

And that's just vanilla 386 code. You can do even more stuff with MMX/SSE.
I once made an MMX routine based on this, which would get the smallest or largest number from an array of numbers without a single jump (well, other than the loop itself).
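As a hedged sketch of that idea in SSE2 intrinsics (not the original MMX routine): a horizontal minimum of eight signed 16-bit values with no conditional jumps, folding the vector onto itself with pminsw (_mm_min_epi16).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Horizontal minimum of 8 signed 16-bit values, no conditional jumps:
   repeatedly fold the vector onto itself with pminsw. */
static int16_t min8(const int16_t v[8]) {
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    /* fold the upper 4 lanes onto the lower 4 */
    x = _mm_min_epi16(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
    /* fold lanes 2,3 onto lanes 0,1 */
    x = _mm_min_epi16(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(0, 0, 0, 1)));
    /* fold lane 1 onto lane 0 */
    x = _mm_min_epi16(x, _mm_shufflelo_epi16(x, _MM_SHUFFLE(0, 0, 0, 1)));
    return (int16_t)_mm_cvtsi128_si32(x);
}
```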
Posted on 2009-09-24 03:30:57 by Scali

What's your agenda? Any reasoning behind your query aside from simple curiosity?

My agenda?
I was just thinking about the fastest way to have a regular expression machine.
So it was simple curiosity, really.
Mainly, I was just wondering: if there's something faster than jumps, then why doesn't the assembler/CPU/someone automatically use the instruction-faster-than-jump instead of jumps?
Posted on 2009-09-27 15:51:38 by Cyborg
Mainly, I was just wondering: if there's something faster than jumps, then why doesn't the assembler/CPU/someone automatically use the instruction-faster-than-jump instead of jumps?
Compilers and assembly programmers use the various tricks :)
Posted on 2009-09-27 16:05:22 by f0dder
There is a technique of branch elimination to remove the "IF" statements from your code. You create a bit mask in one of the registers using the carry flag as the seed. For example, to find the minimum of two numbers stored in eax and ebx:


; If b < a Then a = b
; a += ( ( b - a ) < 0 ? -1 : 0 ) & ( b - a )
sub ebx, eax
sbb ecx, ecx
and ecx, ebx
add eax, ecx


Or to find the absolute value of a number stored in eax:


; If a < 0 Then a = -a
; a = ( a ^ ( a < 0 ? -1 : 0 ) ) - ( a < 0 ? -1 : 0 )
mov ebx, eax
sar ebx, 31
xor eax, ebx
sub eax, ebx


Those are just two simple examples. I have written a binary search algorithm that didn't use any branching except for the loop itself, so more complex algorithms are possible. I'm not sure if that's what you're looking for, but it seemed pretty close.
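If it helps to see the same two sequences outside of asm, here's a rough C mirror (assuming arithmetic right shift on signed ints, and no overflow in b - a, so the shift mask behaves like the sbb/sar masks above):

```c
#include <stdint.h>

/* a += ((b - a) < 0 ? -1 : 0) & (b - a) */
static int32_t branchless_min(int32_t a, int32_t b) {
    int32_t d = b - a;          /* sub ebx, eax */
    int32_t mask = d >> 31;     /* plays the role of sbb ecx, ecx */
    return a + (d & mask);      /* and ecx, ebx / add eax, ecx */
}

/* a = (a ^ (a < 0 ? -1 : 0)) - (a < 0 ? -1 : 0) */
static int32_t branchless_abs(int32_t a) {
    int32_t mask = a >> 31;     /* sar ebx, 31 */
    return (a ^ mask) - mask;   /* xor eax, ebx / sub eax, ebx */
}
```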

Spara
Posted on 2009-09-29 07:50:11 by Sparafusile
As I hinted previously, similar techniques can be used to manipulate EIP and 'branch without branching'.
Not that there's any performance benefit from such an approach...
Posted on 2009-09-29 08:09:23 by Homer

As I hinted previously, similar techniques can be used to manipulate EIP and 'branch without branching'.
Not that there's any performance benefit from such an approach...
If you manipulate EIP you are, by definition, branching :)
Posted on 2009-09-29 15:14:10 by f0dder