I am not very good at timing different code snips.
Given this perhaps someone already knows which is faster.

I would like a brief comparison (no testing verbal is fine) on
if a properly coded jump table or cmp MSG je combo is prefered
where speed is concerned in WndProcs...

I was curious as im starting to try and spread myself thin again :grin:
Only this time im concentrating on SBB for now. Interesting
instruction that one. Lots of possibilities.

Ive tried to reduce it as much as I could but perhaps it can be refined
more. And of course I am curious as to how a loop does instead of
multiple jumps and mispredicted branches



.data
Messages dd 0, WM_DESTROY, WM_LBUTTONDOWN
JumpTable dd OFFSET WP_DEFAULT, OFFSET WP_DESTROY, OFFSET WP_LBUTTONDOWN
.code

; #########################################################################

WndProc proc hWnd:DWORD, uMsg:DWORD, wParam:DWORD, lParam:DWORD
xor edx, edx ; edx = 0
mov ecx, 2 ; ecx = number of messages

start:
mov eax, uMsg ; eax = uMsg
sub eax, [Messages+ecx*4] ; eax = 0 if uMsg is in Messages table
cmp eax, 1 ; copy zero flag to carry flag
sbb eax, eax ; eax = 0FFFFFFFFh if uMsg is in Messages table, 0 otherwise
and eax, ecx ; eax = index into jump table uMsg is defined
add edx, eax ; store offset in edx
loop start

lea eax, [JumpTable+edx*4]
jmp DWORD PTR [eax]

WP_DEFAULT::
invoke DefWindowProc, hWnd, uMsg, wParam, lParam
ret
WP_LBUTTONDOWN::
<<<CODE HERE>>>
ret
WP_DESTROY::
invoke PostQuitMessage, 0
xor eax, eax
ret
WndProc endp

; #########################################################################
Posted on 2004-06-16 19:28:18 by Graebel
A jumptable is constant-time. You get the target address immediately from the table. So no matter how many different cases you have, you always require the same time: only one lookup from the table, and one jump.

A set of cmp/jmps is logarithmic time at best, if you construct it as a weighed binary tree. So for N cases, you require log N/log 2 cmp/jmp pairs.

The 'downside' to a jumptable is that the CPU cannot predict where the code will jump to until the address is actually loaded. So it will have to stall until it is known. Effectively this is a mispredicted brance.
But since it only happens once, no matter how many cases you have, it's still hard to beat.
Another potential problem is that the table is not cached, so you get a cache-miss aswell. But again this can only happen once, regardless of how many cases you have.
The cmp/jmp method can be predicted, if you have the same case multiple times in a row. But as soon as you get another case, you will have at least one mispredicted branch, and there are more branches to take in general, so there is more code to be processed until you reach the target.

In short, nothing can beat a jumptable :)
But the logarithmic cmp/jmp method is a good alternative when the table required would be too large to store.
Posted on 2004-06-17 02:15:32 by Scali
The jump-table algorithm shown is obviously not constant time. It is of worst-case linear order complexity O(N), where N is the number of messages in the table.

I may be wrong, but your WndProc's message deduction isn't going to help your application's efficiency greatly. It may make your code easier to maintain, though. How about using a call-table?
Posted on 2004-06-17 07:33:43 by death
Oh yea, I didn't really look at the code. This one uses a loop, which is rather useless :P
You just want to do:call
That's the whole idea.
Posted on 2004-06-17 10:23:35 by Scali
Nods, I know the absolute fastest would the direct table lookup. Mainly
I was just playing around with SBB and the different flags.

I would like something better than the cmp je combos, but also I dont
see where wasting 4096 some odd bytes for a complete lookup table
is justified. At the same time I thought I would see if there was a happy
medium between some sort of lookup and table size.

Perhaps there is a better algo out there for a direct lookup but use a
smaller table... perhaps a fast hash of sorts, right now im not sure. Its
just nagging at me... it just seems there should be a better solution it
just hasnt hit me yet.
Posted on 2004-06-17 11:30:44 by Graebel
Graebel, you could sort the table and use a binary search... or, "if you're one of those", order the table by message frequency. It's pretty irrelevant for a wndproc anyway, so I'd go for the shortest code.

Btw, you can use a single table of <msg,offset> pairs, and use reg*8+0 and reg*8+4. That way, you're guaranteed the offset part will be in the cache when you've found the msg - not that it matters that much :)
Posted on 2004-06-17 11:37:17 by f0dder
Well thanks much for the various inputs :)

I think I just found my happy medium. A duel table lookup. Saves space
and still quick. I save space by using a byte index table which is a forth
the size I would normally need. Wouldnt have thought of it without the
feedback.

Although I am not very happy I cant figure out how to patch the byte
table at compile time instead of run time... oh well, more research for me!



...
.data
ALIGN 4
TOffset dd 265 dup(0)
Table dd OFFSET WP_DEFAULT, OFFSET WP_DESTROY, OFFSET WP_LBUTTONDOWN

.code
...
; Patch the index jump table
lea eax, [TOffset+WM_DESTROY]
mov [eax], BYTE PTR 1
lea eax, [TOffset+WM_LBUTTONDOWN]
mov [eax], BYTE PTR 2
...

WndProc proc hWnd:DWORD, uMsg:DWORD, wParam:DWORD, lParam:DWORD
lea eax, [TOffset] ; eax = address of byte index table
xor ecx, ecx ; ecx = 0
add eax, uMsg ; adjust index
mov cl, BYTE PTR [eax] ; get table index
lea edx, [Table+ecx*4] ; edx = label address
jmp DWORD PTR [edx] ; jump to label

WP_DEFAULT::
ret
WP_LBUTTONDOWN::
ret
WP_DESTROY::
PostQuitMessage( 0 )
xor eax, eax
ret
WndProc endp
Posted on 2004-06-17 18:24:54 by Graebel