Hi all.

I am playing these days with some ideas about message handling procedures. Here is the last I write. The main idea is to avoid using of cmp/je series to determine what message we have and where to go to handle it, because this cost approx. 10..13 bytes per message whatevery we use "cmp , WM_SOMETHING" or "cmp eax, WM_SOMETHING". My method use table with 4 bytes per message and subroutine 28 bytes long. Only restriction is that branch must be in bounds of 64kbytes.

I am not very strong in speed optimizations and any opinion about this approach: speed, cache, branch prediction, possible hidden problems, etc. will be very helpfull.

The source is in FASM format but I think it is readable for everyone.



macro MessageList default, [message, msghandler] {
forward
dw message, msghandler - $ - 2
common
dw 0, default - $ - 2
}


;------------------------------------------------------
; proc JumpTo
; Searches for the message in the message table.
; Arguments:
; eax - message number to search in the table.
; Returns:
; ebx - address of the handler.
; Uses:
; ebx, edx - there is no need to preserve ebx,
; because this procedure is for use
; in message handlers where ebx is
; preserved on the input.
;------------------------------------------------------
JumpTo:
mov edx, [esp] ; get table address
add esp, 4 ; remove return address from stack.
.loop:
mov ebx, [edx]
add edx, 4
test bx,bx
jz .exit

cmp bx, ax
jne .loop

.exit:
shr ebx, 16
add ebx, edx
jmp ebx

[b]Using:[/b]
proc SizerWinProc, hwnd, wmsg, wparam, lparam
enter
push esi edi ebx

mov eax, [wmsg]
call JumpTo
MessageList .default, \
WM_CREATE, .create, \
WM_DESTROY, .destroy, \
WM_MOVE, .move, \
WM_PAINT, .paint, \
WM_MOUSEMOVE, .mousemove, \
WM_LBUTTONDOWN, .buttondown, \
WM_LBUTTONUP, .buttonup, \
WM_SETCURSOR, .setcursor
....................
Posted on 2003-09-25 00:45:00 by JohnFound
Hi, JohnFound
Cool idea to use 16 bit address :)

imho, optimizing calls to Win API for speed is waste of time :confused:

May be just avoid using of 16bit opcodes
And some aligment for table would be nice if it is big in size.
JumpTo:

mov edx, [esp] ; get table address
add edx, 3
and edx, -4

.......

call JumpTo
align 4
MessageList .default,
Posted on 2003-09-25 03:46:26 by S.T.A.S.
Hi S.T.A.S.

About Align, I simply don't know. Actually aligning is speed optimization... What will be the advantage of using align?
Posted on 2003-09-25 04:01:34 by JohnFound
OK, now one byte less in JumpTo procedure. :) Only difference is that default processing must be imediate after MessageList definition.
Come on, I actually need some criticism. Or you think all I write is masterpiece? :grin:



macro MessageList [message, msghandler] {
forward
dw message, msghandler - $ - 2
common
dd 0
}

;------------------------------------------------------
; proc JumpTo
; Searches for the message in the message table.
; defined with MessageList macro.
; Arguments:
; eax - message number to search in the table.
;------------------------------------------------------
JumpTo:
mov edx, [esp] ; get table address
add esp, 4 ; remove return address from stack.
.loop:
mov ebx, [edx]
add edx, 4
test ebx,ebx
jz .exit

cmp bx, ax
jne .loop

.exit:
shr ebx, 16
add ebx, edx
jmp ebx

; How to use:

mov eax, [wmsg]
call JumpTo
MessageList \
WM_CREATE, .create, \
WM_DESTROY, .destroy, \
WM_NOTIFY, .notify

invoke DefWindowProc, [hwnd], [wmsg], [wparam], [lparam]
Posted on 2003-09-25 10:16:25 by JohnFound
Hi JohnFound

Alignment is usefull when we're reading dwords from memory, because CPU is optimized to read them from adreses like (0XXXXXXXXh & 0FFFFFFFC). Though, it can waste up to 3 bytes of memory
Posted on 2003-09-25 20:27:16 by S.T.A.S.
The Intel Optimization manual says that "significant perfomance penalties" can be incurred if data is misaligned:
Assembly/Compiler Coding Rule 17
Align data on natural operand size address boundaries. If the data will be accessed with vector instruction loads and stores, align the data on 16 byte boundaries.

For best performance, align the data as follows:
    [*]Align 8-bit data at any address
    [*]Align 16-bit data to be contained within an aligned four byte word
    [*]Align 32-bit data so that it's base address is a multiple of four
    [*]Align 64-bit data so that it's base address is a multiple of eight
    [*]Align 80-bit data so that it's base address is a multiple of sixteen
    [*]Align 128-bit data so that it's base address is a multiple of sixteen
    A 64-byte or greater data structure or array should be aligned so that it's base address is a multiple of 64. Sorting data in decreasing size order is one heuristic for assisting with natural alignment. As long as 16-byte boundaries (and cache lines) are never crossed, natural alignment is not strictly necessary, though it is an easy way to enforce this.
Posted on 2003-09-25 20:53:54 by donkey
If you process many messages ( this ) method will only cost a couple bytes more and maybe faster, but the table is a little harder to generate. If you are optimizing for size:
        mov     edx, [esp] ; get table address

add esp, 4 ; remove return address from stack.
...is the same as POP EDX.
Posted on 2003-09-25 21:13:13 by bitRAKE
Since the dispatch code is in a subroutine, the cost of the method is 4 bytes per message + 8 bytes per wndproc. The subroutine is a fixed (constant) overhead sizewise.

Here is another search table approach...
  .code

mov eax,uMsg ; make the message ID a sentinel
mov jtable,eax ; by storing it at start of table
mov esi,offset end_jtable ; point past end of table
jloop:
sub esi,8 ; search backwards
cmp eax,[esi] ; check for matching message number
jne jloop ; repeat if not a match
jmp dword ptr [esi+4] ; dispatch to message handler

.data
jtable label dword
dd 0, doDefault ; message ID set before search starts
dd WM_DESTROY, doDestroy
dd WM_CLOSE, doClose
end_jtable equ $
The cost is 13 bytes of setup, 7 bytes in the loop, 3 bytes of dispatch, 8 bytes per message.
Posted on 2003-09-25 22:27:23 by tenkey

The Intel Optimization manual says that "significant perfomance penalties" can be incurred if data is misaligned:


Donkey, you are right, i read this too. But I wondering is this is reasonable in message processing, because it is slow process in generaly. Did, someone have any info, based on benchmarking/tests, how the speed of message handlers affects performance of the application in Windows. I can't figure out how to measure the performance in this case. Any ideas?

bitRAKE: Of course you are right about "pop edx" and I am stupid here. :( The good news is that now JumpTo is 22 bytes only. :) About binary tree - you know, usually we have max 20..30..40 messages to process. Do you think it is reasonable to use this method?
Posted on 2003-09-25 22:27:40 by JohnFound

About binary tree - you know, usually we have max 20..30..40 messages to process. Do you think it is reasonable to use this method?
No, it is not reasonable - as I have stated in that thread. Just think of the cache misses in normal use. Some testing has to be done to see where the turn over point is - how many messages would it take to make branching a faster method. And it would be different for each CPU!
Posted on 2003-09-25 22:35:08 by bitRAKE
With some registers use change and string operations - 18 bytes for JumpTo procedure.
;------------------------------------------------------

; proc JumpTo
; Searches for the message in the message table.
; Arguments:
; ebx - message number to search in the table.
; Uses:
; eax, esi
;------------------------------------------------------
JumpTo:
pop esi ; get table address and remove return
; address from the stack.
.loop:
lodsd
test eax,eax
jz .exit

cmp bx, ax
jne .loop

shr eax, 16
add esi, eax
.exit:
jmp esi


bitRAKE, I am agree that I have to make some tests, but I can't figure out how to test the message processing speed for normal application. I don't want to make some mass message sending and to make time measure, because this will not give real results. I can make two variants of the same program - one with classic and one with table message processing, but how to test?

Regards.
Posted on 2003-09-25 23:50:54 by JohnFound
To build the test you can "record" a real message loop and "play back" the values to the test routines. Another option would be to generate the test data based on a computational model (could include random bits).
Posted on 2003-09-26 00:53:42 by bitRAKE
I can't understand your Balkan logic...
Are you an expert or just a newbie?

"But I wondering is this is reasonable in message processing, because it is slow process
in generally.
"

It seems you are an expert here


"Did, someone have any info, based on benchmarking/tests, how the speed of message
handlers affects performance of the application in Windows.
I can't figure out how to measure the performance in this case. Any ideas?"

Here you just deny that "it is slow process in generally" and
it seems you are not an expert here


"The good news is that now JumpTo is 22 bytes only."

Who cares that your JumpTo is blab, bla bytes only
What about blah, bla bytes for whole proc
Where is pop esi edi ebx ret and why you need proc and enter


Using:
proc SizerWinProc, hwnd, wmsg, wparam, lparam
enter
push esi edi ebx

mov eax, [wmsg]
call JumpTo
MessageList .default,


"My method use table with 4 bytes per message and subroutine 28 bytes long."

Why do you use esi, edi,ebx ; preserve and restore them later
May be to use later lodsd (1 byte) and it is "your method" for
"size optimization"


"Or you think all I write is masterpiece? "
No, it is just a "lame" code
I assume you play with code for fun only and will be better
to search the forum for better examples

Here is my "master" code:


WndProc:
mov eax, [esp+2*4] ; eax=uMsg stack param
cmp eax, bigMess ; bigMess = message with
sbb edx, edx ; the biggest number +1
and eax, edx ; The dword ptr [TableProc+0*4] is a
jmp dword ptr [TableProc+eax*4] ; pointer to DefWindowProc


"Come on, I actually need some criticism."

Is it enough criticism for you?
Posted on 2003-09-26 18:14:04 by lingo12