In the second example, you only need to pass the 4 parameters so that you have the right set of values for the default window procedure; otherwise you would only pass three parameters: hWnd, wParam and lParam.


I think your suggestion could be done, but you will have the problem of not knowing what the user-defined messages are until runtime, so you would have to create the second array dynamically and make the appropriate offset adjustments to it to get it to work. With the original array, you can in fact load it as a static table: fill in the values yourself for each message that is processed and fill the rest with the default processing address. This is possible because MASM knows the offset of each label at assemble time, so if you put the label names into the table, you will get the correct results at runtime.

It's just that a 1024-iteration loop of DWORD writes is fast enough anyway compared with the normal load time of a program, so there is little point in doing the extra work. Using a static table in the initialised data section would also make the disk file 4k larger (1024 entries × 4 bytes).
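As a rough C illustration of the run-time approach described above (not hutch's MASM code), the sketch below fills a 1024-entry table with a default handler and then overwrites only the entries for messages that are actually processed. The handler names, the `LRESULT_T` stand-in type and the `-1` default return are assumptions so the sketch builds without `<windows.h>`; in a real WndProc the default entry would end up calling DefWindowProc.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the Win32 types; a real program gets these from <windows.h>. */
typedef long LRESULT_T;
typedef LRESULT_T (*Handler)(unsigned msg, unsigned long wp, long lp);

#define TABLE_SIZE 1024   /* messages 0 .. 1023, i.e. everything below WM_USER */

static Handler table[TABLE_SIZE];

/* Hypothetical default handler; a real WndProc would call DefWindowProc here. */
static LRESULT_T default_proc(unsigned msg, unsigned long wp, long lp)
{
    (void)msg; (void)wp; (void)lp;
    return -1;
}

/* Hypothetical handler for one message the window actually processes. */
static LRESULT_T on_paint(unsigned msg, unsigned long wp, long lp)
{
    (void)msg; (void)wp; (void)lp;
    return 0;
}

/* Run-time setup: 1024 pointer writes, then overwrite the handled entries. */
static void init_table(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        table[i] = default_proc;
    table[0x000F] = on_paint;     /* 0x000F is WM_PAINT */
}

/* One indexed indirect call replaces a chain of compares. */
static LRESULT_T dispatch(unsigned msg, unsigned long wp, long lp)
{
    assert(msg < TABLE_SIZE);     /* the caller must range-check first */
    return table[msg](msg, wp, lp);
}
```

After `init_table()` runs once at startup, `dispatch(0x000F, 0, 0)` reaches `on_paint`, and any message without its own handler falls through to `default_proc`.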

Posted on 2002-12-02 16:48:20 by hutch--
Maybe you are not getting me completely, or it is the other way around (i.e., I am not getting you completely).

The array size of the usermsg array will be known at compile time.
Actually that size will be about the only thing the user will have to set.
As my asm is not that good, I am posting a bit of code in C to explain what I mean:

// the user msg array, indexed by uMsg - WM_USER - 1
// (maxmsgs must be a compile-time constant, e.g. #define maxmsgs 1)
typedef LRESULT (*pFunc)(WPARAM, LPARAM);
pFunc usermsg[maxmsgs];

// the windows msgs array, indexed directly by uMsg (0 .. WM_USER)
typedef LRESULT (*pWinFunc)(HWND, UINT, WPARAM, LPARAM);
pWinFunc winmsg[WM_USER + 1];

#define DeclareUsrMsg(msgname, no) \
    const int msgname = WM_USER + (no)

static inline void SetUsrMsg(int msgname, pFunc func)
{
    // if msg = WM_USER+1, then index = 0
    int index = msgname - WM_USER - 1;
    usermsg[index] = func;
}

static inline LRESULT DispatchMsg(HWND hwnd, UINT uMsg, WPARAM wp, LPARAM lp)
{
    if (uMsg > WM_USER + maxmsgs)       // outside both tables
        return DefWindowProc(hwnd, uMsg, wp, lp);
    if (uMsg > WM_USER) {
        int index = uMsg - WM_USER - 1;
        return usermsg[index](wp, lp);
    }
    return winmsg[uMsg](hwnd, uMsg, wp, lp);
}

Both arrays will, of course, have to be initialized with default handlers first (ones that end up calling DefWindowProc).
To use this in your code it would look something like this:


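// event handler func (user handlers take only wParam and lParam)
LRESULT mymsgfnc(WPARAM wp, LPARAM lp)
{ /* handler code */ return 0; }

// somewhere in your code put this (MYMSG is just an example name)
DeclareUsrMsg(MYMSG, 1);        // MYMSG == WM_USER + 1
SetUsrMsg(MYMSG, mymsgfnc);     // call this during initialization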

All the user has to take care of is to define the number of user messages somewhere in his
code, as a compile-time constant that is visible before the arrays are declared. In the above case that would be:

#define maxmsgs 1

Posted on 2002-12-03 07:42:24 by clippy

I chewed over your idea yesterday and put together something like this:

WndProc proc etc ....

    cmp DWORD PTR uMsg, 1023
    ja @F                  ; messages above 1023 bypass the table

    jmp wpOut

  @@:
    ; --------------------------------------------
    ; process messages with value above 1023 here
    ; --------------------------------------------

WndProc endp

This solves the problem of needing a far larger array and adds only 2 instructions to the dispatching code for applications that process messages above 1023.

Compliments on a good idea.

Posted on 2002-12-03 16:13:57 by hutch--
umm Hutch,
Maybe I am a bit confused (yet again), but isn't that the same thing that I did in my post above?

if (uMsg > WM_USER)
{
    int index = uMsg - WM_USER - 1;
    return usermsg[index](wp, lp);
}
return winmsg[uMsg](hwnd, uMsg, wp, lp);

Also what about custom user msgs in your code?

Compliments on a good idea.
Thanks :)
Posted on 2002-12-03 22:31:27 by clippy
My testing shows that the values returned by the API "RegisterWindowMessage" are high up in the range, above 50k (the documented range is 0xC000 through 0xFFFF), while WM_USER starts at 1024. So all you need to do is trap the messages higher than 1023 in a normal cmp / je block and handle the main range below that with the dispatcher.

Of course, it depends on whether the app uses any messages above 1023, which most don't. Most small apps have a very short sequence of compares in the WndProc anyway, but a big pig with many messages to process could probably benefit from the dispatcher.
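A minimal C sketch of the split hutch describes, using stand-in types instead of `<windows.h>`: one unsigned compare sends everything above 1023 (including every registered message) to ordinary compares, and only messages 0 to 1023 go through the table. `registered_msg`, `on_registered` and `set_handler` are hypothetical names; the registered value would come from RegisterWindowMessage at startup.

```c
#include <assert.h>
#include <stddef.h>

typedef long LRESULT_T;                         /* stand-in for LRESULT */
typedef LRESULT_T (*Handler)(unsigned msg, unsigned long wp, long lp);

#define TABLE_SIZE 1024                         /* direct table: 0 .. 1023 */

static Handler table[TABLE_SIZE];               /* NULL = not handled */
static unsigned registered_msg;                 /* hypothetical RegisterWindowMessage result */

static LRESULT_T default_proc(unsigned msg, unsigned long wp, long lp)
{
    (void)msg; (void)wp; (void)lp;
    return -1;                                  /* a real WndProc calls DefWindowProc */
}

static void set_handler(unsigned msg, Handler h)
{
    assert(msg < TABLE_SIZE);
    table[msg] = h;
}

static LRESULT_T on_registered(unsigned long wp)
{
    return (LRESULT_T)wp;                       /* hypothetical handler */
}

static LRESULT_T wnd_proc(unsigned msg, unsigned long wp, long lp)
{
    if (msg >= TABLE_SIZE) {
        /* Above 1023: ordinary compares.  Registered ids are only known
           at run time, so they cannot be case labels in a switch. */
        if (msg == registered_msg)
            return on_registered(wp);
        return default_proc(msg, wp, lp);       /* never index past the table */
    }
    Handler h = table[msg];
    return h != NULL ? h(msg, wp, lp) : default_proc(msg, wp, lp);
}
```

Because registered message numbers are only known at run time, they end up as `if` compares rather than `case` labels, which is one reason to keep the high range out of the table.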

Posted on 2002-12-04 03:42:21 by hutch--
Don't see why you need the compare, as the two ranges are contiguous. Different window classes usually have different sets of handler routines - e.g., the standard window message WM_COMMAND.

If you had used WM_APP, which was mistakenly recommended at one time, then the compare makes sense, as there is a gap between WM_USER (0400h) and WM_APP (8000h).
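To make the ranges in tenkey's point concrete, here is a small C sketch that classifies a message number using the ranges documented for WM_USER in the Win32 documentation. The constants are hard-coded (with hypothetical `MY_` names) so it builds without `<windows.h>`.

```c
#include <assert.h>

/* Win32 message ranges as documented under WM_USER; hard-coded so the
   sketch builds without <windows.h>. */
#define MY_WM_USER 0x0400u
#define MY_WM_APP  0x8000u

enum msg_range {
    RANGE_SYSTEM,         /* reserved for system messages */
    RANGE_CLASS_PRIVATE,  /* private to a window class (WM_USER based) */
    RANGE_APP,            /* private to the application (WM_APP based) */
    RANGE_REGISTERED,     /* returned by RegisterWindowMessage */
    RANGE_RESERVED        /* reserved by the system */
};

static enum msg_range classify_msg(unsigned long msg)
{
    if (msg < MY_WM_USER) return RANGE_SYSTEM;        /* 0x0000 .. 0x03FF */
    if (msg < MY_WM_APP)  return RANGE_CLASS_PRIVATE; /* 0x0400 .. 0x7FFF */
    if (msg < 0xC000u)    return RANGE_APP;           /* 0x8000 .. 0xBFFF */
    if (msg <= 0xFFFFu)   return RANGE_REGISTERED;    /* 0xC000 .. 0xFFFF */
    return RANGE_RESERVED;                            /* above 0xFFFF */
}
```

WM_USER-based messages start right after the system range, so a table indexed by message number covers both without a gap; WM_APP-based messages start at 8000h, far beyond a 1024-entry table.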
Posted on 2002-12-04 12:38:28 by tenkey
That's exactly what I was thinking. Your algo is going to be perfect for a window that's heavily used, i.e., probably the application's main window, or maybe inside WM_COMMAND messages, but the dispatcher is going to be overkill if provided for each and every dialog box.
Also, a more general use of your algo would be to apply it anywhere switch/case statements are needed and the constants being compared against are not very large. It can make code that previously used switch/case run extremely fast.
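As a sketch of the switch/case replacement clippy suggests, assuming small, dense, hypothetical command codes: the table of function pointers is built at compile time with C99 designated initializers, much like putting label names into a static MASM table, and one bounds check plays the role of the `default:` case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command codes: small and dense, as a switch/case would use. */
enum { CMD_NEW, CMD_OPEN, CMD_SAVE, CMD_QUIT, CMD_COUNT };

static int do_new(void)  { return 10; }
static int do_open(void) { return 11; }
static int do_save(void) { return 12; }
static int do_quit(void) { return 13; }

/* Built at compile time; any entry left out would be NULL. */
static int (*const cmd_table[CMD_COUNT])(void) = {
    [CMD_NEW]  = do_new,
    [CMD_OPEN] = do_open,
    [CMD_SAVE] = do_save,
    [CMD_QUIT] = do_quit,
};

/* Replaces: switch (cmd) { case CMD_NEW: ... case CMD_QUIT: ... default: ... } */
static int run_cmd(unsigned cmd)
{
    if (cmd >= (unsigned)CMD_COUNT || cmd_table[cmd] == NULL)
        return -1;                 /* out of range: the "default:" case */
    return cmd_table[cmd]();
}
```

Optimizing C compilers often turn a dense switch into a jump table on their own, so in C it is worth checking the generated code before assuming the explicit table is faster.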
Posted on 2002-12-04 13:00:24 by clippy
humm... isn't this a bit overkill for wndproc style stuff?
I can see the advantages if you have an enormous amount of
messages to process, with equal/random probability, where
you'd end up going through too many Jcc's.

For a wndproc, does any of this matter at all? Obviously not
for the typical small win32asm-style programs or on fast CPUs,
so let's say... something sorta large, lots of messages, running
on, say, a 200 MHz Pentium MMX. Would you be able to *feel* any
difference, like user interface responsiveness? Just wondering.

Personally, for gui coding, I'm using a dispatch scheme where
all messages go to separate procedures. It's probably somewhat
less wasteful than the Big Table(TM) approach, although slower
(either scanning a table, or using a switch statement). And it's
probably slower than linked Jcc's. But it's easy to maintain,
easy to debug, and... for simple message dispatching, does it matter?

Please don't take this as a flame/insult, I'm just wondering
whether there's any visible gains from optimizing something like
wndproc, even when targeting relatively low-end machines.
Posted on 2003-01-13 06:25:58 by f0dder
hi f0dder,
You are right actually this can be a bit of overkill if used for each and every window.
I personally find a much better use for it as a replacement for switch/case statements.
Posted on 2003-01-13 06:58:10 by clippy
There are a lot of apps where it would not make any difference, but it does have another advantage. Where I have converted applications from a normal Switch block style WndProc to a dispatcher of this style, you end up with smaller code: apart from the overhead of loading the addresses, which is no big deal, you lose all of the CMP/JMP code in the WndProc.

In my time I have seen very large applications with massive WndProcs, and it's here that this dispatching technique would be useful in speed terms; I have used applications that were very sluggish because of their size, even while they were not doing anything.

The technique has its real advantages in areas where you must handle a wide range of branches at high speed: character-based branching is one area, number-based arrays are an obvious choice, as is anywhere else that a wide choice is necessary.
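For character-based branching, a common C pattern is a 256-entry table indexed by the byte value, built once, so each character costs one load instead of a chain of range compares. This is a minimal sketch under stated assumptions: the character classes, `count_words` and the ASCII letter ranges are illustrative, not from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Character classes for a hypothetical tokenizer; C_OTHER must be 0 so
   the zero-initialized static table starts out as "other". */
enum { C_OTHER = 0, C_ALPHA, C_DIGIT, C_SPACE };

static unsigned char char_class[256];   /* one entry per byte value */

/* Built once at run time (assumes ASCII letter and digit ranges). */
static void init_char_class(void)
{
    for (int c = 'a'; c <= 'z'; c++) char_class[c] = C_ALPHA;
    for (int c = 'A'; c <= 'Z'; c++) char_class[c] = C_ALPHA;
    for (int c = '0'; c <= '9'; c++) char_class[c] = C_DIGIT;
    char_class['_'] = C_ALPHA;           /* treat underscore as a letter */
    char_class[' '] = char_class['\t'] = char_class['\n'] = char_class['\r'] = C_SPACE;
}

/* Count runs of letters/digits; each byte costs one table load. */
static size_t count_words(const char *s)
{
    size_t words = 0;
    int in_word = 0;
    for (; *s; s++) {
        unsigned cls = char_class[(unsigned char)*s];
        int is_word_char = (cls == C_ALPHA || cls == C_DIGIT);
        if (is_word_char && !in_word)
            words++;
        in_word = is_word_char;
    }
    return words;
}
```

The cast to `unsigned char` matters: on platforms where `char` is signed, bytes above 127 would otherwise produce a negative index.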

Posted on 2003-01-13 23:54:40 by hutch--
hutch, how were those "massive wndprocs" done? lots of cmp/Jcc?
While I agree that this "massive table" :) approach is probably good
for high-speed stuff where you have a huge amount of cases, couldn't
a "massive wndproc" be handled efficiently with a smaller table of
{umsg,handlerproc} pairs? sure, you'd have to do a search loop which
would take more time than a single indirect jump, but it'd take less
space (until you have "a lot" of messages, where the two-item table
would obviously end up taking more space). The table could of course
be pre-sorted for faster searching.
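f0dder's {umsg,handlerproc} idea can be sketched in C with the standard library's `bsearch` over a table kept sorted by message number. The handlers and the `-1` return for unhandled messages are hypothetical (a real WndProc would call DefWindowProc there); the message values are the standard WM_CREATE, WM_PAINT and WM_COMMAND numbers.

```c
#include <assert.h>
#include <stdlib.h>   /* bsearch */

typedef long LRESULT_T;                          /* stand-in for LRESULT */
typedef LRESULT_T (*Handler)(unsigned long wp, long lp);

struct msg_entry {
    unsigned msg;
    Handler  proc;
};

/* Hypothetical handlers. */
static LRESULT_T on_create(unsigned long wp, long lp) { (void)wp; (void)lp; return 1; }
static LRESULT_T on_paint(unsigned long wp, long lp)  { (void)wp; (void)lp; return 2; }
static LRESULT_T on_cmd(unsigned long wp, long lp)    { (void)lp; return (LRESULT_T)wp; }

/* Must stay sorted by msg for bsearch:
   0x0001 = WM_CREATE, 0x000F = WM_PAINT, 0x0111 = WM_COMMAND. */
static const struct msg_entry msg_table[] = {
    { 0x0001, on_create },
    { 0x000F, on_paint  },
    { 0x0111, on_cmd    },
};

static int cmp_entry(const void *key, const void *elem)
{
    unsigned k = *(const unsigned *)key;
    unsigned m = ((const struct msg_entry *)elem)->msg;
    return (k > m) - (k < m);          /* avoids overflow of k - m */
}

/* Returns the handler's result, or -1 for an unhandled message. */
static LRESULT_T dispatch(unsigned msg, unsigned long wp, long lp)
{
    const struct msg_entry *e = bsearch(&msg, msg_table,
                                        sizeof msg_table / sizeof msg_table[0],
                                        sizeof msg_table[0], cmp_entry);
    return e ? e->proc(wp, lp) : -1;
}
```

Lookup costs O(log n) compares instead of one indexed jump, but the table holds only the messages a window actually handles.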

Again, no doubt the table based searching will be slower than a single
indirect jump, but question is: will you be able to feel it?
Posted on 2003-01-14 01:19:23 by f0dder

Most of the stuff I saw was traditional Switch/Case systems for WndProc-style procedures, and while this is a reasonable performer, it can start to get a bit slow when the number of items processed becomes large. In things like menus and command buttons it probably does not matter, but with GDI screen graphics you can start to see the lag if the app is big enough.

With the modification that handles anything above 1023 with a normal Switch/Case, an array of 4k is no big deal, as it will probably fit into an existing memory page. There is no doubt that an address-based branching system has a smaller overhead in repeated-loop terms than any search-based system, so if speed matters, it will outperform sequential searching in branching performance.

I am not sure exactly what you mean by a {umsg,handlerproc} system. I did post an example that had a separate procedure for every message, branched in much the same way: the table was filled with procedure addresses for each message processed and it worked OK. It ends up a bit larger because of the extra parameters needed, but for people who like coding with a separate procedure for each message, it probably does the job OK.

Posted on 2003-01-14 02:29:31 by hutch--
This is from the 'case insensitive char match' thread.

on modern CPUs, look-up tables can be (in a real situation) very slow, because if the cache hasn't yet been filled with the table, access to main memory will be extremely slow.

So could it be that this method of using wndproc makes it even slower, as the array won't be cached?

Or am i just talking utter rubbish?
Posted on 2003-03-17 05:12:34 by clippy
TITZ?!?! WHERE!?!? =P
Posted on 2003-03-17 20:43:50 by x86asm

Interesting question but once the memory has been accessed a few times it is in cache anyway. Empirical testing shows the technique is a lot faster in algos where it matters, particularly byte based algos with 256 combinations.

The bigger the range being handled, the more advantage it has.

Posted on 2003-03-20 03:23:48 by hutch--
As an example-

Let's say you have a handler for WM_LBUTTONDOWN which generates the numbers from 1 to 10 million on each mouse click.

Of course this is an absurd situation, but what I mean to point out is what happens when a particular handler does a lot of work.
After the handling is complete and the next message is passed to wndproc, won't the cache have completely changed by then?
Won't that make wndproc even slower, since, according to what's mentioned above, we will have to wait again for the table to be loaded into the cache?
Posted on 2003-03-26 01:52:04 by clippy
and do the handlers even have to do a lot of work for this scenario to be true? with other processes and threads, other messages being handled (etc), won't the cache almost automatically be filled with other stuff? how much code/data will normally be used on a round-trip through the system?

I can see the direct-jump-table-thingy could be nice for a loop where you process a lot of symbols (but having few enough symbols that the table won't be thrown totally out of cache by other processing etc), but for something like a message pump, where you don't really know what happens from one message to the next, I have my doubts performance-wise. But then again, imho a GUI message pump isn't too performance critical, and I'm considering a silly table of message,proc pairs (ie, search through the table instead of a direct jump), as typically there won't be too many messages to handle per window. The speed decrease probably cannot be felt even on older hardware, and the table-based approach can make life quite a bit easier for me if done right.
Posted on 2003-03-26 01:59:25 by f0dder
Where I tend to use this technique is where I have a long chain of branching that can be calculated once as an array of addresses and then replaced with a single jump to an address. With things like branching on character choice, the speed gain is considerable, and in most instances the extra code to set it up is smaller than a standard list of comparisons and jumps.

The win is the usual one: calculate the set of label offsets once, either at assemble time or partly at run time, then keep using them for high-speed branching without any further overhead.

Posted on 2003-03-27 01:26:08 by hutch--

Please correct me if I am wrong, but I think there's a bug in this message dispatching routine. It expects messages < 1024, but this is not an assumption we can make...

Some apps use registered messages for interprocess communication and for global notification of events, by broadcasting with PostMessage,HWND_BROADCAST,uMsg,wParam,lParam. Since window messages registered with RegisterWindowMessage are above 1024 (0C000h to 0FFFFh), if our app receives one of these it will cause an illegal memory access or, worse, a jump to arbitrary code.

So I think there should be code to check for messages above 1023 and call DefWindowProc in that case.
Posted on 2003-06-27 11:46:40 by QvasiModo

echo... ...echo

Posted on 2003-06-28 15:05:53 by QvasiModo