This is part of a procedure from the second Service Pack for MASM32:
.if lpIDList == 0
mov eax, 0 ; if CANCEL return FALSE
push eax
jmp @F
.else
invoke SHGetPathFromIDList,lpIDList,lpBuffer
mov eax, 1 ; if OK, return TRUE (5 bytes)
push eax
jmp @F
.endif
@@:

I think it's not optimized at all.
I haven't tested it, but I think my version will work fine:
.if lpIDList == 0
xor eax, eax ; if CANCEL return FALSE
push eax
; jmp @F - JUMP not needed
.else
invoke SHGetPathFromIDList,lpIDList,lpBuffer
xor eax,eax ; (2 bytes)
inc eax ; if OK, return TRUE (1 byte)
push eax
; jmp @F - JUMP not needed
.endif
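
The byte counts of the two listings above can be tallied with a short sketch. This is only an illustration in Python, not part of the MASM32 package. The sizes are the ones quoted in this thread for 32-bit code, plus the assumption that each `jmp @F` assembles as a 2-byte short jump. The `invoke` and the `.if` overhead are the same in both versions and are left out. The names `SIZES`, `ORIGINAL`, `PROPOSED` and `total_bytes` are hypothetical.

```python
# Instruction sizes in bytes for 32-bit code, as quoted in this thread.
# Assumption: each "jmp @F" is near enough to assemble as a short jump.
SIZES = {
    "mov eax, imm32": 5,
    "xor eax, eax": 2,
    "inc eax": 1,
    "push eax": 1,
    "jmp short": 2,
}

# Both branches of the original listing (CANCEL branch, then OK branch).
ORIGINAL = [
    "mov eax, imm32", "push eax", "jmp short",
    "mov eax, imm32", "push eax", "jmp short",
]

# Both branches of the proposed listing, with the jumps removed.
PROPOSED = [
    "xor eax, eax", "push eax",
    "xor eax, eax", "inc eax", "push eax",
]


def total_bytes(instructions):
    """Sum the encoded size of a list of instruction names."""
    return sum(SIZES[name] for name in instructions)


if __name__ == "__main__":
    before = total_bytes(ORIGINAL)
    after = total_bytes(PROPOSED)
    print("original:", before, "bytes")
    print("proposed:", after, "bytes")
    print("saved:", before - after, "bytes")
```

Under these assumptions the saving is a handful of bytes per call site, which is the quantity the rest of the thread argues about.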
Posted on 2001-08-17 19:47:24 by MemoBreaker
MemoBreaker, there isn't really any reason to optimize code around
an API call... especially not speed optimizations. As for size optimizations,
you say this is a proc... so the few bytes you can shave off won't
matter much either.
Posted on 2001-08-17 20:21:38 by f0dder
But the JUMPs in this part of the code are not needed! And that's a reason to update this procedure in the next MASM32 package/service pack.
Posted on 2001-08-17 21:06:05 by MemoBreaker
True, I failed to see the jmp @F, sorry. I should go to bed I guess...
tired and struck by influenza...
Posted on 2001-08-17 21:12:04 by f0dder

MemoBreaker, there isn't really any reason to optimize code around
an API call... especially not speed optimizations. As for size optimizations,
you say this is a proc... so the few bytes you can shave off won't
matter much either.

Code around API calls is absolutely no different from any other code.
Bytes are always bytes, and clocks are always clocks.
If you think that speed and size are important and you have a smaller and/or faster version, you should always replace the old version with the new one.
It may be just a little better, or much better, but it's always BETTER.
Posted on 2001-08-20 05:11:18 by The Svin

It's a matter of fact that the 2 jumps are not needed. From memory, to make the code easier to read after I wrote it, I changed it to .IF syntax and forgot to remove the jumps.

Performance is another matter. In the context of an API call, the size increase of 2 jumps is trivial; the block .IF syntax already has a jump there, so there is no extra code being executed, just a couple of extra bytes for the two jumps.

I pursue speed where it matters; other than that, I try for clear code that is reliable. I have seen too many optimisations in the past that were supposed to be genius but did not run, so I am inclined to put reliability above that style of optimisation.

Like it or not, an API call is very slow alongside assembler instructions, so a size or speed optimisation in the presence of an API call is wasted effort; it simply does not matter.

The pursuit of "byte perfect" code is a leftover from the DOS days. In Win32 on late-model processors, cache, branch prediction, memory access speed and disk IO speed are the things that make code go faster, not nitpicking bytes.

Posted on 2001-08-20 06:51:19 by hutch--
Regarding speed and size, there is no such "general" definition as "API call".
API functions differ greatly in speed and size.
Among other things, it always matters whether a particular API function thunks into kernel mode ('cause the transition itself costs more than 1000 clocks, apart from the time needed to execute the function itself).
And there are API functions that run in only about 10 clocks (GetCommandLine, for example).

About clear coding: it's a very personal opinion. For example, for
me raw disassembled code looks clearer than C++ code. For any person, "clear code" is the code he can understand best.
And more: I don't see any reason why optimized code SHOULD
look unclear.

Talking of code in standard libraries: what difference does it make whether the procedures you leave unoptimized use the API or not? In either case
unoptimized code will increase the code size of the modules that use
these library procs, and will make them slower.

I see that there is a very popular point of view on code around API calls:
that if it's near an API call, it does not need to be optimized.
And yet I haven't seen any logical or mathematical explanation of why the nearby presence of an API function can justify sloppy coding.
I think there just can't be one: either you write optimized code or you don't. Having an API call around is no special excuse for unoptimized code.

I think it is because of a lack of knowledge and experience regarding the code inside the API.
If we agree that the API does a lot of work, then it is worth knowing how
it runs and for how long, 'cause neither the processor nor the user knows whether your program is fast or slow because of your own code or the API:
for the processor it's just a chain of instructions; for the user it's just the time taken to perform some work, the size taken on the hard drive and the memory needed to run.
Posted on 2001-08-20 07:45:21 by The Svin
I agree that a couple of bytes does not influence speed in the given example. But if you always add two bytes to each procedure, assembly code will end up bigger than code in other programming languages... :)
Posted on 2001-08-20 17:25:46 by MemoBreaker
I think Hutch's point is well taken. What he means, if I'm not mistaken, is that the execution time of the API calls swamps the relatively short time of the inline (MASM) code, in most cases. The few clocks you might save by hand-optimizing that section of code for speed will not constitute a large enough percentage of the OVERALL execution time of that section. Spending time on the optimization for such a small speed gain amounts to nitpicking.

If the CPU spends over 1000 clocks in the API call, shaving off 100 clocks (a considerable feat) would reduce the execution time by less than 10%. No one would suggest testing each API call to see how many clocks it uses, because the typical code using the API is sufficient.
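
The arithmetic behind that "less than 10%" figure can be sketched as follows. This is a hedged illustration in Python, not a measurement. The 1000-clock API cost and the 100 clocks saved come from the paragraph above. The 150 clocks of inline code are an assumed value chosen only for the example, and `percent_saved` is a hypothetical helper name.

```python
def percent_saved(api_clocks, inline_clocks, clocks_saved):
    """Percentage of total execution time removed by an inline optimization.

    api_clocks    -- clocks spent inside the API call (not affected)
    inline_clocks -- clocks spent in the surrounding hand-written code
    clocks_saved  -- clocks removed from the hand-written code
    """
    if clocks_saved > inline_clocks:
        raise ValueError("cannot save more clocks than the inline code uses")
    total = api_clocks + inline_clocks
    return 100.0 * clocks_saved / total


if __name__ == "__main__":
    # Assumed example: 1000 clocks in the API, 150 clocks of inline code,
    # 100 of which are optimized away.
    print("%.1f%% of the total time saved" % percent_saved(1000, 150, 100))
```

Because the API cost sits in the denominator and never shrinks, the saving stays below 10% whenever the API call alone takes at least 1000 clocks and no more than 100 clocks are removed.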

Speed optimization generally won't be noticeable anyway in simple "one-time only" code. Where it really comes into play is in looping code and in iteratively processing array-type structures. But then again, if there is an API call in the loop...

However one can and should always try to minimize code size except of course where it would interfere with existing speed optimizations.
Posted on 2001-09-08 09:35:19 by gfalen
gfalen: and when you can shave off less than 100 cycles... who cares.
Optimize where it matters :)
Posted on 2001-09-08 17:50:46 by f0dder
The important thing to remember is that we are talking about a standard library.
That means that the size and speed of the procedure affect not one
module but all the modules that use it.
It is a cumulative effect on size and speed.
Lots of slow apps can blame their slowness on just a few awkward procs from (for example) the C library. Rewrite those procs and recompile all the projects and it will make not one but all of those projects faster.
It's so obvious to me that I didn't even answer the childish bullshit when someone stated that it wasn't worth spending time on the
fastest algo to convert a value to a string 'cause (he said) one number isn't worth it. Of course such algos are made to convert as many millions of numbers as programmers need. Not one.
The same goes for the optimization of any standard-purpose proc.
If you don't care to do the optimization, it's up to you - do what you want. But at least be reasonable and don't blame other people for the "unworthy optimization job" they are doing.
Sometimes I am almost shocked that some people could even declare those
who do optimization for free "guilty!".
They make things better, not worse - that's clear.
So if you don't - it's fine - but at least don't buzzzzzz about what has or doesn't have a point.
Zero if canceled, not zero if the user has chosen a folder:

; #########################################################################

.model flat, stdcall ; 32 bit memory model
option casemap :none ; case sensitive

include \masm32\include\
include \masm32\include\
include \masm32\include\
include \masm32\include\

cbBrowse PROTO


; #########################################################################

BrowseForFolder proc hParent:DWORD, lpBuffer:DWORD, lpTitle:DWORD, lpString:DWORD

; ------------------------------------------------------
; hParent = parent window handle
; lpBuffer = 260 byte buffer to receive path
; lpTitle = zero terminated string with dialog title
; lpString = zero terminated string for secondary text
; ------------------------------------------------------


mov eax, hParent ; parent handle
mov bi.pidlRoot, 0
mov bi.hwndOwner, eax
mov bi.pszDisplayName, 0
mov eax, lpString ; secondary text
mov bi.lpszTitle, eax
mov eax, lpTitle ; main title
mov bi.lpfn, offset cbBrowse
mov bi.lParam, eax
mov bi.iImage, 0

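invoke SHBrowseForFolder,ADDR bi
test eax,eax ; eax = pidl, or 0 if the user cancelled
push eax ; doubles as the argument for CoTaskMemFree below
je @F ; cancelled: return 0
invoke SHGetPathFromIDList,eax,lpBuffer
call CoTaskMemFree ; frees the pidl pushed above
mov eax,[esp-4] ; reload the slot just popped (expected to still hold the non-zero pidl)
@@: ; label and ret are missing from the listing as posted; restored here as an assumption
ret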

BrowseForFolder endp

; #########################################################################

cbBrowse proc

invoke SetWindowText,[esp+8],[esp+16]

ret 16

cbBrowse endp

; #########################################################################

Posted on 2001-09-10 23:29:44 by The Svin
I agree with The Svin.
Good programming style is more important to me than
processors, cache, memory access speed, disk IO speed, etc.,
because that is HARDWARE and we are ASSEMBLY programmers rather than JAVA, C++/bla bla programmers....
If you don't agree, you can delete Agner Fog's book
from your computer and start coding in C# or VB...
Posted on 2001-09-11 17:23:27 by buliaNaza
I guess it depends on what you consider important, out of date DOS style code where nitpicking bytes is the achievement or objectively measured performance that can be seen in working and living applications.

If the performance considerations of cache, disk IO speed, memory access speed, processor capacity and similar do not matter as long as the code is DOS style byte perfect, why bother at all, join the script kiddies where performance is not a consideration and feel profound at scripting.

Much of the reason why the programming world at large sees assembler as irrelevant is because much of what is written as assembler IS irrelevant. When you can confront the world with a dialogue box loading routine that is 1 cycle faster, the programming world at large will die laughing.

Introspecting at your navel may feel good, but if you want to make a dent in the programming community at large, you will do it with sheer performance.

Nothing stings like sheer speed, and it is not without purpose: as modern computers get bigger and faster, so do the tasks that are being solved by programmers, and while many of the modern languages use this increase in power to cover up their templating style of coding, it's good old-fashioned low-level procedural coding that does the hard stuff at competitive speeds.

This is where assembler has a lot to offer as it can actually deliver the performance required but messing around with irrelevancies fails to capture the real advantage of assembler. Saving a single byte here and there will not deliver the performance gains needed to handle ever increasing demands for performance where cache, processor, disk IO, opcode choice and algorithm design will.

I opt for the latter, have PHUN with the nitpicking.

Posted on 2001-09-11 20:13:00 by hutch--
MASM is good, but not very good for serious optimization because of the jmp/call thing every time you use invoke.

Just my opinion though...
Posted on 2001-09-11 22:01:59 by Kenny
I respect your opinion but prefer Agner Fog...Sorry!

How to optimize for the Pentium family of microprocessors
Copyright © 1996, 2000 by Agner Fog. Last modified 2000-03-31.

23. Reducing code size (all processors)
As explained in chapter 7, the code cache is 8 or 16 kb. If you have problems keeping the critical parts of your code within the code cache, then you may consider reducing the size of your code.
32 bit code is usually bigger than 16 bit code because addresses and data constants take 4 bytes in 32 bit code and only 2 bytes in 16 bit code. However, 16 bit code has other penalties such as prefixes and problems with accessing adjacent words simultaneously (see chapter 10.2 above). Some other methods for reducing the size of your code are discussed below.

Both jump addresses, data addresses, and data constants take less space if they can be expressed as a sign-extended byte, i.e. if they are within the interval from -128 to +127.

For jump addresses this means that short jumps take two bytes of code, whereas jumps beyond 127 bytes take 5 bytes if unconditional and 6 bytes if conditional.

Likewise, data addresses take less space if they can be expressed as a pointer and a displacement between -128 and +127. Example:

MOV EBX,DS:[100000] / ADD EBX,DS:[100004] ; 12 bytes
Reduce to:
MOV EAX,100000 / MOV EBX,[EAX] / ADD EBX,[EAX+4] ; 10 bytes

The advantage of using a pointer obviously increases if you use it many times. Storing data on the stack and using EBP or ESP as pointer will thus make your code smaller than if you use static memory locations and absolute addresses, provided of course that your data are within +/-127 bytes of the pointer. Using PUSH and POP to write and read temporary data is even shorter.

Data constants may also take less space if they are between -128 and +127. Most instructions with immediate operands have a short form where the operand is a sign-extended single byte. Examples:

PUSH 200 ; 5 bytes
PUSH 100 ; 2 bytes

ADD EBX,128 ; 6 bytes
SUB EBX,-128 ; 3 bytes
The most important instruction with an immediate operand which doesn't have such a short form is MOV.

MOV EAX, 0 ; 5 bytes
May be changed to:
XOR EAX,EAX ; 2 bytes

MOV EAX, 1 ; 5 bytes
May be changed to:
XOR EAX,EAX / INC EAX ; 3 bytes
PUSH 1 / POP EAX ; 3 bytes

MOV EAX, -1 ; 5 bytes
May be changed to:
OR EAX, -1 ; 3 bytes

If the same address or constant is used more than once then you may load it into a register. A MOV with a 4-byte immediate operand may sometimes be replaced by an arithmetic instruction if the value of the register before the MOV is known. Example:

MOV [mem1],200 ; 10 bytes
MOV [mem2],200 ; 10 bytes
MOV [mem3],201 ; 10 bytes
MOV EAX,100 ; 5 bytes
MOV EBX,150 ; 5 bytes
Assuming that mem1 and mem3 are both within -128/+127 bytes of mem2, this may be changed to:

MOV EBX, OFFSET mem2 ; 5 bytes
MOV EAX,200 ; 5 bytes
MOV [EBX+mem1-mem2],EAX ; 3 bytes
MOV [EBX],EAX ; 2 bytes
INC EAX ; 1 byte
MOV [EBX+mem3-mem2],EAX ; 3 bytes
SUB EAX,101 ; 3 bytes
LEA EBX,[EAX+50] ; 3 bytes
Be aware of the AGI stall in the LEA instruction (for PPlain and PMMX).

You may also consider that different instructions have different lengths. The following instructions take only one byte and are therefore very attractive: PUSH reg, POP reg, INC reg32, DEC reg32.
INC and DEC with 8 bit registers take 2 bytes, so INC EAX is shorter than INC AL.

XCHG EAX,reg is also a single-byte instruction and thus takes less space than MOV EAX,reg, but it is slower.

Some instructions take one byte less when they use the accumulator than when they use any other register.

MOV EAX,DS:[100000] is smaller than MOV EBX,DS:[100000]
ADD EAX,1000 is smaller than ADD EBX,1000
Instructions with pointers take one byte less when they have only a base pointer (not ESP) and a displacement than when they have a scaled index register, or both base pointer and index register, or ESP as base pointer.

MOV EAX,[array][EBX] is smaller than MOV EAX,[array][EBX*4]
MOV EAX,[EBP+12] is smaller than MOV EAX,[ESP+12]
Instructions with EBP as base pointer and no displacement and no index take one byte more than with other registers:

MOV EAX,[EBX] is smaller than MOV EAX,[EBP], but
MOV EAX,[EBX+4] is same size as MOV EAX,[EBP+4].
Instructions with a scaled index pointer and no base pointer must have a four byte displacement, even when it is 0:

LEA EAX,[EBX+EBX] is shorter than LEA EAX,[2*EBX].
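
The sign-extended-byte rule from the quoted section can be sketched in a few lines. This is an illustrative Python model, not an assembler: it covers only the cases named in the quote (PUSH with an immediate, ADD/SUB on a 32-bit register other than EAX, and jumps), and the function names are hypothetical. The jump displacement is taken as already computed.

```python
def fits_imm8(value):
    """True if value can be encoded as a sign-extended byte."""
    return -128 <= value <= 127


def push_imm_size(value):
    """Size in bytes of PUSH imm in 32-bit code."""
    return 2 if fits_imm8(value) else 5


def alu_imm_size(value):
    """Size of ADD/SUB r32, imm for a register other than EAX (e.g. EBX)."""
    return 3 if fits_imm8(value) else 6


def shortest_add(value):
    """Pick ADD value or the equivalent SUB -value, whichever is shorter."""
    add_size = alu_imm_size(value)
    sub_size = alu_imm_size(-value)
    if sub_size < add_size:
        return ("SUB", -value, sub_size)
    return ("ADD", value, add_size)


def jump_size(displacement, conditional=False):
    """Size of a jump: 2 bytes if short, else 5 (unconditional) or 6 (conditional)."""
    if fits_imm8(displacement):
        return 2
    return 6 if conditional else 5


if __name__ == "__main__":
    print("PUSH 200:", push_imm_size(200), "bytes")
    print("PUSH 100:", push_imm_size(100), "bytes")
    print("ADD EBX,128 is best written as:", shortest_add(128))
    print("conditional jump over 200 bytes:", jump_size(200, conditional=True), "bytes")
```

The `shortest_add` helper reproduces the ADD EBX,128 versus SUB EBX,-128 example: +128 falls just outside the byte range while -128 falls just inside it.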
Posted on 2001-09-11 23:49:38 by buliaNaza

I will let you in on a little secret, I produced the Winhelp format file for Agner Fog because the research he has done is so important to assembler programming.

The section of his work that you quoted does not address the original problem reported here, 2 non-executed jumps in a piece of API code and their relevance to code execution speed.

Its relevance to cache size is lost when you apply it to an API call as it is a set of procedures in the system DLLs that are much larger than the code you are executing.

The reference work you quote covers an area that is well known; there are good opcode lists available that give you the byte size. But assuming that smaller code in byte count is faster is a mistake. There is a lot more to code speed than nitpicking bytes: considerations like instruction choice, order and fundamental algorithm design are the things that affect the speed of code.

The only objective test is the clock and there is no immediate correlation between pre built theories and the time taken to execute code.

On an effort to time basis, I will always give priority to code that requires performance over code that does not matter. I would rather have a fast algorithm than a nitpicked piece of API code that does not run any faster.

Posted on 2001-09-12 02:05:56 by hutch--
The section of his work that you quoted does not address the original problem reported here, 2 non-executed jumps in a piece of API code and their relevance to code execution speed.

Steve, I respect you very much so let me be honest :)
Those "2 non executed jumps" were a piece of stupidity :)
which we all commit from time to time when we are tired or in a hurry.
In a similar case, if somebody points out such things to me I always say: "Thanks, friend, I made a mistake". And really mean it :)
What I don't understand is why f0dder started on MemoBreaker when MemoBreaker pointed out an obvious mistake.
Correcting the mistake would not be a revolution, of course, but it at least makes the code:
1. Clearer, 'cause those jmps look weird
2. A little shorter
3. Microscopically faster.
And to do it you need 1 second, which I think is at least 1000 times less than the time spent in this thread on useless talk.
I absolutely don't understand what MemoBreaker has done wrong.
I think he just helped. (Talking of a friendly forum ;)

As for the rest, what I hate about these philosophical talks is that nobody takes the trouble to count and time before starting to spread his ideas.

For example, I am almost sure that none of the participants took the trouble
to check the size of the old and new versions of the proc and do some comparison and calculation before starting to talk.

We are doing an intelligent job, a partly scientific one.
And we need to be up to scratch, armed with data, real and not imaginary data, before reaching a final conclusion.
Posted on 2001-09-12 07:22:29 by The Svin

I have already agreed that the two jumps were not necessary and I have no criticism of MemoBreaker for either finding the mistake or posting it. My complaint has been about the importance placed on such a mistake, which amounts to changing a piece of code and forgetting to take out 2 unused jumps.

My comment has been that the omission is trivial and that the importance placed in removing them is not proportional to the lack of any reasonable gain. I am currently working on sort algorithms as I see them useful for assembler programmers where I see byte nitpicking as useless.

It is, as I have said, a time-based priority: I would prefer to work on worthwhile algorithms than on saving 2 jumps that were not used. It is as simple as that. Next time I get time to play with the dialog boxes, I may remove the extra jumps if I remember, but it will be for no speed gain whatsoever, only to try to make the code clearer to read for someone who is looking at how it's done.


PS, if you have time, would you post the last qword to ASCII conversion as a file, as my Netscape messes up the display of code posted in between the code tags. Commenting for people to read would be appreciated.
Posted on 2001-09-12 07:54:34 by hutch--
Yes, Steve, I understand.
And I am very proud of you for the latest tasks you are working on.
There are still a lot of asm programmers, but even searching
through the whole internet you can find just a few who try to take on
classical general-purpose tasks.
From my point of view those tasks, and NEW algos performing them, have FUNDAMENTAL, incomparable value to the whole world of creative, thinking programmers.
The rest are just users, no matter what they think of themselves and what their official speciality is.
Actually, I don't like words such as "optimization" or
"implementation" to be what I am remembered for. First of all I wish to be an inventor and a creator.
I want to think of you the same way.
And I hope buliaNaza will also make us happy with NEW ideas.
So I wish you good luck with your work, and I want you to remember that although not too many people will applaud it, I personally
think this work is one of the most important jobs in general programming.
It is not nice pictures of fancy controls that let you feel a really good asm program - only hard and lengthy tasks such as compression, searching a whole disk for files containing given words, games, database processing, etc. are real jobs with a real challenge.
Talking of fast hardware to justify bad programming is absolute bullshit, 'cause ANYBODY who uses a PC professionally can name a hundred tasks which make them wait seconds and minutes and hours.
In fact, average software nowadays is so slow that ordinary users,
having tried those tasks once and having suffered waiting for those
turtles to do the job, usually forget about the task in the future.
This way software provokes the degradation of its users.
I don't understand what is happening to the PC M$ world.
'Cause in other computer worlds there is still a lot of careful programming, including assembly.

That's why I deliberately distanced myself from the usual Win32Asm
topics, and chose to discuss the basics of implementing common programming blocks in asm32. I don't consider myself an expert, but I was almost shocked that a lot of people here, experienced as general programmers, did not know even the basics of using assembly.

I say it once again:
before anybody uses anything, somebody needs to create it.
And of course this "somebody" is a real person, and there is always
another real person who can try to solve the task in another, better way, and that is called "progress".
So I am baffled: why has progress almost stopped in PC algorithm research in assembly?
Really there are just a few people in the world of MILLIONS of PC programmers who have tried to move progress forward.
Taking on this job, you're one of the few.
Good luck!
Posted on 2001-09-14 01:00:33 by The Svin
This sounds like a topic for the Crusades!

However, I do agree with The Svin--most asm programmers write code in asm that is just as bad as what they write in C++. The whole purpose of using asm is to optimize programs to run faster, not to use asm and think you're getting some sort of speed increase. Like it or not, most compilers optimize the code as it's compiled, and it will actually run faster than readable asm code written by humans. However, readable, like The Svin said, is a matter of opinion...

I guess what I'm trying to say is: asm programming gives a person the opportunity to optimize code; programming in asm does not make the code faster by default.
Posted on 2001-09-14 11:43:33 by Kenny