Let us take, first, the usual killer example of the 'xor eax eax'
trick instead of 'mov eax 0':


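; timing harness: run each flavour 0FFFFFFF times and compare the GetTickCount deltas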
mov ecx 0FFFFFFF
call 'KERNEL32.GetTickCount' | mov D?Time1 eax
Align 32
L0:
xor ebx ebx
loop L0<
call 'KERNEL32.GetTickCount' | mov D?Time2 eax
mov ecx 0FFFFFFF
Align 32
L0:
mov ebx 0
loop L0<
call 'KERNEL32.GetTickCount' | mov D?Time3 eax

mov eax D?Time2 | sub eax D?Time1 | hexprint eax
mov eax D?Time3 | sub eax D?Time2 | hexprint eax


The two 'hexprint eax' give the same value. Those who think
that 'xor' (33 DB) should be faster than 'mov' (BB 00 00 00 00)
because of the size of the instructions do not understand the way
the processor works.

Now, let us suppose that some tricks like this really do improve
speed. (This really is the case for many such tricks.)

We usually consider that improving speed by 20 per cent, that way,
is a great performance.

What does it mean in the real world? Any 20 per cent speed gain
will be handed to you anyway by six months of processor evolution.

What does it mean from the end user's point of view? Let us take two
simple examples: the string-search feature in two applications - say,
Acrobat Reader for the most ridiculous example, and any asm text
editor (including the worst possible one: mine). ;)

When you do a string search in Acrobat, you certainly ask yourself:
"... but,... what is it doing??? Is it computing the Earth-Moon
distance between each encountered char???...".

When you do such a search in any other editor, you barely have
time to release your mouse button before the result is up.

In the SpAsm editor, a right-click on ANY symbol gives a result,
depending on the symbol's nature, including a search for data or
code label declarations, followed by an editing jump to that
declaration. On a 1 MB source, this is nothing but a monster task.
It is done within the click time, without any care for any kind of
optimisation in the implementation.

Now, how can an application be too slow from the user's point of
view? That is, in fact, a great counter-performance, and it
requires an 'evolved technique'.

The base of counter-performance is the concept of reusable code.
Each time you write a routine with the reuse approach, you have to
build it in such a way that it can be called under any condition,
in various circumstances, with all possible required safeguards,
and so on. The reuse approach then easily drives you to call
reusable code which, in turn, calls reusable code, which in
turn...

This multiplies useless code runs. Applying these 'evolved
techniques' in an asm source, even with code-level optimisations,
will give the same crazy results.

When an asm-written application is not fast enough, what we have to
ask is not 'how can I optimize it?' but 'what did I do that is so
stupid in the overall organisation?' or, instead of 'what could I
do?', ... 'what could I *not* do?'.


betov.
Posted on 2001-07-24 07:18:20 by Betov
(I believe this thread belongs to "the crusades")

I think optimal programming is always important.
You described the 20% advantage as lasting only
6 months - this is wrong, since you always keep this
advantage over other ways of coding.

So your code is always faster, or can handle more data in the
same time, than the others'.

Even with the increasing amount of stupid VC++ or Java shit, the
gap between hand-written, optimized programs and fast-to-market software opens more and more.

beaster.
Posted on 2001-07-24 07:34:12 by beaster
A friend of mine once said that code optimisation was like having a discussion with his wife: it really depends on what you are doing. Recently, a Boyer-Moore search algorithm that I developed was used by a programmer to search entire disk drives for virus patterns in files, and this meant recursively searching the same file multiple times for different virus patterns. On his AMD Duron he was getting read speeds of about 800 meg/sec, which seemed to keep him happy at the time.

Try doing this with SCASB and you will grow old waiting for the search to end. As is often the case with complex parsing, you may perform 50 separate operations or more on the data you are working on, and it is here that individual operation speed starts to add up. Do these operations with suboptimal code and they can get very slow.
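For those who have not seen this class of algorithm, the rough shape of it is something like this (a simplified Horspool-style sketch written from memory in 32 bit MASM-style code, NOT the actual algorithm mentioned above; BMHFind, skiptab and the parameter names are only illustration names, and it assumes a pattern length of at least 1):

.data?
skiptab dd 256 dup(?)

.code
; BMHFind(pText, lText, pPat, lPat) -> eax = offset of first match, or -1
BMHFind proc uses esi edi ebx pText:DWORD, lText:DWORD, pPat:DWORD, lPat:DWORD
    mov eax, lPat               ; default skip for every byte value
    mov edi, offset skiptab     ; = the whole pattern length
    mov ecx, 256
    rep stosd
    mov esi, pPat               ; for i = 0 .. lPat-2 :
    mov ecx, lPat               ;   skiptab[pat[i]] = lPat-1-i
    dec ecx
    xor ebx, ebx
fill_tab:
    cmp ebx, ecx
    jae scan
    movzx eax, byte ptr [esi+ebx]
    mov edx, ecx
    sub edx, ebx
    mov [skiptab+eax*4], edx
    inc ebx
    jmp fill_tab
scan:
    mov esi, pText
    mov edi, lPat
    dec edi                     ; edi = text index under the pattern's last byte
next_window:
    cmp edi, lText
    jae fail                    ; window ran past the end of the text
    mov ecx, lPat               ; bytes left to compare
    mov ebx, edi                ; text index, walking backwards
    mov edx, pPat
cmp_back:
    mov al, [esi+ebx]
    cmp al, [edx+ecx-1]         ; compare against the pattern, last byte first
    jne slide
    dec ebx
    dec ecx
    jnz cmp_back
    lea eax, [ebx+1]            ; full match: return the offset of its first byte
    ret
slide:
    movzx eax, byte ptr [esi+edi]
    add edi, [skiptab+eax*4]    ; slide the window ahead
    jmp next_window
fail:
    mov eax, -1
    ret
BMHFind endp

The whole win over SCASB is in the "add edi, [skiptab+eax*4]" line: on a mismatch the pattern usually slides forward by its whole length instead of by one byte.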

In many places speed does not matter: you can do all sorts of processing on small data and the time is not a consideration, but try to push large amounts of data through the same code and the machine will take forever to finish the task.

It was not all that long ago that computers had HDDs of less than 500 meg, but it is common now for them to have 50 gig disks; they used to have 8 or 16 meg of RAM, and now it's common to have 256 meg or more, which means that the size of data tends to go up with the increase in machine size.

Relying on a faster processor to fix bad code is a mistake. QBASIC is very fast on a modern machine, but it is doing very simple things on small amounts of data; try scanning a 100 meg database and you will start looking for speed.

At the algorithm level, the search for speed is necessary to keep up with ever-increasing demands for performance on ever-increasing data sizes.

Regards,

hutch@pbq.com.au
Posted on 2001-07-24 07:54:04 by hutch--
Betov, your example of "xor eax, eax" is a bad one!

It is considerably better to use "xor eax, eax", as this is special-cased on the P6 architecture in order to avoid partial register stalls.
It is also worth noting that some very long instructions take a cycle longer to decode than others, so avoiding very long instructions is a Good Thing TM! (I think the limit is instructions longer than 9 bytes; I could be wrong though.)

I would agree that optimising for size is usually a waste of time. Who here hasn't got at least 64 meg? The only time you should ever consider size important is for small loops; then, if you're clever with the instruction choice, you can cram it onto a single cache line.....

Mirno
Posted on 2001-07-24 08:33:48 by Mirno
ummm, correct me if I'm wrong, but isn't xor reg, reg more of a size optimization?... I mean, compared to mov reg, 0
Posted on 2001-07-24 09:38:18 by NervGaz
Beaster, "I believe this thread belongs to "the crusades";
Yes, i love putting the fire on board. :))
But this is not for my pleasure only. The concept of 'Specific Programming'
is far TOO SIMPLE to be explained, as it is entirely holded by the name.
And, as you may know, these days, people do not like a lot simple things.
So, i try to do it the reverse way.


Hutch, in the future, as in the past, we can suppose that storage
space will grow faster than processor speed. So, what use will the
optimisations of your algo be when we have entire movies on the disk?

Of course, as Beaster says, the relative benefit will always be
there but, in such circumstances, the solution is to change
'strategy'. I mean *not* parsing the disk. When a task is too long,
the best solution is not to do it.

I have 2 tasks in SpAsm which both do the same kind of computing.
But because of the nature of the data concerned, these 2 tasks
cannot be performed the same way: one takes most of the compile
time, the other takes about 0 ms on a huge file, with zero code
optimisation, in either case. I suppose you clearly understand that
I am attacking "code-level optimisation", not "design
optimisation"... For example, a power-of-2 search algo is *not*
"code-level optimisation"; this is strategy, design.

Talking of "Code level Optimisation", how many per cents of speed did you
really gain with this alone? and does it make sense? (i know that 1 minute
multiplied by 60 makes one hour...).


Betov.
Posted on 2001-07-24 10:49:16 by Betov
Originally "xor eax, eax" was purely for size yes. However, in the P6 architecture the registers are slightly different in order to allow for the "parallel" execution of instruction.

Under certain circumstances, the processor will have to stall in order to be certain that the partial register write has been successfully completed ("retired", as the Intel terminology puts it).

Xor'ing a register with itself is a special case in the Intel & AMD chips (which both use a similar run-time mapping of registers in their latest architectures); this avoids partial register stalls, which can cost quite a lot of processor cycles.

mov eax, 0
add ax, 1

This will partial-register stall (ax is accessed before the write to eax has retired).

xor eax, eax
add ax, 1

This achieves the same result, but will not encounter the partial register stall.

Mirno
Posted on 2001-07-24 11:56:06 by Mirno
The best example I can think of right now is an algorithm to evaluate regular expressions: there are many layers of optimization to this problem - none of them insignificant (or stupid).
IMHO :alright:
Posted on 2001-07-24 12:09:40 by bitRAKE
Please send it to me, BitRake. Thanks in advance.

betov.
Posted on 2001-07-24 12:51:26 by Betov
Sorry, you have to code it yourself. It requires a bit of research. Code wouldn't really help you understand what I mean - it requires a different type of understanding, just as the Boyer-Moore algorithm requires understanding at a different level. This problem is even more dynamic and therefore harder to understand why optimal solutions are optimal. I'm studying right now - learning all the time. :)


and DWORD PTR var, byte 0 ;better than 'mov var, 0'?
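; (the AND form assembles three bytes shorter - 83 /4 with an imm8
; against C7 /0 with a full imm32 - but it turns the plain store
; into a read-modify-write)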
Posted on 2001-07-24 13:57:21 by bitRAKE
Hi Betov,

Sometimes I agree with you, sometimes I do NOT..

I agree that trying to make a general solution for ALL problems (extra parametrization), and the reusability of code, is a big mistake

I quote here Charles MOORE, the creator of the FORTH language and probably the only person in this world who invented a truly new language and created over 40 BIG/HUGE applications (like OFFICE) ALONE in his lifetime (so far), using the most "low level" HLL in existence (besides ASM, of course)

Those applications are still working today, and are the core of many NASA and astronomical laboratories/observatories all over the world (many of them including new OSes made from SCRATCH)

He said something like this:

All my life I have searched for the "GEMS" of code that can be reused from one application to another without speed or clarity loss...

Until now I have found fewer than 40 such little routines; nothing special, 1 or 2 parameters max, and they are all primitives in my FORTH compiler, usually written in native ASM (for each processor)

All OTHER code has proved to be USELESS and had to be rewritten EVERY time to get the SPEED, CLARITY, and SIMPLICITY required by a GOOD PROGRAM/SYSTEM
.....



But I disagree about code optimizations.

My experience makes me consider MEDIUM-level code optimizations very valuable, and sometimes even low-level code optimizations (inside inner loops for graphics)


And I don't talk about the momentary commercial reasons that make our life such a pain.

For example, under normal conditions any SHIFT operation is much faster than a multiplication. Hardware constructors can choose to ignore shift operations and improve multiplication because compilers are unable to make good use of shifts... but this doesn't change a TRUE fact...
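For instance (a trivial sketch, multiplying by 8):

shl eax, 3          ; one shift: a single fast ALU operation
                    ; against the naive way:
mov ecx, 8
mul ecx             ; edx:eax = eax * 8, several cycles on the chips of this era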

Swapping pointers is MUCH faster than copying data across memory...
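That is (a minimal sketch; pBuffer1/pBuffer2 are just example variables):

mov eax, pBuffer1   ; the two buffers trade owners:
mov edx, pBuffer2   ; two loads and two stores,
mov pBuffer1, edx   ; whatever the size of the
mov pBuffer2, eax   ; data they hold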

Using registers is much faster than using level 1 cache, level 1 cache is much faster than level 2 cache, and level 2 cache is faster than system RAM...

Those are pure FACTS... one can NOT ignore them...

Arguments regarding hardware evolution just don't stand...

Check this link to see why:

http://webster.cs.ucr.edu/Page_asm/GreatDebate/debate4.html

However, BAD arguments don't change a TRUE fact:

CODE REUSE must be avoided at ALL COST unless it is very SIMPLE, CLEAR, the product of long EXPERIENCE and TESTING, has few or NO PARAMETERS, and of course... is in ASM ;)

Oh yeah, and SPECIFIC things don't necessarily mean DIFFERENT things; one can make programs that look pretty much the same, but have very "specific" internals.

Specific means doing EXACTLY (NO MORE/NO LESS) what IS REQUIRED... it doesn't mean DIFFERENT...

PS
====
use RDTSC to check performance, not GetTickCount ;)
the latter does not have enough resolution... also check multiple runs (thousands, to average out cache and first-run effects)
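Something like this (a minimal sketch; TimeLo/TimeHi are just example variables, and the CPUID is only there to serialize the instruction stream):

xor eax, eax
cpuid               ; serialize: nothing before this leaks into the timing
rdtsc               ; edx:eax = time-stamp counter
mov TimeLo, eax
mov TimeHi, edx

; ... code under test ...

xor eax, eax
cpuid
rdtsc
sub eax, TimeLo
sbb edx, TimeHi     ; edx:eax = elapsed cycles (approximately)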
Posted on 2001-07-24 17:04:37 by BogdanOntanu
I can't really get this. Why should reusable code be avoided?
All the books I have read till now say the opposite.

Also, do you mean we should stop using macros, APIs, etc.? They are reusable code too.
Posted on 2001-07-24 17:11:28 by MovingFulcrum
Hey Moving

We are not here to tell YOU what to do...

One must listen, and judge for HIMSELF...

Don't believe the books either; READ them, UNDERSTAND them,
EXPERIMENT with them... but NEVER BELIEVE... always experiment

CODE REUSE is prized today because it's the CORE of the CORPORATE way of doing things; they don't care about program speed or efficiency... all they care about is DEADLINES ;) and the interchange or easy disposal of programmers/creators...

The ultimate GOAL is to make programming an automated process, to change programmers and creators into USERS of prefabricated bloatware bricks, and to make programs like Cola bottles on a pipeline...

Please understand that this is a VALID way for the commercial world... but it is pathetically USELESS for HUMANKIND... that is why, from time to time, creators are allowed to finally produce something... in a well-controlled environment... or else NOTHING would EVER evolve...

I don't say one should DO this... just understand it ;) doing will come with understanding, while the reverse is NOT true...

I don't say macros or APIs are essentially BAD... but pretty much..

Let's say one of your macros has a flaw in it: reuse it and you will multiply that error 100 times, share it with a forum... place it for download...

Now rewrite it instead: what can happen? Do you think it will take so much time?

And what IF you find a better way of doing it, or just another way that will later give you more good ideas for another function?

Why are you speeding? Where are you going? Who are you?

Constant change, innovation and experiment are the spices of LIFE...

Automation, constancy and no change are the spice of DEATH

Life and death work together... switch from one to the other from time to time... will you? :grin:
Posted on 2001-07-24 17:31:42 by BogdanOntanu
Bogdan, we agree more than you think. Of course, as you know, I
like provoking, and I don't do it halfway when I mean to counter
tendencies I don't like.

In fact, I always use shift operations whenever I can avoid mul
and div, and so on, simply because in this case it is fully
accurate, easy to read, easy to write, much faster, and involves
fewer instructions and registers. (And this is indeed really
code-level optimisation, not strategy.) So it goes against my
point, but such clear-cut cases are very, very few, and well known.

I also swap pointers instead of copying data (of course), but
this is *not* code level; this is strategy.

And, of course, I also use regs as much as I can and switch to
mem only when I can't do without it. This is the base of the
programming art (though not at all an easy base to learn...).

I am happy that you are at least the ONE guy who has read and
agrees with my second point (specific programming). I had expected
more reactions on this point, and thought that purely ridiculous
things like the above "and DWORD PTR var, byte 0" would not become
so great a debate of inconsistent opinions against evidence.

The definition you give for 'specific' is mine. For example, I
have begun a little 'snippet' collection in SpAsm, which the user
can copy by selection and paste into his source... after all wished
modifications and required adjustments. This is much different from
calling a black box in a lib, or a never-again-controlled routine
nested in several levels of included things.

Maybe I would attempt one last funny provocation with this
consideration: that our agreement on the main points relies less on
chance than on the fact that you and I have written enough asm to
know what is important and what belongs to a 'savant idiot'.


HHHhhhhmmmmm!!!!! One more soon-closed thread! Yeah, man. :))
Posted on 2001-07-24 18:33:34 by Betov
ha, in your dreams, that will take a lot more :tongue:

and I think the exact term would be 'idiot savant', no? ;)
Posted on 2001-07-24 18:35:12 by Hiroshimator
There seems to be one point Betov and I agree on, and that is the relevance of optimisation. I am very much a man of optimisation where you can get a benefit out of it, so if I can see a result in useful speed terms, I will go after it if it's possible and can be done in a reasonable time frame.

The alternative position is nonsense like the fastest-loading dialog box on the planet: byte-level optimisation so trivial that it does not even change the 512-byte granularity of the build size.

The more or less mythical 90/10 rule says that if you need speed, optimise the 10% of the program that does the work; for the 90% that sits there idling and taking up memory, get the size down if it matters, but don't waste your life and time if it does not.

In the area of code reuse, I am an epistemological anarchist (if it works, do it), and it is one of the things that works for everybody at least some of the time. I look at WinMain/WndProc coding as hack OS code to get the app up and running, so I don't see the point in rewriting it every time I want to start an app or test an idea. Libraries are much the same idea: if it is a block of code that is useful to you and you are not going to improve on it in a hurry, use it.

If you need to do something original, then you write it, but I suggest that it is foolish to continually reinvent the wheel if the one you have is adequate for the task. This is the same view I have on optimisation: do what is useful, don't waste your life and time on what is not.

I am a pragmatist in terms of the "corporate" way of doing things: if it is an advantage, then it is worth doing. The trick here is to differentiate between what is worth doing and what is not.

What I am against is trying to narrow the range of things that assembler language programmers can do. The whole drift of programming in assembler is to be relatively free of external restraints, so avoiding one style of programming is as bad as being restricted to another.

Regards,

hutch@pbq.com.au
Posted on 2001-07-24 19:18:18 by hutch--
hutch... epon.

I would add that...

1) There is nothing wrong with playing with the other, rather useless, 90% of the optimization stuff for learnin' purposes... OK, learning isn't useless, but the result will not be directly used.

2) The weasels have closed in on the asm community & we're on our own when it comes to...
"trying to narrow the range of things that assembler language programmers can do"
There will be no quarter given to us. You should see the looks I get when I say I'm learning assembler.

"Do Java, man, maybe Perl - but why are you chipping at flint when you could get things done quickly?" Because flint may be brittle, but it still (even today) makes the sharpest blade.
Posted on 2001-07-25 12:43:23 by rafe