Okay, so here they are: the two optimization articles I wrote on Assembly. They cover the advantages of using the LEA instruction and give the exact number of clock cycles the CPU takes to execute certain instructions (the CPU brand and model, the memory, and the alignment of the different segments are all stated in the articles).

I wrote the articles in LaTeX, and the source is available if you ask for it. I have attached the output PDFs to this post. I don't know whether they can be used in the ASM Wiki Book, but it'd be great if one of the moderators could let me know.
Posted on 2007-02-07 00:50:35 by XCHG
Heh, "the exact number of clock cycles that the CPU would take" - you're going to be in for a surprise on modern CPUs :]
Posted on 2007-02-07 07:10:07 by f0dder
No, no surprises :P I have an Intel Core Duo 1.83 GHz in my laptop and yes, it is a lot different from the Pentium series. However, I think I have mentioned which CPU the tests were run on. Maybe I should cover other processors or brands in other articles.
Posted on 2007-02-08 00:30:39 by XCHG
As well as OS development, I'm interested in writing benchmarking programs to test various CPUs and find out things like clock cycles per instruction. As far as I know, the FPU takes more than one cycle to compute certain things, and I'd like to write benchmarks that display graphs of such timings. Of course, this would be a stand-alone program that doesn't run under any OS, so as to get more accurate results. Once I actually start creating credible programs, I'll look into it.

- keantoken
Posted on 2007-07-28 05:56:12 by keantoken
Thing is, CPUs aren't so simple anymore, so you can't just make a chart of cycle timings and expect to time a piece of code by adding up those counts... instruction ordering, stalls, cache hits/misses et cetera all play a factor.
Posted on 2007-07-28 07:09:57 by f0dder
Looks like typos in section 3.3 (+ should be *)
To be able to multiply a general purpose register by 2, 4 or 8, you can use the LEA instruction with INDEX*SCALE as shown below:
LEA EAX,
The above example calculates EAX = ECX*4. You can also specify the destination operand (the one to the left) to be the same as the source operand. For example:
LEA EAX,
And the above example calculates EAX = EAX*4. Note that you can do this operation with the SHL instruction, too.
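For what it's worth, here is a minimal sketch of the INDEX*SCALE form that the quoted section describes, written as C with GCC/Clang inline assembly so it can be compiled and checked. The operand `[x*4]` is my assumption of what the article intends (the operands did not survive the paste above), and the helper names `mul4_lea` and `mul4_shl` are made up for illustration. On non-x86 targets it falls back to plain multiplication.

```c
#include <stdint.h>

/* Hypothetical helper: multiply x by 4 using LEA's INDEX*SCALE form.
 * Assumes GCC or Clang; the inline assembly is only used on x86/x86-64. */
static uint32_t mul4_lea(uint32_t x) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t r;
    /* AT&T syntax for the Intel-style "LEA r, [x*4]":
     * no base register, index = x, scale = 4. */
    __asm__("leal (,%1,4), %0" : "=r"(r) : "r"(x));
    return r;
#else
    return x * 4u; /* portable fallback, same arithmetic */
#endif
}

/* Hypothetical helper: the same result with a left shift by 2,
 * which is what SHL would compute. */
static uint32_t mul4_shl(uint32_t x) {
    return x << 2;
}
```

Both helpers return the same value for any 32-bit input, and results wrap modulo 2^32. One practical difference from SHL is that LEA leaves the flags untouched.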
Posted on 2007-07-28 12:58:42 by JimG