It is now ten years since I first published my optimization manual. It has been so successful that I feel I have to keep it updated. The long-awaited update is now available at

The manual has become so big that I had to split it into five volumes. It now covers C++, inline assembly and stand-alone assembly; CPU-specific optimization; vector programming; ABI standards; C++ name mangling schemes; microarchitecture details not found anywhere else; and complete lists of instruction timings, execution unit throughput, micro-operation breakdown, etc. for the latest microprocessors from Intel and AMD.

It covers the following operating systems: DOS, Windows, Linux, BSD and Mac OS X on 16-, 32- and 64-bit x86 processors.
Posted on 2006-07-06 04:55:55 by agner
Hi Agner
Great work!
Thanks for sharing it and keeping it updated.

Posted on 2006-07-06 05:32:40 by Biterider
Agner, I can't wait to wrap my brain around it.. I still refer to your previous edition regularly.

I'd like to thank you on behalf of programmers everywhere.
Your efforts are highly appreciated, and I personally can't thank you enough. I'm sure the majority of us feel the same way :)

Regards, Homer.
Posted on 2006-07-06 05:38:53 by Homer
Nice, Agner - good work :)
Posted on 2006-07-06 10:56:36 by f0dder
Considering the (high) quality of your 'product', I'm surprised to see you sell it so hard. :lol:  :P

Posted on 2006-07-06 20:36:01 by asmrixstar

Thank you very much !!!!!!!!!!!!!  :D
Posted on 2006-07-07 03:21:16 by Siekmanski
I have updated my manual once again. It now covers everything about the new Intel Core 2 processor, including a detailed study of the pipeline and execution units and complete lists of instruction timings.

This time my manual has come before the official manuals from Intel. Their software manuals for the Core 2 are not out yet. Thank you to a friendly person who gave me remote access to a prerelease sample of the Core 2. This enabled me to test almost everything.

The execution core is more powerful than anything we have seen until now. It can do up to three full 128-bit vector calculations per clock cycle. Unfortunately, the instruction fetch and predecode stage has not been expanded enough to keep up with the rest of the pipeline, so this is a serious bottleneck in many situations.

The section on AMD microarchitecture in my manual has also been revised.
Posted on 2006-08-14 02:29:27 by agner
Thanks, I'll be sure to grab a copy, and to inform my peers in other forums.
Posted on 2006-08-14 04:02:43 by Homer
Thanks again!  :D

Posted on 2006-08-14 04:03:52 by Biterider
Core2 sounds like a very nice architecture - especially considering that its power consumption has been dramatically reduced (off the top of my head I think it's something like 65W for a Core2, 95W for something like an AMD64x2 4400+, and 135W for the high-end P4 monsters).
Posted on 2006-08-14 04:55:21 by f0dder
I just wanted to say thanks. I thought the manual was great. :P
Posted on 2006-11-28 09:21:53 by Jeronimo0d0a

Have any of you read that paper?

I read only the intro, plus some of the first chapter. Those are filled to the roof with _wrong_ _assumptions_

(like his arguments against writing assembly, for instance)

I may read the rest, but I fear it would be a waste of time.
This book seems to be little more than a strong argument against assembly.

And such an argument would definitely be wrong, guys, wouldn't you say?

Posted on 2006-12-08 13:29:45 by Shakain
I don't see anything wrong in his arguments...

and you can just choose to ignore the introduction and focus on the actual optimization tips anyway.
Posted on 2006-12-08 15:27:36 by f0dder