Greetings, my dear coders.

 I've read about the U & V pipes on the Pentium and found that
 optimizing for them isn't difficult, just tricky, and it
 requires a bit of thinking. The question now is:

 If I optimize for the Pentium II U & V pipes, will it make
 any difference on other processors such as the PIII and PI?

 And where can I get a reference or info on the pairing of
 each instruction? How do I find out whether instruction x
 can or can't go into the U or V pipe?

 Thanks for your time.

Posted on 2001-05-30 23:14:00 by disease_2000
There is a hlp file dealing with optimization in the \masm32\help\ directory. You can also download the Intel optimization manual from the Intel web site, and the Graphics Programming Black Book. Good luck!
Posted on 2001-05-31 03:04:00 by karim
disease, you will find that a PII has a very similar architecture to a PIII, so code tuned for one tends to work OK on the other. Have a good read of Agner Fog's optimisation manual for instruction timings and each instruction's capacity to pair in the U or V pipeline. Generally you use the smaller instructions that pair in both pipes if you can write your code that way. The Intel manuals are a very good source of information on the instructions, so it is worth getting used to the PDF format they come in. The first manual, on architecture, is good reading but a bit complicated, and the second has a very extensive breakdown of each available instruction. Try out the method I suggested elsewhere to time algorithms, as it helps you tune your code to go a lot faster. With practice it's reasonably easy to do. Regards,
Posted on 2001-05-31 05:12:00 by hutch--
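To make the pairing idea above concrete, here is a rough sketch in Python of a first-pass pairing check for the plain Pentium. It is a simplified model, not a full rule set: the PAIRING table is a small illustrative subset, the tuple layout and the name can_pair are made up for this example, shifts are assumed to use an immediate count, and flags, partial registers, address-generation stalls and the special push/pop cases are all ignored. Check Agner Fog's tables before trusting it for real code.

```python
# Simplified model of Pentium (P5) U/V pairing. Illustrative subset only.
# "UV" = pairable in either pipe, "PU" = U pipe only, "PV" = V pipe only.
PAIRING = {
    "mov": "UV", "add": "UV", "sub": "UV", "and": "UV", "or": "UV",
    "xor": "UV", "cmp": "UV", "inc": "UV", "dec": "UV", "lea": "UV",
    "shl": "PU", "shr": "PU", "adc": "PU", "sbb": "PU",
    "jmp": "PV", "call": "PV", "jnz": "PV",
}


def can_pair(first, second):
    """Return True if `first` (U pipe) and `second` (V pipe) may pair.

    Each instruction is a hypothetical tuple:
    (mnemonic, set of registers written, set of registers read).
    """
    m1, writes1, _reads1 = first
    m2, writes2, reads2 = second

    # Anything missing from the table is treated as not pairable.
    class1 = PAIRING.get(m1, "NP")
    class2 = PAIRING.get(m2, "NP")

    # The first instruction must be able to issue in the U pipe,
    # the second in the V pipe.
    if class1 not in ("UV", "PU"):
        return False
    if class2 not in ("UV", "PV"):
        return False

    # The second instruction must not read or write a register that
    # the first one writes. Reading a register and then writing it
    # in the second instruction is allowed.
    if writes1 & (reads2 | writes2):
        return False
    return True


if __name__ == "__main__":
    mov_eax_ebx = ("mov", {"eax"}, {"ebx"})
    add_ecx_edx = ("add", {"ecx"}, {"ecx", "edx"})
    add_ecx_eax = ("add", {"ecx"}, {"ecx", "eax"})
    print(can_pair(mov_eax_ebx, add_ecx_edx))  # independent registers
    print(can_pair(mov_eax_ebx, add_ecx_eax))  # second reads eax
```

The register sets are plain strings, so this model treats al and eax as unrelated registers; a fuller version would need to map partial registers onto their parent.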
The U & V pipes are on the P5 architecture, the Pentium & Pentium MMX. The P6 arch (Pentium Pro, Pentium II, and Pentium III) uses an entirely different design. It can also process multiple instructions in one cycle (a la U & V pipes), but it is restricted in a different way. The P6 arch is effectively a CISC wrapper around a RISC processor. All the P6 instructions are broken down into micro-ops, and these are then executed by the processor core (the RISC bit). There are 3 decoders: one that can handle an instruction of up to 4 micro-ops, and two that can handle only 1-micro-op instructions. In order to maximise throughput on the P6 arch, you need to organise code in a 4-1-1 micro-op form. See Agner Fog's help file for the instruction timings (there are two sections, PPlain & P2/3); it gives a breakdown of all the micro-op timings. Also remember, there are plenty of other quirks to the Pentium family (partial register stalls etc.), so get the Intel docs on the instruction set and their optimisation manual, plus read Agner's help file! I'm not sure about the P4, and how to optimise for it, but it's probably not worth it anyway (it's a bit crap really)! :D Mirno
Posted on 2001-05-31 06:30:00 by Mirno
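The decoder grouping described in the post above can be sketched as a small Python model. This is only an approximation under stated assumptions: the first decoder is taken to accept an instruction of up to 4 micro-ops (the limit given in Agner Fog's P6 material, exposed here as the parameter first_limit), the other two accept only single-micro-op instructions, and fetch-block boundaries, instruction lengths and microcoded instructions are ignored. The name decode_cycles is made up for this example.

```python
def decode_cycles(uop_counts, first_limit=4):
    """Estimate decode cycles for a straight-line instruction sequence.

    `uop_counts` lists the micro-op count of each instruction in
    program order. Simplified P6 model: per cycle, the first decoder
    takes one instruction of up to `first_limit` micro-ops, and each
    of the other two decoders takes one 1-micro-op instruction.
    """
    for count in uop_counts:
        if not 1 <= count <= first_limit:
            # Longer (microcoded) instructions are outside this model.
            raise ValueError("micro-op count outside model: %r" % (count,))

    cycles = 0
    i = 0
    n = len(uop_counts)
    while i < n:
        cycles += 1
        i += 1  # the first decoder always takes the next instruction
        for _ in range(2):  # second and third decoders
            if i < n and uop_counts[i] == 1:
                i += 1
            else:
                # A multi-micro-op instruction must wait for the
                # first decoder in the next cycle.
                break
    return cycles


if __name__ == "__main__":
    well_ordered = [2, 1, 1, 2, 1, 1]
    badly_ordered = [1, 1, 2, 1, 1, 2]
    print(decode_cycles(well_ordered), decode_cycles(badly_ordered))
```

Both sequences in the usage example hold the same six instructions; only the order differs, which is the point of arranging code so that each multi-micro-op instruction is followed by two single-micro-op ones.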

 Thanks for the link, Karim! It's funny: I bought that book
 (a special edition) with no CD (it was missing, I guess someone
 stole it, so they sold it to me for only 9 bucks!). And now I have
 a full version of the normal edition with a lot of examples. Thanks.

 Thanks for replying, Hutch and Mirno. I've checked Agner's help
 file and the Intel documents (I actually downloaded all the PDFs ;)
 from the 386 to the Pentium III), but didn't find any useful info.
 Maybe later on, when I understand more about assembly, it will
 come in useful.

Posted on 2001-05-31 14:01:00 by disease_2000