What is quicker (in segment with default=1)?



678D1F LEA EBX,[BX] ; pairable in either pipe, takes 1(prefix 67)+1 cycles
0FB7DB MOVZX EBX,BX ; not pairable, takes 3 cycles


Is it correct?

Note both instructions perform the same operation.
Posted on 2003-07-23 16:16:34 by MazeGen
lea will partal reg stall. movzx is designed not to.
Posted on 2003-07-25 18:12:42 by Qages

lea will partal reg stall


LEA EBX, causes partial register stall? Are you sure? And MOVZX EBX,BX too?
Posted on 2003-07-29 15:34:36 by MazeGen
movzx is designed not to

this statment states that movzx is designed not to stall. stall is not included in the statment becuase it was the topic of the sentance before it.
Posted on 2003-07-29 16:20:36 by Qages
You guys are talking about two different CPUs.
MazeGen asks about 'pairability' which is a P5 specific thing. Qages talks about 'partial register stall' which is _not_ P5 thing. Maybe people can comment on the question if the target CPU is specified.

BTW, the partial register stall happens when you write to partial register and then read the larger part of the register within certain period of time. Reading (and not writing) partial register does not cause stalls.

Ah, one more thing. If you switch between 16 bit mode and 32 bit mode, that should be included in the timing calculation. (Of course, if you are working only in 16 bit mode, lea line is viable.)
Posted on 2003-07-29 17:35:17 by Starless

You guys are talking about two different CPUs.
MazeGen asks about 'pairability' which is a P5 specific thing. Qages talks about 'partial register stall' which is _not_ P5 thing. Maybe people can comment on the question if the target CPU is specified.

BTW, the partial register stall happens when you write to partial register and then read the larger part of the register within certain period of time. Reading (and not writing) partial register does not cause stalls.

Ah, one more thing. If you switch between 16 bit mode and 32 bit mode, that should be included in the timing calculation. (Of course, if you are working only in 16 bit mode, lea line is viable.)


Starless, I asked not only about pairability, but about speed in general. As I think, pairability is and will be important any time on any x86 processor (why do you think it's P5 specific?), partial register stalls as well.

I agree with you that the LEA EBX, doesn't cause stall - only when (according to Agner Fog's documents) "the LEA instruction will suffer an AGI stall on the PPlain and PMMX if it uses a base or index register which has been written to in the preceding clock cycle". So it looks that's is the only problem with LEA.

Segment with bit default=1 means 32 bit mode.

I will be glad if we will continue in this dialogue ;)
Posted on 2003-07-30 11:54:43 by MazeGen
(why do you think it's P5 specific?)

Because you added that movzx is not pairble, which means you were talking about P5 execution pipes. P6 has 5 execution ports which is not like U-V pipes on P5. (You know it very well since you read Agner's note.) Both lea and movzx are 1 micro-op instructions, and lea goes to port 0 and movzx goes to either port 0 or port 1.

If you want to use the word 'pairablity' to indicate the general parallel execution... On P6, lea is 'pairable' with any instruction which does not generate micro-ops which are bound to port 0, and movzx is 'pairable' with any instruction which does not generate micro-ops for port 0 and port 1 at the same time. (Actually, this is not an accurate description of P6 execution unit. You'd better read Intel's optimization manual for better explanation.)

Segment with bit default=1 means 32 bit mode.

Oh, did you mean the default addressing mode bit of the segment descriptor? Then, disregard my comment about it. Only thing that remains is the decoding overhead with the prefix.
Posted on 2003-07-30 17:06:35 by Starless
Thanks Starless, I have to study more ;)

BTW, what about newer Agner's optimizing docs? On his homepage there is last update from 2000-07-03. Can you tell me, what is the best for studying optimizations in these days?
Posted on 2003-08-03 04:30:46 by MazeGen
agners is still about the best but i guess u have to read the intel docs for the most up-to-date info
Posted on 2003-08-05 14:01:23 by AnotherWay83