What would be faster on a Pentium 3 and Pentium 4, and in general?
These all come down to one basic question: does an operation on a memory address get delayed, or take extra CPU cycles, when the address uses:
+ register
+ displacement
* register
* immediate
etc. If it does stall, then maybe it's better to spend one cycle adding the offset beforehand.


mov ebx, [edi+esi*4]
or: add edi, 4
mov ebx, [edi]

xor eax, [edi+esi*4+4]
or: add esi, 1
xor eax, [edi+esi*4]

xor eax, eax
or: mov eax, 0

xor ah, ah
or: mov ah, 0

pop eax ;4 cycles?
or: mov eax, [esp] ;1 cycle?
add esp, 4 ;1 cycle?

inc esi
mov eax, [edi+esi*4]
or: add esi, 4
mov eax, [edi+esi]
Posted on 2004-11-12 05:17:59 by pwn
You are not doing an apples-to-apples comparison. Some of your alternatives introduce a stall, so they are not exact equivalents. Every time you use *4, the alternate version ADDs to a register and then uses that register in the very next address, so the MOV stalls until the result is ready. Apart from that problem, using *4 is just as fast as not using it. The P4 optimization manual says scaled addressing should be slower, but I've only seen it be slower with LEA.

mov ebx, [edi+esi*4]
or: add edi, 4
mov ebx, [edi] ;stalls waiting for EDI to get updated.

mov ebx, [edi+esi*4] ; fine
or: add edi, 4 ;ignoring the stall, about the same speed
mov ebx, [edi]

xor eax, [edi+esi*4+4] ;same answer
or: add esi, 1
xor eax, [edi+esi*4]

xor eax, eax ;xor eax, eax is better; it also removes dependencies
or: mov eax, 0

xor ah, ah ;use xor eax, eax if possible in 32-bit code; accessing only part of a register can cause partial register stalls
or: mov ah, 0

pop eax ; pop is faster
or: mov eax, [esp]
add esp, 4

inc esi ;got another stall here.
mov eax, [edi+esi*4]
or: add esi, 4
mov eax, [edi+esi]


If you are trying to learn more about optimizing, I have an assembler optimization web page: 60 tricks for speeding up your code that you can only do in assembler, not in a high-level language.

http://www.visionx.com/markl/optimization_tips.htm
Posted on 2004-11-12 17:13:26 by mark_larson
Thanks. I'll check out the site, it looks pretty good.
Anyway, as far as I know, if you don't want a stall when you use a register to reference a memory address, the last change to that register should be at least 4 instructions earlier.



inc edi ; Change memory reference
mov eax, 3 ; bogus code
inc ecx ; bogus code
xor edx, edx ; bogus code
neg eax ; bogus code
mov [edi], edi ; Use memory reference. No AGI stalls.


That should eliminate the AGI stall. But since it's not really practical, and it's often hard to find useful code to put in between, I rarely use it.
I think the same goes for any register you modify and then read: you need the same spacing to avoid stalls.
Anyway, I will check out your site. Have a good day.
Posted on 2004-11-12 21:23:49 by pwn