What is generally faster:

pushad
; ...
popad

or:

push esi
push edi
push ebx
; ...
pop ebx
pop edi
pop esi

I only have a Pentium III Celeron, but I wonder which of the two Pentium 4 and Athlon chips prefer.

Posted on 2003-11-13 19:46:13 by C0D1F1ED
PUSHAD generates eight internal pushes plus one mov. The other way has only three pushes, so pushing the registers individually is faster.

PUSHAD is equivalent to:

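mov temp,esp    ; temp = ESP before any register is pushed
push eax
push ecx
push edx
push ebx
push temp       ; the original ESP value, not the already-decremented one
push ebp
push esi
push edi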
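POPAD can be expanded the same way. The sketch below is simply the reverse of the sequence above: the stored ESP image is skipped rather than loaded, because POPAD discards it. LEA is used for the skip since, like POPAD, it leaves the flags unchanged, whereas ADD would modify them. This is an illustration of the stack layout, not a claim about how the CPU implements POPAD internally.

```asm
; POPAD (sketch): reverse of the PUSHAD sequence above
pop edi
pop esi
pop ebp
lea esp,[esp+4]   ; skip the saved ESP image; POPAD discards it
pop ebx
pop edx
pop ecx
pop eax
```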
Posted on 2003-11-13 20:03:11 by donkey
The first option is faster to write :grin: Are the clock cycles saved worth the extra typing?
Posted on 2003-11-13 21:29:22 by Odyssey
On Athlons, PUSHAD and POPAD have a latency of only 6 cycles each, whereas the PUSH reg32 and POP reg32 instructions have latencies of 3 and 4 cycles each, respectively. I know that two PUSHes/POPs can execute concurrently, even though one is DirectPath and the other is VectorPath. My estimate is that pushing four registers individually costs about the same as a single PUSHAD. In the question above only three registers are being pushed/popped, so the individual instructions would be faster.
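As a rough check of the four-register estimate, assume (an idealization, not something the manual states) that independent PUSHes overlap perfectly two at a time, so each effectively costs half of its 3-cycle latency, and ignore decode and dependency effects. With $n$ registers pushed:

```latex
T_{\text{PUSH}}(n) \approx \frac{3n}{2}\ \text{cycles}, \qquad
T_{\text{PUSH}}(n) = T_{\text{PUSHAD}} = 6 \;\Rightarrow\; n = 4, \qquad
T_{\text{PUSH}}(3) \approx 4.5 < 6
```

Under that simple model, four PUSHes break even with PUSHAD and three come out ahead.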

*I haven't timed it - only read the manual.

**On a side note, IIRC, AMD is building aggressive stack optimizations into their processors from Clawhammer onward, similar to the Intel Centrino (notice the performance boost for a much lower-clocked CPU?). This is due to compilers' heavy use of the stack.
Posted on 2003-11-13 22:10:58 by bitRAKE
Typing is not an issue; the code is generated automatically. :cool: It is for a triangle rasterizer and runs once per scanline, so it is executed several thousand times. I don't alter ESP or EBP, so ESI, EDI and EBX are the only registers I need to preserve.

Since most compilers use individual PUSH/POP instructions, I guess it's best for me to do the same. Thanks for the confirmation!
Posted on 2003-11-14 03:24:46 by C0D1F1ED