Hi,
I am just curious to know if this,

push var1
pop var2

is better than,

mov eax,var1
mov var2,eax

I would like to know whether there are any advantages or disadvantages to using either technique.
Thank you,
Barry
Posted on 2002-11-21 21:31:22 by bgong68
bgong68,
Think of it this way. The PUSH var1 instruction has to make two memory references, var1 and the stack. POP has to make two more memory references, stack and var2. That's a total of 4 memory references. Now, MOV EAX,var1 makes one memory reference and MOV var2,EAX makes another memory reference for a total of two memory references. All things being equal, what do you think is faster? Ratch
Posted on 2002-11-21 21:43:46 by Ratch
Hi Ratch,
I am new to assembly, but based on your explanation I am going to say the two MOV instructions are better. Thank you very much for making it clear.
Barry ^_^
Posted on 2002-11-21 21:54:41 by bgong68
Ratch is correct, as I remember from another thread where we discussed the same thing.

In a program that uses loops extensively, MOV is much better than PUSH or POP; it saves a lot of processor time.
Posted on 2002-11-21 22:38:13 by IwasTitan
Hi,

Procedures often have to return 1 on success, so I looked at what the C compiler does. It generates:

push 1
pop eax

Is this better than mov eax,1? I notice the mov eax,1 instruction is very long; maybe xor eax,eax then inc eax? I'm guessing the last way is the best, or not?

thank you,
-stormix
Posted on 2002-11-22 07:32:08 by stormix
Please name that compiler, so we can use it in threads on "asm is useless, compiler technology is far superior to any human". :grin:
Posted on 2002-11-22 08:04:58 by Maverick
Hi Maverick,

I'm not sure what you mean; this is MSVC 6 with optimisations. Are you saying that its way is the best?
Posted on 2002-11-22 08:12:27 by stormix
No, that code sequence is not optimal for speed, and I assume you were optimizing for speed, not size (3 bytes). If the optimizer took the preceding code into account, there could also be better (faster) solutions of the same code size, but I doubt++ that the optimizer analyzes the CPU state left by the preceding code.

The PUSH/POP nature of that code is typical of the internal function of compilers, btw.

VC6 and VC7 are renowned compilers with renowned optimizers, especially VC7, but as I was writing to bitRAKE in an email a few hours ago, I've seen very embarrassing code generated even by VC7 (with maximum speed optimization turned on). E.g. it doesn't exploit "advanced" addressing modes, nor is it capable of merging, for example, these two instructions:


MOV EAX,[EDI+123456]
PUSH EAX

into a banal:


PUSH [EDI+123456]

Considering that these are the basis of function calling, this is something fundamental for a compiler, to say the least, and of course it appears very frequently (at each parameter of each function call :) ).

I'm sure++ VC7's optimizer is an extremely complicated and bloated piece of hardcoded-situations code; if they thought in simpler terms, it would not fail in such naive ways. I've seen many other naive solutions that I don't recall in detail now; the one above is just the most frequent.

Looking at VC7's asm output may show many interesting details.
Posted on 2002-11-22 08:36:00 by Maverick
Doesn't opcode size affect performance, because a longer instruction may take longer to decode and to move into the cache? Not sure about this, big lack of knowledge here :/
Posted on 2002-11-22 10:22:26 by nyook
stormix,

You be the judge of the below code:


00000024 B8 00000001 MOV EAX,1 ;no memory references, but no possibility to separate and interleave

00000029 6A 01 PUSH 1 ;one memory reference plus possibility to separate and interleave
0000002B 58 POP EAX ;another memory reference plus possibility to separate and interleave

0000002C 33 C0 XOR EAX,EAX ;no memory reference plus possibility to separate and interleave
0000002E 40 INC EAX ;no memory reference plus possibility to separate and interleave

Ratch
Posted on 2002-11-22 10:43:31 by Ratch
nyook:
Opcode size affects performance inside loops. If the loop is long enough to span a couple of cache lines, it can really thrash the caches and give a big performance hit.
Also, very long instructions (more than 7 or 8 bytes, I think) take longer to decode (on the PII and above), but they are few and far between. In general there won't be several big fat instructions in succession, so the decoders will outrun the execution units, building up a backlog that can be executed while the "chunky monkey" instruction is decoded in two clocks. The decode and execution engines are balanced in favour of decode, so execution is starved as little as possible.

It's also worth remembering that, as human beings, we can choose "mov eax, 1" over "xor eax, eax / inc eax" when it suits our purposes, for alignment, say. Compilers have trouble seeing such things.

As a general rule - memory accesses bad (even through the stack, which should be cached pretty much ALL the time) - if you can avoid them you should.

Also if this is at the end of a function call, then speed isn't critical - you can't be in a high speed loop if you are returning! So generally instruction size isn't a problem...

Mirno
Posted on 2002-11-22 11:14:43 by Mirno
I just played around with MSVC++ 6 and was unable to produce the "push 1, pop eax" using any optimization switch. Could you post the code that produced that, and post the compiler switches in your project settings?

CL.EXE normally does a nice job with speed/size optimization and I find it hard to believe that it would make such a simple error under a speed optimizing context.
Posted on 2002-11-22 12:16:17 by iblis
I use the flags in my own functions - why trash a perfectly good register. ;)
Posted on 2002-11-22 12:19:55 by bitRAKE
Indeed!

It would be nice if bool type functions in C would return the result in the carry flag. Does anyone know if there are any C compilers out there that do this?
Posted on 2002-11-22 12:26:33 by iblis
@Maverick:
I had thought it was optimising for speed but in fact it does this when optimising for size, otherwise it would put mov eax,1. I can't comment on vc7 but looking at vc6's asm output it does use push/pop a fair bit when putting immediate numbers into registers.

@Ratch:
Thank you very much, I can see which to use where now :)

@Iblis:
I created a DLL project then changed to "win32 - release" but changed optimisations to minimum size, the exact switches were "/MT /W3 /GX /O2 /FD". The DllMain function contained only "return TRUE;" and this is the disassembly:

.text:10001000 _DllMain@12 proc near
.text:10001000 push 1
.text:10001002 pop eax
.text:10001003 retn 0Ch
.text:10001003 _DllMain@12 endp

The cdecl, stdcall and fastcall calling conventions all return the result in EAX, so I don't think you could do this with MSVC (in C, at least).

thanks for everyone's helpful comments on this :alright:

-stormix
Posted on 2002-11-23 07:08:32 by stormix
Hello stormix, you wrote:
@Maverick:
I had thought it was optimising for speed but in fact it does this when optimising for size, otherwise it would put mov eax,1. I can't comment on vc7 but looking at vc6's asm output it does use push/pop a fair bit when putting immediate numbers into registers.
By the way, here's an example of what I mentioned in my last post:

For CL.EXE I'm using /c /nologo /Ogtyb2 /Gs /G6 /Gz /Zp1 (AFAIK these produce best speed optimization)

E.g., this source:



void MyFunction(int a);

int a;

int main() {
    MyFunction(a);
}


Produces:



PUBLIC _main
EXTRN ?MyFunction@@YGXH@Z:NEAR ; MyFunction
; Function compile flags: /Ogty
; File d:\coding\visualc7\bin\test.cpp
_TEXT SEGMENT
_main PROC NEAR

; 8 : MyFunction(a);

mov eax, DWORD PTR ?a@@3HA ; a
push eax
call ?MyFunction@@YGXH@Z ; MyFunction

; 9 : }

xor eax, eax
ret 0
_main ENDP
_TEXT ENDS


Clearly the mov[]/push part is not optimal, and could be merged into a push[].
Posted on 2002-11-23 07:42:42 by Maverick
The only difference I can see is that one uses a register while the other does not. In the middle of an algo when you are short of registers, push/pop is a useful option. Theory is that the two MOV instructions are faster but it will depend on how and where it is used.

My general view is in NON speed critical code, use push/pop but if you are writing an algo that must perform well, work out which performs best within the algo design.

Regards,

hutch@movsd.com
Posted on 2002-11-24 02:31:19 by hutch--
From my point of view,
xor eax, eax
inc eax

is the best solution, as long as you just have to return 1.
It's understandable, it's fast, and it's small :)
Posted on 2002-11-24 03:51:42 by nyook
Yup, certainly better than PUSH 1 / POP EAX.
Posted on 2002-11-24 03:54:47 by Maverick

Maverick wrote: "Yup, certainly better than PUSH 1 / POP EAX."


thx, I feel accepted :D ;)
Posted on 2002-11-24 09:06:48 by nyook