Not specifically win32 but still an x86 assembly question.
Why does gcc -S (linux) produce 'sub esp, 24' at the beginning of
the program?

What exactly is it used for? I don't see any local variables, yet removing it
produces a seg fault.
Posted on 2003-10-03 11:34:19 by grv575
Maybe not local variables, but arguments to a procedure.
Posted on 2003-10-03 12:41:30 by comrade
gcc often inserts tricks with esp to align the stack to 16 bytes for optimal performance.

Thomas
Posted on 2003-10-03 12:56:38 by Thomas
Looks similar to how Intel compiler works, according to their optimization manual...
Posted on 2003-10-03 14:59:58 by QvasiModo
Aligning the stack? Maybe. The thing is, if that is correct then it's just an optimization, so I should be able to remove it and still have a functioning program. But in that case the program core dumps...

Still confused.
Posted on 2003-10-03 18:23:29 by grv575
GCC is a C++ compiler, right?

If so, initialization code is needed to call the constructors of global objects.
If it's only a C compiler, then I don't know.
Posted on 2003-10-03 19:51:54 by tenkey
Without your C code, no one could give you the answer. It may be local variable space, it may be stack aligning, or it may be some black magic. And, moreover, gcc code generation is quite different between its versions. If I were you, I would post the C code and gcc version - or generated .s file.
Posted on 2003-10-03 20:58:38 by Starless
OK sorry if this post is long but here goes...




#include <stdio.h>

main(){
}

----------------------------------gives----------------------------------

main:
    pushl %ebp
    movl %esp,%ebp
.L2:
    movl %ebp,%esp
    popl %ebp
    ret

---------------------------------------------------------------------------

#include <stdio.h>

main(){
    printf("");
}

----------------------------------gives----------------------------------

.LC0:
    .string ""
main:
    pushl %ebp
    movl %esp,%ebp
    subl $8,%esp
    addl $-12,%esp
    pushl $.LC0
    call printf
    addl $16,%esp
.L2:
    movl %ebp,%esp
    popl %ebp
    ret



Notice a 24-byte deficit (sub 8, sub 12, push (sub 4)). Then 16 is added back, most likely to account for the -12 and the 4-byte push. So the sub 8 might be necessary for something, but what?
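
One way to test the alignment idea against this listing is to add up every stack adjustment up to the `call printf`. The sketch below is only a hypothetical model, not anything taken from gcc: it assumes esp was 16-byte aligned just before `call main`, and the names depth_at_call and aligned_at_call are made up for illustration.

```c
#include <assert.h>

/* Hypothetical model: stack depth in bytes below the value esp had just
   before `call main`. ASSUMPTION: that starting value was 16-byte aligned. */
enum { RET_ADDR = 4, PUSH = 4 };

/* Depth at the moment `call printf` executes in the printf("") listing. */
int depth_at_call(void)
{
    int depth = 0;
    depth += RET_ADDR; /* return address pushed by `call main` */
    depth += PUSH;     /* pushl %ebp                            */
    depth += 8;        /* subl  $8,%esp                         */
    depth += 12;       /* addl  $-12,%esp                       */
    depth += PUSH;     /* pushl $.LC0                           */
    return depth;
}

/* Nonzero when the modelled stack is 16-byte aligned at the call. */
int aligned_at_call(void)
{
    int depth = depth_at_call();
    assert(depth > 0);
    return depth % 16 == 0;
}
```

Under that assumption the depth comes to 32 bytes, a multiple of 16. The `subl $8` would then be padding for the 8 bytes already taken by the return address and the saved ebp, and the `addl $-12` would pad the single 4-byte argument to 16.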

The -O2 optimized version just removes the add 16:



main:
    pushl %ebp
    movl %esp,%ebp
    subl $8,%esp
    addl $-12,%esp
    pushl $.LC0
    call printf
    movl %ebp,%esp
    popl %ebp
    ret





#include <stdio.h>

main()
{
    int i = 5;
    printf(i);
}

----------------------------------gives----------------------------------

main:
    pushl %ebp
    movl %esp,%ebp
    subl $24,%esp
    movl $5,-4(%ebp)
    addl $-12,%esp
    movl -4(%ebp),%eax
    pushl %eax
    call printf
    addl $16,%esp
.L2:
    movl %ebp,%esp
    popl %ebp
    ret

---------------------------------------------------------------------------

#include <stdio.h>

main()
{
    int i = 5;
    printf("test%d", i);
}

----------------------------------gives----------------------------------

.LC0:
    .string "test%d"
main:
    pushl %ebp
    movl %esp,%ebp
    subl $24,%esp
    movl $5,-4(%ebp)
    addl $-8,%esp
    movl -4(%ebp),%eax
    pushl %eax
    pushl $.LC0
    call printf
    addl $16,%esp
.L2:
    movl %ebp,%esp
    popl %ebp
    ret

----------------------------------O2-------------------------------
.LC0:
    .string "test%d"
main:
    pushl %ebp
    movl %esp,%ebp
    subl $8,%esp
    addl $-8,%esp
    pushl $5
    pushl $.LC0
    call printf
    movl %ebp,%esp
    popl %ebp
    ret



So again there is a 24-byte deficit here. What seems to be happening is that 24 bytes are allocated in GENERAL for local variables whenever any occur (so some slack space is there... it's still 24 bytes for 2 vars).

What I don't get is why 8 bytes are allocated when there is only a memory reference (LC0) and no locals, or 24 bytes if you disregard the stack cleanup after printf. Maybe the rule is to allocate a 24-byte area and then, as an optimization, remove the stack cleanup since there will be no more variables or pushing/popping.

OK, after more testing it does look like it's just being aggressive in reserving space. What someone said earlier about aligned stacks is probably true. All that's actually necessary is subtracting from esp to adjust for local variables (mov , 1) and then cleaning up after parameter passing.
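
The numbers in the listings above are at least consistent with a simple rounding rule. What follows is a guess fitted to those listings, not gcc's actual code: it assumes a 16-byte preferred stack boundary (gcc documents a -mpreferred-stack-boundary option for i386), and the function names are hypothetical.

```c
#include <assert.h>

#define STACK_ALIGN 16u /* ASSUMED preferred stack boundary, in bytes */

/* Round n up to the next multiple of STACK_ALIGN. */
unsigned round_up(unsigned n)
{
    return (n + STACK_ALIGN - 1) / STACK_ALIGN * STACK_ALIGN;
}

/* Guessed prologue reservation: locals rounded up to 16, plus 8 bytes so
   that return address + saved ebp (8 bytes) fill out a 16-byte unit. */
unsigned frame_reserve(unsigned local_bytes)
{
    return round_up(local_bytes) + 8;
}

/* Guessed padding subtracted before pushing arg_bytes of arguments. */
unsigned arg_padding(unsigned arg_bytes)
{
    return round_up(arg_bytes) - arg_bytes;
}

/* Compare the guess with the values seen in the listings. */
void check_against_listings(void)
{
    assert(frame_reserve(0) == 8);  /* subl $8,%esp   (no locals)     */
    assert(frame_reserve(4) == 24); /* subl $24,%esp  (int i)         */
    assert(arg_padding(4) == 12);   /* addl $-12,%esp (one argument)  */
    assert(arg_padding(8) == 8);    /* addl $-8,%esp  (two arguments) */
}
```

The same rule also gives 24 bytes for two int locals, which matches what was observed. It says nothing about why the -O2 listings drop the cleanup after the call.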

Does anyone know, though, whether 'leave' is faster than its equivalent, and also
whether 'enter 4, 0' is faster, slower, or smaller?
Posted on 2003-10-03 23:22:00 by grv575
For the absolutely correct answer, you should dig into the gcc source code. I guess it would be in one of the files in config/* and config/i386/*. But I may be wrong, and it might be somewhere in the mess in the source root. (I don't want to go back to that mess.)

After some testing, it seems to me that this comes from one of gcc's internal rule macros, a C macro in one of the above files. Which one is it? I don't know. But I'm sure that reserving 24 bytes on the stack is not related to stack alignment at all.

The seg fault you mentioned in the first post might be the result of one of the later `movl' instructions happening to overwrite the return address or the saved ebp.

And... Intel recommends `leave' for P6 and later CPUs and discourages the use of `enter'. That should give you an idea of how fast they are.
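
On the "equivalent" and "smaller" parts of the question: `leave' does the same thing as `movl %ebp,%esp; popl %ebp', and `enter 4,0' the same as `pushl %ebp; movl %esp,%ebp; subl $4,%esp'. In 32-bit code `leave' encodes in 1 byte against 3 for the pair it replaces, and `enter 4,0' in 4 bytes against 6. The toy model below only illustrates the register effects and says nothing about timing. The Cpu struct and its helpers are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit machine: esp, ebp and a small stack of 4-byte words.
   Hypothetical model for illustration only. */
typedef struct {
    uint32_t esp, ebp;
    uint32_t mem[64]; /* word i lives at byte address 4*i */
} Cpu;

void push(Cpu *c, uint32_t v) { c->esp -= 4; c->mem[c->esp / 4] = v; }

uint32_t pop(Cpu *c)
{
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* enter N,0  ==  pushl %ebp; movl %esp,%ebp; subl $N,%esp */
void enter0(Cpu *c, uint32_t n)
{
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= n;
}

/* leave  ==  movl %ebp,%esp; popl %ebp */
void leave(Cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* Returns 1 if an enter/leave pair restores both registers. */
int roundtrip_ok(void)
{
    Cpu c = { .esp = 128, .ebp = 200 };
    enter0(&c, 4);
    assert(c.ebp == 124 && c.esp == 120);
    c.mem[(c.ebp - 4) / 4] = 5; /* movl $5,-4(%ebp) */
    leave(&c);
    return c.esp == 128 && c.ebp == 200;
}
```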

<aside>
If you use linux, you have a better (and free) compiler from Intel. Use it instead of gcc. gcc is not the best compiler in the world. It was the best in the linux world simply because there was no other compiler. In fact, on any platform that has a vendor-supplied compiler (commercial or not), gcc beats the vendor compiler only in price.
</aside>
Posted on 2003-10-04 01:08:28 by Starless