Okay I just wrote something on the stack....

Stack
The stack is a place where data can be stored and retrieved (well, sort of), and the place where parameters and local variables are kept. esp and ebp are the stack-related pointers: esp is the stack pointer, while ebp is the base pointer. When you enter most functions, a stack frame is usually created, and you can use ebp to access parameters and local variables. However, functions can also be written without a stack frame. An *important* point to note is that the stack should be aligned to a DWORD (a multiple of 4). If it is not, things can go badly wrong, up to a general protection fault (GPF), and *NT* systems are extremely touchy about stack alignment.

Furthermore, an Intel document states that "A misaligned access in the data cache or on the bus costs at least three extra clock cycles on the Pentium processor. A misaligned access in the data cache, which crosses a cache line boundary, costs nine to twelve clock cycles on Pentium Pro and Pentium II processors. Intel recommends that data be aligned on the following boundaries for the best execution performance on all processors:
1) Align 8-bit data on any boundary
2) Align 16-bit data to be contained within an aligned 4-byte word
3) Align 32-bit data on any boundary which is a multiple of four
4) Align 64-bit data on any boundary which is a multiple of eight
5) Align 80-bit data on a 128-bit boundary (that is, any boundary which is a multiple of 16 bytes)"

Example (MASM):
.386
.model flat,stdcall
option casemap:none
include /masm32/include/user32.inc
include /masm32/include/kernel32.inc
includelib /masm32/lib/user32.lib
includelib /masm32/lib/kernel32.lib
.code
start:
jmp @F
testing db "Stack needs to be aligned to dword",0 ;the string must be zero-terminated
@@:
sub esp,2 ;break the dword alignment, the code is very likely to misbehave or crash on *NT* systems
invoke MessageBox,0,OFFSET testing,0,0
invoke ExitProcess,0
end start

Now, more about the stack and its related opcodes. The most common opcodes related to the stack are 'push' and 'pop'. The usage is something like push eax, as in you push the data in eax onto the stack. The esp (which holds the pointer to the top of the stack) is decremented by the size of the data you pushed onto the stack. Similarly, when you pop eax, the data on top of the stack is moved to eax, and esp is incremented by the size of the data taken off the stack.

Example
push eax ; = mov [esp],eax sub esp,4
pop eax ; = mov eax,[esp] add esp,4

However, it is said that mov is faster than pushes and pops. Thus some members (stryker/arkane) of the forum (win32asm) have come up with the xcall macro, which is supposed to be faster than invoke, as it replaces all the pushes with a sub and movs. Of course there are some limitations: the macro cannot handle direct memory operands and cannot handle BYTE, WORD, QWORD or TBYTE size parameters (Well, who uses parameters other than DWORD nowadays?).

;by gfalen
@str MACRO _str:VARARG
LOCAL @@1
IF @InStr(1, <_str>, <!"> )
.DATA
@@1 DB _str, 0
.CODE
EXITM <OFFSET @@1>
ELSE
EXITM <_str>
ENDIF
ENDM

;by stryker
xcall MACRO function:REQ, parameters:VARARG
LOCAL psize, paddr, plen
IFNB <parameters>
psize = 0
FOR param, <parameters>
psize = psize + 4
ENDM
IF psize EQ 4
push parameters
ELSE
sub esp, psize
psize = 0
FOR param, <parameters>
IF @SizeStr(<param> ) GT 4
paddr SUBSTR <param>, 1, 5
IFIDNI paddr, <ADDR >
paddr SUBSTR <param>, 6, @SizeStr(<param> ) - 5
lea eax, paddr
mov DWORD PTR [esp+psize*4], eax
ELSE
mov DWORD PTR [esp+psize*4], @str(<param> )
ENDIF
ELSE
mov DWORD PTR [esp+psize*4], @str(<param> )
ENDIF
psize = psize + 1
ENDM
ENDIF
ENDIF
call function
ENDM

Push and pop are used to store data temporarily on the stack and to pass parameters (pop is not used for that, though). There are also opcodes that save and later restore the values of the registers. They are pushad (pusha being the 16-bit version), popad (popa being the 16-bit version), pushfd (pushf being the 16-bit version) and popfd (popf being the 16-bit version). pushad pushes all general-purpose registers onto the stack in the following order: eax, ecx, edx, ebx, esp (its value before the pushad), ebp, esi and edi. popad pops them off the stack in the reverse order: edi, esi, ebp, esp (this value is skipped rather than loaded), ebx, edx, ecx, eax. pushfd transfers the flags register (EFLAGS) onto the stack, and popfd pops a value from the stack into EFLAGS.
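
To make the pushad/popad order concrete, here is a small C model of the two instructions. It is only a sketch: the Regs struct, the toy_pushad/toy_popad names and the dword-indexed stack (esp is an array index here, not a byte address) are made up for illustration.

```c
#include <stdint.h>

enum { STACK_DWORDS = 32 };   /* size of the toy stack, in dwords */

/* The eight general-purpose registers, listed in pushad order. */
typedef struct {
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
} Regs;

/* pushad: push eax, ecx, edx, ebx, the ORIGINAL esp, ebp, esi, edi. */
void toy_pushad(Regs *r, uint32_t stack[]) {
    const uint32_t values[8] = {
        r->eax, r->ecx, r->edx, r->ebx,
        r->esp,                 /* esp as it was before pushad started */
        r->ebp, r->esi, r->edi
    };
    for (int i = 0; i < 8; i++) {
        r->esp -= 1;            /* one dword per push in this model */
        stack[r->esp] = values[i];
    }
}

/* popad: pop edi, esi, ebp, skip the saved esp, then ebx, edx, ecx, eax. */
void toy_popad(Regs *r, const uint32_t stack[]) {
    r->edi = stack[r->esp++];
    r->esi = stack[r->esp++];
    r->ebp = stack[r->esp++];
    r->esp++;                   /* the saved esp is skipped, not loaded */
    r->ebx = stack[r->esp++];
    r->edx = stack[r->esp++];
    r->ecx = stack[r->esp++];
    r->eax = stack[r->esp++];
}
```

Start with esp equal to STACK_DWORDS (an empty stack), call toy_pushad, trash the registers, then call toy_popad: every register except esp comes back from the stack, and esp ends up where it started simply because eight dwords were removed.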

Stack frame
Earlier on, I mentioned that ebp is the base pointer and that it is used to access the local variables and the parameters passed to a function. Below is a sample of MASM code (MASM has certain built-in macros, one of which creates the stack frame for you) and the code it produces (viewed in a disassembler). It shows how a stack frame is created and how the parameters are then accessed through ebp.

test47 proc par1:DWORD,par2,par3,par4
mov eax,par1
mov ecx,par2
mov edx,par3
mov ebx,par4
ret
test47 endp

*becomes* this after assembling (due to some MASM internal macros, which set up the stack frame)

test47:
push ebp ; store value of ebp on stack
mov ebp,esp ; copy value of esp to ebp
mov eax,[ebp+8] ; original value of ebp stored at [ebp], DWORD PTR [ebp+8] = par1
mov ecx,[ebp+0Ch] ; DWORD PTR [ebp+0Ch] = par2
mov edx,[ebp+10h] ; DWORD PTR [ebp+10h] = par3
mov ebx,[ebp+14h] ; DWORD PTR [ebp+14h] = par4
leave
ret 10h ; sizeof parameters * number of parameters = 4*4

The instructions "push ebp" and "mov ebp,esp" create a stack frame. The instruction "leave" removes the stack frame by restoring esp and ebp to the values they had before the stack frame was set up. The "ret 10h" tells the processor to transfer control from the procedure back to the return address saved on the stack (surprise, surprise, the stack is also used to store the return address, i.e. the value of eip, when "calling" a function; the address of the function is then loaded into eip and execution continues from there), and then to 'release' 16 bytes, the four parameters, from the stack. One may ask why the first parameter is stored in DWORD PTR [ebp+8] and not DWORD PTR [ebp+4]. This is due to the fact that ebp is pushed onto the stack, thus DWORD PTR [ebp] contains the original value of ebp, and DWORD PTR [ebp+4] holds the return address pushed by the call. Parameters can be accessed via DWORD PTR [ebp+4+4*positionofparameter], counting positions from 1.
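
The frame layout can be checked with a small C model. This is a sketch under stated assumptions: ToyCpu, push32, load32, enter_test47 and param_disp are hypothetical names, the "memory" is a 128-byte array, and the parameters are pushed right to left as invoke does for stdcall.

```c
#include <stdint.h>
#include <string.h>

/* Toy machine: a byte-addressed stack plus the two stack registers. */
typedef struct {
    uint8_t  mem[128];
    uint32_t esp, ebp;
} ToyCpu;

/* push: decrement esp by 4, then store the dword at [esp]. */
void push32(ToyCpu *c, uint32_t v) {
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &v, 4);
}

/* Read the dword stored at a given stack address. */
uint32_t load32(const ToyCpu *c, uint32_t addr) {
    uint32_t v;
    memcpy(&v, &c->mem[addr], 4);
    return v;
}

/* Build the stack exactly as test47 sees it after its prologue. */
void enter_test47(ToyCpu *c, const uint32_t par[4], uint32_t return_address) {
    for (int i = 3; i >= 0; i--) {
        push32(c, par[i]);       /* caller pushes par4 first, par1 last */
    }
    push32(c, return_address);   /* what "call" does */
    push32(c, c->ebp);           /* push ebp */
    c->ebp = c->esp;             /* mov ebp,esp */
}

/* Displacement from ebp of parameter number n, counting from 1. */
uint32_t param_disp(uint32_t n) {
    return 4 + 4 * n;
}
```

After enter_test47, load32(c, c->ebp) gives the saved ebp, load32(c, c->ebp + 4) the return address, and load32(c, c->ebp + param_disp(1)) the first parameter, which is why par1 sits at [ebp+8].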

The above code shows how a stack frame is created and how ebp is used to access the parameters passed to the function. The following code (MASM) shows how ebp can be used to access local variables (local variables are actually data *stored* on the stack).

test124 proc par1:DWORD,para2,para3,para4
LOCAL buffer[32]:BYTE
LOCAL dd1:DWORD
LOCAL dd2:DWORD
mov eax,dd1
mov dd2,eax
lea eax,buffer
ret
test124 endp

*becomes* this after assembling (due to some MASM internal macros, which set up the stack frame)

test124:
push ebp
mov ebp,esp
add esp,-28h ; reserve 28h (40) bytes for the local variables, so that data pushed later does not overwrite them
mov eax,[ebp-24h] ; DWORD PTR [ebp-24h] = dd1
mov [ebp-28h],eax ; DWORD PTR [ebp-28h] = dd2
lea eax,[ebp-20h] ; [ebp-20h] = address of first byte in the array
leave
ret 10h ; sizeof parameters * number of parameters = 4*4

Okay, so the code is almost the same as the code above: it creates a stack frame. The instruction "add esp,-28h" might seem weird, but it has its purpose. It reserves stack space for the local variables, which ensures that the values stored in them are not overwritten when something is pushed onto the stack. (Hopefully I do make some sense.) However, I cannot comprehend why MASM produces "add esp,-28h" instead of "sub esp,28h". Maybe it is due to some macro defined deep inside MASM. Local variables differ from parameters in that they are accessed with a negative displacement from ebp (remember that when you push something, the value of esp decreases). I think it is easier to understand how to access local variables by working out the displacement of each local variable in the above example than by reading my explanation.
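
The displacement arithmetic for test124 can be written out as a short C sketch. The function name local_displacements is hypothetical, and the layout rule is an assumption that happens to match this example: locals are placed in declaration order, each directly below the previous one, with no extra padding.

```c
#include <stddef.h>

/*
 * Compute the displacement from ebp of each local variable.
 * sizes[i] is the size in bytes of the i-th LOCAL, in declaration order.
 * disp[i] receives its (negative) displacement from ebp.
 * Returns the total number of bytes to reserve on the stack.
 */
size_t local_displacements(const size_t sizes[], size_t count, long disp[]) {
    long offset = 0;
    for (size_t i = 0; i < count; i++) {
        offset -= (long)sizes[i];   /* each local sits below the previous one */
        disp[i] = offset;
    }
    return (size_t)(-offset);
}
```

With sizes 32, 4 and 4 (buffer, dd1, dd2) this gives displacements -32, -36 and -40 and a total of 40 bytes, i.e. the 28h in "add esp,-28h".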

Some code *gurus* definitely care about how big their code is and how fast it runs. To optimise their code, they might not even have a stack frame in their functions (yes, it is possible, and I will show you how). Removing the stack frame can shave off some clocks and some bytes (push ebp = 1 byte, mov ebp,esp = 2 bytes, leave = 1 byte, total bytes saved = 4). When the stack frame is removed, remember that pushing data changes the value of esp, so you need to manually adjust the offsets from esp. Also, if you don't have a stack frame, you don't want 'leave'. The following are ways to write functions without a stack frame.

call function1
...
function1:
nop ; to represent whatever code present
ret 4*numberofparameter

or

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
function2 proc par1:DWORD,par2,par3,par4
nop ; to represent whatever code present
ret 4*4
function2 endp
OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF

or

function3 proc
par1 equ <esp+4>
par2 equ <esp+8>
par3 equ <esp+12>
par4 equ <esp+16>
nop ; to represent whatever code present
ret 4*4
function3 endp

And so this concludes my discussion on stack. Thank you and have a nice day.

Posted on 2003-05-01 01:58:41 by roticv
Nice work roticv, even if the order things are presented is a bit messy :) - you cover a good amount of things though. Misaligned stack doesn't necessarily give GPF, but it does have weird effects - and it's true that especially NT is _very_ picky.

Since you're also covering data alignment a bit, it's worth noting that SSE2 requires 16-byte data alignment - unless you use the much slower non-aligned instructions. Using the standard load/store on non-aligned addresses gives an exception (privileged instruction, iirc).

It's worth mentioning that you shouldn't push word-sized data to the stack, as you get the dreaded misalign that way.

It's worth mentioning that "ret 10h" is because of the STDCALL calling convention, while C calling convention would only do "ret" and leave it to the caller to adjust esp. Instead of "release 16 bytes", perhaps say "add esp, 16 - thereby removing function parameters from the stack".

Perhaps rewrite the whole "It is to ensure the values stored in local variables are not corrupted any data when something is pushed onto the stack." thing as "reserve stack space for local variables"... might still be worth explaining the "to avoid push trashing local variables" thing.

Usually, the reason for not using a stack frame is either that you don't need it, or that you want to use EBP as a general purpose register - not so much to save the push ebp :). Push/pop ebp would still be needed if you want to use EBP as a general purpose register, since you have to return it in its original state.

You should also add that if you don't use a stack frame, you cannot use local variables (in the automated masm way), nor can you access function parameters the usual way - you have to handcode (well, unless somebody has macros) all ESP references, and remember to further adjust these if you do push/pop.
Posted on 2003-05-01 03:50:25 by f0dder
roticv,

If you are worried about stack alignment to 4, this macro will probably do the job OK.



align_4 MACRO reg
add reg, 3
shr reg, 2
shl reg, 2
ENDM

align_4 esp


Regards,

hutch@movsd.com
Posted on 2003-05-01 03:54:09 by hutch--
hutch, the stack alignment problem is more an issue with "push wordsized_variable" - under normal operation, you don't really need to do any stack alignment, except perhaps if you're 16-byte aligning your local stack variables for speed and/or SSE2 compliance.
Posted on 2003-05-01 03:56:30 by f0dder
f0dder,

I am sure roticv know how to use any suggestion I make without assistance.

Regards,

hutch@movsd.com
Posted on 2003-05-01 04:20:36 by hutch--

roticv,

If you are worried about stack alignment to 4, this macro will probably do the job OK.



align_4 MACRO reg
add reg, 3
shr reg, 2
shl reg, 2
ENDM

align_4 esp


Regards,

hutch@movsd.com
Why not instead:


align_4 MACRO reg
add reg, 3
and reg, 0FFFFFFFCh
ENDM
Posted on 2003-05-01 05:01:37 by Maverick
maverick,

I like it. :alright:

Regards,

hutch@movsd.com
Posted on 2003-05-01 05:09:27 by hutch--
Maverick, what about a generic align (well, only powers of two are necessary).
Manually aligning stack to 4 is a bit silly, since it automatically is under normal circumstances.
The problem with unaligned stack that roticv is describing comes from pushing word-sized variables - a "no-no" under win32, which should be fixed instead of doing symptomatic treatment.

A general align macro is useful though, and could be used for aligning locals to, say, 16 byte alignment as required by SSE2 data.
Posted on 2003-05-01 05:14:43 by f0dder

a general ALIGN macro would be extremely simple, but too much assembler-dependent, that's why I'll refrain from showing any.

In any case, it's just a simple:


ADD Reg,Alignment + AlignmentOffset - 1
AND Reg,-Alignment

I see aligning stack on arbitrary boundaries as a very useful feature, in some situations. The way I visualize it is just "stack is local memory, quickly allocable/freeable". So it's natural that in some weird, rare but certainly possible cases, one may need a certain alignment.
I used to align to e.g. 256 in some gfx routines, when the partial register stalls typical of the P6 core were still a thing to come. But there's still some need left in specifying arbitrary Alignment and AlignmentOffset heap and stack allocations.
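
For readers who think more easily in C, the same ADD/AND trick looks like this. It is a sketch only: align_up and align_down are hypothetical names, the alignment is assumed to be a power of two, and the AlignmentOffset term from the snippet above is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment (a power of two).
   Mirrors: add reg,R-1 / and reg,-R */
uint32_t align_up(uint32_t value, uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Round value down to a multiple of alignment (a power of two).
   Mirrors: and reg,-R */
uint32_t align_down(uint32_t value, uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return value & ~(alignment - 1);
}
```

Since the stack grows towards lower addresses, align_down is the direction that keeps esp clear of data already on the stack; align_up suits sizes and heap-style addresses.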
Posted on 2003-05-01 05:34:16 by Maverick
Thanks for your comments :) I will make the necessary changes and additions. :alright: I almost thought no one read this thread... hehehe
Posted on 2003-05-01 06:25:40 by roticv
Roticv, after some work is done on it, this could end up a pretty useful addition to the FAQ section.
:alright:
Posted on 2003-05-01 06:27:16 by f0dder
roticv,

"surprise, surprise the stack is used to store the initial value of ip when "calling" a function. "

and here:
"This is due to the fact that ebp is pushed onto the stack, thus DWORD PTR [ebp] contains the original value of ebp."
and here:
push eax ; =mov [esp],eax sub esp,4

Are you sure?
What is better for you: to have a stack frame or not to have one?


"DWORD PTR [ebp+4+4*positionofparameter]"
What is positionofparameter? 0,1,2,3 or 1,2,3,4

"(Local variables are actually data *stored* on the stack)."



test124 proc par1:DWORD,para2,para3,para4
LOCAL buffer[32]:BYTE
LOCAL dd1:DWORD
LOCAL dd2:DWORD
mov eax,dd1 ; in eax ->garbage due to dd1 isn't initialized
mov dd2,eax ; you must write something in dd1
lea eax,buffer ; before reading
ret
test124 endp


"However I cannot comprehend why MASM produce "add esp,-28h" instead of "sub esp,28h".
Maybe it is due to some macro defined deep into MASM. "

and

"Of course there are some limitations which are that the macro cannot handle direct memory and cannot handle BYTE, WORD, QWORD, TBYTE size parameters.."

Just don't use macros and other HLL stuff (since it "hides" and limits things)
and you will have the freedom to write what you want, including non 286asm code! It is assembly...

"(Well, who uses parameters other than DWORD nowadays?)."
Who uses sub/mov rather than push/pop! The stack is just memory...

"This is due to the fact that ebp is pushed onto the stack, thus DWORD PTR [ebp] contains the original value of ebp"
What do you have at DWORD PTR [ebp+4]?

"The uses of push and pop are to store data temporarily (store data on the stack) and to pass parameter (pop are not used though)."
Why not?


strlen:
pop ecx ; ecx = return address
pop eax ; eax = parameter->lpstr
push ecx ; ecx = return address
..... ; strlen code
....
ret ; rather than ret 4


You can substitute:

"code *gurus*"
with "people"

"And so this concludes my discussion on stack. Thank you and have a nice day."
with "To be continued..."

Regards,
Lingo
Posted on 2003-05-01 06:55:19 by lingo12
hutch,
you can find the original here:
http://www.asmcommunity.net/board/showthread.php?threadid=7342&perpage=15&highlight=and%20eax%200FFFFFFFCh&pagenumber=2


"Huh try this instead:
add eax, 3
and eax, 0FFFFFFFCh

Mirno"


"A more general solution would be this:
add eax, R-1
and eax, -R
where R is the power of two you want to round up to.

Gliptic"



Regards,
Lingo
Posted on 2003-05-01 17:34:23 by lingo12

Now, more about the stack and its related opcodes. The most common opcodes related to the stack are 'push' and 'pop'. The usage is something like push eax, as in you push the data in eax onto the stack. The esp (which holds the pointer to the top of the stack) is decremented by the size of the data you pushed onto the stack. Similarly, when you pop eax, the data on top of the stack is moved to eax, and esp is incremented by the size of the data taken off the stack.

Example
push eax ; = mov [esp],eax sub esp,4
pop eax ; = mov eax,[esp] add esp,4


A slight error it seems...
Actually, push decrements the stack pointer and then stores the data at the new pointer location...



push eax ;
; = sub esp,4
; mov [esp],eax
pop eax ;
; = mov eax,[esp]
; add esp,4
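
The corrected order can also be modelled in a few lines of C. This is just a sketch: ToyStack, toy_init, toy_push and toy_pop are made-up names, the stack is a 64-byte array, and there is no overflow or underflow checking.

```c
#include <stdint.h>
#include <string.h>

/* Toy stack: 64 bytes of memory and an esp that is an offset into it. */
typedef struct {
    uint8_t  mem[64];
    uint32_t esp;
} ToyStack;

/* An empty stack has esp just past the end of the memory. */
void toy_init(ToyStack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->esp = (uint32_t)sizeof s->mem;
}

/* push: sub esp,4 first, then mov [esp],value */
void toy_push(ToyStack *s, uint32_t value) {
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, 4);
}

/* pop: mov value,[esp] first, then add esp,4 */
uint32_t toy_pop(ToyStack *s) {
    uint32_t value;
    memcpy(&value, &s->mem[s->esp], 4);
    s->esp += 4;
    return value;
}
```

Pushing two values and popping twice returns them in reverse order and leaves esp where it started. If push stored first and decremented afterwards, the first store would land outside the array, at offset 64.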
Posted on 2003-05-01 21:59:55 by V Coder
What about a word push? Will it crash on NT?

So ret 4 releases 4 bytes from the stack, doesn't it?




Push X ; Sub ESP, 4
push Y ; Sub ESP,4 , so ESP is Sub by 8
mov eax,[esp] ; Test the values
ret 8 ;Release 8 byte.


huh ?
Posted on 2003-05-02 09:59:15 by realvampire
Lingo,

Thanks for the link to the old thread, my head always blocked to any "magic number" type algos but the two examples do make sense of how it works.

Regards,

hutch@movsd.com
Posted on 2003-05-02 10:14:43 by hutch--

What about a word push? Will it crash on NT?

So ret 4 releases 4 bytes from the stack, doesn't it?




Push X ; Sub ESP, 4
push Y ; Sub ESP,4 , so ESP is Sub by 8
mov eax,[esp] ; Test the values
ret 8 ;Release 8 byte.
PUSH won't crash. But the above code will. RET will attempt to use Y as the return address.

RET n works this way:

Step 1: pop the EIP value first
Step 2: pop n bytes (assumed to be the function arguments)
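
Here is a rough C model of those two steps, to show why the quoted code jumps to Y. The names ToyCpu, push32 and ret_n are invented for this sketch, and the addresses and values in the description below are arbitrary.

```c
#include <stdint.h>
#include <string.h>

/* Toy machine: a 64-byte stack, a stack pointer and an instruction pointer. */
typedef struct {
    uint8_t  mem[64];
    uint32_t esp, eip;
} ToyCpu;

/* push: decrement esp by 4, then store the dword at [esp]. */
void push32(ToyCpu *c, uint32_t v) {
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &v, 4);
}

/* ret n: step 1 pops the dword at [esp] into eip,
   step 2 adds n to esp to discard the arguments. */
void ret_n(ToyCpu *c, uint32_t n) {
    memcpy(&c->eip, &c->mem[c->esp], 4);
    c->esp += 4;
    c->esp += n;
}
```

If the caller pushes two arguments and the return address, ret_n(c, 8) loads the return address into eip and esp ends up back at its starting value. If the function itself pushes X and Y and then executes ret 8, the dword on top of the stack is Y, so Y is what ends up in eip.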
Posted on 2003-05-02 15:30:31 by tenkey
word-sized pushes (ie, unaligned stack) won't necessarily _crash_ on NT - though there's a good chance it might. However, stuff will "work very weird" - the easiest example is to "sub esp, 2" and show a MessageBox.
Posted on 2003-05-03 06:35:17 by f0dder
Yes I was mistaken. push eax = sub esp, 4 mov [esp],eax

"This is due to the fact that ebp is pushed onto the stack, thus DWORD PTR [ebp] contains the original value of ebp"
What do you have at DWORD PTR [ebp+4]?

dword ptr [ebp+4] = return address
Posted on 2003-05-09 09:17:53 by roticv
So it will work




push eax
push ecx
mov esp,ebp
ret

Posted on 2003-05-09 09:44:08 by realvampire