The Stack
Stack
While programming in assembly language, you will often require to save the contents of a register to free it for other purposes. You could either copy the contents of the register to another available register or to a memory location reserved for such a purpose. An array of such reserved memory locations is called a stack. It is used for a number of things but mainly for local variables.
The stack is a linear data structure--an array of allocatable locations in memory. Memory allocations and deallocations in the stack occur on a last-in-first-out (LIFO) basis. The first data to come into the stack becomes the last one to go out. So, whenever you PUSH some data into the stack, remember to POP it out in the reverse order. It is important to make sure that everything that you push onto the stack is popped off it as well. This is called balancing the stack and if it is not done your program will crash in short order. For example, say you PUSHed in:
1 2 3 4one after another. You'll have to POP them out in the reverse:
4 3 2 1You can think of the stack as a closed box of fixed size with only one of the sides (top) open. Because the size of the box is fixed, you can put only an allowable number of CDs inside. If the box can contain 16 CDs, you can't put a 17th CD inside. Nonetheless, the one that you put in first will always be the last one to come out. Also, the last one you put in will be the first one to be picked out. This simply means that you'll have to pick out the CDs in the reverse order of placement. The structure of the stack In the early DOS times, the .COM (COre iMage) executable format was limited to 65,536 bytes. The code, data, and the stack had to be fit into this tiny space, and saving room was high priority. Code and data occupied space downward, from byte 0100 (origin), and the stack was organized backward, from byte 0FFFF. The code was usually placed first, then the static data, then what we now call, the virtual data, eventually growing down, while the stack was placed eventually growing upward. Stacks in executables are still placed proceeding backward. There can be many stacks present at a time in memory, but there's only one current stack. Each stack is located in memory in its own segment to avoid overwriting other parts of memory. This current stack segment is pointed to by the stack segment (SS) register. The offset address to the most recently allocated stack location, called the top of the stack (TOS), is held in another register called the stack pointer (SP). The 16-bit SP register is used only in the 16-bit environment; in a 32-bit environment, the 32-bit extended stack pointer (ESP) register is used to point to the TOS. A pointer called the stack base (SB) points to the first entry that you put into the stack. Adding an entry into the stack fills the next allocatable location (lower memory address) and changes (decrements) the stack pointer (SP) to point to the that location. Removing an entry empties the current location and changes (increments) the stack pointer to point to the previous allocated location (higher memory address).
- Each entry pushed into the stack makes the allocated section of the stack grow downward (toward lower memory addresses) and decreases SP by unit size.
- Each entry popped out shortens the allocated section of the stack upward (toward higher memory addresses) and increments SP by unit size (4 bytes for 32-bit; 2 for 16-bit; 1 for 8-bit).
; (TODO)Consider a stack of a capacity of 16 bytes. It can hold only 16 bytes, or 8 words, or 4 dwords of temporary data. If you try to allocate memory any further in the stack after it is full, a stack overflow occurs. The most recent data that you tried to push into the stack is lost. Using the stack We use the PUSH instruction to put data into the next allocatable stack location and the POP instruction to remove a data entry from the current stack location (SP). Besides local variables there are two other very important functions of the stack. The first is to pass data to procedures including Windows API functions, for example, the following INVOKE call:
INVOKE SendMessage,[hWnd],WM_CLOSE,0,0
is actually assembled as follows:
push 0 push 0 push WM_CLOSE push [hWnd] call [SendMessage]The parameters are pushed in reverse order because Windows uses the STDCALL convention in which the stack is reversed but the function will balance it for you on return. For calling conventions that you can use in Windows, refer to Win CallingConventions. The second function of the stack is to hold the return addresses of procedures. When you call a procedure, the address that it was called from is pushed onto the stack then the program jumps to the procedure. When the CPU encounters a RET it will pop the return address off the stack and jump back to that address. It is, therefore, critical to balance the stack--if the return address is not where it is expected to be on RET the program will crash. For the most part if you do not manipulate the stack directly, you never have to worry about this but you will find that the stack is a very powerful tool and eventually you will need to keep these things in mind. A common question is: "Where is the stack located?" The stack is located in memory and is reserved for use by your program. esp and ebp are stack-related pointers. esp is the stack pointer, while ebp is the base pointer. When you enter/step into most functions, usually a stack frame would be created, and you can use ebp to access parameters and local variables. However functions can be created without the creating of a stack frame. An important point to note is that the stack should be aligned to DWORD (align to 4), if not it would raise some general protection fault (or simply known as GPF), and NT are extremely touchy to stack alignment issue. According to fodder, one of our friends on the Win32 ASM board, "Misaligned stack doesn't necessarily give GPF, but it does have weird effects - and it's true that especially NT is very picky." Example (MASM):
.386
.model flat,stdcall
option casemap:none
include /masm32/include/user32.inc
include /masm32/include/kernel32.inc
includelib /masm32/lib/user32.lib
includelib /masm32/lib/kernel32.lib
.code
start:
jmp @F
testing db "Stack needs to be aligned to dword"
@@:
sub esp,2 ; remove dword align to crash app on Windows NT
invoke MessageBox,0,OFFSET testing,0,0
invoke ExitProcess,0
end start
Now, more about stacks and its related opcodes. The most common opcodes related to the stack are 'push' and 'pop'. The usage for push is something like push eax, as in you push the data on eax to the stack. The esp (which holds the pointer to the stack) is then decemented by the size of data you pushed onto the stack. Similarly,when you pop eax, the data to the stack is moved to eax. The esp is then incremented by the size of the data moved from the stack.
Example:
push eax ; => sub esp,4 ;push decrements esp by byte size
; mov [esp],eax
pop eax ; => mov eax,[esp] ;pop increments esp by byte size
; add esp,4
However, mov is much faster than pushes and pops since it saves bytes and requires less clock cycles to execute. Thus some member (stryker/arkane) at the forums (win32asm) have came out with the xcall macro which is supposed to be faster than invokes, as it replaces all the pushes with mov and sub (note, however, that the result is much larger if you use push instructions). Of course there are some limitations which are that the macro cannot handle direct memory and cannot handle BYTE, WORD, QWORD, TBYTE size parameters.
;by gfalen
@str MACRO _str:VARARG
LOCAL @@1
IF @InStr(1, <_str>, <!"> )
.DATA
@@1 DB _str, 0
.CODE
EXITM <OFFSET @@1>
ELSE
EXITM <_str>
ENDIF
ENDM
;by stryker
xcall MACRO function:REQ, parameters:VARARG
LOCAL psize, paddr, plen
IFNB <parameters>
psize = 0
FOR param, <parameters>
psize = psize + 4
ENDM
IF psize EQ 4
push parameters
ELSE
sub esp, psize
psize = 0
FOR param, <parameters>
IF @SizeStr(<param> ) GT 4
paddr SUBSTR <param>, 1, 5
IFIDNI paddr, <ADDR >
paddr SUBSTR <param>, 6, @SizeStr(<param> ) - 5
lea eax, paddr
mov DWORD PTR ~[esp+psize*4], eax
ELSE
mov DWORD PTR ~[esp+psize*4], @str(<param> )
ENDIF
ELSE
mov DWORD PTR ~[esp+psize*4], @str(<param> )
ENDIF
psize = psize + 1
ENDM
ENDIF
ENDIF
call function
ENDM
The uses of push and pop are to store data temporarily (store data on the stack) and to pass parameter (pop are not used though). There are some opcodes that help to store and later restore values in the registers. They are namely pushad (pusha being the 16bit version), popad (popa being the 16bit version), pushfd (pushf being the 16bit version) and popfd (popf being the 16bit version). For pushad, all general registers are pushed onto the stack in the following order: eax, ecx, edx, ebx, esp ,ebp, esi and edi. For popad, the registers are popped off the stack in the following order: edi, esi, ebp, esp, edx, ecx, eax. For pushfd, the Flags register (EFLAGS) is transferred onto the stack. For popfd, the data from the stack are popped into the Flags register (EFLAGS).
Stack frame
Eariler on, I have mentioned that ebp is the base pointer and its uses are to access the local variables and parameters passed to the function. Below I have listed a sample MASM code (MASM have certain internal macro, one of it is to create an internal stack frame) and the code produced (viewed from a disassembler). The following codes shows how parameters can be access, and how a stack frame is created so as to access the parameter with ebp.
; MASM example:
test47 proc par1:DWORD, para2, para3, para4
mov eax,par1
mov ecx,par2
mov edx,par3
mov ebx,par4
ret
test47 endp
// HLA example
procedure test47( par1:dword; para2:dword; para3:dword; para4:dword );
@nostackalign; @nodisplay; @stdcall;
begin test47;
mov( par1, eax );
mov( para2, ecx );
mov( para3, edx );
mov( para4, ebx );
end test47;
becomes this after compiling (due to some MASM internal macros, which sets up the stack frame)
test47:
push ebp ; store value of ebp on stack
mov ebp,esp ; copy value of esp to ebp
mov eax,[ebp+08h] ; original value of ebp stored at [ebp+04h], DWORD PTR[ebp+08h] = par1
mov ecx,[ebp+0Ch] ; DWORD PTR [ebp+0Ch] = par2
mov edx,[ebp+10h] ; DWORD PTR [ebp+10h] = par3
mov ebx,[ebp+14h] ; DWORD PTR [ebp+14h] = par4
leave
ret 10h ; sizeof parameters * number of parameters = 4*4
; Actual MASM code output from the HLA compiler:
1_test47__hla_ proc near32
push ebp
mov ebp, esp
mov eax, dword ptr [ebp+8] ;/* par1 */
mov ecx, dword ptr [ebp+12] ;/* para2 */
mov edx, dword ptr [ebp+16] ;/* para3 */
mov ebx, dword ptr [ebp+20] ;/* para4 */
xL1_test47__hla___hla_:
leave
ret 16
L1_test47__hla_ endp
The code "push ebp" and "mov ebp,esp" creates a stack frame. The instruction "leave" removes the stack frame by esp and ebp back to their condition before the stack frame is initialized (mov esp,ebp pop ebp). There is opcode that does the opposite of leave, but is not used as it is slow. The "ret 10h" tells the processor to transfers control from the procedure back to the instruction address saved on the stack (surprise, surprise the stack is used to store the initial value of the instruction pointer when "calling" a function. The address of the function is loaded to eip and code continues with excution according to eip). "ret 10h" (something like add esp, 16 - thereby removing function parameters from the stack)is because of the STDCALL calling convention, while C calling convention would only do "ret" and leave it to the caller to adjust esp. One may ask why the first parameter is stored in DWORD PTR[ebp+08h] and not DWORD PTR[ebp+04h]. This is due to the fact that ebp is pushed onto the stack, thus DWORD PTR[ebp+04h] contains the original value of ebp. Parameters could be accessed via DWORD PTR [ebp+4*positionofparameter]
The above code shows how a stack frame is created and how ebp is used to access the parameters passed to the functions. The following code (MASM & HLA) would show how ebp can be used to access local variables (Local variables are acutally data stored on the stack).
test124 proc par1:DWORD, para2, para3, para4
LOCAL buffer[32]:BYTE
LOCAL dd1:DWORD
LOCAL dd2:DWORD
mov eax,dd1 ; Spare me the crap, this is just an example.
mov dd2,eax
lea eax,buffer
ret
test124 endp
// HLA example:
procedure test124( par1:dword; para2:dword; para3:dword; para4:dword );
@nostackalign;
@leave;
@nodisplay;
@stdcall;
var
buffer: byte[32];
dd1: dword;
dd2: dword;
begin test124;
mov( dd1, eax );
mov( eax, dd2 );
lea( eax, buffer );
end test124;
becomes this after compiling (due to some MASM internal macros, which sets up the stack frame)
push ebp
mov ebp,esp
add esp, -28h ; reserve stack space for local variables
mov eax,[ebp-24h] ; DWORD PTR ~[ebp-24h] = dd1
mov [ebp-28h],eax ; DWORD PTR ~[ebp-28h] = dd2
lea eax, [ebp-20h]; [ebp-20h] = address of first byte in the array
leave
ret 10h ; sizeof parameters * number of parameters = 4*4
; Actual HLA compiler output:
L2_test124__hla_ proc near32
push ebp
mov ebp, esp
sub esp, 40
mov eax, dword ptr [ebp-36] ;/* dd1 */
mov dword ptr [ebp-40], eax ;/* dd2 */
lea eax, byte ptr ~[ebp-32] ;/* buffer */
xL2_test124__hla___hla_:
leave
ret 16
L2_test124__hla_ endp
Okay, so the code is almost similar to the above code, creating a stack frame. The instruction "add esp,-28h" (sub esp, 40 in the HLA output) might seem weird, but it has its purpose. It is to ensure the values stored in local variables are not corrupted any data when something is pushed onto the stack. (Hopefully I do make some sense.) However I cannot comprehend why MASM produce "add esp,-28h" instead of "sub esp,28h". Maybe it is due to some macro defined deep into MASM. Local variables differ from parameters in the fact that they are accessed by negative displacement (Remember the fact that when you push something, the value of esp would decrease). I think it would be easier to understand how to access local variables by examining how to calculate the displacement needed to access a certain local variable (by looking at the above example) than my explanation.
Some code gurus definitely cares about how big the code size and how fast their code runs. To optimise their code, they might even not have a stack frame in their functions (Yes, it is possible and I would show you how). Removing stack frame can shave off some clocks and some bytes (push ebp = 1byte, mov ebp,esp = 2 byte, leave = 1 byte, total bytes saved = 4). When stack frame is removed, remember that pushing data would cause a change in the value of esp. You need to manually adjust the offsets from esp. Also, if you don't have a stack frame you don't want 'leave'. The following codes are ways to create functions without stack frame.
According to [user]f0dder[/user], "Usually, the reason for not using a stack frame is either that you don't need it, or that you want to use EBP as a general purpose register - not so much to save the push ebp . Push/pop ebp would still be needed if you want to use EBP as a general purpose register, since you have to return it in it's original state". He further states that "If you don't use a stack frame, you cannot use local variables (in the automated masm way), nor can you access function parameters the usual way - you have to handcode (well, unless somebody has macros) all ESP references, and remember to further adjust these if you do push/pop".
call function1
...
function1:
nop ; to represent whatever code present
ret 4*numberofparameter
or
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
function2 proc par1:DWORD,par2,par3,par4
nop ; to represent whatever code present
ret 4*4
function2 endp
OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF
or
function3 proc
par1 equ <esp+4>
par2 equ <esp+8>
par3 equ <esp+12>
par4 equ <esp+16>
nop ; to represent whatever code present
ret 4*4
function3 endp
In HLA, you can tell the compiler to skip the generation of the stack frame by using the @noframe procedure option and the @basereg and @parmoffset compile-time variables. Note that when using this option you cannot use the @stdcall scheme, so the parameters will be passed on the stack in the opposite order.
?@basereg := esp; // Tell HLA to use ESP as the base register.
?@parmoffset := 4; // Parameters start at offset 4 since we're not pushing EBP anymore.
procedure nostk( par1:dword; para2:dword; para3:dword; para4:dword );
@nodisplay; // Don't automatically generate code
@noframe; // for the stack frame.
// Note: no @stdcall option!
begin nostk;
mov( par1, eax );
mov( para2, ecx );
mov( para3, edx );
mov( para4, ebx );
ret( _parms_ ); // "_parms_" is a constant HLA creates that specifies
// the number of bytes of parameters.
end nostk;
?@basereg := ebp; // Now switch back to EBP as the base register.
Here's the code the HLA compiler emits:
L3_nostk__hla_ proc near32
mov eax, dword ptr [esp+16] ;/* par1 */
mov ecx, dword ptr [esp+12] ;/* para2 */
mov edx, dword ptr [esp+8] ;/* para3 */
mov ebx, dword ptr [esp+4] ;/* para4 */
ret 16
xL3_nostk__hla___hla_:
L3_nostk__hla_ endp
If you really need to use the stdcall calling sequence with HLA without building a stack frame, you can use equates, just as you do with MASM, e.g.,
procedure nostk2( _par1:dword; _para2:dword; _para3:dword; _para4:dword );
@nodisplay;
@noframe;
const
par1 :text := "(type dword [esp+4])";
para2 :text := "(type dword [esp+8])";
para3 :text := "(type dword [esp+12])";
para4 :text := "(type dword [esp+16])";
begin nostk2;
mov( par1, eax );
mov( para2, ecx );
mov( para3, edx );
mov( para4, ebx );
ret( _parms_ );
end nostk2;
-- Output from HLA compiler:
L4_nostk2__hla_ proc near32
mov eax, dword ptr [esp+4] ;/* (type dword [esp+4]) */
mov ecx, dword ptr [esp+8] ;/* (type dword [esp+8]) */
mov edx, dword ptr [esp+12] ;/* (type dword [esp+12]) */
mov ebx, dword ptr [esp+16] ;/* (type dword [esp+16]) */
ret 16
xL4_nostk2__hla___hla_:
L4_nostk2__hla_ endp
Further notes on the stack
For one that seriously did mess around with the stack, he/she will realise that somehow windows will mysteriously terminate the program or perhaps having a weird GPF(general protection fault), all these depending on the variants of Windows. So what is the limiting fact, one would ask. Let me tell you, the stack is bounded by 2 things, namely the lower stack boundary and the upper stack boundary. The lower stack boundary is located at fs:[8] and the upper stack boundary at fs:[4]. fs:[4] and fs:[8] will make sure that when you enter some part of the kernel(that checks ESP against fs:[4] and fs:[8]) the OS won't kill your program.
For example
mov dword ptr fs:[4], ffffffffh mov dword ptr fs:[8], 0The above example allows the stack to be located on any memory location as long as the memory is committed and accessible. In short you can create your stack. I would strongly suggest that the values in fs:[4] and fs:[8] is restored on the exit of you program. It is only a few instruction and hopefully it does ensure compatibility across the different OS.