In "Assembly Language Step-by-Step", Jeff Duntemann makes a big fuss about the C calling convention and how, when programming for Linux, a program cannot modify the EBX, ESP, EBP, ESI and EDI registers. I'm interested in Linux but also BSD and OS X. Looking around the web, I cannot find anything that specifically and definitively confirms what Duntemann writes. It seems to me an operating system that depends on application programmers following a convention would be an operating system that crashes frequently. ;-)
You can modify these registers without worry if you are just writing a simple program. It's considered good practice, when you write a procedure, to save/restore these registers so that the calling procedures don't need to worry about you trashing registers they might depend on. When programming in assembly you are generally making calls to the syscall interrupt, which uses most of those registers as arguments (on 32-bit Linux, int 0x80 takes the call number in EAX and its arguments in EBX, ECX, EDX, ESI, EDI and EBP).
The C calling convention does not clean up the stack for you (the caller removes its own arguments), but it does specify that those registers will not be trashed when a procedure is called. EAX, ECX and EDX, however, are almost guaranteed to be trashed, and you should not expect them to be the same after a C function call. So when writing procedures which interface with C programs you should always save EBX, ESI, EDI and EBP if you use them, otherwise your C program might break.
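Here is a minimal sketch of what that means on the caller's side. It assumes 32-bit Linux, NASM's elf32 output, and linking against the C library through something like gcc -m32 (on OS X the symbols would need a leading underscore). The loop counter lives in EBX because printf is free to overwrite EAX, ECX and EDX. The label names and the format string are only for illustration.

```nasm
; Sketch only: count down from 3, calling printf each time.
extern printf
global main

section .data
fmt     db "count = %d", 10, 0

section .text
main:
        push    ebx             ; EBX is callee-saved, so main must preserve it too
        mov     ebx, 3          ; counter kept in a register printf will not trash
.loop:
        push    ebx             ; second argument: the counter
        push    fmt             ; first argument: address of the format string
        call    printf          ; may destroy EAX, ECX and EDX
        add     esp, 8          ; cdecl: the caller removes the two arguments
        dec     ebx
        jnz     .loop
        pop     ebx             ; restore the caller's EBX
        xor     eax, eax        ; return 0
        ret
```

Had the counter been kept in ECX instead, nothing would guarantee that it survives the call to printf.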
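Duntemann's discussion results in the following boilerplate code for NASM:
main:
    push ebp        ; save the caller's frame pointer
    mov ebp, esp    ; set up our own stack frame
    push ebx        ; save the registers the C convention
    push esi        ; expects us to preserve
    push edi
    ; guts of program here
    pop edi         ; restore in reverse order
    pop esi
    pop ebx
    mov esp, ebp    ; tear down the stack frame
    pop ebp
    ret             ; return to the C startup code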
Keep in mind the idea of optimization. Instead of messing around with boilerplate routines, just remember that if you use one of these registers in your procedure, you save it before and restore it after. So if you aren't using EDI, ESI, EBX or whatever, then don't worry about saving it. He's just trying to instill the standards in you from the get-go. It's actually a good thing to do: when you get to doing things like mixed C/ASM projects and end up calling a procedure written in assembly from a C program, the compiler might be using EDI to reference a certain value. If your ASM routine then uses EDI without restoring it, your C program's EDI reference is no longer valid when the routine returns.
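As a sketch of the "save only what you use" idea, here is a hypothetical routine that a C program could declare as int sum_array(const int *values, int count). The name and signature are made up for illustration, and 32-bit cdecl on Linux is assumed. It uses EBX, so it saves EBX. It never touches ESI or EDI, so it does not bother with them.

```nasm
; Sketch only: int sum_array(const int *values, int count), cdecl, 32-bit.
global sum_array

section .text
sum_array:
        push    ebp
        mov     ebp, esp
        push    ebx             ; the only callee-saved register used below

        mov     ebx, [ebp+8]    ; first argument: pointer to the array
        mov     ecx, [ebp+12]   ; second argument: element count
        xor     eax, eax        ; running total, returned in EAX
        test    ecx, ecx
        jle     .done           ; nothing to add for count <= 0
.next:
        add     eax, [ebx]      ; add the current element
        add     ebx, 4          ; advance to the next 32-bit element
        dec     ecx
        jnz     .next
.done:
        pop     ebx             ; restore the caller's EBX
        pop     ebp
        ret                     ; cdecl: the caller removes the arguments
```

ECX and EAX are used freely here because the C caller already assumes they may change across the call.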
Another thing: this is pretty much true of whatever calling convention you are using. StdCall and the rest also expect you to save/restore these registers. The overall idea is to minimize the size of your programs by doing the saving/restoring of used registers once, inside the procedure. Otherwise every programmer who uses the procedure would have to know which registers it uses, push them before each call and pop them after it returns, or take the slightly cheaper approach and put pushad/popad before and after each call.
Does anyone have a definitive reference regarding these registers being preserved for Linux, OS X, and/or FreeBSD? I can't find anything, and there is nothing mentioned in the FreeBSD Developer's Handbook, which has a large-ish Assembly section.
The FreeBSD handbook's assembly section doesn't really cover much, and most of the calls it makes go straight to the kernel through int 0x80, in which case registers will get trashed all the time. Check out the Agner docs which were posted by drizz; they are a much better resource for learning calling conventions.
Also it seems "ret" is not sufficient to exit a program from OS X. It is necessary to have
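push dword 0 ; exit status
mov eax, 1 ; system call number for exit goes in EAX
sub esp, 4 ; 4 bytes of padding expected by the kernel
int 0x80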
Thanks,
Peter
That's right. FreeBSD and OS X both require 4 bytes of padding on the stack in front of the arguments to int 0x80 calls; the system call number itself goes in EAX. A common method on FreeBSD is to call a procedure containing nothing but the int 0x80 and a ret. The call pushes the return address (EIP) onto the stack, where it acts as the padding, and the ret uses it afterwards to return to the original caller.
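section .data
msg db "Hello World!"
msg_len dd $-msg
section .text
bsd_kernel:
int 0x80 ; return address pushed by `call' acts as the 4-byte padding
ret
hello_world:
push dword [msg_len] ; length of message
push dword msg ; address of message
push dword 1 ; write to standard output
mov eax, 4 ; kernel function `write' (number goes in EAX)
call bsd_kernel ; send the interrupt
add esp, 12 ; remove the three arguments
push dword 0 ; error code for application
mov eax, 1 ; kernel function `exit'
call bsd_kernel ; send the interrupt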
On a lot of systems you can simply call `ret' but that's because your program's procedure is being called from a startup routine which the C compiler you link it with has put in place. When doing assembly and linking without the C compiler (or using the -nostartfiles option) your procedure must make the call to the system `exit' routine itself instead of using ret to return to the startup routine provided by your compiler.
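For comparison, here is a minimal sketch of a program with no C startup code at all, assuming 32-bit Linux, where int 0x80 takes its arguments in registers rather than on the stack. The entry point is _start, nothing has called it, and so there is nothing to `ret' to; the program has to call `exit' itself. It could be built with something like nasm -f elf32 followed by ld -m elf_i386.

```nasm
; Sketch only: 32-bit Linux, linked without the C runtime.
global _start

section .data
msg     db "Hello World!", 10
msg_len equ $ - msg             ; length computed at assembly time

section .text
_start:
        mov     eax, 4          ; kernel function `write'
        mov     ebx, 1          ; file descriptor 1: standard output
        mov     ecx, msg        ; address of message
        mov     edx, msg_len    ; length of message
        int     0x80

        mov     eax, 1          ; kernel function `exit'
        xor     ebx, ebx        ; exit status 0
        int     0x80            ; does not return
```

If the same code were entered as main from the compiler's startup routine, a plain `ret' after the write would be enough.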
If you want to be completely sure, Peter, you can install both of these systems and test it yourself with a debugger (like gdb).
Call a few APIs consecutively and watch the registers in the debugger.
I do agree that if you are building applications for a system you should do so on that system. But setting up multi-boot or anything like that is not needed. Most Linux and FreeBSD distributions can be installed into .bin files and run through QEMU with very little overhead. I have a Linux distro installed into a flat binary file on a USB disk, with QEMU set up to auto-run when the drive is detected. All I have to do is plug the drive into a USB port and it'll open the OS up in full screen (and I use Ctrl+Alt to switch between my Windows OS and the Linux OS). It's actually pretty convenient when paired with "persistent mode" distributions.
Side Note: I do now have a separate netbook with Linux installed on it, but I still use the USB version for doing cross platform development so that I don't have to switch computers.
I also don't believe in the idea of using debuggers and disassemblers to learn assembly. I have admitted in the past that this was how I learned, but honestly, have you seen my code? Learning that way teaches horrible coding practices which I myself have trouble avoiding, and it is the reason many people call my source files `unreadable' and `obfuscated'. Learn programming from manuals, books, and tutorials. They have a lot more to offer than debugging and disassembling.
And by the way, the C convention is very good for 32-bit systems. IMO it's better than stdcall, since you can group the stack cleanup for several calls into a single add esp, 4*x.
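A small sketch of that grouped cleanup, assuming 32-bit Linux and a main that is entered from the C startup code. The helper add2 is a made-up cdecl routine used only for illustration. The arguments of both calls are left on the stack and removed with one add at the end.

```nasm
; Sketch only: two cdecl calls, one stack cleanup.
global main

section .text
add2:                           ; hypothetical: int add2(int a, int b), cdecl
        mov     eax, [esp+4]    ; a
        add     eax, [esp+8]    ; a + b
        ret                     ; cdecl: arguments stay on the stack
                                ; (a stdcall version would end with `ret 8')

main:
        push    2
        push    1
        call    add2            ; eax = 1 + 2 = 3
        push    4
        push    eax
        call    add2            ; eax = 3 + 4 = 7
        add     esp, 4*4        ; one cleanup for all four pushed dwords
        ret                     ; returns 7 to the startup code
```

The catch is that the stack keeps growing until the cleanup, so this only pays off for short runs of calls.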
Personally I like stdcall. I don't like having to clean up the stack after every procedure call. It's okay for things like printf, where variable-length argument lists are required, but since those routines are few and far between, stdcall just seems more practical.