In "Assembly Language Step-by-Step", Jeff Duntemann makes a big fuss about the C calling convention and how, when programming for Linux, a program cannot modify the EBX, ESP, EBP, ESI and EDI registers. I'm interested in Linux but also BSD and OS X. Looking around the web, I cannot find any information that specifically and definitively confirms what Duntemann writes. It seems to me an operating system that depends on application programmers following a convention would be an operating system that crashes frequently. ;-)

Duntemann's discussion results in the following boilerplate code for NASM

main:
  push ebp
  mov ebp, esp
  push ebx
  push esi
  push edi

  ; guts of program here

  pop edi
  pop esi
  pop ebx
  mov esp,ebp
  pop ebp
  ret


Does anyone have a definitive reference regarding these registers being preserved for Linux, OS X, and/or FreeBSD? I can't find anything, and there is nothing mentioned in the FreeBSD Developer's Handbook, which has a large-ish Assembly section.

Also it seems "ret" is not sufficient to exit a program from OS X. It is necessary to have


  push 1
  sub esp, 4
  int 0x80


Thanks,
Peter
Posted on 2009-08-08 19:46:31 by petermichaux
This will help you:
http://www.agner.org/optimize/#manuals
Direct link: http://www.agner.org/optimize/calling_conventions.pdf
Posted on 2009-08-08 21:10:38 by drizz
If you want to be completely sure Peter, you can install both of these systems and test it yourself with a debugger (like gdb).
Call a few APIs consecutively and watch the registers in the debugger.

And by the way, the C convention is very good for 32 bit systems. IMO it's better than stdcall since you can regroup stack cleanup with add esp, 4*x
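To illustrate the grouped cleanup, here is a minimal sketch. It assumes 32-bit Linux, NASM, and linking against libc through gcc; the labels `first` and `second` and the build command are only for illustration. It also ignores the 16-byte stack alignment that OS X and newer GCC versions expect at a call.

```nasm
; Assumed build: nasm -f elf32 demo.asm && gcc -m32 -no-pie demo.o -o demo
global main
extern puts

section .data
first   db "first call", 0
second  db "second call", 0

section .text
main:
    push ebp
    mov  ebp, esp

    push dword first
    call puts               ; cdecl: the callee leaves the argument on the stack
    push dword second
    call puts
    add  esp, 4*2           ; one cleanup for both pushed arguments

    xor  eax, eax           ; return 0 from main
    mov  esp, ebp
    pop  ebp
    ret
```

With stdcall each callee would remove its own arguments (ret 4), so there would be nothing to group.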
Posted on 2009-08-08 21:29:11 by ChaperonNoir

In "Assembly Language Step-by-Step", Jeff Duntemann makes a big fuss about the C calling convention and how, when programming for Linux, a program cannot modify the EBX, ESP, EBP, ESI and EDI registers. I'm interested in Linux but also BSD and OS X. Looking around the web, I cannot find any information that specifically and definitively confirms what Duntemann writes. It seems to me an operating system that depends on application programmers following a convention would be an operating system that crashes frequently. ;-)


You can modify these registers without worry if you are just writing a simple program. It's considered good practice that when you write a procedure you save and restore these registers, so that calling procedures don't need to worry about you trashing registers they might depend on. When programming in assembly you are generally making calls to the syscall interrupt, which uses most of those registers as arguments.

The C calling convention does not clean up the stack for you, but it does specify that those registers will not be trashed when a procedure is called. EAX, ECX and EDX, however, are almost guaranteed to be trashed, and you should not expect them to be the same after a C function call. So when writing procedures which interface with C programs, you should always save and restore EBX, ESI, EDI and EBP if you use them; otherwise your C program might break.


Duntemann's discussion results in the following boilerplate code for NASM

main:
  push ebp
  mov ebp, esp
  push ebx
  push esi
  push edi

  ; guts of program here

  pop edi
  pop esi
  pop ebx
  mov esp,ebp
  pop ebp
  ret



Keep in mind the idea of optimization. Instead of messing around with boilerplate routines, just remember that if you use a callee-saved register in your procedure, you save it before and restore it after. So if you aren't using EDI, ESI, EBX or whatever, then don't worry about saving it. He's just trying to instill the standards in you from the get-go. It's actually a good thing to do, because when you get to things like mixed C/ASM projects and call a procedure written in assembly from a C program, the compiler might be using EDI to hold a certain value. If your asm routine overwrites EDI without restoring it, your C program's EDI value is no longer valid when the routine returns.

Another thing is, this is pretty much true of whatever calling convention you are using. StdCall and the rest also expect you to save and restore those registers. The overall idea is to minimize the size of your programs by centralizing the saving and restoring of used registers in the procedure itself. Otherwise the programmer who uses the procedure has to know which registers it uses, push them before every call and pop them after it returns, or take the slightly cheaper approach of using pushad/popad before and after each call.
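As a rough sketch of "save only what you use", here is a hypothetical cdecl function, sum_array(const int *values, int count), that touches EBX and nothing else from the preserved set. The function name and signature are made up for illustration.

```nasm
; int sum_array(const int *values, int count)   (hypothetical, cdecl)
global sum_array

section .text
sum_array:
    push ebx                ; EBX is callee-saved and used below, so preserve it
    mov  ebx, [esp+8]       ; values: skip saved EBX (+4) and return address (+4)
    mov  ecx, [esp+12]      ; count
    xor  eax, eax           ; running total, returned in EAX
    test ecx, ecx
    jle  .done              ; nothing to add for count <= 0
.next:
    add  eax, [ebx]         ; add the current element
    add  ebx, 4             ; advance to the next 32-bit int
    dec  ecx
    jnz  .next
.done:
    pop  ebx                ; restore the caller's EBX
    ret                     ; cdecl: the caller removes the arguments
```

ESI, EDI and EBP are never touched, so they are not saved. EAX and ECX may be trashed freely because the caller does not expect them to survive the call.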


Does anyone have a definitive reference regarding these registers being preserved for Linux, OS X, and/or FreeBSD? I can't find anything, and there is nothing mentioned in the FreeBSD Developer's Handbook, which has a large-ish Assembly section.


The FreeBSD handbook's assembly section doesn't really cover much, and most of the calls it makes are system calls through the int 0x80 interrupt, in which case registers will get trashed all the time. Check out the Agner docs which were posted by drizz; they are a much better resource for learning calling conventions.


Also it seems "ret" is not sufficient to exit a program from OS X. It is necessary to have


  push 1
  sub esp, 4
  int 0x80


Thanks,
Peter


That's right. FreeBSD and OS X both expect 4 bytes of padding on the stack between the top of the stack and the arguments to an int 0x80 system call, with the system call number in EAX. A common method on FreeBSD is to call a procedure containing nothing but the int 0x80 instruction: the call pushes EIP onto the stack, which acts as the padding, and the ret uses it afterwards to return to the original caller.

msg     db "Hello World!"
msg_len equ $-msg           ; assemble-time constant, not a memory variable

bsd_kernel:
  int 0x80                  ; return address pushed by `call' is the padding
  ret

hello_world:
  push dword msg_len        ; length of message
  push dword msg            ; address of message
  push dword 1              ; write to standard output
  mov eax, 4                ; kernel function `write'
  call bsd_kernel           ; send the interrupt
  add esp, 12               ; caller removes the three arguments

  push dword 0              ; error code for application
  mov eax, 1                ; kernel function `exit'
  call bsd_kernel           ; send the interrupt (does not return)


On a lot of systems you can simply call `ret' but that's because your program's procedure is being called from a startup routine which the C compiler you link it with has put in place. When doing assembly and linking without the C compiler (or using the -nostartfiles option) your procedure must make the call to the system `exit' routine itself instead of using ret to return to the startup routine provided by your compiler.
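As a rough sketch of that second case, assuming 32-bit Linux rather than FreeBSD or OS X, a program linked without the startup files has to exit through the kernel itself. On Linux the system call number goes in EAX and the first argument in EBX, with no stack padding. The file name and build command are only for illustration.

```nasm
; Assumed build: nasm -f elf32 exit.asm && ld -m elf_i386 -o exit exit.o
global _start

section .text
_start:                     ; entry point when no C startup code is linked in
    mov eax, 1              ; Linux system call number for `exit'
    xor ebx, ebx            ; exit status 0
    int 0x80                ; does not return
```

There is nothing on the stack for ret to return to here, which is why the explicit exit call is needed.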


If you want to be completely sure Peter, you can install both of these systems and test it yourself with a debugger (like gdb).
Call a few APIs consecutively and watch the registers in the debugger.


I do agree that if you are building applications for a system you should do so on that system. But setting up multi-boot or anything like that is not needed. Most Linux and FreeBSD distributions can be installed into .bin files and run through QEMU with very little overhead. I have a Linux distro installed into a flat binary file on a USB disk, with QEMU set up to auto-run when the drive is detected. All I have to do is plug the drive into a USB port and it'll open the OS in full screen (and I use Ctrl+Alt to switch between my Windows OS and the Linux OS). It's actually pretty convenient when paired with "persistent mode" distributions.

Side Note: I do now have a separate netbook with Linux installed on it, but I still use the USB version for doing cross platform development so that I don't have to switch computers.

I also don't believe in the idea of using debuggers and disassemblers to learn assembly. I have admitted in the past that was how I learned, but honestly have you seen my code? It teaches horrible coding practices which I myself have trouble avoiding and is the reason many people call my source files `unreadable' and `obfuscated'. Learn programming from manuals, books, and tutorials. They have a lot more to offer than debugging and disassembling.


And by the way, the C convention is very good for 32 bit systems. IMO it's better than stdcall since you can regroup stack cleanup with add esp, 4*x


Personally I like stdcall. I don't like having to clean up the stack after every procedure call. It's okay for things like printf where variable length arguments are required, but since those routines are few and far between, stdcall just seems more practical.
Posted on 2009-08-09 00:02:39 by Synfire

The C Calling convention does not clean up the stack for you but it does specify that registers will not be trashed when a procedure is called. Ecx and Eax however are almost guaranteed to be trashed and you should not expect them to be the same after a C function call. So when writing procedures which interface with C programs you should always save those registers otherwise your C program might break.


That is the general gist that I understood. The uncomfortable part is this C Calling Convention seems to be mostly floating around in the ether. Sometimes I see ESP as one of the "sacred" registers. Sometimes it is not listed.



Check out the Agner docs which were posted by drizz, it is a much better resource for learning calling conventions.


Even those docs don't make me feel like I'm reading something official. It is still all a little folklore-ish.



On a lot of systems you can simply call `ret' but that's because your program's procedure is being called from a startup routine which the C compiler you link it with has put in place. When doing assembly and linking without the C compiler (or using the -nostartfiles option) your procedure must make the call to the system `exit' routine itself instead of using ret to return to the startup routine provided by your compiler.


Thanks. I think this connects some information I read in Advanced Programming in the UNIX Environment with this exit business. I think the startup routine you are describing lives in libc, correct?

Peter
Posted on 2009-08-09 00:34:47 by petermichaux
Oh Peter, you can be sure Agner Fog knows what he's talking about.
Did you also see this page :
http://en.wikipedia.org/wiki/X86_calling_conventions

Synfire: Of course, when I said installing, I really meant using virtual machines haha. I never want to have to touch anything GRUB-related again :P
I've heard a lot of good things about QEMU / KVM. Here I just use VMware but that might change very soon!
A friend gave me the following commands when using qemu kvm.

qemu-kvm -m 512 -cdrom atrolinux.iso
qemu-kvm -m 512 -no-acpi  -cdrom winxp.iso -hda winxp.img -boot d
dd if=/dev/sr0 of=winxp.iso
qemu-img create -f qcow winxp.img 6G
qemu-kvm -m 512  -hda winxp.img -cdrom /dev/sr0 -localtime
Posted on 2009-08-09 16:27:04 by ChaperonNoir
It seems to me an operating system that depends on application programmers following a convention would be an operating system that crashes frequently. ;-)

If an app messes up the stack, that app is terminated. This is true for almost every OS I've seen. I don't know why the OS should be the one to crash? ^^ Maybe in DOS/win9x times you could BSOD an OS with shitty code, but those times are long since gone.
Posted on 2009-08-09 17:08:41 by ti_mo_n


Check out the Agner docs which were posted by drizz, it is a much better resource for learning calling conventions.

Even those docs don't make me feel like I'm reading something official. It is still all a little folklore-ish.

You probably won't find much official documentation for something that isn't officially supported/recommended :). Remember that the systems are written by and for C programmers, and are portable across a relatively wide range of systems; writing official assembly documentation for each system, and keeping it up to date, probably seems like a bit too much work when it's expected that almost everybody will be using the official C interfaces.

That said, I do believe the ABI should be officially documented for each supported platform.
Posted on 2009-08-13 09:18:34 by f0dder
Google "Intel ABI" - that's the "official name", AFAIK. I consider it somewhat misleading, since the hardware doesn't care, but if you want to interface with any other language (not just C), you need to do it.

Why would Duntemann lie? I even happen to know how he learned it. When upgrading from the first edition to the second (the Third Edition was recently published), he found an old Linux example that no longer worked. He posted it to a.l.a. (or c.l.a.x. perhaps). Someone, I think it was H. Peter Anvin (maintainer of the x86 port of Linux... plus Nasm), spotted that he hadn't preserved ebx. Apparently his old kernel "didn't care", but the new one did. So if you want your programs to keep running, do it - even if some kernel/library version "doesn't care".

If it's "all your own code", you can do whatever you like - but don't try to "ret" without esp being correct. That one, the hardware does care about!

Best,
Frank

Posted on 2009-11-02 16:25:32 by fbkotler