In my MSDN Library I found the following:

When using __asm to write assembly language in C/C++ functions, you don't need to preserve the EAX, EBX, ECX, EDX, ESI, or EDI registers. However, using these registers will affect code quality because the register allocator cannot use them to store values across __asm blocks. In addition, by using EBX, ESI or EDI in inline assembly code, you force the compiler to save and restore those registers in the function prologue and epilogue.

You should preserve other registers you use (such as DS, SS, SP, BP, and flags registers) for the scope of the __asm block. You should preserve the ESP and EBP registers unless you have some reason to change them (to switch stacks, for example).

Does this mean that if I preserve the registers and flags and restore them before leaving the __asm block, the compiler will still apply its optimizations? Or are some of these drawbacks still present?

Posted on 2003-10-28 08:28:39 by yaa
It means that the compiler can't keep a value in a register across an asm block if that register is used inside the block. Instead it has to save the value on the stack and reload it afterwards when needed, so the generated code will be slower than if registers were used (if it needs to keep a value at all, of course).

Actually msvc always saves/restores ebx/esi/edi if an asm block is present inside a function (even if these registers aren't used inside the block), at least msvc6 does.

Posted on 2003-10-28 11:24:30 by hitchhikr
Originally posted by hitchhikr
Actually msvc always saves/restores ebx/esi/edi if an asm block is present inside a function (even if these registers aren't used inside the block), at least msvc6 does.

No, it doesn't; maybe only in debug mode:

int bla(int j)
{
int k;
__asm {
mov eax, j
shl eax, 1
mov k, eax
}
return k;
}

00401000 55 push ebp
00401001 8B EC mov ebp,esp
00401003 51 push ecx
00401004 8B 45 08 mov eax,dword ptr [ebp+8]
00401007 D1 E0 shl eax,1
00401009 89 45 FC mov dword ptr [ebp-4],eax
0040100C 8B 45 FC mov eax,dword ptr [ebp-4]
0040100F 8B E5 mov esp,ebp
00401011 5D pop ebp
00401012 C3 ret

As you can see, the places where asm and C meet are not implemented very efficiently. It's best to keep asm functions external and call them from C instead of mixing both languages directly. VC.NET does a bit better though:

00401000 51 push ecx
00401001 8B 44 24 08 mov eax,dword ptr [esp+8]
00401005 D1 E0 shl eax,1
00401007 89 04 24 mov dword ptr [esp],eax
0040100A 8B 04 24 mov eax,dword ptr [esp]
0040100D 59 pop ecx
0040100E C3 ret
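
For comparison, here is a minimal sketch of the same computation in plain C, with no __asm block. The name bla_plain_c is made up for this illustration. Because nothing in the body is opaque to the compiler, an optimizing build is free to keep the value in a register and skip the store and reload of k seen in the listings above. The actual output depends on the compiler version and options.

```c
/* Hypothetical plain-C counterpart of bla(); the name is illustrative only. */
int bla_plain_c(int j)
{
    /* Same result as "shl eax, 1" for values that do not overflow.
       The compiler chooses the instruction (shift, add, lea, ...) itself. */
    return j * 2;
}
```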

Posted on 2003-10-28 13:01:15 by Thomas
Mine does:

void mylousyfunction(void) {
int i;
i = 4;
__asm {
mov eax,i
inc eax
}
}

00401000 push ebp
00401001 mov ebp,esp
00401003 push ecx
00401004 push ebx
00401005 push esi
00401006 push edi
00401007 mov dword ptr [ebp-4],4
0040100E mov eax,dword ptr [ebp-4]
00401011 inc eax
00401012 pop edi
00401013 pop esi
00401014 pop ebx
00401015 mov esp,ebp
00401017 pop ebp
00401018 ret

But then maybe it's a matter of compiler options, compiler version (I'm using VC Standard), service pack, or something else...

Posted on 2003-10-28 13:29:59 by hitchhikr
Are you sure you're compiling in release mode?

Posted on 2003-10-28 14:10:19 by Thomas
Debug version:

00401020 push ebp
00401021 mov ebp,esp
00401023 sub esp,44h
00401026 push ebx
00401027 push esi
00401028 push edi
00401029 lea edi,[ebp-44h]
0040102C mov ecx,11h
00401031 mov eax,0CCCCCCCCh
00401036 rep stos dword ptr [edi]
7: int i;
8: i=4;
00401038 mov dword ptr [ebp-4],4
9: _asm {
10: mov eax,i
0040103F mov eax,dword ptr [ebp-4]
11: inc eax
00401042 inc eax
12: }
13: }
00401043 pop edi
00401044 pop esi
00401045 pop ebx
00401046 add esp,44h
00401049 cmp ebp,esp
0040104B call __chkesp (004010b0)
00401050 mov esp,ebp
00401052 pop ebp
00401053 ret

Posted on 2003-10-28 14:14:54 by hitchhikr
The Standard edition lacks the optimization levels needed to do that.
Posted on 2003-10-29 18:58:28 by gliptic
The recommendation that Thomas made is the most efficient in terms of code output. An optimising compiler needs to be able to work on the whole section of code, and when you embed inline asm into C code you interfere with that optimisation.

If you write separate modules in assembler, you can concentrate on minimising stack overhead and use the available registers in any way you require, once you understand which registers you need to preserve and which you do not.

Effectively, you optimise the assembler code you write and let the compiler optimise the C code, and this way you will probably get the best output.
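
As a rough sketch of what the C side of that split could look like: the function names and the HAVE_ASM_MODULE switch below are made up for illustration. The C file only declares the external routine and calls it, so it contains no __asm block. A portable stand-in is included so the sketch builds without an assembler.

```c
/* Hypothetical layout: shift_left_one() would live in its own .asm module. */
#ifdef HAVE_ASM_MODULE
/* Assembled separately and linked in. Under the usual 32-bit __cdecl
   convention the routine returns its result in EAX and must preserve
   EBX, ESI, EDI and EBP. */
extern int shift_left_one(int value);
#else
/* Portable stand-in so this sketch compiles without the asm module. */
int shift_left_one(int value)
{
    return value * 2;
}
#endif

/* Plain C caller: the compiler can allocate registers here as it likes. */
int scale_and_offset(int value, int offset)
{
    return shift_left_one(value) + offset;
}
```

The call itself has a cost, so this layout presumably pays off mainly when the assembler routine does enough work per call to outweigh it.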

Posted on 2003-10-29 19:29:47 by hutch--