I'm new to assembly programming, so hopefully the answer here is simple or at least not too difficult!

I am writing an object that is supposed to free itself, including both data AND code, before returning. (The object in question is a COM object on Windows but the fact that it's COM doesn't matter too much.) The code needs to work on both x86 and x64.

The general approach is that the last things the free function should do are:
1) clean up the stack;
2) do not mess with the return address;
3) set up the stack for VirtualFree; and finally
4) jump to the Win32 API call VirtualFree.

VirtualFree will free the data and code for the object, so it is self-deleting. Then it returns directly to the caller. If it returns to my free function, the code will crash because VirtualFree just freed the memory.
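To make the four steps concrete, here is a minimal sketch in C that models the 32-bit stack as an array of dwords and rewrites the frame of a one-argument __stdcall function into the frame VirtualFree would see if the original caller had called it directly. The names ModelStack, model_push, model_pop, model_depth and model_tail_frame are hypothetical and exist only for this illustration; nothing here touches a real stack, and it assumes any locals and saved registers have already been popped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: slots[top] plays the role of the dword at [esp].
   The stack grows toward lower indices, as it does on x86. */
enum { STACK_SLOTS = 16 };

typedef struct {
    uint32_t slots[STACK_SLOTS];
    size_t top;                 /* index of [esp]; STACK_SLOTS means empty */
} ModelStack;

static void model_init(ModelStack *s) { s->top = STACK_SLOTS; }

static void model_push(ModelStack *s, uint32_t v)
{
    assert(s->top > 0);         /* room left on the model stack */
    s->slots[--s->top] = v;
}

static uint32_t model_pop(ModelStack *s)
{
    assert(s->top < STACK_SLOTS);
    return s->slots[s->top++];
}

static size_t model_depth(const ModelStack *s) { return STACK_SLOTS - s->top; }

/* On entry the model stack holds [return address][This].
   On exit it holds [return address][base][0][MEM_RELEASE], which is what
   VirtualFree(base, 0, MEM_RELEASE) expects right after a CALL. */
static void model_tail_frame(ModelStack *s, uint32_t base)
{
    uint32_t ret = model_pop(s);   /* keep the caller's return address */
    (void)model_pop(s);            /* drop This, as "ret 4" would have */
    model_push(s, 0x8000u);        /* dwFreeType = MEM_RELEASE */
    model_push(s, 0);              /* dwSize = 0 */
    model_push(s, base);           /* lpAddress */
    model_push(s, ret);            /* return address goes back on top */
}
```

In this model VirtualFree's own return pops four dwords (the return address plus three arguments), which leaves the stack pointer exactly where "ret 4" from the original function would have left it.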

Here is a sample of some C code with sample assembly for each line:

static ULONG WINAPI SampleObj_Release(IUnknown *This)
77B14C10  push        ebp 
77B14C11  mov        ebp,esp
77B14C13  sub        esp,48h
77B14C16  push        ebx 
77B14C17  push        esi 
77B14C18  push        edi 
SampleObj *p = (SampleObj*)This;
77B14C19  mov        eax,dword ptr
77B14C1C  mov        dword ptr ,eax
ULONG refs = _InterlockedDecrement(&p->dwRefs);
77B14C1F  mov        eax,dword ptr
77B14C22  add        eax,8
77B14C25  or          ecx,0FFFFFFFFh
77B14C28  lock xadd  dword ptr ,ecx
77B14C2C  dec        ecx 
77B14C2D  mov        dword ptr ,ecx
if (refs != 0) return refs;
77B14C30  je          SampleObj_Release+27h (77B14C37h)
77B14C32  mov        eax,dword ptr
77B14C35  jmp        SampleObj_Release+66h (77B14C76h)


// TODO: force a trampoline jump in assembly!
return p->pFixedEntries->pVirtualFree(p->pVtbl, 0, MEM_RELEASE);
77B14C5E  push        8000h
77B14C63  push        0   
77B14C65  mov        eax,dword ptr
77B14C68  mov        ecx,dword ptr
77B14C6A  push        ecx 
77B14C6B  mov        edx,dword ptr
77B14C6E  mov        eax,dword ptr
77B14C71  mov        ecx,dword ptr
77B14C74  call        ecx 
77B14C76  pop        edi 
77B14C77  pop        esi 
77B14C78  pop        ebx 
77B14C79  mov        esp,ebp
77B14C7B  pop        ebp 
77B14C7C  ret        4   

What is the right assembly to do this? I know that in x86, I can get away with using inline assembly in MSVC towards the end. But in x64, I think that I have to use MASM or equivalent to write out the entire function call.

Note that WINAPI is __stdcall, so the arguments in x86 get pushed onto the stack and are freed by the callee, not the caller. On x64, the first four arguments are passed via registers and shadow stack space is allocated. Since the initial function just takes one argument (the this pointer), and VirtualFree takes three arguments, it should work. The return value is not of any consequence.
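As a rough illustration of the x64 case, the sketch below models only the four integer argument registers. In the Microsoft x64 convention the caller reserves the 32 bytes of shadow space no matter how many arguments it passes, so the tail jump needs no stack changes once the function has undone its own frame; it only has to reload RCX, RDX and R8. ModelArgRegs and model_x64_tail_args are hypothetical names used for this illustration, not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the four integer argument registers used by the
   Microsoft x64 calling convention. */
typedef struct {
    uint64_t rcx, rdx, r8, r9;
} ModelArgRegs;

/* on_entry holds the registers as the one-argument function received them
   (rcx = This). The result is what VirtualFree(base, 0, MEM_RELEASE) expects
   to find when it is reached by a JMP instead of a CALL. */
static ModelArgRegs model_x64_tail_args(ModelArgRegs on_entry, uint64_t base)
{
    ModelArgRegs r = on_entry;
    assert(base != 0);          /* the model requires a real allocation base */
    r.rcx = base;               /* lpAddress */
    r.rdx = 0;                  /* dwSize */
    r.r8  = 0x8000u;            /* dwFreeType = MEM_RELEASE */
    return r;                   /* r9 is unused by a three-argument call */
}
```

The return address stays at the top of the stack untouched, so VirtualFree returns straight to the original caller.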

Posted on 2010-10-12 20:55:27 by SeanTek

Who gave you the impression that code and data of a COM object are deallocated together?
Just imagine what would happen if you wanted to create a New Instance!!

Truth is, the code never gets deallocated - it remains in memory at all times, there are no exceptions that I can think of.
The code, together with the Class Template, are embedded in the binary file you compiled, and remain available pending subsequent utilization.
The only thing we are releasing is some (typically heap) memory that represents an instance of our Class Template (i.e., a clone in heap memory).

Therefore, after calling VirtualFree, the CODE is ok to continue executing, as long as it does NOT reference any data fields (since the data did get released).

Now, just to convince you that this is the case, imagine that you were in fact correct about the code being associated with an object instance. This would imply that there's a whole copy of the code for EVERY INSTANCE of the class (which would be silly), and it also implies that the code must be PC-RELATIVE in order to execute at an arbitrary address in memory, rather than optimized for some fixed virtual address. Which do you consider more likely?
Posted on 2010-10-12 21:35:41 by Homer
Ah, but that is the point. In this case, I am explicitly creating an object that does not require its DLL (with the code sections) to be loaded in memory at all times. The code executes after the DLL is loaded, but the DLL can be unloaded while the object (or at least a barebones fraction of the object) remains in memory.

This is a pretty unusual task, but I thought the whole point of going to low-level Assembly is to do unusual things.  :thumbsup:

In this case, yes, there's a whole copy of the code for every instance of the class. But the size is minimal, and far less than the size of a page (4096 bytes) to hold it. It does not have to be that way--one could have a reference count on the code and when the last instance is destroyed, the code is also VirtualFree'd.

The code is all PC-relative. In this case, the code is merely stub code: 99% of the main functionality is written in C/C++, not Assembly. Also, I suppose that this is a more general technique for just-in-time freeing of code--it doesn't have to be COM at all.
Posted on 2010-10-12 21:53:01 by SeanTek
I'm writing the implementation to it.
Posted on 2010-10-12 22:21:03 by SeanTek
Then you answered your own question - rather than "call" the VirtualFree function and return to caller, we JMP to it, and return to OUR CALLER'S CALLER.

At the end of the VirtualFree function, there is a RET.
If we did not CALL the function but instead JMP to it, whose return address is on top of the call stack? We are returning, but not to our DLL code. Instead, we are returning to the code that called our DLL stub function, which led to the VirtualFree function via a JMP. Makes sense?

In fact we can abstract this further, we could deliberately POP THE RETURN ADDRESS from the stack and shove ANY RETURN ADDRESS we wanted to, and when we return we will then return "to somewhere else" !!!

But I leave this to your avid imagination ;)
Yes, things we can do in ASM don't always follow "the rules" since we are the game masters.
Posted on 2010-10-13 05:37:18 by Homer
In order to jump directly to a STDCALL function, you need to effectively remove all stack data associated with the current function, before stacking the arguments for the function you're jumping to. You also need to ensure that the "invariant" registers EBX, ESI, EDI, and EBP have the correct values before jumping to the function.

A simple-minded way to accomplish this is to first stack the arguments, stack a copy of the return address, restore registers, and then use the scratch registers EAX, ECX, and EDX to move the arguments and return address upward to replace the data that must be unstacked. Set ESP, and then jump.
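The "stack it below, then move it upward" idea can be sketched with a small C model of stack memory. This is only an illustration under assumptions: mem is an array of dwords where a higher index means a higher address, the function being left is a one-argument __stdcall function, and model_relocate_frame is a hypothetical name. Restoring EBX, ESI, EDI and EBP is left out, since it only involves reading the saved slots before they are overwritten.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FRAME_DWORDS = 4 };  /* return address + three VirtualFree arguments */

/* mem       : model of stack memory, higher index = higher address
   esp       : index of the current top of stack (locals and saved registers
               sit between esp and entry_esp)
   entry_esp : index that held the return address on function entry;
               mem[entry_esp + 1] is the single stdcall argument (This)
   Returns the index ESP should be set to just before the JMP. */
static size_t model_relocate_frame(uint32_t *mem, size_t esp,
                                   size_t entry_esp, uint32_t base)
{
    uint32_t ret = mem[entry_esp];      /* read before anything is overwritten */
    size_t new_esp;

    assert(esp >= FRAME_DWORDS);        /* room to stack the new frame below */
    assert(entry_esp + 2 >= FRAME_DWORDS);

    /* Stack the arguments and a copy of the return address. */
    mem[--esp] = 0x8000u;               /* MEM_RELEASE */
    mem[--esp] = 0;                     /* size */
    mem[--esp] = base;                  /* memory to free */
    mem[--esp] = ret;                   /* copy of the return address */

    /* The new frame must end where the old argument list ended. */
    new_esp = entry_esp + 2 - FRAME_DWORDS;
    memmove(&mem[new_esp], &mem[esp], FRAME_DWORDS * sizeof mem[0]);
    return new_esp;
}
```

Using memmove keeps the model correct even when the source and destination ranges overlap, which can happen when the function has few locals.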

The default installation of VC uses CDECL as the default for "normal" C++ functions, which means you leave the arguments and replace the return address and local variables with the new arguments and a copy of the old return address. If you set the compiler option to use STDCALL or explicitly use __stdcall or WINAPI, you replace/remove the current argument list in addition to the return address and locals.
Posted on 2010-10-17 02:50:41 by tenkey
Thanks all! Looks like I got it working.

Assume the pointer to VirtualFree is in EAX, and the pointer to the memory to free is in EDX:

    pop      esi
    pop      ecx                    ; now ecx contains the return address
    add      esp,4                  ; from ret 4 equivalence (pop arg0 aka this pointer--
                                    ; we clean the stack)
    push      8000h                  ; MEM_RELEASE
    push      0                      ; 0 (size)
    push      edx                    ; memory to free

    push      ecx                    ; return pointer
    jmp      eax                    ; tail-call jump to VirtualFree(...)
Posted on 2010-10-19 18:38:06 by SeanTek