I'm new to assembly programming, so hopefully the answer here is simple or at least not too difficult!
I am writing an object that is supposed to free itself, including both data AND code, before returning. (The object in question is a COM object on Windows but the fact that it's COM doesn't matter too much.) The code needs to work on both x86 and x64.
The general approach is that the last things the free function should do are:
1) clean up the stack;
2) do not mess with the return address;
3) set up the stack for VirtualFree; and finally
4) jump to the Win32 API call VirtualFree.
VirtualFree will free the data and code for the object, so it is self-deleting. Then it returns directly to the caller. If it returns to my free function, the code will crash because VirtualFree just freed the memory.
Here is a sample of some C code with sample assembly for each line:
static ULONG WINAPI SampleObj_Release(IUnknown *This)
{
77B14C10 push ebp
77B14C11 mov ebp,esp
77B14C13 sub esp,48h
77B14C16 push ebx
77B14C17 push esi
77B14C18 push edi
SampleObj *p = (SampleObj*)This;
77B14C19 mov eax,dword ptr
77B14C1C mov dword ptr ,eax
ULONG refs = _InterlockedDecrement(&p->dwRefs);
77B14C1F mov eax,dword ptr
77B14C22 add eax,8
77B14C25 or ecx,0FFFFFFFFh
77B14C28 lock xadd dword ptr ,ecx
77B14C2C dec ecx
77B14C2D mov dword ptr ,ecx
if (refs != 0) return refs;
77B14C30 je SampleObj_Release+27h (77B14C37h)
77B14C32 mov eax,dword ptr
77B14C35 jmp SampleObj_Release+66h (77B14C76h)
...
// TODO: force a trampoline jump in assembly!
return p->pFixedEntries->pVirtualFree(p->pVtbl, 0, MEM_RELEASE);
77B14C5E push 8000h
77B14C63 push 0
77B14C65 mov eax,dword ptr
77B14C68 mov ecx,dword ptr
77B14C6A push ecx
77B14C6B mov edx,dword ptr
77B14C6E mov eax,dword ptr
77B14C71 mov ecx,dword ptr
77B14C74 call ecx
}
77B14C76 pop edi
77B14C77 pop esi
77B14C78 pop ebx
77B14C79 mov esp,ebp
77B14C7B pop ebp
77B14C7C ret 4
What is the right assembly to do this? I know that in x86, I can get away with using inline assembly in MSVC towards the end. But in x64, I think that I have to use MASM or equivalent to write out the entire function call.
Note that WINAPI is __stdcall, so the arguments in x86 get pushed onto the stack and are freed by the callee, not the caller. On x64, the first four arguments are passed via registers and shadow stack space is allocated. Since the initial function just takes one argument (the this pointer), and VirtualFree takes three arguments, it should work. The return value is not of any consequence.
Thanks!
Who gave you the impression that code and data of a COM object are deallocated together?
Just imagine what would happen if you wanted to create a New Instance!!
Truth is, the code never gets deallocated - it remains in memory at all times, there are no exceptions that I can think of.
The code, together with the Class Template, are embedded in the binary file you compiled, and remain available pending subsequent utilization.
The only thing we are releasing is some (typically heap) memory that represents an Instance of our Class Template (i.e., a clone in heap memory).
Therefore, after calling VirtualFree, the CODE is ok to continue executing, as long as it does NOT reference any data fields (since the data did get released).
Now, just to convince you that this is the case, imagine that you were in fact correct about the code being associated with an object instance. This would imply that there's a whole copy of the code for EVERY INSTANCE of the class (which would be silly), and it also implies that the code must be PC-RELATIVE in order to execute at an arbitrary address in memory, rather than optimized for some fixed virtual address. Which do you consider more likely?
Ah, but that is the point. In this case, I am explicitly creating an object that does not require its DLL (with the code sections) to be loaded in memory at all times. The code executes after the DLL is loaded, but the DLL can be unloaded while the object (or at least a barebones fraction of the object) remains in memory.
This is a pretty unusual task, but I thought the whole point of going to low-level Assembly is to do unusual things. :thumbsup:
In this case, yes, there's a whole copy of the code for every instance of the class. But the size is minimal, and far less than the size of a page (4096 bytes) to hold it. It does not have to be that way--one could have a reference count on the code and when the last instance is destroyed, the code is also VirtualFree'd.
The code is all PC-relative. In this case, the code is merely stub code: 99% of the main functionality is written in C/C++, not Assembly. Also, I suppose that this is a more general technique for just-in-time freeing of code--it doesn't have to be COM at all.
I'm writing the implementation to it.
Then you answered your own question - rather than "call" the VirtualFree function and return to our own code, we JMP to it, and VirtualFree returns directly to OUR CALLER.
At the end of the VirtualFree function, there is a RET.
If we did not CALL the function but instead JMP to it, whose return address is on top of the call stack? We are returning, but not to our DLL code - instead, we are returning to the code that called our DLL stub function, which led to the VirtualFree function via a JMP - makes sense?
In fact we can abstract this further, we could deliberately POP THE RETURN ADDRESS from the stack and shove ANY RETURN ADDRESS we wanted to, and when we return we will then return "to somewhere else" !!!
But I leave this to your avid imagination ;)
Yes, things we can do in ASM don't always follow "the rules" since we are the game masters.
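To make the CALL vs. JMP difference concrete, here is a rough sketch in plain C that models the stack as a small array. Everything in it (ToyStack, the two addresses, the function names) is made up for illustration; it only tracks which return address the callee's RET would pop, it does not execute any real jump.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a stack: `count` slots are in use, the last one is the top. */
typedef struct {
    unsigned slots[16];
    size_t   count;
} ToyStack;

void toy_push(ToyStack *s, unsigned v) { assert(s->count < 16); s->slots[s->count++] = v; }
unsigned toy_pop(ToyStack *s) { assert(s->count > 0); return s->slots[--s->count]; }

/* Hypothetical addresses: one inside the caller, one inside our stub. */
enum { ADDR_CALLER = 0x1000, ADDR_STUB = 0x2000 };

/* CALL pushes a return address inside the stub, so the callee's RET
   resumes in the stub (which has just been freed in the real scenario). */
unsigned return_target_after_call(void) {
    ToyStack s = { {0}, 0 };
    toy_push(&s, ADDR_CALLER);  /* pushed when the caller CALLed the stub  */
    toy_push(&s, ADDR_STUB);    /* pushed when the stub CALLs the callee   */
    return toy_pop(&s);         /* callee's RET pops the topmost address   */
}

/* JMP pushes nothing, so the callee's RET pops the caller's address. */
unsigned return_target_after_jmp(void) {
    ToyStack s = { {0}, 0 };
    toy_push(&s, ADDR_CALLER);  /* pushed when the caller CALLed the stub  */
    /* JMP to the callee: the stack is left untouched */
    return toy_pop(&s);         /* callee's RET goes straight to the caller */
}
```

In this model return_target_after_call() yields the stub's address and return_target_after_jmp() yields the caller's, which is the whole reason for jumping instead of calling.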
In order to jump directly to a STDCALL function, you need to effectively remove all stack data associated with the current function, before stacking the arguments for the function you're jumping to. You also need to ensure that the "invariant" registers EBX, ESI, EDI, and EBP have the correct values before jumping to the function.
A simple-minded way to accomplish this is to first stack the arguments, stack a copy of the return address, restore registers, and then use the scratch registers EAX, ECX, and EDX to move the arguments and return address upward to replace the data that must be unstacked. Set ESP, and then jump.
The default installation of VC uses CDECL as the default for "normal" C++ functions, which means you leave the arguments and replace the return address and local variables with the new arguments and a copy of the old return address. If you set the compiler option to use STDCALL or explicitly use __stdcall or WINAPI, you replace/remove the current argument list in addition to the return address and locals.
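The difference between the two cases can be traced on a toy stack. This is only a sketch in portable C, not real frame manipulation: ToyStack, retarget_frame and stdcall_return3 are hypothetical names, and it uses the simpler pop-then-push ordering rather than the "stack first, then move upward" approach described above. It assumes the current function takes one argument and the STDCALL target takes three.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a stack: `count` slots are in use, the last one is the top. */
typedef struct {
    unsigned slots[16];
    size_t   count;
} ToyStack;

void toy_push(ToyStack *s, unsigned v) { assert(s->count < 16); s->slots[s->count++] = v; }
unsigned toy_pop(ToyStack *s) { assert(s->count > 0); return s->slots[--s->count]; }

/* Rewrite the frame of a one-argument function so that a three-argument
   STDCALL function can be entered with JMP.
   locals:          slots pushed after the return address (locals, saved registers)
   self_is_stdcall: nonzero if the current function must remove its own argument */
void retarget_frame(ToyStack *s, size_t locals, int self_is_stdcall,
                    const unsigned args[3]) {
    while (locals-- > 0)
        (void)toy_pop(s);          /* unwind; real code restores EBX/ESI/EDI/EBP here */
    unsigned ret = toy_pop(s);     /* lift the caller's return address */
    if (self_is_stdcall)
        (void)toy_pop(s);          /* STDCALL: drop our own argument; CDECL: leave it */
    for (int i = 2; i >= 0; --i)
        toy_push(s, args[i]);      /* new arguments, pushed right to left */
    toy_push(s, ret);              /* return address goes back on top */
}

/* Models the target's "ret 12": pop the return address, then its three arguments. */
unsigned stdcall_return3(ToyStack *s) {
    unsigned ret = toy_pop(s);
    for (int i = 0; i < 3; ++i)
        (void)toy_pop(s);
    return ret;
}
```

With self_is_stdcall set, the stack is empty after stdcall_return3, matching what the caller of a STDCALL function expects. With it cleared, exactly one slot (the original argument) is left for the CDECL caller to remove.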
Thanks all! Looks like I got it working.
Assume the pointer to VirtualFree is in EAX, and the pointer to the memory to free is in EDX
pop esi ; presumably restores the ESI saved in the prologue
pop ecx ; now ecx contains the return address
add esp,4 ; equivalent of ret 4: discard arg0 (the this pointer),
; since a __stdcall callee cleans its own arguments
push 8000h ; MEM_RELEASE
push 0 ; 0 (size)
push edx ; memory to free
push ecx ; return pointer
jmp eax ; tail-call jump to VirtualFree(...)