I posted this article because some of you might be interested in it.
It's not my article and I have no idea if it works.
Thanks.

Basic Overview

Self-modifying code is a technique in which the application writes or modifies portions of its own code at run-time.

Windows 95 has stronger memory protection than MS-DOS. Applications are normally given access to memory for their data segments and stack, but they are not permitted to modify their own code. In order to do this we must first ask Windows 95 for permission by calling the VirtualProtect() function. When you call VirtualProtect() you pass in the address of the first byte you want to modify, the number of bytes you want to work with, and a flag indicating what you want to do with the memory (i.e. read it, write to it, execute it, etc.). You also pass in the address of a variable which the function fills with the previous protection state so you can restore it when you're done.

The following is a portion of Win32 code that demonstrates self-modifying code. The assembly statement immediately after the myloop label is "mov dword ptr a,0x12345678". The statements before the label overwrite that instruction's immediate operand, so that it becomes "mov dword ptr a,0x87654321". Try placing this code inside a C/C++ function and stepping through it with a debugger:

LPVOID address;

// Get the address of the dword we need to change

_asm mov dword ptr address,offset

// Ask Windows for permission to modify the code. PAGE_EXECUTE_READWRITE keeps
// the page executable while it is being written.

result = VirtualProtect(address,4,PAGE_EXECUTE_READWRITE,&oldprotect);

// Modify it in assembly. This is equivalent to *(LPLONG)address = 0x87654321

_asm mov ebx,dword ptr address

_asm mov dword ptr [ebx],0x87654321

// All done: restore the protection that was in place before

result = VirtualProtect(address,4,oldprotect,&oldprotect);

myloop:

_asm mov dword ptr a,0x12345678



Creating New Portions of Code

It's also possible to generate new code from scratch. This technique is particularly handy for things such as precompiled sprites and precompiled texture mapping loops. As a simple example, I'll show how to create a function that accepts two long integers and returns the sum. For convenience I'll declare a function pointer type and a variable of that type:

typedef LONG (* FunctionType)(LONG, LONG);

FunctionType ComputeSum;

ComputeSum is a variable of type FunctionType, a type that can point to functions with the following format:

LONG Function(LONG, LONG)

The first step is to determine the actual op-codes that will go inside the function. I've already done this (which I'll show next) and it turns out that we need 11 bytes of memory to store them. How you allocate the memory for your functions is up to you; I'll use the new operator:

ComputeSum = (FunctionType) new BYTE[11];

We can now fill the array with the actual op code values:

((LPBYTE)ComputeSum)[0] = 0x55; // push ebp

((LPBYTE)ComputeSum)[1] = 0x8B; // mov ebp, esp

((LPBYTE)ComputeSum)[2] = 0xEC;

((LPBYTE)ComputeSum)[3] = 0x8B; // mov eax, [ebp+8]

((LPBYTE)ComputeSum)[4] = 0x45;

((LPBYTE)ComputeSum)[5] = 0x08;

((LPBYTE)ComputeSum)[6] = 0x03; // add eax, [ebp+0Ch]

((LPBYTE)ComputeSum)[7] = 0x45;

((LPBYTE)ComputeSum)[8] = 0x0C;

((LPBYTE)ComputeSum)[9] = 0x5D; // pop ebp

((LPBYTE)ComputeSum)[10] = 0xC3; // ret
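
The eleven assignments can also be written as one table that is copied into the buffer in a single step. What follows is a minimal sketch that assumes nothing beyond the byte values listed above. The names sum_code and emit_sum_code are hypothetical, and the bracketed operands in the comments are simply what those bytes decode to on 32-bit x86:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Machine code for a 32-bit x86 cdecl function that returns a + b.
   These are the same 11 bytes as the individual assignments. */
static const unsigned char sum_code[] = {
    0x55,             /* push ebp            */
    0x8B, 0xEC,       /* mov  ebp, esp       */
    0x8B, 0x45, 0x08, /* mov  eax, [ebp+8]   */
    0x03, 0x45, 0x0C, /* add  eax, [ebp+0Ch] */
    0x5D,             /* pop  ebp            */
    0xC3              /* ret                 */
};

/* Hypothetical helper: copies the op-codes into buf and returns the number
   of bytes written, or 0 if the buffer is too small to hold them. */
size_t emit_sum_code(unsigned char *buf, size_t capacity)
{
    assert(buf != NULL);
    if (capacity < sizeof sum_code)
        return 0;
    memcpy(buf, sum_code, sizeof sum_code);
    return sizeof sum_code;
}
```

Called as emit_sum_code((LPBYTE)ComputeSum, 11), this fills the same 11 bytes as the assignments above. The copy itself is portable; only executing the result is specific to 32-bit x86.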

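The final step is to call VirtualProtect(). We already have the op-codes in place, so we only have to ask for access to execute the memory that holds them. Protection is applied to whole pages, and this buffer shares its pages with other heap data, so the call asks for PAGE_EXECUTE_READWRITE rather than PAGE_EXECUTE to keep the rest of the heap readable and writable. The following call will do the job in this case:

VirtualProtect(ComputeSum, 11, PAGE_EXECUTE_READWRITE, &oldprotect);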

As before, the variable oldprotect should be of type DWORD. At this point we have a pointer to a valid function and we can call it with C code such as the following:

sum = ComputeSum(1, 2);

sum = ComputeSum(val1, val2);

The only thing that remains is for us to delete the code segment before our application terminates. This is a simple matter of restoring the access protection back to its original value and calling the appropriate function to delete the allocated memory, e.g.:

VirtualProtect(ComputeSum, 11, oldprotect, &oldprotect);

delete[] (LPBYTE)ComputeSum;

One last point: if you want your self-modifying code to be compatible with Windows NT then you'll also need to call the FlushInstructionCache() function after modifying any code segments. This is not a requirement under Windows 95 since it is a single-CPU operating system, but I strongly recommend calling it anyway to avoid compatibility problems with future Windows releases.
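
Put together, the steps in the article (fill a buffer, change its protection, flush the instruction cache, call, free) might look like the sketch below. Several things here are assumptions rather than the article's method: generated_sum is a hypothetical wrapper name, the buffer comes from VirtualAlloc() instead of new so that the protection change touches only pages the sketch owns, and on anything other than 32-bit x86 Windows the function simply adds in plain C so that the file still compiles.

```c
#include <string.h>

#if defined(_WIN32) && defined(_M_IX86)
#include <windows.h>
#endif

/* The 11 bytes from the article: a 32-bit x86 cdecl function returning a + b. */
static const unsigned char sum_code[11] = {
    0x55, 0x8B, 0xEC, 0x8B, 0x45, 0x08, 0x03, 0x45, 0x0C, 0x5D, 0xC3
};

/* Hypothetical wrapper: builds the function, calls it once, frees it. */
long generated_sum(long a, long b)
{
#if defined(_WIN32) && defined(_M_IX86)
    typedef long (__cdecl *SumFn)(long, long);
    DWORD oldprotect;
    long result;

    /* Assumption: allocate whole pages of our own instead of using new. */
    void *code = VirtualAlloc(NULL, sizeof sum_code,
                              MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (code == NULL)
        return a + b;                 /* allocation failed: plain C fallback */

    memcpy(code, sum_code, sizeof sum_code);

    /* Ask for execute access, then flush before the first call. */
    if (!VirtualProtect(code, sizeof sum_code, PAGE_EXECUTE_READ, &oldprotect)) {
        VirtualFree(code, 0, MEM_RELEASE);
        return a + b;
    }
    FlushInstructionCache(GetCurrentProcess(), code, sizeof sum_code);

    result = ((SumFn)code)(a, b);

    VirtualFree(code, 0, MEM_RELEASE);
    return result;
#else
    /* Not 32-bit x86 Windows: the bytes cannot run here, so add in plain C. */
    (void)sum_code;
    return a + b;
#endif
}
```

A real program would build the function once and keep the pointer around rather than allocating on every call; the single-shot form just keeps the sketch short.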
Posted on 2002-07-11 21:38:21 by b0z0
b0z0,

There is a piece of example code in MASM32 called SMC that shows how runtime code modification is done. It is useful for protection schemes among other things.

Regards,

hutch@movsd.com
Posted on 2002-07-11 23:38:34 by hutch--
b0z0,

just change the characteristics of your code section to writable
Posted on 2002-07-12 00:16:37 by masquer
For small pieces of code, just build it on the stack and jmp/call to stack space. And don't forget how important it is to modify your code that self modifies your code... :)
Posted on 2002-07-12 05:39:24 by bitRAKE

For small pieces of code, just build it on the stack and jmp/call to stack space. And don't forget how important it is to modify your code that self modifies your code... :)


Hey... neat tricks!


Also, on really really old computers (those without separate data and code caches), self-modifying code had uses in optimization. It is very rarely used in optimization today because there is a time penalty when executing code that was recently modified (because it's still in the write-data cache, the code cache needs to be updated).
Posted on 2002-07-12 19:51:40 by AmkG

It is very rarely used in optimization today because there is a time penalty when executing code that was recently modified (because it's still in the write-data cache, the code cache needs to be updated).
I get around this penalty by creating the code on the stack, and then jumping to another piece of code, which returns to an address on the stack. You can even blend this in with the typical stack frame code, making it harder to see. While the program is initializing it is also slowly creating some key pieces of code from the registration key. The poor would-be cracker will be sucking down coffee and scratching his head for hours trying to figure out what happens where. :) Granted, he could just create those pieces of code and patch them in, but this can be anywhere from very hard to impossible, and out of reach of most byte patching baboons. Combine this with good software watermarking, so that a registered version isn't just passed around without consequences.
Posted on 2002-07-12 20:40:24 by bitRAKE
Ages ago I made com files with debug and did the recursive thing in which my program continually modified its own code. Then I read that it wasn't a good idea to modify code in this way. I personally believe that progs that are self modifying are at the cornerstone of true artificial intelligence. It's only M$ and its buddies that make it difficult to attain this on their platforms as they wish to be the first.

:alright:

oh yah..and i never once got a crash from self modification. But that was on the older systems...386 downward with win 3.1
Posted on 2002-07-12 21:42:26 by IwasTitan
bitRAKE: which consequences? And what if the user tells the judge that somebody stole his purchased copy of your program?

Moreover, why do you jmp elsewhere and then again on the stack? Haven't tested this yet on the Athlon-XP, but I see no reason why a jmp direct to the stack wouldn't be the same, penalty-wise (just one less passage).

I haven't tested this (or I don't recall the results of my test), but I'm quite certain that your concern is not technical, but just to give one more headache to the wannabe cr*cker. Do you confirm?
Posted on 2002-07-13 03:10:21 by Maverick
There is a certain amount of myth about the performance penalty that is supposed to follow from self modifying code. When the term is used to refer to modifying a register that is already in the pipeline, you get a stall until the pipeline is clear which IS the performance penalty from writing self modifying code.

When the same term is used for modifying code in the code section, the problem does not hold: reading and writing memory is no big deal, and unless you were particularly clever you probably could not get all of the code directly into the cache anyway.

If you could master this trick, the solution would be to modify the code earlier in the program and come back and execute it later. What you will get at the worst is a stall if you modify the code and THEN run it immediately after it has been modified.

Regards,

hutch@movsd.com
Posted on 2002-07-13 05:26:53 by hutch--
bitRake,
i like your idea of running code that is on the stack. I see it as working something like this:



- start a proc
- create as many local DWORD vars as you need to cover the
amount of instructions you want to execute (coz LOCALS are
created on the stack)
- set all those vars to specific values, those values could be from
the .data section
- run a math algo on those vars to change them to the actual
code and data you require, maybe the algo could use the CRC
of the app as a type of key value
- execute the new code
- if the CRC is wrong because the app has been patched, or
because some cr4cker is messing with the registers, there is a
great chance the app will GPF, which you should be able to
catch with exception handling.
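
The decode step in the list above could look roughly like the following. This is only a sketch under loud assumptions: image_checksum and xor_with_key are hypothetical names, the rotating checksum stands in for a real CRC, and plain XOR stands in for whatever "math algo" you prefer. It stops short of jumping to the decoded bytes, since that part depends on the CPU and on the stack being executable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the CRC of the app: a rotate-and-XOR checksum over its bytes.
   A real CRC-32 routine could be dropped in here instead. */
uint32_t image_checksum(const uint8_t *image, size_t len)
{
    uint32_t sum = 0;
    assert(image != NULL);
    for (size_t i = 0; i < len; i++)
        sum = ((sum << 5) | (sum >> 27)) ^ image[i];
    return sum;
}

/* XOR every byte with one byte of the key, cycling through the four key
   bytes. XOR is its own inverse, so this both encodes and decodes. */
void xor_with_key(const uint8_t *in, uint8_t *out, size_t len, uint32_t key)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++)
        out[i] = (uint8_t)(in[i] ^ (uint8_t)(key >> (8 * (i % 4))));
}
```

The encoded bytes would live in the .data section and be decoded into the local variables with the checksum as the key. If even one byte of the checked image is patched, the key changes and the locals decode to garbage, which is the "great chance the app will GPF" case.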



It is kind of like a mixture of the old Roman letter transposing ciphers, mixed with hostile 'buffer overflow to execute code on the stack' methods. I like it, it is kind of poetic :)
Posted on 2002-07-13 05:40:10 by sluggy

bitRAKE: which consequences? And what if the user tells the judge that somebody stole his purchased copy of your program?

Moreover, why do you jmp elsewhere and then again on the stack? Haven't tested this yet on the Athlon-XP, but I see no reason why a jmp direct to the stack wouldn't be the same, penalty-wise (just one less passage).

I haven't tested this (or I don't recall the results of my test), but I'm quite certain that your concern is not technical, but just to give one more headache to the wannabe cr*cker. Do you confirm?
The consequences are me publicly outlining the situation and using their personal information, so everyone knows what kind of asshole this guy is. Might not catch him this time, but maybe someone will. Locks are to keep good men honest - all locks can be broken by bad men.

You are correct - I haven't tested for a penalty. Technically, I only understand there would be a penalty if you were modifying code in the same cacheline being executed, but I haven't tested this either. Also, it makes sense that there might be a penalty if you modify any data that is in the instruction cache already. Jumping to another section of code creates greater convolution. Doing it the same way every time isn't good - IDA can be programmed to decipher these sections of code.
Posted on 2002-07-13 10:33:50 by bitRAKE
bitRAKE is right. I have a Thunderbird CPU with 128KB of total L1 cache: 64KB for instructions and 64KB for data. When you write data to memory it goes into the data cache, but when you jump to that address the CPU has to write the data in the cache back to memory so that it can be fetched by the instruction cache. Maybe the Athlon XP has some way to avoid the memory access, but I heavily doubt it, as self-modifying code isn't used that much these days.
Posted on 2002-07-13 20:20:19 by x86asm
bitRake: That lock comment just reminded me that I meant to
ask you if you realized that "bitraking" is a locksmithing term
for a particular method of picking a pin tumbler lock.
Posted on 2002-07-14 02:13:20 by Canite
Canite, my father's father was a watch maker, and my father a locksmith for a short time. It has been a long time since anyone mentioned that. And the tool we call a bitRAKE. Little ones have to do something to pass the time. :grin:
Posted on 2002-07-14 03:33:14 by bitRAKE
Ricky,

You must have been pretty lean and flexible as a kid to be used to pick locks with. :tongue:

Regards,

hutch@movsd.com
Posted on 2002-07-14 06:10:15 by hutch--