Hello all,

I'm currently working on a "Yet Another PE Protector" type of project and have come to a point where I've decided to employ offset-independent code (position-independent code for the *nixers), not for lack of memory but to hide the more sensitive parts of the app from prying eyes, namely Messrs. Debugger and Disassembler. One point worth noting about this particular protector is that the protected app and the protector are one and the same, i.e. the logic and processing of the protected app are embedded into the protector app. The protector will only unpack procs/code sections when needed and then repack them as soon as they're done processing. Now the unpacked code can be run from a normal code section, the stack (hehe), an allocated buffer, a global code pool (buffer), etc. Thus the need for offset-independent code.

Now I've got a few notions I'm toying with on how I should implement this: 1) patch RVAs after unpacking (i.e. protector does patching) or 2) Unpacked code corrects its own offsets according to its beginning address after unpacking. There are other things I'm considering but these two seem to be my favorites for the time being.
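
For what it's worth, option 1 (the protector does the patching) can be sketched roughly as below. This is only an illustration under assumptions: the fixup list, the names `fixup_t` and `apply_fixups`, and the idea that every patched value is a 32-bit little-endian absolute address are all made up for the example and are not taken from any existing protector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read/write a 32-bit little-endian value (the x86 in-memory layout). */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void wr32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical fixup record: offset, inside the unpacked block, of a
   32-bit absolute address that was computed for linked_base. */
typedef struct {
    uint32_t offset;
} fixup_t;

/* Rebase every listed address from linked_base to actual_base.
   Returns 0 on success, -1 if a fixup does not fit in the block
   (fixups before the bad one have already been applied). */
static int apply_fixups(uint8_t *block, size_t block_size,
                        const fixup_t *fixups, size_t count,
                        uint32_t linked_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - linked_base;  /* wraps modulo 2^32 */

    assert(block != NULL);
    for (size_t i = 0; i < count; i++) {
        uint32_t off = fixups[i].offset;
        if (block_size < 4 || off > block_size - 4)
            return -1;
        wr32le(block + off, rd32le(block + off) + delta);
    }
    return 0;
}
```

The trade-off is that the fixup list has to be stored somewhere, and anyone who finds it learns where the absolute references are. Option 2 avoids the list but makes every proc carry its own correction code.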

Any hints, pointers, samples regarding this matter would be highly appreciated.

Thanks in advance,

Sheldon
Posted on 2007-11-20 02:42:16 by Shell
forget running from the stack - dep does not allow it at all, so you'll limit your market / userbase if you do this...

use a delta to work for you - does not require hard coded offsets

call delta
delta:
pop esi
mov eax,
..... more code here

data_i_want equ $
<your data goes here>

if you're planning code relocation you're in for some headaches, all e8 calls will need fixed up, as well as any relative indexing used within the proc...
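
To make the e8 point concrete, here is a rough sketch in C of what the fix-up involves. It assumes you already know where each call instruction starts (scanning blindly for 0xE8 bytes is not safe) and that the block moves as a whole. `fix_call` and its parameters are hypothetical names for illustration. An E8 call stores a 32-bit displacement relative to the end of the 5-byte instruction, so a call to a target inside the moved block stays valid, while a call to a target outside it has to be adjusted by old_base - new_base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read/write a 32-bit little-endian value (the x86 in-memory layout). */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void wr32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Fix one E8 rel32 call at call_off after the block has been moved from
   old_base to new_base.
   Returns  1 if the displacement was patched (target outside the block),
            0 if nothing had to change (target inside the block),
           -1 if call_off does not point at an E8 call inside the block. */
static int fix_call(uint8_t *block, size_t size, uint32_t call_off,
                    uint32_t old_base, uint32_t new_base)
{
    assert(block != NULL);
    if (size < 5 || call_off > size - 5 || block[call_off] != 0xE8)
        return -1;

    uint32_t rel = rd32le(block + call_off + 1);
    /* Target address as seen at the old location (modulo 2^32). */
    uint32_t target = old_base + call_off + 5u + rel;

    if ((uint32_t)(target - old_base) < size)
        return 0;                      /* target moved along with the block */

    wr32le(block + call_off + 1, rel + (old_base - new_base));
    return 1;
}
```

The same reasoning applies to E9 jumps and to the 0F 8x conditional jumps with 32-bit displacements. Short jumps only reach targets close by, so they normally stay inside the block.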

personally i suggest you research some more on the whole topic and look at some other protectors, simply because all the information you give implies you're not an expert in this field...
Posted on 2007-11-20 04:06:47 by evlncrn8
Thanks for the response evlncrn8. I believe I need to provide a little more info. The offset-independent paradigm is not new to me. However, doing it in the Win32 environment is.

Code running off the stack is part of an "Exception Handling" kludge and should not be hampered by DEP (theoretically).

As for the hard-coded offsets, I'm afraid those are a necessary evil for what I'm trying to achieve - the entire "protected app" is unpacked on the fly, so the need for data sharing/communication dictates that I use hard-coded offsets unless I bury all the data within the code segments relative to the procs that use them (ugly code), and global vars would still be hard-coded.

if you're planning code relocation you're in for some headaches, all e8 calls will need fixed up, as well as any relative indexing used within the proc...
That's kind of the reason for this thread, I was wondering if you or any other kind soul would point out the pitfalls to watch out for like you did here, so thank you for that. Your input is very much appreciated, and I'll be sure to keep it in mind. I'm well aware that code relocation is a daunting task and should be left to the PE loader to do at app start, but being the asm programmers that we are, there shouldn't be any reason why we can't try  :lol:
Posted on 2007-11-21 02:30:24 by Shell
I'm also interested in what it takes to relocate code.  I've found in my timing tests when developing routines that the location in memory makes a big difference in how long it takes the code to execute on my flaky AMD.  That makes it difficult to do meaningful comparisons.  I've also found that if I relocate the routines under test to the same spot in memory when testing, I get much better and more repeatable results.  So knowing the pitfalls of relocating the code would be a valuable discussion for me.
Posted on 2007-11-21 09:38:44 by JimG
e8 calls - either local or going to a jmp

jump tables (if you relocate the code, you have to adjust these too) - like call/jmp eax/ jmp and so on

hard coded va's (jump tables kind of) too, could be in data section as well

section characteristics issues (vc8 has a nice headache about that)
and you'll obviously need a decent disasm->asm->compile as well for proper relocation

some code might also require specific alignment

i would forget about code running off the stack, its an old trick, its shit now, and with dep etc, there are far too many overheads to consider to make it worthwhile
Posted on 2007-11-22 03:01:44 by evlncrn8
Again: forget about running off the stack. NX bit, anyone? Yeah sure, VirtualProtect, but just use heap or VirtualAlloc memory instead.

Forget about "re-compressing", leave compressed code alone and dealloc uncompressed instead of re-compressing, why waste more clock cycles than necessary?

Either write your code position-independently (x64 helps :)), write for a VM format that's easier to relocate, or give up. Forget about writing a disassembler to fix up references. Using PE relocations is a possibility, but you need to isolate just the relocs for the block of code you're uncompressing, which could give an attacker a good idea where blocks start and end.
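
For illustration, isolating the relocs for one block could look roughly like this. The .reloc layout used here is the documented PE one: each block is a 32-bit page RVA and a 32-bit block size (header included), followed by 16-bit entries whose top 4 bits are the type and low 12 bits the offset within the page. Everything else, including the name `collect_relocs` and the choice to collect only type 3 (IMAGE_REL_BASED_HIGHLOW) entries, is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REL_BASED_HIGHLOW 3u   /* IMAGE_REL_BASED_HIGHLOW */

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk raw .reloc data and collect the RVAs of HIGHLOW fixups that fall
   in [start_rva, end_rva). At most out_cap RVAs are stored in out; the
   return value is the total number of matches found. */
static size_t collect_relocs(const uint8_t *reloc, size_t reloc_size,
                             uint32_t start_rva, uint32_t end_rva,
                             uint32_t *out, size_t out_cap)
{
    size_t found = 0;
    size_t pos = 0;

    assert(reloc != NULL);
    while (pos + 8 <= reloc_size) {
        uint32_t page_rva = rd32le(reloc + pos);
        uint32_t blk_size = rd32le(reloc + pos + 4);

        if (blk_size < 8 || blk_size > reloc_size - pos)
            break;                              /* malformed block, stop */

        for (size_t e = pos + 8; e + 2 <= pos + blk_size; e += 2) {
            uint32_t entry = (uint32_t)reloc[e] | ((uint32_t)reloc[e + 1] << 8);
            uint32_t type = entry >> 12;
            uint32_t rva = page_rva + (entry & 0x0FFFu);

            if (type == REL_BASED_HIGHLOW && rva >= start_rva && rva < end_rva) {
                if (found < out_cap)
                    out[found] = rva;
                found++;
            }
        }
        pos += blk_size;
    }
    return found;
}
```

Storing the filtered list next to each packed block is exactly the information leak mentioned above, so in practice it would need to be hidden or encrypted along with the block.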

Writing to code (i.e. self-modifying code) can give quite some performance penalties on some CPUs, so don't use this for performance-sensitive stuff.
Posted on 2007-11-22 07:27:32 by f0dder
I've been getting help on this topic from other venues as well, and everyone seems to be in agreement when it comes to code running off the stack - very bad idea, don't do it, you're crazy, etc... So for now - I'll limit stack usage to Vector/Structured Exception Handling stuff (maybe if I mentioned my full handle was shellcoder everyone would change their mind but that's beside the point  :P )

@f0dder: No use for VirtualProtect in this particular venture, lots of VirtualAlloc calls though (not to mention the whole thing already has merged sections and RWE specified at link time anyway).

The recompressing is more of an anti-debugger trick: if a breakpoint has been set, the CRC of the recompressed block no longer matches, so the block can't be decompressed again later and/or the debugger gets detected (this is a very rough description :D ). You're right of course about the more trivial procs, which simply get de-allocated after use.
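
Very roughly, the detection half of that trick boils down to something like the sketch below. It uses the standard CRC-32 (reflected polynomial 0xEDB88320). The helper name `code_is_untouched` is made up for the example, and the real thing would of course run over the unpacked proc rather than a toy byte array. A software breakpoint replaces a code byte with 0xCC (int 3), so the checksum taken before and after no longer agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320, init and final XOR
   0xFFFFFFFF). Slow but tiny, which is usually fine for a few hundred
   bytes of code. */
static uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns non-zero if the code bytes still match the checksum recorded
   when the block was unpacked. */
static int code_is_untouched(const uint8_t *code, size_t len, uint32_t expected)
{
    return crc32_bytes(code, len) == expected;
}
```

Note that this only catches software breakpoints. Hardware breakpoints (the debug registers) leave the code bytes alone, so they would need a separate check.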

As you noted PE relocs are possible but a dead giveaway so that's kinda why I'm trying to reinvent the wheel in this sense (when trusted/standard procedures just won't do - roll your own).

VM of course is the next logical step to what I'm trying to achieve but I'll leave that for version 2. Curious - Full blown VM or just PCode translator stuff?

Thanks for all the input.

@evlncrn8: No need to disasm->asm->compile. I'm leaning more towards ways to write code that expects to be relocated, like f0dder suggested. Hard-coded VAs I'm still flip-flopping on at the moment, but your suggestion of adjustable jump tables just gave me an idea - why not borrow from OOP and use v/ftables, so unpacking a proc could be considered creating an INSTANCE of that proc - data and pointers (funcs & company) are all in the vtable - something to mull over during the weekend. Maybe even come up with some sample snippets (to whet JimG's appetite - not making any promises though)
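
A first stab at the vtable idea, written in C just to show the shape of it (the real thing would be asm, and all the names here are invented for the example). The unpacked proc gets one pointer to its "instance" and reaches every piece of data and every helper through it, so nothing in the proc body needs a hard-coded address. A C compiler is still free to emit absolute references of its own, so treat this as a model of the calling convention rather than as relocatable code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instance table: shared data plus pointers to the
   helpers the proc is allowed to call. The protector fills this in
   when it unpacks the proc. */
typedef struct proc_context {
    int32_t counter;                        /* shared data            */
    int32_t (*add)(int32_t a, int32_t b);   /* helper ("virtual" func) */
} proc_context;

/* Stand-in for a helper living somewhere in the protector. */
static int32_t add_i32(int32_t a, int32_t b)
{
    return a + b;
}

/* The "unpacked proc": everything it touches goes through ctx. */
static int32_t unpacked_proc(proc_context *ctx, int32_t x)
{
    assert(ctx != NULL && ctx->add != NULL);
    ctx->counter = ctx->add(ctx->counter, x);
    return ctx->counter;
}
```

In asm terms the context pointer would simply live in a register (say ebx) for the lifetime of the proc, with every access written as an offset from it.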
Posted on 2007-11-23 02:56:32 by Shell
also bear in mind os support, some things you can do in 2k you cant in xp etc...
for example vectored exception handling is an xp (or higher) thing....
rwe of the merged sections will break vc8 compiled programs, as they have internal checks on section characteristics - upx got hit by this a while ago....
i'd really suggest you heavily  research os differences, work on reloc stuff, and setup a vmware system of 98, me, 2k, xp, xp mce, vista, server 2003 and server 2008 and heavily test on all of them... the jump table stuff also isnt too easy as its not always 'spottable'...

just research and experiment, then build up a do and dont list and work from that would be the best advice i can give...

also testing your 'protected' executables with various popular anti virus programs mightn't be a bad idea either...
Posted on 2007-11-23 07:52:06 by evlncrn8
vc8?

Which version is that? Please use Visual Studio release names when mentioning vc versions :)
Posted on 2007-11-23 08:09:45 by f0dder
IIRC, 2005.
Posted on 2007-11-23 13:04:36 by ti_mo_n
Hm, I haven't had any problems with RWE'ing my vs2005 apps... but I think vs2005 is when they started adding the XP "image configuration" information, which has things like thread affinity and such... older UPX versions didn't know about this information and didn't zero out the RVA+Size fields in the PE directory table, and windows of course bitched about corrupt exe.

So, as far as I know, there's no problem with RWE, you just have to nuke or copy the image config data.
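
For reference, "nuking" here amounts to zeroing one data directory entry. The load config entry is index 10 (IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG) in the optional header's data directory array. The sketch below works on a plain array with made-up names (`data_dir_t`, `clear_load_config`) and leaves out locating that array inside a real PE header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIR_ENTRY_LOAD_CONFIG 10u   /* IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG */

/* Same layout as a PE data directory entry: RVA followed by size. */
typedef struct {
    uint32_t rva;
    uint32_t size;
} data_dir_t;

/* Zero the load config entry so the loader no longer looks for the
   structure. count is the number of entries actually present.
   Returns 1 if the entry existed and was cleared, 0 otherwise. */
static int clear_load_config(data_dir_t *dirs, size_t count)
{
    assert(dirs != NULL);
    if (count <= DIR_ENTRY_LOAD_CONFIG)
        return 0;
    dirs[DIR_ENTRY_LOAD_CONFIG].rva = 0;
    dirs[DIR_ENTRY_LOAD_CONFIG].size = 0;
    return 1;
}
```

Copying the data instead of zeroing the entry keeps whatever it configures working, at the cost of having to place the structure somewhere valid in the packed image.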
Posted on 2007-11-23 17:51:15 by f0dder
rwe of the merged sections will break vc8 compiled programs, as they have internal checks on section characteristics - upx got hit by this a while ago....

I doubt the Visual C++ compiler from Visual Studio 8 generates such checks itself. It is possible you are linking to a library that performs these checks - in such a case, just find an alternative for that library.
Posted on 2007-11-24 02:57:54 by vid
I'll have to check with the other moderators whether they agree before I post it, but I wrote a set of masm macros which make writing pc-relative code easy as pie. If there's no objection, and since I have seen my code has (clearly) already been leaked, I will post it.

Posted on 2007-11-24 03:13:27 by Homer

rwe of the merged sections will break vc8 compiled programs, as they have internal checks on section characteristics - upx got hit by this a while ago....

I doubt the Visual C++ compiler from Visual Studio 8 generates such checks itself. It is possible you are linking to a library that performs these checks - in such a case, just find an alternative for that library.


#ifdef CRTDLL
        _fpmath(initFloatingPrecision);
#else  /* CRTDLL */
        if (_FPinit != NULL &&
            _IsNonwritableInCurrentImage((PBYTE)&_FPinit))
        {
            (*_FPinit)(initFloatingPrecision);
        }
        _initp_misc_cfltcvt_tab();

-----------

thats just 1 example...

/***
*BOOL _IsNonwritableInCurrentImage
*
*Purpose:
*      Check if an address is located within the current PE image (the one
*      starting at __ImageBase), that it is in a proper section of the image,
*      and that section is not marked writable.  This routine must be
*      statically linked, not imported from the CRT DLL, so the correct
*      __ImageBase is found.
*
*Entry:
*      pTarget - address to check
*
*Return:
*      0        Address is either not in current image, not in a section, or
*                in a writable section.
*      non-0    Address is in a non-writable section of the current image.
*
*******************************************************************************/

\src\pesect.c

finding another alternative library is a workaround, not a solution
as i mentioned before upx was hit by this, so anyone coding a protector has to handle such issues....
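
Going only by the comment block quoted above (this is not the CRT's actual code), the check boils down to something like the following sketch. The section record and the function name are simplified stand-ins, and only the IMAGE_SCN_MEM_WRITE flag value (0x80000000) is taken from the PE format. It shows why marking merged sections RWE makes the guarded call get skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCN_MEM_WRITE 0x80000000u   /* IMAGE_SCN_MEM_WRITE */

/* Simplified section record: only the fields the check needs. */
typedef struct {
    uint32_t rva;              /* section start, relative to image base */
    uint32_t vsize;            /* virtual size                          */
    uint32_t characteristics;  /* section flags                         */
} section_t;

/* Returns 1 if target_rva lies in a section that is NOT writable,
   0 if it is in a writable section or in no section at all. */
static int is_nonwritable_in_image(const section_t *sections, size_t count,
                                   uint32_t target_rva)
{
    assert(sections != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (target_rva >= sections[i].rva &&
            target_rva - sections[i].rva < sections[i].vsize)
            return (sections[i].characteristics & SCN_MEM_WRITE) == 0;
    }
    return 0;
}
```

With a normal read/execute .text section the check passes. Once the protector merges everything into one RWE section the write flag is set, the check returns 0, and the guarded call (the floating point init in the example above) never runs.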

Posted on 2007-11-24 05:36:18 by evlncrn8

I'll have to check with the other moderators whether they agree before I post it, but I wrote a set of masm macros which make writing pc-relative code easy as pie. If there's no objection, and since I have seen my code has (clearly) already been leaked, I will post it.


The thread is already in the right forum, and I can't see how MASM macros could be that dangerous... or could they :lol:

I think the most outright dangerous thing about IP-relative code is how hard the program may crash on the slightest miscalculation :P
Posted on 2007-11-24 12:59:09 by SpooK
Actually, I co-wrote it with Bryant.
Ask him to show you the K32B file.
I'd have to slave up an old drive to get hold of it, and I don't know which drive it was on.
That seems like work :P



Posted on 2007-11-24 17:01:33 by Homer