Hi all, been a while since I last posted here... Good to see some familiar names still around here :)
I'm working on an exe packer (who isn't) based on Jibz' aPlib. So far things are going quite well and I'm pretty sure that when I'm done the exes will get a tiny bit better compression than FSG. If not I won't release it, let me put it that way ;)

Anyway, I'm planning to use the shellcode method for loading functions: checksums instead of function names. Why store long function names when you can store a dword instead? The only problem is that GetProcAddress obviously doesn't support this, so I have to use my own code for getting the addresses. No problem there either, code for that is available everywhere. Now I just need to get it as small as possible :)

Here's what I got so far, google for LGetProcAddress and you'll find the original. I'm using a different approach which saves some space. Also I don't do any error checking: if a function is not found, the loop just keeps walking past the end of the name table and you're screwed.

Only requirement is that not too many registers may be destroyed. If you can get it a lot smaller eg by passing the args in a different way feel free to do so, this whole function will probably end up inside some loop anyway..
For testing, you can find a .inc with hashes here.
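For anyone who would rather regenerate such hashes than download them, here is a minimal Python sketch of the checksum that the LCalcHash loop below computes: rotate the running value right by 13 bits, then add the next character. The HASH_ prefix and the equ line format are assumptions made for illustration, not the layout of the actual .inc.

```python
MASK32 = 0xFFFFFFFF

def ror13_hash(name: bytes) -> int:
    """Rotate-right-13-and-add checksum over the bytes of an export name."""
    h = 0
    for byte in name:
        h = ((h >> 13) | (h << 19)) & MASK32  # ror edx, 13
        h = (h + byte) & MASK32               # add edx, eax (eax = current char)
    return h

def inc_line(name: str) -> str:
    # Hypothetical .inc line format; the real file may look different.
    return f"HASH_{name} equ 0{ror13_hash(name.encode('ascii')):08X}h"

if __name__ == "__main__":
    for api in ("LoadLibraryA", "ExitProcess", "GetDC"):
        print(inc_line(api))
```

ror13_hash(b"ExitProcess") comes out as 0x73E2D87E, which matches the value quoted for ExitProcess further down the thread.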

Thanks in advance for your help! :)

; LGetProcAddress(dwHash, hModule)

LGetProcAddress:
pushad

xor ecx, ecx
LMainLoop:
; All this code must be inside the loop because we destroy EBP later on
; ... which saves 1 byte ;)
mov ebp, [esp+20h+8] ; ebp = base address
mov eax, [ebp+3Ch] ; eax = PE header offset
mov edx, [ebp+eax+78h] ; edx = export table RVA
mov ebx, [ebp+edx+1Ch]
mov edi, [ebp+edx+20h]
add ebx, ebp ; ebx = function address table
add edi, ebp ; edi = function name table
mov esi, [edi+4*ecx]
add esi, ebp ; esi = function name

; Calc hash into edx
xor edx, edx
xor eax, eax
LCalcHash:
lodsb
cmp al, ah
jz short LHashDone
ror edx, 13
add edx, eax
jmp short LCalcHash
LHashDone:
add ebp, [ebx+4*ecx] ; ebp = function address
inc ecx
cmp edx, [esp+20h+4] ; check hash
jne short LMainLoop

LDone:
mov [esp+1Ch], ebp
popad
ret 8
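
As a cross-check for the routine above, here is a rough Python model of the same export walk. It is a sketch under assumptions: the module is passed as a bytes object laid out as mapped in memory (so an RVA is a plain offset), it is a 32-bit PE, and the helper names are made up. Unlike the posted code, it stops after NumberOfNames entries and goes through the name-ordinal table at offset 24h of the export directory; the asm indexes the address table directly with the name index, which only gives the right answer when the two tables happen to line up.

```python
import struct

MASK32 = 0xFFFFFFFF

def u32(image: bytes, offset: int) -> int:
    return struct.unpack_from("<I", image, offset)[0]

def ror13_hash(name: bytes) -> int:
    h = 0
    for byte in name:
        h = ((h >> 13) | (h << 19)) & MASK32
        h = (h + byte) & MASK32
    return h

def find_export_by_hash(image: bytes, wanted: int):
    """Return the RVA of the export whose name hashes to `wanted`, or None."""
    pe = u32(image, 0x3C)                    # PE header offset
    exports = u32(image, pe + 0x78)          # export directory RVA (PE32)
    name_count = u32(image, exports + 0x18)  # NumberOfNames
    functions = u32(image, exports + 0x1C)   # AddressOfFunctions
    names = u32(image, exports + 0x20)       # AddressOfNames
    ordinals = u32(image, exports + 0x24)    # AddressOfNameOrdinals
    for i in range(name_count):
        start = u32(image, names + 4 * i)
        end = image.index(b"\0", start)
        if ror13_hash(image[start:end]) == wanted:
            # Map the name index to a slot in the address table.
            slot = struct.unpack_from("<H", image, ordinals + 2 * i)[0]
            return u32(image, functions + 4 * slot)  # add the base for an address
    return None
```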


*Edit: Noticed that I wrote "size optimizating", I love it when I'm too tired to write decent English yet still feel the need to push around bytes :/
Posted on 2005-02-09 19:11:07 by snq
As usual, I found something 5 mins after posting..

Replace
LCalcHash:

lodsb
cmp al, ah
jz short LHashDone
ror edx, 13
add edx, eax
jmp short LCalcHash
LHashDone:

with
LCalcHash:

ror edx, 13
add edx, eax
lodsb
cmp al, ah
jnz short LCalcHash
LHashDone:

2 bytes gone, 63 to go ;)

Also, the xor eax, eax might not be necessary, but it depends on the DLL. If the PE offset is <256 (which it probably is in 99% of all cases) we can remove the xor and save 2 more bytes.. But I'm not sure, I don't want to risk crashes on some systems.
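
To check the two loop orders against each other, here is a small register-level model in Python. It is only a sketch: the function names are invented, and the eax argument stands in for whatever is left in eax when the xor is dropped. With eax = 0 both orders give the same hash. With a stale value below 256 the original order still works, because lodsb overwrites al before anything is added, but the reordered loop adds the stale al to edx before the first lodsb. So the xor-removal idea looks safe only with the original loop order.

```python
MASK32 = 0xFFFFFFFF

def ror13(x: int) -> int:
    return ((x >> 13) | (x << 19)) & MASK32

def hash_lodsb_first(name: bytes, eax: int = 0) -> int:
    """Original order: lodsb / cmp al, ah / jz / ror / add / jmp."""
    edx = 0
    for ch in name + b"\0":
        eax = (eax & 0xFFFFFF00) | ch            # lodsb only replaces al
        if (eax & 0xFF) == ((eax >> 8) & 0xFF):  # cmp al, ah ; jz LHashDone
            break
        edx = ror13(edx)                         # ror edx, 13
        edx = (edx + eax) & MASK32               # add edx, eax
    return edx

def hash_ror_first(name: bytes, eax: int = 0) -> int:
    """Reordered: ror / add / lodsb / cmp al, ah / jnz."""
    edx = 0
    for ch in name + b"\0":
        edx = ror13(edx)                         # ror edx, 13
        edx = (edx + eax) & MASK32               # add edx, eax (stale eax on pass 1)
        eax = (eax & 0xFFFFFF00) | ch            # lodsb
        if (eax & 0xFF) == ((eax >> 8) & 0xFF):  # cmp al, ah ; falls through when equal
            break
    return edx
```

The model stops at the end of the string even when ah is nonzero, whereas the real loop would keep reading memory, so it is only meaningful for eax values below 256.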
Posted on 2005-02-09 19:28:27 by snq
Suggestions:
1) replace edi with esi, since edi is used only once.
2) then replace ebp with edi, which should save several bytes (EBP opcodes are larger)
3) replace cmp al, ah with test al,al, to improve the speed.
Posted on 2005-02-11 03:05:38 by MCoder
remember to support NT forwarded exports...
Posted on 2005-02-11 03:57:51 by f0dder
Here's the current code.. It's not beautiful but it's only supposed to be small and work ;)

edi contains the address of my own import table, which is structured like this:
db "lib1.dll", 0

dd hash1
dd hash2
...
dd hashX
dd 0
db "lib2.dll", 0
dd hash1
dd hash2
...
dd hashX
dd 0
...
db 0 ; terminator, no more libraries
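
On the packer side, a table in this layout could be produced along these lines. This is a minimal Python sketch, assuming the rotate-right-13 hash from earlier in the thread; build_import_blob and its argument shape are made up for illustration.

```python
import struct

MASK32 = 0xFFFFFFFF

def ror13_hash(name: bytes) -> int:
    h = 0
    for byte in name:
        h = ((h >> 13) | (h << 19)) & MASK32
        h = (h + byte) & MASK32
    return h

def build_import_blob(libs) -> bytes:
    """libs is a list of (dll_name, [function names]) pairs."""
    blob = bytearray()
    for dll, funcs in libs:
        blob += dll.encode("ascii") + b"\0"          # db "lib.dll", 0
        for func in funcs:
            blob += struct.pack("<I", ror13_hash(func.encode("ascii")))
        blob += struct.pack("<I", 0)                 # dd 0 ends this library
    blob += b"\0"                                    # db 0: no more libraries
    return bytes(blob)
```

One consequence of the zero terminator is that a function whose hash happened to be 0 could not be stored in this table.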

; 83 bytes

LibLoop:
push edi ; edi = dll name
apicall LoadLibrary ; eax = base address
test eax, eax
jz short NoMoreLibs
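mov ecx, eax ; ecx = scan count (any large value); al = 0 because module bases are 64K-aligned
repnz scasb ; skip the dll name: edi = 0 terminated hash array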
FuncLoop:
mov ebx, [edi] ; ebx = name hash
stosd ; edi += 4 (the value written here is replaced by the function address below)

test ebx, ebx ; next lib if hash==0
jz short LibLoop

xor ecx, ecx
dec ecx
LMainLoop:
inc ecx ; ecx updated to next function
pushad

mov edx, [eax+3Ch] ; edx = PE header offset
mov edx, [eax+edx+78h] ; edx = export table RVA
add edx, eax ; edx = export table

mov esi, [edx+20h]
add esi, eax ; esi = function name table
mov esi, [esi+4*ecx]
add esi, eax ; esi = function name

mov ebp, eax ; ebp = base address
add eax, [edx+1Ch] ; eax = function address table
add ebp, [eax+4*ecx] ; ebp = function address
mov [edi-4], ebp ; store function address

; Calc hash into edx
xor edx, edx
xor eax, eax
LCalcHash:
ror edx, 13
add edx, eax
lodsb
test al, al
jnz short LCalcHash

cmp ebx, edx ; check hash
popad
jne short LMainLoop ; try next function
jmp short FuncLoop

NoMoreLibs:
Posted on 2005-02-11 09:23:05 by snq
"remember to support NT forwarded exports..."

I haven't really read up on forwarded exports.. But so far I haven't had any problems with this method.
For now the idea is to create a compressor with minimal overhead, for use with 64k or 4k intros, not really for business use so to say. I can live with minor incompatibilities, I'm not planning to make it a too serious project :)
Posted on 2005-02-11 09:31:03 by snq

"I haven't really read up on forwarded exports.. But so far I haven't had any problems with this method."

If the address of an exported function lies within the range defined by PE_DIRENT_EXPORT (the RVA and size of the export directory), it is a forwarded export: rather than pointing to valid code, the export points to an ASCIZ string like "ntdll.whateverfunction". This is done a lot on NT; for example, kernel32's HeapAlloc is forwarded to ntdll.
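
The range test described here can be sketched in a few lines of Python. The function names are hypothetical; the export directory RVA and size are the two dwords of the first data directory entry (at PE header + 78h and + 7Ch in a 32-bit image). Splitting the string at the first dot is a simplification for illustration.

```python
def is_forwarded(func_rva: int, export_dir_rva: int, export_dir_size: int) -> bool:
    """True if the export's RVA points back into the export directory itself."""
    return export_dir_rva <= func_rva < export_dir_rva + export_dir_size

def split_forwarder(text: bytes):
    """Split a forwarder string such as b"NTDLL.RtlAllocateHeap" into (dll, function)."""
    dll, _, func = text.decode("ascii").partition(".")
    return dll, func
```

A hash-based loader that hits such an entry would have to load the named DLL and resolve the function there instead of using the RVA as code.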
Posted on 2005-02-12 07:24:35 by f0dder
Hmm.. Thanks for pointing this out, I guess there might be a problem there, if even commonly used functions like HeapAlloc won't work with this method.. I always use GlobalAlloc myself because it requires less code, so I probably wouldn't have discovered it.
So I can solve this in 2 ways. Either I check for forwarded exports in the compressor and in that case load the forward target directly from ntdll, or I will have to use GetProcAddress for getting the addresses after all. I might end up doing 2 different loaders, one using hashes and LGetProcAddress, and one using the actual names and regular GetProcAddress. And then use whichever one results in a smaller executable.
Posted on 2005-02-13 06:15:32 by snq
I just came up with this. It seems a bit too long, though, and only supports alphabetic names. It's 89 bytes. But it lets you forget about forwarded names.
mov edi,FuncTable

xor eax,eax
cdq
mov ebp,readbits
u2:
push edi
mov edi,buffer
push edi
push eax
u0:
call ebp
db 4
xchg ecx,eax
jecxz next
mov bl,64
u1:
call ebp
db 5
or al,bl
mov bl,96
stosb
loop u1
jmp u0
next:
mov ebx,__imp__GetProcAddress@8
pop ecx
loop gpa
pop eax
push ecx
push eax
push [esp+16]
call [ebx]
pop edi
stosd
jmp u2
load:
call [ebx+__imp__LoadLibraryA@4-__imp__GetProcAddress@8]
pop edi
pop ecx
push eax
call ebp
db 6
jmp u2
readbits:
pop esi
lodsb
push esi
push ecx
xchg ecx,eax
rb0:
bt [data],edx
rcl eax,1
inc edx
loop rb0
pop ecx
ret
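
As far as the decoder above can be read, each name is stored as groups: a 4-bit letter count followed by that many 5-bit letters, where the first letter of a group is ORed with 64 (upper case) and the rest with 96 (lower case), and a zero count ends the name. Here is a loose Python model of that encoding. It is a sketch of one reading of the code, not a byte-exact reimplementation: it works on a plain list of bits, ignores the DLL handling, and the function names are invented.

```python
import re

def encode_name(name: str) -> list:
    """Pack a CamelCase name into bits: 4-bit group length, then 5 bits per letter."""
    groups = re.findall(r"[A-Z][a-z]*", name)
    if "".join(groups) != name or any(len(g) > 15 for g in groups):
        raise ValueError("only groups of one capital plus up to 14 lower-case letters fit")
    bits = []

    def put(value: int, width: int) -> None:
        # Most significant bit first, matching the rcl-based reader.
        bits.extend((value >> i) & 1 for i in reversed(range(width)))

    for group in groups:
        put(len(group), 4)
        for ch in group:
            put(ord(ch) & 31, 5)   # A/a -> 1 ... Z/z -> 26
    put(0, 4)                      # zero count terminates the name
    return bits

def decode_name(bits: list) -> str:
    pos = 0

    def get(width: int) -> int:
        nonlocal pos
        value = 0
        for _ in range(width):
            value = (value << 1) | bits[pos]
            pos += 1
        return value

    out = []
    while True:
        count = get(4)
        if count == 0:
            break
        base = 64                  # mov bl, 64: first letter is upper case
        for _ in range(count):
            out.append(chr(get(5) | base))
            base = 96              # mov bl, 96: the rest are lower case
    return "".join(out)
```

Under this reading, names that start with a lower-case letter or contain digits or underscores cannot be represented, which would fit the "only supports alphabetic names" remark.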
Posted on 2005-02-13 07:16:24 by Sephiroth3
Sephiroth3, I'm having a hard time understanding what your code is supposed to do ;) Also yes, it does seem a bit big.
I wrote a regular loader now that uses LoadLibrary, it's 46 bytes. The import table is compressed using aplib together with the rest of the original exe, unfortunately the aplib algo doesn't seem to work very well on plain text but I'll have to live with that.
Anyway, I just produced my first working exe! :) It's 856 bytes large at the moment and all it does is show "Hello world" and call some more functions just for testing the loader. But so far so good :)
Posted on 2005-02-13 18:16:05 by snq
I got it down to 741 bytes now and still a lot of optimization to do and a lot of unused space.
I uploaded the exe here, if anyone wants to test be my guest :) If it works it'll show a "Hello world" msgbox. It won't work under win9x (yet).

*Edit:
Just tried compressing the same exe with some other packers.. The original exe has 3 sections, data/code/import, and is 2.5k large.
FSG compresses it to 857 bytes which is nice but not as nice as 741 bytes :-D
Latest UPX didn't want to compress it, said it was uncompressable.
UPX v0.71 did want to compress it after a lot of threatening, but the result was 2173 bytes. I'm pretty sure for larger exes UPX will in most cases give better compression tho.
And finally I tested with FRP. Probably nobody has heard of this packer but it's a modified version of UPX that uses aPlib as well, made by farb-rausch. The resulting exe was 1.5k.
So, my packer beats em all, at least for ridiculously small and useless exes ;)

Edit2: Down to 721 bytes now. One "problem" tho, I still have 48 bytes of wasted space in it.. What to do, what to do.... Any ideas? I guess I should move the start of the section back by 48 bytes, as it's not 9x compatible anyway right now..
Posted on 2005-02-13 18:42:11 by snq
Did you compare with the other packers:

http://pect.y11.net/
Posted on 2005-02-14 12:56:15 by MCoder
No, not yet.
I added support now for loading functions by ordinal and tested with another exe, 6.5k containing lots of code and imports. Again it was smaller than all others. I'll do some more extensive testing once I finish the packer.. Right now I have to manually edit addresses and sizes in my loader source for each exe so it's kind of a pain in the ass to test ;)
Posted on 2005-02-14 13:45:03 by snq
I tested now with DemoThread.exe and keygen.exe. DemoThread came up a bit bigger than MEW and FSG, 12100 bytes I think it was. keygen.exe however was 3956 bytes, a #1 placement :)
Didn't test the other 2 exes yet because my code isn't ready for those kind of sizes just yet ;)

I just keep editing my posts... But after stripping some more useless tables from the exe, I got DemoThread.exe down to 11116 bytes so #1 there as well now ;)
Posted on 2005-02-14 21:01:07 by snq
my apicrc engine + example. it's not aimed at size (it's ~230 bytes), but rather elegance, so don't use it in shellcode (unless you size-hack it quite a bit ...), posted this a while back. this will work on all versions of windows, but has yet to support forwarded exports (it shouldn't be any more than a few lines ... but i'm lazy)

http://angrypanda.net/~sfeng1/apicrc.zip

module_api: 

db 'module.dll',0
__api1 dd hash1
__api2 dd hash2
dd 0


the 'table' terminates with a null dword, for time's sake, and the hashes are subsequently replaced with the addresses of the APIs, which you can use to call with a delta offset. an example is included inside apicrc.asm. use get_crc32 to obtain the crc32 hashes, the string is automatically null-terminated.

enjoy.
Posted on 2005-02-15 00:16:36 by Drocon
Thanks for posting that Drocon, always interesting to see other people's approaches.. But as you said, your code is a bit on the large side ;)
I've added back my hash loader, but it's only used for functions where it can be used and where it will save space. So it won't be used for forwarded stuff and imports by ordinal. Furthermore it is only added if it actually saves space. If an exe only has 2 imports it will only cost space so it won't be included and regular names will be used.. But if it has eg 80 imports it will save a lot of space so then it will be included and used :)
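
The "only if it saves space" decision can be sketched as a simple byte count. This is a Python illustration with invented names and a made-up loader size; it counts uncompressed bytes only, so a real packer would want to compare the compressed results, since names compress and hashes mostly do not.

```python
def names_cost(names) -> int:
    """Bytes needed to store the names as 0-terminated strings."""
    return sum(len(name) + 1 for name in names)

def hashed_cost(names, loader_bytes: int, entry_bytes: int = 4) -> int:
    """Bytes for one hash per import plus the extra hash-lookup code."""
    return loader_bytes + entry_bytes * len(names)

def hashing_pays_off(names, loader_bytes: int, entry_bytes: int = 4) -> bool:
    return hashed_cost(names, loader_bytes, entry_bytes) < names_cost(names)

if __name__ == "__main__":
    few = ["ExitProcess", "MessageBoxA"]
    # loader_bytes=40 is a placeholder, not a size taken from the thread.
    print(hashing_pays_off(few, loader_bytes=40))
```

With two imports the fixed cost of the lookup code dominates; with dozens of imports the per-name saving wins.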
Posted on 2005-02-15 07:38:40 by snq
1 byte optimization tip :)

(fasm syntax)



format PE GUI 4.0
entry start

include "d:\programy\kod\fasm\include\win32a.inc"


section ".code" code readable writeable executable


start:

mov edi,dll1

; 82 bytes

LibLoop:
push edi ; edi = dll name
call [LoadLibraryA] ; eax = base address
test eax,eax
jz NoMoreLibs

mov ecx,eax
repnz scasb ; edi = 0 terminated hash array

FuncLoop:
mov ebx,dword [edi] ; ebx = name hash
stosd ; edi += 4

test ebx,ebx ; next lib if hash==0
jz LibLoop

sub ecx,ecx
dec ecx

LMainLoop:
inc ecx ; ecx updated to next function
pushad

mov edx,dword [eax+0x3c] ; edx = PE header offset
mov edx,dword [eax+edx+0x78]; edx = export table RVA
add edx,eax ; edx = export table

mov esi,dword [edx+0x20]
add esi,eax ; esi = function name table
mov esi,dword [esi+4*ecx]
add esi,eax ; esi = function name

mov ebp,eax ; ebp = base address
add eax,dword [edx+0x1c] ; eax = function address table
add ebp,dword [eax+4*ecx] ; ebp = function address
mov dword [edi-4],ebp ; store function address

;----------------------------------------

; Calc hash into edx
; xor edx, edx
; xor eax, eax

sub eax,eax
cdq ;edx = 0

;----------------------------------------

LCalcHash:
ror edx,13
add edx,eax
lodsb
test al,al
jnz LCalcHash

cmp ebx,edx ; check hash
popad
jne LMainLoop ; try next function
jmp FuncLoop

NoMoreLibs:

ret

section ".data" data readable writeable executable

dll1 db "kernel32",0
dd 0x73E2D87E ;ExitProcess
dd 0
dll2 db "user32",0
dd 0xCAD36F3B ;ChangeDisplaySettingsA
dd 0x84454941 ;CreateWindowExA
dd 0x2B245A7A ;GetAsyncKeyState
dd 0xCC248D43 ;GetDC
dd 0xBC4F79F4 ;SetCursor
dd 0

db 0 ;terminator, no more libraries


section '.idata' import data readable writeable executable

library kernel32,"kernel32"

import kernel32,\
LoadLibraryA,"LoadLibraryA"


Important: use library names without the ".dll" extension. It'll save some bytes, and it still works under XP. I'm a 4k coder too. Together with a friend I wrote some noimport code that is only XP-compatible. I suppose you know what it does. Maybe it'll help you.

(masm syntax)




;noimport.asm (c) Northfox, rambo, reverend
;49 bytes.
;(+2 bytes - compatible version)

.386
.model flat,stdcall

salc macro
db 0d6h
endm

.code

start:

pop ecx
push ecx

@@: cmp word ptr [ecx-1],"ZM"
loopnz short @b

mov ebp,ecx

add ecx,dword ptr [ecx+60]
mov eax,dword ptr [ecx+120]
mov esi,dword ptr [ebp+eax+28]
add esi,ebp
mov eax,dword ptr [ebp+eax+32]
mov edi,dword ptr ds:[ebp+eax]
add edi,ebp

@@: salc
@e: cmp word ptr [edi],"Ac"
jz short @f

scasb
jnz short @e

lodsd
jmp short @b

@@: lodsd
add eax,ebp

;eax - GetProcAddress
;ebp - kernel32.dll

ret

end start


And what about the opengl32 and glu32 hash tables? Are you importing OpenGL by ordinal instead of by hash? I think that's a better idea, and it may even be compatible. The only disadvantage is that the Breakpoint 2005 rules forbid importing by ordinal.. :)

(fasm syntax)



; supported libraries:
;
; user | dll version | first function | address |
; ----------+---------------+-------------------+----------+
; rambo | 4.0.1379.1 | DllInitialize | 7d27bbfe |
; rambo | 4.0.1381.4 | DllInitialize | 78a4bf5e |
; vorg | 5.0.2160.1 | DllInitialize | 69408c55 |
; epsylon | 5.0.2195.6611 | DllInitialize | 69408c55 |
; ----------+---------------+-------------------+----------+
; ninja | 5.1.2600.0 | GlmfBeginGlsBlock | 5f1a7c8c |
; rambo | 5.1.2600.1106 | GlmfBeginGlsBlock | 5f1a7c8c |
; chebdo | 5.1.2600.2180 | GlmfBeginGlsBlock | 5f1aa6da |
; luks | 5.1.2600.2180 | GlmfBeginGlsBlock | 5ed1a6da |
; -> strange, but i handle it. (sp2) |


mov ebp,import_strings
mov edi,import_table+4

@glu: sub esi,esi
@@: inc esi

push esi
push ebp
call [LoadLibraryA]

push eax
call [GetProcAddress]

cmp byte [ebp],"g"
jz @nou

dec esi
jnz @not

bswap eax
sub al,5eh
jz @not
dec al
jz @not
sub edi,4

@not: inc esi
@nou: stosd

cmp si,16fh
jb @b

add ebp,9
test eax,eax
jnz @glu


optimizations are welcome! :)
Posted on 2005-03-05 18:22:53 by rambo
Ahh.. An understanding soul :D
I already changed the cdq actually but thanks anyway ;)

Unfortunately I haven't had any time to get any work done on my packer but I agree the opengl imports are important especially for 4k intros.

This is the last version of my code.. I separated the LibLoop and FuncLoop for now so I can easily leave out the hash stuff where it won't save any bytes.

I'll probably never have time to code a 4k intro anyway but I want to use my micro softsynth :) Problem is also I live in the very north of Sweden now and there are no demoparties here (and I refuse to travel a couple 1000 km for a party) so even if I make a 4k intro where am I going to release it...

Anyway, here's the code. Feel free to use it for whatever you like.
Imports are now stored in a slightly different format. The default is a 0-terminated function name. If the first byte of the name is -1, the hash is loaded from the next 4 bytes. If the first byte of the name is -2, the function is loaded by ordinal.
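
For the packer side of this format, the three entry kinds could be serialized and read back roughly as follows. This is a Python sketch with made-up helper names; the 4-byte width of the ordinal field is an assumption taken from the lodsd in the loader code rather than from the description.

```python
import struct

HASH_MARKER = 0xFF     # -1 as an unsigned byte
ORDINAL_MARKER = 0xFE  # -2 as an unsigned byte

def entry_by_name(name: str) -> bytes:
    return name.encode("ascii") + b"\0"

def entry_by_hash(name_hash: int) -> bytes:
    return bytes([HASH_MARKER]) + struct.pack("<I", name_hash)

def entry_by_ordinal(ordinal: int) -> bytes:
    return bytes([ORDINAL_MARKER]) + struct.pack("<I", ordinal)

def parse_entry(blob: bytes, pos: int):
    """Return ((kind, value), next_pos) for the entry starting at pos."""
    marker = blob[pos]
    if marker == HASH_MARKER:
        return ("hash", struct.unpack_from("<I", blob, pos + 1)[0]), pos + 5
    if marker == ORDINAL_MARKER:
        return ("ordinal", struct.unpack_from("<I", blob, pos + 1)[0]), pos + 5
    end = blob.index(b"\0", pos)
    return ("name", blob[pos:end].decode("ascii")), end + 1
```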
LoadLibs:

pop esi
.LibLoop:
push esi ; esi = dll name
call [ebp+4] ; apicall LoadLibrary
test eax, eax ; if hmodule = null, no more libs
jz short .NoMoreLibs
xchg eax, ebx ; ebx = hmodule
.SkipDllName:
lodsb
test al, al
jnz short .SkipDllName

lodsd
xchg eax, edi ; edi = function table

.FuncLoop:
; cmp byte [esi], 255
cmp byte [esi], bl ; bl = 0
pushfd
jge short .AsString
lodsb
inc al
lodsd
jz short LGetProcAddress
.AsOrdinal:
push eax
jmp short .FuncLoop2
.AsString:
push esi ; esi = func name
.FuncLoop2:
push ebx
call [ebp] ; apicall GetProcAddress
.StoreFuncAddress:
stosd ; store func address
popfd
jl short .FuncLoop
test eax, eax
jz short .LibLoop
.SkipFuncName:
lodsb
test al, al
jnz short .SkipFuncName
jmp short .FuncLoop

.NoMoreLibs:

ret ; jmp entrypoint
.End:

;----------------------------------------------------------------------------

; ebx: base address
; eax: hash
; edi: dest address table
; destroys ecx
LGetProcAddress:
xor ecx, ecx
dec ecx
.MainLoop:
inc ecx ; ecx updated to next function
pushad

xchg eax, ebx ; eax=base ebx=hash

mov edx, [eax+3Ch] ; edx = PE header offset
mov edx, [eax+edx+78h] ; edx = export table RVA
add edx, eax ; edx = export table

push eax
mov [edi], eax
add eax, [edx+20h] ; eax = function name table
mov esi, [eax+4*ecx]
pop eax
add esi, eax ; esi = function name
add eax, [edx+1Ch] ; eax = function address table
mov eax, [eax+4*ecx] ; eax = function RVA
add [edi], eax

; Calc hash into edx
xor eax, eax
cdq ; xor edx, edx
.CalcHash:
ror edx, 13
add edx, eax
lodsb
test al, al
jnz short .CalcHash

cmp ebx, edx ; check hash
popad
jne short .MainLoop ; try next function
mov eax, [edi]

jmp short LoadLibs.StoreFuncAddress
.End:
Posted on 2005-03-06 07:50:43 by snq
I don't understand one line.. :)



.FuncLoop:
; cmp byte [esi], 255
cmp byte [esi], bl ; bl = 0
pushfd
jge short .AsString
lodsb
inc al
lodsd
jz short LGetProcAddress


Why are you incrementing al? eax is overwritten with another value in the next line!

And about making a 4k without going to the party.. Here in Poland I don't have a GOOD party nearby either, but a 4k is not only you. There is also a musician and probably a 3D modeller, if you have some tricky little 3D format, so making it for their pleasure at the party is much more fun :)
Posted on 2005-03-07 02:13:42 by rambo
inc al

lodsd
jz short LGetProcAddress

eax is cleared/reassigned yes, but the flags remain intact after lodsd :) I use inc al to check if al is -1 or -2 (hash or ordinal).
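
The trick can be modelled in a couple of lines of Python. This is only an illustration of the flag logic, with an invented function name: inc al sets the zero flag exactly when al was FFh, and lodsd loads eax without touching the flags, so the jz after it still sees the result of the inc.

```python
def classify_marker(al: int) -> str:
    """Model of: lodsb / inc al / lodsd / jz LGetProcAddress."""
    al = (al + 1) & 0xFF     # inc al wraps FFh to 00h
    zero_flag = (al == 0)    # ZF is set only if al was FFh (-1)
    # lodsd would run here in the loader; it leaves ZF as it is
    return "hash" if zero_flag else "ordinal"
```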

Our musician can't code :/ I wrote a softsynth for 64k first that we have never used in an intro yet.. And like 2 years ago I wrote a micro version of it for 4k intros, but we (aardbei) got kinda lazy and inactive :(
I still really want to do a good 4k intro tho. Even if it would be my last intro ever. All I want is for it to be as good as PTCT was, that's not too much to ask is it ;)
Posted on 2005-03-07 03:11:39 by snq