Hi!
I made some assembler compression and decompression code available. It is small, very fast, and compresses quite well. I released it under the zlib license.
If the attachment does not work, it's also at: http://home19.inet.tele.dk/jibz/files/uflz20021112.zip
There is one problem, though: where is the MASM code? :o
Hehe, well, I guess the conversion can't be that complicated? :alright:
The benefit of the NASM code is that it will work with other linkers as well. The code also works with DJGPP or on Linux. E.g. you can assemble it using:
nasm -f win32 <sourcefile>
and link it into your code using MS link. But yes, it should be fairly easy to translate to MASM too :-)
Here are a few statistics in case somebody is wondering how it compares. I compared it with LZOP -1 and comrade's LZ77 implementation:
calgary corpus:
lz77: 7.8 sec -> 2,251,910
lzop: 1.0 sec -> 1,582,455
uflz: 1.5 sec -> 1,296,357
canterbury corpus:
lz77: 5.0 sec -> 1,402,493
lzop: 0.9 sec -> 1,151,925
uflz: 1.2 sec -> 959,429
gcc-2.7.1 source code:
lz77: 58.5 sec -> 17,995,196
lzop: 4.0 sec -> 11,040,001
uflz: 7.8 sec -> 8,203,678
netscape.exe (from ACT):
lz77: 6.9 sec -> 2,114,336
lzop: 1.1 sec -> 1,801,764
uflz: 1.7 sec -> 1,639,410
Impressive, Jibz, as always! I will see what I can do with the decompressor (size-wise). :)
Sounds great! .. here is a 107 byte version to get you started ;-)
Here is a new package, including the 107 byte decompressor and a 352 byte version of the compression code. I also added makefiles for Borland C++ and GCC on Linux/FreeBSD/BeOS/QNX.
Still no MASM version, though .. I might make one later :-)
Very nice, Jibz... :)
Thanks for sharing...
Does this beat aPLib?
PS: Do you have any news about the delta format we talked about a few weeks ago? :cool:
Regards,
Hi Readiosys!
It beats aPLib on compression speed, but the ratios are not as good. Below is the table again, with the addition of the results for aPLib v0.36, and my current development code (ffce) at level 1.
I am still working on the delta code. I think it's going to be quite good when I get a little more work done on it .. I'll e-mail you when I've got something more solid :-)
calgary corpus:
lz77: 7.8 sec -> 2,251,910
lzop: 1.0 sec -> 1,582,455
uflz: 1.5 sec -> 1,296,357
apl : 137.5 sec -> 1,115,349
ffce: 10.8 sec -> 1,087,644
canterbury corpus:
lz77: 5.0 sec -> 1,402,493
lzop: 0.9 sec -> 1,151,925
uflz: 1.2 sec -> 959,429
apl : 111.3 sec -> 774,661
ffce: 7.2 sec -> 763,029
gcc-2.7.1 source code:
lz77: 58.5 sec -> 17,995,196
lzop: 4.0 sec -> 11,040,001
uflz: 7.8 sec -> 8,203,678
apl : 797.5 sec -> 7,212,418
ffce: 63.7 sec -> 6,253,056
netscape.exe (from ACT):
lz77: 6.9 sec -> 2,114,336
lzop: 1.1 sec -> 1,801,764
uflz: 1.7 sec -> 1,639,410
apl : 89.2 sec -> 1,351,048
ffce: 11.0 sec -> 1,346,094
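To put "faster, but not as good a ratio" into numbers, here is a minimal C sketch that derives a speed-up factor and a size overhead from two rows of the table. Only the Calgary figures are copied in; the struct and function names (`result`, `speedup`, `size_overhead_pct`) are made up for this illustration, and since the uncompressed corpus sizes are not listed above, everything is computed relative to another compressor rather than to the original data.

```c
/* One row of the benchmark table: time in seconds, output size in bytes. */
struct result {
    const char *name;
    double seconds;
    unsigned long bytes;
};

/* Calgary corpus figures copied from the table above. */
static const struct result calgary[] = {
    { "lz77",   7.8, 2251910UL },
    { "lzop",   1.0, 1582455UL },
    { "uflz",   1.5, 1296357UL },
    { "apl",  137.5, 1115349UL },
    { "ffce",  10.8, 1087644UL },
};

/* How many times faster a finished than b. */
static double speedup(const struct result *a, const struct result *b)
{
    return b->seconds / a->seconds;
}

/* How much larger a's output is than b's, in percent. */
static double size_overhead_pct(const struct result *a, const struct result *b)
{
    return 100.0 * ((double)a->bytes / (double)b->bytes - 1.0);
}
```

For example, `speedup(&calgary[2], &calgary[3])` compares uflz against aPLib on time, and `size_overhead_pct(&calgary[2], &calgary[3])` on output size: on these figures uflz is roughly 92 times faster, with output about 16% larger.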
What are the compressors in that list?
comrade:
lz77 - comrade's LZ77 implementation
lzop - Markus F.X.J. Oberhumer's lzop file compressor
apl - Jibz's aPLib
ffce - Jibz's new unreleased supa-duppa compressor :)
uflz - Jibz's small compressor (the reason this thread was created)
...
I'm down to 92 bytes with a small change to the compression algorithm (FASM code):
uflz_depack_asm_tiny:
    pushad
    mov ebp, esp
    mov esi, [ebp + (8+1)*4] ; source
    mov edi, [ebp + (8+2)*4] ; dest
    mov ebx, [ebp + (8+3)*4] ; length
    ; cld ; not needed on my system
    xor eax, eax
    add ebx, edi
.literal:
    movsb
.nexttag:
    cmp edi, ebx
    jnc done
    call word getbit
    jnc .literal
    call word getgamma ; high pos
    lea edx, [ecx-2]
    call word getgamma ; len
    shl edx, 8
    mov dl, [esi] ; low pos
    inc esi
    not edx ;= inc edx | neg edx
    push esi
    lea esi, [edi + edx]
    rep movsb
    movsb
    movsb
    pop esi
    jmp .nexttag

getbit: add eax, eax
    jne .A
    lodsd
    add eax, eax
    inc eax
.A: ret

getgamma:
    xor ecx, ecx
    inc ecx
.A: call word getbit
    adc ecx, ecx
    call word getbit
    jc .A
    ret

done:
    sub edi, [ebp + (8+2)*4]
    mov [ebp + (8+3)*4], edi
    popad
    ret 8

; Usage:
;   push length
;   push destination
;   push source
;   call uflz_depack_asm_tiny
;
;   pop unpacked_length

; Changes to compression:
; - reverse order of matchpos, matchlen store
;   (store matchlen, then matchpos)
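For anyone who finds the flow easier to follow in C, here is a rough, unverified transliteration of the FASM routine above. It is a sketch of how I read the listing, not the official uflz format: `depack_sketch` and `bitreader` are names invented here, the stream layout is inferred purely from the assembly, and like the assembly it does no bounds checking on either buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit reader state: next input byte and the 32-bit tag (EAX in the listing). */
typedef struct {
    const uint8_t *src;
    uint32_t tag;
} bitreader;

/* Mirrors getbit: shift the top bit out of the tag; when the tag runs
   empty, load the next little-endian dword and set bit 0 as a sentinel. */
static unsigned getbit(bitreader *br)
{
    unsigned bit = br->tag >> 31;
    br->tag <<= 1;
    if (br->tag == 0) {
        uint32_t w = (uint32_t)br->src[0]
                   | ((uint32_t)br->src[1] << 8)
                   | ((uint32_t)br->src[2] << 16)
                   | ((uint32_t)br->src[3] << 24);
        br->src += 4;
        bit = w >> 31;
        br->tag = (w << 1) | 1u;
    }
    return bit;
}

/* Mirrors getgamma: start at 1, then read (data bit, continue bit) pairs.
   The smallest value this can return is 2. */
static uint32_t getgamma(bitreader *br)
{
    uint32_t v = 1;
    do {
        v = (v << 1) | getbit(br);
    } while (getbit(br));
    return v;
}

/* Unpacks until `length` output bytes have been written; returns the
   number of bytes actually produced. Assumes length >= 1, because the
   first byte is always copied as a literal. */
size_t depack_sketch(const uint8_t *src, uint8_t *dst, size_t length)
{
    bitreader br = { src, 0 };
    uint8_t *out = dst;
    uint8_t *end = dst + length;

    assert(length > 0);
    *out++ = *br.src++;                       /* first byte is a literal */

    while (out < end) {
        if (!getbit(&br)) {                   /* 0 bit: literal byte */
            *out++ = *br.src++;
            continue;
        }
        uint32_t hi  = getgamma(&br) - 2;     /* high part of position */
        uint32_t len = getgamma(&br) + 2;     /* rep movsb plus two movsb */
        uint32_t off = ((hi << 8) | *br.src++) + 1;   /* "not edx" step */
        const uint8_t *match = out - off;
        while (len--)
            *out++ = *match++;                /* bytewise, overlap allowed */
    }
    return (size_t)(out - dst);
}
```

As a quick hand-built check (my own test stream, not output from the real compressor), the eight bytes `'a', 0x00, 0x00, 0x00, 0x22, 'b', 'c', 0x02` with a length of 9 unpack to `abcabcabc`: three literals followed by one match of length 6 at distance 3.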
I haven't had time to test this, yet.
Thanks for the nice optimisations, bitRAKE! I especially liked the way you got rid of the 0x8000000 :-)
I was not able to get the 'call word label' trick to work -- the program crashed (Win32).
Attached is my latest version, compression = 706/342 bytes, and decompression = 146/99 bytes. I removed the uninitialised data, and now use stack variables and a user-supplied workmem instead.
I was playing around with converting the code to MASM style, from which I learnt 2 things:
- @@: style labels are not local to macros
- when assembling to coff format, ml.exe does something for each element in an uninitialised array, which means that assembling:
.386
.model flat,stdcall
.data?
buffer dd 4*65536 dup (?) ; 1 mb of workmem
end
takes 1 minute. Assembling to omf format does not have this problem.
Attached is the latest version, compression = 646/328 bytes, and decompression = 146/99 bytes. I added a conversion to 16-bit 8086 assembler (which I guess is of little interest to the Win32ASM community ;-).