FINAL UPDATE:This file compressor/decompressor is now a working program. Feel free to use or change it as need. Just be sure to tell me so I can see what you did.

Posted on 2009-11-06 23:08:28 by leftovas17
I updated this in the original post. In case you were wondering... anyone...
Posted on 2009-11-10 05:24:48 by leftovas17
You got it right, almost.
1) In compressFile you can jump right to mov ecx, currentPrefix (no need to update nextCodeWord because it wasn't changed).
2) In decompressFile you need to check for dictionary overflow before any access do it (mov prefix, ebx is an example).

The program is quite flexible regarding dictionary/codeword sizes, at least while you keep them in sync:
a) compressFile/decompressFile should not add codewords past dictionary end (nextCodeWord <= LENGTHOF prefix, which is 8192 now but can be any value >=256), and
b) codeword should be long enough to hold all possible values (13 for 0...8191, you are right; 16 was used to simplify debugging: compressed stream is much more readable that way).

Will you try to modify it for variable-length codewords?

By the way, it still lacks meaningful comments (will you be able to explain how it works, say, week later? ;-). Reduce the clutter by removing unnecessary code/data too.
Posted on 2009-11-10 05:58:30 by baldr
yea, i figured that is where i needed to compare in decompressFile.. my only problem is knowing where to jump to if its equal.

deCompressFile Proc
push edi
push ebx
push esi
mov esi,0
call initModel
mov nBits, 16
call readBits
mov oldFileSize, eax
mov eax, codeWordLength
mov nBits, eax

call readBits ;calls the readBits proc
cmp edx,0 ;checks to see if anything was read
jz done ;if not: jump to done label
mov esi, nextCodeWord
cmp currentPrefix,-1
je here
mov ebx, currentPrefix
cmp      nextCodeWord,8192
        je            to where?
mov prefix, ebx
mov currentPrefix,eax
cmp eax, esi
jne here2
mov dl,suffix
mov suffix,dl
inc nextCodeWord
mov ecx,eax
mov currentPrefix,eax
call outputString
mov suffix,dl
sub oldFileSize,eax
jne RepeatMe
pop edi
pop ebx
pop esi
call flushBits

deCompressFile ENDP
Posted on 2009-11-10 17:21:28 by leftovas17
That's easy part: cmp currentPrefix, -1 checks whether we about to add already existing codeword (for literal byte, it's (-1, byte) prefix/suffix). Check for overflow should occur before that, for example right after mov esi, nextCodeWord. Conditional jump should proceed to label here: because codeword is not added if any one of these checks (nextCodeWord < 8192 && currentPrefix != -1) fail.
Posted on 2009-11-10 18:36:09 by baldr
That is exactly what I thought. But it does not work sadly. It only outputs part of the original document. a 609KB document is only 33KB now. here is the decompressMe proc as i have it. and i have attached the source.

deCompressFile Proc
push edi ;preserve edi
push ebx ;preserve ebx
push esi ;preserve esi
mov esi,0 ;make esi 0
call initModel ;initialize the model
mov nBits, 16 ;move into nBits 16
call readBits ;read 16 bits
mov oldFileSize, eax ;put eax into oldFileSize
mov eax, codeWordLength ;put codeWordLength into eax
mov nBits, eax ;put codeWordLength into nBits

call readBits ;calls the readBits proc
cmp edx,0 ;checks to see if anything was read
jz done ;if not: jump to done label
mov esi, nextCodeWord ;put the value of the nextCodeWord into esi
cmp nextCodeWord, 8192 ;compare nextCodeWord to 8192
je here ;if above: skip adding new codeWord
cmp currentPrefix,-1 ;compare the currentPrefix to -1
je here ;if equal: jump the the here label
mov ebx, currentPrefix ;else: add currentPrefix to ebx
mov prefix, ebx ;put ebx into the prefix at esi*4
mov currentPrefix,eax ;move eax into currentPrefix
cmp eax, esi ;compare esi with eax
jne here2 ;if not equal: jump to here2
mov dl,suffix ;else: put the suffix at esi-1 into dl
mov suffix,dl ;and put dl into the suffix at esi
inc nextCodeWord ;increment the nextCodeWord
mov ecx,eax ;put eax into ecx
mov currentPrefix,eax ;put eax into currentPrefix
call outputString ;call outputString
mov suffix,dl ;move dl into suffix at esi
sub oldFileSize,eax ;sub eax from the oldFileSize
jne RepeatMe ;jump to repeatMe if oldFileSize isnt complete
Done: ;done label
pop edi ;preserve edi
pop ebx ;preserve ebx
pop esi ;preserve esi
call flushBits ;call flushBits to print any stragglers
ret ;return to user

deCompressFile ENDP
Posted on 2009-11-10 23:52:13 by leftovas17
From initModel: mov codeWordLength, 13
From decompressFile: mov nBits, 16

Do you see discrepancy here? compressFile writes 13-bit codewords, and decompressFile expects them to be 16-bit.

Probably readBits should be rewritten to take one parameter: bit count to read. Passing parameters through global variables is not a good practice. Then readBits could be called this way:

mov ecx, 8
call readBits; read one byte from (uncompressed) input stream
mov ecx, codeWordLength
call readBits; read one codeword from (compressed) input stream
Posted on 2009-11-11 03:52:39 by baldr
no, it makes nBits 16 only for the beginning of decompressFile in order to read the file size from the first 16bits that compressFile writes to the first 16bits. Then it puts codeWordLength into eax, and eax into nBits. So no discrepancy on that as far as I can tell.
Posted on 2009-11-11 12:44:56 by leftovas17
Agreed, I was too quick to answer. The problem was similar: you store uncompressed file size as 16-bit value, hence 65535 limit (and size is written as size%65536). Exit condition for decompressing loop will fail almost for every file (unless output string for codeword ends exactly at output file offset size%65536).
Posted on 2009-11-11 14:02:07 by baldr
Im not really sure what you mean. Would you mind trying to explain it in another way?
Posted on 2009-11-11 15:08:28 by leftovas17
Okay, so I went on a hunch of what I assumed you meant and changed:
sub oldFileSize,eax ;sub eax from the oldFileSize
jne RepeatMe ;jump to repeatMe if oldFileSize isnt complete

cmp eax,oldFileSize ;sub eax from the oldFileSize
jb RepeatMe ;jump to repeatMe if oldFileSize isnt complete

and i get the full file back when decompressed, but it still crashes. Getting closer though!
Posted on 2009-11-11 15:35:21 by leftovas17
I've changed mov edx, 16 to mov edx, 32 in compressFile and mov nBits, 16 to mov nBits, 32 in decompressFile. This variant successfully compressed 800 kB file and decompressed it.

Another way to detect end of compressed stream is to designate it with reserved codeword (256, for example).
Posted on 2009-11-11 16:28:00 by baldr
You read my mind. I was sitting in class today looking over a printout of the program and I saw that too. I couldn't figure out why size mattered, and then i thought maybe it was not eax that was wrong, but the oldFileSize... Changing it to 32 worked perfectly. Thank you for all of you help so far. 

As for variable-length codewords. I think that is too far out of my reach at this point. I have submitted this as my project, so anyone who wants to use this or add on to this can now. It will not be used as homework! If you do use/change it, just let me know so I can see what/how you did to it.

Thank You,
Posted on 2009-11-11 21:13:16 by leftovas17