Hello all,

I have a question about decoding streams of Huffman codes. I want to know what is the best way to go about doing it (specifically size optimization).

My initial attempt works something like this:

I'll use the example from RFC 1951 for convenience sake (b/c I'm interested in the deflate algorithm). To the left, a simple binary tree, to the right the alphabet and corresponding Huffman codes.

So the stream of codes for ABCD is: 001011010

To decode this stream I would first construct a table of packed DWORDs representing the nodes of the tree. The low-word representing going left down the tree and the high-word representing going right down the tree. Each word value either being the symbol or a pointer to another node (I use sign bit to distinguish between them). I'll use this notation for clarity:

So the table I would construct would be:

HuffmanTable: [^node1, B]

Decoding would work like this:

-Start with node 0 (i.e., the first DWORD in the table)

@@:

-Read a bit

-If bit was 0, load up the loword. Else the hiword

-If loaded value was a symbol, we're done extracting this Huffman code

-Else, go to the node indicated in the loaded value and jmp @B

Or, in assembly language:

So any comments or criticisms are appreciated. Is this a bad way of going about it? I kinda like it, because it should be reasonably extensible (any alphabet involving 15 bits or less will work) and it seems pretty compact (especially if you keep ebx,esi,ecx loaded with appropriate values in a loop). But if I'm completely off base let me know.

Thanks in advance

--Chorus

I have a question about decoding streams of Huffman codes. I want to know what is the best way to go about doing it (specifically size optimization).

My initial attempt works something like this:

I'll use the example from RFC 1951 for convenience sake (b/c I'm interested in the deflate algorithm). To the left, a simple binary tree, to the right the alphabet and corresponding Huffman codes.

```
```

/\ Symbol Code

0 1 ------ ----

/ \ A 00

/\ B B 1

0 1 C 011

/ \ D 010

A /\

0 1

/ \

D C

So the stream of codes for ABCD is: 001011010

To decode this stream I would first construct a table of packed DWORDs representing the nodes of the tree. The low-word representing going left down the tree and the high-word representing going right down the tree. Each word value either being the symbol or a pointer to another node (I use sign bit to distinguish between them). I'll use this notation for clarity:

So the table I would construct would be:

HuffmanTable: [^node1, B]

Decoding would work like this:

-Start with node 0 (i.e., the first DWORD in the table)

@@:

-Read a bit

-If bit was 0, load up the loword. Else the hiword

-If loaded value was a symbol, we're done extracting this Huffman code

-Else, go to the node indicated in the loaded value and jmp @B

Or, in assembly language:

```
```

-------------------------------------------------------------------------------

DecodeHuffman PROC uses ebx esi ecx lpBitstream,lpHuffmanTable:DWORD

mov ebx,lpBitstream ;ebx=pointer to Bitstream base

mov esi,lpHuffmanTable ;esi=pointer to HuffmanTable

mov ecx,nBitIndex ;ecx is bit count into stream where we

;will begin decoding

xor eax,eax ;eax is going to be the node Init=0

@@:

bt [ebx],ecx ;test the bit if it was 0 or 1

adc eax,eax ;get 2*node index + bit value

inc ecx ;adjust ecx for next bit

movzx eax,WORD PTR [esi+2*eax] ;read word in HuffmanTable for this node

; note we've already double eax above

test eax,8000h ;I use sign bit to signal symbol

jz @B ;If not a symbol, it's a node index. jmp

mov nBitIndex,ecx ;Save our place in the bitstream

and eax,7FFFh ;Only want bottom 15 bits for symbol

ret

DecodeHuffman endp

-------------------------------------------------------------------------------

So any comments or criticisms are appreciated. Is this a bad way of going about it? I kinda like it, because it should be reasonably extensible (any alphabet involving 15 bits or less will work) and it seems pretty compact (especially if you keep ebx,esi,ecx loaded with appropriate values in a loop). But if I'm completely off base let me know.

Thanks in advance

--Chorus

Here are my thoughts on huffman from a while ago:

http://www.asmcommunity.net/board/index.php?topic=4183

I implemented it in my PNG decoder.

Thomas

http://www.asmcommunity.net/board/index.php?topic=4183

I implemented it in my PNG decoder.

Thomas