Hello all,
I have a question about decoding streams of Huffman codes. I want to know what is the best way to go about doing it (specifically size optimization).

My initial attempt works something like this:

I'll use the example from RFC 1951 for convenience sake (b/c I'm interested in the deflate algorithm). To the left, a simple binary tree, to the right the alphabet and corresponding Huffman codes.

/\ Symbol Code
0 1 ------ ----
/ \ A 00
/\ B B 1
0 1 C 011
/ \ D 010
A /\
0 1
/ \

So the stream of codes for ABCD is: 001011010

To decode this stream I would first construct a table of packed DWORDs representing the nodes of the tree. The low-word representing going left down the tree and the high-word representing going right down the tree. Each word value either being the symbol or a pointer to another node (I use sign bit to distinguish between them). I'll use this notation for clarity:

So the table I would construct would be:

HuffmanTable: [^node1, B]

Decoding would work like this:
-Start with node 0 (i.e., the first DWORD in the table)
-Read a bit
-If bit was 0, load up the loword. Else the hiword
-If loaded value was a symbol, we're done extracting this Huffman code
-Else, go to the node indicated in the loaded value and jmp @B

Or, in assembly language:

DecodeHuffman PROC uses ebx esi ecx lpBitstream,lpHuffmanTable:DWORD
mov ebx,lpBitstream ;ebx=pointer to Bitstream base
mov esi,lpHuffmanTable ;esi=pointer to HuffmanTable
mov ecx,nBitIndex ;ecx is bit count into stream where we
;will begin decoding
xor eax,eax ;eax is going to be the node Init=0
bt [ebx],ecx ;test the bit if it was 0 or 1
adc eax,eax ;get 2*node index + bit value
inc ecx ;adjust ecx for next bit
movzx eax,WORD PTR [esi+2*eax] ;read word in HuffmanTable for this node
; note we've already double eax above
test eax,8000h ;I use sign bit to signal symbol
jz @B ;If not a symbol, it's a node index. jmp
mov nBitIndex,ecx ;Save our place in the bitstream
and eax,7FFFh ;Only want bottom 15 bits for symbol
DecodeHuffman endp

So any comments or criticisms are appreciated. Is this a bad way of going about it? I kinda like it, because it should be reasonably extensible (any alphabet involving 15 bits or less will work) and it seems pretty compact (especially if you keep ebx,esi,ecx loaded with appropriate values in a loop). But if I'm completely off base let me know.

Thanks in advance

Posted on 2003-06-14 19:33:09 by chorus
Here are my thoughts on huffman from a while ago:

I implemented it in my PNG decoder.

Posted on 2003-06-15 04:06:11 by Thomas