I've recently decided to revisit a topic from school - Huffman compression - with my eye on writing a PNG decoder/encoder. Yes, I'm aware that Thomas has already written a decoder (http://www.asmcommunity.net/board/index.php?topic=4183.0), but I'd rather do it myself.

I have no problem understanding the process of creating the binary tree and using weighted values to create a variable length bit string for each symbol. The problem I have is how to store that tree in the compressed file. Every source I have only goes into creating the tree and using it to compress/decompress. Is there a general way to represent a Huffman tree in the file?

The ideas I've had are:
- Store the symbol along with its frequency of use and then recreate the tree during decompression.
- Store the symbol along with its bit string. I have no idea how this would even work.
- Store a binary representation of the tree with symbols and pointers to the nodes.

I hope someone can shed some light on this or point me to a document that actually goes into the detailed implementation.

Posted on 2010-01-12 09:09:50 by Sparafusile

You may consult RFC 1951 for explanation of Huffman tree storage used in Deflate. Bzip2's wiki appears to contain enough information to understand delta encoding of canonical Huffman codes used in it. Google rules! ;-)
Posted on 2010-01-12 13:42:34 by baldr