Huffman compression in C++ (Summer project)
Well, I decided to post progress that I make on my project ...
One or a few things to note is that all code is in C++ and that it is on paper, so I have no clue whether it will work at all, although I think it should.
In this thread, I will not post any code at all. ("It's MINE!!! ALL MINE!!!" - Daffy Duck :p) I will however, post which parts of the projects I have completed and what they do ^^.
Ok, 1st off:
Wrote the "BitArray" to hold an array of bits (uses an array of chars, and stores 8 bits in each char). There will be a large array in the end holding the entire coded string which will be very easy to print to file and then read if needed (for decompression).
There will be an array of smaller ones for each char, so that when the tree is created, you only have to find a char to encode once in the tree. There will be an array of the smaller bit arrays.
2nd:
Wrote the node class for the huffman tree, hold pointers to left and right, current weight, byte value (character).
3rd:
outlined process for compression and decompression via flow charts.
4th:
wrote a sorting function utilizing linear sort, I know it is very inefficient. I might change it later to tree or heap sort. Most likely heap, since it would be faster.
To DO:
- remove_dead function to remove all 0 weights from the array of nodes.
- combine function to combine the first 2 elements in the array of nodes under a new parent node.
- Use a better sorting algorithm*
- give BitArray class a new method which accepts another BitArray and adds the value of the argument (the bit string) to itself.
- give BitArray stack/queue behavior (push, pop, remove_front) to add in the previous and for searching (since the tree traversal will be recursive, push and pop is needed).
- IMPORTANT!!! Re-read the K&R book on how to add CLI options/arguments :)
That's it for now :).
When I finish this project, if it doesn't give good compression (comparable to ZIP/RAR/ACE), I might release the source code, if it does good, then I won't for obvious reasons :). (I dunno what they are :p)
EDIT: questions/comments/complaints/suggestions welcome :).
EDIT2: the way I am going to have my util is it will only be able to handle a single file ... which is fine ... but for folders and stuff, I will write "wrapper" scripts (probably in perl, maybe another C/C++ program) ... the idea is that the compression program does just that and nothing else.