|
-
Mako Shark
Huffman compression in C++ (Summer project)
Well, I decided to post progress that I make on my project ...
One or a few things to note is that all code is in C++ and that it is on paper, so I have no clue whether it will work at all, although I think it should.
In this thread, I will not post any code at all. ("It's MINE!!! ALL MINE!!!" - Daffy Duck ) I will however, post which parts of the projects I have completed and what they do ^^.
Ok, 1st off:
Wrote the "BitArray" to hold an array of bits (uses an array of chars, and stores 8 bits in each char). There will be a large array in the end holding the entire coded string which will be very easy to print to file and then read if needed (for decompression).
There will be an array of smaller ones for each char, so that when the tree is created, you only have to find a char to encode once in the tree. There will be an array of the smaller bit arrays.
2nd:
Wrote the node class for the huffman tree, hold pointers to left and right, current weight, byte value (character).
3rd:
outlined process for compression and decompression via flow charts.
4th:
wrote a sorting function utilizing linear sort, I know it is very inefficient. I might change it later to tree or heap sort. Most likely heap, since it would be faster.
To DO:
- remove_dead function to remove all 0 weights from the array of nodes.
- combine function to combine the first 2 elements in the array of nodes under a new parent node.
- Use a better sorting algorithm*
- give BitArray class a new method which accepts another BitArray and adds the value of the argument (the bit string) to itself.
- give BitArray stack/queue behavior (push, pop, remove_front) to add in the previous and for searching (since the tree traversal will be recursive, push and pop is needed).
- IMPORTANT!!! Re-read the K&R book on how to add CLI options/arguments 
That's it for now .
When I finish this project, if it doesn't give good compression (comparable to ZIP/RAR/ACE), I might release the source code, if it does good, then I won't for obvious reasons . (I dunno what they are )
EDIT: questions/comments/complaints/suggestions welcome .
EDIT2: the way I am going to have my util is it will only be able to handle a single file ... which is fine ... but for folders and stuff, I will write "wrapper" scripts (probably in perl, maybe another C/C++ program) ... the idea is that the compression program does just that and nothing else.
Last edited by slavik; 08-12-2005 at 11:01 PM.
Activation? What activation?
 Originally Posted by Geekkit (from ubuntu forums regarding 'goto' statement)
Yep it sure does. So does crack cocaine. Existence is not a valid endorsement for being acceptable.
 Originally Posted by Linus Torvalds
Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it 
-
Mako Shark
well, didn't do it over summer, but over the past weekend I have advanced in leaps and bounds 
I got it to the point where for encidng, it just has to output the encoded file in a smart way ... so far it can analyze the file it is given and tell you the file size, encoded file size, compression ratio in % (encoded size / original file size)*100, and how much space you can save ...
for file I/O I (will) use C stuff and for other stuff I will use C++ (since it has vector which is nice to work with and the sort algorithm which I don't have to code either).
Last edited by slavik; 10-31-2005 at 10:03 PM.
Activation? What activation?
 Originally Posted by Geekkit (from ubuntu forums regarding 'goto' statement)
Yep it sure does. So does crack cocaine. Existence is not a valid endorsement for being acceptable.
 Originally Posted by Linus Torvalds
Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it 
-
hi;
forgive my ignorance in all programming languages - and i'm interested in trying to follow your progress, but for those of us less enlightened, could you plz provide a brief synopsis of what/why you are doing this....and the benefits ?
Thanks - and hope it works out for ya (chi-ching )
Delete the Electoral College - Support
www.NationalPopularVote.com
"The world according to DRM Bozos"
I am a consumer, I'll buy anything
I am a sheep, I am cattle, I follow the herd
I am ignorant, a dumbass, and I am a bozo...
I am the epitome of the 'rank and file'
I am your next door neighbor
I am 95% of American Consumers
I will consume you
- If the light in your head hasn't come on yet,
I suggest you go get a new bulb!
-
Hammerhead Shark
 Originally Posted by slavik
When I finish this project, if it doesn't give good compression (comparable to ZIP/RAR/ACE), I might release the source code, if it does good, then I won't for obvious reasons  . (I dunno what they are  )
No offense, but I see that as kind of a backward attitude toward the issue. I mean, if it does work well, won't it be more useful to publish it so other people can improve on it? Do people really pay for general compression software anymore? There are already algorithms out there that are very widespread and very open that work very, very well. See 7zip, which is one of the most popular projects on SourceForge. Not to mention all the GNU utilities...I'm not trying to get down on you, I'm just trying to figure out the motivation behind that particular part.
Many existing compression schemes are built with Huffman encoding as part of them. What kind of improvements does your implementation offer (if you don't mind the question)? How are the results for your compression testing so far?
As far as why he would do it, I'd take the liberty of assuming it has something to do with the phrase "because he can". 
For references on Huffman, see here and here.
Core 2 Duo 6750
Antec P182
Abit IP35 Pro
4 GB DDR2 800 RAM
Asus Xonar D2
Panasonic SA-XR55 / Audio Technica ATH-A900
-
Mako Shark
actually, I changed that stance and will open the source after I am done with it ...
as for compression testing my program thinks it can compress a 3.3GB ISO DVD image to 123MB in 10 minutes ...
this week(end) I hope to write 2 more classes that will buffer the input and output.
the reason "why" I am doing this is to learn how to implement the huffman coding algorithm so then I have something "real" to show for my code ...
the bzip2 project has only 1 main author, too. and bzip2 is considered to be the best available ...
btw, winrar would take over 1 hour to compress the same DVD ISO image ... I wasn't willing to wait.
And my implementation so far has a limit of length being 2^32 bytes long, which is 4GB.
edit: after I am done with this, I plan on working to implement arithmetic coding (claimed to be slower but better compression than huffman).
edit2: here's a "log" of a run on a 3.3GB DVD ISO image
D:\Dev-Cpp\compression>huffman.exe F:\KNOPPIX_V4.0.2DVD-2005-09-23-EN.iso
Reading F:\KNOPPIX_V4.0.2DVD-2005-09-23-EN.iso
File size = 3344640000 bytes
Encoded size = 123414528 bytes
Compression ratio = 3.68992% ((encoded file size)/(original file size))*100%
Space Saved = 3221225472 bytes
the program is available at http://bcacm.org/~slavik/huffman.exe
it doesn't encode yet, since it lacks the file output functionality, but otherwise it's pretty much done for encoding ...
Last edited by slavik; 11-01-2005 at 10:04 AM.
Activation? What activation?
 Originally Posted by Geekkit (from ubuntu forums regarding 'goto' statement)
Yep it sure does. So does crack cocaine. Existence is not a valid endorsement for being acceptable.
 Originally Posted by Linus Torvalds
Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it 
-
Mako Shark
today, I hit a milestone 
I wrote the output part and the decode part ... although it decodes a file right after encoding, it was done this way (very dirty right now) to make sure that it can decode the files it encodes ...
and it decodes them properly ... 
to do:
-clean up the code
-make decode and encode separate functions
-allow encode/decode command from command line.
Activation? What activation?
 Originally Posted by Geekkit (from ubuntu forums regarding 'goto' statement)
Yep it sure does. So does crack cocaine. Existence is not a valid endorsement for being acceptable.
 Originally Posted by Linus Torvalds
Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it 
-
bravo! bravo!....
what i find really interesting is the knoppix dvd as a test. The 3.3GB on that DVD usually has about 8GB + of data /apps since it's already compressed, it easily fits on a DVD-5 (4.7GB, actual 4.3). What am i misunderstanding ? -- and is this going to be for x86 ?
excellent work - keep on keepin' on
Last edited by I4one; 11-01-2005 at 10:02 PM.
Delete the Electoral College - Support
www.NationalPopularVote.com
"The world according to DRM Bozos"
I am a consumer, I'll buy anything
I am a sheep, I am cattle, I follow the herd
I am ignorant, a dumbass, and I am a bozo...
I am the epitome of the 'rank and file'
I am your next door neighbor
I am 95% of American Consumers
I will consume you
- If the light in your head hasn't come on yet,
I suggest you go get a new bulb!
-
Mako Shark
my program will be for anything that has a C++ compiler for it ...
since I compile with mingw (g++ port for windows) ... anything that g++ runs on will work 
this is an earlier version that can analyse the file and tell you how many bytes it would need for the compressed version ... of course, the header is not being included, which would be upto 1.5KB
http://bcacm.org/~slavik/huffman.exe
usage ...
the executable is for win32 only ...
to use, open up command promt and go to the directory where you have it, then enter the name "huffman.exe" without quotes and then a space and the file name or path to a file you want to test it on. if the filepath contains spaces put it in quotes ...
you can also specify the full path to the program ...
examples:
huffman.exe C:\foo\foo.bar
huffman.exe "C:\blah glah\someother.dat"
C:\huffman.exe "D:\My Documents\docfile.odt"
"C:\foo blah\huffman.exe" "D:\Your Documents\someother.doc"
Last edited by slavik; 11-02-2005 at 12:39 AM.
Activation? What activation?
 Originally Posted by Geekkit (from ubuntu forums regarding 'goto' statement)
Yep it sure does. So does crack cocaine. Existence is not a valid endorsement for being acceptable.
 Originally Posted by Linus Torvalds
Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it 
-
Richard M. Nixon '08
Very cool. Any chance you'll increase the maximum filesize, considering that most DVDs are just a bit over 2^32 bytes.
-
some results, when testing on a typical DVD on E: drive;
the 1st instance is on a DIR itself - 2nd the Drive itself - 3rd is a file within a folder
Code:
C:\WINDOWS\Desktop>huffman.exe E:\video_ts
Problem with the file
You either didn't specify the file pathname properly or
I do not have read permissions to it
C:\WINDOWS\Desktop>huffman.exe E:\
Problem with the file
You either didn't specify the file pathname properly or
I do not have read permissions to it
C:\WINDOWS\Desktop>huffman.exe E:\Video_TS\vts_01_1.vob
Reading E:\Video_TS\vts_01_1.vob
File size = 0 bytes
Encoded size = 0 bytes
Compression ratio = -1.#IND% ((encoded file size)/(original file size))*100%
Space Saved = 0 bytes
Last edited by I4one; 11-02-2005 at 08:05 AM.
Delete the Electoral College - Support
www.NationalPopularVote.com
"The world according to DRM Bozos"
I am a consumer, I'll buy anything
I am a sheep, I am cattle, I follow the herd
I am ignorant, a dumbass, and I am a bozo...
I am the epitome of the 'rank and file'
I am your next door neighbor
I am 95% of American Consumers
I will consume you
- If the light in your head hasn't come on yet,
I suggest you go get a new bulb!
-
Mako Shark
well, the program accepts only files (no directories at this point in time) ... and how big is the VOB?
that is an interesting problem ... I4one
As for increasing the size, it WILL be done later and backward compatability won't be broken ...
when the larger file size is supported though, the size of the header will increase ... in theory, I could make my own 5byte data type.
EDIT: If you want, go to http://bcacm.org/forums/index.php and register there ... and keep an eye on the "Huffman Project" ... that is where most of the updates will be.
Last edited by slavik; 11-02-2005 at 10:06 AM.
Activation? What activation?
 Originally Posted by Geekkit (from ubuntu forums regarding 'goto' statement)
Yep it sure does. So does crack cocaine. Existence is not a valid endorsement for being acceptable.
 Originally Posted by Linus Torvalds
Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it 
-
Richard M. Nixon '08
I tested this on a dumped GBA rom-image.
2199 - Gunstar Super Heroes (u)(Trashman).gba
8192kb
I got 84.97. I guess the compression gets better at higher file sizes.
-
 Originally Posted by slavik
well, the program accepts only files (no directories at this point in time) ... and how big is the VOB?
that is an interesting problem ... I4one
.vob tested = 1,048,574 KB - ~1 GB
but obviously it's a DVD, and Read Only (Archive attribute is set also - hmmm)...yet you're test of knoppix was very similar -- i will try some other files, and post results when i'm not drinking 
btw - this is all Fat32, and (CDFS) and UDF actually for the DVD;
http://i12.photobucket.com/albums/a2...asablanca2.png
Delete the Electoral College - Support
www.NationalPopularVote.com
"The world according to DRM Bozos"
I am a consumer, I'll buy anything
I am a sheep, I am cattle, I follow the herd
I am ignorant, a dumbass, and I am a bozo...
I am the epitome of the 'rank and file'
I am your next door neighbor
I am 95% of American Consumers
I will consume you
- If the light in your head hasn't come on yet,
I suggest you go get a new bulb!
-
Mako Shark
well, if you give it a file over 4GB, it won't complain, but the size display for the file will be wrong ...
Also, my tests had the file on the hard drive ...
btw, it is possible to give an input to my program that will not be compressable, only thing is that the usefulness of such a file to anyone else is rather low ... there is also a way to give my program an input of 2KB that it outputs a file of 8KB ... I didn't actually test it but calculated it on paper ... so far, it seems that there is a problem with my buffered output ...
edit: fixed ... forgot to tell it to open the file for binary output :P (bytes are interpreted as bytes, not characters)
edit2: this weekend, I hope to get something working that will encode a file with the proper header and decode it with a properly ...
edit3: still not fixed, I think ...
Last edited by slavik; 11-02-2005 at 06:54 PM.
Activation? What activation?
 Originally Posted by Geekkit (from ubuntu forums regarding 'goto' statement)
Yep it sure does. So does crack cocaine. Existence is not a valid endorsement for being acceptable.
 Originally Posted by Linus Torvalds
Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it 
-
here's some more results;
* 1st using a very small CDA file on a burnt CD
* 2nd using a d/l WinXP SP2 package (on local HDD)
Code:
C:\WINDOWS\Desktop>huffman.exe "E:\track01.cda
Reading E:\track01.cda
File size = 44 bytes
Encoded size = 21 bytes
Compression ratio = 47.7273% ((encoded file size)/(original file size))*100%
Space Saved = 23 bytes
C:\WINDOWS\Desktop>huffman.exe "D:\XP Install\WindowsXP-KB835935-SP2-ENU.exe"
Reading D:\XP Install\WindowsXP-KB835935-SP2-ENU.exe
File size = 278927592 bytes
Encoded size = 278927592 bytes
Compression ratio = 100% ((encoded file size)/(original file size))*100%
Space Saved = 0 bytes
Last edited by I4one; 11-03-2005 at 10:12 AM.
Delete the Electoral College - Support
www.NationalPopularVote.com
"The world according to DRM Bozos"
I am a consumer, I'll buy anything
I am a sheep, I am cattle, I follow the herd
I am ignorant, a dumbass, and I am a bozo...
I am the epitome of the 'rank and file'
I am your next door neighbor
I am 95% of American Consumers
I will consume you
- If the light in your head hasn't come on yet,
I suggest you go get a new bulb!
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
|