CSC311-003 Data Structures, Fall 1996
Due: Monday, November 11, 1996, 11:59pm (midnight)
Introduction
- In this project we will manipulate a non-trivial data structure in STL and C++. The data structure in question is a Huffman tree for data compression. A description of a Huffman tree can be found in chapter 22 of Sedgewick's Algorithms textbook.
- You will modify some given files, and provide a static Huffman encoder and decoder for files, similar to the pack and unpack utilities. The encoder will take a given file and produce a Huffman-code compressed version of it on the standard output. The decoder will take a given compressed file and produce its uncompressed version on the standard output. There are example executables, encode.sun and decode.sun among the files provided.
- The Huffman encoder is static in that it makes two passes over the input file, one to determine the number of occurrences of every character, and a second to generate the Huffman-encoded output. Therefore the output file is of the form:
Field            Description
Header           The string "HuffEnCoded"
CharCount        Number of distinct characters in the file.
Character Table  CharCount entries, each of the form UCHAR Count, where UCHAR is an unsigned char and Count is the number of its occurrences in the Huffman encoded data.
Huffman Data     A stream of binary Huffman codes of variable length.
Algorithm of "encode"
In huffnode.h, we have already given the definition of the class huffnode. Different leaf nodes represent different characters; the character is called the key of the node. Since we don't know how many different characters will be in the file, the container should be able to grow dynamically. After reading a character from the file, we compare it with the keys of the elements currently saved in the container. If there is already an element whose key is equal to the character we have just read, we increase the counter of that element; otherwise we create a new element, set its counter to 1 and add it to the container. Thus the container should be able to use a key to locate an element and return its position (if it exists). We will use the container map (see Chapters 7.2 and 19.8 of Musser's STL textbook) in this project. The declaration will be map<byte,huffnode*,less<byte> >. The member functions we are interested in are "insert()", "find()", etc.
Note that the container holds pointers to huffnode objects. We maintain only one copy of each node in memory; all other objects and functions use pointers to access it.
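The counting pass described above can be sketched as follows. This is only an illustration under stated assumptions: LeafSketch is a hypothetical stand-in for the leaf part of huffnode (the real class in huffnode.h has more members), byte is assumed to be unsigned char, and count_characters and free_leaves are made-up helper names, not functions from the provided files.

```cpp
#include <functional>
#include <istream>
#include <map>
#include <sstream>

// Hypothetical stand-in for a huffnode leaf (assumption, not the real class).
struct LeafSketch {
    unsigned char key;    // the character this leaf represents
    unsigned long count;  // number of occurrences seen so far
};

// Same shape as map<byte,huffnode*,less<byte> >, with byte taken as unsigned char.
typedef std::map<unsigned char, LeafSketch*, std::less<unsigned char> > LeafMap;

// First pass: read every character and update (or create) its leaf.
void count_characters(std::istream& in, LeafMap& leaves) {
    char c;
    while (in.get(c)) {
        unsigned char key = static_cast<unsigned char>(c);
        LeafMap::iterator pos = leaves.find(key);
        if (pos != leaves.end()) {
            ++pos->second->count;  // key already present: bump its counter
        } else {
            LeafSketch* leaf = new LeafSketch;  // first occurrence: new element
            leaf->key = key;
            leaf->count = 1;
            leaves.insert(LeafMap::value_type(key, leaf));
        }
    }
}

// The map stores pointers only, so the nodes must be released explicitly.
void free_leaves(LeafMap& leaves) {
    for (LeafMap::iterator it = leaves.begin(); it != leaves.end(); ++it) {
        delete it->second;
    }
    leaves.clear();
}
```

For a quick check, count_characters can be fed a std::istringstream instead of a file stream. Because the map holds pointers, the one copy of each node stays at a fixed address while other structures refer to it.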
Algorithm of "decode"
The Source Files Provided
- There are six source files provided, in addition to the example executables encode.sun and decode.sun:
byte.h, bitstream.h, huffnode.h, encode.C, decode.C, hufftree.h
You are to modify the last four of these files. There are skeletons provided for all functions and the code is heavily commented, but some functions are left empty on purpose. In those functions there is a comment block starting with //TODO:, and including a description of the required functionality of the function.
byte.h - This short file contains only the definition of the type byte.
bitstream.h - This file contains two complete classes, ibitstream and obitstream, which provide bit-by-bit input and output.
huffnode.h - This file provides a near-complete definition of a Huffman-tree node, in the huffnode class. A Huffman-tree node can either be a leaf or an internal node. If it is a leaf it contains the unsigned char it represents, the number of occurrences of that character in the input, and the bits of its corresponding Huffman code, in the form of an STL vector of bool. If it is an internal node it contains pointers to its left and right children. Each huffnode has a _type which tells whether it is an internal node or a leaf. You are to modify two functions in this file:
- You are to modify the constructor huffnode( huffnode* l, huffnode* r ) so it correctly constructs a new Huffman tree based on two subtrees.
- You are to provide an operator<< which outputs a Huffman tree for debugging purposes.
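As a rough illustration of these two tasks, here is a minimal sketch on a simplified node type. NodeSketch and its members are hypothetical and do not match the real huffnode class. The sketch assumes the usual Huffman convention that an internal node's count is the sum of its children's counts; the nested print format is an arbitrary choice for debugging.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical, simplified node (assumption: not the real huffnode layout).
struct NodeSketch {
    enum Type { LEAF, INTERNAL };
    Type type;
    unsigned char key;    // meaningful for leaves only
    unsigned long count;  // occurrences (leaf) or sum of subtree counts
    NodeSketch* left;
    NodeSketch* right;

    // Leaf constructor.
    NodeSketch(unsigned char k, unsigned long n)
        : type(LEAF), key(k), count(n), left(nullptr), right(nullptr) {}

    // Internal-node constructor: joins two subtrees under a new root
    // whose count is the sum of the two subtree counts.
    NodeSketch(NodeSketch* l, NodeSketch* r)
        : type(INTERNAL), key(0), count(l->count + r->count), left(l), right(r) {}
};

// Debug output: a leaf prints as key:count, an internal node as
// (count left right), recursively.
std::ostream& operator<<(std::ostream& os, const NodeSketch& n) {
    if (n.type == NodeSketch::LEAF) {
        return os << n.key << ':' << n.count;
    }
    return os << '(' << n.count << ' ' << *n.left << ' ' << *n.right << ')';
}
```

Joining a leaf for 'a' with count 1 and a leaf for 'b' with count 2 gives an internal node with count 3, which this operator<< prints as (3 a:1 b:2).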
encode.C - This file provides a definition of the Huffman-encoding process. The encoding process opens an input file, reads it once to build a mapping to Huffman leaf nodes as described above, then constructs the Huffman tree and finally generates the binary Huffman data using the obitstream class.
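The tree-construction step can be sketched with the standard greedy method: repeatedly remove the two nodes with the smallest counts and join them under a new parent. The code below is an assumption-laden illustration, not the provided skeleton: TreeNode, HigherCount, build_tree and free_tree are hypothetical names, the input is a plain map of counts rather than map<byte,huffnode*,less<byte> >, and std::priority_queue is one possible choice of STL container.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <vector>

// Hypothetical simplified tree node (assumption: not the real huffnode).
struct TreeNode {
    unsigned char key;    // meaningful for leaves only
    unsigned long count;  // occurrences, or sum of the children's counts
    TreeNode* left;
    TreeNode* right;
};

// Orders the queue so that the node with the smallest count is on top.
struct HigherCount {
    bool operator()(const TreeNode* a, const TreeNode* b) const {
        return a->count > b->count;
    }
};

// Builds a Huffman tree from character counts; returns nullptr if empty.
TreeNode* build_tree(const std::map<unsigned char, unsigned long>& counts) {
    std::priority_queue<TreeNode*, std::vector<TreeNode*>, HigherCount> heap;
    for (const auto& entry : counts) {
        heap.push(new TreeNode{entry.first, entry.second, nullptr, nullptr});
    }
    if (heap.empty()) return nullptr;
    while (heap.size() > 1) {
        TreeNode* l = heap.top(); heap.pop();  // smallest count
        TreeNode* r = heap.top(); heap.pop();  // next smallest
        heap.push(new TreeNode{0, l->count + r->count, l, r});
    }
    return heap.top();
}

// Releases every node of a tree built by build_tree.
void free_tree(TreeNode* n) {
    if (n == nullptr) return;
    free_tree(n->left);
    free_tree(n->right);
    delete n;
}
```

Whatever container is used, the encoder and the decoder have to break ties between equal counts in the same way, otherwise they build different trees from the same Character Table. In this sketch the leaves are inserted in key order (the map's iteration order), so the same counts lead to the same tree when both sides run this code.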
decode.C - This file provides a definition of the Huffman-decoding process. Decoding only works on input files that are the output of the Huffman encoder. The decoding process first checks the input file for the header "HuffEnCoded", then reads the number of characters, the characters themselves and their numbers of occurrences. It then builds a Huffman tree just like the one in the encoding process, from the characters and their occurrences, and decodes the Huffman-code binary data in the rest of the file bit-by-bit using the ibitstream class.
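The bit-by-bit decoding loop can be sketched as a walk down the tree, restarting at the root after each leaf. The following is a sketch under assumptions: DecodeNode and decode_bits are hypothetical names, the bits arrive as a vector of bool instead of through ibitstream, a 0 bit is assumed to mean "go left" and a 1 bit "go right", and the total number of characters is assumed to be the sum of the counts in the Character Table, which lets the loop ignore any padding bits at the end of the data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified node (assumption: not the real huffnode).
struct DecodeNode {
    bool is_leaf;
    unsigned char key;  // meaningful for leaves only
    const DecodeNode* left;
    const DecodeNode* right;
};

// Decodes at most `total` characters from `bits` by walking the tree.
// Assumed convention: false (0) selects the left child, true (1) the right.
std::string decode_bits(const DecodeNode* root,
                        const std::vector<bool>& bits,
                        std::size_t total) {
    std::string out;
    if (root == nullptr) return out;
    if (root->is_leaf) {
        // Only one distinct character: no bits are needed to identify it.
        return std::string(total, static_cast<char>(root->key));
    }
    const DecodeNode* cur = root;
    for (std::size_t i = 0; i < bits.size() && out.size() < total; ++i) {
        cur = bits[i] ? cur->right : cur->left;
        if (cur->is_leaf) {
            out.push_back(static_cast<char>(cur->key));
            cur = root;  // start over for the next code
        }
    }
    return out;
}
```

With a tree in which 'a' has code 0, 'b' has code 10 and 'c' has code 11, the bits 0 10 11 0 followed by two padding zeros decode to "abca" when total is 4.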
hufftree.h - This file provides the definition of the class hufftree. This class contains the most important data structures and functions. It creates all the nodes, both internal and leaf, using a map container to organize the leaf nodes. It builds the whole Huffman tree and gives the Huffman code for every character. It provides some member functions to read data from or write data to files. You will have to spend most of your effort on this file.
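One way to obtain the Huffman code of every character is a depth-first traversal that records the path from the root to each leaf. The sketch below is illustrative only: CodeNode, Code and assign_codes are hypothetical names, the codes are collected in a separate map rather than stored in the nodes, and the convention that a left branch appends a 0 bit and a right branch a 1 bit is an assumption.

```cpp
#include <map>
#include <vector>

// Hypothetical simplified node (assumption: not the real huffnode).
struct CodeNode {
    bool is_leaf;
    unsigned char key;  // meaningful for leaves only
    const CodeNode* left;
    const CodeNode* right;
};

typedef std::vector<bool> Code;  // one Huffman code, as a sequence of bits

// Depth-first traversal: `prefix` holds the bits on the path from the root
// to `n`; at each leaf the current path becomes that character's code.
void assign_codes(const CodeNode* n, Code& prefix,
                  std::map<unsigned char, Code>& table) {
    if (n == nullptr) return;
    if (n->is_leaf) {
        table[n->key] = prefix;
        return;
    }
    prefix.push_back(false);  // assumed: left branch is bit 0
    assign_codes(n->left, prefix, table);
    prefix.pop_back();
    prefix.push_back(true);   // assumed: right branch is bit 1
    assign_codes(n->right, prefix, table);
    prefix.pop_back();
}
```

Called with an empty prefix on the root, this fills the table with one code per leaf. Note that a tree consisting of a single leaf yields an empty code, so a file with only one distinct character needs separate handling.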
How to get these files
How to submit your programs
Grading Scheme
Weight logical units
65% encode
30% decode
5% Comments, Styles
encode will be graded by the following:
Weight Description
25% First pass over the file, using an STL class to store the data
25% Build the Huffman tree, using an STL class to store the data
15% Second pass over the file, encoding it
decode will be graded by the following:
Weight Description
10% Read the Character Table correctly
20% Decode the file correctly
Comments and style will be graded by the following:
3% Comments for functions, telling concisely what will be done in them.
Comments for important variables, telling what they are used for.
2% Clear logical structure of your program.
Clear definition of the classes used in your program.