CMSC-341 Data Structures Section 101 & 201
Fall 1998 Project 3
1. The Project
The purpose of this project is to provide you with experience coding binary search trees
in C++, the notion of inheritance and introduce you to file input in C++. In this project
you will read and analyze text from a designated file.
2. Description
In this project you will create a 'concordance' for a piece of English text to be read from a
text file. A concordance is a sorted list of words that occur in the text, along with a
frequency count (i.e. the number of times the each word appears in the text). A
concordance for the text, "How much wood would a woodchuck chuck, if a woodchuck
could chuck wood" is:
a 2
chuck 2
could 1
how 1
if 1
much 1
wood 2
woodchuck 2
would 1
Your program will read data from a text file that has been preprocessed to remove
everything expect space-delimited words, which may contain apostrophes. All words
should be converted to lower case. The textfile to be read and analyzed is
/afs/umbc.edu/users/d/e/dennis/pub/cs341/proj3/project3.txt.
Your program will then display a menu to the user with the following options:
- Quit
- Print the number of distinct words in the concordance
- Print the number of occurrences of a particular word. If the word does not
exist in the concordance, an appropriate message should be printed.
- Print all the words that appear more than a certain number of times,
alphabetically. If no words meet that criteria, an appropriate message should
be printed.
- Print all the words (alphabetically) in the concordance with their frequency, as
described above.
Continue to ask the user for input until the user quits.
3. Specific Tasks
You will create the following classes, putting their declaration in a header (.h) file
and their implementation in the appropriate .C file.
- A Concordance class -- this class contains the concordance data
tree and provides the operations required by the main program
- A ConcordanceData class -- this class contains the data (words
and counts) that are stored in the ConcordanceBinaryTree described below. You must
use the String class that is provided to represent the words stored in this class.
- A template Tree_Node and Binary_Search_Tree class. You
may use any of the code from class or from the text. There is no reason
to not use the Tree_Node class from the notes in class. You may use the
Binary_Search_Tree class from the notes or text "as is" or you may find it
helpful to modify some or all of the operations. The Binary_Search_Tree
MUST remain a template class. You only need to implement the
Binary_Search_Tree operations that are required for this project.
- A ConcordanceTree class which extends the Binary_Search_Tree
class and contains all concordance data.
- You must create your own makefile for this project. Use the makefile from project2 as
an example. Your class header files and implementation files may be called whatever
you like, but your main program file must be called project3.C. It must be possible to
build your project executable by simply typing in 'make'. Feel free to use shorter, but
meaningful names for your classes.
The code for the String class (String.h and String.C) is in the directory
/afs/umbc.edu/users/d/e/dennis/pub/cs341/proj3
4. Submitting and Grading
Submit your project in the usual way, using the submit program. Submit any and all files
necessary to successfully compile your program, including a makefile. In addition, you
must use the Unix script facility to capture output from your program which must
submitted in the file project3.scr. At the Unix prompt (assume it's $) type the following:
$ script
$ project3
your menu displays here
now, execute each menu option at least once -- be sure to select option 3 twice;
one time entering a word that you know is in the concordance and one time entering a
word that is NOT in the concordance. Also select option 4 multiple times, at least once
entering a frequency too high for any word in the concordance. When you have executed
all menu options at least once and quit, terminate the script session by issuing the
command
$ exit
all inputs and output between the script command and exit command are captured
in the file typescript. rename typescript to project3.scr and submit with your other
files.
This project will be graded 60% for correct execution and 40% for style, comments,
object design, structure, etc. No report is required. The usual 5-point bonus for early
submission and 5-point per day penalties for late submission apply. No project will be
accepted more than 48 hours late.
This project is due
Sunday, November 15, 1998 for section 101
Tuesday, November 17, 1998 for section 201
Special Bonus Clause -- a 10 point bonus will be awarded for any project that properly
implements the BST in a balanced fashion (e.g. as a Splay, AVL or Red-Black tree). DO
NOT attempt to implement a balanced tree until after you have successfully implemented
the basic BST.
5. Hints
- General hints and design tips will be given in class.
- How to read data from a file
Suppose you have an ifstream instance named datafile and a character array
named inbuffer. Try the following code
ifstream datafile; // the input file
char inbuffer[120]; // read lines into this buffer
datafile.open( );
datafile.getline (inbuffer, 120); // read a line from the file
while (int (inbuffer[0])) // is first char not NULL??
{
datafile.getline(inbuffer, 120); // get the next line
}
datafile.close ();
- OK, I have a line from the file, how do I get the individual words?
Look up the man page for strtok
(char* strtok (char* s1, const char* s2))
You use it like this
char *s;
s = strtok (inbuffer, " "); // first string in line, up to space
while ( s )
{
< do something with s>
s = strtok (NULL, " "); // get next string
}
6. Bigger Hint
If you make the effort, you'll find that there are easier ways to do the input.