CMSC-341 Fall 2002
Project 3
Assigned |
09 Oct 2002 |
|
Due |
23 Oct 2002 by 11:59PM |
|
Updates |
| 14 Oct 2002 The constructors in the author's BST code
have been modified. The order of the items in the initialization list has
been reversed. The original order was causing warnings on some version of
the compiler. | |
|
09 Oct 2002 The project description says that all
ConcordanceEntry methods must be private. That is not correct. You need
to be able to create a ConcordanceEntry outside of Concordance code so that
you can pass it in to the Concordance constructor. Make the methods
described in this document for ConcordanceEntrys public. |
|
|
09 Oct 2002 You must count blank lines when computing line
numbers. For example, if a text file starts with two blank lines and then
has the text "Four score and seven years ago", the word "four" occurs on
line 3. The text on which the sample output is based had some blank lines
that have been removed so that the sample output conforms to this
clarification. |
|
Background
This project will give you experience working with Binary Search Trees and
tree iterators in the context of implementing a Concordance ADT.
The Merriam-Webster online dictionary says that a concordance is "an
alphabetical index of the principal words in a book or the works of an
author with their immediate contexts". To build a concordance, your
program will read the words in data file, store them in a BST, and record
for each word the line number(s) on which it occurs. Iterators will be
used to walk through and print out ranges of words in the concordance.
Note: A word is defined to be any string of characters delimited
by white space or punctuation. For this project we will consider only the
following to be punctuation - period, question mark, exclamation point,
comma, semi-colon. For example, if the data file contains "dog.cat", the
concordance should contain "dog" and "cat", but not "dog.cat". Also,
convert all words to either upper or lower case before adding them to the
concordance.
Because C++ allows us to define ADTs, via templates, that can hold any
type of object, we can implement a more general kind of concordance that is
an ordered index of objects along with information about where they occur
in a dataset. It is this more general definition of a concordance that you
will implement, though you will instantiate it to build a concordance of
the kind described in the dictionary entry above.
Description
Your concordance implementation will require you to write four new classes -
ConcordanceEntry, Concordance, BinarySearchTreeItr,
and ConcordanceItr. The Concordance will store ConcordanceEntrys in
a BST so that they can be accessed quickly in sorted order, and each
ConcordanceEntry will contain a linked list of the locations (lines
numbers) of the corresponding object (word) in the dataset (text file).
Concordance iterators will be used to walk through concordance entries, and
they will make use of BST iterators.
In this project, as with the previous projects, you will read commands
from a file. There will be commands for loading a text file and building a
concordance and for printing the contents of the concordance in various
ways.
Here are your tasks:
- Copy the linked list files (LinkedList.C, LinkedList.H, and
dsexceptions.H) from Dr. Oates' public directory
/afs/umbc.edu/users/o/a/oates/pub/CMSC341
to your directory.
- Copy the BST files (BinarySearchTree.C and BinarySearchTree.H) from
Dr. Oates' public directory
/afs/umbc.edu/users/o/a/oates/pub/CMSC341
to your directory.
The BST code uses dsexceptions.H, but you will have already copied it with
the linked list code.
- Implement the ConcordanceEntry class.
- Implement the Concordance class.
- Implement the BinarySearchTreeItr class using the example
implementations of tree iterators given in the lecture notes as a starting
point (though it's OK to come up with a completely new implementation if
you want to).
- Implement the ConcordanceItr class. It will create and manipulate
BinarySearchTreeItrs.
- Design and implement Proj3.C which contains main( ) and is the driver
program for this project.
- Implement a makefile for this project.
- Answer the questions posed in 341-Fall02-proj3_questions.txt. Copy the
file
/afs/umbc.edu/users/o/a/oates/pub/CMSC341/Proj3/341-Fall02-p3_questions.txt
to your own directory and edit it to provide your answers to the questions.
Don't forget to submit the edited file; it is 10% of this project's grade.
Definition of the ADT
All classes that you design and implement must support the "Big-4", even if
the default behavior provided by the compiler would be acceptable.
- Default constructor
- Destructor
- Copy Constructor
- Assignment Operator
As always, you are free to add whatever private data and methods you
desire, and you must provide the methods listed in this project
description. If you need to add any additional public methods, think very
carefully about it and provide ample justification for why the method must
be public in comments in your code. Points will be deducted for public
methods that are not well justified.
The ConcordanceEntry
Each concordance entry will contain an object and a list of the places that
it occurs. Because this project will require you to build concordances for
text documents in which you maintain a list of the lines on which each word
occurs, your code will declare concordance entries as follows:
- ConcordanceEntry<string, int> ce;
Note that this implies you must define the Concordance and ConcordanceEntry
classes to accept two distinct template parameters - the type of the object
and the type of the information stored about occurrences of the object.
You must store occurrence information in a singly linked list which is
a private data member in the ConcordanceEntry class. For this project,
the list will contain the line numbers on which the word occurs in the data
file.
You will need to implement at least the following methods:
- A zero argument constructor, and a one argument constructor that takes
an argument of type Object.
- bool operator<(const ConcordanceEntry<Object, Location> &
rhs) const - This must compare the strings stored in the entry so that
they are ordered alphabetically in the concordance.
- ostream & operator<<(ostream & out, const
ConcordanceEntry<Object, Location> & rhs) - Print the object and a
list of the locations at which it occurs. Use the print idiom given on
page 35 of the text. Note that the list of locations must be ordered from
smallest to largest when printed.
- void addLocation(const Location & loc) - Add a new location to
the list of locations. If the location already exists in the list, do not
add it and do not report an error or throw an exception.
The Concordance
The Concordance class stores ConcordanceEntrys in a BST so that they can
be accessed quickly in sorted order. You must implement the following
methods:
- ostream & operator<<(ostream & out, const
Concordance<Object, Location> & rhs) - Print all of the concordance
entries in sorted order. Use the print idiom given on page 35 of the text.
- void addLocation(const Object & obj, const Location & loc) -
If there is not a ConcordanceEntry in the Concordance for the object,
create one. In either case, add the location to the ConcordanceEntry as
described above.
- Concordance(const ConcordanceEntry<Object, Location> & notFound
) - When creating a BST you must pass in an item that is guaranteed to
never be inserted into the tree. This item is used as the return value
when searching the tree for an object that it does not contain. Because
the Concordance contains a BST of ConcordanceEntrys, when you create the
BST you must do so by passing a ConcordanceEntry to the BST constructor
that is guaranteed to never be inserted into the tree. That is the purpose
of the argument to the constructor. For this project, the string
"***DOES_NOT_EXIST***" will never occur in any of the text files used for
testing your concordance code. Note: You will need to store the
notFound concordance entry as a private data member in the concordance.
Note: To avoid problems with getting the value of notFound into the
BST, it is best to initialize the concordance's copy of notFound and the
BST in the initializer list of the constructor.
Your concordance must be able to supply three different types of
iterators. They are as follows:
- ConcordanceItr<Object, Location> first() - Returns an
iterator pointing to the first entry in the concordance. Is past end if
the concordance is empty.
- ConcordanceItr<Object, Location> entry(const Object & obj) -
Returns an iterator pointing to the entry in the concordance containing the
specified object. Is past end if the object does not exist in the
concordance.
- ConcordanceItr<Object, Location> range(const Object & first,
const Object & last) - Returns an iterator pointing to the first object
in the concordance that is greater than or equal to first in the
order induced by operator< defined for ConcordanceEntrys. Is
past end when it is advanced to an entry that is greater than last
or when it is advanced beyond the last entry in the concordance. If the
concordance contains no entries between first and last the
iterator is past end when created.
The BinarySearchTreeItr
You must implement a binary search tree iterator that, when advanced,
performs an inorder traversal of the tree. Example implementations are in
the lecture notes, and you are free to use the code therein. You may also
write an implementation from scratch if you want to. Be sure to put the
iterator interface and implementation in their own .C and .H files,
respectively. That is, do not put them in BinarySearchTree.C and
BinarySearchTree.H. You may, however, modify these files as needed to
accommodate the iterator, e.g. by adding friend declarations or methods for
getting iterators from a BST.
The BST iterator must support at least the following methods:
isPastEnd, advance, and retrieve. Their behavior
should be analogous to the behavior of the corresponding ListItr
methods.
You must add a first method to the BST class that returns an
iterator pointing to the first element in the BST in an inorder traversal.
The ConcordanceItr
The ConcordanceItr class will create and manipulate BST iterators to walk
through the elements of a concordance. You must implement at least the
following methods: isPastEnd, advance, and retrieve.
Their behavior should be analogous to the behavior of the corresponding
BinarySearchTreeItr methods.
Note that you will need additional private data members to support
iteration over a range of values (as described above for the Concordance
class). All iterator functionality should be implemented within this one
class. That is, there should not, for example, be distinct classes or
advance methods to handle range iterators.
The Command Line
Project 3 will be invoked with a command line that consists of a single
argument -- the name of the command file to
read. As usual, you should check the validity of all command line
arguments.
The Command File
Commands in the file specify operations to be performed with concordances.
Each line in the file represents one command. Blank lines may appear
anywhere in the file and should be ignored. Otherwise, you can assume that
any line containing a command is syntactically well-formed. We make this
assumption so you don't have to spend lots of time making the command file
parser bullet proof.
The command file format follows:
- LOAD <file name> -- Create a new concordance based on the text in
the specified file. There may be multiple LOAD commands in a file. If
there are, any subsequent commands are meant to operate on the concordance
built for the last LOADed file. Be sure to destroy the current concordance
before LOADing a new file so there are no memory leaks.
- PRINT -- Print the entire contents of the concordance.
- PRINTENTRY <word> -- Print the concordance entry for a specific
word.
- PRINTRANGE <first_word> <last_word> - Print all entries in the
concordance that fall alphabetically between first_word and last_word.
Note that it need not be the case that either of these words exist in the
concordance.
Sample Output
Sample output is available for your study at
/afs/umbc.edu/users/o/a/oates/pub/CMSC341/Proj1/341-Fall02-p3-sample_output.txt
The datafile loaded in the sample output contains the following text:
Tomorrow, and tomorrow, and tomorrow
Creeps in this petty pace from day to day
To the last syllable of recorded time
Project Submission
DO submit these files
Because you are submitting your own makefile, you are free to name your
files whatever you like, with the following restrictions:
- All class definitions appear in .H files. Header files may NOT contain
any code.
- All class implementations appear in .C files.
- The name of your executable MUST BE Proj3 (upper-case 'P'). The
submitrun program and the scripts that we use to grade your files assume
that you follow this convention for naming your executable.
DO NOT submit these files
Do not submit any of the linked list or BST code UNLESS YOU MAKE
MODIFICATIONS to it. If you use the code in its original form, include it
from Dr. Oates' directory via your makefile. If you make changes to the
linked list or BST code, submit your changes and have your makefile compile
with your versions.
Submit Tools
There are a number of tools available to you to check on
your submittal. It is your responsibility to ensure that the submittal is
correct and will result in a successful compilation of your project. Do not
wait till the last minute to submit your files. Give yourself enough time
to check that your submittal is correct.
If you don't submit a project correctly, you will not get credit for
it. Why throw away all that hard work you did to write the project? Check
your submittals. Make sure they work. Do this before the due date.
Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/.
One of the tools provided by the submit program is submitls. It
lists the names of the files you have submitted.
Additionally, there are two programs for use only by CMSC-341 students
(not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are
named submitmake and submitrun. You can use
these programs to make or run your submitted projects.
The syntax is similar to that for submit:
submitmake <class> <project>
Example: submitmake cs341 Proj2
This makes the project, and shows you the report from the make utility.
It cleans up the directory after making the project (removes .o files), but
leaves the executable in place.
submitrun <class> <project> [command-line args]
Example: submitrun cs341 Proj2 test.dat
This runs the project, assuming there is an executable (i.e. submitmake
was run successfully).
Grading and Academic Integrity
Your project will be tested using a variety of command lines, some of which
will be invalid.
Your project will also be tested using a variety of command files which
will test various conditions which your code should handle.
Project grading is described in the Project
Policy handout.
Your answers to 341-Fall02-proj2_questions.txt
are worth 10% of your project grade.
Cheating in any form will not be tolerated. Please re-read the Project
Policy handout for further details on honesty in doing projects for
this course.
Remember, the due date is firm. Submittals made after midnight of the
due date will not be accepted. Do not submit any files after that time.