-
CMSC 341: Project 2
Project 2
Assigned |
Sept. 25, 2000 |
Due |
Oct. 9, 2000 |
Background
The list is probably the most frequently-used data structure. Understanding
the list ADT is essential to almost everything that follows in this course. You
will have the opportunity to implement a list in Project 2, and to become familiar
with iterators. The application is to develop a concordance.
A concordance is an sorted list of symbols (in our instance, words) and the
locations (for us, lines) where they appear in a document (for us, a body of text).
Generally, concordances are generated for such things as extensively studied
works of literature and religion. For instance, you might use a concordance
to find all the places that Shakespeare uses the word 'arrow' or that 'fish'
are mentioned in the Bible.
The user interface for your digital concordance will include functions such as:
- add a reference to a symbol in the concordance
- return a symbol and the associated line numbers
- print the concordance to an arbitrary output stream
Objectives
This project will deepen your skills in coding with templates. You will
- work with the listADT and extend it by creating a data type, the sorted list
- work with iterators
- practice designing robust classes
In addition, you will continue to practice using the string data class.
What to do
You will implement three classes: the first class is a ConcordanceEntry class, which
defines one type of entry in the concordance (you could theoretically make
a concordance to index things other than words, but we won't do that now).
A ConcordanceEntry consists
of a word and a sorted list of line numbers on which the word appears in a body of text. The
class ConcordanceEntry needs to be Comparable. The ordering
relation should be on data member that holds the word.
The second class is the SortedList.
The third class is the Concordance, which is a type of SortedList of entries. For this
assignment, the concordance entries will be ConcordanceEntry.
You'll
start with the list ADT as described in Chapter 3 to implement these classes, and the
data structure you'll use is the singly-linked implementation of lists (not an array or vector
implementation).
The main function
Your main function should do these tasks:
- read the command line to obtain the name of the text file for which
it will build the concordance
- build the concordance
- display the concordance on cout. The display should be in word order.
- for each word, on how many lines does it appear?
- for the most frequently appearing word, print all lines on which it appears. This
is called "word in context." Exclude the following overly common words: the, a, an,
and, or.
The SortedList class
The sorted list is first of all, a list. It has the added property that each element in
the list compares < its successor. The class interface is identical to the one
presented in the book, except for "insert" which should first find the appropriate
sorted-order place for the item and then place it there.
The header file for this class may be found in the course directory
for project 2 (/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj2).
The Concordance class
The concordance is a sorted list of entries.
Like other sorted lists, it has the property that each element in
the list compares < its successor.
The header file for this class may be found in the course directory
for project 2 (/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj2/Concordance.H).
You may add methods and data to this class, but may not change or remove what is
there . Most methods of the Concordance class are standard to the lists we've
seen.
The ConcordanceEntry class
For this assignment, each concordance entry is a ConcordanceEntry and has two data members: a string, which is the word
being stored in the concordance, and a list of line numbers. A "word" is a bunch of
characters. Words may contain letters, numbers, and the symbols "-" (hyphen) and
"'" (apostrophe). All other characters should be considered to be delimiters.
Lines containing only delimiters (and no valid characters) should be considered to
be empty. Empty lines are counted, but do not add entries to the concordance.
You need to decide on some standard way
of dealing with capitalization: the word "tomorrow", in the following quote
Tomorrow, and tomorrow, and tomorrow
Creeps in this petty pace from day to day
To the last syllable of recorded time;
needs to be included in the concordance only once, even though on occurrence is capitalized
and others are not.
Also, the concordance should have only one occurrence of a line number for a word, even if
that word appears more than once in a line, as do "tomorrow" and "and".
Design and implement your own ConcordanceEntry class.
Grading and Academic Integrity
Project grading is described in the
Project Policy handout.
Please remember that you must provide good documentation as described in the handout.
Cheating in any form will not be tolerated. Please re-read the
Project Policy handout for further details on honesty in doing
projects for this course.
Remember, the due date is firm. Submittals made after midnight of
the due date will not be accepted. Do not submit any files after that
time.
Files To Be Submitted
You should submit only the files you have written, and a makefile.
The files to be
submitted are:
Please do not submit any of the files provided to you such as
LinkedList.h, etc.
Submit the files using the procedure given to you for your section
of the course.
Sample Output
Sample output is available for your study at
/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj2/sample_output.txt
It was obtained by executing
Proj2
on umbc9 with a command-line argument
of fox_frag.txt.
A copy of the executable is available for you to try out. It is
/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj2/Proj2
Other sample input files can be found in
/afs/umbc.edu/users/r/h/rheingan/pub/CMSC341/Proj2/*.txt