CMSC 341 - Fall 2001
Project 4
Assigned |
Wednesday, 07 November 2001 |
Due |
Monday, 26 November 2001 |
Modifications |
By popular demand, the due date for this project has
been moved to midnight Monday 11/2/6/01
Have a nice Thanksgiving |
Background
We have been studying different variations of binary search tress and investigating
their theoretical performance using asymptotic analysis. This project
will give you an opportunity to gather and analyze empirical data about
these trees and determine for yourself if the performance is really what
the theory says it should be. You will be modifying the author's
binary search tree code and his top-down splay tree code to add new functionality.
You will then exercise these trees and gather data about their performance
and shape. Finally, you will be asked to analyze your data.
You'll also get some practice with C++ file I/O and formatted output.
Description
In this project you will be reading random integers from a file, building
a BST and SPLAY tree with those values, then performing find( ) operations
for both trees. After all values from the file have been inserted,
your program will print the first few elements from the tree and report
the number of nodes inserted (N) and the value of lg(N). Your program
will then perform the following operations
-
find( ) for every 100th value in the file
-
find( ) for every 50th value in the file
-
find( ) for every 25th value in the file
After all insertions have been completed and after every 100 operations
thereafter, your program will report the following information in
a tabular format (see the sample output below).
DO NOT assume that the number of values in the file is a multiple of 100.
-
the cumulative number of operations performed so far(M)
-
the height of the tree
-
the internal path length (IPL) of the tree
-
the average node depth
-
the number of data comparisons performed so far (C)
-
the value of C / (M * lgN)
When all operations have been completed and reported print your tree again.
The Command Line
Your program will accept two command line arguments -- the name of the
data file and the number of nodes from the tree to print. If the
name of the data file is provided, you may assume the file is in the specified
data
file format.
Classes
Modify the author's BST class and top-down SplayTree class to provide the
following additional public methods
-
height( ) - returns the height of the tree (NOTE - because the height of
a tree with a single node is 0, the height of an empty tree is defined
as -1)
-
size ( ) -- returns the number of nodes in the tree
-
IPL ( ) - returns the IPL (internal path length) of the tree - the sum
of the depths of all the nodes. The IPL of an empty tree is 0.
-
compares ( ) - returns the total number of data comparisons performed since
tree construction
-
printTree (int n) -- prints the first n nodes of the tree in level-order.
Each node is printed with it's data value and level. The root is
level 0. If n is bigger than the number of nodes in the tree (unlikely
except for testing), print the entire tree.
Tasks
Here are your tasks:
-
Modify the author's BST code to support the new public methods described
above. Write whatever test programs you think are necessary to test
your BST.
-
Modify the author's Splay tree code to support the new public methods described
above. Write whatever test programs you think are necessary to test
your Splay tree.
-
Write main( ) and required auxiliary functions to read data from the file,
perform the required tree operations and accumulate and report the required
data.
-
Answer the project questions by copying the file /afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj4/341-Fall01-p4_questions.txt,
editing it to include your answers and submitting the edited file.
Task Notes
-
Printing the tree requires a level-order traversal of the tree which can
also be helpful in calculating the depth of each node. A level-order
traversal is performed using one or more queues. The author's array based
queue class is available in Mr. Frey's public directory
-
When elements are compared, the code often looks something like this
if (a > b)
do something
else if (a < b)
do something else
else // a == b
do some other thing
This counts as one comparison, regardless of which of the three
conditions is true.
-
Recall that insert( ) does nothing when a duplicate element is inserted
into the tree. Therefore even though the data files provided have
8K, 16K or 32K values, not all may be inserted.
-
The data should be saved while executing the required operations, then
reported in a format similar to that presented in the sample output.
In particular, data should be in right justified columns and where appropriate,
reported with four decimal places. There will be approximately 2600
operations performed on a file with 32K elements (not including the original
34K insertions).
-
In the description above, lgN means log of N, base 2. The C++ math library
provides the function "log10 (x)" which returns the log of x, base 10.
To convert from log10 x to log2 x, note the following
:
log2x = log10x / log102.
You will have to include <math.h> in your program and use the -lm
switch to tell the linker to include the math library.
Task Hints
This project will take a long time to write and test.
Here are some hints that should help you organize and understand your tasks
a bit better.
-
Use an incremental development approach
-
Modify the BST to support the additional public methods. Test the
new BST with a driver program.
-
Write main ( ) in Proj4.C so that it reads the data file, performs all
inserts and finds on the BST, collects and prints the data. Essentially
complete the project requirements as if you were only required to use a
BST.
-
Implement the splay tree and test it with a separate driver program.
For debugging, you'll find it helpful to print out the nodes in the tree
in level order. Try printing the element in each nodes parent, left
child and right child. You may want to print the depth too.
-
Integrate the splay tree into main( ). This should be straightforward
since you will just be duplicating what you did for the BST in step 2.
-
The algorithm for a level-order traversal can
be found at the bottom of this web page.
-
Test your code with small amounts of data first. You can create your
own data files by hand, or use the GenRandInts program from Mr. Frey's
directory.
-
Think recursively when writing the code for the height of the tree.
-
Calculating the internal path length of the tree (IPL) requires a level-order
traversal
Data Files
Four large data files are available for testing - Proj4.8K.dat, Proj4.16K.dat,
Proj4.32K.dat and Proj4.64K.dat Not surprisingly, these files contain
8K, 16K, 32K and 64K values respectively. The data files may be assumed
to contain whitespace separated values of type int. These values
are generated with rand( ) and lie between 0 and MAX_INT. There are 10
values per line. You will want to create your own smaller data files for
testing, but larger data files will be necessary to correctly analyze the
data. DO NOT assume that the number of values in the file is a multiple
of 100 or 1000. Your program should run successfully regardless of
the number of values in the file.
A program that generates random integers and writes them to a file is
available for your use. The program is GenRandInts and is
located in Mr. Frey's public directory /afs/umbc.edu/users/d/e/dennis/pub/CMSC341.
Sample Output
Sample output is located in Mr. Frey's public directory for this project
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj4 in
the file p4output. It was
generated from the test file Proj4.64K.dat also located in this project's
public directory. Twenty nodes were printed for each tree.
Your output must match the sample output in the following respects
-
Decimal values must have 4 decimal places
-
All values are right justified in their fields
-
Tree nodes must be printed 4 per line and in the specified format -- [
element, depth ]
-
Columns must have headings
Project Submission
DO submit these files
Since you are submitting your own makefile, you are free to name your files
whatever you like, with the following restrictions:
-
Use only .C and .H extensions
-
All class definitions must appear in .H files. Header files may not contain
any code unless that code was put there by the author.
-
All class implementations appear in .C files.
-
The name of your executable MUST BE Proj4 (upper-case 'P'). The
submitrun program and the scripts that we use to grade your files assume
that you follow this convention for naming your executable.
-
Your makefile. Your makefile must work properly! This
is the fourth project of the semester and you should have all the bugs
worked out of your makefile (i.e. with respect to using files from Mr.
Frey's public directory). If your makefile doesn't work in your submittal
directory, expect no sympathy.
-
The answers to the project questions.
DO NOT submit these files
Do not submit any files from Mr. Frey's public directory that you use without
modification.. These files should be included in your project through your
makefile. If found in your submittal directory, these files will
be deleted (for example, Queue.H and Queue.C)
Submit Tools
There are a number of tools available to you to check on your submittal.
It is your responsibility to ensure that the submittal is correct and will
result in a successful compilation of your project. Do not wait till the
last minute to submit your files. Give yourself enough time to check that
your submittal is correct.
If you don't submit a project correctly, you will not get credit
for it. Why throw away all that hard work you did to write the project?
Check your submittals. Make sure they work. Do this before the due date.
Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/.
One of the tools provided by the submit program is submitls. It
lists the names of the files you have submitted.
Additionally, there are two programs for use only by CMSC-341 students
(not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are
named submitmake and submitrun. You can
use these programs to make or run your submitted projects.
The syntax is similar to that for submit:
submitmake <class> <project>
Example: submitmake cs341 Proj4
This makes the project, and shows you the report from the make utility.
It cleans up the directory after making the project (removes .o files),
but leaves the executable in place.
submitrun <class> <project> [command-line-args]
Example: submitrun cs341 Proj4 test.dat 20
This runs the project, assuming there is an executable (i.e. submitmake
was run successfully).
Grading and Academic Integrity
Your project will be tested using a variety of command lines, some of which
will be invalid.
Your project will also be tested using a variety of command files which
will test various conditions which your code should handle.
Project grading is described in the Project
Policy handout.
Your answers to 341-Fall01-proj4_questions.txt
are worth 20% of your project grade.
Cheating in any form will not be tolerated. Please re-read the Project
Policy handout for further details on honesty in doing projects for
this course.
Remember, the due date is firm. Submittals made after midnight of the
due date will not be accepted. Do not submit any files after that time.
Level-Order Tree Traversal
A level-order traversal is one that visits the root first, then the root's
children (left to right), then the root's grandchildren (left to right)
and so on down the tree. This traversal is generally implemented
with a queue. You have to decide what to put on the queue that represents
the nodes. Here's the pseudo code
declare a queue to hold
the nodes
enqueue the root
while the queue is not
empty
dequeue a node
do something with the node (i.e."visit" the node)
enqueue the node's left child (if there is one)
enqueue the node's right child (if there is one)
Last modified on Tuesday, November 6, 2001 by Dennis Frey