CMSC 341
Project 5
Assigned
| Wednesday, May 2, 2007
|
Due
| Tuesday May 15, 2007 at 11:59pm
|
Description
A SkipList is a data structure which deals with sorted data in logarithmic
time. It's a less complex alternative to Red-Black trees.
In this project you will implement the
SkipList class and run
some tests to determine if the theoretical performance discussed in class
is realized in practice.
An important part of this project is for you to develop an
understanding of
SkipLists by experimenting with the
maximum node size and probability parameters of the list.
You will also use an auxiliary class known as a
Comparator to assist in capturing the
test data.
Mr. Frey's public directory for this project is
/afs/umbc.edu/users/f/r/frey/pub/CMSC341/Proj5
Details of the Project
Here are the tasks you must complete for this project:
-
Read and understand the SkipList material in the
lecture notes, and in
Dr. Pugh's paper.
-
Implement a suitable SkipList ADT.
You may start with the author's List class (provided in
Mr. Frey's public directory) or implement your own SkipList.
Pseudo code for insert
and find is provided in
lecture notes and in
Dr. Pugh's paper.
- Write a main function that produces the type of output shown in
341-Spring07-p5-sample_output.txt
in Mr. Frey's public directory for this project.
The main
function takes the following command-line arguments in this order:
- a seed value for the drand48 random number generator, a positive int
- the maximum node size to be allowed in the list, a positive int
- the probability associated with the list, a double between 0.0 and 1.0
- the number of insertions to do (i.e., the size of
the list), a positive int
- the name of a data file that contains a sufficient number of
unique random integers to insert, a string with no spaces
- the maximum number of elements to print, a positive int
Notes on the sample output:
- All floating point values should be output with 4 decimal places
- Since the "Expected" column under "Distribution of node sizes"
is printed as an integer, there is some truncation (especially in the higher
node sizes). Therefore the sum of
the "Expected" column is DOES NOT equal the number of inserts.
A code fragment is given
below that shows one way to insert
integers read from a file into your
SkipList and then reread the file
to find them.
- Run your program to collect lots of fascinating data on the effect
of probability, maximum node size, etc. on performance. Study
your data to see if it makes sense in terms of your understanding of the
theory.
- Answer the questions posed in the file
341-Spring07-proj5_questions.txt
in Mr. Frey's public directory.
Copy the file to your own directory and edit it to provide your
answers to the questions. Do not change the name of the file since the grading script
looks for it under that name. The questions count 20% of your project grade.
Definition of the SkipList ADT
The ADT for SkipList is similar to the
textbook's List ADT in Chapter 3.
Some differences are:
- Your SkipList may have some methods that are
not in List. Check the class notes
for these methods. You may implement any of these methods you feel
are necessary for this project.
- Since you must report the number of nodes of each size, you will
probably want some kind of accessor(s) to retrieve this information.
- The author's List ADT does not support the find( )
method.
- Since the SkipList
is sorted, many List methods
(e.g. push_font( ), push_back( ), etc) are not relevant for
SkipList.
Feel free to remove any List methods
which are not required.
- You will need a new (or modified) print function that prints
only the first N elements of the SkipList with their node sizes in
the format [ <value>, <size> ].
Some potential differences in the
implementation of the SkipList
and the implemenation of the List are
- For this project, the SkipList need not be doubly linked.
- Iterators are not required for this project
(i.e. main( ) can be implemented without them).
- Since you must keep track of the number of data comparisons
performed for
inserting and finding items in your
SkipList, a
Comparator object will be passed to your
SkipList
constructor. Be sure to store a reference to the comparator,
not a copy of it.
- In a SkipList, the tail node actually
contains meaningful data
(see Dr. Pugh's paper).
The data in the tail is an object that is "larger"
than any data that will be inserted into the SkipList.
This "max object" should be passed to your SkipList
constructor.
- Since the SkipList uses a random
number generator, the seed for the RNG should be passed to the
SkipList constructor.
- Objects stored in List need not support any comparison operators,
but objects stored in the SkipList must support operator<
Code fragment showing insertion of integers read
from a file followed by
find operations for each element of the
list. Alternatively (without iterators), you could
reread the file and perform find
on each integer as it's read. To reread a stream, you may close the
stream and then reopen the stream.
Comparator<int> comparator;
SkipList<int> sl(maxNodeSize, probability, INT_MAX, comparator, seed);
int val;
ifstream inFile( argv[5] );
// do the insertions
for (i = 0; i < nrInserts; i++)
{
inFile >> val;
sl.insert(val);
}
inFile.close();
// count the avg. number of comparisons made to build the list
int avgBuildCompares = static_cast<double>(compartor.GetCompares())/nrInserts;
// reset the number of comparisons, then re-open the file
comparator.Reset();
inFile.open( argv[5]);
for (i = 0; i < nrInserts; i++)
{
inFile >> val;
sl.Find(val);
}
inFile.close();
// count the avg. number of comparisons made doing the find ops
avgFindCompares = static_cast<double>(comparator.GetCompares())/nrInserts;
The Comparator ADT
As the name implies, a Comparator class is a class template
used to compare two objects of the same
type. The Comparator is also used to keep appropriate statistics. In this
project we will use the Comparator to count the number of times any
data comparison is made.
Using a comparator object is advantageous because
- It's reusable
- It relieves the SkipList from data collection
creating a clear separation of responsibilities
- It can be changed to collect other data (e.g. how many times
a was less than b) without affecting the SkipList code
- It makes it easy to count comparisons in complex code.
For example, in this code
if (ptr != NULL && ptr->data < x)
it's difficult to increment a counter in the right place because
the comparison is not always executed (when p == NULL).
Incrementing a counter either
before or after the 'if' statement yields an incorrect count.
Using an comparator, we would rewrite the code above as
if (ptr != NULL && (comparator.IsLessThan(ptr->data, x))
Now we are assured of an accurate count of the number of times
the comparison is actually performed.
The Comparator class has a single data member which is the number
of comparisons performed. The operations on a Comparator are
- Reset( ) -- sets the number of comparions back to zero
- GetCompares( ) -- an accessor for the number of comparisons
- IsLessThan(a, b) -- returns true if a < b
- IsGreaterThan(a, b) -- returns true if a > b
- IsEqualTo(a, b) -- returns true if a = b
Recall that objects stored in the
SkipList need only support operator<
Miscellaneous
- Although we are only storing integers in our SkipList, the SkipList
class must be implemented as a template.
- Before trying to run your program on large data files, run your
program on a small data set so that you manually verify your program's
output.
- Be sure to test your insert functions in different situations.
I.e. inserting a value that will go at the front of the list,
a value that will go at the end of the list, and a value that will go
in the middle of the list.
Our test code inserted the values 10, 5, 6, 90, 100, and 2
into a SkipList with seed = 1, probability = 0.5, and max node size = 10.
The node sizes were 3, 2, 3, 3, 1, and 3, respectively as shown in
the diagram below.
Implementing the pseudo code from the class notes, Find(5) took 5 comparisons and
Find(100) took 8 comparisons.
You should be able to reproduce this SkipList and these results.
- A data file with 200000 unique integers
named 200000.ints can be found in Mr. Frey's public directory
to use for your testing. To save your disk space, create a symbolic
link (ln -s /afs/umbc.edu/users/f/r/frey/pub/CMSC341/Proj5/200000.ints 200000.ints) to that file rather than make a copy in your local directory.
- The executable GenUniqueInts is also available
in Mr. Frey's public directory for
your use to create files of your own. Execute the command
GenUniqueInts -help for help. Once again, a symbolic link
will help save your disk space.
- The C++ library function log10( x ) returns the
log of x in base 10. You may find this "change of logarithm" rule helpful.
log x ( y )
= log 10 ( y ) / log 10 ( x )
Grading and Academic Integrity
Project grading is described in the
Project Policy handout.
Your answers to
341-Spring07-proj5_questions.txt are worth
20% of your project grade. Write your name on this file. Don't
change the name of the file.
Cheating in any form will not be tolerated. Please re-read the
Project Policy handout for further details on honesty in doing
projects for this course.
Remember, the due date is firm. Submittals made after midnight of
the due date will not be accepted. Do not submit any files after that
time.
Files To Be Submitted
You should submit only the files you have written, a makefile, and the
file containing your answers to the questions. Make sure you submit
all the files needed to successfully compile your project.
Do not change the name of the questions file. Make sure you have put
your name on it.
Your makefile should successfully compile your program when the grader
types make Proj5. Please
remember that the grader does not do this manually; it is done by an
automatic script. Therefore, make sure your makefile will proceed
successfully without human intervention after it is started.
Use submitls to check on the files you
have submitted. Leave yourself enough time to do it right - don't
wait till the last minute to do the submittals. A good practice is to
have only the files in your directory that you will be submitting. If
this is the case, then if you can successfully make the project, the
grader will be able to also. Just clean up the directory (remove
executables, .o files and data files) and submit everything else.