CMSC 341 Fall 2005
Project 5
Disjoint Set Experimentation
Assigned
| Monday, Nov 28th
|
Due
| 11:59pm, Sunday Dec 11th
|
Updates
|
|
Objectives
- To implement the Disjoint Set (Union-Find) data type
- To perform an experiment with a data structure
- To analyze the results of an experiment and determine
how/if it models expected results
- To present the results of an experiment in a well written,
logically constructed report
Background
In class, the theoretical performance of certain operations on a data
structure are presented and sometimes proven; sometimes not. In the
case of the disjoint set data structure, we showed that certain heuristics
affect the performance of the union
and find operations. In particular
we showed that using Union-by-Weight results in worst case O(lg n) performance for find (where n is the size of the universe of elements).
We also showed that using Path-Compression resulted in amortized
O(lg n)
performance for find. And finally we showed that using both Union-by-Weight
and Path-Compression resulted in amortized O(1) performance for
find .
In this project, we will find out if the theory is correct.
Description
In this project you will implement a disjoint set (DJS)
data structure which can be
configured to use Union-by-Weight and/or Path-Compression or neither of these.
After implementing this data structure, you will perform a sequence of
union
and find operations, periodically displaying various data and calculations.
The methodology for performing the experiment is described below. You are
at liberty to run the experiment as often as you like, but a minimal set
of data must be submitted with your project.
Once you have completed the experiment and analyzed the data, you will write
a report that describes the results of your experiment and if/how those
results conform to the theoretically expected results.
The performance of a find operation is measured by the length of the
path from the element being found to its representative. We will measure
this length by counting the number of times find
is called recursively.
Your DJS data structure must be able to count these calls and provide
an accessor for this value.
You may use the author's DJS code (found in Mr. Frey's public directory)
or implement your own DJS class.
The command line for project 5
consists of the following parameters, in this order
- The number elements in the universe, N, numbered 0 to N - 1
- The number of operations to perform, M.
- The random number generator seed, S
- The data reporting interval, I. Print the required data after every I operations. Assume that I is a factor of M.
- One of the strings "compress" or "NOcompress" (without the quotes), indicating whether or not path compression should be used.
- One of the strings "byWeight" or "NOTbyWeight" (without the quotes), indicating whether or not union-by-weight should be used.
In this experiment we will use a random number generator to determine which
operation to perform and the element(s) on which to perform it. So that your
results can be duplicated, you will seed your random number generator from
a command line parameter. We will be using the functions srandom and random. See the man pages
for details on how to use them.
In this experiment, the following values are reported every time I operations
have been completed. See the sample output
below for an appropriate format, legend and column headers.
- The number of operations performed so far
- The number of find operations performed so far
- The number of union operations performed so far
- The total number of recursive calls to find so far
- The number of recursive calls to find since the last
time reported
- The average number of recursive calls to find for each operation
- The average number of recursive calls to find since the last time reported.
- The average number of recursive calls to find
for each operation divided by lg2( N ).
The experiment consists of the following steps
- Construct an appropriately configured DJS data structure.
- Seed the random number generator
- For each of the M operations
Choose a random number. If the number is odd, select two random
numbers from 0 to N-1 and union their sets together. If the number is
even, select one random number from 0 to N-1 and perform a find
operation on that element.
Every I operations, report the data described above.
In lieu of project questions, you will write a report about your experiment
and your data analysis. This report will count 25 points
of the project grade. Spelling and grammar count. Your report file
must be named Proj5Report.txt and must be written in plain text.
Be sure to include your name, section, UMBC username, etc
at the beginning of your report.
Your report should begin with a brief description of your DJS class.
Describe how you designed your class to be configurable with respect to
using path-compression and/or union-by-weight.
The bulk of your report should provide an analysis of the data for each
of the four DJS configurations (with/without path-compression and union-by-weight). For each configuration, describe how the reported data (total recursive
calls, average recursive calls, average recursive calls divided by
lg2( N ), etc.) behave (i.e. become constant, get increasingly larger, get decreasingly smaller, converges to a value, etc.) as the number of operations increases.
Describe the expected behavior according to the theory discussed in class
and how/if/why your data supports or conflicts with that theory.
Project Notes, Hints, and Requirements
- The C/C++ math library does not provide a
function for calculating lg2( N ). It does provide the
function log10( N ) which returns the log of N in base 10.
Use the math identity lg2( N ) = log10( N ) / log10( 2 )
to calculate the log of N in base 2. You must #include <cmath>
to use log10.
- Note that union is a keyword in C/C++ and so cannot be used
as the name of a function, variable, class, etc.
- When performing a union operation, be sure to first
find the repesentatives of the sets being unioned.
- You must #include <cstdlib> to use random( ) and srandom( )
- Mr. Frey's public directory for this project is
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj5
- Run your experiment with various universe sizes, number of operations,
with/without path compression and union-by-weight. Analyze this data to write your report.
- You must submit four data files from your experimentation.
In each case, use N = 1000, M = 10000, I = 100 and S = 1234
- Submit the file exp1.dat which results from
running your project without path compression and without union-by-weight
- Submit the file exp2.dat which results from
running your project with path compression and without union-by-weight
- Submit the file exp3.dat which results from
running your project without path compression but with union-by-weight
- Submit the file exp4.dat which results from
running your project with both path compression and union-by-weight
Sample Output
This sample output is provided only as a format example.
Your output must be presented in a readable table format.
Experiment Parameters
---------------------
Number of Elements = 500
lg( Nr Elements ) = 8.96578
Number of Operations = 1000
Union-by-Weight = NO
Path-Compression = NO
Random Seed = 1234
Table Legend
------------
Nr Ops = Cumulative Number of Union/Find Operations
TC = Total Recursive Calls
Delta TC = Change in TC from previous line
AC = Average Recursive Calls per Operation
Delta AC = Change in AC from previous line
K1 = Average Calls / lg( Nr Elements )
Total Ops Unions Finds TC Delta TC AC Delta AC K1
100 42 58 6 - 0.0600 - 0.0067
200 82 118 31 25 0.1550 0.0950 0.0173
300 139 161 65 34 0.2167 0.0617 0.0242
400 190 210 125 60 0.3125 0.0958 0.0349
500 238 262 209 84 0.4180 0.1055 0.0466
600 282 318 311 102 0.5183 0.1003 0.0578
700 332 368 455 144 0.6500 0.1317 0.0725
800 382 418 593 138 0.7412 0.0912 0.0827
900 432 468 745 152 0.8278 0.0865 0.0923
1000 482 518 904 159 0.9040 0.0762 0.1008
Files To Be Submitted
Submit any files which you have written or modified -- source code and makefile.
Submit your project report. This file must be in plain text and be named
Proj5Report.txt.
Submit the required experiment data files
If your makefile is set up correctly, you should be able to execute the command
make submit.
Submission Tools
There are a number of tools available to you to check on your submittal. It is
your responsibility to ensure that the submittal is correct and will result in
a successful compilation of your project. Do not wait till the last minute to
submit your files. Give yourself enough time to check that your submittal is
correct.
If you don't submit a project correctly, you will not get credit for it.
Why throw away all that hard work you did to write the project? Check your
submittals. Make sure they work. Do this before the due date.
Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One
of the tools provided by the submit program is
submitls. It lists the names of
the files you have submitted.
Additionally, there are two programs for use only by CMSC-341 students (not
part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/
and are named submitmake
and submitrun. You can
use these programs to make or run your submitted projects.
The syntax is similar to that for submit:
submitmake <class> <project>
Example: submitmake cs341 Proj5 <parameter list>
Submission Tools
There are a number of tools available to you to check on your submittal. It is
your responsibility to ensure that the submittal is correct and will result in
a successful compilation of your project. Do not wait till the last minute to
submit your files. Give yourself enough time to check that your submittal is
correct.
If you don't submit a project correctly, you will not get credit for it.
Why throw away all that hard work you did to write the project? Check your
submittals. Make sure they work. Do this before the due date.
Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One
of the tools provided by the submit program is
submitls. It lists the names of
the files you have submitted.
Additionally, there are two programs for use only by CMSC-341 students (not
part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/
and are named submitmake
and submitrun. You can
use these programs to make or run your submitted projects.
The syntax is similar to that for submit:
submitmake <class> <project>
Example: submitmake cs341 Proj5 <parameter list>
Submission Tools
There are a number of tools available to you to check on your submittal. It is
your responsibility to ensure that the submittal is correct and will result in
a successful compilation of your project. Do not wait till the last minute to
submit your files. Give yourself enough time to check that your submittal is
correct.
If you don't submit a project correctly, you will not get credit for it.
Why throw away all that hard work you did to write the project? Check your
submittals. Make sure they work. Do this before the due date.
Documentation for the submit program is on the web at http://www.gl.umbc.edu/submit/ . One
of the tools provided by the submit program is
submitls. It lists the names of
the files you have submitted.
Additionally, there are two programs for use only by CMSC-341 students (not
part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/
and are named submitmake
and submitrun. You can
use these programs to make or run your submitted projects.
The syntax is similar to that for submit:
submitmake <class> <project>
Example: submitmake cs341 Proj5
This makes the project, and shows you the report from the make utility. It cleans
up the directory after making the project (removes .o and ii_files), but leaves
the
executable in place.
submitrun <class> <project>
Example: submitrun cs341 Proj5 <parameter list>
This runs the project, assuming there is an executable (i.e. submitmake was
run successfully).
Project grading is described in the
Project
Policy handout.
Cheating in any form will not be tolerated. Please re-read the Project
Policy handout for further details on honesty in doing projects for this
course.
Remember, the due date is firm. Submittals made after midnight of the due
date will not be accepted. Do not submit any files after that time.