CMSC 341 Fall 2006 
    Project 5
  
    
      | Assigned | Monday, Nov 27, 2006 | 
    
      | Due | 11:57 pm on Sunday Dec 10, 2006 | 
    
    | Updates | Monday Nov 27th -- the definition of "cluster" is hereby changed
to be "a group of 1 or more consecutive slots" in the hash table.
The old definition required 2 or more consecutive slots. | 
Objectives
The purpose of this project is to give you significant exposure 
to Hash Tables. You will be conducting experiments with probing hash tables
using linear, quadratic and double hashing. You will be asked
to analyze the data you collect and compare the data with the expected
theoretical results.
Description
In this project, you will be inserting randomly
generated integers into three hash tables. For one hash table,
 you will use  linear probing for collision resolution. For another you 
will use quadratic probing. For the third hash table, you will use double
hashing. You will be collecting data about clustering and probing in
hash tables. 
Your program will attempt to insert N unique random integers into the hash tables,
reporting the collected data on a specified interval (after every INTERVAL
insertion attempts). For  example, with N = 1000 and INTERVAL = 100, 
you will attempt to insert 1000 integers, and report data  after attempting 
to insert 100, 200, 300, 400, 500, .... 1000 integers. See  the
sample output
for recommended organization of the data. You may assume that N is a multiple
of INTERVAL.
Throughout this project description, "N" will refer to the number
of  attempted  insertions into the hash table; "K" will refer to the key
being  inserted;  "I"  will refer to the probe number; and "M" will refer
to the  hash table  size.  These definitions are the same as those
used in class.
Your program will accept the following command 
line arguments  (in this order)
    - the name of a file which contains unique random integers separated
by whitespace. 
    
- N-- the total number of random integers to attempt to insert
    
- INTERVAL -- the interval (nr of attempted insertions) for reporting
  data
After attempting to insert INTERVAL integers, your program will report the following data.
    - the total number of attempted insertions (successful and unsuccessful)
    so far
    
- the value of lambda (successful insertions / table size)
    
- the total number of probes so far (for both successful and unsuccessful
    insertions)
    
- the average number of probes needed to (successfully or unsuccessfully)
    insert an integer into each  hash table
    
- the maximum number of probes needed to  (successfully or unsuccessfully)
     insert an integer into each  hash table
    
- the number of successful insertions
    
- the number of unsuccessful insertion attempts.
    
- the number of clusters in the hash table
 A cluster is defined to  be 1 or more consecutive occupied slots 
in the hash table. When counting clusters, do not "wrap around".
- the maximum cluster size in each hash table
    
- the average cluster size in each hash table
Requirements, Notes, and Hints
Mr. Frey's public directory for this project is
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/Proj5
- Code for the author's hash table class (QuadProbing.h) can be
found in Mr. Frey's public directory.  Other related files
(NextPrime.cpp and HashFunctions.cpp) are also located in that directory.
- Note that the author's code uses quadratic probing for collision
resolution.  This project requires three different collision resolution
methods.
- Your code must calculate the hashtable size (M) based on the number
of integers (N) to be inserted.  Use the smallest prime greater than N / 0.90.
The author's NextPrime( ) function can help calculate M for you.
- The author's code automatically rehashes the hash table when the
table is 50% full.  This code must be removed so that we can study
the effects that filling the hash table.
- An insertion is "unsuccessful" if h(k, i) == h(k, j)
for some j not equal to i.  I.e. you ever attempt to store the key
into the same slot twice.  You should check this condition after each
probe so that we all count unsuccessful insertions in the same way.
-  Test data files can be found in Mr. Frey's public directory.
No comments or other text will be found in the files.
This is the same format as the integer files used for project 4.
- You must write a makefile for this project.  The makefile should
reference any unmodified files (e.g. NextPrime.cpp) in Mr. Frey's public
directory.  DO NOT submit these files.
- Answer the questions found in the file
 341-Fall06-p5_questions.txt found in Mr. Frey's public directory.
These questions are  worth 20% of the project grade.
 
- Your code must handle (try/throw/catch) any exceptions thrown by the 
       author's code or by your own classes.
- All "average" values should be printed with 2 decimal places of precision.
-  Probing Functions
   
   -  For all probing, use the function
    
 h( K )  = K mod M
- For linear probing, use the function
     
 h( K, I ) = ( h( K ) + I ) mod M
- For quadratic probing, use the function
    
 h( K, I ) = ( h( K ) + I2) mod M
- For double hashing, use the function
    
 h ( K, I ) = ( h( K ) + I * h2( K ) ) mod M
 where h2 ( K ) = R - ( K mod R )
 and where R is the largest prime that is smaller than the table size.
 Your program is responsible for calculating the value of R.
 
         The following sample output is for formatting and organizational 
purposes     only. No real data is provided.
       In this example, N = 1000 (the total number of insertions to attempt)
  and INTERVAL = 100 (the reporting frequency).    Your output will consist
of three tables like the one shown below. 
There will be separate
tables for linear probing, quadratic probing and double  hashing.
In the tables, the columns are defined as follows
General:
  - "N" -- the total number of attempted insertions.
  
- "lambda" -- the load factor -- the number of successful insertions
 / table size
 
For the "Inserts" columns:
  - "success" -- the running total of the number of successful insertions
  
- "failed" -- the running total of the number of insertions that failed
   because no slot could be found. 
    
 Note that success + failed = N
For the "Probes" columns:
  - "total" -- the running total number of probes for all insertions (successful 
 and unsuccessful)
  
- "avg" -- the average number of probes for each insertion (successful
   and unsuccessful)
  
- "max" -- the maximum number of probes necessary for a single insertion
   (successful or unsuccessful)
For the "Cluster" columns:
  - "number" -- the number of clusters in the table
  
- "avg" -- the average size of each cluster
  
- "max" -- the maximum cluster size in the table
 
Linear Probing Analysis (Table size = xxxxxx)                 
                  --- Inserts ---   ------- Probes -------    ----- Clusters ----- 
     N   lambda  success  failed     total    avg    max     number   avg     max  
   100     0.xx    xxxxx  xxxxxx   xxxxxxx  xxx.xx  xxxxx    xxxxxx  xxx.xx  xxxxx
   200     0.xx    xxxxx  xxxxxx   xxxxxxx  xxx.xx  xxxxx    xxxxxx  xxx.xx  xxxxx
   300     0.xx    xxxxx  xxxxxx   xxxxxxx  xxx.xx  xxxxx    xxxxxx  xxx.xx  xxxxx 
  ....   
  ....
  1000     0.xx    xxxxx  xxxxxx   xxxxxxx  xxx.xx  xxxxx    xxxxxx  xxx.xx  xxxxx
                 Files To Be Submitted
                 Submit any files which you have  written  -- source code 
and   makefile.       Your files MUST be named as listed  below.  
              
  - Proj5.cpp -- contains your main( )
  
- Any other files you write for this project     
  
- The question file with your answers inserted.
  
- Your makefile -- which must be named makefile or Makefile
              .     
Submit the files using the procedure given to you for your section of
the course.
If your makefile is set up correctly, you should be
able   to  execute     the command make submit. 
DO NOT submit files that you  didn't write -- These files should  be referenced
by your makefile.                  
                       
 Submit Tools
There are a number of tools available to you to check 
on  your   submittal. It is your responsibility to ensure that the 
submittal is correct and  will result in a successful compilation 
of your project. Do not  wait till the last minute to submit your files.
Give   yourself  enough time  to check that your submittal is correct.  
If you don't submit a project correctly, you will not get credit for 
           it. Why throw away all that hard work you did to write the project? 
     Check . Make sure they work. Do this before the due date.
Documentation for the submit program is on the web at
http://www.gl.umbc.edu/submit/
Additionally, there are two programs for use only by CMSC-341 students
(not part of the UCS submit program). They are in the directory
/afs/umbc.edu/users/d/e/dennis/pub/CMSC341/ and are named 
submitmake and submitrun. You can
   use    these    programs to make or run your submitted projects. 
The syntax is similar to that for submit: 
submitmake <class> <project>
Example: submitmake cs341 Proj5 
This makes the project, and shows you the report from the make utility.
            It cleans up the directory after making the project (removes
.o   and    ii_files),      but leaves the 
                       executable in place. 
submitrun <class> <project> [command-line 
           args]
Example: submitrun cs341 Proj4 1234 1000 100 
This runs the project, assuming there is an executable (i.e. submitmake
            was run successfully). 
Grading and Academic Integrity
                        Your project will be tested using a variety of command
   lines,    some   of  which  will be invalid. 
                       Your project will also be tested using a variety of
 command     files    which    will test various conditions which your code
 should handle.                           
Project grading is described in the 
                       Project Policy
                 handout.
                       Cheating in any form will not be tolerated. Please 
re-read     the   
                       Project Policy
                        handout for further details on honesty in doing projects
    for   this   course.    
Remember, the due date is firm. Submittals made after midnight of the
due date will not be accepted. Do not submit any files after that time.
Dennis Frey