Assigned |
Wednesday, Feb 23, 2011 |
Due |
11:59 pm on Wednesday, March 30, 2011 Revised Due Time:
11:59 pm on Sunday, April 3, 2011 |
Update |
Thursday, March 10 1. Due time has been revised to allow you a few more days to complete the project. 2. Sample data set and output is added. 3. A note on how to use nextInt(k) to generate random numbers is added (in Implementation Notes section). 4. Pseudo code was revised slightly to make it more readable. A note is added to explain parameter passing in recursive calls in split function |
The purpose of this project is to implement and experiment a data structure known as Randomized Binary Search Tree (RBST). RBST is a BST that guarantees to have expected height of O(log n) without performing any tree balance operations.
The ordinary binary search tree (BST) as defined in the
class works well for random input. The trees of n keys will have a
``high'' probability of being balanced and thus having height of O(log n).
But the assumption that the keys are inserted in random order does not always
hold; it is known that with some specific sequence of insertions and deletions
the resulting tree can be degenerated to have height of O(n), making the
BST performing as poor as linked lists. Various techniques have been developed
to address the issue of degeneracy of a BST, including AVL trees, Splaying
trees, Red-Black trees, to mention just a few. Randomized binary Search
Tree (RBST), proposed by Martínez and Roura [1], is another such
technique.
RBST is a special BST which uses randomized version of the insertion
and deletion operations. These operations ensure that each of the
n keys has an equal probability (1/n) to be the root of the
tree. Consequently, regardless the sequence of operations and the order of
their insertions and deletions, the resulting binary search tree has the same
probability as if it had been built on the random permutations of the n keys,
and thus the expected height a RBST is guaranteed to be of order O(log n).
Of course in worst case any particular RBST may still have its height in O(n).
In this project, you are asked to only implement the randomized insertion
operation, which is described next, followed by an illustrative example and the
algorithms.
Randomized Insertion.
The randomized insertion works as follows (assuming all keys are integers): To
insert a key X into a BST T with n keys, we use the insertion
at root algorithm (to be given below) with probability 1/(n+1), and
with probability 1 - 1/(n+1)
we insert the key recursively into the right or left subtree, according to the
respective values of the key and the root.
The main idea behind insertion at root is to use the split operation: This operation creates two new BST's L and R from a key X and a given BST T, which is modified by the operation. The BST L contains the keys of T that are smaller than X, and the BST R contains the keys of T that are larger than X. Then we insert X into a new node, whose left and right children are L and R, respectively. An example is given in Figure 1 below.
Algorithms
Outlines of algorithms for "randomized insertion", "insertion at
root", and "split a BST" are given below
algorithm for randomized insertion
bst
randomizedInsert (int x, bst T)
{
pick a random number, r, between 0 and the size of T, inclusive
if r == the size of the tree
insert x at the root
/* x has probability of 1/(T+1) inserting at the root */
else
if x <
T.key then
T.left = randomizedInsert(x, T.left);
else
T.right = randomizedInsert(x, T.right)
}
algorithm for insertion at root
bst
insertAtRoot( int x, bst T)
{
bst L, bst R;
split T into two subtrees, called L
and R
make a new root node containing x
reattach L and R to the new root as its left and right subtrees
}
algorithm
to split a tree
void split( int x, bst
T, bst L, bst R) /* assuming x is not in T */
{
if T is null, make L and R null
else if x < T.key
set R = T /* all keys at the root and in right subtree are greater than x */
recursively split T's left subtree into L and R’s left subtree
/* some keys in T's left subtree are greater than x, other may be less than x */
else /* x is greater than T.key */
set L = T
recursively split T's right subtree into R and L's right subtree
}
Note: Since the parameter passing in Java is only by value, directly using L, R, or L.left, R.right in recursive calls inside split function will make L and R null, you need to find a way to work around it. One suggestion is to use local variables, say lNode and rNode, in the recursive calls and link them to L and R after these calls.
So your recursive calls will look like
Split(x, T.left, L, rNode) if x < T.key; and Split(x, T.right, lNode, R) if x > T.key
Then after the recursive calls, link R.right and L.left to rNode and lNode, respectively.
You
need to work out the details to make your code run correctly.
Explanation
of algorithm insert_at_root.
T is to be split into two BST L and R where L contain all key in T that are
smaller than x, and R contains those greater than x. If T is empty,
nothing shall be done and both L and R are also empty. Otherwise, if x < T.key, then the right subtree of T and
the root of T belong to R. To compute L, and the remaining part of R, that is,
the subtree that contains the keys in T.left
which are greater than x, we make a recursive call to split T.lef. If x > T.key, we proceed in a similar manner.
Consider the example in Figure 1. Since 26 > T.key,
we set L pointing to T, then proceed to split T.right
on 26. the two subtrees coming out of this split will become the right subtree
of L (containing those keys > x) and become R, respectively. In the
diagraphs below, the part of the original tree not split is colored black, the
Ls from slit operations colored red, R’s blue. Also note that for
illustrative purpose only two of the four parameters for split function are
shown in the diagram.
Split left subtree of 28 does nothing because it is empty. L and R are
formed with the returns of all recursive. Then 26 becomes the key of the root
of the new BST with L and R as its left and right subtrees. Figures below, reading from right to left,
show the process of unwinding of recursive calls where R’ is attached to
the left of R and L’s to the right of L.
In this project you are asked to compare the performance of ordinary BST and RBST. You will build a BST using the standard BST insert operation and a RBST using the randomized insert operation from a sequence of non-negative integers (no duplicate) read in from a data file. Then you will
Implementation notes
Note nextInt(k) in Random class pseudo-randomly produces integers in [0, k), exclusive of k. So to produce random numbers between 0 to |T|, inclusive, you should use nextInt(|T|+1).
Project 2 will be invoked with a command line that consists of two arguments. The first argument will be a positive integer (L), telling the program the depth of the level-order print shall print. The second argument is the full path name of a file that contains a sequence (may be empty) of non-negative integers the program shall use to build the two trees by insertions. The integers in the file may be in a single line or multiple lines, they are separated from each other by one or more space or by line break.
Note that you must check command line arguments to ensure that they are valid, e.g. that the command file can be opened, and print an appropriate message if an error is found.
An example command line looks like
ant -Dargs="6 /afs/umbc.edu/users/y/p/ypeng/pub/cs341s11/Proj2/testData1.txt" run
The test data set can be found in the location given in the example command line above. It contains the following non-negative integers:
1 3 5 4 6 9 8 7
The output from executing the above command line shall look like the following:
Height of the normal BST: 6
Height of the randomized
BST: 4
Level order print of the
normal BST
Level 0: 1
Level 1: 3
Level 2: 5
Level 3: 4 6
Level 4: 9
Level 5: 8
Level 6: 7
Level order print of the
randomized BST
Level 0: 3
Level 1: 1 5
Level 2: 4 8
Level 3: 6 9
Level 4: 7
The repository location for this project is:
/afs/umbc.edu/users/y/p/ypeng/pub/cs341s11/Proj2
References
[1]
Martínez, Conrado; Roura,