Trees
Please note: There were a few semantic typos in the pseudocode. Please make sure you understand the current version and verify for yourself that it is correct. find() and remove() were modified from their original form.
Basics
A tree is defined as either an empty tree or a node
(or vertex) with zero or more subtrees. The subtrees
are themselves trees by this same definition. Note that this is a recursive
definition of a tree. The base case is a tree that is empty
(i.e. has no nodes).
Terminology
The top of a given tree is a distinguished node called the root of
the tree. The connections from the root to its subtrees are called edges. Each of the root's subtrees may also have its own root (after all,
it is its own tree). If the root has k subtrees (numbered [1..k]), the roots
of each of these subtrees is said to be a child of the root of the tree.
Similarly, the root is called the parent of these children. Children of the same parent node are called siblings. The analogy to geneology
can be extended ad nauseum.
Any node in a tree may have children. A node with no children (that is, it has only empty subtrees or no subtrees at all) is called a leaf node. Non-leaf nodes are called internal nodes. The branch factor of a tree
is the maximum number of children for a single node in the tree. That is, in a tree with a branch factor of three, no node has more than three children.
A path in a tree is a sequence of nodes (not edges) such that
for each pair of nodes, ni is the parent of ni+1 in that sequence. That is, the first node in the path is the parent of the second, the second is the parent of the third, etc.
For any two nodes ni and nj, ni is an ancestor of nj if and only if i < j on some path in the tree. Similarly, nj is said to be a descendant of ni. Note that any node in a general tree may be uniquely identified by
the path from the root to that node. Note also that root is an ancestor of all of the nodes in its subtrees and that every node in one of root's subtrees is
a descendant of the root.
The length of a path is the number of edges
crossed in following that path. This is always one less than the number of nodes on the path. The depth of a node is the length of the path from the root to that node. The depth of a tree is the depth of its deepest leaf. Similarly the height of a node is the depth of the subtree rooted at that node. The height of a tree is the height of its root.
Binary Trees
Binary trees are the same general trees defined above with the additional
restriction that their branch factor is two.
Binary Tree Traversals
There three basic ways to traverse binary trees. They are named for when the root
is visited in relation to its children.
preorder(t)
visit(t)
preorder(left(t))
preorder(right(t))
postorder(t)
postorder(left(t))
postorder(right(t))
visit(t)
inorder(t)
inorder(left(t))
visit(t)
inorder(right(t))
Note that preorder and postorder can be extended easily for trees with an arbitrarily high branch factor but that inorder traversals really only make sense
on binary trees.
The call to visit(t)
can perform whatever operation is
necessary at that node (e.g. print the value).
Binary Search Trees
A binary search tree (or BST) is a binary tree with the additional restriction
that for a given node with a value k, all of the values in k's
left subtree are less than the value of k and all of the values in
k's right subtree are greater than its value. Duplicates may either be
ignored or stored in some sort of data structure at each node depending on your
application.
Insertion
Insertion into a BST looks roughly like this:
insert(v, t)
if (t is empty)
return new node(v, empty, empty)
else if (v < value(t))
return node(value(t), insert(v, left(t)), right(t))
else
return node(value(t), left(t), insert(v, right(t)))
That is, look at the current node's value. If v is less than it, insert
this value into the left subtree of t. If v is greater than it,
insert this value into the right subtree of t. If t is empty, make a new tree of just this node and return it.
Find
BST find is almost identical. This version returns the subtree rooted at
v or empty if the value is not found.
find(v, t)
if (t is empty)
return empty
else if (v == value(t))
return t
else if (v < value(t))
return find(v, left(t))
else
return find(v, right(t))
Deletion
In deleting a node we must be careful to maintain the BST properties.
Deletion starts with a find on that node. When that node is found,
we want to leave the structure of the tree intact while removing this value.
That means we need to find a replacement for the value in the tree.
In order to maintain the BST property, we need another value that will maintain that property. Where could we find such a value?
Successors and Predecessors
Note that if you print a BST using an inorder traversal, you will end
up with a sorted list of values. Note that if we were to pick out the root from this list, all of the values to the left of it would be in its left subtree
and all of the values to the right of it would be in its right subtree.
Now realize that if we remove a node from a BST, this traversal needs to remain
in order, but the value at that node must be gone. What fills its place? In
the traversal, it's replaced by either the next smallest value in the tree
or the next largest. These nodes are called the predecessor and successor respectively. These are the candidates for replacing a given node. Why?
It turns out that if you replace a node with the next smallest value in its
subtrees, all of the values in its smaller subtree are still smaller than that value. This is true because the predecessor is the largest value from the smaller subtree. This case is symmetric with choosing the next largest value. The successor will be the smallest value in the right subtree and the predecessor will be the largest value in the left subtree. Verify for yourself that this is true.
Given that, the algorithm for removal from a BST is relatively simple:
remove(v, t)
if (t is empty) // Not in tree - do nothing
return
if (v < value(t))
remove(v, left(t))
if (v > value(t))
remove(v, right(t))
else
if (left(t) is empty)
if (right(t) is empty)
delete t
else
value(t) = findMin(right(t))
remove(findMin(right(t)), right(t))
else
value(t) = findMax(left(t))
remove(findMax(left(t)), left(t))
Note that at the end it's important to remove the replacement from
its original subtree since a given value cannot be in two places in a BST
at the same time.