
Interval Trees
In this section we look at another example of an augmented binary search tree. Consider the problem
of storing a set of intervals. As a particular example, suppose each interval represents the start and end
times of a project. Often we draw a pictorial representation of the intervals over a time line, as in the
figure below. If each line segment represents the start to end time of a project, then this figure enables us
to identify the overlapping intervals. It is clear that p3 and p4 overlap, as do p4 and p5. However p1 and
p3 do not overlap.
A manager might want to store all these timelines in a system and be able to easily insert and delete the
timelines of new projects. It might also be extremely useful to be able to search for any interval in
the system that currently overlaps with a given interval. This would be the case, for instance, if a new
project were proposed and the manager needed to determine if the new timeline overlapped with any others
currently in the system.
In this section we look at how to create an augmented binary search tree that stores these intervals and
allows for the following operations:
• Insert a new interval into the tree
• Delete an interval from the tree
• Search for an interval in the tree that overlaps with a given interval i
1.1 The tree structure
The intervals are stored in a binary search tree. Each node of the tree is an object which contains the
usual BST attributes (x.parent, x.left, x.right), as well as the following additional attributes:
• The left endpoint of the interval: x.int.low
• The right endpoint of the interval: x.int.high
• The maximum value of all endpoints in its subtree: x.max.
Note that there is no attribute x.key. Instead, the BST is built using the value in x.int.low as the key. An
example of a BST built using the value x.int.low as the key is shown below. Notice that the left endpoints
are those that satisfy the BST property at each node. The right endpoints are also stored at each
node, but they are not used as part of the BST property. Each node also contains x.max, which is the
maximum of all right endpoints in the subtree rooted at x:
It remains to show how this new information can be used to search for interval overlaps.
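As a concrete sketch of the node structure described above (the class and attribute names are illustrative choices, not fixed by the text):

```python
class IntervalNode:
    """One node of an interval tree: a BST keyed on the interval's low endpoint."""
    def __init__(self, low, high):
        self.low = low       # x.int.low, used as the BST key
        self.high = high     # x.int.high
        self.max = high      # x.max: largest right endpoint in this node's subtree
        self.left = None     # x.left
        self.right = None    # x.right
        self.parent = None   # x.parent
```

A freshly created node has no children, so its max is simply its own right endpoint.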
2 Searching for an interval that overlaps with i
The interval search algorithm in this section is a very basic search procedure. We refer to the algorithm
as Interval-Search(x,i), with the following specification:
• Input parameter x is the root of an interval tree
• Input parameter i is an interval from i.low to i.high.
• The algorithm returns NIL if no interval in the tree overlaps with i
• If there is at least one interval in the tree that overlaps with i, the algorithm returns a pointer to
one of those intervals.
The algorithm works much like the traditional tree-search algorithm. We begin by checking whether the
interval i overlaps with the root, x, and if it does, we return the interval at the root. Otherwise the search
continues down either the left path or the right path. The decision to continue either left or
right is summarized below:
Which way to search?
• Search left: If i.low ≤ x.left.max, then if an overlap exists in this tree, one such overlap must exist
in the left subtree. So the search continues to the left.
• Search right: If there is no left subtree, or if i.low > x.left.max, then the only possible overlap is in
the right subtree. So the search continues to the right.
The example below demonstrates the first step of the search for interval i = (14, 16). The maximum
right endpoint of the left subtree is 16 and i.low is 14. Therefore, if an overlapping interval exists, one
must exist in the left subtree, so the search proceeds to the left. At the second step, the maximum endpoint of the left
subtree is only 12 and i.low is 14. Therefore the search continues to the right. At this point, the interval
i overlaps directly with the interval 15 ≤ x ≤ 16, and therefore we return interval (15, 16) as our result.
The algorithm is shown below, where x is initialized as the root of the tree.
INTERVAL-SEARCH(x,i)
  while x ≠ NIL and i does not overlap with x.int
    if x.left ≠ NIL and x.left.max ≥ i.low
      x = x.left
    else x = x.right
  return x
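The pseudocode above can be rendered in Python as the following self-contained sketch (the Node class here is a minimal stand-in for the node object described earlier; intervals are treated as closed):

```python
class Node:
    def __init__(self, low, high, left=None, right=None):
        self.low, self.high = low, high
        self.left, self.right = left, right
        # max right endpoint anywhere in this subtree
        self.max = max(high,
                       left.max if left else high,
                       right.max if right else high)

def overlaps(node, low, high):
    # two closed intervals overlap iff each one starts before the other ends
    return node.low <= high and low <= node.high

def interval_search(x, low, high):
    """Return some node whose interval overlaps [low, high], or None."""
    while x is not None and not overlaps(x, low, high):
        if x.left is not None and x.left.max >= low:
            x = x.left
        else:
            x = x.right
    return x
```

For example, in a tree with intervals (5, 8), (10, 11) and (15, 16), a search for (14, 16) skips the root, sees that the left subtree's max (8) is below 14, and finds (15, 16) on the right.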
Runtime The algorithm above searches from the root to a leaf in the worst case. Therefore the runtime
is O(h), and if the tree is a red-black or AVL tree, then the runtime is O(log n).
3 Inserts and Deletes
This new interval tree is simply a binary search tree using x.int.low as the key. It is important to show
that the additional information we have stored in the tree (the maximum right endpoint) can be quickly
updated during an insert and delete operation.
When a new key is inserted, we perform the insert as usual in the binary search tree and insert the new
key as a leaf. As we go down the path from the root to the leaf, we update the maximum right endpoint
of all the nodes as necessary:
This update takes only constant time per node, and thus the runtime of the insertion of this node
is O(h), as it is for ordinary BSTs.
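A sketch of this insertion walk in Python (names are illustrative; the tree is the interval tree from earlier, keyed on the low endpoint, and each visited node's max is refreshed on the way down):

```python
class Node:
    def __init__(self, low, high):
        self.low, self.high, self.max = low, high, high
        self.left = self.right = None

def interval_insert(root, z):
    """Insert leaf z by z.low, updating x.max on the path down. Returns the root."""
    if root is None:
        return z
    x = root
    while True:
        x.max = max(x.max, z.high)   # the new interval now lives in x's subtree
        if z.low < x.low:
            if x.left is None:
                x.left = z
                return root
            x = x.left
        else:
            if x.right is None:
                x.right = z
                return root
            x = x.right
```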
However, if the BST is in particular a red-black tree or AVL tree, then the insert may make a call to
RB-repair which requires rotations. So just like we saw for subtree sizes, we need to update the values
of x.max after a rotation is made. There are only two nodes whose values need to be updated after the
rotation. In the figure below, these nodes are marked A, and B. We simply re-evaluate their max endpoint
values after the rotation. This takes a constant amount of time per rotation, and so the overall runtime of
RB-repair is still O(h).
In conclusion then, it is possible to carry out the insert operation for an interval tree in time O(h).
Furthermore, if the tree is also a red-black tree, it is possible to update the values of x.max after the
rotation so that the overall runtime of RB-repair is still O(h) = O(log n).
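The constant-time re-evaluation of a node's max can be sketched as follows (a minimal helper under the assumption that each node stores high, max, left and right; after a rotation, call it on the lower of the two affected nodes first):

```python
class N:
    def __init__(self, high, left=None, right=None):
        self.high, self.max = high, high
        self.left, self.right = left, right

def update_max(x):
    """Recompute x.max from its own endpoint and its children, in O(1)."""
    m = x.high
    if x.left is not None:
        m = max(m, x.left.max)
    if x.right is not None:
        m = max(m, x.right.max)
    x.max = m

# After a rotation that moves B above A, repair bottom-up:
#   update_max(A); update_max(B)
```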
3.1 Deletion
The delete operation for interval trees works by carrying out the usual deletion as we saw for BST. Recall
that there were 3 cases. In each of these 3 cases, we must update the values of x.max in the interval tree
after the deletion. These 3 updates are part of the practice problems.
Red-Black Trees
In our previous lecture we saw that many operations on binary search trees run in worst-case time
O(h), where h is the height of the tree. However, the height of the binary search tree is not necessarily
nicely balanced. We showed that depending on how the elements are inserted, we may end up with a tree
of height O(n), meaning that our insert/delete operations would then run in time O(n). Furthermore,
although we showed that a randomly built binary search tree has height O(log n), this height is not
necessarily maintained as we carry out inserts and deletes on the tree. This means that although we may
start off with a nicely balanced tree, we can quickly disrupt its height as we carry out operations on the
tree. Future operations then get slower and slower.
In this lecture we look at a variation of the binary search tree called the red-black tree. Such trees are
height-balanced meaning that they maintain a height of O(log n) in the face of arbitrary insertions
and deletions. The huge advantage of this balanced height is that the dynamic-set operations that run in
time O(h) now run in time O(log n) in a red-black tree.
1 The Red-Black tree
Red-Black trees are simply a type of binary search tree. However, they are defined in such a way that the
height is guaranteed to be O(log n), and this height is maintained even after insert/delete operations. In
order to accomplish this task, we simply add an additional attribute to the node object in the tree, which
has exactly two possible values:
x.color = red
or
x.color = black
Using the above attribute, we define a red-black tree as a binary search tree that has the additional
properties below:
The Red-Black tree properties:
The RB tree is a binary search tree with the following properties:
• Each node has a color: red or black
• The root is black
• A red node can only have black children
• Add additional NIL black nodes to the tree so that every real node has 2 children. These
nodes do not contain keys, and are not considered part of the data set.
• For each node in the tree, all paths from that node to a NIL leaf have the same number of
black nodes. Note that we include the NIL black nodes in the count.
One of the essential properties above states that all paths from a node to a leaf have the same number
of black nodes. It is important to note that when counting the number of black nodes on a path to a leaf,
you should ensure that you:
• Do not count the node itself
• Include the NIL leaf node
Below is an example of a red-black tree. One can check that all red nodes have only black children.
Furthermore, the number of black nodes on any path from the root to a leaf is 3. If we count the number
of black nodes from node 31 to a leaf, each path has exactly 2 black nodes. This fact must be true for all
nodes in the tree, otherwise it is not a proper red-black tree.
The NIL nodes are "additional" nodes that have no keys and are not considered part of the data set.
However, they do "exist" in the sense that they are not the same as the NIL value.
Since any node in a red-black tree has the same number of black nodes on any path down to a leaf,
we give this number a special name: the black-height of that node. Each node has its own black-height.
In the example above, node 18 has a black height of 2, since the number of black nodes on any path to
a leaf is 2. Note that the root node has a black height of 3. We refer to the black-height of a tree as the
black-height of its root node. The formal definition of black-height is given below:
Definition. The black-height, bh(x), of node x is the number of black nodes on the path from x down to
a NIL leaf (not including the node x itself).
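As a hedged sketch of this definition in Python (node and attribute names are illustrative; the function walks a single path, which is enough only because a valid red-black tree gives the same count on every path):

```python
class RB:
    def __init__(self, color, left=None, right=None):
        self.color, self.left, self.right = color, left, right

def black_height(x):
    """bh(x): black nodes on a path from x down to a NIL leaf,
    counting the (black) NIL leaf but not x itself."""
    h = 0
    node = x
    while node is not None:
        node = node.left                 # step to the next node on the path
        if node is None or node.color == "black":
            h += 1                       # a black child, or the black NIL leaf
    return h
```

For a black root whose two red children each have one black child, the root's black-height is 2, and so is each red child's.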
In the next section, we show how the specific properties of the red-black tree result in a tree that has
height Θ(log n).
2 Height of a Red-black tree
Since red-black trees are designed in such a way that red nodes only have black children, this
restricts the number of red nodes that exist on any path. For example, paths of all red nodes do not exist.
In fact, the way to fit the most red nodes on any path is to alternate between red and black nodes.
This leads us to the following fact:
Minimum number of black nodes on a path:
The number of black-nodes on any path from the root to a leaf is at least half the actual number of
nodes on the path.
This property can be verified in the previous figure. Notice that the longest paths (of length 5) have 3
(real) black nodes. The paths of length 4 also have 3 black nodes. So the number of black nodes on any
path is at least half of the total number of nodes on that path. It is impossible to have a path of length 5
with only 2 black nodes.
The reason we define black-height is so that we can use it to justify the overall balanced structure of
the tree.
Theorem 1. A red-black tree with n nodes (not counting the NIL nodes) has height O(log n).
Proof. We won’t show a formal proof of this fact here. Instead, we just sketch out the idea, and explain
why the above theorem is true using the following facts.
• Fact 1: Imagine for a moment that all the nodes in the RB tree were black. The only way for such
a RB tree to exist is if the tree were complete and full, as in the figure below:
In the above example, there are 7 actual nodes and the black-height of the tree is 3. Note that
7 = 2^3 − 1. Similarly, a tree of all black nodes of black-height 5 has 31 actual nodes in the tree. Notice
that 31 = 2^5 − 1. In general, it looks like for an all-black red-black tree rooted at x, the number of
nodes is given by:
n = 2^bh(x) − 1
• Fact 2: We can generalize this to red-black trees that have a mix of red and black nodes. The
red-black tree below has a black-height of 3. It has n = 18 nodes. In this case, the number of nodes
is at least 2^bh(x) − 1. Thus a red-black tree with black-height given by bh(x) has a total number of
nodes bounded below by:
n ≥ 2^bh(x) − 1
• Fact 3: If we isolate bh(x) in the inequality n ≥ 2^bh(x) − 1 we get
bh(x) ≤ log(n + 1)
Thus the black-height of a tree is O(log n). Now we just need to relate the black-height of a tree to
the actual height of the tree. Recall from above that the number of black nodes on a path is at least
half the total number of nodes. This means that the actual height of the tree is at most double the
black height. Since the black-height is O(log n) then the actual height is also O(log n).
We have established that red-black trees have height O(log n). So as long as we can maintain the
red-black properties during insertion/deletion, then the tree will remain balanced. This guarantees an O(log n)
time for our dynamic-set operations. In the next section we look at how to vary the insert operation we
saw in the regular BST so that it maintains the structure of the red-black tree.
3 Insertion
Insertion into a red-black tree is a two-step process. Recall that the tree is in fact a BST, so the insertion
must obey the BST properties. However, simply inserting a new node with a certain color may break the
red-black tree properties. We fix this problem in the second step, with an algorithm called RB-repair.
Step 1: Insert node z into the BST
We insert a new node z using the insertion process for binary search trees. The insertion position takes
the place of a NIL leaf. We temporarily insert the new node at this position, color it red, and give it
two NIL children of its own. Since the height of the red-black tree is O(log n), then the runtime of this
step is O(h) = O(log n).
Step 2: RB-repair(z)
We now fix any violated red-black tree properties by applying an algorithm called RB-repair to node z. There
are three cases that may arise after the insertion of the new node z.
Case 0: When there is no extra work to do
• If the RB tree was initially empty, then the new node z becomes the root of the tree and is recolored
black. RB-repair terminates.
• If the new node that was inserted had a black parent (as in the example above), then the properties
of the RB tree are maintained and we have no new work to do. RB-repair terminates.
Case 1: What to do when the parent and uncle are red
The new node z is inserted as a red node in Step 1. If its parent is also red, we have violated a property
of the RB trees. If the uncle is also red, as shown below, then we can simply recolor both the parent and
the uncle to black, and recolor the grandparent to red:
4
The result is that now z has a black parent. However, the recoloring process may have caused a problem
with the parent of g. What if the parent of g was also red? Then we may have just caused a new violation
further up the tree. To resolve this issue, we simply call our algorithm recursively on the node g:
RB-repair(g). In doing so, the repair procedure moves up the tree, repairing as it goes, until it reaches the root,
which is black.
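The recoloring step of Case 1 can be sketched as follows (a hedged illustration, not the full RB-repair: it assumes z, its parent and its uncle are red and that the grandparent exists, and returns the node from which repair continues):

```python
class RBNode:
    def __init__(self, color):
        self.color = color
        self.parent = self.left = self.right = None

def repair_case1(z):
    """Case 1: parent and uncle are red. Recolor them black, the
    grandparent red, and continue the repair from the grandparent."""
    p = z.parent
    g = p.parent
    u = g.left if p is g.right else g.right   # the uncle of z
    p.color = "black"
    u.color = "black"
    g.color = "red"
    return g   # RB-repair recurses here: the violation may have moved up
```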
Case 2: What to do when the parent is red and the uncle is black
Two different examples of this case are shown below. The one on the left is usually called the bent
case and the one on the right is called the straight case.
If we come across a bent case, then in fact we unbend it using a rotation about the parent node p. A
left rotation about p means that we reassign z as the left child of g, and p becomes the left child of z. We
must also reassign the left child of z, labelled 2 in the diagram below. This child becomes the right child
of p after the rotation. All of this reassigning only takes a constant amount of time.
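This unbending rotation can be sketched in Python (an illustrative fragment, not a full implementation: it assumes p is the left child of g and z is the right child of p, and omits parent pointers and colors):

```python
class Node:
    def __init__(self):
        self.left = self.right = None

def left_rotate(g, p):
    """Unbend the bent case: left-rotate about p. z moves up under g,
    p becomes the left child of z, and z's old left subtree (label 2
    in the figure) becomes the right child of p."""
    z = p.right
    p.right = z.left   # subtree 2 is reassigned under p
    z.left = p
    g.left = z         # z takes p's place as the child of g
    return z
```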
(Figure: a left rotation about p. z is reassigned as the left child of g, p becomes the left child of z, and
z's old left child, labelled 2, becomes the right child of p. The subtrees 1, 3, 4, 5 and the uncle u are
otherwise unchanged.)
Now we move on to figuring out how to resolve the RB tree violation assuming that we are in case 2
and we have a straight case. In this case we can perform the following tasks:
• Perform a rotation about the grandparent of the new node. This shifts the node p up to the position
of the grandparent, and the grandparent g rotates down to the other side. The subtrees get reassigned
as in the figure. In the case of a right rotation, the right child of p becomes the left child of g.
• After the rotation, we switch the color of p to black and we switch the color of g to red
Once these operations are complete, there are no new possible violations in the rest of the tree.
(Figure: a right rotation about g, after which the colors of p and g are switched. p moves up to g's old
position with children z and g, and the old right child of p, subtree 3, becomes the left child of g.)
Runtime of RB-repair: Case 1 and Case 2 together represent our RB-repair algorithm. In Case 1,
we make a recursive call to the grandparent. In Case 2, there is no recursive call. Both Case 1 and Case 2
only take a constant amount of time for the work they do. This means that in the worst case, RB-repair
may carry out a constant number of steps for each node on a path up to the root. Since the height of an RB
tree is O(log n), the runtime of this algorithm is O(log n) in the worst case.
3.1 Runtime of Insert in an RB tree
The first step above involves inserting into the BST, which runs in time O(log n) in a RB tree because the
height is O(log n). The second step, RB-repair, also runs in time O(log n). Therefore the overall runtime
of insertion in an RB tree is O(log n).
A similar fix-it approach is possible for the delete operation. The repair algorithm for delete has several
more cases. However, they involve a set of rotations and recolorings as in the insert algorithm.
Augmented Binary Search Trees
Recall that binary search trees support three main dynamic-set operations: Insert, Delete, and Search.
We saw in our previous lecture that these operations can run in time O(log n) for BSTs that have height
O(log n), such as Red-black trees.
A typical BST stores a key in each node, which is the primary information stored by the tree. Additional
information can be stored in the node object, as in Red-black trees, where we also stored the color of each
node. This is usually called augmenting the BST with additional information. In this lecture, we look
at further ways of augmenting a BST that then enables us to carry out the following additional operations:
• Select an element by rank: return the item corresponding to rank k from the BST
• Return the rank k of a given element x from the tree
• Find the total number of elements that lie within a specific range: a ≤ x ≤ b
We first look at how to augment the binary search tree by storing in each node its subtree size.
This allows us to efficiently perform the above-mentioned operations. Adding additional information to
the binary search tree is not always easy. As soon as we add new information to the tree, then each time
we do an insert/delete operation we need to update this additional information as well. This must be done
efficiently, otherwise our augmented BST will have slower insert and delete times. The goal is to maintain
the three primary operations of the balanced BST so that they continue to run in time O(h).
1 Augmenting the BST with subtree sizes
In order to carry out the above three operations in a BST, we augment the BST by storing the subtree
size in each node of the tree:
The implementation of the tree would now involve node objects that have an additional attribute, such
as x.size, which refers to the size of the subtree rooted at x. Note that nodes that are leaves have a
subtree size of one, since the size of the subtree includes the node itself. The subtree size at the root is in
fact the number of nodes in the entire tree.
We shall now show that this augmented binary search tree enables us to carry out our new operations.
2 Finding the Rank of an element
Suppose we would like to determine the rank of a particular element in the tree. In the above figure, the
key 11 in the tree has rank 7 because it is the 7th smallest of all elements in the tree. Given a node x
(a pointer to a node) in a BST, we would like to determine the rank of this node in the tree. The figure
below is a general diagram of a BST, with the subtrees drawn as triangles. Notice that all the elements in
pink are those that are guaranteed to be smaller than or equal to x. Therefore the rank of x is simply the
count of all the pink nodes. The pink nodes can be categorized as either:
• Those that are in the left subtrees on the path from the root to x
• The nodes on the path to x that precede a right child (that is, nodes where the path turns right).
Thus if we simply count all these nodes as we search down the tree for x, we can determine the rank of
x. This is a simple update to the TREE-SEARCH algorithm from the lecture on BST. The update is to
simply sum up the left-subtree sizes as we go down the tree, and also sum the nodes on the path whenever
we make a turn to the right. This sum can be stored in a variable, rank.
In the example above, the final rank of node x is 19. The pseudocode for the Rank algorithm is shown
below. The algorithm takes as input a pointer t to the root node and a pointer x to a particular node in
the tree; it assumes that x is in the tree, and returns the rank of element x.
RANK(t,x)
  rank = 0
  while (t ≠ x)
    if (t.key < x.key)
      if t.left ≠ NIL
        rank += t.left.size
      rank++
      t = t.right
    else t = t.left
  if t.left ≠ NIL
    rank += t.left.size
  rank++
  return rank

Runtime: This algorithm searches down the path from the root to x and calculates the rank as it goes.
The runtime is O(h). If the BST is implemented as a self-balancing tree, such as a Red-black tree or AVL
tree, then the height of the tree is O(log n) and therefore Rank(x) runs in time O(log n).

3 Selecting an element by rank

In our previous lectures, we studied the Select algorithm that returns the element with rank k in a set of
n elements. That algorithm solved this problem for a static set of elements, in time O(n). In this section,
we look at how to provide a dynamic solution to finding the element of rank k. The word dynamic refers
to the fact that our set S is changing every time there is an insert or delete, and the select algorithm
should operate on this changing set S. We refer to this new Select algorithm that runs on a binary search
tree as BST-Select; it is given a rank k and must return a pointer to the node in the BST with that rank.

Assume that we are given a binary search tree, and we initialize a pointer x to be the root of the tree.
The algorithm works by searching down a path of the tree using the variable x, until we "land" on the
element with rank k. How do we decide if we should search to the left or the right for our element of rank
k? Or indeed if we have already found the correct element? We examine the following three scenarios:
• Case 1: Let L be the number of items in the left subtree, plus one. If it happens that L = k, then in
fact we have found the node x of rank k. An example is shown below:
• Case 2: If k < L, then the item we are searching for must actually be in the left subtree of x. In this
case our search should continue to the left. The variable x will now move to the left child (x = x.left)
and we repeat our process on the left subtree.
An example is shown below:
• Case 3: Finally, if k > L, then we continue the search to the right by letting x = x.right. We must
also update k = k − L to compensate for the L items that are smaller than those in the right subtree.
An example is shown below:
A complete example of a search for rank k = 20 is shown below:
The algorithm can be implemented both recursively and iteratively. The pseudocode for BST-Select
below provides an iterative solution, performing a while loop that traverses the tree until it finds
the item with the appropriate rank (Case 1), in which case the algorithm terminates and returns x. The
initial parameter x points to the root of the tree. The algorithm below assumes that an element of rank k
exists in the tree.
BST-Select(k,x)
  L = 1
  if (x.left ≠ NIL)
    L += x.left.size
  while (L ≠ k)
    if (k < L)
      x = x.left
    else
      x = x.right
      k = k − L
    L = 1
    if (x.left ≠ NIL)
      L += x.left.size
  return x

Runtime: BST-Select will eventually halt when it reaches the item of rank k. The runtime of each step
through the above loop is constant, thus the runtime is O(h). For a Red-black tree or AVL tree, this has
runtime O(log n).

4 Ranges

Our next operation is called Range(a,b), and it returns the total number of elements that are in the range
a ≤ x ≤ b. Note that the endpoints of the range do not have to be in the tree. For example, the tree on
the left below has exactly 6 elements in the range 3 ≤ x ≤ 8, and the tree on the right also has 6 elements
in the range 4 ≤ x ≤ 10, even though the element 4 is not in the tree.

The algorithm for finding the number of elements in this range is very similar to the Rank algorithm
from above, although now the process is repeated twice. We describe the three main steps of the Range
algorithm:
• Step 1: Start at the root of the BST and perform a search down the tree until we find the first
element that is in the range a ≤ x ≤ b. Call this node v:
• Step 2: Starting at node v, we perform a tree search for key a. Similar to the Rank algorithm, we
keep a running total in the variable left-total that represents the number of items in the left subtree
that are greater than or equal to a. This left-total is found by summing the right subtree sizes on the
path to a, and counting the elements on the path to a that are greater than a. The search terminates
when we find a key with value a, OR we arrive at a leaf.
• Step 3: Starting at node v, we perform a tree search for key b. We keep a running total in the
variable right-total that represents the number of items in the right subtree that are less than or
equal to b. This right-total is found by summing the left subtree sizes on the path to b, and counting
the number of elements on the path to b that are smaller than b. The search terminates when we find
a key with value b, OR we arrive at a leaf.
When all steps are complete, the final value to be returned by the Range algorithm is:
left-total + right-total + 1

Runtime: Each search in a balanced binary search tree runs in time O(h). The calculation of the totals
only adds a constant step per node on the search path. Therefore the total runtime remains O(h). As
usual, if the tree is a Red-Black tree or AVL tree, the runtime is O(log n).

5 Inserts and Deletes

Augmenting the binary search tree with subtree sizes enables us to carry out new operations in time O(h).
However, we must verify that maintaining this information in the tree is feasible. What happens if we
carry out an insert/delete operation? The subtree sizes change, and it is important that we can quickly
update this information in our new augmented tree. The goal is to make sure that any updates that we
must do to the augmented information do not substantially change the runtime of the Insert and Delete
in a BST.

5.1 Inserts

Let's first think about what happens when we insert a new node x into the tree. All of the subtrees on
the path from the root to x will get larger by one. Therefore as we search down the path to x, we can
simply increase each subtree size on the path by 1:

Runtime: This update takes only a constant amount of time per node on the path to x. Therefore the
insert operation remains O(h) in the augmented BST.

5.2 Deletes

Recall from our section on BSTs that there are three cases for the deletion of node z in a binary search
tree. The first case is when the deleted node has no children, and the second case is when the deleted
node has one child. In both of these cases, after the deletion, we must decrease the subtree sizes for all
nodes on the path from the root to z. The diagram below shows an example of the subtree sizes that
must be updated after a node is deleted with either no children (left) or one child (right). If the node had
two children, then the deletion is made using the successor node.
In this case, we decrease the subtree sizes for all nodes on the path from the successor to the root.

Runtime: This update takes only a constant amount of time per node on the path to z. Therefore the
delete operation remains O(h) in the augmented BST.

5.3 Rotations

Unfortunately, we may also make structural changes during an insert/delete operation. Both red-black
trees and AVL trees rely on rotations to keep the tree balanced, and rotations change the subtree sizes.
Therefore we must also be able to quickly update the subtree sizes anytime we make a rotation. We show
here that updating the subtree sizes during a rotation is also a constant-time operation.

Suppose we carry out a right rotation as shown below. Assume that the three subtrees have sizes x, y
and z. Notice that after the right rotation, the subtree sizes of the triangles below do not change. However
the subtree sizes of A and B do change. We can simply update A.size to 2 + x + y + z and B.size to
1 + y + z.

Runtime: Updating subtree sizes during a rotation is constant, and therefore the overall runtime of the
insert/delete operations in a balanced BST remains O(log n).

6 Summary

We have shown that by storing the subtree size in each node of a binary search tree, we can carry out
BST-Select(k) and Rank(x) in O(h) time. Furthermore, updating the subtree size information during an
insert and delete can also be done in O(h) time. Finally, if the tree requires rotations (such as in Red-black
or AVL trees) during an insert or delete, we can update the subtree sizes accordingly in constant time per
rotation. Therefore inserts and deletes in red-black or AVL trees still run in time O(log n).

Binary Search Trees

In this lecture we define and analyze one of the most well-known structures in computer science: the
binary search tree. This famous structure allows us to maintain a set of n elements in such a way that
they are essentially sorted: to output the sorted elements takes only O(n) time.
Furthermore, we can carry out updates and queries to this set fairly quickly. For example, we may want
to insert a new element, or quickly find a maximum, or delete an element. The operations on the binary
search tree are dynamic, meaning that each operation does not change the underlying definition of the
BST. We shall see in this lecture how quickly these operations can be done in the best, worst, and expected
case. In the following lecture we look at some variations of the binary search tree that have a better
guaranteed worst-case time.

1 The Binary Search tree

We defined a Binary tree in our section on heaps. As a quick recall, the binary tree is simply a tree where
each node in the tree has zero, one, or two children. A binary search tree is a binary tree with the following
properties:
• The tree has a specific node called the root, which is the only node that has no parent node.
• The tree is ordered, meaning that the children are labelled as either the left child or the right child.
• The nodes of the tree store a key.
• The keys in the left subtree of a node are all less than the node's key.
• The keys in the right subtree of a node are all larger than the node's key.

In the example above, the nodes in light green are all smaller than the root. The nodes in light purple
are all larger than the root. The node with key 8 is the left child of the root. The node with key 25 is the
right child of the root. Note that this property is true for every other node in the tree.

A binary search tree can be used for any type of total order, which is simply a set for which we can
compare any two elements. For example, natural numbers could be used as the keys of a binary search
tree since we can compare the size of any two numbers with ≤. We could also create a binary search tree
using a set of English words, where words are compared based on their alphabetical order.
1.1 Implementation

Typically binary search trees are implemented in such a way that each node of the tree is an object. The
object may have several attributes, however those that are relevant to the structure of the BST are:
• node.key: the key that is used in the binary search tree property
• node.left: a pointer to the left child
• node.right: a pointer to the right child
• Optional: node.parent: a pointer to the parent

Most algorithms that we see here could be implemented without the use of a parent pointer. The decision
to use one or not simply depends on the particular implementation.

1.2 How do we output the elements in sorted order?

One of the most powerful aspects of a binary search tree is that it maintains the data in such a way that
sorting the keys takes time Θ(n). The algorithm that carries this out is called an inorder traversal, which
is a recursive algorithm that calls itself on the left subtree, then prints the key of the current node, and
then calls itself on the right subtree. The algorithm below takes as input the root node x of a BST and
outputs the keys in sorted order.

INORDER(x)
  if x ≠ NIL
    INORDER(x.left)
    print(x.key)
    INORDER(x.right)

Given a single node x with no children, INORDER(x) prints the key value in x. Both the left and right
children of x are NIL and thus the recursive calls to the children of x will simply terminate. Given a node
y with two children, INORDER(y) makes a call to INORDER(y.left) which prints the key 2, then prints
y.key, and then makes a call to INORDER(y.right) which prints the key 5. Given a node z with left and
right subtrees, INORDER(z) makes a call to INORDER(z.left) which will print all the keys in the left
subtree in increasing order, then print z.key, and then will call INORDER(z.right) which prints all the
keys in the right subtree in increasing order.
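The traversal described above can be sketched in Python as follows (the Node class is a minimal stand-in for the node object just described; keys are collected in a list rather than printed, so the result can be inspected):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(x, out):
    """Visit the left subtree, then this node, then the right subtree:
    the keys are appended to out in sorted order."""
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)
    return out
```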
Time Complexity: Θ(n)

The inorder traversal is a very simple recursive algorithm, and we can use our techniques from the section on recursion to determine its runtime. If the node x is NIL, then the algorithm returns in constant time. If instead x is the root of a tree of size n, then the runtime consists of the time it takes to complete on the left and right subtrees, plus the constant time it takes to print the current node:

T(n) = T(|x.left|) + T(|x.right|) + c

and we can easily show by induction that this is Θ(n). This exercise is included in the problem set.

2 Searching a Binary Search Tree

To search for a particular key k in the binary search tree, we exploit the ordered nature of the children and simply trace along the path that would lead to that key. For each node x encountered on the path, the search compares x.key to k. If k = x.key, then we have found the correct node, and we return x. If k is smaller than x.key, the search continues to the left; otherwise it continues to the right. An example is shown below.

The algorithm itself could be written recursively or iteratively (with a simple while loop). In both cases, the search traces down a path of the tree. Recall that the height of the tree is the length of the longest path from the root, so the runtime of BST search is O(h), where h is the height of the tree. We shall see later what this height is in the best, worst, and average case. In the pseudo-code below, the searching algorithm is implemented iteratively. TREE-SEARCH(x, k) takes as input a pointer x to the root of the tree and a target key k.

TREE-SEARCH(x, k)
  while x ≠ NIL and k ≠ x.key
    if k < x.key
      x = x.left
    else
      x = x.right
  return x

Runtime: O(h), where h is the height of the tree.

2.1 Maximums and Minimums

Unlike heaps, the binary search tree holds the maximum (and minimum) in a unique spot in the tree.
• The maximum is the right-most node
• The minimum is the left-most node

In the example above, the right-most node is 17 and the left-most node is 2. To find the right-most node, we simply follow the path from the root taking only right children until there are no more; the last node we encounter must be the maximum. The same idea holds for finding the minimum. Again, since this procedure follows a path in the tree, it runs in worst-case time O(h), where h is the height of the tree. We see the pseudo-code of the max/min finding algorithm in the practice problems for this week.

3 Inserts and Deletes in a BST

When a new key is to be inserted into the binary search tree, we must choose a spot for the key that maintains the binary search tree property. We shall show in this section that the insert operation is rather easy, and the delete operation requires only a few adjustments of the tree structure.

3.1 Insert

Suppose we wish to insert a new node z into the tree T. Assume that z holds a key in z.key. If the tree is empty, then we simply insert z as the root, we are done, and we return z as the new root. However, if T is not empty, then we must search for the insert position of z. This is much like a regular tree search using the key value z.key, except that we add a trailing pointer: a pointer that follows the path to the insertion position but stays one step behind, at the parent of the current node x. The diagram below illustrates this search down the tree, which terminates when x is NIL. The pointer x is used for the search, and the trailing pointer is y.

Once the search terminates (when x is NIL), the new node z is inserted as the child of y: as a right child if its key is larger than y's, and otherwise as a left child. As with the TREE-SEARCH algorithm, the worst-case runtime for an insert is O(h), and therefore it depends on the height of the tree.
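The trailing-pointer idea can be sketched directly in Python (this is our own illustration, using None for NIL; it is not the official pseudo-code):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None

def tree_insert(T, z):
    """Insert node z into the tree rooted at T and return the root."""
    if T is None:
        return z               # empty tree: z becomes the root
    x, y = T, None
    while x is not None:       # x searches; y trails one step behind
        y = x
        x = x.left if z.key < x.key else x.right
    z.parent = y               # the search fell off the tree below y
    if z.key < y.key:
        y.left = z
    else:
        y.right = z
    return T

root = None
for k in [8, 3, 12, 5]:
    root = tree_insert(root, Node(k))
print(root.key, root.left.key, root.left.right.key, root.right.key)  # 8 3 5 12
```

When the while loop ends, x has walked off the tree, but y still points at the node that must become z's parent; that is exactly why the trailing pointer is needed.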
The pseudo-code below takes as input a pointer T to the root of the tree. If the pointer T is NIL, the tree is empty, and the algorithm returns z as the new root. Otherwise, the algorithm searches for the insert position and returns the original pointer T, which now points to a tree containing the new node z.

TREE-INSERT(T, z)
  if T = NIL
    return z
  else
    x = T
    while x ≠ NIL
      y = x
      if z.key < x.key
        x = x.left
      else
        x = x.right
    z.parent = y
    if z.key < y.key
      y.left = z
    else
      y.right = z
    return T

Runtime: O(h), where h is the height of the tree.

3.2 Delete

The deletion of a node x from the tree is slightly more delicate. We can't simply chop x out of the tree; instead, the tree may need to be restructured in order to maintain the binary search tree property. There are three distinct cases, the first two being quite simple to handle:

• The node x is a leaf. In this case, we simply remove it from the BST.
• The node x has only one child. In this case, we simply shift up the entire subtree belonging to that child and have it take the place of x.
• The node x has two children. In this case, we find the successor of x, which lies in the right subtree of x and can be found easily by searching for the left-most node of that subtree. The successor is removed and used to replace x in the tree. If the successor has a right subtree, it is shifted up to take the successor's place.

Again, this delete procedure makes at most one traversal down the tree, and thus runs in time O(h). The exact pseudo-code for the deletion procedure is studied in our practice problems for the week.

4 The height of a BST

Most of the operations we have looked at so far depend on the height of the binary search tree. We saw in our section on heaps that complete binary trees have height Θ(log n). Since the heaps we studied in that section were complete binary trees, we had a definite bound on the height of the heap, and could use that in our analysis of how long the heap operations took.
Binary search trees, on the other hand, are not necessarily complete trees. Furthermore, while the operations we performed on heaps did not change the height, for binary search trees each insert or delete alters the structure, and possibly the height, of the tree. We begin by looking at how we might initially build a tree and what the resulting height would be.

If we build a binary search tree by inserting elements one at a time, we could end up with a BST which is extremely unbalanced. For example, suppose we inserted the keys 1, 2, 3, 4 one at a time in increasing order. The resulting tree would be a single path. Thus it is possible to insert keys in such a way that the tree height is O(n), in which case our operations above (insert, delete, etc.) would run in time O(n). However, if the keys were inserted in random order, then an increasing (or decreasing) insertion order would be rather unlikely. It turns out that by inserting the keys into the tree in random order we obtain the following famous result:

Random BST height: A binary search tree built by inserting n distinct keys in random order has expected height O(log n).

The formal proof of this fact is found in CLRS 12.4; we do not show it here. There are some interesting "light" proofs that you may find here and there, many of them not technically correct, but they often convey the basic idea. The actual proof, however, requires probabilistic arguments that are too involved for this course.

4.1 A Random BST and Quicksort

In this last section, we study the runtime of building a random binary search tree. Instead of developing the runtime from scratch, we use an interesting trick. We will establish the following fact:

Given: An array of size n.
Fact: The number of comparisons made by randomized Quicksort is the same as the number of comparisons made when building the random BST, when the random pivots selected by Quicksort correspond to the random order in which nodes are inserted into the BST.
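This fact can be checked empirically with a short Python experiment of our own (not from the notes). Here the "random" pivot of each subarray is taken to be its first element, so that the pivot order matches the BST insertion order:

```python
def quicksort_comparisons(a):
    """Count element-vs-pivot comparisons, taking the first element as pivot
    (the pivot order is thus fixed by the input order)."""
    if len(a) <= 1:
        return 0
    pivot = a[0]
    left = [x for x in a[1:] if x < pivot]     # order within each side
    right = [x for x in a[1:] if x >= pivot]   # is preserved
    return (len(a) - 1) + quicksort_comparisons(left) + quicksort_comparisons(right)

def bst_comparisons(keys):
    """Count key comparisons made while inserting keys into an empty BST."""
    root = None
    total = 0
    for k in keys:
        if root is None:
            root = {'key': k, 'left': None, 'right': None}
            continue
        node = root
        while True:
            total += 1                         # one comparison with node's key
            side = 'left' if k < node['key'] else 'right'
            if node[side] is None:
                node[side] = {'key': k, 'left': None, 'right': None}
                break
            node = node[side]
    return total

order = [3, 1, 6, 2, 5, 8, 7]   # the insertion order from the figure below
print(quicksort_comparisons(order), bst_comparisons(order))  # 11 11
```

Intuitively, each key is compared once with every pivot of a subarray containing it, and once with every ancestor it passes on the way down the BST, and these are the same pairs of elements.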
We establish the above fact using the example below, in which we compare Quicksort and the construction of a binary search tree. Both algorithms run on the same input, 3, 6, 5, 2, 7, 8, 1, and the random pivot selected at each iteration by Quicksort is the same random element inserted next into the BST. We notice something very interesting: the number of comparisons made by Quicksort is identical to the number of comparisons made during the construction of the BST. In fact, each comparison of two elements during the PARTITION step of Quicksort corresponds exactly to a comparison made during the BST construction. On the left of the figure we see the steps of Quicksort; each time a pivot is selected, we note the number of comparisons made during the partitioning. On the right of the figure, we construct a BST by inserting the elements in the order determined by the random pivots. Notice that this results in exactly the same number of comparisons being made.

[Figure: side-by-side trace of Quicksort and the BST construction, with the keys inserted in the order 3, 1, 6, 2, 5, 8, 7. Quicksort: pivot 3 is compared with 2, 1, 6, 5, 7, 8 (6 comparisons); pivot 1 is compared with 2 (1 comparison); pivot 6 is compared with 5, 8, 7 (3 comparisons); pivot 8 is compared with 7 (1 comparison). The BST insertions make the same comparisons. Total comparisons on each side: 11.]

Therefore, since Quicksort runs in expected time O(n log n) when the pivots are randomly selected, the construction of a binary search tree with a randomly selected input order also takes expected time O(n log n).

Expected time to build a random binary search tree: Given a set of n distinct keys, if we insert them into an initially empty binary search tree in random order, the total expected runtime of building the tree is O(n log n).

5 Conclusion

Thanks to the above result, we know we can build a random BST in expected time O(n log n). The resulting tree is balanced: its height is expected to be O(log n).
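The contrast between sorted-order and random-order insertion is easy to observe empirically. The following small experiment is our own sketch (not from the notes):

```python
import random

def insert(root, key):
    """Iterative BST insert on a nested-dict tree; returns the root."""
    if root is None:
        return {'key': key, 'left': None, 'right': None}
    node = root
    while True:
        side = 'left' if key < node['key'] else 'right'
        if node[side] is None:
            node[side] = {'key': key, 'left': None, 'right': None}
            return root
        node = node[side]

def height(node):
    """Height = number of edges on the longest root-to-leaf path."""
    if node is None:
        return -1
    return 1 + max(height(node['left']), height(node['right']))

random.seed(1)
n = 512
keys = list(range(n))

sorted_tree = None
for k in keys:                 # sorted insertion: a path of height n - 1
    sorted_tree = insert(sorted_tree, k)

random.shuffle(keys)
random_tree = None
for k in keys:                 # random insertion: expected height O(log n)
    random_tree = insert(random_tree, k)

print(height(sorted_tree), height(random_tree))
```

With n = 512, the sorted-order tree has height 511, while the random-order tree's height is a small multiple of log₂ 512 = 9, illustrating the expected O(log n) bound.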
This may seem like wonderful news, because we have constructed a BST with a very efficient height, making our O(h) operations run in time O(log n). However, the tree height changes as we perform insertions and deletions. Indeed, a randomly built BST may start out with height O(log n), but as we insert and delete, we may eventually end up back at the worst-case scenario of a tree of height O(n). In the next section, we look at examples of balanced trees, for which we can maintain a height of O(log n) even as we perform our dynamic operations.

Question 2: Tree traversals

(a) 4 points. Write the pseudo-code for an algorithm that prints the keys of a BST in decreasing order. Your algorithm must be recursive. You cannot simply use an inorder traversal and then reverse the output afterwards. Explain why the runtime is O(n).

(b) 6 points. Let T be the root node of a BST. Write the pseudo-code for an algorithm called FullComplete(T) that returns -1 if the tree T is NOT a full and complete tree. If T is a full and complete tree, the algorithm should return the height of the tree. Justify the runtime of O(n). (Reminder: you can reference and use any algorithms from class and the practice sets.)

(c) 6 points. Consider the following algorithm, which takes as input the root node T of a binary search tree and an integer i. Run the algorithm on the binary tree shown on the right, with i = 5. Justify why the runtime of the algorithm is O(2^i).

PrintKeys(T, i)
  if T = NIL
    return
  if i = 0
    print T.key
  else
    PrintKeys(T.left, i-1)
    PrintKeys(T.right, i-1)

[Figure: a large binary search tree with keys between 5 and 89, on which to run PrintKeys with i = 5.]

(d) 4 points. Using the above algorithm, write the pseudo-code for an algorithm called PrintTree(T) that takes as input a reference to the root node of a BST and prints the keys out in the order shown in the figure below.
If the input tree T is full and complete, what is the runtime of your algorithm in big-oh notation?

[Figure: a binary search tree together with the required output order of its keys, including 13, 32, 42, 19, 35, 5, 12, 29, 34, 39.]
