HEIGHT-BALANCED TREES
In order to prevent a tree
unbalance, with height-balanced trees we associate a balance indicator with
each node in the tree. This indicator will contain one of the three values
which we denote as left(L), right(R), or balance(B),
according to the following defenitions.
Left: A node will be called
left heavy if the longest path in its left subtree is
one longer than the longest path of its right subtree.
Balance: A node will be
called balanced if the longest paths in both of its subtrees
are equal.
Right: A node will be called right heavy if the
longest path in its right subtree is one longer than
the longest path of its left subtree.
In a balanced tree each node
must be in one of these three states. If there exists
a node in a tree where this is not true, then such a tree is said to be
unbalanced.
6
5
/ \ \
5 9 6
/ /
\
\
4 8 11
7
Balanced tree Unbalanced
tree
A new node is inserted
at the leaf or terminal node level. The only nodes which can have their balance
indicator changed by such an insertion are those which lie on a path between
the root of the tree and the newly inserted leaf. The possible changes which
can occur to a node on this path are as follows:
1. The node was either left or right
heavy and has now become balanced.
2. The node was balanced and has now
become left or right heavy.
3. The node was heavy and the new node
has been inserted in the heavy subtree, thus creating
an unbalanced subtree. Such a node is said to be a
critical node.
If condition 1 applies to a
current node, then the balance indicators of all ancestor nodes of this node
remain unchanged, since the longest path in the subtree
remains unchanged. When condition 2 applies to a current node, then the balance
indicators of the ancestors will change. If condition 3 applies to a current
node, then the tree has become unbalanced and this node has become critical.
In the case of rebalancing a
tree when a critical node has been encountered, there are 2 broad cases which
can arise, each of which can be further subdivided into 2 essentially similar subcases. A general representation of case 1 is given in
the following figure, where the rectangles labeled T1, T2, and T3 represent
trees and the node labeled NEW denotes the node being inserted. The expression
at the bottom of each rectangle denotes the maximum path length in that tree
after insertions. For ex. in figure (a) since node X is critical, the node Y
must have been balanced prior to insertion. This case covers the situation when
Y has become heavy in the same direction that X was heavy. A concrete ex. of
the second possibility for case 1 is exemplified in figure (c). The PATH and DIRECTION
vectors are defined in the next algorithm. The basic step
involve the changing of 3 pointers.
x
y
/ \
/ \
y T3
T1 x
/
\
| / \
T1
T2 new T2 T3
|
new
(i) (ii)
(a)
x
y
/
\
/ \
T1
y
x T3
/ \
/ \ |
T2 T3 T1 T2 new
|
new
(i)
(ii)
(b)
7
7
/ \
/ \
5 9
5 11
/ \
/ \
/ \ / \
4 6 8
11 4 6 9
12
/ \
/ \ \
10 12 8 10
13
\
13
Before balancing After balancing
(c)
The second case, which is given
in figure (d), Y becomes heavy in an opposite
direction to that in which X was heavy. It is clear that node Z must have been
balanced prior to insertion. A specific ex. of case 2b is given in figure (e).
Again PATH and DIRECTION refer to vectors that are associated with the next
algorithm.
X
Z
/ \
/ \
Y T4 Y X
/ \
/ \ /
\
T1 Z T1 T2 T3
T4
/ \ |
T2
T3
new
|
new (i)
X
Z
/
\
/ \
T1 Y X Y
/ \ / \
/ \
Z T4 T1 T2 T3 T4
/ \ |
T2 T3
new
|
new
(ii)
(d) Case 2 for
rebalancing a tree
7
7
/ \ / \
5 9
5 11
/
\ / \
/ \ / \
4
6 8 12 4 6 9
12
/ \
/ \ \
11 13 8 10 13
/
10
Before balancing After balancing
(e)Example
of case 2
The node structure of a
tree will consist of a left pointer (LPTR), a right pointer (RPTR), a key field
(K), a balance indicator (BI), and an information field (DATA). The name of the
node structure is NODE. A list head for the tree is assumed with its left
pointer containing the address of the root of the actual tree. A general algorithm
for inserting a node into a height balanced tree is as follows
1. If this is the first insertion
then
allocate a node, set its fields and exit
2. If the name is not already in the
tree
then
attach the new node to the existing tree
else write
item already present and exit
3. Search for an unbalanced node
4. Adjust the balance indicators, if
there is no critical node, exit
5. If the node was balanced and then
becomes heavy or
the node
was heavy and becomes balanced
then
adjust the balance indicators and exit
6. Rebalance the tree and exit.
The detailed algorithm uses the
following procedure which creates a new leaf node.
Procedure CREAT_LEAF(NAME, INFO, NEW). Given the key of a new node, NAME,
and its associated information, INFO, this procedure creates a new leaf node
and initializes its contents. NEW denotes the address of this new leaf.
1. Create a new leaf node and
initialize
NEW=NODE
LPTR(NEW)=RPTR(NEW)=NULL
BI(NEW)='B'
K(NEW)=NAME
DATA(NEW)=INFO
2. Finished
Return
A formulation of the algorithm
follows.
Function BALANCED_INSERT(HEAD, NAME, INFO). Given a linked representation of a
balanced binary tree with a list head HEAD whose
structure has just been described, and the variables NAME and INFO that contain
the key value and information contents of the new element being inserted, this
algorithm inserts the new element into the tree in such a manner as to maintain
the balance property. The function returns NEW, the address of the new node
created. The vector PATH is used to store the address of the nodes between the
list head and the point in the tree where the insertion is made. The
corresponding vector DIRECTION is used to store the direction of each branch in
this path. The values 'L' and 'R' are use to denote a left and right branch,
respectively. The variable MARK denotes the index of a vector element in PATH
that contains the address of the critical node (X). F points to the parent of
the critical node before rebalancing takes place. The variables X, Y and Z are pointer variables whose
functions have been previously described. LEVEL is and index variable. T is a
temporary pointer used in traversing the tree from the root to the node being
inserted.
1. Is this a first
insertion?
if LPTR(HEAD)=HEAD
then call
CREATE_LEAF(NAME, INFO, NEW)
LPTR(HEAD)=NEW
return(NEW)
2. Initialize
LEVEL=0
DIRECTION[LEVEL]='L'
PATH[LEVEL]=HEAD
T=LPTR(HEAD)
3. Continue until found or
inserted
repeat step 4
forever (infinite loop)
4. Compare and insert, if
required
if NAME<K(T)
then if
LPTR(T)/=NULL
then
LEVEL=LEVEL+1
PATH[LEVEL]=T
DIRECTION[LEVEL]='L'
T=LPTR(T)
else
call CREARE_ELEMENT(NAME, INFO, NEW)
LPTR(T)=NEW
LEVEL=LEVEL+1
PATH[LEVEL]=T
DIRECTION[
LEVEL]='L'
exitloop
else if
NAME>K(T)
then if
RPTR/=NULL
then
LEVEL=LEVEL+1
PATH[LEVEL]=T
DIRECTION[ LEVEL]='R'
T=RPTR(T)
else
call CREARE_ELEMENT(NAME, INFO, NEW)
RPTR(T)=NEW
LEVEL=LEVEL+1
PATH[LEVEL]=T
DIRECTION[
LEVEL]='R'
exitloop
else
(A match; NAME=K(T)
write ('ITEM
ALREADY THERE')
return(NULL)
5. Search for an unbalanced
node
MARK=0
Repeat for I= LEVEL, LEVEL-1....,1
P=PATH[I]
if
BI(P)/='B'
then
MARK=1
exitloop
6.
Adjust balance indicators
repeat for I=
MARK+1, MARK+2,.......,LEVEL
if
NAME<K(PATH[I])
then
BI(PATH[I])='L'
else
BI(PATH[I])='R'
7.
Is there a critical node?
if MARK=0
then
return(NEW)
D=DIRECTION[MARK]
X=PATH[MARK]
Y=PATH[MARK+1]
if BI(X)/=D
then (The node
was heavy and now becomes balanced)
BI(X)='B'
return (NEW)
8.
Rebalancing: case 1
if BI(Y)=D
then (The node
was heavy and now becomes critical)
if
D='L'
then
LPTR(X)=RPTR(Y)
RPTR(Y)=X
else
RPTR(X)=LPTR(T)
LPTR(Y)=X
BI(X)=BI(Y)='B'
F=PATH[MARK-1]
if
X=LPTR(F)
then
LPTR(F)=Y
else
RPTR(F)=Y
return
(NEW)
9.
Rebalancing tree: case 2
(a) Change structure links
if D='L'
then
Z=RPTR(Y)
RPTR(Y)=LPTR(Z)
LPTR(Z)=Y
LPTR(X)=RPTR(Z)
RPTR(Z)=X
else
Z=LPTR(Y)
LPTR(Y)=RPTR(Z)
RPTR(Z)=Y
RPTR(X)=LPTR(Z)
LPTR(Z)=X
F=PATH[MARK-1]
if
X=LPTR(F)
then
LPTR(F)=Z
else
RPTR(F)=Z
(b)
Change balance indicators
if BI(Z)=D
then
BI(Y)=BI(Z)='B'
if D='L'
then
BI(X)='R'
else
BI(X)='L'
else if BI(Z)='B'
then if
BI(X)=BI(Y)=BI(Z)='B'
else
BI(X)=BI(Z)='B'
BI(Y)=D
return (NEW)
Step 3 attaches the new node
to the existing tree, if it is not there already, and stores into vectors PATH
and DIRECTION the address of the nodes on the path between the list head and
the leaf being inserted, and it stores the direction of the path at each node,
respectively. Step 5 searches for an unbalanced node which is closest to the
new node just inserted. In step 6 the balance indicators of the nodes between
the unbalanced node found in the previous step and the new node are adjusted.
Step 7 determines whether there is a critical node. If there is, control
proceeds either to step 8 or step 9. When no critical node is found, the
balance indicator of the unbalanced node found in step 5 is adjusted. The last
2 steps of the algorithm correspond to case 1 and case 2 in the previous
discussion, and rebalancing of the tree is performed in each case.
It can be shown that the
maximum path length m in a height balanced tree of n nodes is 1.5 log(n+1) to the base 2. The worst ALOS for performing an
insertion with any necessary rebalancing is of O(log
n).
General algorithm for deleting a node
is as follows:
1.
Search for the node marked for deletion.
2.
If the indicated node has 2 children
then find the inorder successor of this node
delete the inorder successor
replace
the node marked for deletion by its inorder successor
else delete the
marked node and replace it by its child (if any).
3. Rebalance the tree along
the ancestor path.
The perfomance
of this algorithm can be shown to be O(log n).