HEIGHT-BALANCED TREES

 

 

                In order to prevent a tree unbalance, with height-balanced trees we associate a balance indicator with each node in the tree. This indicator will contain one of the three values which we denote as left(L), right(R), or balance(B), according to the following defenitions.

Left: A node will be called left heavy if the longest path in its left subtree is one longer than the longest path of its right subtree.

Balance: A node will be called balanced if the longest paths in both of its subtrees are equal.

Right: A  node will be called right heavy if the longest path in its right subtree is one longer than the longest path of its left subtree.

               In a balanced tree each node must be in one of these three states. If there exists a node in a tree where this is not true, then such a tree is said to be unbalanced.

                                      6                                               5

                                    /   \                                               \  

                                  5     9                                              6

                                 /      /  \                                               \

                                4     8  11                                              7

                                       

                            Balanced tree                                 Unbalanced tree


                  A new node is inserted at the leaf or terminal node level. The only nodes which can have their balance indicator changed by such an insertion are those which lie on a path between the root of the tree and the newly inserted leaf. The possible changes which can occur to a node on this path are as follows:

         1. The node was either left or right heavy and has now become balanced.

         2. The node was balanced and has now become left or right heavy.

         3. The node was heavy and the new node has been inserted in the heavy subtree, thus creating an unbalanced subtree. Such a node is said to be a critical node.

                    If condition 1 applies to a current node, then the balance indicators of all ancestor nodes of this node remain unchanged, since the longest path in the subtree remains unchanged. When condition 2 applies to a current node, then the balance indicators of the ancestors will change. If condition 3 applies to a current node, then the tree has become unbalanced and this node has become critical.

                   In the case of rebalancing a tree when a critical node has been encountered, there are 2 broad cases which can arise, each of which can be further subdivided into 2 essentially similar subcases. A general representation of case 1 is given in the following figure, where the rectangles labeled T1, T2, and T3 represent trees and the node labeled NEW denotes the node being inserted. The expression at the bottom of each rectangle denotes the maximum path length in that tree after insertions. For ex. in figure (a) since node X is critical, the node Y must have been balanced prior to insertion. This case covers the situation when Y has become heavy in the same direction that X was heavy. A concrete ex. of the second possibility for case 1 is exemplified in figure (c). The PATH and DIRECTION vectors are defined in the next algorithm. The basic step involve the changing of 3 pointers.

               x                                                      y

             /    \                                                 /    \

           y      T3                                          T1     x 

         /   \                                                    |       /  \

      T1    T2                                             new  T2  T3

        |

     new

           (i)                                                        (ii)

                                     (a)

 

             x                                                        y

           /    \                                                    /   \ 

         T1    y                                                x     T3

                /   \                                             /  \       |

             T2   T3                                      T1   T2  new

                      |

                    new

               (i)                                                     (ii)

                                       (b)

           

                         7                                                      7

                        /  \                                                   /   \

                      5     9                                               5     11

                    /  \    /   \                                           /  \    /   \

                  4    6 8   11                                        4   6  9    12

                               /   \                                              /  \      \

                             10    12                                        8   10    13

                                       \

                                       13

               Before balancing                               After balancing

                                                   (c)

 

                The second case, which is given in figure (d), Y becomes heavy in an opposite direction to that in which X was heavy. It is clear that node Z must have been balanced prior to insertion. A specific ex. of case 2b is given in figure (e). Again PATH and DIRECTION refer to vectors that are associated with the next algorithm.

                       X                                                  Z

                     /    \                                              /     \

                   Y     T4                                       Y         X

                  /   \                                             /    \      /    \

                T1   Z                                       T1   T2  T3   T4

                      /   \                                              |

                    T2   T3                                       new

                     |

                   new                   (i)

 

 

                       X                                                 Z                                       

                     /    \                                             /     \

                  T1     Y                                        X      Y

                          /   \                                      /  \     /   \

                        Z    T4                               T1  T2 T3  T4

                      /   \                                              |

                  T2    T3                                        new 

                    |

                  new

                                             (ii)

                     

                                (d) Case 2 for rebalancing a tree

              

                   7                                                        7

                 /   \                                                     /    \

               5      9                                                 5     11

             /   \    /  \                                              /  \    /   \

           4    6  8   12                                         4    6 9    12

                         /   \                                                / \       \

                       11   13                                            8  10    13

                       /

                     10

          Before balancing                                 After balancing   

                                    (e)Example of case 2    

  

                    The node structure of a tree will consist of a left pointer (LPTR), a right pointer (RPTR), a key field (K), a balance indicator (BI), and an information field (DATA). The name of the node structure is NODE. A list head for the tree is assumed with its left pointer containing the address of the root of the actual tree. A general algorithm for inserting a node into a height balanced tree is as follows

        1. If this is the first insertion

            then allocate a node, set its fields and exit

        2. If the name is not already in the tree

            then attach the new node to the existing tree

            else write item already present and exit

        3. Search for an unbalanced node

       4. Adjust the balance indicators, if there is no critical node, exit

        5. If the node was balanced and then becomes heavy or

                the node was heavy and becomes balanced

                then adjust the balance indicators and exit

        6. Rebalance the tree and exit.

           The detailed algorithm uses the following procedure which creates a new leaf node.  

Procedure CREAT_LEAF(NAME, INFO, NEW). Given the key of a new node, NAME, and its associated information, INFO, this procedure creates a new leaf node and initializes its contents. NEW denotes the address of this new leaf.

1. Create a new leaf node and initialize

          NEW=NODE

          LPTR(NEW)=RPTR(NEW)=NULL

          BI(NEW)='B'

          K(NEW)=NAME

          DATA(NEW)=INFO

2. Finished

          Return

         A formulation of the algorithm follows.

Function BALANCED_INSERT(HEAD, NAME, INFO). Given a linked representation of a balanced binary tree with a list head HEAD whose structure has just been described, and the variables NAME and INFO that contain the key value and information contents of the new element being inserted, this algorithm inserts the new element into the tree in such a manner as to maintain the balance property. The function returns NEW, the address of the new node created. The vector PATH is used to store the address of the nodes between the list head and the point in the tree where the insertion is made. The corresponding vector DIRECTION is used to store the direction of each branch in this path. The values 'L' and 'R' are use to denote a left and right branch, respectively. The variable MARK denotes the index of a vector element in PATH that contains the address of the critical node (X). F points to the parent of the critical node before rebalancing takes place. The variables    X, Y  and Z are pointer variables whose functions have been previously described. LEVEL is and index variable. T is a temporary pointer used in traversing the tree from the root to the node being inserted.

1. Is this a first insertion?                  

      if  LPTR(HEAD)=HEAD

      then call CREATE_LEAF(NAME, INFO, NEW)

              LPTR(HEAD)=NEW

              return(NEW)

2. Initialize

        LEVEL=0

        DIRECTION[LEVEL]='L'

       PATH[LEVEL]=HEAD

       T=LPTR(HEAD)

3. Continue until found or inserted

       repeat step 4 forever (infinite loop)

4. Compare and insert, if required

           if NAME<K(T)

          then if LPTR(T)/=NULL

                 then LEVEL=LEVEL+1

                         PATH[LEVEL]=T

                         DIRECTION[LEVEL]='L'

                        T=LPTR(T)

                 else call CREARE_ELEMENT(NAME, INFO, NEW)

                        LPTR(T)=NEW

                        LEVEL=LEVEL+1

                        PATH[LEVEL]=T

                        DIRECTION[ LEVEL]='L'

                       exitloop

        else if NAME>K(T)

               then if RPTR/=NULL

                       then LEVEL=LEVEL+1

                              PATH[LEVEL]=T

                              DIRECTION[ LEVEL]='R'

                              T=RPTR(T)

                      else call CREARE_ELEMENT(NAME, INFO, NEW)

                             RPTR(T)=NEW

                            LEVEL=LEVEL+1

                            PATH[LEVEL]=T

                           DIRECTION[ LEVEL]='R'

                           exitloop

                     else (A match; NAME=K(T)

                           write ('ITEM ALREADY THERE')

                           return(NULL)

5. Search for an unbalanced node

         MARK=0

          Repeat for I= LEVEL, LEVEL-1....,1

             P=PATH[I]

             if BI(P)/='B'

             then MARK=1

            exitloop

6. Adjust balance indicators

         repeat for I= MARK+1, MARK+2,.......,LEVEL

               if NAME<K(PATH[I])

               then BI(PATH[I])='L'

                else BI(PATH[I])='R'

7. Is there a critical node?

          if MARK=0

          then return(NEW)

                   D=DIRECTION[MARK]

                  X=PATH[MARK]

                 Y=PATH[MARK+1]

         if  BI(X)/=D

        then (The node was heavy and now becomes balanced)

            BI(X)='B'

       return (NEW)

8. Rebalancing: case 1

        if BI(Y)=D

        then (The node was heavy and now becomes critical)

                 if D='L'

                 then LPTR(X)=RPTR(Y)

                        RPTR(Y)=X

                 else RPTR(X)=LPTR(T)

                        LPTR(Y)=X

                        BI(X)=BI(Y)='B'

                        F=PATH[MARK-1]

                       if X=LPTR(F)

                 then LPTR(F)=Y

                else RPTR(F)=Y

                 return (NEW)

9. Rebalancing tree: case 2

     (a) Change structure links

              if D='L'

              then Z=RPTR(Y)

                      RPTR(Y)=LPTR(Z)

                      LPTR(Z)=Y

                      LPTR(X)=RPTR(Z)

                      RPTR(Z)=X

              else Z=LPTR(Y)

                     LPTR(Y)=RPTR(Z)

                     RPTR(Z)=Y

                     RPTR(X)=LPTR(Z)

                    LPTR(Z)=X

                    F=PATH[MARK-1]

             if X=LPTR(F)

            then LPTR(F)=Z

           else RPTR(F)=Z

(b) Change balance indicators

    if BI(Z)=D

    then BI(Y)=BI(Z)='B'

             if D='L'

             then BI(X)='R'

             else BI(X)='L'

     else if BI(Z)='B'

            then if BI(X)=BI(Y)=BI(Z)='B'

            else BI(X)=BI(Z)='B'

                   BI(Y)=D

     return (NEW)

                  Step 3 attaches the new node to the existing tree, if it is not there already, and stores into vectors PATH and DIRECTION the address of the nodes on the path between the list head and the leaf being inserted, and it stores the direction of the path at each node, respectively. Step 5 searches for an unbalanced node which is closest to the new node just inserted. In step 6 the balance indicators of the nodes between the unbalanced node found in the previous step and the new node are adjusted. Step 7 determines whether there is a critical node. If there is, control proceeds either to step 8 or step 9. When no critical node is found, the balance indicator of the unbalanced node found in step 5 is adjusted. The last 2 steps of the algorithm correspond to case 1 and case 2 in the previous discussion, and rebalancing of the tree is performed in each case.

                  It can be shown that the maximum path length m in a height balanced tree of n nodes is 1.5 log(n+1) to the base 2. The worst ALOS for performing an insertion with any necessary rebalancing is of O(log n).

          General algorithm for deleting a node is as follows:

1. Search for the node marked for deletion.

2. If the indicated node has 2 children

    then find the inorder successor of this node

           delete the inorder successor

            replace the node marked for deletion by its inorder successor

    else delete the marked node and replace it by its child (if any).

3. Rebalance the tree along the ancestor path.

The perfomance of this algorithm can be shown to be O(log n).