Software Design Using C++AVL TreesThe ConceptThese are self-adjusting, height-balanced binary search trees and are named after the inventors: Adelson-Velskii and Landis. A balanced binary search tree has Theta(lg n) height and hence Theta(lg n) worst case lookup and insertion times. However, ordinary binary search trees have a bad worst case. When sorted data is inserted, the binary search tree is very unbalanced, essentially more of a linear list, with Theta(n) height and thus Theta(n) worst case insertion and lookup times. AVL trees overcome this problem. DefinitionsThe height of a binary tree is the maximum path length from the root to a leaf. A single-node binary tree has height 0, and an empty binary tree has height -1. As another example, the following binary tree has height 3. 7 / \ 3 12 / / \ 2 10 20 / \ 9 11 An AVL tree is a binary search tree in which every node is height balanced, that is, the difference in the heights of its two subtrees is at most 1. The balance factor of a node is the height of its right subtree minus the height of its left subtree. Note that many texts define the balance factor as the height of the left subtree minus the height of the right subtree. Watch out for that! Be clear on which definition of a binary search tree you are using. An equivalent definition for our version of an AVL tree is that it is a binary search tree in which each node has a balance factor of -1, 0, or +1. Note that a balance factor of -1 means that the subtree is left-heavy, and a balance factor of +1 means that the subtree is right-heavy. For example, in the following AVL tree, note that the root node with balance factor +1 has a right subtree of height 1 more than the height of the left subtree. (The balance factors are shown at the top of each node.) +1 30 / \ -1 0 22 62 / / \ 0 +1 -1 5 44 95 \ / 0 0 51 77 The idea is that an AVL tree is close to being completely balanced. Hence it should have Theta(lg n) height (it does - always) and so have Theta(lg n) worst case insertion and lookup times. An AVL tree does not have a bad worst case, unlike a binary search tree which can become very unbalanced and give Theta(n) worst case lookup and insertion times. The following binary search tree is not an AVL tree. Notice the balance factor of -2 at node 70. -1 100 / \ -2 -1 70 150 / \ / \ +1 0 +1 0 30 80 130 180 / \ \ 0 -1 0 10 40 140 / 0 36 Inserting a New ItemInitially, a new item is inserted just as in a binary search tree. Note that the item always goes into a new leaf. The tree is then readjusted as needed in order to maintain it as an AVL tree. There are three main cases to consider when inserting a new node. Case 1:A node with balance factor 0 may change to +1 or -1 when a new node is inserted below it. No change is needed at this node. Consider the following example. Note that after an insertion one only needs to check the balances along the path from the new leaf to the root. 0 40 / \ +1 0 20 50 \ / \ 0 0 0 30 45 70 After inserting 60 we get: +1 40 / \ +1 +1 20 50 \ / \ 0 0 -1 30 45 70 / 0 60 Case 2:A node with balance factor -1 changes to 0 when a new node is inserted in its right subtree. (Similarly for +1 changing to 0 when inserting in the left subtree.) No change is needed at this node. Consider the following example. -1 40 / \ +1 0 20 50 / \ / \ 0 0 0 0 10 30 45 70 / \ 0 0 22 32 After inserting 60 we get: 0 <-- the -1 changed to a 0 (case 2) 40 / \ +1 +1 <-- an example of case 1 20 50 / \ / \ 0 0 0 -1 <-- an example of case 1 10 30 45 70 / \ / 0 0 0 22 32 60 Case 3:A node with balance factor -1 changes to -2 when a new node is inserted in its left subtree. (Similarly for +1 changing to +2 when inserting in the right subtree.) Change is needed at this node. The tree is restored to an AVL tree by using a rotation. Subcase A:This consists of the following situation, where P denotes the parent of the subtree being examined, LC is P's left child, and X is the new node added. Note that inserting X makes P have a balance factor of -2 and LC have a balance factor of -1. The -2 must be fixed. This is accomplished by doing a right rotation at P. Note that rotations do not mess up the order of the nodes given in an inorder traversal. This is very important since it means that we still have a legitimate binary search tree. (Note, too, that the mirror image situation is also included under subcase A.) (rest of tree) | -2 P / \ -1 sub LC tree of / \ height n sub sub tree tree of of height height n n / X The fix is to use a single right rotation at node P. (In the mirror image case a single left rotation is used at P.) This gives the following picture. (rest of tree) | 0 LC / \ sub P tree of / \ height n sub sub / tree tree X of of height height n n Consider the following more detailed example that illustrates subcase A. -1 80 / \ -1 -1 30 100 / \ / 0 0 0 15 40 90 / \ 0 0 10 20 We then insert 5 and then check the balance factors from the new leaf up toward the root. (Always check from the bottom up.) -2 80 / \ -2 -1 30 100 / \ / -1 0 0 15 40 90 / \ -1 0 10 20 / 0 5 This reveals a balance factor of -2 at node 30 that must be fixed. (Since we work bottom up, we reach the -2 at 30 first. The other -2 problem will go away once we fix the problem at 30.) The fix is accomplished with a right rotation at node 30, leading to the following picture. -1 80 / \ 0 -1 15 100 / \ / -1 0 0 10 30 90 / / \ 0 0 0 5 20 40 Recall that the mirror image situation is also included under subcase A. The following is a general illustration of this situation. The fix is to use a single left rotation at P. See if you can draw a picture of the following after the left rotation at P. Then draw a picture of a particular example that fits our general picture below and fix it with a left rotation. (rest of tree) | +2 P / \ sub +1 tree RC of height / \ n sub sub tree tree of of height height n n \ X Subcase B:This consists of the following situation, where P denotes the parent of the subtree being examined, LC is P's left child, NP is the node that will be the new parent, and X is the new node added. X might be added to either of the subtrees of height n-1. Note that inserting X makes P have a balance factor of -2 and LC have a balance factor of +1. The -2 must be fixed. This is accomplished by doing a double rotation at P (explained below). (Note that the mirror image situation is also included under subcase B.) (rest of tree) | -2 P / \ +1 sub LC tree of / \ height n sub -1 tree NP of / \ height sub sub n tree tree n-1 n-1 / X The fix is to use a double right rotation at node P. A double right rotation at P consists of a single left rotation at LC followed by a single right rotation at P. (In the mirror image case a double left rotation is used at P. This consists of a single right rotation at the right child RC followed by a single left rotation at P.) In the above picture, the double rotation gives the following (where we first show the result of the left rotation at LC, then a new picture for the result of the right rotation at P). (rest of tree) | -2 P / \ -2 sub NP tree of / \ height n 0 sub LC tree / \ n-1 sub sub tree tree of n-1 height / n X Finally we have the following picture after doing the right rotation at P. (rest of tree) | 0 NP / \ 0 +1 LC P / \ / \ sub sub sub sub tree tree tree tree of n-1 n-1 of height / height n X n Consider the following concrete example of subcase B. -1 80 / \ 0 0 30 100 / \ / \ -1 0 0 0 20 50 90 120 / / \ 0 0 0 10 40 60 After inserting 55, we get a problem, a balance factor of -2 at the root node, as seen below. -2 80 / \ +1 0 30 100 / \ / \ -1 +1 0 0 20 50 90 120 / / \ 0 0 -1 10 40 60 / 0 55 As discussed above, this calls for a double rotation. First we do a single left rotation at 30. This gives the following picture. -2 80 / \ -1 0 50 100 / \ / \ -1 -1 0 0 30 60 90 120 / \ / -1 0 0 20 40 55 / 0 10 Finally, the right rotation at 80 restores the binary search tree to be an AVL tree. The resulting picture is shown below. 0 50 / \ -1 0 30 80 / \ / \ -1 0 -1 0 20 40 60 100 / / / \ 0 0 0 0 10 55 90 120 Example ProgramFind function, etc.
The class
Note that the
One messy feature of the above example is that in many places a cast
is required so that a pointer to a
This is needed because the data field
This one is needed since we are using the inherited Efficiency ConcernsHere is a summary of what Tenenbaum and Augenstein say about efficiency. See their text (noted below) for more details. The maximum height of an AVL tree is 1.44 * lg n, which is an O(lg n) function. This means that in the worst possible case, a lookup in a large AVL tree needs no more than 44% more comparisons than a lookup in a completely balanced tree. Even in the worst case, then, AVL trees are efficient; they still have O(lg n) lookup times. On average, for large n, AVL trees have lookup times of (lg n) + 0.25, which is even better than the above worst case figure, though still O(lg n). On average, a rotation (single or double) is required in 46.5% of insertions. Only one (single or double) rotation is needed to readjust an AVL tree after an insertion throws it out of balance. References
Related ItemBinary Trees Back to the main page for Software Design Using C++ |