Software Design Using C++Heaps and HeapsortSorting RevisitedOne of the typical tasks in a Data Structures course is to compare the running times of various sort algorithms. This can be done using analysis of algorithms techniques. For example, we already know that bubblesort is Theta(n^2), but quicksort is Theta(n * lg(n)), both in the average case. Another way to compare sort routines is to actually run several of them on the same set of test data. Do you see an obvious speed difference between some of the sorts? DefinitionsRecall that in a binary tree each node can have a left child node and/or a right child node. A leaf is a node with no children. An almost complete binary tree is a binary tree in which the following 3 conditions hold: all the leaves are at the bottom level or the bottom 2 levels, all the leaves are in the leftmost possible positions, and (except possibly for the bottom level) all levels are completely filled with nodes. Here are some examples of almost complete binary trees. Note that there is no particular ordering of the letters. R / \ G A / \ / T W G K / \ L C / \ / \ B T Q Z / \ D P The following example is a complete binary tree. (Of course, it is also fits the definition of almost complete. Any complete binary tree is automatically almost complete.) G / \ R T / \ / \ V E S N The following binary trees are not almost complete: W / \ T P / \ \ H K V (V should be pushed to the left) M / \ E L / \ / I R Y (this level should be completely filled) / \ S C H / \ N D / \ R V (R and V should be pushed left, under N) A / \ G J (there are leaves at 3 levels) / \ Y O / \ B Z What Is a Heap?Definition: A minimal heap (descending heap) is an almost complete binary tree in which the value at each parent node is less than or equal to the values in its child nodes. Obviously, the minimum value is in the root node. Note, too, that any path from a leaf to the root passes through the data in descending order. Here is an example of a minimal heap: C / \ H K / \ / L I M Storage of Heap DataThe typical storage method for a heap, or any almost complete binary tree, works as follows. Begin by numbering the nodes level by level from the top down, left to right. For example, consider the following heap. The numbering has been added below the nodes. C 0 / \ H K 1 2 / \ / L I M 3 4 5 Then store the data in an array as shown below:
The advantage of this method over using the usual pointers and nodes is that
there is no wasting of space due to storing two pointer fields
in each node. Instead, starting with the current index,
For example, if we start at node H (with index 1), the right child is at index 2 * (1 + 1) = 4, that is, node I. Inserting into a HeapThis is done by temporarily placing the new item at the end of the heap (array) and then calling a FilterUp routine to make any needed
adjustments on the path from this leaf to the root.
For example, let's insert E into the following heap:
C / \ H K / \ / L I M First, temporarily place E in the next available position: C / \ H K / \ / \ L I M E
Of course, the new tree might not be a heap. The C / \ H E / \ / \ L I M K For practice, let's take the above heap and insert another item, D. First, place D temporarily in the next available position: C / \ H E / \ / \ L I M K / D
Then the C / \ D E / \ / \ H I M K / L Things have now been adjusted so that we again have a heap! Removing from a HeapWe always remove the item from the root. That way we always get the smallest item. The problem is then how to adjust the binary tree so that we again have a heap (with one less item).
The algorithm works like this: First, remove the root item and replace
it temporarily with the item in the last position. Call this replacement
the target. A C / \ D E / \ / \ H I M K / L First we remove the C and replace it with the last item (the target), L: L / \ D E / \ / \ H I M K The smaller child of L is D. Since D is out of order compared to the target L, we move D up. The smaller child under where D had been is H. When H is compared to L we see that the H too needs to be moved up. Since we are now at a leaf, this empty leaf is where the target L is put. D / \ H E / \ / \ L I M K For another example, let's remove the E from the following heap: E / \ G K / \ / \ J N K X / \ / X Y P First remove the E and replace it with the target P (the last item): P / \ G K / \ / \ J N K X / \ X Y
Now use the G / \ J K / \ / \ P N K X / \ X Y Comic ReliefThanks to http://xkcd.com. Note their license information.
HeapsortHeapsort is performed by somehow creating a heap and then removing the data items one at a time. The heap could start as an empty heap, with items inserted one by one. However, there is a relatively easy routine to convert an array of items into a heap, so that method is often used. This routine is described below. Once the array is converted into a heap, we remove the root item (the smallest), readjust the remaining items into a heap, and place the removed item at the end of the heap (array). Then we remove the new item in the root (the second smallest), readjust the heap, and place the removed item in the next to the last position, etc. Heapsort is Theta(n * lg(n)), either average case or worst case. This is great for a sorting algorithm! No appreciable extra storage space is needed either. On average, quicksort (which is also Theta(n * lg(n)) for the average case) is faster than heapsort. However, quicksort has that bad Theta(n^2) worst case running time. Example Trace of HeapsortLet's heapsort the following array:
To convert this to a heap, first go to the index of the last parent node.
This is given by (HeapSize - 2) / 2. In this case, (9 - 2) / 2 = 3. Thus K
is the last parent in the tree. We then apply the In our example, the array corresponds directly to the following binary tree. Note that this is not yet a heap. P / \ S C / \ / \ K M L A / \ X E
Applying P / \ S C / \ / \ E M L A / \ X K
Now apply P / \ S A / \ / \ E M L C / \ X K
Next, apply P / \ E A / \ / \ K M L C / \ X S
Finally, apply A / \ E C / \ / \ K M L P / \ X S Now we have a heap! The first main step of heapsort has been completed. The other main component of heapsort was described earlier: to repeatedly remove the root item, adjust the heap, and put the removed item in the empty slot toward the end of the array (heap).
First we remove the A, adjust the heap by using C (the target is S) / \ E L / \ / \ K M S P / . X A Of course, all of this is really taking place in the array that holds the heap. At this point it looks like the following. Note that the heap is stored from index 0 to index 7. The A is after the end of the heap.
Next we remove the C, adjust the heap by using E (the target is X) / \ K L / \ / \ X M S P . . C A
Next we remove the E, adjust the heap by using K (the target is P) / \ M L / \ / . X P S E . . C A
Next we remove the K, adjust the heap by using L (the target is S) / \ M S / \ . . X P K E . . C A
Next we remove the L, adjust the heap by using M (the target is P) / \ P S / . . . X L K E . . C A
Next we remove the M, adjust the heap by using P (the target is X) / \ X S . . . . M L K E . . C A
Next we remove the P, adjust the heap by using S (the target is S) / . X P . . . . M L K E . . C A Next we remove the S, adjust the heap (now a trivial operation), and place the S at the end of the heap: X (the target is X) . . S P . . . . M L K E . . C A Since only the item X remains in the heap, and since we have removed the smallest item, then the second smallest, etc., the X must be the largest item and should be left where it is. If you now look at the array that holds the above items you will see that we have sorted the array in descending order:
Complete Code for HeapsortHeapClass class has
been set up with a constructor and functions for inserting and removing an
item. An object of this class contains three data fields: the maximum heap
size, the actual heap size (number of items currently in the heap), and
a pointer to an array containing the heap data. Note that the array itself
is located outside of the object. This may not be the cleanest approach,
but it is useful since we are going to start with an ordinary array and
sort it via heapsort. Why have a second array inside of the object, when
this array would hold the very same data?
The
In the code for the
Once
Remember that even after the object H goes out of existence at the end of
the above function, the heap data is still sitting in References:
Related ItemBinary Trees Back to the main page for Software Design Using C++ |