Heaps

Information on this page is taken from The Algorithm Design Manual by Steven S. Skiena.

Heaps are a simple and elegant data structure that efficiently supports the priority queue operations insert and extract-min. Heaps work by maintaining a partial order on the set of elements. A heap-labeled tree is a binary tree such that the key labeling of each node dominates the key labeling of each of its children. In a min-heap a node dominates its children by containing a smaller key than they do.

heap.png

Operation Complexity
Insert \(O(\log n)\)
Find Min \(O(1)\)
Delete Min \(O(\log n)\)
Search \(O(n)\)

A heap is used in heapsort (see Sorting Algorithms for more information).

Implementation

Heaps can be stored with pointers (nodes with children) or arrays.

In an array, the root of the tree is in the first position, and its left and right children are in the second and third positions. In general, the keys of the ith level of the binary tree are stored in \(2^{i - 1}\) to \(2^i - 1\). This means that the left child of \(k\) sits in position \(2k\) and the right child in \(2k + 1\), while the parent of \(k\) is in \(\lceil k / 2 \rceil\).

Note that sparse trees can be very space inefficient, so the implementation needs to carefully pack elements as far left as possible. This implicit array representation of binary trees saves memory, but is less flexible than using pointers. We cannot store arbitrary tree topologies without wasting large amounts of space. We cannot move subtrees around by just changing a single pointer. This loss of flexibility explains why we cannot use this idea to represent binary search trees.

Insert

Place the new element into the left-most open spot in the array, namely the $(n + 1)$st position of a previously n-element heap. Then bubble up the new key to its proper position in the hierarchy by swapping the element with its parent until the parent dominates the element. Insertion takes at most \(O(\log n)\) time.

def insert(element, array, size):
    heap[size] = element
    bubble_up(size, heap)

def bubble_up(index, heap):
    parent = heap[index // 2]
    if index > 0 and heap[index] > parent:
        array[index], array[index // 2] = array[index // 2], array[index]
        bubble_up(index // 2, heap)

Extracting the Minimum

The minimum can easily be found by looking in the first position in the array. Removing the top element leaves a hole in the array. Fill by moving the element from the right-most leaf (sitting in the nth position of the array) into the first position. Then bubble down the new key until it dominates all its children. The key should be switched with the dominant child.

def extract_minimum(heap, size):
    minimum = heap[0]

    heap[0] = heap[size - 1]
    bubble_down(0, heap)
    return minimum

def bubble_down(index, heap):
    smaller_index = find_smaller_child(index, heap)
    if smaller_index is not None:
        heap[index], heap[smaller_index] = heap[smaller_index], heap[index]
        bubble_down[smaller_index]

def find_smaller_child(index, heap):
    if 2 * index + 1 < len(heap) and heap[2 * index] > heap[2 * index + 1]:
        return 2 * index + 1
    elif 2 * index < len(heap):
        return 2 * index

Applications

A heap is a good choice when you need to compute the k smallest/largest elements in a collection. Suppose you were asked to write a program which takes a sequence of strings presented in streaming fashion: you cannot back up to read an earlier value. Your program must compute the \(k\) longest strings in the sequence. As we process the input, we want to track the \(k\) longest strings seen so far. Out of these \(k\) strings, the strings to be evicted when a longer string is to be added is the shortest one. A min-heap (not a max-heap!) is the right data structure for this application, since it supports efficient find-min, remove-min, and insert.

Another example of a heap-applicable problem is merging k sorted files. We initialize a min-heap to the first item from each file. Extract-min to find the next merged item. At each extract we also insert the next element from that same file. This algorithm takes \(O(N \log k)\) time and \(O(\log k)\) space.

Libraries

Heap functionality in Python is provided by the heapq module. This module only provides a min-heap. See the Python documentation for more information.