# Heaps

Information on this page is taken from *The Algorithm Design Manual* by Steven
S. Skiena.

**Heaps** are a simple and elegant data structure that efficiently supports the
**priority queue** operations insert and extract-min. Heaps work by maintaining a
partial order on the set of elements. A **heap-labeled tree** is a binary tree
such that the key labeling of each node *dominates* the key labeling of each of
its children. In a **min-heap** a node dominates its children by containing a
smaller key than they do.

Operation | Complexity |
---|---|

Insert | \(O(\log n)\) |

Find Min | \(O(1)\) |

Delete Min | \(O(\log n)\) |

Search | \(O(n)\) |

A heap is used in heapsort (see Sorting Algorithms for more information).

## Implementation

Heaps can be stored with pointers (nodes with children) or arrays.

In an array, the root of the tree is in the first position, and its left and right children are in the second and third positions. In general, the keys of the ith level of the binary tree are stored in \(2^{i - 1}\) to \(2^i - 1\). This means that the left child of \(k\) sits in position \(2k\) and the right child in \(2k + 1\), while the parent of \(k\) is in \(\lceil k / 2 \rceil\).

Note that sparse trees can be very space inefficient, so the implementation needs to carefully pack elements as far left as possible. This implicit array representation of binary trees saves memory, but is less flexible than using pointers. We cannot store arbitrary tree topologies without wasting large amounts of space. We cannot move subtrees around by just changing a single pointer. This loss of flexibility explains why we cannot use this idea to represent binary search trees.

### Insert

Place the new element into the left-most *open* spot in the array, namely the
$(n + 1)$st position of a previously n-element heap. Then *bubble up* the new
key to its proper position in the hierarchy by swapping the element with its
parent until the parent dominates the element. Insertion takes at most
\(O(\log n)\) time.

def insert(element, array, size): heap[size] = element bubble_up(size, heap) def bubble_up(index, heap): parent = heap[index // 2] if index > 0 and heap[index] > parent: array[index], array[index // 2] = array[index // 2], array[index] bubble_up(index // 2, heap)

### Extracting the Minimum

The minimum can easily be found by looking in the first position in the
array. Removing the top element leaves a hole in the array. Fill by moving
the element from the *right-most* leaf (sitting in the nth position of the
array) into the first position. Then *bubble down* the new key until it
dominates all its children. The key should be switched with the dominant
child.

def extract_minimum(heap, size): minimum = heap[0] heap[0] = heap[size - 1] bubble_down(0, heap) return minimum def bubble_down(index, heap): smaller_index = find_smaller_child(index, heap) if smaller_index is not None: heap[index], heap[smaller_index] = heap[smaller_index], heap[index] bubble_down[smaller_index] def find_smaller_child(index, heap): if 2 * index + 1 < len(heap) and heap[2 * index] > heap[2 * index + 1]: return 2 * index + 1 elif 2 * index < len(heap): return 2 * index

## Applications

A heap is a good choice when you need to compute the *k smallest/largest
elements* in a collection. Suppose you were asked to write a program which
takes a sequence of strings presented in **streaming** fashion: you cannot back
up to read an earlier value. Your program must compute the \(k\) longest strings
in the sequence. As we process the input, we want to track the \(k\) longest
strings seen so far. Out of these \(k\) strings, the strings to be evicted when
a longer string is to be added is the shortest one. A min-heap (not a
max-heap!) is the right data structure for this application, since it supports
efficient find-min, remove-min, and insert.

Another example of a heap-applicable problem is *merging k sorted files*. We
initialize a min-heap to the first item from each file. Extract-min to find
the next merged item. At each extract we also *insert* the next element from
that same file. This algorithm takes \(O(N \log k)\) time and \(O(\log k)\) space.

## Libraries

Heap functionality in Python is provided by the `heapq`

module. This module
only provides a min-heap. See the Python documentation for more information.