Sorting Algorithms

Sorting algorithms put the elements of a collection into a certain order. Sorting is a basic building block that many other algorithms are built around. It is used to preprocess a collection so that searching is faster (e.g. binary search), as well as to identify items that are similar (e.g. students sorted by test score).

A number of sorting algorithms run in \(O(n \log n)\) time - heapsort, merge sort, and quicksort are examples. Each has its advantages and disadvantages: for example, heapsort is in-place but not stable; merge sort is stable but not in-place; quicksort runs in \(O(n^2)\) time in the worst case. In practice, a well-implemented quicksort is usually the best choice for sorting.

Algorithm Properties

Items may be placed in ascending or descending order. The sorting algorithm may compare on a key or on the entire record. These properties are often encapsulated in a comparison function.

A stable sorting algorithm maintains the relative order of the input when keys are equal. For example, suppose one has an array of people. Each person has an age and a name. Using a stable sorting algorithm, the array is sorted first on age, then on name. When you look at the cluster of people named "John," you'll notice that they are still organized by age.
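
For instance, Python's built-in sorted() is stable, so sorting by one key and then by another preserves the first ordering among ties (the data below is just an illustrative example):

people = [("John", 25), ("Alice", 30), ("John", 20), ("Alice", 22)]
by_age = sorted(people, key=lambda person: person[1])   # sort on age
by_name = sorted(by_age, key=lambda person: person[0])  # then on name
# Within each name, people remain ordered by age:
# [('Alice', 22), ('Alice', 30), ('John', 20), ('John', 25)]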

In-place refers to memory usage. An in-place algorithm needs only constant (\(O(1)\)) memory beyond the items being sorted. In-place sorting algorithms usually work within and overwrite the input array. Note that sometimes \(O(\log n)\) additional memory is considered in-place.

Algorithms

Bubble Sort

Overview

Bubble sort compares adjacent neighbors and, if they are out of order, exchanges them. At the end of the first pass, the largest element has bubbled to the end. Sorting generally requires a number of passes proportional to the length of the input.

Best Complexity \(O(n)\)
Average Complexity \(O(n^2)\)
Worst Complexity \(O(n^2)\)
Memory Usage In-place
Stability Stable

Implementation

def bubble_sort(array):
    while True:
        modified = False
        # One pass: swap any adjacent neighbors that are out of order.
        for i in range(len(array) - 1):
            if array[i] > array[i + 1]:
                array[i], array[i + 1] = array[i + 1], array[i]
                modified = True
        # If a full pass makes no swaps, the array is sorted.
        if not modified:
            return array

Selection Sort

Overview

Selection sort maintains a sorted and an unsorted portion of the list. Initially the sorted portion is of length zero. Each iteration of the algorithm selects the smallest element in the unsorted portion and places it at the end of the sorted portion.

Best Complexity \(O(n^2)\)
Average Complexity \(O(n^2)\)
Worst Complexity \(O(n^2)\)
Memory Usage In-place
Stability Unstable

Implementation

def selection_sort(array):
    for i in range(len(array)):
        # Find the smallest element in the unsorted portion.
        smallest = i
        for j in range(i + 1, len(array)):
            if array[j] < array[smallest]:
                smallest = j
        # Place it at the end of the sorted portion.
        array[i], array[smallest] = array[smallest], array[i]
    return array

Insertion Sort

Overview

Just like selection sort, insertion sort maintains a sorted and an unsorted portion of the list. Initially the sorted portion is of length zero. Each iteration of the algorithm takes the first element in the unsorted portion and inserts it into the correct location in the sorted portion.

Best Complexity \(O(n)\)
Average Complexity \(O(n^2)\)
Worst Complexity \(O(n^2)\)
Memory Usage In-place
Stability Stable

Implementation

def insertion_sort(array):
    for i in range(len(array)):
        # Move the first unsorted element left until it reaches its place.
        for j in range(i, 0, -1):
            if array[j] < array[j - 1]:
                array[j], array[j - 1] = array[j - 1], array[j]
            else:
                break
    return array

Shellsort

Overview

Shellsort repeatedly applies another sorting algorithm (usually insertion sort) to subsections of the list. The subsections are defined by a gap sequence. Let's say our list is [x1, x2, x3, x4, x5, x6, x7] and our gap sequence is [3, 1]. In the first pass, Shellsort would have a gap of 3 and therefore sort [x1, x4, x7]. In the second pass, Shellsort would have a gap of 1 and sort [x1, x2, x3, x4, x5, x6, x7] (the entire list).

Best Complexity Depends on gap sequence
Average Complexity Depends on gap sequence
Worst Complexity Depends on gap sequence
Memory Usage In-place
Stability Unstable

Implementation

The following function uses a \(2^k - 1\) gap sequence (Hibbard's sequence). This gap sequence gives the algorithm a worst-case complexity of \(O(n^{3/2})\).

import math

def shellsort(array):
    # Hibbard's gap sequence: 1, 3, 7, 15, ...
    gaps = [2**k - 1 for k in range(1, int(math.log2(len(array) + 1)) + 1)]
    for gap in reversed(gaps):
        # Gapped insertion sort: compare each element with the one
        # `gap` positions before it, in every interleaved sublist.
        for i in range(gap, len(array)):
            for j in range(i, gap - 1, -gap):
                if array[j] < array[j - gap]:
                    array[j], array[j - gap] = array[j - gap], array[j]
                else:
                    break

    return array

Merge Sort

Overview

Merge sort is a classic divide-and-conquer algorithm. The algorithm first divides the input into smaller and smaller lists. At the base case (list length = 1) the list is sorted. These sorted sublists are progressively merged until we have sorted the original list.

Best Complexity \(O(n\log n)\)
Average Complexity \(O(n\log n)\)
Worst Complexity \(O(n\log n)\)
Memory Usage \(O(n)\)
Stability Stable

Implementation

The efficiency of merge sort depends upon how we combine the two sorted halves into a single sorted list. We need to merge the two lists together. Observe that the smallest overall item in the two sorted lists must sit at the top of one of the two lists. To merge, we remove the smallest element, then repeat. Because the recursion goes \(\lg n\) levels deep, and a linear amount of work is done per level, merge sort takes \(O(n \log n)\) time in the worst case.

def merge_sort(array):
    if len(array) < 2:
        return array
    else:
        middle = len(array) // 2
        left, right = merge_sort(array[:middle]), merge_sort(array[middle:])
        return merge(left, right)

def merge(array1, array2):
    """Combine two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(array1) and j < len(array2):
        # Taking from array1 on ties keeps the merge (and the sort) stable.
        if array1[i] <= array2[j]:
            merged.append(array1[i])
            i += 1
        else:
            merged.append(array2[j])
            j += 1
    # One list is exhausted; append the remainder of the other.
    merged.extend(array1[i:])
    merged.extend(array2[j:])
    return merged

Quicksort

Overview

Like merge sort, quicksort is a divide-and-conquer algorithm. In merge sort, the hard part is combining the sublists; in quicksort, the hard part is dividing the list. Quicksort first chooses a pivot. The input is then divided into two parts: one with elements smaller than the pivot and one with elements larger than the pivot. We place the pivot between the two piles and then sort each pile independently.

Best Complexity \(O(n\log n)\)
Average Complexity \(O(n\log n)\)
Worst Complexity \(O(n^2)\)
Memory Usage Extra \(O(\log n)\)
Stability Unstable

Quicksort runs in \(O(n \cdot h)\), where \(h\) is the height of the recursion tree. Suppose, luckily, we always choose the median element; then the subproblems are always half the size of the previous level. This produces \(O(n \log n)\), the best case of quicksort. Suppose, unluckily, we always choose the biggest or smallest element in the sub-array. This produces \(O(n^2)\), the worst case of quicksort.
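
In recurrence form, the two cases can be summarized as follows (a standard back-of-the-envelope sketch, not part of the tables above):

\[
T_{\text{best}}(n) = 2\,T(n/2) + O(n) \;\Rightarrow\; O(n \log n),
\qquad
T_{\text{worst}}(n) = T(n-1) + O(n) \;\Rightarrow\; O(n^2)
\]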

Quicksort is typically 2-3 times faster than merge sort or heapsort when implemented well. All three algorithms are \(O(n \log n)\), but experimentation shows that the simpler operations in quicksort's inner loop give it a constant-factor advantage.

Implementation

The following implementation uses the leftmost element as the pivot. Unfortunately, this choice produces worst-case performance on sorted lists. Most implementations will therefore select a different pivot.

def quicksort(array):
    if len(array) < 2:
        return array
    else:
        pivot = array[0]
        left = [x for x in array[1:] if x <= pivot]
        right = [x for x in array[1:] if x > pivot]
        return quicksort(left) + [pivot] + quicksort(right)

Note that the implementation builds new lists at every level of recursion. It's possible to implement quicksort with only \(O(\log n)\) extra memory (for the recursion stack). See the Hoare partition scheme for an approach that swaps elements around a central pivot within the array itself, as sketched below.
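
The following is a minimal sketch of such an in-place quicksort using the Hoare partition scheme (the function name quicksort_in_place and its default-argument interface are illustrative choices, not part of the text above):

def quicksort_in_place(array, low=0, high=None):
    # Sorts array[low..high] in place; only the recursion stack uses
    # extra memory.
    if high is None:
        high = len(array) - 1
    if low >= high:
        return array
    # Hoare partition: advance i and j toward each other, swapping
    # out-of-place elements around the pivot.
    pivot = array[(low + high) // 2]
    i, j = low - 1, high + 1
    while True:
        i += 1
        while array[i] < pivot:
            i += 1
        j -= 1
        while array[j] > pivot:
            j -= 1
        if i >= j:
            break
        array[i], array[j] = array[j], array[i]
    quicksort_in_place(array, low, j)
    quicksort_in_place(array, j + 1, high)
    return array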

Randomization

Randomization is a powerful tool to improve algorithms with bad worst-case but good average-case complexity.

If we choose the pivot at random, quicksort runs in \(O(n \log n)\) expected time with high probability. The best possible pivot is the median. Call a pivot good enough if it lies in the middle half of the sorted order of keys. Since the expected numbers of good and bad splits are the same, the bad splits can at most double the height of the recursion tree, which remains \(O(\log n)\). Randomization may be implemented either by shuffling the array first or by selecting a random index at each step.
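
As a sketch, the earlier list-based implementation can be adapted to pick its pivot with Python's random module (randomized_quicksort is an illustrative name, and the three-way partition is one of several reasonable choices):

import random

def randomized_quicksort(array):
    if len(array) < 2:
        return array
    # Choosing the pivot uniformly at random makes bad splits unlikely.
    pivot = random.choice(array)
    left = [x for x in array if x < pivot]
    middle = [x for x in array if x == pivot]
    right = [x for x in array if x > pivot]
    return randomized_quicksort(left) + middle + randomized_quicksort(right)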

Heapsort

Overview

Selection sort (see above) is a simple algorithm that repeatedly extracts the smallest remaining element from the unsorted part of an array. A computer takes \(O(n)\) time to find the smallest element in an unsorted array. Extracting the minimum is exactly the operation supported by a priority queue. What if we improve the data structure? Heapsort is nothing but an implementation of selection sort using the right data structure: a heap.

Best Complexity \(O(n\log n)\)
Average Complexity \(O(n\log n)\)
Worst Complexity \(O(n\log n)\)
Memory Usage \(O(1)\)
Stability Unstable

Implementation

Heapsort creates a heap and repeatedly extracts the minimum, giving a worst-case \(O(n \log n)\) algorithm. Heapsort can also be implemented as an in-place sort; the version below instead uses Python's heapq module and \(O(n)\) extra space for the heap, with an in-place variant sketched afterward.

from heapq import heappush, heappop

def heapsort(array):
    # Push every item onto a min-heap, then pop them off in sorted order.
    heap = []
    for item in array:
        heappush(heap, item)
    return [heappop(heap) for _ in range(len(heap))]
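
For completeness, a minimal sketch of the in-place variant (heapsort_in_place and sift_down are illustrative names, not from the text above): build a max-heap inside the array, then repeatedly swap the maximum to the end of the shrinking unsorted region.

def heapsort_in_place(array):
    def sift_down(root, end):
        # Restore the max-heap property for the subtree rooted at `root`,
        # considering only indices <= end.
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            # Pick the larger of the two children.
            if child + 1 <= end and array[child] < array[child + 1]:
                child += 1
            if array[root] < array[child]:
                array[root], array[child] = array[child], array[root]
                root = child
            else:
                return

    n = len(array)
    # Heapify: sift down every internal node, from the last one up to the root.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n - 1)
    # Repeatedly move the current maximum (the root) to the end.
    for end in range(n - 1, 0, -1):
        array[0], array[end] = array[end], array[0]
        sift_down(0, end - 1)
    return array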

Distribution Sort

Suppose we have a list of names to sort. We could partition them according to the first letter. This creates 26 different piles, or buckets, of names. Then we partition each pile based on the second letter of each name, and so on. The names are sorted as soon as each bucket contains only a single name. At the end, we simply concatenate the piles together. This algorithm is commonly called bucketsort or distribution sort.

Bucketing is a very effective idea whenever we are confident that the distribution of data will be roughly uniform. It is the idea that underlies hash tables, kd-trees, and a variety of other practical data structures. The downside of such techniques is that the performance can be terrible when the data distribution is not what we expected.
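
A minimal sketch of the name-partitioning idea described above (bucket_sort_names is an illustrative helper, and it assumes plain lowercase-comparable strings):

def bucket_sort_names(names, index=0):
    # Base case: a bucket with zero or one name is already sorted.
    if len(names) <= 1:
        return names
    finished = []  # names too short to have a letter at `index` come first
    buckets = {}
    for name in names:
        if index >= len(name):
            finished.append(name)
        else:
            # Distribute by the letter at the current position.
            buckets.setdefault(name[index].lower(), []).append(name)
    # Recurse into each bucket on the next letter, then concatenate.
    result = finished
    for letter in sorted(buckets):
        result.extend(bucket_sort_names(buckets[letter], index + 1))
    return result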