Combinatorial Search and Heuristic Methods

Information on this page is taken from chapter The Algorithm Design Manual by Steven S. Skiena.


Backtracking is a systematic way to iterate through all the possible configurations of a combinatorial search space. These configurations may represent all possible arrangements of objects (permutations), all possible ways of building a collection of them (subsets), or even possible move sequences in a game. We model each solution as a vector \(a = (a_1, a_2, ..., a_n)\), where each element \(a_i\) is selected from a finite ordered set \(S_i\). We must be careful to avoid repetitions and missing configurations.

def backtrack(a):
    if is_solution(a):
        s_i = find_candidates(a)
        while s_i:
            backtrack(a + [s_i.pop()])

At each step in the backtracking algorithm, we try to extend a given partial solution \(a = (a_1, a_2, ..., a_k)\) by adding another element at the end. After extending it, we must test whether what we now have is a solution or if not we must check whether the partial solution is still extendible to some complete solution. We're using a depth-first search to enumerate solutions. Breadth-first search would require more space (proportional to the width instead of the height of the search tree).


Backtracking Subsets

Suppose we are generating subsets of an n-element set, say \(\{1,...,n\}\). Define each subset as an array of \(n\) cells, where the value of \(a_i\) (true or false) signifies whether the ith item is in the given subset. We consider the subset a solution when every cell has true/false (length == n).

Backtracking Permutations

\(\{1,...,n\}\) has \(n!\) distinct permutations. Each permutation is represented by an array of \(n\) cells. The set of candidates for the ith position will be the set of elements that have not appeared in the \((i - 1)\) elements of the partial solution, corresponding to the first \((i - 1)\) elements of the permutation. Our array is a solution whenever length equals \(n\).

Backtracking Graph Paths

The starting point of any path from \(s\) to \(t\) is always \(s\). Thus, \(s\) is the only candidate for the first position and \(S_0 = \{s\}\). The possible candidates for the second position are the vertices \(v\) such that \((s, v)\) is an edge of the graph and \(v\) hasn't been used in the partial solution. We have a solution when \(a_k\) is equal to \(t\). Some paths might be shorter than others.


Sometimes, backtracking is all about finding the right way to represent the solution. Suppose we're trying to generate all non-attacking placements of n-Queens. A naive implementation would represent each solution as an \(nxn\) matrix. A cleaner approach represents each solution by an array of length \(n\), where the ith entry is the location of the queen on row \(i\). For example, a solution when n is 4 would be [2, 0, 3, 1].

def n_queens(n):
    solutions = []

    def backtrack(board):
        if is_solution(board, n):
            for candidate in find_candidates(board, n):
                backtrack(board + [candidate])

    return solutions

def is_solution(board, n):
    return len(board) == n

def find_candidates(board, n):
    row = len(board)
    for column in range(n):
        if all(abs(column - other_column) not in (0, row - other_row)
               for other_row, other_column in enumerate(board)):
            yield column

Another good example of clever solution representation is subsets/power-set (seen above). We represent each solution as a bitmap.

Search Pruning

Pruning is the technique of cutting off the search the instant we have established that a partial solution cannot be extended into a full solution. Pruning is powerful. Even simple pruning strategies can suffice to reduce running time from impossible to instantaneous.

For the traveling salesman, we seek the cheapest tour that visits all vertices. Suppose that in the course of our search we find a tour \(t\) whose cost is \(C_t\). Later, we may have a partial solution \(a\) whose edge sum \(C_A > C_t\). Any tour with this prefix will have cost greater than tour \(t\), and hence is doomed to be nonoptimal. Cutting away such failed partial tours as soon as possible can have an enormous impact on running time.

As another example, suppose we're solving a Sudoku puzzle. We run through empty squares, try candidate numbers, and backtrack when we are out of candidates. The naive search randomly chooses open squares. Instead, we could choose the square with the fewest number of candidates. Additionally, when generating candidates, we could look ahead to see if the partial solution causes some other open square to have no candidates. Successful pruning often requires looking ahead to see when a solution is doomed to go nowhere, and backing off as soon as possible.

Exploiting symmetry is another avenue for reducing combinatorial searches

Heuristic Search Methods

Heuristic methods provide an alternate way to approach difficult combinatorial optimization problems. Backtracking gave us a method to find the best of all possible solutions, as scored by a given objective function. However, any algorithm searching all configurations is doomed to be impossible on large instances.

The methods observed below have two common components: solution space representation and a cost function.

Random Sampling

The simplest method to search in a solution space uses random sampling. It is also called the Monte Carlo method. We repeatedly construct random solutions and evaluate them, stopping as soon as we get a good enough solution, or (more likely) when we are tired of waiting. We report the best solution found over the course of our sampling.

True random sampling requires that we are able to select elements form the solution space uniformly at random. This means that each of the elements of the solution space must have an equal probability of being the next candidate selected.

Random sampling does well when there's a high proportion of acceptable solutions or when there is no coherence in the solution space. For example, hunting for a any large prime number.

Local Search

A local search employs the local neighborhood around every element in the solution space. Think of each element \(x\) in the solution space as a vertex, with a directed edge \((x, y)\) to every candidate solution \(y\) that is a neighbor of \(x\). Our search proceeds from \(x\) to the most promising candidate in x's neighborhood.

We certainly do not want to construct the neighborhood graph for any sizable solution space. We want a general transition mechanism that takes us to the next solution by slightly modifying the current one. Typical mechanisms include swapping a random pair of items or changing (inserting or deleting) a single item in the solution.

In a hill-climbing procedure, we try to find the top of a mountain (or alternatively, the lowest point in a ditch) by starting at some arbitrary point and taking any step that leads in the direction we want to travel. We repeat until we have reached a point where all our neighbors lead us in the wrong direction.

Suppose you wake up in a sky lodge, eager to reach the top of the neighboring peak. Your first transition to grain altitude might be to go upstairs to the top of the building. And then you are trapped. To reach the top of the mountain, you must go downstairs and walk outside, but this violates the requirement that each step has to increase your score. Hill-climbing and closely related heuristics such as greedy search or gradient descent search are great at finding local optima quickly, but often fail to find the globally best solution.

Use local search when there is great coherence in the solution space. Hill climbing is at its best when the solution space is convex. Local search is also useful whenever the cost of incremental evaluation is much cheaper than global evaluation.

Simulated Annealing

Simulated annealing is a heuristic search procedure that allows occasional transitions leading to more expensive (and hence inferior) solutions. This may not sound like progress, but it helps keep our search from getting stuck in local optima.

The inspiration for simulated annealing comes from the physical process of cooling molten materials down to the solid state. In thermodynamic theory, a particle's energy state is a function of its temperature. We can mimic physics to solve combinatorial optimization problems.

Our problem representation includes both a representation of the solution space and an easily computable cost function \(C(s)\) measuring the quality of a given solution. The new component is the cooling schedule, whose parameters govern how likely we are to accept a bad transition as a function of time.

At the beginning of the search, we are eager to use randomness to explore the search space widely, so the probability of accepting a negative transition should be high. As the search progresses, we seek to limit transitions to local improvements and optimizations.

Genetic Algorithms

Genetic algorithms draw their inspiration from evolution and natural selection. Through the process of natural selection, organisms adapt to optimize their chances for survival in a given environment. Random mutations occur in an organism's genetic description, which then get passed on to its children. Should a mutation prove helpful, these children are more likely to survive and reproduce. Should it be harmful, these children won't, and so the bad trait will die with them.

Genetic algorithms maintain a "population" of solution candidates for the given problem. Elements are drawn at random from this population and allowed to "reproduce" by combining aspects of the two-parent solutions. The probability that an element is chosen to reproduce is based on its "fitness," - essentially the cost of the solution it represents. Unfit elements die from the population, to be replaced by a successful-solution offspring.

The idea behind genetic algorithms is extremely appealing. However, they don't seem to work as well on practical combinatorial optimization problems as simulated annealing does.

See Genetic Algorithms for more information.