CSC 591-024 (8290)
CSC 791-024 (8291)
Fall 2024, Special Topics in Computer Science
Tim Menzies, timm@ieee.org, Computer Science, NC State
In the 1930s, the famous geneticist Sewall Wright introduced a metaphor that would shape our understanding of complex optimization problems. Wright described a population of rabbits living on a mountainous landscape. In this metaphor, the fitness of each rabbit is determined by its position on the landscape—the higher up a hill the rabbit is, the fitter it is. These hills represent peaks in fitness, while the valleys represent low fitness.
This landscape wasn’t smooth, though. It was rugged, filled with ridges, local maxima, and local minima—representing the difficult nature of real-world optimization problems. Wright’s “fitness landscape” was the first formal description of an optimization landscape, and it introduced a powerful way of visualizing how algorithms navigate complex spaces to find solutions.
Fast forward to today: optimization algorithms are those “rabbits” traversing the landscape in search of the highest peak (the global maximum) or the lowest valley (the global minimum).
In optimization, the landscape can be thought of as a surface where each point represents a possible solution, and the height (or depth) represents the value of the objective function you’re optimizing—either trying to minimize or maximize.
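To make that concrete, here is a tiny sketch in Python. The objective function below is invented purely for illustration: every x is a candidate solution, and f(x) is its height on the landscape, complete with small ripples that create local maxima.

```python
import math

def f(x):
    # An invented rugged objective: one broad hill plus small ripples,
    # giving several local maxima and a single global maximum.
    return math.exp(-(x - 2) ** 2) + 0.3 * math.sin(5 * x)

# Sample the landscape on a grid and report the highest point found.
xs = [i / 100 for i in range(-300, 601)]
best_x = max(xs, key=f)
print(f"best x = {best_x:.2f}, height = {f(best_x):.3f}")
```

Exhaustively sampling a grid like this only works in one dimension; the algorithms below exist because real landscapes are far too large to enumerate.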
The Problem of Local Minima and Maxima: Algorithms often find themselves stuck in local minima or maxima—a situation where no nearby solutions seem better, but better solutions exist elsewhere. For example, a rabbit may reach a small hill, think it’s on the highest peak, and stop searching—missing a nearby taller mountain.
Solution:
    initialize solution S
    initialize temperature T
    repeat
        new_solution = perturb(S)
        if (new_solution is better than S) or (accept_worse_solution_probabilistically(T)):
            S = new_solution
        T = decrease_temperature(T)
    until stopping_condition
    return S
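The step accept_worse_solution_probabilistically(T) hides the key detail. One common choice is the Metropolis criterion: accept a worse solution with probability exp(-Δ/T), where Δ is how much worse it is. A minimal sketch in Python, assuming we are minimizing a numeric cost:

```python
import math, random

def accept(old_cost, new_cost, T):
    """Metropolis acceptance rule for a minimization problem."""
    if new_cost <= old_cost:           # always keep an improvement
        return True
    delta = new_cost - old_cost        # how much worse the new solution is
    return random.random() < math.exp(-delta / T)
```

Early on, when T is large, almost anything is accepted (wide exploration); as T shrinks, the rule becomes nearly greedy.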
Following the Ridges: Navigating ridges is one of the most delicate tasks in optimization. Solutions along a ridge are tempting because small moves along it keep improving the objective, yet even a modest move perpendicular to it can drop the search off the ridge entirely.
Solution:
    initialize population P of size N
    repeat
        select parents from P
        create new_population by crossover and mutation of parents
        select survivors to form new P
    until stopping_condition
    return best_solution_in(P)
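The crossover and mutation steps are problem specific. For bit-string solutions, one simple sketch of the two operators looks like this (the operator choices and the mutation rate are illustrative, not fixed by the pseudocode above):

```python
import random

def crossover(mom, dad):
    # One-point crossover: a prefix of one parent spliced onto a suffix of the other.
    cut = random.randint(1, len(mom) - 1)
    return mom[:cut] + dad[cut:]

def mutate(bits, rate=0.01):
    # Flip each bit independently with a small probability.
    return [1 - b if random.random() < rate else b for b in bits]

child = mutate(crossover([0, 0, 1, 1, 0, 1], [1, 1, 0, 0, 1, 0]))
```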
Escaping Plateaus: Plateaus, or flat regions, present another unique challenge. With no clear indication of which direction is better, algorithms can waste time wandering without making progress.
Solution:
    initialize solution S
    repeat
        for small_number_of_steps:
            improve S locally by flipping a variable
            with probability p:
                randomly modify S
    until stopping_condition
    return best_solution_found
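One way to read the “with probability p” step: when greedy moves stall on a plateau, take a random step anyway. Below is a small sketch of a single move, assuming solutions are bit vectors and you supply a score() function to maximize; it is one reading of the pseudocode, not the published algorithm.

```python
import random

def step(solution, score, p=0.3):
    """One move: usually the best single-bit flip (greedy), sometimes a
    random flip, so the search can drift across flat regions."""
    def flipped(s, j):
        return s[:j] + [1 - s[j]] + s[j + 1:]
    if random.random() < p:
        i = random.randrange(len(solution))      # random step
    else:                                        # greedy step: best single flip
        i = max(range(len(solution)), key=lambda j: score(flipped(solution, j)))
    return flipped(solution, i)
```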
Climbing to Higher Peaks: Differential Evolution (DE) and NSGA-II come into play when you want to combine information from different parts of the landscape. DE uses differences between members of the population to create new solutions that are likely to move toward better regions of the landscape.
NSGA-II excels at managing multiple objectives. It finds solutions that balance trade-offs between conflicting objectives, ensuring a diverse set of solutions along the Pareto front (a kind of multi-dimensional ridge).
    initialize population P of size N
    evaluate P
    repeat
        generate offspring population Q using crossover and mutation
        evaluate Q
        combine P and Q into R
        non-dominated_sort(R)
        calculate crowding distance for each individual in R
        select the N best individuals based on rank and crowding distance to form new P
    until stopping_condition
    return Pareto_front(P)
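The non-dominated sort in that loop rests on one small test: solution x dominates y if x is no worse on every objective and strictly better on at least one. A sketch, assuming every objective is being minimized:

```python
def dominates(x, y):
    """True if objective vector x Pareto-dominates y (minimizing all objectives)."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

assert dominates([1, 2], [2, 2])
assert not dominates([1, 3], [2, 2])   # a trade-off: neither vector dominates the other
```

The Pareto front is simply the set of solutions that nothing else dominates.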
Using Surrogates for Complex Landscapes: When the landscape becomes too complex, and evaluating every point is costly, Sequential Model Optimization (SMO) or Bayesian Optimization (BO) uses surrogate models—simpler approximations of the landscape. These models predict which regions of the landscape are worth exploring, reducing the number of evaluations needed.
    initialize surrogate model M
    initialize sample points S
    repeat
        find promising point using M (e.g., based on expected improvement)
        evaluate real function at promising point
        update M with new point
    until stopping_condition
    return best_solution
The optimization landscape is full of challenges—ridges, local minima, local maxima, and plateaus. Algorithms have been developed to navigate these landscapes in different ways, from simple probabilistic methods like Simulated Annealing to more advanced techniques like Differential Evolution and Sequential Model Optimization.
Just like Wright’s rabbits, algorithms wander this rugged landscape in search of the best solution, and it is our task to help them find the highest peaks or the deepest valleys efficiently.
Once upon a time, memory and CPU power were precious commodities, and optimization algorithms had to be simple and resource-efficient.
In the earliest days, when memory was scarce and CPU time was expensive, optimization had to be as lightweight as possible. Enter Simulated Annealing (SA)—a 1+1 algorithm. SA’s origins can be traced back to the Metropolis algorithm from 1953, which was originally designed to simulate the behavior of particles in thermodynamic systems.
It wasn’t until the early 1980s, when Scott Kirkpatrick, C. D. Gelatt, and M. P. Vecchi pointed out that the Metropolis algorithm could be used as an optimizer, that the idea evolved into Simulated Annealing (SA). The concept was simple yet powerful: start with a solution, perturb it, and decide probabilistically whether to accept the new solution based on a “temperature” parameter, gradually cooling down over time to focus on local exploration.
    initialize solution S
    initialize temperature T
    repeat
        new_solution = perturb(S)
        if (new_solution is better than S) or (accept_worse_solution_probabilistically(T)):
            S = new_solution
        T = decrease_temperature(T)
    until stopping_condition
    return S
SA is efficient—using just one solution at a time—but it explores the space slowly, like a lone traveler wandering through a complex landscape.
Ref: Metropolis Algorithm, 1953. Ref: Simulated Annealing, 1983.
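Here is one possible reading of that pseudocode as runnable Python, minimizing an invented one-dimensional cost. The geometric cooling schedule, step size, and bounds are illustrative choices, not part of the original description:

```python
import math, random

def simulated_annealing(cost, lo=-5.0, hi=5.0, T=1.0, cooling=0.99, steps=2000):
    """Minimize cost(x) over [lo, hi] using the Metropolis acceptance rule."""
    x = best = random.uniform(lo, hi)
    for _ in range(steps):
        new = min(hi, max(lo, x + random.gauss(0, 0.1)))       # perturb
        delta = cost(new) - cost(x)
        if delta <= 0 or random.random() < math.exp(-delta / T):
            x = new                                            # accept
            if cost(x) < cost(best):
                best = x
        T *= cooling                                           # cool down
    return best

print(simulated_annealing(lambda x: (x - 1.0) ** 2 + 0.2 * math.sin(10 * x)))
```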
### Phase 2: Genetic Algorithms (GA) – Enter the N+N Era
As memory got cheaper, we could afford to hold populations of solutions. Thus, Genetic Algorithms (GA) became popular. Instead of a single solution, GA maintained a population (N solutions) that evolved through selection, crossover, and mutation—mimicking the process of natural evolution. This allowed GA to explore the solution space more broadly.
    initialize population P of size N
    repeat
        select parents from P
        create new_population by crossover and mutation of parents
        select survivors to form new P
    until stopping_condition
    return best_solution_in(P)
GA’s strength lay in diversity. Multiple solutions (N+N) evolved simultaneously, allowing it to explore many regions of the search space at once. But this required more memory. Ref: John Holland, 1975
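A compact Python reading of that loop, evolving bit strings to maximize a fitness function. The tournament selection and all parameter values are assumptions of this sketch, not Holland's original formulation:

```python
import random

def ga(fitness, n_bits=20, N=50, gens=100, mut=0.02):
    """Toy genetic algorithm: maximize fitness(bits) over fixed-length bit strings."""
    P = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(N)]
    for _ in range(gens):
        def pick():                                  # tournament selection, size 2
            return max(random.sample(P, 2), key=fitness)
        children = []
        for _ in range(N):
            mom, dad = pick(), pick()
            cut = random.randint(1, n_bits - 1)      # one-point crossover
            kid = mom[:cut] + dad[cut:]
            kid = [1 - b if random.random() < mut else b for b in kid]   # mutation
            children.append(kid)
        P = children
    return max(P, key=fitness)

print(ga(sum))   # "one-max": fitness is just the number of 1s
```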
### Phase 3: MaxWalkSAT (MWS) – 1+smallN, Many Times
Then, as CPU time became cheaper, we could afford to run many small explorations. MaxWalkSAT (MWS), a hybrid of greedy local search and random walks, arose to solve maximum-satisfiability (MAX-SAT) problems. In this approach, you explored solutions by greedily walking toward better solutions, but you could also take random steps to avoid getting stuck in local optima.
Instead of using a full population, MaxWalkSAT balances between exploiting local improvement (1+smallN) and allowing some randomness.
    initialize solution S
    repeat
        for small_number_of_steps:
            improve S locally by flipping a variable
            with probability p:
                randomly modify S
    until stopping_condition
    return best_solution_found
This algorithm capitalized on faster CPUs by running these walks many times, combining fast local search with occasional randomness. Ref: MaxWalkSAT, 1996
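A sketch of that idea for MAX-SAT, where a solution is a truth assignment and the score is the number of satisfied clauses. The flip heuristic below is a simplified stand-in for the published MaxWalkSAT heuristics:

```python
import random

def max_walk_sat(clauses, n_vars, tries=10, flips=500, p=0.5):
    """Greedy plus random flips, restarted several times. Clauses are lists of
    signed ints, e.g. [1, -2] means (x1 or not x2)."""
    def score(a):
        return sum(any((a[abs(l)] if l > 0 else not a[abs(l)]) for l in c)
                   for c in clauses)
    best, best_score = None, -1
    for _ in range(tries):                                  # many cheap walks
        a = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(flips):
            if random.random() < p:                         # random flip
                v = random.randint(1, n_vars)
            else:                                           # greedy flip
                v = max(range(1, n_vars + 1), key=lambda w: score({**a, w: not a[w]}))
            a[v] = not a[v]
            if score(a) > best_score:
                best, best_score = dict(a), score(a)
    return best, best_score

print(max_walk_sat([[1, 2], [-1, 3], [-2, -3]], n_vars=3)[1])   # typically 3 (all satisfied)
```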
With the rise of smarter algorithms, Differential Evolution (DE) emerged. DE works by creating new solutions through a process of “differential mutation,” where the difference between two population members is added to a third. This clever mechanism balances exploration and exploitation while maintaining computational efficiency.
    initialize population P of size N
    repeat
        for each individual in P:
            select three distinct random individuals
            generate trial vector by combining them
            if trial vector is better than the current individual:
                replace current individual with trial vector
    until stopping_condition
    return best_solution_in(P)
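The same loop in Python for a numeric minimization problem, using the classic DE/rand/1/bin scheme; F, cr, and the bounds are illustrative defaults:

```python
import random

def differential_evolution(cost, dim=5, N=30, gens=200, F=0.8, cr=0.9, lo=-5.0, hi=5.0):
    """DE/rand/1/bin: each member competes against a trial built from three others."""
    P = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(N)]
    for _ in range(gens):
        for i in range(N):
            a, b, c = random.sample([x for j, x in enumerate(P) if j != i], 3)
            must = random.randrange(dim)                 # force at least one new position
            trial = [a[k] + F * (b[k] - c[k]) if (k == must or random.random() < cr)
                     else P[i][k]
                     for k in range(dim)]
            if cost(trial) < cost(P[i]):                 # keep whichever is better
                P[i] = trial
    return min(P, key=cost)

print(differential_evolution(lambda x: sum(v * v for v in x)))   # near [0, 0, 0, 0, 0]
```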
Meanwhile, in the world of multi-objective optimization, NSGA-II became a gold standard for solving multi-objective problems (now we might call it the old standard). NSGA-II introduced a clever approach that combined non-dominated sorting with crowding distance to preserve solution diversity along the Pareto front.
    initialize population P of size N
    evaluate P
    repeat
        generate offspring population Q using crossover and mutation
        evaluate Q
        combine P and Q into R
        non-dominated_sort(R)
        calculate crowding distance for each individual in R
        select the N best individuals based on rank and crowding distance to form new P
    until stopping_condition
    return Pareto_front(P)
NSGA-II ensures that solutions are both spread out along the Pareto front and converge toward the optimal trade-offs, handling multiple objectives efficiently. Ref: NSGA-II, 2002
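Crowding distance is the diversity half of that selection: for each objective, sort the front and credit every solution with the size of the gap between its neighbors, so solutions in sparse regions win ties. A sketch over a list of objective vectors:

```python
def crowding_distance(front):
    """front: list of objective vectors; returns one crowding distance per solution."""
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for k in range(m):                                    # one pass per objective
        order = sorted(range(n), key=lambda i: front[i][k])
        lo, hi = front[order[0]][k], front[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")   # always keep the extremes
        if hi > lo:
            for r in range(1, n - 1):
                i = order[r]
                dist[i] += (front[order[r + 1]][k] - front[order[r - 1]][k]) / (hi - lo)
    return dist

print(crowding_distance([[1, 5], [2, 3], [3, 2], [5, 1]]))   # [inf, 1.25, 1.25, inf]
```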
MOEA/D (Multi-Objective Evolutionary Algorithm based on Decomposition) took a more structured approach. Instead of solving multi-objective problems as one large problem, MOEA/D decomposed the problem into simpler subproblems that were easier to solve individually.
    initialize population P with one solution per subproblem
    repeat
        for each subproblem:
            generate a new solution using crossover and mutation of neighboring subproblems
            update solutions of neighboring subproblems if the new solution is better
    until stopping_condition
    return Pareto_front(P)
MOEA/D exemplified the era where ambition grew, and the complexity of problems escalated. We had multiple objectives, and we got more creative in solving them. Ref: MOEA/D, 2007
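The “decomposition” usually means turning each weight vector into one scalar subproblem, for example with the Tchebycheff approach: minimize the largest weighted distance from an ideal point. A minimal sketch of that scalarization (the weights and the ideal point here are assumptions of the example):

```python
def tchebycheff(objectives, weights, ideal):
    """Scalarize one multi-objective vector for one subproblem (smaller is better)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))

# Two subproblems judging the same candidate differently:
candidate = [0.4, 0.7]                                        # two objectives, both minimized
print(tchebycheff(candidate, [0.9, 0.1], ideal=[0.0, 0.0]))   # mostly cares about objective 1
print(tchebycheff(candidate, [0.1, 0.9], ideal=[0.0, 0.0]))   # mostly cares about objective 2
```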
With success came ambition. Problems became more complex and expensive to evaluate. Optimizing expensive functions (like simulations or experiments) needed surrogate models—models that approximate the expensive function to save computational resources.
Enter SMO (Sequential Model Optimization) and Bayesian Optimization (BO). These methods build surrogate models (often Gaussian processes or other regression models) to guide optimization, choosing the next point to evaluate by weighing the surrogate’s predictions against its uncertainty.
    initialize surrogate model M
    initialize sample points S
    repeat
        find promising point using M (e.g., based on expected improvement)
        evaluate real function at promising point
        update M with new point
    until stopping_condition
    return best_solution
In this age, we became strategic. Instead of evaluating every possible solution, we used the surrogate model to predict where the best solutions might lie. This gave us the power to tackle much more complex problems without exhausting computational resources. Ref: SMO/BO, 2010s
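One common concrete version of that loop uses a Gaussian process surrogate and the expected-improvement acquisition function. The sketch below leans on scikit-learn and SciPy (an assumption; the text names no particular library) and picks the next point from a random candidate pool rather than running an inner optimizer:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def costly(x):                                # stand-in for the expensive function (minimize)
    return float((x - 0.3) ** 2 + 0.05 * np.sin(20 * x))

X = [float(x) for x in np.random.uniform(0, 1, 5)]     # a few initial samples
y = [costly(x) for x in X]
for _ in range(20):                                    # each pass = one expensive evaluation
    gp = GaussianProcessRegressor().fit(np.array(X).reshape(-1, 1), y)
    cand = np.random.uniform(0, 1, 200).reshape(-1, 1)
    mu, sd = gp.predict(cand, return_std=True)
    imp = min(y) - mu                                  # improvement over the best seen so far
    z = imp / np.maximum(sd, 1e-9)
    ei = imp * norm.cdf(z) + sd * norm.pdf(z)          # expected improvement
    x_next = float(cand[np.argmax(ei), 0])
    X.append(x_next)
    y.append(costly(x_next))
print(f"best found: x = {X[int(np.argmin(y))]:.3f}, y = {min(y):.4f}")
```

The surrogate does the cheap thinking (200 candidate predictions per pass); the expensive function is called only 25 times in total.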
The history of optimization is one of resource adaptation: as memory became cheaper, we expanded to population-based algorithms like GA; as CPU power increased, we could afford to run more sophisticated algorithms like MaxWalkSAT. Finally, as our ambitions grew, we began using smarter algorithms like DE, NSGA-II, and MOEA/D, and ultimately, when faced with even more complex problems, we turned to surrogates like SMO to guide us efficiently through difficult landscapes.
Care to guess what is next?