Simulated annealing (SA) is a generic probabilistic meta-algorithm for the global optimization problem, namely locating a good approximation to the global optimum of a given function in a large search space. It was independently invented by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi in 1983, and by V. Černý in 1985.
The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. The heat causes the atoms to become unstuck from their initial positions (a local minimum of the internal energy) and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one.
By analogy with this physical process, each step of the SA algorithm replaces the current solution by a random "nearby" solution, chosen with a probability that depends on the difference between the corresponding function values and on a global parameter T (called the temperature), that is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when T is large, but increasingly "downhill" as T goes to zero. The allowance for "uphill" moves saves the method from becoming stuck at local minima—which are the bane of greedier methods.
One essential requirement for the transition probability P is that it must be nonzero when e' > e, meaning that the system may move to the new state even when it is worse (has a higher energy) than the current one. It is this feature that prevents the method from becoming stuck in a local minimum — a state whose energy is far from being minimum, but is still less than that of any neighbour.
On the other hand, when T goes to zero, the probability P(e, e' , T) must tend to zero if e' > e, and to a positive value if e' < e. That way, for sufficiently small values of T, the system will increasingly favor moves that go "downhill" (to lower energy values), and avoid those that go "uphill". In particular, when T becomes 0, the procedure will reduce to the greedy algorithm — which makes the move if and only if it goes downhill.
The P function is usually chosen so that the probability of accepting a move decreases when the difference e' e increases — that is, small uphill moves are more likely than large ones. However, this requirement is not strictly necessary, provided that the above requirements are met.
Given these properties, the evolution of the state s depends crucially on the temperature T. Roughly speaking, the evolution of s is sensitive only to coarser energy variations when T is large, and to finer variations when T is small.
Sometimes a problem may arise when one approaches what is known as a local minimum, in this case, a set of numbers that is the "most optimized" for its current "region". In an attempt to find a better optimization, this technique may use various methods to "jump out of" the current "pit", such as searching for better optimizations randomly by a factor of the inverse amount of the previous adjustment. Keep in mind that implementations of this method are strictly problem specific, and so the way in which one finds an optimization will vary from problem to problem.
s := s0; e := E(s) // Initial state, energy.
k := 0 // Energy evaluation count.
while k < kmax and e > emax // While time remains & not good enough:
sn := neighbour(s) // Pick some neighbour.
en := E(sn) // Compute its energy.
if random() < P(e, en, temp(k/kmax)) then // Should we move to it?
s := sn; e := en // Yes, change state.
k := k + 1 // One more evaluation done
return s // Return current solution
s := s0; e := E(s) // Initial state, energy.
sb := s; eb := e // Initial "best" solution
k := 0 // Energy evaluation count.
while k < kmax and e > emax // While time remains & not good enough:
sn := neighbour(s) // Pick some neighbour.
en := E(sn) // Compute its energy.
if en < eb then // Is this a new best?
sb := sn; eb := en // Yes, save it.
if random() < P(e, en, temp(k/kmax)) then // Should we move to it?
s := sn; e := en // Yes, change state.
k := k + 1 // One more evaluation done
return sb // Return the best solution found.
While copying the state may be an expensive operation, this step happens only on a small fraction of the moves, so the change is usually worth the cost.
Sometimes it is better to move back to a solution that was significantly better rather than always moving from the current state. This is called restarting. To do this we set s and e to sb and eb and perhaps restart the annealing schedule. The decision to restart could be based on a fixed number of steps, or based on the current energy being too high from the best energy so far.
In practice, one tries to achieve this criterion by using a search graph where the neighbours of s are expected to have about the same energy as s. Thus, in the traveling salesman problem above, generating the neighbour by swapping two adjacent cities is expected to be more effective than swapping two arbitrary cities. It is true that reaching the goal can always be done with only n-1 general swaps, while it may take as many as n(n-1)/2 adjacent swaps. However, if one were to apply a random general swap to a fairly good solution, one would almost certainly get a large energy increase, and thus an extremely low probability of a swap. The probability of swapping two adjacent cities is more likely to have a smaller energy increase, and thus a much greater probability of a swap. In fact, the swap probability is so much greater in the latter case that the choice of adjacent cities leads to a more efficient algorithm. This is a special case of the general heuristic that "more correlated landscapes are better", as noted above.
In the original formulation of the method by Kirkpatrick et. al, the transition probability P(e, e' , T) was defined as 1 if e' < e (i.e., downhill moves were always performed); otherwise, the probability would be exp((ee' )/T).
This formula was said to come from the Metropolis-Hastings algorithm, used here to generate samples from the Maxwell-Boltzmann distribution governing the distribution of energies of molecules in a gas. However, there is no mathematical justification for using this particular formula in SA. Even the physical analogy is misleading: since the neighbours are tested sequentially in the SA algorithm, the actual probability of the SA algorithm moving from a state s to a state s' definitely is not given by that formula.
The temperature must then decrease so that it is zero, or nearly zero, at the end of the alloted time. A popular choice is the exponential schedule, where the temperature decreases by a fixed factor α < 1 at each step.
Stochastic hill climbing (SH) runs many hill-climbing searches from random initial locations.
Genetic algorithms (GA) maintain a pool of solutions rather than just one. New candidate solutions are generated not only by "mutation" (as in SA), but also by "combination" of two solutions from the pool. Probabilistic criteria, similar to those used in SA, are used to select the candidates for mutation or combination, and for discarding excess solutions from the pool.
Ant colony optimization (ACO) uses many ants (or agents) to traverse the solution space and find locally productive areas.
The cross-entropy method (CE) generates candidates solutions via a parameterized probability distribution. The parameters are updated via cross-entropy minimization, so as to generate better samples in the next iteration.
In performing annealing, one may use quantum fluctuations instead of thermal fluctuations. The resulting method is known as quantum annealing. It has been demonstrated experimentally as well as theoretically, that quantum annealing can indeed outperform thermal annealing in certain cases, specifically, where the potential energy (cost) landscape consists of very high but thin barriers surrounding shallow local minima. Since thermal transition probabilities (~exp( − Δ / kBT); T => Temperature, kB => Boltzmann constant) depend only on the height Δ of the barriers, it is very difficult for thermal fluctuations to get the system out from such local minima. But quantum tunneling probabilities through a barrier depend not only the height Δ of the barrier, but also on its width w; if the barriers are thin enough, quantum fluctuations may bring the system out of the shallow local minima surrounded by them.
Simulierte Abkühlung | Simulated annealing | Recuit_simulé | 擬似焼きなまし法 | Simulated annealing | Symulowane wyżarzanie | Simulated annealing | 模拟退火
This article is licensed under the GNU Free Documentation License.
It uses material from the
"Simulated annealing".
Home Page • arts • business • computers • games • health • hospitals • home • kids & teens • news • physicians • recreation• reference • regional • science • shopping • society • sports • world