==Selecting the parameters==
In order to apply the simulated annealing method to a specific problem, one must specify the following parameters: the state space, the energy (goal) function {{code|E()}}, the candidate generator procedure {{code|neighbour()}}, the acceptance probability function {{code|P()}}, the annealing schedule {{code|temperature()}}, and the initial temperature {{code|init_temp}}. These choices can have a significant impact on the method's effectiveness. Unfortunately, no choice of these parameters is good for all problems, and there is no general way to find the best choices for a given problem. The following sections give some general guidelines.

===Sufficiently near neighbour===
Simulated annealing may be modeled as a random walk on a search graph, whose vertices are all possible states and whose edges are the candidate moves. An essential requirement for the {{code|neighbour()}} function is that it must provide a sufficiently short path on this graph from the initial state to any state which may be the global optimum{{snd}} the diameter of the search graph must be small. In the traveling salesman example above, for instance, the search space for <math>n = 20</math> cities has <math>n!</math> = 2,432,902,008,176,640,000 (2.4 quintillion) states; yet each vertex has only <math>\binom{n}{2} = \frac{n(n-1)}{2} = 190</math> neighbors (one for each pair of cities that may be swapped), and the diameter of the graph is <math>n-1</math>.

===Transition probabilities===
To investigate the behavior of simulated annealing on a particular problem, it can be useful to consider the ''transition probabilities'' that result from the various design choices made in the implementation of the algorithm. For each edge <math>(s,s')</math> of the search graph, the transition probability is defined as the probability that the simulated annealing algorithm will move to state <math>s'</math> when its current state is <math>s</math>. This probability depends on the current temperature as specified by {{code|temperature()}}, on the order in which the candidate moves are generated by the {{code|neighbour()}} function, and on the acceptance probability function {{code|P()}}. (Note that the transition probability is '''not''' simply <math>P(e, e', T)</math>, because the candidates are tested serially.)

===Acceptance probabilities===
The specification of {{code|neighbour()}}, {{code|P()}}, and {{code|temperature()}} is partially redundant. In practice, it is common to use the same acceptance function {{code|P()}} for many problems and to adjust the other two functions according to the specific problem.

In the formulation of the method by Kirkpatrick et al., the acceptance probability function <math>P(e, e', T)</math> was defined as 1 if <math>e' < e</math>, and <math>\exp(-(e'-e)/T)</math> otherwise. This formula was superficially justified by analogy with the transitions of a physical system; it corresponds to the [[Metropolis–Hastings algorithm]] in the case where <math>T=1</math> and the proposal distribution of Metropolis–Hastings is symmetric. However, this acceptance probability is often used for simulated annealing even when the {{code|neighbour()}} function, which is analogous to the proposal distribution in Metropolis–Hastings, is not symmetric, or not probabilistic at all.
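For illustration, this standard acceptance rule takes only a few lines of code. The following Python sketch assumes energies are plain floating-point numbers; it is an illustration, not code from the sources cited here.

<syntaxhighlight lang="python">
import math

def acceptance_probability(e, e_new, T):
    """Standard (Kirkpatrick et al.) acceptance rule: always accept
    downhill moves; accept uphill moves with probability
    exp(-(e_new - e) / T)."""
    if e_new < e:
        return 1.0
    return math.exp(-(e_new - e) / T)
</syntaxhighlight>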
As a result, the transition probabilities of the simulated annealing algorithm do not correspond to the transitions of the analogous physical system, and the long-term distribution of states at a constant temperature <math>T</math> need not bear any resemblance to the thermodynamic equilibrium distribution over states of that physical system, at any temperature. Nevertheless, most descriptions of simulated annealing assume the original acceptance function, which is probably hard-coded in many implementations of SA.

In 1990, Moscato and Fontanari,<ref>{{citation |journal=Physics Letters A |pages=204–208 |last1=Moscato |first1=P. |last2=Fontanari |first2=J.F. |title=Stochastic versus deterministic update in simulated annealing |volume=146 |issue=4 |year=1990 |doi=10.1016/0375-9601(90)90166-L |bibcode=1990PhLA..146..204M }}</ref> and independently Dueck and Scheuer,<ref>{{citation |journal=Journal of Computational Physics |pages=161–175 |last1=Dueck |first1=G. |last2=Scheuer |first2=T. |title=Threshold accepting: A general purpose optimization algorithm appearing superior to simulated annealing |volume=90 |issue=1 |year=1990 |issn=0021-9991 |doi=10.1016/0021-9991(90)90201-B |bibcode=1990JCoPh..90..161D }}</ref> proposed that a deterministic update (i.e., one that is not based on the probabilistic acceptance rule) could speed up the optimization process without affecting the final quality. Moscato and Fontanari conclude, from observing the analogue of the "specific heat" curve of the "threshold updating" annealing in their study, that "the stochasticity of the Metropolis updating in the simulated annealing algorithm does not play a major role in the search of near-optimal minima". Instead, they proposed that "the smoothening of the cost function landscape at high temperature and the gradual definition of the minima during the cooling process are the fundamental ingredients for the success of simulated annealing." The method was subsequently popularized under Dueck and Scheuer's name for it, "threshold accepting". In 2001, Franz, Hoffmann and Salamon showed that the deterministic update strategy is indeed the optimal one within the large class of algorithms that simulate a random walk on the cost/energy landscape.<ref>{{citation |journal=Physical Review Letters |pages=5219–5222 |volume=86 |issue=3 |title=Best optimal strategy for finding ground states |last1=Franz |first1=A. |last2=Hoffmann |first2=K.H. |last3=Salamon |first3=P. |doi=10.1103/PhysRevLett.86.5219 |year=2001 |pmid=11384462 }}</ref>

===Efficient candidate generation===
When choosing the candidate generator {{code|neighbour()}}, one must consider that after a few iterations of the simulated annealing algorithm, the current state is expected to have much lower energy than a random state. Therefore, as a general rule, one should skew the generator towards candidate moves where the energy of the destination state <math>s'</math> is likely to be similar to that of the current state. This [[heuristic]] (which is the main principle of the [[Metropolis–Hastings algorithm]]) tends to exclude ''very good'' candidate moves as well as ''very bad'' ones; however, the former are usually much less common than the latter, so the heuristic is generally quite effective.
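For the traveling salesman example discussed next, such skewing amounts to restricting which pairs of cities a move may swap. The following Python sketch contrasts two illustrative generators; the representation of a tour as a list of city indices is an assumption made here for concreteness.

<syntaxhighlight lang="python">
import random

def arbitrary_swap(tour):
    """Swap two arbitrary cities; in a low-energy tour the resulting
    energy change is usually large and usually harmful."""
    i, j = random.sample(range(len(tour)), 2)
    new_tour = list(tour)
    new_tour[i], new_tour[j] = new_tour[j], new_tour[i]
    return new_tour

def consecutive_swap(tour):
    """Swap two consecutive cities; the resulting energy tends to
    stay close to the current one."""
    i = random.randrange(len(tour) - 1)
    new_tour = list(tour)
    new_tour[i], new_tour[i + 1] = new_tour[i + 1], new_tour[i]
    return new_tour
</syntaxhighlight>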
In the traveling salesman problem above, for example, swapping two ''consecutive'' cities in a low-energy tour is expected to have a modest effect on its energy (length), whereas swapping two ''arbitrary'' cities is far more likely to increase its length than to decrease it. Thus, the consecutive-swap neighbour generator is expected to perform better than the arbitrary-swap one, even though the latter could provide a somewhat shorter path to the optimum (with <math>n-1</math> swaps, instead of <math>n(n-1)/2</math>).

A more precise statement of the heuristic is that one should try first those candidate states <math>s'</math> for which <math>P(E(s), E(s'), T)</math> is large. For the "standard" acceptance function <math>P</math> above, this means that <math>E(s') - E(s)</math> is on the order of <math>T</math> or less. Thus, in the traveling salesman example above, one could use a {{code|neighbour()}} function that swaps two random cities, where the probability of choosing a city pair vanishes as their distance increases beyond <math>T</math>.

===Barrier avoidance===
When choosing the candidate generator {{code|neighbour()}}, one must also try to reduce the number of "deep" local minima{{snd}} states (or sets of connected states) that have much lower energy than all their neighboring states. Such "closed [[catchment]] basins" of the energy function may trap the simulated annealing algorithm with high probability (roughly proportional to the number of states in the basin) and for a very long time (roughly exponential in the energy difference between the surrounding states and the bottom of the basin).

As a rule, it is impossible to design a candidate generator that will satisfy this goal and also prioritize candidates with similar energy. On the other hand, one can often vastly improve the efficiency of simulated annealing by relatively simple changes to the generator. In the traveling salesman problem, for instance, it is not hard to exhibit two tours <math>A</math> and <math>B</math>, with nearly equal lengths, such that (1) <math>A</math> is optimal, (2) every sequence of city-pair swaps that converts <math>A</math> to <math>B</math> goes through tours that are much longer than both, and (3) <math>A</math> can be transformed into <math>B</math> by flipping (reversing the order of) a set of consecutive cities. In this example, <math>A</math> and <math>B</math> lie in different "deep basins" if the generator performs only random pair-swaps, but they will be in the same basin if the generator performs random segment-flips.

===Cooling schedule===
The physical analogy that is used to justify simulated annealing assumes that the cooling rate is low enough for the probability distribution of the current state to be near [[thermodynamic equilibrium]] at all times. Unfortunately, the ''relaxation time''{{snd}} the time one must wait for the equilibrium to be restored after a change in temperature{{snd}} strongly depends on the "topography" of the energy function and on the current temperature. In the simulated annealing algorithm, the relaxation time also depends on the candidate generator, in a very complicated way. Note that all these parameters are usually provided as [[procedural parameter|black box functions]] to the simulated annealing algorithm. Therefore, the ideal cooling rate cannot be determined beforehand and should be empirically adjusted for each problem. [[Adaptive simulated annealing]] algorithms address this problem by connecting the cooling schedule to the search progress.
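In the absence of such adaptive control, a simple geometric schedule is a common baseline in practice. The sketch below is illustrative; the parameter names {{code|alpha}} and {{code|init_temp}} and their default values are assumptions that must be tuned per problem, as noted above.

<syntaxhighlight lang="python">
def temperature(k, init_temp=1.0, alpha=0.99):
    """Geometric (exponential) cooling: T_k = init_temp * alpha**k.
    Values of alpha closer to 1 cool more slowly; no single value
    is good for all problems."""
    return init_temp * alpha ** k
</syntaxhighlight>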
Other adaptive approaches, such as thermodynamic simulated annealing,<ref>{{cite journal |doi=10.1016/j.physleta.2003.08.070 |title=Placement by thermodynamic simulated annealing |year=2003 |last1=De Vicente |first1=Juan |last2=Lanchares |first2=Juan |last3=Hermida |first3=Román |journal=Physics Letters A |volume=317 |issue=5–6 |pages=415–423 |bibcode=2003PhLA..317..415D }}</ref> automatically adjust the temperature at each step based on the energy difference between the two states, according to the laws of thermodynamics.
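Finally, the parameters discussed in this section can be wired together into a generic driver. The sketch below is a minimal illustration, assuming the standard acceptance function above and a per-step temperature schedule; the function and argument names are not from any particular library.

<syntaxhighlight lang="python">
import random

def simulated_annealing(state, E, neighbour, P, temperature, n_steps):
    """Generic driver: E(), neighbour(), P() and temperature() are the
    problem-specific parameters described in this section."""
    e = E(state)
    best_state, best_e = state, e
    for k in range(n_steps):
        T = temperature(k)
        candidate = neighbour(state)
        e_new = E(candidate)
        # Accept the candidate with probability P(e, e_new, T).
        if random.random() < P(e, e_new, T):
            state, e = candidate, e_new
            if e < best_e:
                best_state, best_e = state, e
    return best_state, best_e
</syntaxhighlight>

For example, with the sketches above, {{code|simulated_annealing(initial_tour, tour_length, consecutive_swap, acceptance_probability, temperature, 100000)}} would run the annealer on the traveling salesman example, where {{code|initial_tour}} and {{code|tour_length}} are assumed to be supplied by the user.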