Physica A 349 (2005) 649–666
Weight annealing
Matan Ninioa
, Johannes J. Schneidera,b,Ã
a
School of Engineering and Computer Science, The Hebrew University of Jerusalem, Givat Ram,
Jerusalem 91904, Israel
b
Department of Physics, Johannes Gutenberg University of Mainz, Staudinger Weg 7,
55099 Mainz, Germany
Received 5 August 2004; received in revised form 10 October 2004
Available online 11 November 2004
Abstract
Simulated Annealing has become a standard optimization technique for a wide variety of
problems: starting at a random conﬁguration and performing a sequence of moves, the system
is optimized using a control parameter which partially allows for accepting a deterioration and
therefore for climbing over barriers in the energy landscape. Our approach, Weight Annealing,
changes the energy landscape by assigning variable weights to the single parts of the proposed
problem. We describe the philosophies behind these algorithms and present results for the
Traveling Salesman Problem and the Sherrington–Kirkpatrick-model for spin glasses.
r 2004 Elsevier B.V. All rights reserved.
PACS: 05.10.Ln; 02.70.Uu; 89.75.Àk
Keywords: Optimization; Simulated annealing; Energy landscape; Traveling Salesman Problem; SK-model
1. Introduction
Many optimization heuristics have been developed over the last decades in
order to solve problems approximately for which no analytical solutions exist.
ARTICLE IN PRESS
www.elsevier.com/locate/physa
0378-4371/$ - see front matter r 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.physa.2004.10.022
ÃCorresponding author. Department of Physics, Johannes Gutenberg University of Mainz, Staudinger
Weg 7, 55099 Mainz, Germany. Tel.: +49 6131 3923646; fax: +49 6131 3925441.
E-mail addresses: matan@cs.huji.ac.il (M. Ninio), jsch@cs.huji.ac.il, schneidj@uni-mainz.de
(J.J. Schneider).
Improvement heuristics start with an arbitrary conﬁguration of the given problem
instance and try to improve it gradually by applying a sequence of moves, which
change the conﬁguration slightly. The simplest of these heuristics is the Greedy
algorithm (GRE), in which the moves to be applied are chosen at random and which
only accepts the changes made by the move if the tentative new conﬁguration is
better than or at least equally good as the current one, otherwise it is rejected and
one stays with the current conﬁguration. Thus, the probability p for accepting the
move from a conﬁguration s to a conﬁguration t is given by
pðs ! tÞ ¼
1 if DH ¼ HðtÞ À HðsÞp0 ;
0 otherwise

(1)
with H being the cost or energy function of the proposed problem.
However, using this GRE, one usually gets stuck in high-lying local minima in the
energy landscape from which the system cannot free itself as it is not allowed to
climb up a hill in the energy landscape. Thus, one usually works with more elaborate
acceptance functions which also allow for deteriorations but which reduce the
probability of accepting such a deterioration during the optimization run by means
of a control parameter. A widely used optimization algorithm of this kind is
Simulated Annealing (SA) [1]. SA simulates a cooling process in which a system is
transferred from a high-energetic unordered regime to a low-energetic ordered
solution by decreasing the temperature T gradually. When using SA, the Metropolis
acceptance probability
pðs ! tÞ ¼
expðÀDH=TÞ if DH40 ;
1 otherwise

(2)
is typically used.
SA allows for climbing over barriers in the energy landscape and thus typically
ends up at much better solutions than the GRE. The control parameter T governs
the probability for accepting a deterioration, such that at high temperatures the
system performs a quasi random walk and at low temperatures the system is in the
Greedy mode. By the decrease of the control parameter, the system is transferred
between these regimes.
In contrast to SA, there are also several approaches to change the energy
landscape, such that also the GRE can end up in quite good solutions. One example
for this procedure is Search Space Smoothing [2–4], in which one starts with a
completely ﬂat energy landscape. This energy landscape is then gradually
desmoothed until the original energy landscape is achieved at the end. After each
desmoothing step, a Greedy optimization run is performed, which starts with the
ﬁnal conﬁguration of the previous Greedy optimization run and thus with a locally
minimum conﬁguration in the previous energy landscape. The minima are usually
shifted such that formerly minimum conﬁgurations are no longer locally minimum.
If the desmoothing process is performed carefully enough, one can trust in a
guidance effect: it leads the formerly locally minimum conﬁguration to the new
slightly shifted minimum in the energy landscape. This is also true for the global
optimum of the ﬁrst quasi ﬂat landscape, such that the hope is that when starting
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666650
from this conﬁguration, the optimization run will end up in an at least very good
local minimum, perhaps even in the global optimum of the original landscape.
Another approach in this ﬁeld of working with various energy landscape works as
follows: one starts with the original problem instance, performs a Greedy
optimization run for it, and then switches to a slightly differing instance [5–7].
One uses the ﬁnal conﬁguration of the Greedy run for the original instance as an
input for the starting conﬁguration of the slightly altered instance. Starting from this
conﬁguration, one again performs a Greedy optimization run. Now there are two
possibilities: one either switches back to the original instance, taking the result of the
Greedy run for the changed instance as a starting conﬁguration for a new Greedy
run, and thus switches between Greedy runs for the original instance and slightly
changed instances. But one could also proceed with Greedy runs between a series of
instances which are derived from the original instance by changing it slightly. Of
course, in both cases the series has to end with a Greedy run for the original instance.
This approach is based on the belief that similar problem instances exhibit similar
energy landscapes. The large structures in the energy landscapes should be roughly
the same, especially the broad deep valleys. Thus, one will more likely end up in a
local minimum inside such a deep valley and therefore in a better conﬁguration as if
working with the GRE on the original landscape only.
2. The weight annealing method
The methods sketched above have in common that they want to ﬁnd a way out of
being trapped in a high-lying local optimum. But none of them use any additional
knowledge about where the considered problem instance is easy and where it is hard
to solve. Instead, they consider the energy landscape as a constant property of the
proposed optimization problem even if they change it during the optimization
process in order to overcome barriers in the energy landscape. But basically they
want to solve the original problem with the original energy landscape by applying
some external parameters to it without having a closer look at the speciﬁc local
properties of the problem instance.
When looking at different solutions achieved for the proposed problem instance,
one often ﬁnds that they are rather well solved for some local parts whereas other
local parts are solved or at least seem to be solved rather badly. Like this eyemeasure,
one often neglects or tries to neglect long-range interactions within the
complex system and tries to write down the energy or cost function as a sum of local
Subhamiltonians, each of them measuring how well a local part is solved:
H ¼
XN
i¼1
Hi : (3)
Of course, one wants to improve the parts which are solved worse, and therefore
wants the optimization algorithm to put stronger emphasis on these parts. To
achieve this, one can assign a weight Wi to each part i of the system, depending on
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666 651
how well this part of the system is solved. This set of weights deﬁnes a new weighted
cost function HW over the N parts of the system,
HW ¼
XN
i¼1
Wi Â Hi : (4)
Now the question arises how to choose these weights. Of course, this can be done
by looking at various solutions of the problem instance, but it might be even better if
one could use internal properties of the problem instance. Such a knowledge of the
internal can be achieved by deeper insight in the problem, but also by looking at
various solutions for a problem instance: ﬁrst of all, the nature of the problem has to
be considered: of course, one can often map a problem onto another problem, the
nature of which has already been quite well understood. But such a mapping
procedure often leads to the additional problem that some constraints of the original
problem are considered by using penalty functions which add some virtual costs to
the Hamiltonian of the problem if the corresponding constraints are violated and
which should vanish at the end of the optimization run, as then one has to get a
feasible solution. An example for this is the mapping of the Traveling Salesman
Problem (TSP) onto a problem with binary variables [8]: the traveling salesman has
the task to ﬁnd the shortest closed roundtrip through a given set of nodes. The
constraint that each node is visited exactly once has to be introduced via penalty
functions if working with binary variables Zi;a with Zi;a ¼ 1 if node i is the ath stop in
the tour and 0 otherwise. A better modelling of a conﬁguration of the Traveling
Salesman Problem is a permutation of the numbers 1; . . . ; N: Each node is
represented by a number. The constraint that there is exactly one roundtrip
containing each node once is automatically fulﬁlled in this representation. Generally,
it is preferable to model a problem in a way that as many hard constraints as possible
are automatically fulﬁlled in each conﬁguration. Moves should—wherever possible—lead
only to such feasible conﬁgurations which fulﬁll these constraints and
should ensure that each such conﬁguration can be reached from any other
conﬁguration by applying a ﬁnite sequence of moves in a Random Walk.
When considering this nature of the problem, one can often determine why the
proposed optimization problem is not trivially solvable: for example, the cost
function H of the problem can consist of several functions, H ¼
P
i Hi; which
compete with each other in the sense that a move leading to an improvement
according to some part Hj might lead to an overall deterioration due to the other
parts of the cost function. Another example is that a problem instance might have to
be solved rather badly somewhere locally in order to get an overall good solution.
There are also systems in which competing interactions do not allow solutions in
which all interactions are fulﬁlled. Such competing cost functions, parts, and
interactions lead to so-called frustration effects in the system [9].
This deeper insight in the internal properties of the problem can be used for getting
estimates how well the system can be solved; for example, if the cost function of the
system can be written as a sum of competing addends Hi; then one can, e.g., solve the
system according to each of the Hi separately, which provides a lower bound for the
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666652
optimum solution. Analogously, one can have a look at the pessima according to the
various Hi for getting an upper bound. If one has understood the frustration effects
due to competing parts or interactions in the system, one can have a look at the
corresponding unfrustrated system and often get an exact cost value for its ground
state, which can also serve as a lower bound for the optimum of the original
problem. An example for this is the Held–Karp-bound for the TSP, in which the
problem instance is modelled with binary variables which are then relaxed to real
numbers in the interval ½0; 1 in order to get a problem which can be solved exactly
with the Simplex algorithm [10,11].
If such analytical considerations do not help any further, one has to perform
several optimization runs and have a close look at various solutions: when looking at
the various parts with the eye, one often sees what parts are well and what are rather
badly solved. The cost value can be split according to the single parts of the problem
or according to the addends in the cost function, providing best-so-far energy values
for these parts. This information can also be used for choosing the weights.
Regardless of the speciﬁc choice of the weights, the general outline of the Weight
Annealing (WA) algorithm is as follows:
(1) One starts off with an initial conﬁguration, which can, e.g., be achieved by using
the GRE on the original landscape.
(2) Then one determines a new set ðWiÞ of weights, based on the result of the
previous optimization run and possibly also on further insight into the problem.
(3) Then one performs a new Greedy optimization run starting with the resulting
conﬁguration of the previous optimization run and using the new weight values.
(4) Finally, one returns to Step 2 until some stopping criterion is met.
Step 1 of the algorithm can be considered to be a special case of an iteration (as
described in Step 3) of this WA algorithm with the initial values Wi  1 for all
weights.
This outline leaves a wide range of different possible ways to choose the weights.
There might also be some meta-parameters, like some control parameter T, which
govern the amount by which each single weight can be changed. For example, one
wants to start off allowing signiﬁcant changes to the weights and wants to ﬁnish the
run with the original weights Wi ¼ 1; so the output solution would be optimized
within the energy landscape of the original problem instance. This imposes—in a
physicists’ language—a ‘‘cooling schedule’’ on the weights Wi which has to end with
all weights being equal to 1.
Any weighting scheme must comply to some requirements in order to be used in
WA. The ﬁrst requirement is that all the weights must remain non-negative, as a
negative weight will cause the system to look for the worst possible local solutions
for the matching part. The second requirement is that the weighting scheme depends
on the control parameter T in such a way that when T is large, weights can be chosen
freely, while as T approaches zero all the weights approach one. For normalization,
one can generally choose the weights in such a way that the sum of the weights
remains constant, e.g., that the sum of the weights divided through the number of the
weights is one.
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666 653
There are several ways to choose the weights. As a ﬁrst step, we want to consider
the case in which the single weights Wi are chosen randomly and thus independently
of how well the system is solved locally. Next one may want the weights to consider
how difﬁcult it is to solve a local part of the system optimally.
2.1. Random reweighting
The natural distribution to use for T-dependent weights is the Gaussian
distribution with mean 1 (around the original weights) and variance monotonically
dependent on the temperature. In our case, this is an inadequate solution, as some of
the weights may end up negative. Instead, we follow the approach in Ref. [12] and
use the G-distribution:
GK;bðxÞ ¼
xKÀ1
expðÀx=bÞ
bK
GðKÞ
: (5)
This distribution is particularly appropriate:
 The Gamma distribution is continuous.
 The Gamma distribution is non-negative.
 The sum of l independent, GK;b-distributed variables is GlÂK;b-distributed. In order
to normalize the expected weight to 1, we set the scale parameter b to 1=K: The
variance therefore equals the inverse of the shape parameter: s2
¼ 1=K:
 For small variances, this form of the Gamma distribution approaches the
Gaussian distribution around the mean.
To allow for a ‘‘cooling schedule’’ as mentioned above, we used the temperature as
the variance of the Gamma distribution.
2.2. Adversarial reweighting
As an alternative approach, one might also want to consider how well the system
is solved locally, and place more emphasis on the parts that are farther from optimal.
Again following the approach in Ref. [12], we suggest the following Min–Max
approach to choose the weights:
 For the current conﬁguration, estimate how well each part is solved.
 Write HW as a function of these local estimates.
 Use a distance dðW; W0
Þ over the weight vectors and deﬁne the T-neighborhood
of the original weights as the set of all weight vectors within T-distance.
 Search this T-neighborhood with some local search algorithm for a set of weights
ðWÃ
i Þ such that HWÃ is as large as possible.
Instead of searching the T-neighborhood for a maximum of HWÃ ; we can perform
a search—without a neighborhood restriction—for a maximum of HWÃ À
bdðWÃ
; W0
Þ; where b controls as a penalty factor the relative signiﬁcance of d:
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666654
Following the work in Ref. [12], we chose b ¼ 1=T and the distance d to be the
Kullback–Leibler measure [13] for the divergence between two distributions
dðW; W0
Þ ¼ KLðWjjW0
Þ 
X
i
Wi logðWi=W0
i Þ ; (6)
where W0
i are the original weights (usually set to 1).
The reweighting step attempts to ‘‘worsen’’ the formerly minimum conﬁguration,
in which the previous search process ended up, as much as possible. This step
produces a new set of weights. They deﬁne a new weighted Hamiltonian, which is a
new challenge for the search algorithm, which restarts at the previous stopping point.
If the temperature is high, the reweighting step has enough freedom to change the
energy landscape in a way that the previous local minimum is typically no longer
locally minimal. As the temperature approaches zero, the reweighting step has less
possibilities to change the weights. As the changes of the weights must be smaller for
smaller temperatures, we can assume that when the temperature approaches zero, a
good local minimum will remain with high probability locally minimal, allowing the
algorithm to converge to a good solution.
2.3. Variations of the reweighted Hamiltonian
Instead of using the weighted Hamiltonian HW ; we can use some measure related
to it as long as the two measures reach the same minima as the weights Wi approach
the value 1.
For example, one might have some deeper insight in the problem and know that
there are lower bounds Hlower bound
i for the values of the local parts Hi of the
Hamiltonian. These lower bounds can be calculated in various ways, e.g., by relaxing
the proposed integer problem into a non-integer problem or by looking at the
corresponding problem without frustration. The original Hamiltonian does not
change if we subtract these lower bounds from the corresponding addends,
~H ¼
X
i
ðHi À Hlower bound
i Þ ¼ H À
X
i
Hlower bound
i ¼ H À const: (7)
as then only a constant is subtracted from the Hamiltonian. However, if reweighting
this Hamiltonian,
~HW ¼
X
i
Wi Â ðHi À Hlower bound
i Þ (8)
and giving this reweighted Hamiltonian to the search process for the reweighting
technique, the outcome of this technique has to be quite different: Here now more
emphasis is not given to those addends which are absolutely large but to those which
could—from the point of view of the bounding function—be solved much better. If
the bounding functions provide good estimates, then this approach is surely superior
to the original approach.
However, it is not always possible to easily ﬁnd some good bounding functions.
Furthermore, many complex problems have to be solved at some local part in a bad
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666 655
way in order to get to an overall excellent solution, whereas their unfrustrated
counterparts do not exhibit this property. For at least these problems, another
approach is even better: again one needs some measure for how well the system is
solved locally. One can store the values Hbsf
i which represent the best cost function
values for the single parts found so far in the previous iterations of the WA
approach. Given these values instead of the Hlower bound
i ; one does not necessarily
know how well the parts are solved, as one could have solved one part more or less
badly in all iterations. If one has to solve it in a bad way in order to get to an overall
good solution, then there is nothing to say against that. However, it might be that
this approach is not able to introduce a large enough pressure on the system to solve
this part in a much better way. A second problem which occurs within this variation
is that it might be rather unstable: imagine that some iteration of our algorithm leads
to a new record for some local part i. Then the current value for Hi and Hbsf
i
coincide. The reweighting process then thinks that this part is solved in an optimum
way, that there is nothing which can be improved there, such that it puts much less
emphasis on this part. Therefore, the next iteration of the algorithm will probably
lead to a solution which is solved much worse in this local part. The algorithm will
then put more emphasis again on this part, such that in the following iteration, the
algorithm may return to a solution with Hi ¼ Hbsf
i : This cycle may even be iterated
again and again.
In order to overcome this instability problem, one can e.g., subtract a small
amount from all Hbsf
i values. Of course, this is ‘‘lying to the algorithm’’: the
algorithm might be in a solution with all Hi ¼ Hbsf
i which happens also to be the
globally optimum of the problem. Then we tell the algorithm that there is still
something to do.
However, we want to stay ﬁrst with the ‘‘ﬁrst step’’ approach to work with
randomly chosen weights.
3. Application to the TSP and the SK-model for spin glasses
The question is now how to apply this algorithm to practical problems.
We want to sketch the possible applications at two prominent examples,
namely the TSP and the Sherrington–Kirkpatrick-model (SK-model) for spin
glasses.
The TSP is given by a N Â N distance matrix D between N nodes [14,15]. It is the
task of the traveling salesman to ﬁnd the shortest closed tour through these N nodes,
touching each node exactly once and returning to the initial node at the end. Each
conﬁguration can be coded as a permutation s of the numbers 1; . . . ; N: The
Hamiltonian of this problem is thus given by
HðsÞ ¼ DðsðNÞ; sð1ÞÞ þ
XNÀ1
i¼1
DðsðiÞ; sði þ 1ÞÞ : (9)
Usually, the symmetric TSP is considered in which Dði; jÞ ¼ Dðj; iÞ:
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666656
We implemented two possible moves for changing a conﬁguration, namely the
Lin-2-Opt, which turns around a part of the tour, and that possibility of the Lin-
3-Opt that exchanges two successive parts of the tour without changing their
direction [16,17]. Both moves are shown in Fig. 1. The Lin-2-Opt is called with a
probability of 1
6 : Of course, for both moves the energy difference DH can be easily
calculated by adding up the lengths of the new edges and subtracting the lengths of
the cut old edges.
The application of SA to the TSP is straightforward: one starts with a rather large
temperature T0; decreases T logarithmically by a factor of, e.g., 0.95, and ends at a
ﬁnal temperature at which the system is frozen in a local minimum or hopefully the
global optimum. Each move is accepted or rejected with the Metropolis acceptance
criterion.
In the case of WA, we have ﬁrst to state that the local parts are the nodes of the
TSP. We have to rewrite the Hamiltonian as
HðsÞ ¼
XN
i¼1
1
2
ðDðsðiÞ; sði þ 1ÞÞ þ DðsðiÞ; sði À 1ÞÞÞ (10)
(with sð0Þ  sðNÞ and sðN þ 1Þ  sð1Þ) in order to get the local cost functions
HiðsÞ ¼ 1
2ðDðsðiÞ; sði þ 1ÞÞ þ DðsðiÞ; sði À 1ÞÞÞ (11)
for the local parts. A weight Wi is then assigned to each node. Rewriting the
reweighted Hamiltonian in just the contrary way, one achieves the weighted distance
values
DW ði; jÞ ¼
Wi þ Wj
2
Dði; jÞ : (12)
Each move is accepted or rejected with the Greedy acceptance criterion, however,
based on the weighted cost function HW ; which works with the weighted distance
matrix DW instead of the original distance matrix D. Thus, the application of WA is
also very straightforward if a program for the Greedy algorithm already exists. After
ARTICLE IN PRESS
Fig. 1. Two possible moves for the TSP: the Lin-2-Opt (left) removes two edges from the tour, turns
around one of the two tour parts and connects them in a new way. There are four possibilities to perform a
Lin-3-Opt, i.e., a move which removes three edges and connects the three tour parts in a new way. We
implemented that type which exchanges two parts of the tour without changing their directions (right).
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666 657
the end of a Greedy optimization run, the weight values Wi are changed, the distance
matrix DW is recalculated, and a new Greedy run is performed, starting with the
resulting conﬁguration of the previous run and based on the new weight values.
The SK-model is one of the standard models for spin glasses with long range
interactions between the single spins. Each of the N Ising-spins si; which can only
take the values þ1 and À1; interacts with all other spins via an interaction matrix J,
such that the Hamiltonian of this SK-model [18] is given by
HðsÞ ¼ À
1
2N
XN
i; j¼1
Ji; jsisj : (13)
The interactions Ji; j are Gaussian distributed. We implemented the single spin ﬂip,
i.e., the move si ! Àsi; as move for jumping from one conﬁguration to another one.
The application of SA to this problem is as straightforward as for the TSP. Again
one accepts or rejects the move with the Metropolis acceptance criterion and reduces
the temperature logarithmically from a high value in which the system nearly moves
at random to a low value at which it is frozen.
The WA method assigns weights to the single spins, such that a weighted
interaction matrix JW can be derived:
JW ði; jÞ ¼
Wi þ Wj
2
Jði; jÞ : (14)
In each step of the algorithm, a Greedy optimization run based on a newly weighted
interaction matrix is performed.
4. Computational results
In order to demonstrate the quality of our approach, we will apply it to ﬁve
benchmark instances of the TSP, namely the BEER127, LIN318, PCB442, ATT532,
and NRW1379 instances, which can be downloaded from Reinelt’s TSPLIB95 [19].
Thus, we cover the range between 100 and more than 1000 nodes which consists of
non-trivial and still real-life TSP instances. Furthermore, we generated three
instances of the SK-model with 100, 200, and 400 spins, respectively, which we call
here SK100, SK200, and SK400. For each instance, 1000 optimization runs were
performed, each for the Greedy algorithm, SA and WA. The cooling schedules for
SA and WA were chosen in such a way that the number of temperature steps was the
same, that the system was frozen at the ﬁnal temperature, and that the initial
temperature was large enough to be far beyond the transition temperature. The
overall number of sweeps performed in these simulations was thus generally the
same, in order to make the results comparable.
Fig. 2 shows the decrease of the energy with decreasing temperature both for a
TSP and a SK instance if applying SA. We ﬁnd in both cases a nice sigmoidal
decrease from the unordered high-energy region to an ordered low-energy solution,
in which the system ﬁnally freezes.
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666658
The curves for WA, which are shown in Figs. 3 and 4, look completely different
from those for SA. First of all, there are two curves, one for the original Hamiltonian
H ¼ H1; for which all weights are set to 1, and secondly for the weighted
Hamiltonian HW : We ﬁnd for the NRW1379 and the SK400 instance roughly the
ARTICLE IN PRESS
0
2E5
4E5
6E5
8E5
1E6
1.2E6
1.4E6
1E-4 1E-2 1 1E2 1E4 1E6
<H>
T
-0.8
-0.7
-0.6
-0.5
-0.4
-0.3
-0.2
-0.1
0
0.1
1E-6 1E-4 1E-2 1 1E2 1E4 1E6
<H>
T
Fig. 2. Decrease of the energy hHi with decreasing temperature T if using SA: curves for the NRW1379
TSP instance (left) and for an instance of the SK-model with 400 spins (SK400, right).
50000
60000
70000
80000
90000
100000
110000
120000
1E-4 1E-2 1 1E2 1E4 1E6 1E8
<H1
>
T
20000
30000
40000
50000
60000
70000
80000
1E-4 1E-2 1 1E2 1E4 1E6 1E8
<HW
>
T
Fig. 3. Change of the energy hHi with decreasing temperature T if using WA: curves of the NRW1379
TSP instance for the original Hamiltonian (left) and for the reweighted Hamiltonian (right).
-0.8
-0.7
-0.6
-0.5
-0.4
-0.3
-0.2
-0.1
0
0.1
1E-61E-41E-2 1 1E2 1E4 1E6 1E8
<H
1
>
T
-9
-8
-7
-6
-5
-4
-3
-2
-1
0
1E-61E-41E-2 1 1E2 1E4 1E6 1E8
<H
W
>
T
Fig. 4. Change of the energy hHi with decreasing temperature T if using WA: curves of the SK400 spin
glass instance for the original Hamiltonian (left) and for the reweighted Hamiltonian (right).
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666 659
same behavior and can divide the cooling process in four steps: (a) for very large
temperatures, our WA implementation lets the system drop into some local
minimum. Here the weights stay all the same. Therefore, in this range, the WA
approach works in rather the same way as the GRE. The system is frozen in some
local minimum. (b) With decreasing temperature, an increase of hH1i can be
observed, which is rather smooth for the TSP instance, whereas it takes place in an
abrupt way after many ﬂuctuations for the SK instance. Here the WA algorithm
really starts to work, as can be seen at the curves for the weighted Hamiltonian,
where one can see that WA is again and again forced out of the local minimum
(upper peaks in the curve for the TSP instance) and tries to minimize the energy
(lower peaks, which occur in both curves more often). But due to the large
temperature values, basically some randomness is added to the system, as the weights
are allowed to develop rather freely. Therefore, the energy measure according to the
original Hamiltonian increases. (c) Decreasing the temperature even further, we ﬁnd
in both cases a more or less wide sigmoidal decrease of H1 while HW increases at the
same time. Obviously, here the optimization of the system takes place. The altered
energy landscape gradually approaches the original landscape, the single weight
values the value 1. Still, the GRE ﬁnds improvements at (nearly) all temperature
steps here, if looking at the original Hamiltonian. However, the part of the altered
energy landscape in which the system is trapped in is shifted energetically upwards,
as the reweighted Hamiltonian increases. (d) At small temperatures, the system is
frozen in some state which is energetically lower than the state at the beginning. The
changes in the energy landscape are so small, that the local or global optimum
remains to be locally minimal.
After this discussion of the behavior of the algorithm, we have now a look at the
quality of the results achieved with the WA technique in comparison to the GRE and
SA for various computing times (12–12,000 sweeps per temperature step). Tables 1–5
show results for the ﬁve TSP instances for various computing times. The results for
our SK instances are shown in Tables 6–8. For each instance, the minimum and
maximum value found for some calculation time is given. Furthermore, the mean
value with the error bar and the median is shown. If the optimum of an instance is
reached, then the fraction of runs which reached the optimum is shown in an
additional column at the right side of the table.
We ﬁnd for long computing times, that SA leads to better results for the SK
instances, whereas WA is superior for the TSP instances, while the GRE cannot
compete with the two more elaborate techniques SA and WA at all. For very short
computing times, SA remains the best algorithm for the SK instances, while the best
results for the TSP instances can be either achieved with the GRE or with WA. In
this regime, SA cannot compete as it allows for large deteriorations at high
temperatures, such that it loses time if compared to the Greedy which only accepts
improvements and trivial moves, which do not change the energy. WA, which is
based on the GRE, does not lose so much time if compared to the Greedy as it is
already quenched down at the beginning and then performs a transition from only
slightly randomized conﬁgurations, whereas SA has to go through the complete
transition.
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666660
5. Relations between weight annealing and other algorithms
Of course, we are fully aware of the fact that there are also other optimization
algorithms which are at least related or even rather similar to our algorithm. As
ARTICLE IN PRESS
Table 1
Results for the BEER127 instance achieved with the GRE, SA, and WA, respectively, for various
computing times (sweeps): the minimum and the maximum result achieved are shown. Furthermore, the
median lies close to the mean value. The D-value is the error bar. The right column of the table shows the
fractions of runs which reached the known global optimum
Method Sweeps Minimum Maximum Median Mean Æ D Opt (%)
GRE 12,000 118,986 130,706 123,123 123; 416 Æ 70 0
1200 118,986 130,706 123,123 123; 416 Æ 70 0
120 118,986 130,706 123,126 123; 430 Æ 70 0
12 119,367 134,971 125,272 125; 501 Æ 81 0
SA 12,000 118,294 120,642 118,572 118; 812 Æ 19 26.1
1200 118,294 121,946 119,665 119; 527 Æ 23 2.7
120 118,294 126,702 120,514 120; 560 Æ 32 0.1
12 119,698 134,566 125,436 125; 737 Æ 76 0
WA 12,000 118,294 120,883 118,740 118; 846 Æ 12 0.4
1200 118,294 121,758 118,818 118; 966 Æ 16 1.8
120 118,294 124,632 120,084 120; 155 Æ 33 0.2
12 119,023 132,330 124,511 124; 763 Æ 71 0
Table 2
Results for the LIN318 instance (presented as in Table 1)
Method Sweeps Minimum Maximum Median Mean Æ D Opt (%)
GRE 12,000 42,787.0 45,979.7 44,127.8 44; 151:8 Æ 16 0
1200 42,787.0 45,979.7 44,127.8 44; 151:8 Æ 16 0
120 43,300.4 46,428.6 44,640.6 44; 658:4 Æ 18 0
12 45,623.0 52,347.2 49,302.0 49; 268:7 Æ 30 0
SA 12,000 42,042.5 43,188.9 42,518.3 42; 518:3 Æ 5:6 0.2
1200 42,275.9 43,683.4 42,824.5 42; 849:6 Æ 7:5 0
120 42,867.3 45,810.9 44,084.7 44; 106:8 Æ 15 0
12 50,048.2 57,154.1 53,129.7 53; 162:0 Æ 37 0
WA 12,000 42,062.5 43,282.5 42,527.5 42; 516:4 Æ 5:7 0
1200 42,102.4 43,433.1 42,731.2 42; 747:7 Æ 7:3 0
120 42,863.9 45,231.5 43,805.2 43; 843:7 Æ 14 0
12 47,328.6 54,034.7 50,498.0 50; 549:3 Æ 34 0
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666 661
already mentioned, there is a technique called Search Space Smoothing which tries to
get to the global optimum by removing barriers in the energy landscape [2–4]. There
a smoothness control parameter a is introduced. With decreasing a; the energy
landscape which is ﬂat at the very beginning is gradually desmoothed until the
landscape reaches its original shape. For each value of a; a short Greedy run is
performed starting at the ﬁnal conﬁguration of the previous Greedy run. Thus, the
algorithm shall ‘‘guide’’ the Monte Carlo process from the global optimum in the
ARTICLE IN PRESS
Table 4
Results for the ATT532 instance (presented as in Table 1)
Method Sweeps Minimum Maximum Median Mean Æ D
GRE 12,000 28,345 29,947 29,120 29; 129:6 Æ 8:0
1200 28,363 29,973 29,160 29; 169:2 Æ 8:1
120 29,122 31,051 29,997 30; 003:7 Æ 11
12 33,744 37,425 35,457 35; 451:3 Æ 20
SA 12,000 27,737 28,440 28,033 28; 026:3 Æ 4:1
1200 27,967 29,070 28,398 28; 401:5 Æ 5:5
120 28,721 30,560 29,578 29; 578:6 Æ 9:2
12 32,909 36,927 34,905 34; 911:8 Æ 19
WA 12,000 27,743 28,413 28,005 28; 011:8 Æ 3:7
1200 27,896 28,771 28,315 28; 315:4 Æ 4:9
120 28,656 30,528 29,429 29; 438:3 Æ 8:5
12 32,110 35,517 33,812 33; 795:0 Æ 17
Table 3
Results for the PCB442 instance (presented as in Table 1)
Method Sweeps Minimum Maximum Median Mean Æ D
GRE 12,000 51,607.2 55,456.3 53,422.4 53; 436:5 Æ 19
1200 51,607.2 55,589.5 53,483.5 53; 501:5 Æ 19
120 52,827.2 57,310.7 54,765.7 54; 806:9 Æ 23
12 60,230.1 67,613.9 63,725.4 63; 722:4 Æ 34
SA 120,00 50,801.8 52,233.6 51,358.5 51; 361:7 Æ 6:5
1200 50,949.8 53,368.2 51,914.5 51; 934:1 Æ 12
120 52,364.0 56,510.9 54,115.8 54; 113:3 Æ 20
12 60,803.3 67,211.3 64,071.1 64; 050:0 Æ 35
WA 12,000 50,827.8 52,102.1 51,232.5 51; 248:8 Æ 5:9
1200 50,969.6 52,875.1 51,679.6 51; 706:4 Æ 9:4
120 52,446.0 56,173.3 54,185.6 54; 177:3 Æ 21
12 64,201.9 72,019.6 67,983.8 68; 032:9 Æ 38
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666662
smooth surface to an at least very good solution of the original problem. If applying
this algorithm to the TSP, the values of the distances are changed, thus changing
indirectly the shape of the energy landscape.
The algorithm which is probably most related to our approach is the Noising or
Permutation approach [5–7]: here one starts out with the thought that similar
instances should exhibit similar ground states. In order to overcome barriers in the
energy landscape, a series of surrogate instances is solved with the GRE, which starts
ARTICLE IN PRESS
Table 6
Results for the SK100 instance (presented as in Table 1)
Method Sweeps Minimum Maximum Median Mean Æ D Opt (%)
GRE 12,000 À0.720852 À0.528869 À0.665552 À0:660706 Æ 1:1E À 3 0.3
1200 À0.720852 À0.528869 À0.665552 À0:660706 Æ 1:1E À 3 0.3
120 À0.720852 À0.528869 À0.665552 À0:660706 Æ 1:1E À 3 0.3
12 À0.720852 À0.528869 À0.665552 À0:660706 Æ 1:1E À 3 0.3
SA 12,000 À0.720852 À0.711584 À0.720852 À0:720055 Æ 4:1E À 5 63.3
1200 À0.720852 À0.710520 À0.719135 À0:719550 Æ 5:5E À 5 48.9
120 À0.720852 À0.704728 À0.719135 À0:718358 Æ 9:1E À 5 33.4
12 À0.720852 À0.690925 À0.715525 À0:715333 Æ 1:5E À 4 17.3
WA 12,000 À0.720852 À0.687413 À0.714539 À0:714618 Æ 1:7E À 4 18.5
1200 À0.720852 À0.686119 À0.714691 À0:714448 Æ 1:8E À 4 17.1
120 À0.720852 À0.686119 À0.715525 À0:714798 Æ 1:7E À 4 19.6
12 À0.720852 À0.686119 À0.714259 À0:714372 Æ 1:7E À 4 16.6
Table 5
Results for the NRW1379 instance (presented as in Table 1)
Method Sweeps Minimum Maximum Median Mean Æ D
GRE 12,000 59,314.2 61,191.4 60,234.0 60; 238:2 Æ 9:4
1200 60,737.2 62,904.9 61,688.6 61; 686:3 Æ 12
120 62,674.1 65,565.1 64,029.8 64; 045:3 Æ 14
12 93,059.3 100,097 96,313.4 96; 327:9 Æ 36
SA 12,000 57,519.0 58,422.1 57,991.6 57; 992:8 Æ 4:6
1200 59,069.2 60,473.2 59,727.5 59; 739:3 Æ 7:7
120 62,058.8 65,218.4 63,321.6 63; 342:8 Æ 13
12 88,605.8 95,196.7 92,026.3 92; 001:2 Æ 31
WA 12,000 57,348.1 58,128.0 57,732.8 57; 739:9 Æ 4:4
1200 58,461.5 60,057.0 59,354.1 59; 349:2 Æ 6:8
120 61,759.0 64,240.2 62,868.1 62; 881:4 Æ 12
12 80,468.3 85,636.0 82,993.3 82; 987:2 Æ 28
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666 663
with the ﬁnal conﬁguration of the previous run, which is mapped on the
corresponding conﬁguration of the new instance.
If using a more elaborate version of WA which really considers how well the
system is solved locally, then one could even go so far to say that the algorithm uses
some kind of adaptive memory and therefore belongs to the class of Tabu Search
algorithms if deﬁning this class rather broadly [20].
ARTICLE IN PRESS
Table 8
Results for the SK400 instance (presented as in Table 1)
Method Sweeps Minimum Maximum Median Mean Æ D Opt (%)
GRE 12,000 À0.725058 À0.603360 À0.671471 À0:670502 Æ 5:8E À 4 0
1200 À0.725058 À0.603360 À0.671471 À0:670502 Æ 5:8E À 4 0
120 À0.725058 À0.603360 À0.671471 À0:670502 Æ 5:8E À 4 0
12 À0.725058 À0.603360 À0.671471 À0:670502 Æ 5:8E À 4 0
SA 12,000 À0.740229 À0.731108 À0.740229 À0:739917 Æ 3:8E À 5 78.0
1200 À0.740229 À0.727740 À0.740229 À0:738873 Æ 8:2E À 5 60.9
120 À0.740229 À0.722357 À0.737274 À0:736259 Æ 1:2E À 4 25.8
12 À0.740229 À0.711722 À0.730519 À0:730834 Æ 1:7E À 4 3.4
WA 12,000 À0.740229 À0.693402 À0.727807 À0:727116 Æ 2:5E À 4 3.8
1200 À0.740229 À0.696959 À0.727880 À0:726952 Æ 2:5E À 4 2.8
120 À0.740229 À0.698436 À0.728222 À0:727369 Æ 2:5E À 4 3.3
12 À0.740229 À0.691855 À0.727534 À0:726588 Æ 2:5E À 4 3.3
Table 7
Results for the SK200 instance (presented as in Table 1)
Method Sweeps Minimum Maximum Median Mean Æ D Opt (%)
GRE 12,000 À0.728967 À0.590085 À0.666175 À0:666622 Æ 7:8E À 4 0
1200 À0.728967 À0.590085 À0.666175 À0:666622 Æ 7:8E À 4 0
120 À0.728967 À0.590085 À0.666175 À0:666622 Æ 7:8E À 4 0
12 À0.728967 À0.590085 À0.666175 À0:666622 Æ 7:8E À 4 0
SA 12,000 À0.733633 À0.725348 À0.733633 À0:733110 Æ 5:8E À 5 91.5
1200 À0.733633 À0.715716 À0.733633 À0:732171 Æ 9:5E À 5 78.7
120 À0.733633 À0.715261 À0.733633 À0:729931 Æ 1:4E À 4 54.2
12 À0.733633 À0.699896 À0.725348 À0:725026 Æ 2:0E À 4 21.1
WA 12,000 À0.733633 À0.687242 À0.722297 À0:720926 Æ 2:4E À 4 6.9
1200 À0.733633 À0.686185 À0.721993 À0:720892 Æ 2:4E À 4 6.2
120 À0.733633 À0.678278 À0.723195 À0:721609 Æ 2:4E À 4 7.3
12 À0.733633 À0.690352 À0.722773 À0:721200 Æ 2:4E À 4 7.0
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666664
Another algorithm which uses a related type of an adaptive memory is Guided
Local Search [21,22]. This algorithm which can also be counted to the class of Tabu
Search algorithms assigns in successive iterations penalties to the worst solved parts
of the system, such that they shall be better solved in future iterations. This approach
is also rather related to ours.
There are of course many more algorithms changing the energy landscape and
making use of the history of the optimization run. For example, there are ant colony
optimization algorithms which try to ﬁnd the best roundtrip by making use of
pheromones [23]. The Ant Lion Heuristics also changes the energy landscape in
order to encourage the search process to visit good conﬁgurations and avoid bad
ones [24].
Summarizing, our algorithm is related both to the class of Tabu Search algorithms
and to the class of algorithms changing the energy landscape. Additionally, one has
to state that these two classes are again strongly related to each other. But generally,
any algorithm is related to either the Simulated Annealing algorithm or is a Genetic
Algorithm or a Tabu Search algorithm or a hybrid algorithm combining these
algorithms.
6. Summary and outlook
In this paper, we have applied a new approach called WA, which can be easily
applied to any problem wherever a Greedy search is possible, to the TSP and to the
SK-model: for problems, whose Hamiltonian can be written as a sum of
Subhamiltonians of the local parts, we assign weights to the local parts, with which
the Hamiltonian is reweighted. These weights can be, e.g., random variables,
such that the importance of the single parts of the system for the optimization
algorithm is altered in a random way. For this approach, we demonstrated that
the WA technique mostly leads to better results for the TSP instances than the
classic Simulated Annealing algorithm, for the short computing times we used.
On the other hand, in this implementation of the WA algorithm, we have been
unable to outperform Simulated Annealing for the SK instances. Of course, the
results for both SA and WA can be further improved if spending even more
calculation time.
More elaborate implementations of the WA algorithm, which actually take into
account how well the system is solved locally, might lead to better results. We have
already implemented an approach in which the local qualities of the solution are
related to lower bounds achieved from the corresponding unfrustrated system, as
deﬁned in Ref. [9]. In the case of the TSP, this is the sum of the distances to the two
nearest neighbors of a node. First runs showed that this technique leads to worse
results than the random approach. We guess that this is due to the fact that a
complex system like the TSP has sometimes to be solved in a locally non-optimum
way in order to get to an overall very good or even the globally optimum solution,
while the adversary reweighting approach puts more emphasis on parts which are
non-optimally solved, thus ending up at an overall worse conﬁguration. However,
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666 665
we believe that this approach will work signiﬁcantly better in other domains. We will
continue our research on this topic.
References
[1] S. Kirkpatrick, C.D. Gelatt Jr., M.P. Vecchi, Optimization by simulated annealing, Science 220
(1983) 671–680.
[2] J. Gu, X. Huang, Efﬁcient local search space smoothing: a case study of the traveling salesman
problem (TSP), IEEE Trans. Syst. Man Cybernet. 24 (1994) 728–735.
[3] J. Schneider, M. Dankesreiter, W. Fettes, I. Morgenstern, M. Schmid, J.M. Singer, Search space
smoothing for combinatorial optimization problems, Physica A 243 (1997) 77–112.
[4] S.P. Coy, B.L. Golden, E.A. Wasil, A computational study of smoothing heuristics for the traveling
salesman problem, Eur. J. Oper. Res. 124 (2000) 15–27.
[5] R.H. Storer, S.D. Wu, R. Vaccari, New search spaces for sequencing problems with application to job
shop scheduling, Manage. Sci. 38 (1992) 1495–1509.
[6] I. Charon, O. Hudry, The noising method: a new method for combinatorial optimization, Oper. Res.
Lett. 14 (1993) 133–137.
[7] B. Codenotti, G. Manzini, L. Margara, G. Resta, Perturbation: an efﬁcient technique for the solution
of very large instances of the Euclidean TSP, INFORMS J. Comput. 8 (1996) 125–133.
[8] K.H. Fischer, J.A. Hertz, Spin Glasses, Cambridge University Press, Cambridge, 1991.
[9] S. Kobe, T. Klotz, Frustration—how it can be measured, Phys. Rev. E 52 (1995) 5660–5663.
[10] M. Held, R.M. Karp, The traveling-salesman problem and minimum spanning trees, Oper. Res. 18
(1970) 1138–1162.
[11] M. Held, R.M. Karp, The traveling-salesman problem and minimum spanning trees: Part II, Math.
Programming 1 (1971) 6–25.
[12] G. Elidan, M. Ninio, N. Friedman, D. Schuurmans, Data perturbation for escaping local maxima in
learning, AAAI-02/IAAI-02, 2002, pp. 132–139.
[13] S. Kullback, R.A. Leibler, On information and sufﬁciency, Ann. Math. Stat. 22 (1951) 79–86.
[14] E.L. Lawler, J.K. Lenstra, A.H.G. Rinnoy Kan, D.B. Shmoys, The Traveling Salesman Problem,
Wiley, New York, 1985.
[15] G. Reinelt, The Traveling Salesman, Springer, Berlin, Germany, 1994.
[16] S. Lin, Computer solutions of the traveling salesman problem, Bell Syst. Tech. J. 44 (1965)
2245–2269.
[17] S. Lin, B.W. Kernighan, An effective heuristic algorithm for the traveling salesman problem, Oper.
Res. 21 (1973) 498–516.
[18] D. Sherrington, S. Kirkpatrick, Solvable model of a spin glass, Phys. Rev. Lett. 35 (1975) 1792.
[19] http://www.informatik.uni-heidelberg.de/groups/comopt/software/TSPLIB95.
[20] F. Glover, M. Laguna, Tabu Search, Kluwer Academic Publishers, Dordrecht, 1997.
[21] Ch. Voudouris, E. Tsang, Technical Report CSM-247, University of Essex, UK, 1995.
[22] Ch. Voudouris, E. Tsang, Guided local search and its application to the traveling salesman problem,
Eur. J. Oper. Res. 113 (1999) 469–499.
[23] A. Colorni, M. Dorigo, V. Maniezzo, Distributed optimization by ant colonies, Proceedings of
ECAL91—European Conference on Artiﬁcial Life, Paris, 1991, pp. 134–142.
[24] F.H. Stillinger, T.A. Weber, Nonlinear optimization simpliﬁed by hypersurface deformation, J. Stat.
Phys. 52 (1988) 1429–1445.
ARTICLE IN PRESS
M. Ninio, J.J. Schneider / Physica A 349 (2005) 649–666666