J. Chem. Theory Comput. 2008, 4, 1293-1306
1293
JCTG
Journal of Chemical Theory and Computation
The "Hot-Solvent/Cold-Solute" Problem Revisited
M. Lingenheil, R. Denschlag, R. Reichold, and P. Tavan*
Lehrstuhl für Biomolekulare Optik, Ludwig-Maximillians-Universität München, Oettingestrasse 67, 80538 München, Germany
Received February 1, 2008
Abstract: The temperature steers the equilibrium and nonequilibrium conformational dynamics of macromolecules in solution. Therefore, corresponding molecular dynamics simulations require a strategy for temperature control which should guarantee that the experimental statistical ensemble is also sampled in silico. Several algorithms for temperature control have been proposed. All these thermostats interfere with the macromolecule's "natural" dynamics as given by the Newtonian mechanics. Furthermore, using a single thermostat for an inhomogeneous solute-solvent system can lead to stationary temperature gradients. To avoid this "hot solvent/ cold solute" problem, two separate thermostats are frequently applied, one to the solute and one to the solvent. However, such a separate temperature control will perturb the dynamics of the macromolecule much more strongly than a global one and, therefore, can introduce large artifacts into its conformational dynamics. Based on the concept that an explicit solvent environment represents an ideal thermostat concerning the magnitude and time correlation of temperature fluctuations of the solute, we propose a temperature control strategy that, on the one hand, provides a homogeneous temperature distribution throughout the system together with the correct statistical ensemble for the solute molecule while, on the other hand, minimally perturbing its dynamics.
1. Introduction
Molecular dynamics (MD) simulations using molecular mechanics (MM) force fields have become a widespread tool to study the equilibrium conformational dynamics of proteins and peptides in solution,1 including processes of folding and refolding.2 More recently, also nonequilibrium processes have been simulated in which a protein or peptide is destabilized, for example by applying an external force mimicking the action of an atomic-force microscope,35 by exerting internal mechanical strain,6'7 by introducing point mutations into the protein sequence,8'9 or simply by elevating the temperature.9'10
The behavior of proteins in solution is steered by the thermodynamic conditions, notably by the temperature. The native state is stable only within a certain temperature range; processes of hot and cold unfolding have been observed.11 The temperature influences the stability and function of proteins not only directly by changing the relative importance
* Corresponding author e-mail: tavan@physik.uni-muenchen.de.
of the entropy but also indirectly via certain temperature dependent solvent properties such as the dielectric constant12 or the viscosity.13 Therefore, if one wants to describe experiments on proteins by MD simulations, the temperature must be properly controlled.
Clearly, an adequate method for temperature control is not the only precondition if one aims at quantitative descriptions of experimental data. In this respect, the quality of the employed force field, the sufficiency of statistical sampling achieved by finite simulation times, and other technical issues are also questions of concern.14 However, the temperature is of key importance because many experimental observables that can be compared with the information obtained from MD simulations sensitively depend on this parameter. Examples are the temperature factors in X-ray crystallography,15 the proton exchange and spin relaxation rates in nuclear magnetic resonance spectroscopy [see ref 16 and references therein], and the fluorescence depolarization rates17 as well as the thermodynamical measures of protein stability.18
10.1021/ct8000365 CCC: $40.75   © 2008 American Chemical Society Published on Web 07/08/2008
1294   J. Chem. Theory Comput., Vol 4, No. 8, 2008
Lingenheil et al.
The requirement of proper control does not only apply to the temperature, i.e. the average kinetic energy of the system, but also to other ensemble properties (e.g., energy fluctuations) associated with experimental observables. Thus, in a broader sense, the problem of temperature control in MD simulations is also that of generating the correct statistical ensemble (usually canonical or isothermal—isobaric). The accurate generation of a specific statistical ensemble by means of a MD simulation is also relevant for the application of generalized ensemble techniques like replica exchange molecular dynamics19-21 which has recently become very popular in order to enhance the sampling efficiency. These techniques rely on the assumption that the applied MD method samples the canonical ensemble at the respective temperature.
When simulating macromolecules in solution, the solvent environment, which is essential for the properties of the solute, can either be treated implicitly using continuum approximations or explicitly by including part of the solvent into the simulation system.14 The following discussions exclusively deal with the latter case and are devoted to the task of controlling the temperature of a solute macromolecule in explicit solvent. This task can comprise additional challenges if nonequilibrium relaxation processes are studied. Here, frequently, energy is released in one part of the system and then dissipated into the rest of the simulation box, e.g. from a solute molecule to the surrounding solvent. Since the kinetics of energy relaxation and heat transport can influence the dynamical properties of the solute,22 any applied temperature control method should make sure that the natural energy relaxation processes are unimpaired.
Generally, the ideal temperature control scheme for solute—solvent systems would be to simulate the complete simulation system microcanonically, i.e. at constant total energy in the NVE ensemble. One can show23 that, in this situation, an arbitrary subset of degrees of freedom in thermal contact with the rest of the system (e.g., the solute's kinetic degrees of freedom) will sample the canonical ensemble if the energy fluctuations of the subsystem are insignificant compared with the total energy in the rest of the system. Furthermore, one can show that the subsystem will sample the isothermal—isobaric ensemble if also the subsystem's volume fluctuations are negligible compared with the volume of the rest of the system. Finally, one expects that all those configurational degrees of freedom of the solute which directly interact with the solvent system will sample the canonical or isothermal—isobaric ensemble, respectively, if additionaly the solute—solvent interaction energy is neglib-ible compared with the solvent-internal interaction energy. In MD simulations systems, the latter condition is fulfilled if the solvent atoms by far outnumber those of the solute. Concurrently, by using the NVE approach, the solute's Newtonian dynamics are left completely undisturbed. The NVE strategy has been recommended24 for studies of protein folding kinetics and is occasionally applied25'26 to eliminate a putative influence of thermostat algorithms on the simulated dynamics.
Unfortunately, the simple NVE strategy is not easily applied to extended MD simulations. Numerical inaccuracies
associated with approximation schemes serving to speed up the computations generally lead to a violation of energy conservation. For example, heating may be caused by certain approximate treatments of long-range electrostatic interactions27'28 or by integrating the equations of motion with multiple-time-step algorithms.29 Cooling may occur, for instance, if constraining bond lengths or angles with a too loose tolerance30 or if neighbor lists for the calculation of the van der Waals interactions are not updated frequently enough.31 The defect of energy conservation could, in principle, be compensated by using an ergostat algorithm which would just scale the velocities of all atoms at every time step by an appropriate factor to keep the total energy exactly constant. However, the rates of algorithmic energy drift can vary among the constituents of an inhomogeneous simulation system leading to unphysical steady state temperature gradients,32 a problem sometimes referred to as the "hot-solvent/ cold-solute" problem.33 For example, such a gradient can result from an approximate treatment of the electrostatic interactions, which may render a mildly polar solute less affected by algorithmic noise than a strongly polar aqueous solvent.32'34'35
Thus, specifically for equilibrium simulations of macro-molecules in solution, the applied temperature control has to fulfill an important requirement: The temperature distribution has to be homogeneous throughout the inhomogeneous simulation system. As a strategy guaranteeing such a homogeneous temperature distribution it has been suggested to couple the subsystems independently to separate thermostats.36 Further below we will check this strategy among others because it is the central aim of this work to determine an optimal strategy for generating a homogeneous temperature distribution in solute—solvent simulation systems.
From a general point of view, the appropriateness of a given temperature control method involves the following three aspects:
a) Thermodynamics: Does the method generate the expected thermodynamical ensemble in principle (i.e., with simulations of infinite length and in the absence of numerical errors)?
b) Ergodicity: Does the method generate the expected ensemble within the time typically covered by modern MD simulations?
c) Dynamics: Is the time dependence and spatial distribution of the thermostatic forces realistic? For a solute in solution, for example, one would prefer to have no such forces at all beyond the thermostatting Newtonian interactions with the solvent.
A number of different algorithms has been proposed as realizations of the required thermostats (for a review see ref 36). Each of these algorithms has its specific merits and drawbacks. A critical discussion of these issues is another goal of our study.
For example, the widely used Berendsen thermostat37 (BT) has the advantage to couple only weakly to the dynamics of the controlled system (see the original paper ref 37 for this issue). On the other hand, it is clear from theoretical considerations that the BT does not create a canonical distribution of microstates,40 i.e. it introduces artifacts of type
The "Hot-Solvent/Cold-Solute" Problem Revisited
J. Chem. Theory Comput., Vol. 4, No. 8, 2008 1295
a). Furthermore, the BT violates energy equipartition by redistributing energy from high to low frequency modes, which leads to the so-called "flying-ice- cube effect".38'39 It is unclear whether this effect is specific to the Berendsen method and closely related methods or it can occur with any thermostat belonging to the more general class of velocity rescaling algorithms.38
The more strongly coupling Nose-Hoover thermostat41'42 (NHT) is theoretically expected to generate the canonical distribution of microstates if certain conditions are obeyed thus conforming with the above question a).43 However, within the time covered by a typical MD simulation, amplitudes of temperature fluctuations were observed which were by 1 order of magnitude larger than those expected for a canonical ensemble.44 Several studies42'44-47 have shown that Nose-Hoover coupled systems do not necessarily acquire ergodicity in a reasonable time [cf. question b) above] if these systems are small, stiff, or at low temperatures. Additionally, by its very construction as a velocity rescaling algorithm, also the NHT could show the flying-ice-cube artifact (although we are not aware of any reports on a corresponding example).
As a reaction to these problems, modifications to both the Berendsen and Nose-Hoover schemes have been proposed. The most frequently employed variant of the Nose-Hoover thermostat is the so-called Nose-Hoover chain,48 which has been successfully tested by Cheng and Merz33 as a remedy to the hot-solvent/cold-solute problem. No artifacts or deviations from the canonical ensemble have been reported so far. Only recently, Bussi et al.49 suggested a modification of the Berendsen scheme in order to reliably generate a canonical distribution for systems that otherwise would sample the microcanonical ensemble. Both, the Nose-Hoover chain and the modified Berendsen thermostat induce temperature fluctuations of the correct size by artificially scaling the atomic velocities. For systems, however, which anyway sample the canonical ensemble, such a thermostat introduces an unnecessary perturbation of the dynamics, i.e. artifacts of type c). The generic example for such a system is a solute molecule in a sufficiently large explicit solvent system, which, as discussed above, always samples a canonical ensemble although possibly at the wrong temperature because of algorithmic inaccuracies.
Concerning temperature control of macromolecules in solution, we want to show how one can (i) generate the appropriate ensemble for the solute molecule in adequate time, (ii) leave invariant the time scales of energy relaxation and of equilibrium fluctuations, and (iii) guarantee a homogeneous temperature distribution in equilibrium simulations with (iv) minimal perturbation of the solute's Newtonian dynamics.
For this purpose we will scrutinize in section 2 the existing temperature control scenarios for MD simulations of solvent—solute systems by partially recollecting and partially developing associated theoretical concepts. These considerations will lead to the definition of strategies for a minimally invasive control of a solute temperature. In section 3 we will sketch the methods which we employed in a series of quite extended test simulations on peptides in aqueous solution.
As explained in section 4, these simulations were specifically designed to estimate the extent to which the theoretically expected effects of temperature control do actually modify the properties of a solute peptide. Section 5 discusses the results and suggests a practical procedure ensuring a minimally invasive temperature control.
2. Theory
Thermostats. The most widely used class of thermostat algorithms is based on the rescaling of atomic velocities. The equation of motion for an atom which belongs to a system under the rule of such a thermostat is
mjrl{t) = FitB(t)-miy(t)rl{t)
(1)
Here, the acceleration f,(0 of atom i is caused not only by the forces F, ff(f) derived from an MM force-field but also by a second term F,therm(0 = —m,y(t)t,{t), which is proportional to the atom's velocity f,(0 and to a generally time dependent thermostat parameter y(t).
In the Berendsen scheme, y(t) is directly given in terms of the instantaneous kinetic temperature50 T(t) by
y(0 =
2r
1
T{t)
(2)
with x denoting the coupling time and Tq the target temperature. For the Nose-Hoover41'42 thermostat, y(t) is coupled on a time scale tnht to T(t) by the differential equation
dy _ 1
dt
T{t)
1
(3)
Perturbation of the Dynamics. Every thermostat which is described by eq 1 perturbs the Newtonian dynamics generated by the forces F,ff(0 through the admixture of additional thermostatic forces F;jtherm(0- For a solute—solvent system, these thermostatic forces introduce artifacts of type c) concerning the dynamics (cf. section 1). The resulting perturbation can be measured for a selected atom i by the quotient
£i -       ther^ßAF. ff)D
(4)
where the brackets (...)d denote temporal averages over a simulation of a given duration D. The perturbation quotients (4) will depend on the system size and on the particular thermostat, i.e. on the form of y(t), as well as on the coupling time.
The perturbation quotients are strictly local measures for the influence of a thermostat on a simulated dynamics. However, one may also consider the local perturbation inflicted on a certain group G of atoms within a simulation system, e.g. on the Ca-atoms of a solute peptide embedded in a solvent environment. Then the root mean quotient £g =
"i(^)G over the £2 belonging to G can be used to compare how the dynamics of a solute is perturbed in different solute—solvent systems.
Instead of calculating the averages (F2therm)zj required for the evaluation of the directly from a simulation, one can also give a simple estimate for these average square forces.
1296   J. Chem. Theory Comput., Vol 4, No. 8, 2008
Lingenheil et al.
Assuming a sufficiently large simulation system, the velocities of the individual atoms will negligibly contribute to the temperature T(t). Hence, the correlation of y2 and if vanishes and one obtains
<F*tem>z> = (ml72i X = m2(y2)D(i2)D (5)
Assuming furthermore that the system is in equilibrium, that the atomic velocity distributions are undisturbed by the thermostat, and that the system is free of internal constraints (such as fixed bond lengths), the mean square velocity of atom i is expected to be (if)o ~ 3kBT/mj, where T = (T)D is the average temperature determined from the simulation. Equation 5 then becomes
time averages t obtained from a set of equilibrium simula-
(6)
We will check this estimate by sample simulations and show that it already holds for relatively small systems.
Inserting the estimate 6 into eq 4, one can recognize that the perturbation quotients of a given system which is simulated with different thermostatic strategies solely differ with respect to (y2)/j. Thus, in this case, comparisons of the mean square scaling activities (y2)o suffice for the evaluation of different thermostatic strategies concerning the size of local perturbations of the dynamics. However, thermostats do not only cause local perturbations of the Newtonian dynamics but may also interfere with ensemble properties like, for example, size and time scales of the temperature fluctuations.
Temperature Fluctuations. In an MD simulation, the statistics of the temperature fluctuations provides a probe for artifacts of type a) and b) pertaining the generation of the desired ensemble (section 1). For a system in contact with a heat bath of temperature Tb, the distribution of microstates is either given by the canonical or by the isothermal—isobaric ensemble. However, with respect to the temperature fluctuations, both ensembles are equal. The associated probability density p(T) for the instantaneous kinetic temperature is a ^-distribution51
P(T)--
(N^T/2T) T(NDoF/2)T
exp
^DoF*"
2T
(7)
where T = Tb is the expectation value of T, Ndof is the number of degrees of freedom (DoF) of the system, and r(...) denotes the Euler T-function. Consequently, the variance Oj of the temperature fluctuations is
2T2 Ov —-
(8)
Under the influence of a thermostat, the statistics can deviate from what is expected for a canonical ensemble. This deviation constitutes a measure for the global influence of the thermostat and for how close a simulation is to sampling the canonical ensemble. In the limit Ndof~^°°, eq 7 becomes a normal distribution.
The size of Or together with the autocorrelation time50 rc of the temperature fluctuations critically influences the accuracy with which the equilibrium temperature T is
tions with durations D can be estimated   to be
^2 — -l„2 c
of-2oT-
(9)
In order to judge whether a particular strategy is suited to correctly tune the temperature T, one has to perform a test simulation which is long enough to determine T with sufficient accuracy. For a small solute (large off) with a correlation time rc longer than 10 ps, an accuracy of 1 K may require simulation times of up to 10 ns.
Power of a Thermostat. By means of the observables introduced above, one can judge to what extent a thermostat can perturb the dynamical and equilibrium properties of a solute in solute—solvent simulations. Such perturbations can, of course, be avoided by using no thermostat at all. However, as outlined in section 1, this approach is generally not feasible because algorithmic inaccuracies, which are inevitable in large scale simulations using efficient MD codes, represent heat drains or sources that have to be compensated.
To properly tune this compensation, we consider the work performed by the thermostatic forces F,_ therm(0 on the atoms i for an ensemble of simulation systems with the temperature T{t) = (J(O)ens- The ensemble average power exerted by the thermostat on a given atom i is
A(0 = <F,therm(0-f,(0>e„s
(10)
Using the definition of F,-, therm [see eq 1] and the Berendsen expression 2 for y leads to
ft(0 = ;(W0OT)-l]>a
(11)
with the usual definition for the kinetic energy £,kin(0 of atom i. Employing once more the assumption of a negligible correlation between the velocity [and, thus, the kinetic energy kin(0] °f a single atom and the kinetic temperature T(t) of the system, one obtains
3kRTXt)
A(0 = —|^-[Vr(0-i]
(12)
where kg is the Boltzmann constant, and 7X0 = 2/3&b(£;i kin(0)ens is the ensemble average temperature of atom i. For equilibrated systems the ensemble averages employed in eq 12 can be replaced by temporal averages (...)d- This allows to calculate for every subsystem k from a simulation the (time) average thermostatic power
(13)
per degree of freedom using the average temperature tK = (Tk)d of the subsystem k, the corresponding average T = (T)o of the temperature T(t) controlled by the BT, and the thermostat parameters To and r.
Further below we will use eq 13 to determine the thermostatic power exerted by a BT on a solute peptide from sample simulations. These data will be used to check the validity of a heat conduction model which we will now introduce to analyze the hot-solvent/cold-solute problem occasionally hampering MD simulations of inhomogeneous
determined by a given simulation. The variance Of of the systems.
The "Hot-Solvent/Cold-Solute" Problem Revisited
J. Chem. Theory Comput., Vol. 4, No. 8, 2008 1297
Figure 1. Heat flow model representing specifically the "hot-solvent/cold-solute" case for an inhomogeneous system consisting of two subsystems with different heating rates. The simulation system is coupled to a single thermostat, representing an external heat bath. Bright and dark colors code low and high temperatures, respectively. Heat flows driven by temperature gradients and heat sources are marked by arrows. A detailed discussion is given in the text.
Heat Flow Model. In simulations of solute—solvent systems, the algorithmic heat drains or sources may be inhomogeneously distributed and, thus, the temperature may likewise be inhomogeneous. According to requirement (iii) stated at the bottom of the Introduction, such inhomogeneous temperature distributions should be avoided. Figure 1 sketches a heat flow model from which one can derive strategies for the reliable control of the solute temperature. As drawn, the model refers to a particular strategy employing a single thermostat for the whole system. For further reference we denote this strategy by G.
The model depicted in Figure 1 consists of two subsystems k e {P, S\ with P denoting the solute and S denoting the solvent. The powers aK of algorithmic heating per DoF are assumed to be constant and homogeneous within the subsystems. Furthermore, the temperature is assumed to be homogeneous within each subsystem k, i.e. for the atomic temperatures T, we have T, = TK for all i e k. According to eq 12, the ensemble average work /3,(0 exerted on atom i per unit time by the global thermostat then only depends on whether i is part of P or S, respectively. Thus, for the subsystems k e {P, S\ the respective thermostatic powers per DoF are given by
ßK(t) = -1£-[T0/T(t)-l]
(14)
If the local temperatures TP and Ts differ, as is assumed in Figure 1, there will be a net heat flow
ßspV):
kB[Ts(t) - TP(t)]
(15)
between S and P, which we assume to depend linearly on the temperature difference. Here, the time constant tsp characterizes the thermal coupling of the subsystems.
The heat flowchart shown in Figure 1 immediately suggests stationarity conditions. In the steady state, the net heat flow must individually vanish for each of the two subsystems, i.e.
aP + ßSP + ßP = 0
(16)
(17)
Now suppose for a moment that the temperature distribution is homogeneous throughout the system, i.e. TP{t) = Ts{t) = T{t). According to eq 14 the thermostatic powers /3p(?) and j3s(t) exerted on the subsystems are then equal, and, by eq 15, the heat flow f3sp(t) between S and P vanishes. Equations 16 and 17 then immediately require as the stationarity condition that <Xp = (Xs, i.e. that the heat sources in the subsystems work at equal powers.
If this is not the case (dp ^ as), the temperature cannot be homogeneously distributed in the stationary state, and, by eq 15, there will be a nonvanishing heat exchange j3sp ^ 0 between the subsystems. As a result, a steady state temperature difference is inevitable whenever, upon applying scenario G, a single global thermostat is used to thermostatize a system exhibiting inhomogeneities with respect to the rates die of algorithmic heating. This is the origin of the hot-solvent/cold-solute problem as described e.g. in ref 34.
Separate Thermostats. To avoid temperature inhomogeneities, it has become a standard in simulations of macro-molecules to couple separate thermostats to the subsystems.36'47'52-57 We will denote this temperature control scenario by P. In the following discussion of scenario P, we will concentrate on the temperature control of the solute P, assuming that the temperature of the solvent S is reliably controlled. Such a reliable control can be achieved by a solvent thermostat combining a coupling time ts on the subpicosecond time scale (e.g., ts = 0.1 ps) with a target temperature To, s equal to the intended temperature. This choice of thermostat tuning actually is the standard (see e.g. refs 33, 36, 37, 47, 52-56), and, thus, we call it the classical setup.
For a scenario P, in which a separate BT is coupled to a (thermally homogeneous) solute P, the controlled temperature T(t) is the solute temperature Tp{t). Thus, we obtain from eq 14 the simplified expression
ßp(t)--
2t,
■[T0-TP(t)]
(18)
for the power of the thermostat acting on P. With eqs 18 and 15, the solute's stationarity condition 16 may be rewritten
kß(Ts   Tp)    kB(Top Tp)
2r,
2r„
= 0
(19)
where 7b, p denotes the target temperature, and tp denotes the coupling time of the solute thermostat. The first term characterizes the algorithmic heating within P, the second term characterizes the heat flow between P and S, and the third term characterizes the power j3p of the thermostat separately coupled to P.
Equation 19 is the quintessence of our linear heat flow model and may be used to predict the effects of three different thermostatic strategies within scenario P. In all these strategies, S is coupled to a classical BT and P is decoupled from this thermostat. The three strategies are as follows:
P. 1 The solute P is coupled to a classical thermostat. Here, the use of a correspondingly small coupling time tp ~ 0.1
1298   J. Chem. Theory Comput., Vol 4, No. 8, 2008
Lingenheil et al.
ps is the standard. - ~~ - For such small t>, eq 19 is completely dominated by the thermostatic term. The reason for this dominance is that t> is by at least 1 order of magnitude smaller than the solute—solvent coupling time rSp, which is typically larger than 1 ps (see further below). Neglecting the heat flow contribution, the deviation TP — To, p from the target temperature is given by 2dptplkB. For moderate algorithmic heating rates dp, this deviation is expected to be small because of the short time scale t>.
P.2 No separate thermostat is coupled to the solute P, i.e. Tp^°o, and solely the thermostatted solvent S acts as a heat bath. We call this strategy "noninvasive" because it does neither alter the Newtonian dynamics nor the energy relaxation properties of P. The expected temperature difference Tp — Ts = 2<XptsplkB will be small if the local heating dp is negligible on the time scale tsp of the thermal coupling between the subsystems.
P. 3 The solute is coupled to a thermostat with a very large coupling time tp»rSp to realize a "constant heat flow" (CHF) approach. As suggested by the heat balance eq 19, a homogeneous (Tp= Ts) and stationary temperature distribution only requires that the thermostatic power j3p cancels the power dp of algorithmic heating, i.e., dp = ks(Tp — To, p)/ 2tp. This condition can be satisfied for an arbitrarily large coupling time tp by a proper choice of the target temperature lb, p. In the limit tp-^°°, the thermostat variable y in eqs 1 and 2 becomes a constant yp, and the thermostat scheme may actually be descibed by this single parameter. At large tp, the thermostat works in a heating/cooling limit as a constant heat source/drain, and this activity solely serves to maintain the energy balance. Because of eq 2, the perturbation of the Newtonian dynamics [cf. eq 1] inflicted by such a CHF thermostat can be made very small. Therefore, we call the CHF approach to the solute's temperature control, which is applicable to non-negligible local heating rates a.p, "minimally invasive".
To set up a CHF simulation as required in strategy P.3, the a priori unknown power dp of algorithmic heating has to be determined in order to specify the constant thermostat parameter yp, or, equivalently, the paramteres Tb,p and tp if a traditional Berendsen thermostat is used in the heating/ cooling limit. To this end, the solute temperature Tp has to be measured in two test simulations with different heating powers j3p of the thermostat. The two heat balance eqs (19) of these tests then constitute a system of linear equations which determines the unknown parameters j3sp and dp. A detailed description of the setup protocol is given in Appendix A.
In the following we will examine the temperature control strategies G and P.l —P.3 introduced above by test simulations. Based on these results, we formulate guidelines for a temperature control satisfying the four conditions summarized at the bottom of the Introduction.
3. Methods
MD Simulation Techniques. The software packages EGO-MMII27 and GROMACS59 were used in several series of MD simulations. Besides EGO we also applied GROMACS because it provides an NHT in addition to a BT,
because it is computationally efficient for very small systems, and because it can provide data for a crosscheck of results. In EGO the electrostatic interactions are treated combining structure-adapted multipole expansions60'61 with a moving-boundary reaction-field approach62 and a multiple-time-step integration.29'63 In the GROMACS simulations we used the PME method28 with a 10 A cutoff for the real space contribution, with a grid spacing of 0.5 A, and with a sixth order interpolation of the charges to the grid. For both EGO and GROMACS simulations, the van der Waals interactions were truncated at 10 A. If not stated otherwise, the simulations were carried out with explicit solvent using periodic boundary conditions and with a BT (ts = 0.1 ps, To, s — 300 K) rapidly coupled to the solvent to guarantee that the solvent was closely kept at the target value. Bond lengths were constrained using the M-SHAKE algorithm30 with relative tolerances of 10~4 when using GROMACS, which is the recommended default value, and 10~6 when using EGO, which is hard-coded in the source code in this case.
We applied different simulation protocols to vary the heating properties within the simulation systems. Here, the first parameter was the software used for simulation, which we denote by E for EGO or G for GROMACS. Since the M-SHAKE algorithm is known30 to have a cooling effect, we varied the number of constraints by either constraining no bonds at all (N), only bonds involving a hydrogen atom (H), or all bonds (A). The last parameter which presumably influences the heating in the system is the length At of the basic integration time step, which we simply denote by its value in femtoseconds. Thus, a standard EGO simulation (constraints on bonds involving hydrogen atoms and At = 1 fs) would be denoted by E/H/l.
Model Systems. The first model system was a polyalanine octapeptide (8ALA) with charged termini described by the GROMOS96 force field64 and embedded in a cubic box of 20 A edge-length containing 236 simple point charge (SPC) water molecules.65 The number of DoF for the peptide then is 153/143/103 for N-/H-/A-constraining, respectively. The starting conformation was always fully extended. The system was equilibrated for 300 ps during which solute and solvent were coupled to separate BTs (tp = ts = 0. lps, To, s — To, p = 300 K). The second model system was an alanine dipeptide (ALDI) described by the CHARMM22 force field66 in a cubic box of 21.3 A edge length containing 324 water molecules modeled by the transferable three-point intermo-lecular potential (TIP3P).66'67 Here, the number of peptide DoF is 66/54/45 for N-/H-/A-constraining, respectively. The system was prepared as described for 8ALA, except that the equilibration time was only 100 ps.
MD Simulations. A first series of seven MD simulations of 8ALA in SPC water served to study the various situations encountered in the temperature control of inhomogeneous systems. Table 1 associates acronyms to these simulations and lists the employed parameters. In particular, in the last simulation G/A/2_P.3, the CHF approach was applied to the peptide. Using the data from the preceding simulation G/A/ 2_P.2 (tp = °°,Tp = 293.4 K), in which only S was coupled to a classical BT and the data from an independent 10 ns test simulation with an additional CHF thermostat coupled
The "Hot-Solvent/Cold-Solute" Problem Revisited
J. Chem. Theory Comput., Vol. 4, No. 8, 2008 1299
Table 1. Simulation Names and Associated Parameters in Series #1*
name
protocol
thermostat parameters
software
At/is
D/ns
;ys/pS
Ts/pS
Tp/pS
To,p/K
G G
E/H/1 E/H/2_ E/H/2_P2 G/H/2_P2 G/A/2_P2 G/A/2_P1 G/A/2_P3
EGO EGO EGO
GROMACS GROMACS GROMACS GROMACS
H H H H A A A
20 20 20 20 20 20 20
0.1 0.1
0.1 0.1 0.1 0.1 0.1
0.1 500
300 2340
aThe simulation names code the varied parameters and temperature control scenarios. C specifies the type of bond length constraints, Af the size of the basic integration time step, and D the duration of the simulation. The parameters r specify the coupling times of the BTs coupled to the whole system (sys), to the solvent (S), or to the solute (P). T0,p is the target temperature of a thermostat coupled to the solute. The solute peptide was 8ALA in SPC water. See the text for further information.
Table 2. Simulation Parameters in Series #2a
protocol thermostat parameters
name	software	C	Atffs	D/ns		Tsys/pS	Ts/pS	Tp/pS	To,p/K
CHF.O	GROMACS	A	2	200 x	2	—	0.1	—	—
CHF.1	GROMACS	A	2	200 x	2	—	0.1	500	2340
CHF.2	GROMACS	A	2	200 x	2	—	0.1	500	4800
CHF.3	GROMACS	A	2	200 x	2	—	0.1	500	7700
CHF.4	GROMACS	A	2	200 x	2	—	0.1	500	11100
CLS.1	GROMACS	A	2	200 x	2	—	0.1	0.1	300
CLS.2	GROMACS	A	2	200 x	2	—	0.1	0.1	340
aThe model peptide was 8ALA in SPC water at 7s = 300 K. Except for the simulation set CHF.O, in which only Ts was controlled, separate BTs were applied to S and P. See the caption to Table 1 for further information.
to the peptide (t> = 500 ps, T0,P = 4800 K, 7> = 307.9 K), the unknown parameters in eq 19 were determined as described in the section 2. We found the values dp = —2.04fefi K/ps for the algorithmic heating rate and tsp =1.61 ps for the solute—solvent coupling time, which actually is in the picosecond time range as claimed further above. To realize a CHF thermostat maintaining the peptide at 7> ~ 300.0 K, these values were inserted into eq 19 yielding a "target temperature" 7o,p = 2340 K. If the assumptions underlying our heat-flow model are correct, this choice should compensate through j3p= —dp the algorithmic energy drift in the G/A/2_P.3 simulation.
The setup of a second series of simulations was chosen such that the effects of the local temperature and of a thermostat on the dynamics of 8ALA can be studied. We performed seven sets of 200 simulations each. Every single simulation had a duration of 2 ns, amounting to 400 ns per set and a total of 2.8 /is of simulation time. The simulation parameters are summarized in Table 2. All simulations were performed with the G/A/2 protocol. In the first set (CHF.O), no thermostat was coupled to the peptide, while in the following four sets (CHF.l to CHF.4) a BT targeting at increasingly large temperatures Tqp was coupled in an extremely slow fashion to the peptide. In the last two sets (CLS.l and CLS.2), a separate classical BT was coupled to the peptide using either the same (7b, p = 300 K) or a slightly higher (7op = 340 K) target temperature as compared to 7o, s. The 200 initial conditions were obtained by taking snapshots every 20 ps from a 2 ns preparatory simulation at 300 K, with the peptide's Ca atoms harmonically coupled to their initial coordinates of an extended conformation.
To compare the different thermostatic strategies discussed in section 2, we determined the corresponding thermostatic
Table 3. Simulation Parameters in Series #3'
protocol thermostat parameters
peptide	software	C	Atffs D/ns	Tsys/pS	Tp/pS	To,p/K
8ALA	GROMACS	N	1 0.25	NHT	0.064	300
8ALA	GROMACS	N	1 0.25	NHT	0.256	300
8ALA	GROMACS	N	1 0.25	NHT	1.024	300
8ALA	GROMACS	N	1 0.25	NHT	4.096	300
8ALA	GROMACS	N	1 0.25	NHT	16.384	300
8ALA/ALDI	EGO	N	1 0.25	BT	0.001	300
8ALA/ALDI	EGO	N	1 0.25	BT	0.004	300
8ALA/ALDI	EGO	N	1 0.25	BT	0.016	300
8ALA/ALDI	EGO	N	1 0.25	BT	0.064	300
8ALA/ALDI	EGO	N	1 0.25	BT	0.256	300
8ALA/ALDI	EGO	N	1 0.25	BT	1.024	300
8ALA/ALDI	EGO	N	1 0.25	BT	4.096	300
a For nomenclature see the caption to Table 1. In all simulations the solvent was coupled with rs = 0.1 ps to a Nose-Hoover (NHT) or a Berendsen (BT) thermostat, respectively.
forces (eqs 5 and 6) and perturbation ratios (eq 4) in a third series of relatively short 250 ps simulations. Simulations were performed for 8ALA with varying coupling strengths and BTs and NHTs, respectively. Additionally, we determined the thermostatic forces and the perturbation ratio also for ALDI and Berendsen coupling again varying the coupling strength. The simulation parameters of the third series are given in Table 3. As these simulations served to compare thermostatic and force-field forces, no bond lengths were constrained thus eliminating constraint forces.
Finally, a fourth series of slightly more extended simulations (500 ps) was designed to examine how the solute's variance of temperature fluctuations (cf. the corresponding paragraph in section 2) is affected by the coupling times of a BT. We studied 8ALA and ALDI in water and in vacuum
1300 J. Chem. Theory Comput., Vol 4, No. 8, 2008 Table 4. Simulation Parameters in Series #4a
Lingenheil et al.
	system		protocol			thermostat parameters	
peptide	environment	software	C	At/is	D/ns	Tp/pS	To,p/K
8ALA	water/vac	EGO	A	2	0.5	0.001	300
8ALA	water/vac	EGO	A	2	0.5	0.004	300
8ALA	water/vac	EGO	A	2	0.5	0.016	300
8ALA	water/vac	EGO	A	2	0.5	0.064	300
8ALA	water/vac	EGO	A	2	0.5	0.256	300
8ALA	water/vac	EGO	A	2	0.5	1.024	300
8ALA	water/vac	EGO	A	2	0.5	4.096	300
ALDI	water/vac	EGO	H	2	0.5	0.001	300
ALDI	water/vac	EGO	H	2	0.5	0.004	300
ALDI	water/vac	EGO	H	2	0.5	0.016	300
ALDI	water/vac	EGO	H	2	0.5	0.064	300
ALDI	water/vac	EGO	H	2	0.5	0.256	300
ALDI	water/vac	EGO	H	2	0.5	1.024	300
ALDI	water/vac	EGO	H	2	0.5	4.096	300
a For nomenclature see the caption to Table 1. BTs were used for solvent and solute. The solvent was coupled with a coupling time of 0.1 ps.
304 302 300 2~298
294 292 290
' <v' <v <y <y <y- <^ y XTX
^ c> c> <y c>
Figure 2. Average peptide temperature tP observed in the first series of simulations on 8ALA in SPC water. The associated acronyms and parameters characterizing the members of the series are given in Table 1.
by E/H/2 simulations using the same set of coupling times as in series #3. Table 4 summarizes the simulations of the last series.
4. Results and Discussion
Temperature Control Scenarios. As outlined above, the series of equilibrium simulations on the model peptide 8ALA in explicit water as characterized by Table 1 served to exemplify the problems connected with the temperature control of inhomogeneous systems. Figure 2 shows the average peptide temperatures obtained in these sample simulations. Using eq 9, the remaining uncertainty of these average temperatures was estimated to Ofp < 0.7 K. The solvent temperatures were 300.0 K where not mentioned explicitly.
In simulation E/H/1_G, we used the standard simulation protocol for EGO (see section 3 for details), which includes a classical BT coupled to the whole simulation system and, thus, represents an example for scenario G outlined in section 2. Neither the resulting temperatures of the peptide (cf. Figure 2) nor of the solvent showed any statistically significant deviations from the 300 K target value suggesting that in E/H/1_G the algorithmic noise was weak.
Figure 2 indicates that this behavior was lost in simulation E/H/2_G, in which the basic integration time step At was doubled to 2 fs. For our sample system, this doubling of At
led to a 3.0 K increase of the peptide temperature, indicating that the modified simulation setup has caused certain algorithmic inaccuracies. When using EGO, the choice of a larger At is expected to reduce the accuracy of the integration algorithm because the employed highly efficient multiple-time-step algorithm does not exactly guarantee energy conservation and because the corresponding violation increases with the size of At (see refs 63 and 29 for a discussion). According to Figure 2, the combination of a global Berendsen thermostat with a reduced accuracy of integration in simulation E/H/2_G apparently led to a moderately elevated temperature for the peptide and to a slightly (0.3 K) cooler temperature for the larger solvent system. Nevertheless, the temperature of the total system was accurately kept at 300.0 K by the thermostat. Apart from changed signs (hot solute in cold solvent), this result is an example for the classical problem reported in the literature,32'34'68 which can arise in scenario G from indiscriminately coupling a thermostat to all parts of an inhomogeneous system and which is described by the heat flow model sketched in Figure 1.
However, as demonstrated by the average peptide temperature displayed in Figure 2 for simulation E/H/2_P.2, this temperature control problem was eliminated by simply decoupling the peptide from the thermostat, i.e. by realizing scenario P.2. This observation suggests that in the E/H/2 simulations the solvent experiences a considerable cooling, whereas the level of algorithmic noise within the peptide is very low. According to our experience, such a decoupling of the solute is a proper solution for most temperature control problems which can occur in simulations of inhomogeneous systems using either EGO or GROMACS.
The fact that the application of scenario P.2 cannot always remove such problems is demonstrated by the results of simulation G/H/2_P.2, which was carried out with GROMACS using the same settings as in the EGO simulation E/H/2_P.2. According to Figure 2, in the G/H/2_P.2 simulation the peptide was by about 2 K too hot, indicating that the rate j3sp of heat transport from the peptide P into the solvent S was too slow to compensate the algorithmic heating dp > 0 of the solute occurring in this case.
The "Hot-Solvent/Cold-Solute" Problem Revisited
J. Chem. Theory Comput., Vol. 4, No. 8, 2008 1301
It may be expected that introducing additional M-SHAKE constraints into the peptide system leads to a local cooling,30 which might compensate the observed algorithmic heating of P. This is the reason why we carried out simulation G/A/ 2_P.2, which differs from G/H/2_P.2 only in the number of constraints (50 vs 10) within the peptide. In fact, Figure 2 displays for simulation G/A/2_P.2 a peptide temperature which is by 6.6 K cooler than that of the solvent, implying that the original heating has been overcompensated by the local cooling. A deviation of this size is unacceptable in simulations serving to probe the equilibrium properties of the solute. Thus, the simulation setup G/A/2 is a typical case in which one of the two remaining temperature control strategies P.l and P.3 described in the section 2 should be applied.
Hence, in simulation G/A/2_P.l we utilized a separate classical BT for temperature control of the peptide, while in simulation G/A/2_P.3 we applied a CHF thermostat. Figure 2 shows that in both cases there is no significant deviation of the observed peptide temperatures from the solvent temperature. Both methods are capable of correctly thermo-statting the solute. For the CHF thermostat we conclude that the choice of parameters (cf. section 3) was correct and that the underlying heat flow model describes the situation in this case. This success has motivated us to further scrutinize the validity of this model.
Validity of the Heat Flow Model. The second quite extended series of simulations (see Table 2 for the parameters) can serve to assess the validity of eq 19, which expresses the contents of the model. With eq 18 the model 19 can be equivalently reformulated as
TP=Ts+^(fiP+aP) (20)
showing that the solute temperature 7> should depend linearly on the heating power j3p of the solute thermostat. To specify the unknown parameters dp and tsp in eq 20, one needs measurements of Ts and 7> from two simulations employing different heating powers j3p.
Estimates ftp for the heating powers /3p can be determined from simulations by evaluating eq 13 specifically for the case of a solute thermostat, i.e. for k = P, To = To, p,t = tp, and T = Tp. One obtains
h=^-\T0,p-TP] (21)
which is, up to the use of different averages, identical to eq 18. Thus, at a constant coupling time tp, the heating power j3p is steered by the choice of the target temperature 7o,p and measured through the average peptide temperature Tp. Therefore, the linear relationship 20 between Tp and j3p can be checked by comparing with data points (Tp,fip) obtained from simulations employing different target temperatures To, p.
An inspection of the first five simulation sets in series #2 listed in Table 2 shows that this set qualifies both for the evaluation of the unknown parameters in eq 20 and for the check of this linear equation. In all simulation sets of series #2, the simulation protocol was G/A/2 just like in the
340I I 1 I 1 I 1 I 1 I 1 I V 330 _«chf.o
.    ■ CHF.l , ** .
g- 320 - ♦ CHF.2 "
— ▲ CHF.3 ^ "*
<^310 - ▼ cHF.4
/3P(fcsK/ps)
Figure 3. The average temperatures tP of the peptide ALA8 (in SPC water at ts = 300 K) resulting from constant local heating with different powers (5P in the simulation sets CHF.O to CHF.4 of series #2 (cf. Table 2). The prediction of linear heat flow model eq 20 is drawn as a dashed line, and the solvent temperature ts is indicated as a dotted line (see the text for explanation).
simulation G/A/2.P.2 of the first series. However, the temperature control scenario P.2 (no separate thermostat for the peptide) was employed only in simulation CHF.O. In the remaining CHF simulations a BT was coupled to P using an extremely slow coupling time tp = 500 ps combined with a large and increasing target temperature (cf. Table 2). According to eq 21 this choice leads to a heating power j3p of this thermostat, which increases from simulation CHF.O (j3p = 0) to simulation CHF.4. Figure 3 shows the observed stationary peptide temperatures TP as a function of the observed heating power Jip. In the case of the simulation set CHF.O (black dot) the result of simulation G/A/2_P.2 (cf. Figure 2) is closely recovered because the same temperature control setting P.2 was applied, i.e. TP was by 6.5 K smaller than the solvent temperature of Ts = 300 K. With nonzero and successively growing fiP the peptide temperature tP is seen to increase.
The dashed line in Figure 3 expresses the linear relation 20 between j3p and Tp. The required parameters were determined as dp = —2.02kB K/ps and rSp = 1.60 ps from the simulation sets CHF.O and CHF.2. Therefore, the dashed line linearly interpolates between the data points 0p,tp) of these two simulation sets. The above values of the parameters dp and tsp closely agree with those calculated earlier (see Methods) for setting up the CHF thermostat used in simulation G/A/2.P.3. This result is expected because in both cases the parameters dp and tsp were computed from simulations employing the same parameters.
In simulation set CHF.l, the peptide temperature was nearly identical to Ts with Tp = 299.5 K (black square in Figure 3) because here the thermostat parameters were chosen equal to those of the simulation G/A/2.P.3 (series #1), which realizes the P.3 strategy. The temperature Tp predicted for CHF.l by the dashed line deviates by only 0.5 K from the observed average. This deviation is probably significant because the temperature averages shown in the figure are extremely well converged (otp< 0.1 K) due to the extended statistics. If a similar interpolation would be constructed using the data from the simulation sets CHF.3 or CHF.4 instead of CHF.2, the error in the prediction for CHF.l would increase to 1.1 K or 2.2 K, respectively, with increasing violation of the approximate linear relation 20 between j3p and Tp. In the case of 8ALA in explicit water, the assumption of a linear thermal coupling between solvent
1302   J. Chem. Theory Comput., Vol 4, No. 8, 2008
Lingenheil et al.
7>(K)
Figure 4. Temperature dependence of the peptide backbone dynamics of 8ALA. The graph shows the average number of transitions per angle and nanosecond of the ^--dihedral angles between the a-type region [-60°,-30°] and the /3-type region [95°, 145°] for the five CHF simulation sets (filled circles) and the two CLS sets (empty squares) over the observed average peptide temperature tP. The error bars give the range of plus/ minus one standard deviation. Additionally, an Arrhenius69 model (dashed line) fitted to the CHF data is plotted. The simulation parameters are summarized in Table 2.
and solute (eq 15), thus, obviously breaks down if Tp deviates by more than about 10 K from Ts, which is probably also true for related simulation systems. In test simulations serving to set up a CHF thermostat through eq 19, the deviation \TP — Ts\ should, thus, be smaller than about 10 K if one wants to guarantee an accurate tuning of TP in applications of strategy P.3.
Backbone Dynamics. As we have seen further above, the use of an inappropriate strategy for temperature control can lead to peptide temperatures considerably deviating from that of the solvent. It seems likely that such a deviation can entail an altered conformational dynamics of the peptide. To check this expectation, we analyzed the second simulation series also in this respect. Due to the extremely slow thermostat coupling employed in CHF.O to CHF.4, here, the dynamics should be exclusively affected by differences in the peptide temperatures.
Figure 4 shows how the kinetics of conformational transitions in 8ALA is modified by Tp in CHF.O—4 (black dots). This kinetics is measured by local flip rates of backbone torsional angles (see the figure caption). As expected, the flip rates increase with the temperature. A simple Arrhenius model69 fitted to the CHF data is drawn as a dashed line. This model yields an energy barrier of 434fc# K for the backbone flips. This value is well in the range of typical barrier heights reported for biomolecules in the literature.70
Having estimated the influence of the temperature on the conformational dynamics of our sample peptide 8ALA in SPC water, it seems appropriate to check whether a separate classical BT (as frequently applied in strategy P.l) changes the dynamics. Here, particularly a slowing down seems possible because a rapidly coupled thermostat can interfere with long-lasting energy fluctuations within the peptide, which are caused by random in- and outflow of energy from the solvent. For the purpose of such a check, we carried out the simulation sets CLS. 1—2 listed in Table 2, in which a classical BT separately coupled to P enforced temperatures Tp of about 300 K and 340 K, respectively.
	i 11 ritii]  i i iiiiii|  i 11	7TTT]	:
r			"5
r	O-OALDI, BT		
-	q--d 8ALA, BT		
_	■ ■■ 8ALA, NHT		—
7	........1 ........1  ' ''	Mill	i iHiml :
-i -7 -1 n 1 10      10      10      10 10
Tp (PS)
Figure 5. Root mean perturbation quotients |ca at the Ca atoms of 8ALA and ALDI evaluated from simulation series #3 for the NHTs and BTs, respectively, for different coupling times Tp of the peptide thermostats.
Figure 4 compares the flip rates observed when using a classical Berendsen thermostat (open squares) with the data for the CHF thermostat (filled circles) and demonstrates that our expectation is actually met. Thus, if one wants to sample the equilibrium fluctuations of a peptide in solution by MD as rapidly as possible, or if one wants to gain access to the kinetics of nonequilibrium relaxation processes, the separate coupling of a classical BT to a small peptide seems counterproductive.
We interpret the above result by the following physical picture: A rapidly coupled BT likewise dampens fluctuations to higher and lower energies, thus leading to the correct average temperature. However, barrier crossings are enabled by rare accidental accumulations of a critical amount of energy in the respective collective coordinates. Particularly by dampening the higher energy fluctuations of the peptide, a classical BT makes such accumulations and, thus, barrier crossings less likely. Note that we have additionally checked the performance of a NHT in the same setting. We found no reduction of flip rates (data not shown) as could be expected for a thermostat maintaining the canonical energy fluctuations.
Local Perturbations of the Dynamics. The flips of backbone dihedral angles are collective movements and, therefore, are not directly related to the perturbation which a thermostat inflicts on the dynamics of individual atoms. To check the latter, we collected from simulation series #3 (cf. Table 3) all those forces acting on the Ca atoms of 8ALA which are required for the evaluation of the perturbation quotients (4). We carried out this data collection for BTs and NHTs with coupling times tp covering 4 orders of magnitude. In the case of the smaller ALDI model, we concentrated on the Berendsen approach.
Figure 5 shows the resulting perturbation ratios (4) evaluated using the approximate expression 6. As demonstrated by the squares marking the 8ALA results, the perturbations £ca are small for both thermostats and decrease over a wide range linearly with the inverse of tp. For the classical BT (tp = 0.1 ps) the fca are only about 0.5%. Furthermore, the smaller ALDI model exhibits slightly larger ?ca (open diamonds) than 8ALA (open squares). However, this size-induced difference is much smaller than that between the NHTs and BTs. At a given tp, Nose-Hoover coupling inflicts perturbations which are by 1 order of
The "Hot-Solvent/Cold-Solute" Problem Revisited
J. Chem. Theory Comput., Vol. 4, No. 8, 2008 1303
magnitude larger than in the Berendsen case (cf. Figure 5). For a Berendsen coupling of maximal strength (i> = 0.001 ps) the perturbation is comparable to that of a NHT with tp as large as 0.064 ps. Furthermore, for Nose-Hoover coupling tp cannot be chosen larger than about 0.256 ps where £ca is about 1% and, thus, not particularly small. In the given case of 8ALA, one otherwise observes long-lasting and artificial temperature oscillations, i.e. the so-called Toda daemon44 (data not shown).
One can compare the perturbations shown in Figure 5 to those which are inflicted by a CHF thermostat as employed in strategy P.3. In simulation G/A/2_P.3, the peptide 8ALA was kept at 300 K with a perturbation ratio of £ca ~ 10~4. As can be seen from Figure 5, this ratio corresponds to a Berendsen coupling time larger than 1 ps in the classical thermostat setup. However, a classical BT with 7o,p = Ts and tp > 1 ps cannot properly control the temperature because then tp is in the range of solvent—solute coupling time (tsp =1.6 ps), i.e. is too slow (cf. section 2). On the other hand, a more strongly coupled thermostat with tp = 0.1 ps does the job, but then the perturbation is more than ten times stronger than for a CHF thermostat.
The above analysis was based on data for perturbation ratios derived through the approximate expression 6 and, therefore, depends on the validity of this equation. The first assumption made in the derivation (cf. section 2) of eq 6 was that the atomic velocities f,(?) and the thermostat variable y(t) are uncorrected. We have checked this assumption for simulation series #3 by evaluating eq 5 with and without taking the correlation into account; the relative difference was less than 10~2 for both 8ALA and ALDI (data not shown).
The second assumption was that the individual atomic velocities f;(0 are drawn from an undisturbed Maxwell distribution and can be checked by comparing results of the exact expression 5 with results of the approximate expression 6. We evaluated these expressions for the trajectories of series #3 and determined the root-mean-square deviations. In the worst case of a BT at the maximum coupling strength (tp = 0.001 ps), we found root-mean-square deviations amounting to 8.3% of the mean thermostatic force for 8ALA and to 14% for ALDI. In view of the moderate statistics provided by the 250 ps simulations employed in series #3, the estimate 6 is fairly reliable. Thus, eq 6 is adequate if one wants to estimate thermostatic forces.
Temperature Fluctuations. In our suggestion of the minimally invasive CHF thermostat characterizing strategy P. 3 we were guided by the notion that a properly thermo-statted explicit solvent system is a canonical heat bath for an uncontrolled solute. To check this assumption, we compare in Figure 6 the canonical ^-distribution (eq 7) for the instantaneous peptide temperature Tp(t) with results from simulation G/A/2_P.3. For the 103 degrees of freedom of 8ALA, the ^-distribution (solid line) resembles a Gaussian (dashed line), which is expected for very large systems. Remarkably, the MD results (circles) closely reproduce the slight asymmetry of the ^-distribution. This agreement strongly indicates that the peptide has sampled the canonical ensemble in the simulation G/A/2_P.3. We have verified this
» 0,2 or
200
300
TP(K)
400
440
500
Figure 6. Distribution of the instantaneous temperature TP(t) of 8ALA (in SPC water at 300 K) during the 20 ns MD simulation G/A/2_P.3 (dots). The dashed line is a Gaussian fit to the data. The canonical distribution (eq 7) is drawn as a solid line.
1.0 0.8 0.6
b
S" 0.4 0.2 0.0
I I 11 Mill—I I llllll|—I I I IIIIII—TT
ERP
TS
A'
£,V""»"-'*—*
0/ ■ -■8ALA(H20) Ii ♦--♦ALDI (H20) ;l 8ALA (vac)
❖ ■••❖ALDI (vac)
........I_........i_........i_' '
uiL
10
10
10- 10"
Tp (PS)
10
Figure 7. Ratio aTJaTp of measured and canonical temperature fluctuations for various coupling times xP of a Berendsen solute thermostat. The model peptides are 8ALA (squares) and ALDI (diamonds). Simulations were performed in explicit water (H20, filled symbols) and vacuum (vac, empty symbols) for both peptides. Simulation parameters are given in Table 4.
result for a series of further CHF simulations. It did not change for larger solvent systems and was independent of the coupling time for the solvent thermostat provided that the solvent temperature remained well-tuned (data not shown).
To estimate how a classical BT separately coupled to a peptide (strategy P.l) affects its global statistical properties, we determined the temperature fluctuations of the peptides 8ALA and ALDI, respectively, as measured by the standard deviation djP in a fourth series of simulations (for details see Table 4). Figure 7 shows the ratio of djP and OjP, which is the value theoretically expected for a canonical ensemble and is given by eq 8. For peptides in explicit solvent the figure shows that OtpIotp is always smaller than one and approaches that limit for large tp. Thus, in the classical setting (tp = 10~2 ps) a BT strongly suppresses the canonical temperature fluctuations. These fluctuations successively become restored with increasing tp. The full range of canonical fluctuations is reached at coupling times tp > 10 ps, i.e. at values exceeding the solvent-peptide heat coupling time tsp by a factor of 10. As a result, the separate BT is effectively disconnected from the peptide, the solute—solvent heat exchange term j3sp dominates the heat balance eq 19, and strategy P.l reduces to the noninvasive strategy P.2.
1304   J. Chem. Theory Comput., Vol 4, No. 8, 2008
Lingenheil et al.
Figure 7 not only reveals the general suppression of temperature fluctuations within a peptide by a classical BT but also demonstrates through a comparison with vacuum simulation data that these fluctuations are caused (i) by a fast exchange of kinetic and potential energy within a peptide and (ii) by a slower energy exchange with the solvent.
In vacuum simulations, the exchange of kinetic and potential energy within the peptide is the only cause of temperature fluctuations. As shown by the data, a rapidly coupled Berendsen thermostat (i> < 0.1 ps) suppresses these microcanonical fluctuations in the same way as it suppresses the canonical temperature fluctuations of a solvated peptide. However, at slower coupling times t> the thermostat is seen to no longer affect the microcanonical fluctuations. The clear saturation of OtJotp at t> > 0.1 ps demonstrates that the microcanonical fluctuations occur on time scales below 0.1 ps. In contrast, additional fluctuations of a solvated peptide are still suppressed by the thermostat with even slower coupling. Thus, as claimed above, they occur on longer time scales.
In order to retain the correct statistics for the solute, it is important to choose the coupling time t> for the thermostat longer than the typical time scale of the canonical fluctuations, which, in our case, is in the range of 10 ps, as can be seen from Figure 7. However, this time may even be longer for more weakly coupling solvents or larger solutes.
5. Conclusions
Every thermostat changes the dynamics of the controlled system to a larger or lesser extent. Measured on a microscopic scale, these changes are by about 1 order of magnitude smaller for BTs than for NHTs (cf. the data on the perturbation quotients displayed in Figure 5). On the other hand, NHTs, in contrast to BTs, guarantee the canonical ensemble. For instance, as shown by the results on the temperature fluctuations (Figure 7), BTs suppress all those canonical energy fluctuations which are slower than the time scale x at which the BT is coupled to the system.
Whether such changes can modify the specific observables to be extracted from a simulation and to be compared with experimental data is a priori unclear in many cases. Even if one suspects that a given thermostat could possibly introduce an artifact into the computation of a certain observable, one may have to spend an enormous computational effort for a statistically clear proof. In fact, to prove a suspected dampening of peptide flip rates by a standard BT, we had to spend about 400 ns of simulation time on each of the data points to get the statistical certainty shown in Figure 4.
Especially if the popular strategy P.l is applied to a solute—solvent system, the specific drawbacks of the various thermostat algorithms may directly affect the properties of the solute. The P.l strategy with a BT is expected to cause artifacts of type a), i.e. artifacts resulting from an incorrect thermodynamical ensemble. In fact, as we have shown for a sample peptide, the dampening of the canonical energy fluctuations due to the BT can lead to reduced peptide flip rates. Furthermore, one expects that the combination of P. 1 with the NHT will render the solute vulnerable to artifacts of type b), i.e. lacking ergodicity. Using the P.l strategy with
other thermostats which suffer neither from type a) nor type b) drawbacks (e.g., the Nose-Hoover chain) still perturbs the dynamics much more strongly than necessary, i.e. such a strategy is prone to introduce artifacts of type c) (dynamics).
Given the need for some sort of temperature control in large scale MD simulations of complex systems, the optimal strategies to avoid artifacts of types a), b), and c) are P.2 or P.3, respectively. Here, the minimally invasive strategy P.3, which employs a constant heat flow to compensate the algorithmic heat production in the solute, has to be applied only if the noninvasive strategy P.2 turns out to be ineffective in a sufficiently extended test simulation. Strategy P.3 reduces the perturbation of the solute's dynamics to a minimum while keeping it nevertheless properly tempered. The precise protocol to set up a P.3 scheme is given in the Appendix.
The preservation of the canonical ensemble within the solute through strategies P.2 and P.3 (despite the use of a standard BT for the solvent which strongly perturbs the temperature fluctuations in this part of the system) is the most important result of this paper and proves our hypothesis that an explicitly simulated solvent of the correct temperature Ts represents the optimal thermostat for a solute. Admittedly, our quantitative analysis of the applicability of strategies P.2 and P.3 is restricted to relatively small peptides because an extended statistics is required for reliable results. Already for the small peptides with their short temperature autocorrelation times of 15 ps, it takes more than 10 ns to determine the average temperature with an accuracy of 1 K. For larger systems, the temperature autocorrelation times increase and so do the simulation times required for accurate temperature measurements. Too short simulations can easily lead to the false impression that the solute temperature sizably differs from the solvent temperature. To our experience, the noninvasive strategy P.2 can suffice for quite large solvent—solute systems. For instance, reinspecting a simulation8 of the C-terminal domain of the human prion protein (residues 125—228), which employed a global thermostat coupling (strategy G), we found that the protein temperature deviated by more than 10 K from that of the solvent. Subsequent simulations of a slightly larger fragment (residues 114—228), which employed strategy P.2 but otherwise the same simulation setup, showed no significant temperature difference. In the few cases in which one observes a seemingly intolerable temperature difference between solute and solvent, one can still use the solvent as the heat bath by applying the minimally invasive strategy P.3 to keep the solute well tempered.
It should be noted that our heat flow model and the associated setup protocol for the constant heat flow strategy P.3 are restricted to two subsystems with homogeneous local algorithmic heating rates. For simulations of more complex systems such as protein-DNA assemblies in solution, for which one expects more than two different heating rates, a constant heat flow strategy can be analogously designed. However, it will become increasingly difficult to determine the local heating rates of the various subsystems which have to be compensated.
The "Hot-Solvent/Cold-Solute" Problem Revisited
J. Chem. Theory Comput., Vol. 4, No. 8, 2008 1305
Acknowledgment. This work was supported by the Bavarian joint research project ForPrion (LMU02), by the VolkswagenStiftung (1/79 884), and by Deutsche Fors-chungsgemeinschaft (SFB533/C1, SFB749/C4).
Appendix: Setting up Strategies P.2 and P.3
Here, we give a detailed description of the steps needed in order to set up a simulation system containing a macromol-ecule P in thermal equilibrium with an explicit solvent enviroment S according to the strategies P.2 and P.3, respectively, using the standard Berendsen algorithm. After preparation (e.g., removal of close solvent—solute contacts by energy minimization), the following steps are necessary:
a) Heating phase: The subsystems are heated using two separate classical BTs (e.g., rs = tp = 0.1 ps) to the temperature rs;m desired in the production simulation. Depending on the initial deviations of the solute temperature Tp and solvent temperature 7s, it may take a simulation time of up to 30ts/p for the respective subsystems to safely attune
to rsim.
b) Relaxation phase I: The solute is decoupled from its thermostat (tp = °°) and relaxes to its new steady state temperature TPy l. The time constant for the relaxation to the steady state is the solvent—solute coupling time tsp. Since tsp is still unknown, an upper limit estimate (e.g., rSp ~ 20 ps) should be used to determine the relaxation time freiax ~ lOtsp.
c) Test simulation I: Here, the solute remains decoupled from its thermostat and the simulation serves to determine its average temperature 7>, i. If the deviation from equilibrium measured by 17V ; — rs;ml is less than an acceptable tolerance ATp, then the noninvasive strategy P.2 is applicable, and one may directly continue the simulation for data production f).
The necessary simulation time tl for the test depends on the tolerable uncertainty ofp, of the measured solute temperature Tpt i, which forms an upper bound for the uncertainty 6jr in the prediction of the production run temperature Tp. If ATp is the accuracy required for the prediction, we should make sure that ófp , < ATp. By eq 9 the simulation time then is t\ = 2xcOjJATp, where rc is the temperature autocorrelation time of the solute, and OjP is the standard deviation of its temperature fluctuations, which were observed during the test run. One typically obtains simulation times of several nanoseconds.
d) Relaxation phase II: The solute is coupled to a separate thermostat with a coupling time tp > 500 ps intended for the P.3 production run. Using an estimate for tsp (e.g., 1 ps), a reasonable choice for the target temperature is given by To, p, 2 — — tpltsp' ITp, i — rsiml (leading to 2-fold overcompensation if tsp was exact). The duration of this relaxation phase is the same as in step b).
e) Test simulation II: The average temperature 7>2 is determined. The simulation time h should be equal to t\ in step c).
f) Production simulation: If strategy P.2 turned out to be applicable in step c), the settings in this simulation are chosen identically (in fact, one may regard the test run as the initial part of the production simulation). Otherwise, the target
temperature Tq,p for a P.3 simulation is determined from the two test simulations by
T   — T
1 0,p     1 sim
Tq,p,2 Tp2
(1
(2;
(3
(4;
(5
(6; a
(8; (9;
(io; (li
(12;
(13 (14
(15
(i6;
in; (is;
(i9;
(2o;
(21
1 p,2
(^sim ^P,l)
(22)
References
Karplus, M.; McCammon, J. A. Nat. Struct. Biol. 2002, 9, 646-652.
Gnanakaran, S.; Nymeyer, H.; Portman, J.; Sanbonmatsu, K. Y.; Garca, A. E. Curr. Opin. Struct. Biol. 2003,13, 168-174.
Grubmüller, H.; Heymann, B.; Tavan, P. Science 1996, 271, 997-999.
Rief, M.; Oesterhelt, F.; Heymann, B.; Gaub, H. E. Science 1997, 275, 1295-1297.
Brockwell, D. J.; Pad, E.; Zinober, R. C.; Beddard, G. S.; Olmsted, P. D.; Smith, D. A.; Perham, R. N.; Radford, S. E. Nat. Struct. Biol. 2003, 10, 731-737.
Kucera, K.; Lambry, J. C.; Martin, J. L.; Karplus, M. Proc. Natl. Acad. Sei. U.S.A. 1993, 90, 5805-5807.
Spörlein, S.; Carstens, FL; Satzger, FL; Renner, C.; Behrendt, R.; Moroder, L.; Tavan, P.; Zinth, W.; Wachtveitl, J. Proc. Natl. Acad. Sei. U.S.A. 2002, 99, 7998-8002.
Hirschberger, T.; Stork, M.; Schropp, B.; Winklhofer, K. F.; Tatzelt, J.; Tavan, P. Biophys. J. 2006, 90, 3908-3918.
Mayor, U.; Guydosh, N. R.; Johnson, C. M.; Grossmann, J. G.; Sato, S.; Jas, G. S.; Freund, S. M. V.; Alonso, D. O. V.; Daggett, V.; Fersht, A. R. Nature 2003, 421, 863-867.
Shea, J.-E.; Brooks, C. L. I. Annu. Rev. Phys. Chem. 2001, 52, 499-535.
Privalov, P. L. Physical basis of the stability of the folded conformations of Proteins. In Protein Folding; Creighton, T. E., Ed.; W. H. Freeman: San Francisco, 1992.
Munishkina, L. A.; Phelan, C.; Uversky, V. N.; Fink, A. L. Biochemistry 2003, 42, 2720-2730.
Vitkup, D.; Ringe, D.; Petsko, G. A.; Karplus, M. Nat. Struct. Biol. 2000, 7, 34-38.
Tavan, P.; Carstens, FL; Mathias, G. Molecular Dynamics Simulations of Proteins and Peptides: Problems, Achievements, and Perspectives. In Protein Folding Handbook. Part 1; Büchner, J., Kiefhaber, T., Eds.; Wiley-VCH: 2005.
Frauenfelder, FL; Petsko, G. A.; Tsernoglou, D. Nature 1979, 280, 558-563.
Ishima, R.; Torchia, D. A. Nat. Struct. Biol. 2000, 7, 740-743.
Ichiye, T.; Karplus, M. Biochemistry 1983, 22, 2884-2893.
Lazaridis, T.; Karplus, M. Biophys. Chem. 2003, 100, 367-395.
Sugita, Y.; Okamoto, Y. Chem. Phys. Lett. 1999, 314, 141-151.
Hansmann, U. H. E. Chem. Phys. Lett. 1997, 281, 140-150.
Liu, P.; Kim, B.; Friesner, R. A.; Berne, B. J. Proc. Natl. Acad. Sei. U.S.A. 2005, 102, 13749-13754.
(22) Carstens, H. Konformationsdynamik lichtschaltbarer Peptide: Molekulardynamiksimulationen und datengetriebene Modell-
1306   J. Chem. Theory Comput., Vol. 4, No. 8, 2008
Lingenheil et al.
(23 (24 (25
(26
(27
(28
(29;
(30;
(31
(32
(33
(34 (35 (36 (37
(38
(39;
(40 (41 (42 (43
(44
(45
(46
(47
bildung, Dissertation, Fakultät für Physik, Ludwig-Maximil-lians-Universität München, 2004.
Köper, G. J. M.; Reiss, H. J. Phys. Chem. 1996, 100, 422-432.
Swope, W. C; Pitera, J. W.; Suits, F. J. Phys. Chem. B 2004, 108, 6571-6581.
Swope, W. C; Pitera, J. W.; Suits, F.; Pitman, M.; Eleftheriou, M.; Fitch, B. G.; Germain, R. S.; Rayshubski, A.; Ward, T. J. C.; Zhestkov, Y.; Zhou, R. J. Phys. Chem. B 2004, 108, 6582-6594.
Nutt, D. R.; Smith, J. C. J. Chem. Theory Comput. 2007, 3, 1550-1560.
Mathias, G.; Egwolf, B.; Nonella, M.; Tavan, P. J. Chem. Phys. 2003, 118, 10847-10860.
Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 1993, 98, 10089-10092.
Grubmüller, H.; Tavan, P. J. Comput. Chem. 1998, 19, 1534-1552.
Kraeutler, V.; van Gunsteren, W. F.; Hünenberger, P. H. J. Comput. Chem. 2001, 22, 501-508.
Sagui, C.; Darden, T. A. Annu. Rev. Biophys. Biomol. Struct. 1999, 28, 155-179.
Oda, K.; Miyagawa, H.; Kitamura, K. Mol. Simul. 1996, 16, 167-177.
Cheng, A.; Merz, K. M. J. Phys. Chem. 1996, 100, 1927-1937.
Guenot, J.; Kollman, P. A. Protein Sei. 1992, 1, 1185-1205.
Arnold, G. E.; Ornstein, R. L. Proteins 1994, 18, 19-33.
Hünenberger, P. H. Adv. Polym. Sei. 2005, 173, 105-149.
Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Di Nola, A.; Haak, J. R. J. Chem. Phys. 1984, 81, 3684-3690.
Harvey, S. C.; Tan, R. K.-Z.; Cheatham, T. E. J. Comput. Chem. 1998, 19, 726-740.
Chui, S.-W.; Clark, M.; Subramaniam, S.; Jakobsson, E. J. Comput. Chem. 2000, 21, 121-131.
Morishita, T. J. Chem. Phys. 2000, 113, 2976-2982.
Nose, S. J. Chem. Phys. 1984, 81, 511-519.
Hoover, W. G. Phys. Rev. A 1985, 31, 1695-1697.
Tuckerman, M. E.; Liu, Y.; Ciccotti, G.; Martyna, G. J. J. Chem. Phys. 2001, 115, 1678-1702.
Holian, B. L.; Voter, A. F.; Ravelo, R. Phys. Rev. E 1995, 52, 2338-2347.
Posch, H. A.; Hoover, W. G.; Vesely, F. J. Phys. Rev. A 1986, 33, 4253^1265.
D'Alessandro, M.; Tenenbaum, A.; Amadei, A. J. Phys. Chem. B 2002, 106, 5050-5057.
Tobias, D. J.; Martyna, G. J.; Klein, M. L. J. Phys. Chem. 1993, 97, 12959-12966.
(48) Martyna, G. J.; Klein, M. L.; Tuckerman, M. J. Chem. Phys. 1992, 97, 2635-2643.
(49) Bussi, G.; Donadio, D.; Parrinello, M. J. Chem. Phys. 2007, 126, 014101(1-7)
(50) Allen, M. P.; Tildesley, D. J. Computer Simulations of Liquids; Oxford University Press: Oxford, 1987.
(51) Stange, K. Angewandte Statistik; Springer: Heidelberg, 1970.
(52) Weber, W.; Hünenberger, P. H.; McCammon, J. A. J. Phys. Chem. B 2000, 104, 3668-3675.
(53) Daura, X.; Gademann, K.; Schäfer, H.; Jaun, B.; Seebach, D.; van Gunsteren, W. F. J. Am. Chem. Soc. 2001, 123, 2393-2404.
(54) Villareal, M. A.; Montich, G. G. J. Biomol. Struct. Dyn. 2005, 23, 135-142.
(55) van den Bosch, M.; Swart, M.; Snijders, J. G.; Berendsen, H. J.; Mark, A. E.; Oostenbrink, C; van Gunsteren, W. F.; Canters, G. W. ChemBioChem 2005, 6, 738-746.
(56) Monticelli, L.; Simoes, C; Belvisi, L.; Colombo, G. J. Phys.: Condens. Matter 2006, 18, S329-S345.
(57) Fox, T.; Kollman, P. A. Proteins 1996, 25, 315-334.
(58) Kong, Y.; Ma, J.; Karplus, M.; Lipscomb, W. N. J. Mol. Biol. 2006, 356, 237-247.
(59) Lindahl, E.; Hess, B.; van der Spoel, D. J. Mol. Model. 2001, 7, 306-317.
(60) Niedermeier, C; Tavan, P. J. Chem. Phys. 1994, 101, 734-748.
(61) Niedermeier, C; Tavan, P. Mol. Simul. 1996, 17, 57-66.
(62) Mathias, G.; Tavan, P. J. Chem. Phys. 2004, 120, 4393-4403.
(63) Eichinger, M.; Grubmüller, H.; Heller, H.; Tavan, P. J. Comput. Chem. 1997, 18, 1729-1749.
(64) Scott, W. R. P.; Hünenberger, P. H.; Tironi, I. G.; Mark, A. E.; Billeter, S. R.; Fennen, J.; Torda, A. E.; Huber, T.; Krüger, P.; van Gunsteren, W. F. J. Phys. Chem. A 1999,103, 3596-3607.
(65) Berendsen, H. J. C; Postma, J. P. M.; van Gunsteren, W. F.; Hermans, J. Interaction Models for Water in Relation to Protein Hydration. In Intermolecular Forces; Pullman, B., Ed.; D. Reidel Publishing Company: Dordrecht, 1981.
(66) MacKerell, A. D.; et al. J. Phys. Chem. B 1998, 102, 3586-3616.
(67) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J. Chem. Phys. 1983, 79, 926-935.
(68) Guenot, J.; Kollman, P. A. J. Comput. Chem. 1993,14, 295-311.
(69) Gardiner, C. W. Handbook of Stochastic Methods, 2nd ed.; Springer: Berlin, 1985.
(70) Zuckerman, D. M.; Lyman, E. J. Chem. Theory Comput. 2006, 2, 1200-1202.
CT8000365