R E S E A R C H A R T I C L E
What makes it difficult to refine protein models further via
molecular dynamics simulations?
Lim Heo | Michael Feig
Department of Biochemistry and Molecular
Biology, Michigan State University, East
Lansing, Michigan
Correspondence
Michael Feig, 603 Wilson Road, Room 218
BCH, East Lansing, MI 48824, USA.
Email: feig@msu.edu
Abstract
Protein structure refinement remains a challenging yet important problem as it has the potential to
bring already accurate template-based models to near-native resolution. Refinement based on
molecular dynamics simulations has been a highly promising approach and the performance of
MD-based refinement in the Feig group during CASP12 is described here. During CASP12, sampling
was extended well into the microsecond scale, an improved force field was applied, and new
protocol variations were tested. Progress over previous rounds of CASP was found to be limited
which is analyzed in terms of the quality of the initial models and dependency on the amount of
sampling and refinement protocol variations. As current MD-based refinement protocols appear to
be reaching a plateau, detailed analysis is presented to provide new insight into the major challenges
towards more extensive structure refinement, focusing in particular on sampling with and
without restraints.
K E Y W O R D S
protein structure prediction, structure refinement, molecular dynamics simulation, CASP, Markov
state models, homology models
1 | INTRODUCTION
Computational protein structure prediction has become a valuable tool
in structural biology.1
Template-based modeling is now applicable to
ever more folds as the number of structures in the Protein Data Bank
(PDB) continues to grow rapidly, especially because of advances with
cryo-electron microscopy (Cryo-EM)2
and structure determination via
X-ray free electron lasers (XFEL).3
As a result, the chance of finding the
structure of a close homolog for a given sequence is increasing so that
accurate protein structure prediction is routinely possible.4
Moreover,
modeling based on co-evolutionary information allows high-resolution
modeling even in the absence of a homologous structure.5,6
This has
been enabled by next-generation genome sequencing (NGS), which
rapidly increased the number of known sequences for homologous proteins
in different organisms as the basis for predicting residue-residue
contacts with high accuracy.7
Complementary to the advances outlined above, protein structure
refinement methods have been increasingly successful in further
improving initial knowledge-based predictions via ab initio techniques.8
Evidence of significant protein structure refinement has first emerged
around 2010, at CASP9.9
In CASP10, refinement based on molecular
dynamics (MD) led to a significant step forward after introducing the
idea of ensemble averaging rather than attempting to select a single
snapshot from a generated structural ensemble.10–12
The progress with
MD-based methods has also been catalyzed by continuous improvements
in force fields and ability to sample extensively.11
MD-based
refinement so far is able to provide moderate and consistent structure
improvements,13
and recent protocols have been optimized to
limit the computational costs by taking advantage of GPU comput-
ing.14
MD-based refinement has also contributed to significant
improvements in local structure quality such as correct stereochemistry
and avoidance of clashes, which is almost always possible even
if the global structure cannot be improved.15
A number of alternate
refinement protocols have also been proposed: In CASP11, Della
Corte et al. devised a method where homologous structure information
was utilized to smooth the potential energy landscape and
lower the energy barriers thereby enhancing sampling during MD
simulations.16
Lee et al. and Park et al. proposed methods that
reconstruct unreliable regions such as loops and termini, followed by
global relaxation.17,18
However, all of the refinement methods available
to date, are still limited to relatively localized structural changes
and perform best when the initial model is already relatively close to
Proteins. 2018;86:177–188. wileyonlinelibrary.com/journal/prot VC 2017 Wiley Periodicals, Inc. | 177
Received: 1 August 2017 | Revised: 11 September 2017 | Accepted: 29 September 2017
DOI: 10.1002/prot.25393
the native state. It remains very challenging to consistently improve
models with significant errors.19
In this article, further tests of MD-based protein structure refinement
methods during CASP12 are described. Commonly, weak
restraints have to be applied for MD-based refinement to succeed,12,20
which limits the extent of possible refinement. Since the protein force
field has been improved further21
and GPGPU computing has become
more widely available, MD simulations were carried out over longer
time scales with weaker restraints during CASP12. The hope was to be
able to sample more broadly, and, ultimately, refine structures more
extensively. Other ideas that were explored during CASP12 involved
iterative refinement along the initial direction of refinement. However,
while consistent moderate refinement was again possible, we
could not significantly expand the extent of refinement indicating
that MD-based refinement methods using current protocols may
have reached a plateau where simply running longer simulations is
not offering further advantages. So, a main focus of this article is to
look back and summarize the lessons we have learned until now and
discuss where the key barriers are that need to be overcome to
move forward.
2 | METHODS
The general approach to protein structure refinement during CASP12
followed our previous protocol that involved long, weakly restrained
MD simulations to generate structural ensembles. From the ensembles,
a subset of structures was subsequently selected and averaged before
finishing with detailed refinement of the local stereochemistry using
locPREFMD.15
Variants of our previous protocol were tested during
CASP12. In the CASP12 protocol, shown as a flow chart in Figure 1,
initial models were first subjected to locPREFMD15
to improve the
stereochemical quality and obtain better starting structures for the MD
simulations. The resulting initial models were submitted as “Model 5”
during CASP12 as the most conservative attempt at structure refinement.
Afterwards, up to two rounds of MD-based refinement were
carried out. Generally, the same scoring and filtering scheme followed
by structure averaging was used as described previously and
applied in CASP1113
on different ensembles of structures generated
from MD trajectories under different conditions (see below). Before
submitting final models, locPREFMD was applied twice after ensemble
averaging and. depending on which MD ensemble was used, the
resulting structures were submitted as Models “1”, “2”, “3”, or “4”,
respectively.
The primary set of MD simulations in the first round consisted of
four runs at 500 ns, two runs at 400 ns, and 14 runs at 200 ns using
weak harmonic restraints applied to Ca atoms with respect to the initial
structure (force constant: 0.025 kcal/mol/Å2
) for a total of 5.6 ls of
sampling. This was about five times the amount of sampling applied
during CASP11.13
The combination of runs at different lengths was
predicated by the availability of GPU resources and the time limit for
submitting predictions during CASP. While this ambitious amount of
sampling could be reached for many targets within the CASP time
frame of a few weeks, some targets were too large and simulation
lengths had to be reduced in those cases. Each simulation was carried
out in explicit solvent starting with randomly assigned initial momenta.
The latest CHARMM force field, CHARMM36m,21
was used to take
advantage of further improvements in sampling backbone torsions,
especially in less-structure segments such as loops. Each protein was
solvated in a periodic cubic box with at least a 9 Å solvent buffer to
the box edge. The TIP3 water model22
as implemented in CHARMM
was used and all systems were neutralized by adding either sodium
(Na1
) or chloride (Cl–
) counter ions, as appropriate. A 10 Å cutoff with
switching between 8 and 10 Å was applied to the Lennard-Jones
potential and the direct part of the electrostatic potential. ParticleFIGURE
1 Refinement protocol applied by the FEIG group in CASP12. Each colored box with a dashed line depicts a structural ensemble
for which a common protocol of scoring, filtering, averaging and final local refinement was applied as depicted at the bottom
178 | HEO AND FEIG
mesh Ewald summation was used to evaluate the full electrostatic
potential.23
Bonds involving hydrogen atoms were kept rigid using
holonomic constraints. Each system was prepared by minimization and
subsequent heating to 298 K. Langevin dynamics was performed under
NVT condition at 298 K with a 2 fs time step and a friction coefficient
of 0.01/ps. The ensemble resulting from this set of simulations was filtered
and averaged to result in the structure submitted as “Model 3.”
To further enhance the initial conformational ensembles,
200,000 conformations were extracted from the first four 500 ns
MD simulations and subsequently clustered into five clusters using
the k-center algorithm implemented in MSMBuilder24
using the differences
in the pair-wise Ca distance matrix as the distance measure.
For each cluster, an averaged structure was generated from
which an additional 100 ns of MD simulations were performed in
the same manner as explained above except that restraints were
applied with respect to the cluster-averaged structure using a
reduced force constant of 0.005 kcal/mol/Å2
. The ensemble from
the five cluster-initiated simulations was filtered and averaged to
result in the structure submitted as “Model 4. The combined set of
structures resulting from the initial simulations plus the additional
simulations after clustering was used to generate the structure submitted
as “Model 1” In some cases, there was not enough time to
run the additional set of simulations after clustering. In this case,
“Model 3” was submitted also as “Model 1.”
A second round of refinement began with “Model 1” and involved
an additional MD simulation >200 ns without restraints. From the
resulting ensemble, structures were selected that advanced further
along the direction of the initial refinement with the idea that those
changes would bring the structure closer to the native state. The direction
of the initial refinement was determined by comparing “Model 1”
with the initial model and the conformations extracted from the second
round of MD were accordingly scored using both RWplus25
and the
vectorial direction similarity measure Sk calculated for a structure k
based on Equations 1 and 2.
Sk5
1
NCa
XNCa
i
vrefined
i Á vk
i (1)
vk
i 5
1
3
X
j2f21;0;1g
rk
i1j2rinitial
i1j
 
(2)
where the dot products of unit vectors of Ca deviation directions
between the refined model and a given structure k were averaged over
all NCa atoms. The Ca deviation direction was defined as the averaged
vector between a given Ca and its two adjacent Ca atoms in a structure
k from the initial model after overall least-squares structural superposition
according to Equation 2. vrefined
i was calculated in the same
manner.
Only structures generated during round two with Sk>0:4 and
r2r
rr
<22, where r denotes the RWplus score, were selected for structure
averaging, to be submitted as “Model 2.” If no such structures
were found, that is, the additional second round sampling did not
advance further along the initial direction of refinement, “Model 1” was
submitted also as “Model 2.”
The MD production runs were conducted on GPUs using
CHARMM26
with OpenMM27
via API integration, except for the
second-round single 200 ns simulation that was run using NAMD.28
Other MD simulations for equilibration, structure averaging, and locPREFMD
were carried out using CHARMM. Structure preparation,
selection, and analysis tasks relied on the MMTSB Tool Set.29
3 | RESULTS AND DISCUSSION
3.1 | Overall performance in CASP12
The overall results during CASP12 with the refinement protocol shown
in Figure 1 are given in Table 1. For one target (TR879), the refinement
protocol was applied incorrectly and the initial structure disintegrated
as a result. Because the results for this target do not reflect the
described protocol, the results were excluded from further analysis.
Based on all other targets, GDT-HA scores were improved by 1.6 units,
and four out of 36 targets could be refined by >5 GDT-HA units.
Refinement was highly consistent as structures were refined for 30 out
of 36 targets (83%). The performance was overall similar to previous
years, but the extent of refinement was actually a bit less than in
CASP1113
and CASP1011
despite the additional sampling and force
field improvements.
The new protocols tested in CASP12 did not provide significant
advantages. Although “Model 1” submissions (that included the snapshots
from the enhanced sampling following clustering, see above)
seemed to be slightly better than just following the CASP11 protocol,
statistical analysis indicates that none of the MD-refinement methods
were significantly different from each other (based on P values > 0.05
from the paired t test analysis on the common targets). An interesting
observation is that the refined models using just the snapshots from
the enhanced sampling following clustering (“Model 4”) showed a
greater variation with some models much better and others much
worse than the “Model 1” structures. Pearson’s correlation coefficients
for GDT-HA improvements between “Model 1” and “Model 3” and
“Model 4” are 0.96 and 0.68, respectively. This reflects broader sampling
as a result of the weaker restraints and the application of
restraints with respect to structures generated in the first round of
sampling. However, overall, the extent of refinement did not improve
as structures moved away from the native as much as they came
closer.
The second round of the refinement protocol produced models
only for about half of the targets, either because there was not enough
time to run additional sampling after the first round was complete or
because no structures were generated along the direction of the initial
refinement (see Methods). In general, where available, models after the
second round were refined to a similar extent as after the first round.
Although restraints were not used in the second round sampling, the
resulting models did not drift away from the native structure except for
two targets (TR891 and TR922-D1, where >2 GDT-HA units were lost
during the second round). This confirms an earlier conclusion11
that
after initial refinement, models are in a deep energy minimum from
which escape and additional refinement are very challenging.
HEO AND FEIG | 179
TABLE 1 Overall results for CASP12 refinement targets
Target Improvements for model 1 DGDT-HA for each modela
DGDT-HA –DRMSD (Å) DGDC-SC MolProbityb
1 2 3 4 5
TR862 2.96 0.06 20.85 0.51 2.96 3.50 2.96 1.61 20.81
TR866-D1 0.72 0.18 21.06 6.00 0.72 2.64 1.68 2.89 21.69
TR868-D1 21.91 20.02 22.41 0.83 –1.91 22.14 22.14 24.28 20.24
TR869 0.00 20.12 20.10 0.80 0.00 20.49 0.00 21.92 20.24
TR870-D1 0.69 20.10 22.20 0.97 0.69 0.00 1.38 0.91 0.46
TR872 21.41 20.04 20.07 0.57 –1.41 20.56 21.13 3.42 0.57
TR874 1.35 20.06 0.33 0.50 1.35 1.35 1.57 3.60 0.22
TR875 1.72 20.02 1.66 0.60 1.72 1.08 2.37 2.37 20.21
TR876-D1 0.74 0.04 20.29 0.55 0.74 0.74 0.49 0.98 0.00
TR877 2.11 0.02 21.63 0.53 2.11 0.17 1.94 20.35 0.35
TR879 218.97 20.19 216.36 2.21 –18.97 218.86 218.18 236.36 21.93
TR881 2.35 0.01 5.80 0.85 2.35 2.72 2.72 0.74 0.50
TR882 3.80 0.09 20.52 0.60 3.80 4.12 3.80 5.39 0.64
TR884 5.63 20.05 0.98 0.50 5.63 5.28 5.98 4.93 20.70
TR885-D1 3.13 0.05 3.03 0.50 3.13 1.69 3.13 1.69 0.97
TR887-D9 5.43 0.12 4.88 0.50 5.43 4.19 4.50 3.26 0.47
TR890 20.54 20.12 21.25 0.95 –0.54 - 0.26 24.13 20.40
TR891 0.22 0.08 21.06 0.72 0.22 22.68 0.00 1.34 0.89
TR893 0.00 20.03 211.62 1.42 0.00 - 21.92 21.77 0.00
TR894 1.85 0.03 6.16 0.50 1.85 - 1.39 8.80 1.39
TR594 1.13 20.05 1.01 0.50 1.13 - 0.57 2.81 20.56
TR694 1.24 20.05 1.78 0.85 - - 1.24 - 20.38
TR895 3.96 0.01 7.14 0.69 3.96 - 5.42 4.17 0.42
TR896 21.45 20.06 0.35 0.90 –1.45 23.19 21.45 22.03 0.29
TR898 2.37 20.05 0.25 0.50 2.37 2.13 0.48 0.95 20.23
TR901 21.01 20.04 0.78 0.65 –1.01 - 20.56 21.34 21.12
TR905 0.30 0.02 22.05 1.68 0.30 - 20.52 21.76 20.83
TR909 1.65 0.00 1.14 1.47 1.65 - 1.87 22.18 20.15
TR910 1.87 0.02 21.74 0.69 - - 1.87 - 20.37
TR912 1.21 0.07 1.39 0.85 1.21 - 2.53 5.43 0.84
TR913 2.29 0.01 1.56 1.11 2.29 - 2.51 211.32 0.37
TR917 5.43 0.08 5.88 0.50 5.43 - 5.37 29.84 20.06
TR920 0.79 0.04 1.21 0.50 - - 0.79 - 20.35
TR520 1.40 0.05 21.37 0.98 - - 1.40 - 20.15
TR921 3.98 20.04 1.12 0.98 3.98 4.53 3.44 5.79 0.54
TR922-D1 3.23 0.00 20.61 0.71 3.23 1.21 2.83 3.23 1.61
TR928 22.79 20.15 0.45 1.56 - - –2.79 - 21.32
TR942 2.72 20.09 3.11 0.66 - - 2.72 - 0.33
(Continues)
180 | HEO AND FEIG
Finally, models submitted as “Model 5” for which only the local
stereochemistry was refined with locPREFMD did not change much in
terms of GDT-HA and RMSD scores. However, although locPREFMD
does not target global structure refinement, there were still small
improvements in GDT-HA scores suggesting that just the improvement
of local structure may also slightly improve the global structure.
3.2 | Sampling vs refinement
Under the assumption that the force field is good enough to distinguish
the native state as the global free energy minimum, sampling is the limiting
factor for being able to traverse the energy landscape until the
native state is found. While it remains unclear how much sampling is
exactly needed to reach the native state from a nearby initial model on
a given landscape, the general expectation is that more sampling (more
trajectories as well as longer trajectories) would improve the chances
for refinement. In CASP12, we extended the simulation time to >5.6
ls per target. At the same time, the restraint force constant was
reduced by half and we expected to sample more diverse structures on
non-canonical regions (that is, loops) by introducing the CHARMM36m
force field.
As already mentioned above, the more extended sampling did not
result in better refinement compared to previous rounds of CASP, but
we performed post-analysis based on simulation subsets to determine
how the simulation length affected the results just within CASP12. Figure
2 illustrates how GDT-HA improvements depend on the amount of
sampling. This analysis is based on averages over 24 targets for which
we have generated at least 20 trajectories over 200 ns. Generally,
increased sampling resulted in higher GDT scores, consistent with
expectations, and using the full 4 ls resulted in the maximum improvements.
However, the GDT-HA scores for the top 5% percentile of the
distribution was almost converged after 1–2 ls of sampling, and additional
sampling did not make much of a difference.
The next question we investigated was to what degree RMSD and
GDT-HA metrics as well as the RWplus score improved with increasing
simulation time. As shown in Figure 3, average RMSD values did not
change much but the distribution of GDT-HA scores shifted to larger
values and RWplus scores moved to lower values up until 50–100 ns,
after which there was little change. The significant shift in the RWplus
score toward lower energies implies that the interaction within proteins
and internal packing were getting better as the simulations progressed.
Therefore, it appears that extending simulations beyond 200 ns would
not provide much benefits, at least in the presence of positional
restraints.
The optimal balance between the number and length of simulation
with the same amount of sampling is another issue. Many short simulations
may explore different regions of conformational space based on
different initial momenta and are better suited to typical parallel computing
resources than a few long simulations. However, it is more difficult
to cross significant kinetic barriers with short simulations so that
an optimal balance may be a compromise of a moderate number of
moderate-length simulations. Figure 2 confirms this as 5 3 200 ns (1
ls in total) or 10 3 200 ns (2 ls in total) provide slightly better results
than 20 3 50 ns or 20 3 100 ns. As there is little improvement in
GDT-HA and RWplus scores during 100–200 ns (see Figure 3), the
optimal sampling strategy seems to be to run as many as possible 200
ns simulations within the CASP12 refinement protocol.
3.3 | Scoring, filtering, and structure averaging
The general approach toward scoring, filtering, and structure averaging
was established first during CASP10. In CASP11, DFIRE30
was replaced
with RWplus25
and the filtering criteria were slightly adjusted.13
In
CASP12 the same protocol was applied as in CASP11. Briefly, the filtering
step involves the selection of a subset of structures that have
both low RWplus scores and are closer to the initial model. In this section,
we reassessed whether this scoring, filtering, and averaging protocol
is still optimal for the CASP12 test set. We compared refined
models with and without filtering and extracted models at the center
of a given ensemble based on the lowest Ca RMSD sum to all other
structures instead of averaging. The resulting GDT-HA improvements
are summarized in Figure 4. It can be seen, that averaging still provides
a significant benefit, either with or without filtering, but, surprisingly,
the filtering step did not offer a clear advantage anymore. With
TABLE 1 (Continued)
Target Improvements for model 1 DGDT-HA for each modela
DGDT-HA –DRMSD (Å) DGDC-SC MolProbityb
1 2 3 4 5
TR944 0.79 0.01 0.90 0.85 - - 0.79 - 0.19
TR945 0.53 20.02 0.05 0.94 - - 0.53 - 20.27
TR947 2.29 0.03 0.67 0.78 - - 2.29 - 0.44
TR948 5.38 20.07 0.94 0.71 5.38 - 5.21 0.51 2.52
Average (full)c
1.78 0.00 0.38 0.93 1.78 1.31 1.73 1.73 0.17
Average (all)d
1.61 0.00 0.58 0.91
a
Models submitted as “Model 1” are shown in bold characters.
b
MolProbity scores for refined models.
c
DGDT-HA averages for 20 targets shown in bold for which the full protocol was applied.
d
Refinement protocol was not applied correctly for this target and it was excluded from further analysis. If a method was not applied to a target, the
corresponding value is shown as a hyphen (-).
HEO AND FEIG | 181
averaging, the results were similar whether the ensemble was filtered
or not. Moreover, when averaging was not applied, the use of the filter
actually resulted in worse performance. This suggests that it may be
necessary to reassess when and how to filter MD-generated ensembles
as sampling has increased, force fields have improved, and the generation
of the initial models may have changed (see below) from when we
first optimized our refinement protocol.
3.4 | Refinement as a function of the initial model
Our finding that refinement was less successful in CASP12 than in previous
rounds of CASP despite increased sampling prompted us to
examine again13
how the degree of refinement depended on the initial
model quality. As shown in Figure 5, we found little correlation of
GDT-HA improvements with initial GDT-HA scores, although the highest
average improvement was seen for models with initial GDT-HA
scores above 70. Improvements in Ca RMSD were more strongly
dependent on the initial Ca RMSD. There was a clear trend of structures
with lower RMSD values, below 4 Å in Ca RMSD, being
improved in terms of RMSD, while structures with larger initial RMSD
values became worse in terms of RMSD after refinement. Finally, MolProbity
scores were consistently reduced to less than 1.0 in most cases
except when the initial MolProbity score was worse than 3.0, although
even in those cases, the MolProbity scores were reduced significantly
(to about 1.5). This analysis suggests that the reduced amount of
refinement seen in CASP12 vs previous rounds of CASP was probably
not because initial models had higher or lower GDT or RMSD values.
The next question we posed was whether the origin of the initial
model played a role. In CASP12, the initial models came from various
protein structure prediction servers, but models built by the Lee and
Baker groups each contributed 12 initial structures for the 42 refinement
targets. Table 2 summarizes the refinement performance as a
function of where the initial models came from. Most notably, models
originating from the LEE server could only be improved by modest
FIGURE 2 Average improvements in GDT-HA score as a function of simulation time. Improvements in GDT-HA for the best (A) and top
5% percentile (B) of the sampled structure are shown as heat maps. Blue colors indicate the largest improvements in GDT-HA while red colors
indicate the least improvement. Contours indicate the same total simulation times of 1 and 2 ls. Values in bold are given for: 5 3 200
ns, 10 3 100 ns, 20 3 50 ns (on the 1 ls line), 10 3 200 ns, 20 3 100 ns (on the 2 ls line), and 20 3 100 ns (upper right corner)
FIGURE 3 Z-score distributions for Ca RMSD (A), GDT-HA scores (B), and RWplus scores (C) averaged over 24 targets using 20 trajectories
over 200 ns as a function of simulation time (red: 0–10 ns; orange: 10–20 ns; green: 20–50 ns; cyan: 50–100 ns; blue: 100–200 ns).
Arrows indicate the direction of change toward longer simulation times
182 | HEO AND FEIG
amounts in terms of GDT-HA while more substantial GDT-HA
improvements were possible for most other methods. Furthermore, the
accuracy of sidechains, measured with the GDC-SC score, actually
deteriorated for the LEE models while other models were again
improved to different degrees.
We performed more detailed analysis of the change in per-residue
Ca RMSD from the native as well as the per-residue Ca RMSD with
respect to the initial model. As shown in Figure 6, most of the improvements
were made when the initial Ca RMSD was in the range of
1–5 Å. This suggests that residues with moderate errors are more likely
to move toward the native structure, while refinement of substantially
deviated residues is more challenging. Residues that were already close
to the native (less than 1 Å) did not improve much as there is little
room for refinement but these residues also moved less with respect
to the initial structure. This means that residues that were essentially
correct in the initial model tended to remain at that position during
refinement. This would be expected for residues already in the deep
global native minimum but residues close to the experimental structure
are also more likely part of the highly packed protein core where there
is less room for displacement even if the force field energy was not
optimal.
Comparing this analysis for models from the LEE and BAKER servers,
there are noticeable differences especially in the amount of displacement
from the initial model. With LEE models, there was much
less displacement from the initial model even for residues with high initial
RMSD values. This suggests that the LEE models were already in a
FIGURE 4 Average improvements in GDT-HA scores (see color scale) based on different protocols for generating refined models from MD
ensembles as a function of the length and number of MD trajectories. Structures obtained by averaging (A, B) are compared with structures
obtained as ensemble centers (C, D), and with (A, C) and without (B, D) ensemble filtering (see text)
HEO AND FEIG | 183
deep energetic minimum, presumably as a result of refinement using a
similar protocol and energy function as what we have applied here.
Therefore, our refinement of the LEE models was apparently more akin
to a second round refinement. This would explain the much more moderate
improvements for the LEE models compared to the refinement of
other models and the overall lower success with refinement during
CASP12 since the LEE models made up a significant fraction of refinement
targets.
3.5 | Limitations toward further progress
Moderate structure refinement via MD-based sampling is now consistently
possible and has apparently even become integrated in standard
automated modeling platforms as evidenced by the LEE server models
during CASP12. However, we seem to have reached a plateau where
further progress is difficult to achieve. An obvious issue is the use of
restraints that limits how far structures can deviate from a given initial
model and, thereby, how much initial models can be refined toward the
native state. The application of restraints stems from early lessons during
the application of MD to the structure refinement problem that
have taught us that entirely unrestrained simulations from homology
models are more likely to move away from the native state than
toward it12,20,31
although it is not entirely clear why exactly. The use of
restraints during refinement simulations has been the key to achieving
consistency in structure refinement, but it is clear that with simulations
restraint to the initial model it will not be possible to reach the native
state for any but those initial models that are already very close to the
native state. Thus, understanding why exactly unrestrained sampling
from initial homology models generally fails to reach the native state is
the key issue in our opinion for advancing structure refinement to the
next level.
In order to gain further insights, we examined one of the CASP12
targets (TR872) in detail. Since the initial model for this target was relatively
far from the native state, there was significant room for improvement
(initial Ca RMSD and GDT-HA scores were 5.59 Å and 56.8,
respectively). During CASP12, we could not improve this model using
our MD-based protocol (DCa RMSD and DGDT-HA score are 10.04 Å
and 21.4, respectively) and we therefore chose this target to understand
the limitations in our protocol better. This target had errors at
both termini and at a b-turn. There was also relatively poor packing of
FIGURE 5 Refinement performance as a function of on initial model quality. GDT-HA improvements are compared to initial GDT-HA scores
(A), RMSD improvements are compared to initial RMSD values (B), and MolProbity scores after refinement are shown as a function of initial
MolProbity scores (C). Boxplots summarize the results with orange lines inside the boxes indicating median values, while the top and bottom of
the boxes represent first and third quartiles. The minimum and maximum values within 1.5 interquartile from the first and third quartiles are
indicated as whiskers. Individual data points are overlaid as red circles
TABLE 2 Performance dependency on initial model prediction groups
Initial model quality Average improvement for model 1
Server Na
GDT-HA RMSD (Å) GDC-SC MolProbity DGDT-HA –DRMSD (Å) DGDC-SC MolProbityb
Lee 12 50.30 5.01 29.75 1.80 0.64 20.01 21.12 0.91
Baker 12 58.58 4.13 38.69 1.01 1.96 0.04 0.60 1.12
Bates_BMM 4 41.35 5.00 18.56 2.62 2.04 20.02 4.02 0.64
QUARK 3 39.86 6.48 19.20 2.28 3.77 20.03 1.08 0.56
Pcons 3 42.72 7.49 23.29 0.85 2.13 0.02 1.69 0.52
Otherc
7 43.74 7.72 25.26 2.49 1.29 20.06 0.79 1.04
a
Number of targets.
b
MolProbity scores for refined models.
c
Less than three targets each came from RaptorX, YASARA, Seok-server, myprotein-me, HHPred1, and BhageerathH-Plus. TR879 (from FFAS-3D) was
excluded from the analysis.
184 | HEO AND FEIG
sidechains between b-sheets. We complemented the simulations generated
during CASP12 with extensive unrestrained simulations that
were started either from the native structure or the initial model given
during CASP12. The additional MD simulations involved initially a set
of 40 runs over 100 ns each. These simulations were carried out in the
same manner as the MD simulations during CASP12 (except for the
lack of restraints). The generated conformations were then clustered
similarly as during CASP12 (see Methods). From each cluster another
round of 10 additional simulations over 100 ns were started. This
resulted in a total of 22 ls of additional sampling (8 ls started from the
native, 14 ls started from the initial model). The resulting conformations,
as well as the structures generated during CASP12 with
restrained sampling, were then combined and projected onto the two
principal components from time-structure independent component
analysis32
with a lag time 1 ns and using pair-wise Ca distances as the
distance metric.
Figure 7 illustrates the conformational sampling for TR872 during
our refinement simulations in CASP12 compared with the additional
unrestrained simulations started from the native structure and initial
model. When started from the native structure, the simulations explore
only a relatively narrow conformational space, despite the extensive
sampling and a protocol that encourages broad exploration of conformational
space. The only significant dynamics is found in the flexible Cterminus
and a loop connecting an internal hairpin (see also Figure 8A).
The average structure obtained from the MD simulations remains close
to the native state with a GDT-HA score of 79.0 and a Ca RMSD value
of 2.24 Å for the average structure. This indicates that the native state
coincides with a deep free energy minimum in the MD simulations as
expected from the protein folding funnel hypothesis.33
While we cannot
say for sure based on the still limited sampling whether the native
state is indeed at the global free energy minimum, the native state is at
least a prominent local minimum with the force field used here from
which escape is unlikely if it can be reached during refinement.
Our refinement simulations from CASP12 that were started from
the CASP-provided initial model but involved weak positional restraints
also did not significantly explore conformational space (see Figure 6C).
It is readily apparent from Figure 6 that the differences between the
initial model and the native state are much too large to be overcome in
the presence of the restraints. In contrast, the unbiased simulations
started from the initial model generated broad sampling where a number
of distinct states were visited (see Figures 7B and 8C-G). The state
S0, closest to the initial model, was most favorable and is already significantly
refined with respect to the native state. S0 has a GDT-HA score
of 65.3 that is about 8.5 units better than the initial model. The state
S0 is easily reachable from the initial model without restraints since the
initial model and S0 are close in terms of the tICA principal components.
However, as S0 is more than 5 Å Ca RMSD away from our
model 1 submitted during CASP12, the restraints in our CASP12 protocol
clearly prevented this state from being reached as it lies outside the
sampling radius in the restrained simulations (see Figure 7B, C). Other
states in the unrestrained simulations, S1-S4, were only slightly less
favorable than S0 and separated by kinetic barriers. One of the states,
S3, is even closer to the native state than S0 (see Figure 8F) with a
GDT-HA score of 71.0 and a Ca RMSD value of 2.59 Å. This state
FIGURE 6 Per-residue change in Ca RMSD from the native, DRMSDi,native, (A, B, C) and Ca RMSD from the initial model, RMSDi,initial, (D, E,
F) vs initial Ca RMSD from the native for a given residue i. The analysis is compared for all targets (A, D) and for targets from the LEE (B,
E) and BAKER (C, F) groups. Boxplots are drawn in the same manner as in Figure 5. Individual data points are shown as gray dots. Dashed
lines indicate perfect refinement which makes Ca deviation to zero
HEO AND FEIG | 185
closely resembles the native state except for a persistent incorrect Nterminal
helix that would need to dissolve for the structure to completely
match the native state (see Figure 8F). On the other hand, the
simulations also visit a state (S4) that is located significantly further
away from the native state than the initial model. In S4, the N-terminal
is opened up significantly (see Figure 8G). S4 is also about 5 Å Ca
FIGURE 7 Conformational sampling of TR872 during refinement projected onto the two principal coordinates obtained from timestructure
independent component analysis (tICA). Contour plots are drawn for sampling probabilities (as –log P based on probabilities P and
shifted to zero at the minimum value). The coordinates for the native and the initial model are indicated by using a blue and red “X”, respectively.
Sampling distributions for different sets of MD trajectories are shown in different panels: (A) started from the native structure without
restraints; (B) started from the initial model without restraints; and (C) started from the initial model with restraints according to our
CASP protocol. Representative states (S0–4) are indicated in panel B
FIGURE 8 Structures for target TR872 during refinement and with additional MD-based sampling. The native structure is shown in yellow
in all panels and compared with the MD-simulated average structure of the native state (blue; A), the initial model given during CASP12 for
refinement (pink; B), structures sampled via MD from the initial model and designated as states S0-S4 (cf. Figure 7, purple, C-G), and the
model submitted during CASP12 (green, H). Structures are shown in two views and for each panel, GDT-HA and Ca RMSD measures relative
to the native structure are given
186 | HEO AND FEIG
RMSD away from our model 1 and, therefore, relaxing the restraints
enough to be able to reach S0 would also allow S4 to be visited.
According to Figure 7B, a transition from S0 to S3, would appear as
the most direct path to the native state, but a transition via S2 or S1 is
kinetically more likely. Both of these states are slightly unfolded relative
to S0 and, so, in this example, further refinement from the initially
relaxed state S0 would require partial unfolding (via S1 or S2) and
refolding (to S3). The final transition to melt the N-terminal helix (which
we did not observe) probably also requires an additional partial unfolding
and repacking process.
The initial relaxation to S0 is essentially what we can reach with our
current refinement protocol as long as the restraints are weak enough to
allow large enough structural changes from the initial model. For the
example chosen here, our restraints were too strong, but weaker
restraints should make it possible to reach S0. However, further refinement
would not simply proceed downhill on a folding funnel along a
direct path. Instead, multiple cycles of partial unfolding and refolding
may be required that are hindered severely by restraints as larger conformational
changes are required but also involve the danger of unfolding
toward states that are further away from the native state if restraints are
not applied. Once states further away from the native state are sampled
(such as S4), a recovery toward the native state becomes increasingly difficult
due to the explosion of configurational space.
We believe that the example presented here typifies the general
problem of structure refinement. We appear to be well-equipped to
achieve modest refinement from an initial model toward locally relaxed
structures by using restrained MD simulations aided by conformational
averaging. However, further refinement likely requires cycles of partial
unfolding and refolding that are essentially impossible in the presence
of restraints. As such transitions require the crossing of kinetic barriers
to states with similar relative energies, the key challenge going forward
seems to be how to favor productive cycles that lead toward the native
state while, at the same time, preventing transitions to states that lead
away from the native state, but without a priori knowledge of where
the native state is located. It does appear, though, that once the native
state is reached it can be recognized based on the force field providing
a deep free energy minimum. It remains to be seen in future studies
how general these findings are and, ultimately, how to transform such
insight into successful refinement protocols that can achieve more significant
refinement than what is currently possible.
3.6 | Further challenges
While the protein structure refinement exercise within CASP focuses on
the very specific task of improving a given initial model with respect to
the experimental structure, the larger goal is the generation of meaningful
structural models of proteins in their biologically most relevant form.
This includes the modeling of complete structures in their most prevalent
oligomeric states, which involves additional challenges. Reliable templates
used to generate initial models may not cover the entire
sequence while information about the oligomeric state and especially
oligomer interfaces are often lacking. Unfortunately, neither the ab initio
modeling of missing fragments nor the application of protein-protein
docking techniques to determine likely oligomer configurations are trivial.
However, carrying out MD-based structure refinement would be
expected to benefit from having complete structures in their oligomeric
state as that would represent the most realistic physical environment for
a given protein structure. It remains to be assessed in detail, though,
what the effect of missing fragments and the neglect of oligomeric
states have on refinement success with MD-based techniques.
4 | CONCLUSIONS
The refinement of protein structures via MD simulations remains a
highly successful approach. Our protocol that combines extensive sampling
via MD followed by filtering, scoring and structure averaging
remains highly successful in providing moderate but consistent refinement.
However, the extent of refinement during CASP12 with our
MD-based protocol has seen a decline based on CASP12 targets while
we expected improved performance with increased sampling, further
improved force fields, and new refinement protocols that were tested
during CASP12. To some extent this appears to be a result of MDbased
refinement becoming a standard component of modeling pipelines,
so that initial models available in the refinement category at
CASP are more difficult to refine further. But more generally, we seem
to have reached a plateau with the current refinement protocols. A
major issue is the use of restraints during sampling which is necessary
on one hand to achieve consistency but severely limits the degree to
which structures can be refined. A detailed analysis of one of the
CASP12 targets suggests that much weaker restraints may be needed
to achieve further refinement, and, in particular, to allow partial unfolding
and refolding to reach the native state. The key issue remains, however,
that very weak, or no restraints, also allow states to be visited
that lead away from the native state while it appears that the energy
function does not provide strong guidance toward the native state until
the native state is actually reached. While extensive sampling is now
possible and force fields have become highly realistic, we see the
remaining challenge in effectively guiding sampling during refinement
to eventually reach the native state rather than drift away toward
unfolded states.
ACKNOWLEDGEMENTS
Funding was provided by the National Institute of Health Grant R01
GM084953 and computing time at NSF XSEDE facilities (TGMCB090003)
was used to carry out this research. Office of Cyberinfrastructure,
TG-MCB090003,;National Institute of General Medical
Sciences, GM084953.
ORCID
Michael Feig http://orcid.org/0000-0001-9380-6422
REFERENCES
[1] Zhang Y. Protein structure prediction: when is it useful?. Curr Opin
Struct Biol. 2009;19:145–155.
[2] Kuhlbrandt W. Cryo-EM enters a new era. Elife. 2014;3:e03678
HEO AND FEIG | 187
[3] Hirata K, Shinzawa-Itoh K, Yano N, et al. Determination of damagefree
crystal structure of an X-ray-sensitive protein using an XFEL.
Nat Methods 2014;11:734–736.
[4] Zhang Y, Skolnick J. The protein structure prediction problem could
be solved using the current PDB library. Proc Natl Acad Sci USA.
2005;102:1029–1034.
[5] Thomas J, Ramakrishnan N, Bailey-Kellogg C. Graphical models of
residue coupling in protein families. IEEE Trans Comput Biol Bioinform.
2008;5:183–197.
[6] Nugent T, Jones DT. Accurate de novo structure prediction of large
transmembrane protein domains using fragment-assembly and correlated
mutation analysis. Proc Natl Acad Sci USA. 2012;109:E1540–
E1547.
[7] Ovchinnikov S, Park H, Varghese N, et al. Protein structure determination
using metagenome sequence data. Science. 2017;355:294–
298.
[8] Feig M. Computational structure refinement: almost there, yet still
so far to go. Wiley Interdiscip Rev Comput Mol Sci. 2017;7:e1307.
[9] MacCallum JL, Perez A, Schnieders MJ, Hua L, Jacobson MP, Dill
KA. Assessment of protein structure refinement in CASP9. Proteins.
2011;79:74–90.
[10] Nugent T, Cozzetto D, Jones DT. Evaluation of predictions in the
CASP10 model refinement category. Proteins. 2014;82:98–111.
[11] Mirjalili V, Noyes K, Feig M. Physics-based protein structure refinement
through multiple molecular dynamics trajectories and structure
averaging. Proteins. 2014;82:196–207.
[12] Mirjalili V, Feig M. Protein structure refinement through structure
selection and averaging from molecular dynamics ensembles.
J Chem Theory Comput. 2013;9:1294–1303.
[13] Feig M, Mirjalili V. Protein structure refinement via moleculardynamics
simulations: what works and what does not?. Proteins.
2016;84(Suppl 1):282–292.
[14] Heo L, Feig M. PREFMD: a web server for protein structure refinement
via molecular dynamics simulations. Bioinformatics 2017.
[15] Feig M. Local protein structure refinement via molecular dynamics
simulation with locPREFMD. J Chem Inf Model. 2016;56:1304–
1312.
[16] Della Corte D, Wildberg A, Schr€oder GF. Protein structure refinement
with adaptively restrained homologous replicas. Proteins.
2016;84(Suppl 1):302–313.
[17] Park H, Seok C. Refinement of unreliable local regions in templatebased
protein models. Proteins. 2012;80:1974–1986.
[18] Lee GR, Heo L, Seok C. Effective protein model structure refinement
by loop modeling and overall relaxation. Proteins. 2016;84
(Suppl 1):293–301.
[19] Modi V, Dunbrack RLJ. Assessment of refinement of templatebased
models in CASP11. Proteins. 2016;84(Suppl. 1):260–281.
[20] Chen J, Brooks CL III, Can molecular dynamics simulations provide
high-resolution refinement of protein structure?. Proteins. 2007;67:
922–930.
[21] Huang J, Rauscher S, Nawrocki G, et al. CHARMM36m: an
improved force field for folded and intrinsically disordered proteins.
Nat Methods. 2017;14:71–73.
[22] Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML.
Comparison of simple potential functions for simulating liquid
water. J Chem Phys. 1983;79:926–935.
[23] Darden TA, York D, Pedersen LG. Particle-Mesh Ewald: an N log(N)
method for Ewald sums in large systems. J Chem Phys. 1993;98:
10089–10092.
[24] Beauchamp KA, Bowman GR, Lane TJ, Maibaum L, Haque IS, Pande
VS. MSMBuilder2: modeling conformational dynamics at the picosecond
to millisecond scale. J Chem Theory Comput. 2011;7:3412–3419.
[25] Zhang J, Zhang Y. A novel side-chain orientation dependent potential
derived from random-walk reference state for protein fold
selection and structure prediction. PLoS One. 2010;5:e15386.
[26] Brooks BR, Brooks CL, Mackerell AD, et al. CHARMM: the biomolecular
simulation program. J Comput Chem. 2009;30:1545–1614.
[27] Eastman P, Friedrichs MS, Chodera JD, et al. OpenMM 4: a reusable,
extensible, hardware independent library for high performance
molecular simulation. J Chem Theory Comput. 2013;9:461–469.
[28] Phillips JC, Braun R, Wang W, et al. Scalable molecular dynamics
with NAMD. J Comput Chem. 2005;26:1781–1802.
[29] Feig M, Karanicolas J, Brooks IIICL. MMTSB tool set: enhanced
sampling and multiscale modeling methods for applications in structural
biology. J Mol Graph Modell. 2004;22:377–395.
[30] Zhou HY, Zhou YQ. Distance-scaled, finite ideal-gas reference state
improves structure-derived potentials of mean force for structure
selection and stability prediction. Protein Sci. 2002;11:2714–2726.
[31] Raval A, Piana S, Eastwood MP, Dror RO, Shaw DE. Refinement of
protein structure homology models via long, all-atom molecular
dynamics simulations. Proteins. 2012;80:2071–2079.
[32] Naritomi Y, Fuchigami S. Slow dynamics in protein fluctuations
revealed by time-structure based independent component analysis:
the case of domain motions. J Chem Phys. 2011;134:065101
[33] Leopold PE, Montal M, Onuchic JN. Protein folding funnels: a
kinetic approach to the sequence-structure relationship. Proc Natl
Acad Sci U S A. 1992;89:8721–8725.
How to cite this article: Heo L, Feig M. What makes it difficult
to refine protein models further via molecular dynamics simulations?
Proteins. 2018;86:177–188. https://doi.org/10.1002/
prot.25393
188 | HEO AND FEIG