BIOINFORMATICS ORIGINAL PAPER Vol. 26 no. 7 2010, pages 882–888 doi:10.1093/bioinformatics/btq058
Structural bioinformatics. Advance Access publication February 11, 2010

MULTICOM: a multi-level combination approach to protein structure prediction and its assessments in CASP8
Zheng Wang1, Jesse Eickholt1 and Jianlin Cheng1,2,3,∗
1Department of Computer Science, 2Informatics Institute and 3C. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
∗To whom correspondence should be addressed.
Associate Editor: Burkhard Rost

ABSTRACT
Motivation: Protein structure prediction is one of the most important problems in structural bioinformatics. Here we describe MULTICOM, a multi-level combination approach to improve the various steps in protein structure prediction. In contrast to methods that look for the best templates, alignments and models, our approach tries to combine complementary and alternative templates, alignments and models to achieve, on average, better accuracy.
Results: The multi-level combination approach was implemented via five automated protein structure prediction servers and one human predictor which participated in the Eighth Critical Assessment of Techniques for Protein Structure Prediction (CASP8) in 2008. The MULTICOM servers and human predictor were consistently ranked among the top predictors on the CASP8 benchmark. The methods can predict moderate- to high-resolution models for most template-based targets and low-resolution models for some template-free targets. The results show that the multi-level combination of complementary templates, alternative alignments and similar models, aided by model quality assessment, can systematically improve both template-based and template-free protein modeling.
Availability: The MULTICOM server is freely available at http://casp.rnet.missouri.edu/multicom_3d.html
Contact: chengji@missouri.edu
Received on October 30, 2009; revised on February 2, 2010; accepted on February 8, 2010

1 INTRODUCTION
Knowing protein tertiary structure is useful for determining protein–protein interactions, protein function and evolution, and designing drugs. At present, X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy are the two most commonly used experimental methods for determining protein structure. However, both methods are far too expensive and time consuming to be applied to the millions of proteins produced by high-throughput genome sequencing (Jaravine et al., 2006; Lattman, 2004; Service, 2005). Computer-aided protein structure prediction, in contrast, is less expensive, much faster, and able to generate protein structures on a large scale. As a result, computational protein structure prediction has received much attention in recent years, particularly from those working in computer science, chemistry, molecular biology, and molecular physics, and their efforts have led to steady progress in the area (Kryshtafovych et al., 2005, 2007, 2009a).
Recently, the Eighth Critical Assessment of Techniques for Protein Structure Prediction, 2008 (CASP8) (Moult et al., 2009) assessed state-of-the-art protein modeling techniques in two categories: template-based modeling (TBM) and template-free modeling (FM). TBM deals with proteins for which suitable templates can be found. It can also be referred to as comparative modeling or fold recognition, depending on the availability of sequentially or structurally related proteins.
FM deals with proteins for which no suitable templates can be found. This category includes both fragment-based modeling (Simons et al., 1997) and purely 'ab initio', or 'de novo', modeling, in which predictions are made based solely on chemical and physical principles (Hinds and Levitt, 1994; Sternberg and Thornton, 1978).
TBM typically consists of five steps (Baker and Sali, 2001; Cheng, 2008; Zhang, 2008; Zhang and Skolnick, 2005). The first step is to identify templates that have a similar structure to the protein to be modeled (the target). Once templates have been selected, the target protein is aligned with the templates. At this point, models can be built from the alignments and the structural information of each template. After model generation, the models are evaluated and refined.
Here we present a multi-level combination approach to improve protein structure prediction during all facets of TBM. Our approach first attempts to combine complementary templates, alignments and model generation methods to produce a number of alternative models. It then uses a novel model combination process guided by model quality evaluation (Cheng et al., 2009; Wang et al., 2008) to refine the models. Five fully automated servers (MULTICOM-CLUSTER, MULTICOM-REFINE, MULTICOM-CMFR, MULTICOM-RANK and MUProt) and one human predictor (MULTICOM) implemented various forms of this approach and participated in CASP8. The CASP8 results show that the multi-level combination approach is effective for the full spectrum of protein modeling, including high-accuracy TBM, hard TBM and FM.

2 METHODS AND IMPLEMENTATION
2.1 A multi-level combination pipeline for protein structure prediction
Our multi-level combination pipeline (Fig. 1) for protein structure prediction generally comprises five steps: (i) template identification and ranking, (ii) multi-template combination, (iii) model generation, (iv) model evaluation and (v) model combination and refinement. More specifically, our pipeline first uses a set of fold recognition methods to generate several lists of templates, each one ranked by one of the fold recognition methods employed. Then alignments between the target and one or more of the top templates in each of the ranked lists are greedily combined into a multiple sequence alignment (Cheng, 2008). This multiple sequence alignment, along with the structure of each template, is fed into model generation tools which construct models. All the models are then evaluated and ranked by model quality assessment tools. Finally, globally similar models and/or locally similar model fragments are combined using a novel model combination algorithm to generate refined models. These refined models are the end product of our pipeline.
Fig. 1. A multi-level combination pipeline for protein structure prediction.
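To make the data flow of these five steps concrete, the sketch below outlines one possible orchestration in Python. It is only a schematic under our own naming assumptions: each stage is a caller-supplied callable, and none of the identifiers correspond to the actual MULTICOM code.

from typing import Callable, Iterable, List, Sequence

def run_pipeline(
    query_seq: str,
    fold_recognition_methods: Iterable[Callable[[str], List[dict]]],
    combine_templates: Callable[[List[dict]], List[dict]],
    build_models: Callable[[str, dict], List[dict]],
    score_model: Callable[[dict], float],
    combine_and_refine: Callable[[dict, Sequence[dict]], dict],
    n_final: int = 5,
) -> List[dict]:
    """Minimal sketch of the five-step pipeline; every stage is a hypothetical callable."""
    # (1) Template identification and ranking: one ranked alignment list per method.
    ranked_lists = [method(query_seq) for method in fold_recognition_methods]

    # (2) Multi-template combination: merge alignments within each list into
    #     query-centred multiple alignments (see the greedy algorithm in Section 2.2.2).
    multi_alignments = [ma for ranked in ranked_lists for ma in combine_templates(ranked)]

    # (3) Model generation: build candidate 3D models from each multiple alignment
    #     plus the corresponding template structures.
    models = [m for ma in multi_alignments for m in build_models(query_seq, ma)]

    # (4) Model evaluation: rank all candidates by a predicted quality score (e.g. GDT-TS).
    models.sort(key=score_model, reverse=True)

    # (5) Model combination and refinement: refine each top-ranked seed by
    #     combining it with similar models, yielding the submitted models.
    return [combine_and_refine(seed, models) for seed in models[:n_final]]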
2.2 MULTICOM-CLUSTER
2.2.1 Template identification and ranking
To identify and rank templates, we used profile–profile alignments, profile–sequence alignments and a machine learning approach (Cheng and Baldi, 2006). To generate profiles, PSI-BLAST, a profile–sequence local alignment method, was used to search a query protein sequence against the NCBI non-redundant protein sequence database and build three different kinds of sequence profiles: the position-specific scoring matrix (PSSM) of PSI-BLAST, the hidden Markov model (HMM) of hhsearch (Soding, 2005) and the profile of COMPASS (Sadreyev and Grishin, 2003). The PSSM profile, HMM and COMPASS profile were searched against our in-house template sequence database, template HMM database and template COMPASS profile database to identify homologous templates from the output generated by PSI-BLAST, hhsearch and COMPASS, respectively. The query-template alignments generated by PSI-BLAST, hhsearch and COMPASS were kept in three different sets and ranked according to E-value. In addition, SPEM (Zhou and Zhou, 2005), a global profile–profile alignment tool, was used to align the query with the top 10 templates found by a sensitive machine learning fold recognition method (Cheng and Baldi, 2006). In this way, we combined the profile–profile alignments with a machine learning approach. This resulted in a fourth set of query-template alignments.
2.2.2 Multi-template combination
Previous work has shown that combining multiple templates can, in most cases, improve the performance of TBM (Cheng, 2008). All three of our predictors which implemented this portion of the pipeline therefore incorporated the multi-template combination algorithm (Cheng, 2008) (Table 1). This algorithm chose the most significant query-template alignment (E-value < 10^-20 and coverage > 75%) in each set and greedily combined it with the rest of the alignments from the same set. This resulted in a multiple sequence alignment centered on the query sequence. The most significant alignment was then removed, and the second most significant alignment was combined with the remaining query-template alignments to generate a multiple sequence alignment using the same algorithm. The process was repeated up to 10 times to generate up to 10 multiple alignments from each set.
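The greedy combination just described can be summarized in a few lines of Python. The sketch below reflects our reading of the published description only; the record type and the way the significance filter is applied in later rounds are our assumptions, and the actual merging of pairwise alignments into a query-centred multiple alignment is abstracted away.

from dataclasses import dataclass
from typing import List

@dataclass
class TemplateAlignment:
    template_id: str
    e_value: float
    coverage: float   # fraction of the query covered by the alignment (0-1)

def greedy_multi_template_combination(
    alignments: List[TemplateAlignment],
    max_rounds: int = 10,
    e_cutoff: float = 1e-20,
    min_coverage: float = 0.75,
) -> List[List[TemplateAlignment]]:
    """Build up to `max_rounds` query-centred multiple alignments from one ranked set.

    Each round picks the most significant remaining alignment as the pivot,
    combines it with all other alignments of the same set into one multiple
    alignment, removes the pivot and repeats with the next most significant
    alignment, as described in Section 2.2.2.
    """
    remaining = sorted(alignments, key=lambda a: a.e_value)  # most significant first
    multiple_alignments = []
    for _ in range(max_rounds):
        if not remaining:
            break
        pivot = remaining[0]
        # Significance filter from the text (E-value < 1e-20, coverage > 75%);
        # whether it also gates later pivots is our assumption.
        if pivot.e_value >= e_cutoff or pivot.coverage <= min_coverage:
            break
        # Group the pivot with the rest of the set; in the real pipeline the
        # pairwise alignments would be merged into one query-centred MSA here.
        multiple_alignments.append([pivot] + remaining[1:])
        remaining = remaining[1:]
    return multiple_alignments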
2.2.3 Model generation
Models were generated in one of two ways. If query-template alignments existed, each query-template alignment and its corresponding template structure were fed into Modeller 7v7 (Fiser and Sali, 2003), a widely used model generation tool, to generate 10 models. From these, the model with the minimum Modeller energy was chosen as a predicted model. If no significant template could be found by hhsearch (E-value < 10^-3) and the length of the query protein was <120 residues, ROSETTA was executed to generate 200 models. The 200 models were clustered by ROSETTA, and the centroids of several large clusters were chosen as predicted models. During CASP8, ROSETTA was executed by MULTICOM-CLUSTER to generate models for several hard targets.
2.2.4 Model ranking
The previous steps generated a large number of models for each target. To rank all of the models, we used our model quality assessment tool ModelEvaluator (Wang et al., 2008), which was evaluated during CASP8 and found to be an efficient and accurate model evaluation tool (Cheng et al., 2009). ModelEvaluator compared the secondary structure, solvent accessibility, contact map and beta-sheet topology of a model with those predicted from its primary sequence using the SCRATCH suite (Cheng et al., 2005). The comparison resulted in a number of features which were fed into support vector machines (SVMs) to predict the GDT-TS score of the model. The predicted GDT-TS scores were used to rank the models, and the top five models were submitted to CASP by MULTICOM-CLUSTER.

Table 1. Implementation details of the five MULTICOM server predictors and the MULTICOM human predictor

Step                                      Method                  M-CLUSTER  M-RANK  M-CMFR  M-REFINE  MUProt  MULTICOM
(1) Template identification and ranking   PSI-BLAST               √          √       √
                                          hhsearch                √          √
                                          COMPASS                 √
                                          FOLDpro                 √          √       √
(2) Template combination                  Greedy algorithm        √          √       √
(3) Model generation                      Modeller                √          √       √       √         √       √
                                          ROSETTA                 √
                                          MULTICOM models                                    √         √
                                          CASP8 server models                                                  √
(4) Model evaluation                      ModelEvaluator (SVM)    √          √       √       √         √       √
                                          SPICKER (clustering)                                         √
(5) Model combination and refinement      Global–local algorithm                             √         √       √

2.3 MULTICOM-RANK and MULTICOM-CMFR
MULTICOM-RANK and MULTICOM-CMFR are two other predictors which also implemented the first four steps of our pipeline. Their implementation is the same as described above except for a few minor differences in the template identification and ranking, and model generation steps (Table 1). More specifically, both used a two-track approach for template identification and ranking. For easy targets, MULTICOM-CMFR (or MULTICOM-RANK) used only PSI-BLAST (or PSI-BLAST and then hhsearch) to identify and rank templates according to E-value, as in MULTICOM-CLUSTER. If fewer than five significant templates could be found (i.e. when working on relatively hard targets), we used our SVM-based fold recognition method (Cheng and Baldi, 2006) to rank templates and five other alignment tools, including MUSCLE (Edgar, 2004), hhsearch, lobster (Edgar and Sjolander, 2003), SPEM and COMPASS, to generate additional alignments between the query and each of the top 50 templates. Additionally, when taking the track for hard targets, 250 models were generated as opposed to just 10. This larger number was due in part to the fact that more alignment tools were used, so more query-template alignments, and hence more models, were generated. It also reflected our belief that, since the available templates were less significant, generating more candidate models (i.e. enlarging the candidate model pool) might yield more good models. A sketch of this two-track routing is given below.
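This is a minimal sketch of the easy/hard routing described above, assuming hypothetical callables for the profile search, the SVM-based fold recognition and the extra aligners. The five-template threshold and the 10 vs. 250 model counts come from the text; the E-value significance cutoff is our assumption, since the text does not state one for this step.

from typing import Callable, Dict, List

def choose_template_track(
    query_seq: str,
    profile_search: Callable[[str], List[dict]],         # PSI-BLAST/hhsearch-style ranked hits
    svm_fold_recognition: Callable[[str], List[dict]],    # FOLDpro-style ranked templates
    extra_aligners: List[Callable[[str, dict], dict]],    # wrappers around MUSCLE, SPEM, COMPASS, ...
    significance_cutoff: float = 1e-3,                    # assumed cutoff, not given in the text
    min_significant: int = 5,
) -> Dict[str, object]:
    """Route one target to the easy or hard track of template identification."""
    hits = profile_search(query_seq)
    significant = [h for h in hits if h["e_value"] < significance_cutoff]

    if len(significant) >= min_significant:
        # Easy track: E-value ranking from the profile searches, ~10 models per alignment.
        return {"track": "easy", "templates": significant, "models_to_generate": 10}

    # Hard track: SVM-based fold recognition, extra alignments against the
    # top 50 templates, and a larger candidate pool (~250 models).
    templates = svm_fold_recognition(query_seq)[:50]
    alignments = [align(query_seq, t) for t in templates for align in extra_aligners]
    return {"track": "hard", "templates": templates,
            "alignments": alignments, "models_to_generate": 250}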
2.4 MULTICOM-REFINE, MUProt and MULTICOM
MULTICOM-REFINE, MUProt and MULTICOM (our human-expert predictor) implemented the fourth and fifth portions of our general pipeline (model evaluation, and model combination and refinement). These predictors made predictions by ranking and combining a number of internal and external models via a novel global–local model combination algorithm. The algorithm first attempts to combine models selected on the basis of global similarity. This global model combination procedure worked well for easy targets, where many similar models were generated. For harder targets, the algorithm falls back to a more localized approach in which it combines models that have similar fragments. Here, we first describe in detail the global–local model combination algorithm that MULTICOM-REFINE, MUProt and MULTICOM all use, and then describe the differences between the three predictors.
2.4.1 Model combination and refinement
For the model combination and refinement step of our pipeline, we used a novel global–local model combination algorithm. This algorithm worked by first selecting a seed model from one of the top five ranked models, and then comparing it against all the other models using the structure-comparison tool TM-Score (Zhang and Skolnick, 2004a). Models in which at least 80% of the residues could be aligned to the seed model with an RMSD < 4 Å were considered globally similar to the seed model and selected for combination. To combine the seed model and the selected models, we fed them into Modeller 7v7 and used them as templates to generate 10 new models for the protein. Of these new models, the one with the minimum Modeller energy was selected as a refined model. This process was repeated up to five times to generate a refined model for each of the top five ranked models.
If no globally similar models were found, which was often the case for hard targets, a local model combination algorithm was used to combine the seed model with other locally similar models. To do so, the seed model was first compared against the other models using TM-Score. Then long fragments of models that could be aligned with the seed model with an RMSD < 3 Å and a GDT-TS score > 50 were selected. The minimum length of the fragments was initially set to 80 residues, and this threshold was repeatedly reduced by five residues if no fragments could be found. The structures of the fragments and the initial seed model were fed into Modeller to generate 10 models, and the model with the minimum energy was chosen as a refined model. This process was also repeated up to five times to produce a refined model for each of the top five ranked models.
As mentioned, MULTICOM-REFINE, MUProt and MULTICOM all focus on model combination and refinement, and all implement the global–local model combination algorithm just described. These three predictors differ in which models they consider for combination and refinement, and how those models are initially ranked. MULTICOM-REFINE collected the models predicted by MULTICOM-CMFR, MULTICOM-RANK and MULTICOM-CLUSTER, and used ModelEvaluator to predict the GDT-TS score of each model. The top 50% of the models generated by MULTICOM-CMFR and MULTICOM-RANK, in addition to all the models generated by MULTICOM-CLUSTER, were selected for model combination and refinement. MUProt took the same set of models used by MULTICOM-REFINE as input, but before using ModelEvaluator to rank models, it used SPICKER (Zhang and Skolnick, 2004b) to cluster the models, and the models closest to the centroid of the largest cluster were ranked first. In this way, we combined ModelEvaluator, an SVM-based model quality assessment program (MQAP), with a clustering-based MQAP. For MULTICOM, our human-expert predictor, the initial models came from all CASP8 server models (not including models from human predictors). These were ranked using ModelEvaluator according to predicted GDT-TS score.
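The selection logic of the global–local algorithm described in Section 2.4.1 is sketched below in Python. The structure superposition (TM-Score in the paper) and the Modeller-based rebuilding are abstracted behind caller-supplied callables, and all identifiers and return conventions are our own assumptions; the thresholds (80% aligned, 4 Å, 3 Å, GDT-TS > 50, fragment length starting at 80 residues and relaxed in steps of 5) follow the text.

from typing import Callable, List, Sequence, Tuple

def global_local_combination(
    seed,
    candidates: Sequence,
    superpose: Callable[[object, object], Tuple[float, float, float, int]],
    rebuild_with_templates: Callable[[object, List[object]], object],
):
    """Refine one seed model by combining it with globally or locally similar models.

    `superpose(seed, model)` is assumed to return (aligned_fraction, rmsd,
    gdt_ts, aligned_fragment_length), e.g. from a TM-Score-style comparison;
    `rebuild_with_templates` stands in for feeding the seed and the selected
    models into Modeller and keeping the lowest-energy rebuilt model.
    This function would be called once per top-five seed model.
    """
    others = [m for m in candidates if m is not seed]

    # Global pass: models with >= 80% of residues alignable to the seed
    # within 4 A RMSD are treated as globally similar.
    globally_similar = []
    for model in others:
        aligned_fraction, rmsd, _gdt_ts, _frag_len = superpose(seed, model)
        if aligned_fraction >= 0.80 and rmsd < 4.0:
            globally_similar.append(model)
    if globally_similar:
        return rebuild_with_templates(seed, globally_similar)

    # Local fallback (typical for hard targets): look for long fragments that
    # superpose onto the seed with RMSD < 3 A and GDT-TS > 50; the minimum
    # fragment length starts at 80 residues and is relaxed in steps of 5.
    min_len = 80
    while min_len > 0:
        fragment_donors = []
        for model in others:
            _aligned_fraction, rmsd, gdt_ts, frag_len = superpose(seed, model)
            if frag_len >= min_len and rmsd < 3.0 and gdt_ts > 50:
                fragment_donors.append(model)
        if fragment_donors:
            return rebuild_with_templates(seed, fragment_donors)
        min_len -= 5

    return seed  # nothing sufficiently similar was found; keep the seed model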
3 RESULTS AND DISCUSSION
In order to evaluate the performance of our algorithms and predictors, and to compare the various ways of combining different techniques, we evaluated the six MULTICOM predictors on the CASP8 benchmark from two perspectives. First, we developed an automated evaluation pipeline to evaluate the MULTICOM predictors on 120 valid CASP8 targets. For each target, the experimental structures and predicted models were downloaded from the CASP8 website. The sequences extracted from the experimental structures were aligned with the CASP8 target sequences using ClustalW (Larkin and Blackshields, 2007) to identify residues in the target sequences that did not have coordinates in the experimental structures (i.e. potentially disordered regions). These residues were removed from the CASP8 structure models. The filtered models were then compared with the experimental structures using TM-Score, which generated GDT-TS (Zemla, 2003; Zemla et al., 1999), TM and MaxSub (Siew et al., 2000) scores. These scores ranged from 0 to 100 and were used to measure the quality of the predicted models. To complement the official CASP8 assessment (Cozzetto et al., 2009; Ben-David et al., 2009; Keedy et al., 2009), our evaluation was based on the entire structure of a target as opposed to only its domains. Second, we downloaded the official GDT-TS and Z-scores for all the CASP8 models and compared the MULTICOM predictors with the other predictors using the official CASP8 results (http://predictioncenter.gc.ucdavis.edu/casp8/results.cgi). Note that the Z-score of a model for a target is its GDT-TS score normalized by the mean and standard deviation of all the models associated with the target (Cozzetto et al., 2009).

3.1 Evaluation by our in-house pipeline
CASP allowed a predictor to submit five models for each target, where the first model was believed to be the best prediction (Kryshtafovych et al., 2009b). We evaluated each MULTICOM predictor by calculating the average TM, GDT-TS and MaxSub scores for the first models and the best-of-five models (the model with the highest score) on 120 CASP8 targets (Table 2). The standard error of the average GDT-TS scores was also calculated by dividing the standard deviation by the square root of the total number of targets (Table 2).

Table 2. Evaluation results of MULTICOM predictors for the first and the best-of-five models (inside parentheses) on 120 CASP8 targets

Predictor    Avg. TM   Avg. GDT-TS   Avg. MaxSub   GDT-TS S.E.^a
MULTICOM     70 (72)   63 (65)       60 (62)       1.99 (1.94)
M-REFINE     67 (69)   60 (62)       56 (58)       2.06 (1.99)
M-CLUSTER    67 (70)   60 (62)       56 (59)       2.03 (1.99)
MUProt       67 (69)   60 (61)       59 (58)       2.04 (1.98)
M-RANK       66 (68)   59 (61)       55 (57)       2.05 (2.02)
M-CMFR       66 (68)   58 (61)       55 (57)       2.04 (2.02)

^a The standard error of the average GDT-TS scores of the first/best models.
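The per-predictor averages and standard errors in Table 2 follow directly from the per-target scores; a minimal sketch of that calculation, using only the Python standard library and a hypothetical helper name, is given below.

import math
from statistics import mean, stdev
from typing import Dict, Sequence

def summarize_gdt_ts(per_target_scores: Sequence[float]) -> Dict[str, float]:
    """Average GDT-TS and its standard error over a set of evaluated targets.

    The standard error is the standard deviation of the per-target scores
    divided by the square root of the number of targets, as in Table 2.
    """
    n = len(per_target_scores)
    return {
        "n_targets": n,
        "avg_gdt_ts": mean(per_target_scores),
        "gdt_ts_standard_error": stdev(per_target_scores) / math.sqrt(n),
    }

# Usage: summarize_gdt_ts(first_model_scores_for_120_targets)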
According to the results, MULTICOM, the predictor which made predictions by combining all CASP8 server models, achieved better performance than MULTICOM-REFINE and MUProt, which made predictions by combining models from only three of the MULTICOM template-based predictors. As the combination and refinement method used was exactly the same in each predictor, this indicates that the quality of the final model increases as the number and quality of candidate models increase. It also indicates that our model combination algorithm can detect and combine structural segments of better quality and refine the final model. The similar performance of MULTICOM-REFINE and MUProt indicates that combining a cluster-based ranking method with ModelEvaluator did not result in much of a change in performance compared to using ModelEvaluator alone. MULTICOM-REFINE performed slightly better than MULTICOM-CLUSTER and notably better than MULTICOM-RANK and MULTICOM-CMFR. This indicates that a well-implemented model combination approach tends to perform at least as well as, and often better than, the best base predictors (i.e. those that only implement the first four steps of our pipeline). Also, MULTICOM-CLUSTER performed better than the other two base predictors, MULTICOM-RANK and MULTICOM-CMFR, which indicates that combining more diverse template identification and fold recognition methods can improve structure prediction. Moreover, the fact that MULTICOM-RANK performed slightly better than MULTICOM-CMFR suggests that hhsearch may work slightly better than PSI-BLAST for these targets.
To assess the overall quality of our multi-level combination approach, we used TM-score, which reports a score between 0 and 1 and measures the absolute quality of a model. A TM-score of 0.40 indicates a moderately accurate model with the correct topology, whereas a score of 0.17 indicates a random prediction (Zhang and Skolnick, 2004a). As seen in Table 2, the average per-target TM-score of the MULTICOM predictors ranged from 0.66 to 0.70, indicating that the models our MULTICOM predictors produce are in general of good quality.

3.2 Comparisons with other CASP8 predictors
We compared the MULTICOM predictors with other CASP8 server and human predictors using the official CASP8 assessment data and measures. The CASP8 assessment dissects the valid targets into 154 TBM domains, for which at least one template is available, and 13 FM domains, for which no template is available. Of the 154 TBM domains, 50 are classified as template-based high-accuracy (TBM-HA) domains, for which at least one model with a GDT-TS score > 80 was predicted. In our comparison, we took models predicted by 237 predictors (121 server predictors and 116 human predictors) and assessed them from various perspectives (Tables 3–5). Our comparisons serve only as a comparative study of our methods with respect to the state of the art. Readers should refer to the CASP8 articles (Ben-David et al., 2009; Cozzetto et al., 2009; Keedy et al., 2009) for the official CASP8 results accredited by protein structure experts.
Table 3 reports the top 10 server predictors on the 50 TBM-HA domains. Predictors were evaluated by the cumulative GDT-TS Z-scores and the average GDT-TS scores of the first models. The GDT-TS Z-score of a domain model was calculated as (x − µ)/σ, where x is the GDT-TS score of the model, and µ and σ are, respectively, the mean and standard deviation of the GDT-TS scores of all the models of the domain from the various predictors (Kryshtafovych et al., 2009b).

Table 3. Top 10 server predictors evaluated on the first models (out of five submissions) of the 50 TBM-HA domains

Predictor           Domain No.   Sum Z-score GDT^a   Avg. GDT-TS^b
Zhang-Server^c      50           27                  88
RAPTOR^d            50           25                  87
MULTICOM-REFINE     50           24                  87
MUProt              50           24                  87
Phyre_de_novo^e     50           24                  87
MULTICOM-CLUSTER    50           23                  87
HHpred5^f           50           23                  85
MULTICOM-RANK       50           22                  86
HHpred2^f           50           22                  86
pro-sp3-TASSER^g    50           21                  86

The standard errors of the average GDT-TS scores of MULTICOM-REFINE, MUProt, MULTICOM-CLUSTER and MULTICOM-RANK are 0.84, 0.81, 0.81 and 0.96, respectively. This analysis only considers predictors that predicted more than 46 domains. For details about each CASP8 server, please refer to the CASP8 meeting abstracts (http://www.predictioncenter.org/casp8/doc/CASP8_book.pdf).
^a Sum of the Z-scores of GDT-TS. ^b Average GDT-TS score. ^c Zhang (2009). ^d Xu et al. (2009). ^e Kelley et al. (2008). ^f Hildebrand et al. (2009). ^g Zhou et al. (2009).
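The GDT-TS Z-score defined above, and the cumulative Z-scores reported in Tables 3–5, can be computed as in the short sketch below; it uses only the Python standard library, and the function names are our own.

from statistics import mean, stdev
from typing import Dict, Iterable

def gdt_ts_z_scores(scores_by_predictor: Dict[str, float]) -> Dict[str, float]:
    """Z-score of each predictor's model for one domain: (x - mu) / sigma, where
    mu and sigma are the mean and standard deviation of the GDT-TS scores of
    all models submitted for that domain."""
    values = list(scores_by_predictor.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:  # degenerate case: all models scored identically
        return {name: 0.0 for name in scores_by_predictor}
    return {name: (x - mu) / sigma for name, x in scores_by_predictor.items()}

def cumulative_z_score(domains: Iterable[Dict[str, float]], predictor: str) -> float:
    """Sum of a predictor's GDT-TS Z-scores over all domains it predicted
    (the 'Sum Z-score GDT' column in Tables 3-5)."""
    return sum(gdt_ts_z_scores(domain_scores)[predictor]
               for domain_scores in domains if predictor in domain_scores)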
On the 50 TBM-HA domains, MULTICOM, our human predictor, achieved a cumulative GDT-TS Z-score of 27.41 and an average GDT-TS score of 88.80, both higher than those of the best server predictor. As MULTICOM generates models by combining all the CASP8 server models, this clearly shows the value and contribution of our model combination algorithm, at least on the easy targets. Furthermore, four of the five MULTICOM server predictors were ranked within the top 10 of the 121 server predictors (Table 3). This demonstrates that the multi-level combination approach works competitively well on high-accuracy, easy targets.
Table 4 shows the top 10 predictors on the 64 human/server TBM domains. Most of these domains were considered hard template-based cases, as only weak templates could be found. MULTICOM was ranked within the top 10 predictors in terms of cumulative Z-score, but ranked below the best server. This may indicate that the model selection component of MULTICOM was not able to select the top models for the hard TBM targets.

Table 4. Top 10 human and server predictors and MULTICOM predictors on the first models (of the five possible submissions) of 64 TBM domains from human/server targets

Predictor         Domain No.   Sum Z-score GDT   Avg. GDT-TS
IBT_LT^a          64           67                65
DBAKER^b          64           64                64
Zhang^c           64           56                64
fams-ace2^d       64           52                63
Zhang-server^e    64           52                63
TASSER^f          64           51                63
SAM-T08-human^g   62           51                62
ZicoFullSTP^h     64           50                61
Zico^i            64           48                61
MULTICOM          64           48                61

The standard error of the average GDT-TS scores of MULTICOM on these domains is 2.37. This analysis only considers predictors that predicted more than 60 domains.
^a Venclovas and Margelevicius (2009). ^b Raman et al. (2009). ^c Zhang (2009). ^d Terashi et al. (2008). ^e Indicates a server predictor; otherwise, it is a human predictor. ^f Zhou et al. (2009). ^g Karplus (2008). ^h Girgis et al. (2008). ^i Girgis and Fischer (2008).

Table 5 reports the results of the top 10 server predictors on the 154 template-based domains. Two of our server predictors (MULTICOM-CLUSTER and MULTICOM-REFINE) were ranked within the top 10 in terms of both Z-score and average GDT-TS score. Their performance in terms of average GDT-TS is close to that of the second best predictor, indicating that the multi-level combination is competitive in this category.

Table 5. Top 10 CASP8 server predictors on the first models (of five possible submissions) of 154 TBM domains

Predictor           Domain No.   Sum Z-score GDT   Avg. GDT-TS
Zhang-server        154          104               71
RAPTOR              154          86                69
Pro-sp3-TASSER      154          81                68
Phyre_de_novo       154          79                68
HHpred5             154          79                66
BAKER-ROBETTA^a     154          76                67
METATASSER^b        154          75                67
HHpred4^c           154          75                67
MULTICOM-CLUSTER    154          73                67
MULTICOM-REFINE     154          71                67

The standard errors of the average GDT-TS scores of MULTICOM-CLUSTER and MULTICOM-REFINE are 1.66 and 1.69, respectively. This analysis only considers predictors that predicted more than 150 domains.
^a Raman et al. (2009). ^b Zhou et al. (2009). ^c Hildebrand et al. (2009).

In the category of template-free modeling, the CASP8 official assessment (Ben-David et al., 2009) mainly used two measures to evaluate predictors on the 13 FM domains: scoring schemes A and M. Scoring scheme A indicates the total number of models of high quality (within the top 3), whereas scoring scheme M highlights the number of targets for which a group generated high-quality models.
The scheme A and M scores of MULTICOM on the 13 FM domains are 24 and 7, respectively (Ben-David et al., 2009). Both were ranked first among all human and server predictors, which clearly indicates the strength of the approach and further highlights that the evaluation-guided model combination approach can effectively select and refine low-resolution models generated for hard FM targets. The average GDT-TS scores of our server predictors ranged from ∼29 to 31 (data not shown), lower than the 40 achieved by the MULTICOM human predictor. The reason is that the human predictor used a large pool of input models, which contained some good-quality third-party models for the FM targets.

Fig. 2. Comparisons between the experimental structure (a) of domain 1 of T0435, the first model of MULTICOM (b), and four models combined by MULTICOM (c–f). The GDT-TS scores are listed inside parentheses. The MULTICOM model (b) is the best model among all the server and human models for this domain; (c) is the second best server model and the best model selected and combined by MULTICOM. MULTICOM correctly predicted a beta-strand [green in (b)] that was not correctly predicted by any of the four models it combined [green, (c–f)]. Furthermore, a helix (red) was correctly modeled in (b) and (c), but not in any of the other models [red, (d–f)]. This indicates that the model combination algorithm can detect and combine portions of good quality, and further refine structural portions to achieve a better overall quality.

Fig. 3. Comparisons between the experimental structure (a) of domain 2 of T0501, the first MULTICOM model (b), and eight of the 20 models MULTICOM combined (c–j). The GDT-TS scores are listed inside parentheses. (b) is the best model among all the server and human models for this domain; (c) is the best server model. METATASSER did not rank its best model (c) as its top model, but this model was included in the combination process of MULTICOM. In this case, the combined model (b) achieved a better quality than all the models, whether or not they were combined.

3.3 A deeper look into model combination
The CASP8 official evaluations have statistically shown the good performance of our model combination approach. To delve further into the effectiveness of our approach, we examined several of the models generated by MULTICOM and the source models it chose to combine. We found that multi-model combination can improve structure prediction in two ways. First, it can combine complementary good regions from multiple models to generate a model that is better than all the models it combined (see Fig. 2 for an example). Second, it can include good models that were not originally ranked as the first model and combine these models, or portions of them, to generate a model that is better than the first model (see Fig. 3 for an example). In general, the model combination process is a selective averaging process, which can produce a model that is on average better than, or as good as, the top model among the combined models. On 11 CASP8 domains, for instance, the combined models generated by MULTICOM achieved the best quality among all the server and human models. However, the performance of the approach relies on the selection of good models for combination.
This explains why MULTICOM achieved better performance than the best server on high-accuracy and free-modeling targets, but not on hard template-based targets, whose models often contain both a good structural core and bad local regions (e.g. unfolded tails) that may cause ModelEvaluator to underestimate their quality. Our CASP8 experiments demonstrate the overall success of MULTICOM, although some parts of it, such as model selection on hard template-based targets, may need improvement.

4 CONCLUSIONS
We described a comprehensive and effective approach to combine multiple templates, alternative alignments and similar models under the guidance of model quality assessment. This approach was successfully applied to protein structure modeling during the recent CASP8 experiment. Our results show that our approach is effective for the full spectrum of protein modeling, particularly for high-accuracy TBM and FM. Compared with most existing protein structure prediction systems, our approach contains a unique and novel model combination step that can refine protein models by averaging complementary good models or fragments. The general combination approach can be further improved at each modeling step (e.g. model ranking) or by integrating complementary techniques. We are currently improving the performance of the method for hard template-based targets by increasing the accuracy of model ranking and by integrating ab initio modeling with TBM to enhance model generation. We plan to test our improved systems, which are largely based on MULTICOM-CLUSTER, MULTICOM-REFINE and MULTICOM, in the CASP9 experiment.

Funding: This work is partially supported by a University of Missouri (UM) Research Board grant and a University of Missouri, Columbia (MU) Research Council grant to J.C.

Conflict of Interest: none declared.

REFERENCES
Baker,D. and Sali,A. (2001) Protein structure prediction and structural genomics. Science, 294, 93–96.
Ben-David,M. et al. (2009) Assessment of CASP8 structure predictions for template free targets. Proteins, 77, 50–65.
Cheng,J. (2008) A multi-template combination algorithm for protein comparative modeling. BMC Struct. Biol., 8, 18.
Cheng,J. and Baldi,P. (2006) A machine learning information retrieval approach to protein fold recognition. Bioinformatics, 22, 1456–1463.
Cheng,J. et al. (2005) SCRATCH: a protein structure and structural feature prediction server. Nucleic Acids Res., 33, W72–W76.
Cheng,J. et al. (2009) Prediction of global and local quality of CASP8 models by MULTICOM series. Proteins, 77, 181–184.
Cozzetto,D. et al. (2009) Evaluation of template-based models in CASP8 with standard measures. Proteins, 77, 18–28.
Edgar,R. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792–1797.
Edgar,R. and Sjolander,K. (2003) SATCHMO: sequence alignment and tree construction using hidden Markov models. Bioinformatics, 19, 1404–1411.
Fiser,A. and Sali,A. (2003) Modeller: generation and refinement of homology-based protein structure models. Meth. Enzymol., 374, 461–491.
Girgis,H.Z. and Fischer,D. (2008) Hierarchy of general linear models for selecting and ranking the best predicted protein structures. In proceedings of the critical assessment of techniques for protein structure prediction - eighth meeting, Cagliari, Sardinia, Italy, pp. 120–121.
Girgis,H.Z. et al. (2008) On-line hierarchy of general linear models for selecting and ranking the best predicted protein structures. In proceedings of the critical assessment of techniques for protein structure prediction - eighth meeting, Cagliari, Sardinia, Italy, pp. 122–123.
Hildebrand,A. et al. (2009) Fast and accurate automatic structure prediction with HHpred. Proteins, 77, 128–132.
Hinds,D.A. and Levitt,M. (1994) Exploring conformational space with a simple lattice model for protein structure. J. Mol. Biol., 243, 668–682.
Jaravine,V. et al. (2006) Removal of a time barrier for high-resolution multidimensional NMR spectroscopy. Nat. Methods, 3, 605–607.
Karplus,K. (2008) SAM-T08-human. In proceedings of the critical assessment of techniques for protein structure prediction - eighth meeting, Cagliari, Sardinia, Italy, p. 95.
Keedy,D. et al. (2009) The other 90% of the protein: assessment beyond the C-alphas for CASP8 template-based and high-accuracy models. Proteins, 77, 29–49.
Kelley,L.A. et al. (2008) From comparative modeling to de novo folding with Phyre, Poing and Phragment. In proceedings of the critical assessment of techniques for protein structure prediction - eighth meeting, Cagliari, Sardinia, Italy, pp. 111–112.
Kim,D.E. et al. (2008) Robetta de novo and homology modeling in CASP8. In proceedings of the critical assessment of techniques for protein structure prediction - eighth meeting, Cagliari, Sardinia, Italy, pp. 7–8.
Kryshtafovych,A. et al. (2005) Progress over the first decade of CASP experiments. Proteins, 7, 225–236.
Kryshtafovych,A. et al. (2007) Progress from CASP6 to CASP7. Proteins, 69, 194–207.
Kryshtafovych,A. et al. (2009a) CASP8 results in context of previous experiments. Proteins, 77, 217–228.
Kryshtafovych,A. et al. (2009b) Protein Structure Prediction Center in CASP8. Proteins, 77, 5–9.
Larkin,M.A. and Blackshields,G. (2007) ClustalW and ClustalX version 2.0. Bioinformatics, 23, 2947–2948.
Lattman,E. (2004) The state of the protein structure initiative. Proteins, 54, 611–615.
Moult,J. et al. (2009) Critical assessment of methods of protein structure prediction (CASP)-round VIII. Proteins, 77(Suppl. 9), 1–4.
Pandit,S.B. et al. (2008) METATASSER: a 3D-jury threading approach with TASSER model assembly/refinement. In proceedings of the critical assessment of techniques for protein structure prediction - eighth meeting, Cagliari, Sardinia, Italy, pp. 63–64.
Sadreyev,R. and Grishin,N. (2003) COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J. Mol. Biol., 326, 317–336.
Service,R. (2005) Structural biology: structural genomics, round 2. Science, 307, 1554–1558.
Siew,N. et al. (2000) MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics, 16, 776–785.
Simons,K. et al. (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol., 268, 209–225.
Soding,J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics, 21, 951–960.
Sternberg,M. and Thornton,J. (1978) Prediction of protein structure from amino acid sequence. Nature, 271, 15–20.
Terashi,G. et al. (2008) Structure evaluation program using the local consensus-based similarity and circle quality assessment method. In proceedings of the critical assessment of techniques for protein structure prediction - eighth meeting, Cagliari, Sardinia, Italy, pp. 27–28.
Thompson,J. et al. (2008) Comparative modeling of protein structures in CASP8 using full-atom Rosetta refinement and manual alignment selection. In proceedings of the critical assessment of techniques for protein structure prediction - eighth meeting, Cagliari, Sardinia, Italy, pp. 21–22.
Venclovas,C. and Margelevicius,M. (2009) The use of automatic tools and human expertise in template-based modeling of CASP8 target proteins. Proteins, 77, 81–88.
Wang,Z. et al. (2008) Evaluating the absolute quality of a single protein model using structural features and support vector machines. Proteins, 75, 638–647.
Xu,J. et al. (2009) Template-based and free modeling by RAPTOR++ in CASP8. Proteins, 77, 133–137.
Zemla,A. (2003) LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res., 31, 3370–3374.
Zemla,A. et al. (1999) Processing and analysis of CASP3 protein structure predictions. Proteins, 37, 22–29.
Zhang,Y. (2008) Progress and challenges in protein structure prediction. Curr. Opin. Struct. Biol., 18, 342–348.
Zhang,Y. (2009) I-TASSER: fully automated protein structure prediction in CASP8. Proteins, 77, 100–113.
Zhang,Y. and Skolnick,J. (2004a) Scoring function for automated assessment of protein structure template quality. Proteins, 57, 702–710.
Zhang,Y. and Skolnick,J. (2004b) SPICKER: a clustering approach to identify near-native protein folds. J. Comput. Chem., 25, 865–871.
Zhang,Y. and Skolnick,J. (2005) The protein structure prediction problem could be solved using the current PDB library. Proc. Natl. Acad. Sci., 102, 1029–1034.
Zhou,H. and Zhou,Y. (2005) SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures. Bioinformatics, 21, 3615–3621.
Zhou,H. et al. (2008) TASSER-based protein structure prediction in CASP8. In proceedings of the critical assessment of techniques for protein structure prediction - eighth meeting, Cagliari, Sardinia, Italy, pp. 115–116.
Zhou,H. et al. (2009) Performance of the Pro-sp3-TASSER server in CASP8. Proteins, 77, 123–127.