1 BioInfoBank Institute, Limanowskiego 24A/16, 60-744 Poznan, Poland
2 Center of Excellence in Bioinformatics and Computer Science and Engineering, University at Buffalo, Buffalo, New York 14203, USA
3 Bioinformatics, Department of Computer Science, Ben Gurion University, Beer-Sheva 84015, Israel
Reprint requests to: Daniel Fischer, Center of Excellence in Bioinformatics and Computer Science and Engineering, University at Buffalo, 901 Washington St., Suite 300, Buffalo, NY 14203, USA; e-mail: dfischer{at}bioinformatics.buffalo.edu; fax: (716) 849-6747.
(RECEIVED May 25, 2004; FINAL REVISION September 23, 2004; ACCEPTED September 23, 2004)
| Abstract |
|---|
Keywords: protein structure prediction; LiveBench; CAFASP
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04888805.
| Introduction |
|---|
LB continuously assesses the capabilities of automated servers using a relatively large number of prediction targets compiled every week from newly released protein structures, and provides an assessment of the servers' capabilities approximately every half year. LB thus complements the CASP/CAFASP experiments (Fischer et al. 2003; Lattman 2003), which are held every two years using a significantly smaller number of prediction targets. Another large-scale evaluation project that focuses on other aspects of structure prediction is EVA (Rost and Eyrich 2001).
The last LB and CAFASP experiments (Fischer et al. 2003; Lattman 2003; Rychlewski et al. 2003) demonstrated that the so-called meta-servers (defined as servers that need as input the results of other participating servers) outperform all the individual, autonomous servers, and are already challenging most human expert predictors. Since then, new servers and meta-servers have been developed and evaluated in subsequent LB rounds. To obtain an updated snapshot of the predictive capabilities of current servers, and of their expected performance in future experiments, we report the main results from the recently completed LB-8.
| LiveBench-8 |
|---|
The LiveBench Web site is a comprehensive, interactive repository of LB results. Because LB considers a number of evaluation methods, the LB Web site allows the user to select the evaluation method to be used as well as the way the results are presented. In addition, the LB site includes data for server predictions that were not submitted immediately upon release of the targets. The LB Web site also allows the user to select the set of servers to be considered. Because of the versatility of the interactive system, the LB results can be interpreted in many ways. Consequently, understanding the meaning of the LB results may not be straightforward for an outsider. To aid in the interpretation of the LB data, here we report the LB-8 results using the same simplified approach as the one used two years ago in our LB-4 report (Fischer and Rychlewski 2003). This approach is based on the measures of "overall sensitivity" and "overall specificity" (described below) as assessed by the evaluation method MaxSub (Siew et al. 2000). MaxSub is a program that measures the quality of a prediction by assigning a score from 0.0 (an incorrect prediction) to 1.0 (a perfect prediction). A prediction with a positive MaxSub score is considered to be (partially) correct. To further simplify the presentation of the LB results, we discuss the performance of the individual or autonomous servers separately from that of the meta-predictors, and consider only those servers that submitted predictions every week for all 172 targets. Consequently, some of the servers that submitted predictions at a later time, including those submitted after the preparation of this manuscript, are excluded from the results presented here. We refer to the servers by their four-letter LB abbreviations and refer the reader to the LB-8 Web pages for their full names.
Sensitivity
Table 1 lists the overall sensitivities of the top performing autonomous servers. Overall sensitivity is defined as the percentage of targets for which a correct prediction is obtained (i.e., a positive MaxSub score; see above). The second column lists the overall sensitivity over all 172 targets, and the following two columns list the overall sensitivity when considering the easy and hard targets separately. In this and other tables, we highlight the three highest numbers in bold.
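The definition of overall sensitivity given above can be illustrated with a minimal Python sketch. It is not part of the LB software: the function name, the target identifiers, the scores, and the easy/hard labels are all hypothetical and serve only to show the arithmetic.

```python
def overall_sensitivity(maxsub_scores):
    """Return the percentage of targets with a positive MaxSub score."""
    scores = list(maxsub_scores)
    if not scores:
        raise ValueError("at least one target is required")
    # A prediction counts as (partially) correct when its score is positive.
    correct = sum(1 for score in scores if score > 0.0)
    return 100.0 * correct / len(scores)


# Hypothetical MaxSub scores for one server, keyed by made-up target IDs.
scores_by_target = {"t01": 0.0, "t02": 0.35, "t03": 0.80, "t04": 0.0}
# Hypothetical difficulty labels; the real easy/hard split is defined by LB.
easy_targets = {"t02", "t03"}

all_sens = overall_sensitivity(scores_by_target.values())
easy_sens = overall_sensitivity(
    s for t, s in scores_by_target.items() if t in easy_targets
)
hard_sens = overall_sensitivity(
    s for t, s in scores_by_target.items() if t not in easy_targets
)
print(all_sens, easy_sens, hard_sens)
```

Computing the easy and hard subsets separately mirrors the column layout described for Table 1.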
The sensitivities of the best performing autonomous servers in LB-4 were just above 50%, suggesting that there may be a slight improvement in LB-8 over the "old" LB-4 servers. However, because the sensitivities of some of the servers that participated in both LB-4 and LB-8 are also higher in LB-8, such an improvement may be due in part to differences in the test sets (LB-8 may have included more "easy" targets), to the growth of the sequence and structural databases, or both.
The autonomous servers with the highest overall sensitivity were the recently developed series of "Meta-BASIC" servers (unpublished): BASD, BASP, and MBAS. BASD (Distal-BASIC) and BASP (Proximal-BASIC) are profile-comparison methods. BASD uses two versions of low-stringency profiles generated after five PSI-BLAST iterations combined with RPS-BLAST searches, and BASP uses profiles generated after three iterations. MBAS (Meta-BASIC) is a local, autonomous meta-predictor, which uses six different versions of profile-alignment methods. We note that five of the top ranks are now occupied by newly developed servers (BASD, BASP, MBAS, SFST, and STMP). The other ranks are occupied by servers that have also ranked among the top performers in previous experiments: SHGU (Fischer 2003), ORF2 (Rychlewski et al. 2000), FFA3 (Pawlowski et al. 2001), and ORF-s (Rychlewski et al. 2000). Because of the excellent performance of the new servers, other older servers that had ranked among the top performers in previous experiments, e.g., 3DPS (Kelley et al. 2000), INBG (Fischer 2000), MGTH (McGuffin and Jones 2003), ST99 (Karplus and Hu 2001), and the two versions of FUGUE (Shi et al. 2001), now occupy lower ranks. This suggests that there have been positive developments in the field, and that the new servers appear to be an improvement over the older ones.
The sum of MaxSub scores is an additional sensitivity indicator, which assesses the quality (or completeness) of the generated models (Table 2). Using this measure, the most sensitive servers are SHGU (an old server from LB-4 that applies the 3D-SHOTGUN meta-prediction approach on locally generated data from INBGU) (Fischer 2003), BASD, and MBAS. For comparison, PDBB scores 45% lower than the best servers. The sensitivities of the top servers among the easy targets are very similar, with the two recently developed commercial servers, SFST and STMP, ranking at the top after SHGU. SFST and STMP are two versions of a profile-profile alignment method developed by the same group, which uses specific gap penalties and composition-based statistics. Among the hard targets, the most sensitive servers are again BASD, BASP, and MBAS, the same servers that score at the top on overall sensitivity. The difference in scores between the rank-1 and rank-9 servers is 9% among the easy targets and 19% among the hard targets. PDBB's performance on the hard targets is significantly lower than that of the top servers, indicating the value of FR servers for these cases.
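As a rough illustration of this second indicator, the following Python sketch sums MaxSub scores per server and expresses each server's total as a percentage below the best total, the form in which the comparisons above are stated. The server names and scores are invented for the example, and the helper names are assumptions rather than anything provided by LB.

```python
def total_maxsub(scores):
    """Sum of MaxSub scores over all targets for one server."""
    return sum(scores)


def percent_below_best(totals):
    """For each server, how far (in %) its total lies below the best total."""
    best = max(totals.values())
    if best <= 0:
        raise ValueError("the best total must be positive")
    return {name: 100.0 * (best - total) / best for name, total in totals.items()}


# Hypothetical per-target MaxSub scores for two made-up servers.
scores = {
    "SRV1": [0.5, 0.75, 0.75],
    "SRV2": [0.5, 0.0, 0.5],
}
totals = {name: total_maxsub(values) for name, values in scores.items()}
gaps = percent_below_best(totals)
print(totals, gaps)
```

Unlike overall sensitivity, this total rewards more complete models, because a score of 0.75 contributes more than a score of 0.5 even though both count as correct predictions.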
The overall specificities in LB-8 are also higher than those in LB-4 (the most specific servers in LB-4 had an overall specificity just below 50%), possibly suggesting that the new servers are slightly more specific, but also reflecting differences in the set of targets and/or the growth of the databases. As with the overall sensitivity results, the differences among the most specific servers are only slight.
Finally, to obtain an overall, single ranking, we have computed the average rank that each server receives in each of the assessment categories: sensitivity on "easy," sensitivity on "hard," and specificity, using three different LB evaluation methods. Figure 1 depicts the average ranks of the top 14 autonomous servers plus that of PDBB, using each of the three evaluation methods. The figure confirms that the exact relative rankings can change slightly depending on the evaluation method used, but it demonstrates that the same top performing servers are identified regardless of how they are evaluated.
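The average-rank computation can be sketched as follows, for a single evaluation method. This is an illustrative Python example under stated assumptions: the server names and category values are hypothetical, higher values are taken to be better in every category, and ties are broken arbitrarily by sort order, which may differ from how LB handles them.

```python
def ranks(category_scores):
    """Map each server to its rank (1 = best) within one category."""
    ordered = sorted(category_scores, key=category_scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def average_ranks(categories):
    """Average each server's rank over all assessment categories."""
    per_category = [ranks(scores) for scores in categories.values()]
    servers = per_category[0].keys()
    return {
        name: sum(r[name] for r in per_category) / len(per_category)
        for name in servers
    }


# Hypothetical values for three made-up servers in the three categories.
categories = {
    "sensitivity_easy": {"S1": 90.0, "S2": 85.0, "S3": 80.0},
    "sensitivity_hard": {"S1": 30.0, "S2": 40.0, "S3": 20.0},
    "specificity": {"S1": 60.0, "S2": 50.0, "S3": 55.0},
}
avg = average_ranks(categories)
final_order = sorted(avg, key=avg.get)  # lowest average rank first
print(avg, final_order)
```

Repeating the same computation with the values produced by each evaluation method would give one average rank per server per method, which is the quantity Figure 1 is described as showing.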
Meta-servers
Recent LB and CAFASP experiments have demonstrated that meta-servers clearly outperform the individual, autonomous servers. This is not surprising, since a well-designed meta-predictor should perform at least as well as the best of its input components. During LB-8, only three series of reliable, highly available meta-predictors were assessed: the PCONS/PMOD series (Lundstrom et al. 2001), the 3D-SHOTGUN series (Fischer 2003), and the newer 3D-JURY series (Ginalski et al. 2003). Each of these series includes a number of variants, totaling 15 different meta-servers. The top meta-predictors include representatives of each of the series, and have very similar performances, both in sensitivity and in specificity. Roughly, the best meta-predictors are about 7% more sensitive and more specific than the best of the individual servers. The difference in sensitivity is smaller among the easy targets and more significant among the hard targets, of which half are correctly predicted. The 3D-JURY series of meta-predictors, based on principles very similar to those of the PCONS/PMOD and 3D-SHOTGUN series, was developed during the last CAFASP experiment. While the PCONS/PMOD and 3D-SHOTGUN series use a small, fixed number of other autonomous servers as input (e.g., the 3DS3 3D-SHOTGUN meta-predictor uses as few as two external servers), the 3D-JURY servers are in fact meta-meta-predictors because they can use all the available information from other servers and other meta-servers, including those of the PCONS/PMOD and 3D-SHOTGUN series. Consequently, in LB-8, some of the meta-meta-predictors from the 3D-JURY series appear to be slightly superior to the others (see the LB-8 Web pages for details). Despite the superior performance of meta-predictors, their utility is hampered by their dependence on external services and by their slow response time, sometimes requiring days before they can return a prediction.
Local, autonomous, and fast servers such as SHGU and Meta-BASIC, which apply the meta-prediction principles to locally generated data, overcome some of these limitations: they offer the user improved performance both in the correctness of the predictions and in response time.
Other servers
There were a number of other new, autonomous servers that participated in LB-8 for the first time, but did not rank at the top. Some of these were "unofficial" servers that entered LB-8 late, and that could not be properly evaluated because at the time LB-8 was closed they were not yet fully integrated into the LB communication protocol, or because only a small number of their results could be collected. There were also a number of servers that ranked at the top in previous LB or CAFASP experiments that did not participate in LB-8 or simply got lower ranks in LB-8.
| LiveBench and CAFASP |
|---|
Based on the (slight) progress observed in LB-8, we expect that this year, the ongoing LB-9 and CAFASP-4 experiments will demonstrate further progress in automatic structure prediction. The past success of meta-predictors will probably result in the proliferation of new and better meta-predictors, which will continue to challenge the best human predictors. However, meta-predictors can only be as good as their components. As in LB-8, we expect that new, better, autonomous servers will continue to be developed. Probably one of the main lessons from LB-8 is that the newly developed, top-ranking autonomous servers BASD, BASP, MBAS, SFST, and STMP apply mainly sequence-based methods. This may suggest that most of the progress observed in LB-8 lies in better recognition and modeling of relatively close family members. Future experiments may help identify whether any progress exists for the harder cases.
To extend the scope of CAFASP's assessment, this year's CAFASP-4 experiment introduced two new subcategories: assessment of domain prediction servers (DP), and model quality assessment programs (MQAPs). Identifying domains in a protein sequence is an essential component of the structure (and function) prediction process. The DP subcategory is aimed at evaluating the performance of current methods. The MQAP subcategory is aimed at evaluating the performance of methods that assign energies, pseudoenergies, or simply scores to a given model. Being able to identify near-native models is an important aspect of structure prediction, not only for ab initio methods, but also for refinement procedures. Past evaluations have demonstrated that MQAPs are not very good at this task, and this new subcategory will attempt to identify the strengths and limitations of current methods. Further information about the upcoming CAFASP-4 experiment can be obtained at http://www.cs.bgu.ac.il/~dfischer/CAFASP4.
| Disclaimer |
|---|
| References |
|---|
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403-410.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389-3402.
Bujnicki, J.M., Elofsson, A., Fischer, D., and Rychlewski, L. 2001. LiveBench-1: Continuous benchmarking of protein structure prediction servers. Protein Sci. 10: 352-361.
Fischer, D. 2000. Hybrid fold recognition: Combining sequence derived properties with evolutionary information. Pac. Symp. Biocomput. 119-130.
Fischer, D. 2003. 3D-SHOTGUN: A novel, cooperative, fold-recognition meta-predictor. Proteins 51: 434-441.
Fischer, D. and Eisenberg, D. 1997. Assigning folds to the proteins encoded by the genome of Mycoplasma genitalium. Proc. Natl. Acad. Sci. 94: 11929-11934.
Fischer, D. and Rychlewski, L. 2003. The 2002 Olympic Games of protein structure prediction. Protein Eng. 16: 157-160.
Fischer, D., Elofsson, A., and Rychlewski, L. 2000. The 2000 Olympic Games of protein structure prediction; fully automated programs are being evaluated vis-a-vis human teams in the protein structure prediction experiment CAFASP2. Protein Eng. 13: 667-670.
Fischer, D., Baker, D., and Moult, J. 2001. We need both computer models and experiments. Nature 409: 558.
Fischer, D., Rychlewski, L., Dunbrack Jr., R.L., Ortiz, A.R., and Elofsson, A. 2003. CAFASP3: The third critical assessment of fully automated structure prediction methods. Proteins 53(Suppl. 6): 503-516.
Fischer, D., Pas, J., and Rychlewski, L. 2004. The PDB-preview database: A repository of in-silico models of "on-hold" PDB entries. Bioinformatics 20: 2482-2484.
Ginalski, K., Elofsson, A., Fischer, D., and Rychlewski, L. 2003. 3D-Jury: A simple approach to improve protein structure predictions. Bioinformatics 19: 1015-1018.
Holm, L. and Sander, C. 1995. Dali: A network tool for protein structure comparison. Trends Biochem. Sci. 20: 478-480.
Karplus, K. and Hu, B. 2001. Evaluation of protein multiple alignments by SAM-T99 using the BAliBASE multiple alignment test set. Bioinformatics 17: 713-720.
Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. 2000. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299: 499-520.
Lattman, E.E. 2003. Fifth Meeting on the Critical Assessment of Techniques for Protein Structure Prediction. Proteins 53(Suppl. 6): 33.
Lundstrom, J., Rychlewski, L., Bujnicki, J., and Elofsson, A. 2001. Pcons: A neural-network-based consensus predictor that improves fold recognition. Protein Sci. 10: 2354-2362.
McGuffin, L.J. and Jones, D.T. 2003. Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 19: 874-881.
Pawlowski, K., Rychlewski, L., Zhang, B., and Godzik, A. 2001. Fold predictions for bacterial genomes. J. Struct. Biol. 134: 219-231.
Rost, B. and Eyrich, V.A. 2001. EVA: Large-scale analysis of secondary structure prediction. Proteins Suppl. 5: 192-199.
Rychlewski, L., Jaroszewski, L., Li, W., and Godzik, A. 2000. Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 9: 232-241.
Rychlewski, L., Fischer, D., and Elofsson, A. 2003. LiveBench-6: Large-scale automated evaluation of protein structure prediction servers. Proteins 53(Suppl. 6): 542-547.
Shi, J., Blundell, T.L., and Mizuguchi, K. 2001. FUGUE: Sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. 310: 243-257.
Siew, N., Elofsson, A., Rychlewski, L., and Fischer, D. 2000. MaxSub: An automated measure for the assessment of protein structure prediction quality. Bioinformatics 16: 776-785.