Protein Science

Protein Science, Vol. 6, Issue 9, pp. 1963-1975. Copyright © 1997 by Cold Spring Harbor Laboratory Press


ARTICLE

Predicting protein secondary structure with probabilistic schemata of evolutionarily derived information

M. J. Thompson and R. A. Goldstein
Biophysics Research Division, University of Michigan, Ann Arbor, Michigan 48109-1055

We demonstrate the applicability of our previously developed Bayesian probabilistic approach for predicting residue solvent accessibility to the problem of predicting secondary structure. Using only single-sequence data, this method achieves a three-state accuracy of 67% over a database of 473 non-homologous proteins. This approach is more amenable to inspection and less likely to overlearn the specifics of a dataset than "black box" methods such as neural networks. It is also conceptually simpler and computationally cheaper. We also introduce a novel method for representing and incorporating multiple-sequence alignment information within the prediction algorithm, achieving 72% accuracy over a dataset of 304 non-homologous proteins. This is accomplished by creating a statistical model of the evolutionarily derived correlations between patterns of amino acid substitution and local protein structure. This model consists of parameter vectors, termed "substitution schemata," which probabilistically encode the structure-based heterogeneity in the distributions of amino acid substitutions found in alignments of homologous proteins. The model is optimized for structure prediction by maximizing the mutual information between the set of schemata and the database of secondary structures. Unlike "expert heuristic" methods, this approach has been demonstrated to work well over large datasets. Unlike opaque neural network algorithms, this approach is physicochemically intelligible. Moreover, the model optimization procedure, the formalism for predicting one-dimensional structural features, and our previously developed method for tertiary structure recognition all share a common Bayesian probabilistic basis. This consistency contrasts starkly with the hybrid and ad hoc nature of the methods that have dominated this field in recent years.
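The two quantities the abstract relies on, three-state (Q3) accuracy and the mutual information between schemata and secondary structure, can be illustrated with a minimal sketch. It assumes the usual H/E/C (helix, strand, coil) three-state alphabet and, as a simplification, that each residue has been hard-assigned to a single schema index; the schemata described above are probabilistic parameter vectors, so this is not the authors' optimization procedure. The function names `q3_accuracy` and `mutual_information` are hypothetical, chosen for illustration.

```python
import math
from collections import Counter
from typing import Sequence


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted three-state label (H/E/C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must be the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def mutual_information(schema_ids: Sequence[int], structures: Sequence[str]) -> float:
    """Empirical mutual information, in bits, between per-residue schema assignments
    and secondary-structure labels:

        I(S; X) = sum_{s,x} p(s,x) * log2[ p(s,x) / (p(s) p(x)) ]

    Simplification (assumption): each residue carries one hard schema index.
    """
    if len(schema_ids) != len(structures):
        raise ValueError("inputs must be the same length")
    n = len(schema_ids)
    if n == 0:
        raise ValueError("cannot compute mutual information of empty inputs")
    joint = Counter(zip(schema_ids, structures))  # counts of (schema, structure) pairs
    schema_counts = Counter(schema_ids)           # marginal counts of schemata
    structure_counts = Counter(structures)        # marginal counts of H/E/C
    mi = 0.0
    for (s, x), c in joint.items():
        # p(s,x) / (p(s) p(x)) simplifies to c * n / (count(s) * count(x))
        mi += (c / n) * math.log2(c * n / (schema_counts[s] * structure_counts[x]))
    return mi
```

For example, `q3_accuracy("HHHCCCEEEE", "HHHHCCEEEC")` returns 0.8, since 8 of the 10 residues agree. When the schema assignment determines the structure exactly, the mutual information equals the entropy of the structure labels; when the two are independent it is zero, which is why maximizing it favours schemata that discriminate between structural states.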


This article has been cited by other articles:


J. Selent, J. Kaleta, Z. Li, G. Lalmanach, and D. Bromme
Selective Inhibition of the Collagenase Activity of Cathepsin K
J. Biol. Chem., June 1, 2007; 282(22): 16492 - 16501.


D. D. Pollock, J. A. Eisen, N. A. Doggett, and M. P. Cummings
A Case for Evolutionary Genomics and the Comprehensive Examination of Sequence Biodiversity
Mol. Biol. Evol., December 1, 2000; 17(12): 1776 - 1788.




Copyright © 1997 by The Protein Society.