Protein Science
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


Published online before print September 25, 2006, 10.1110/ps.062125306
Protein Science (2006), 15:2499-2506. Published by Cold Spring Harbor Laboratory Press. Copyright © 2006 The Protein Society
This Article
Right arrow Full Text
Right arrow Full Text (PDF)
Right arrow All Versions of this Article:
ps.062125306v1
15/11/2499    most recent
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Sen, T. Z.
Right arrow Articles by Jernigan, R. L.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Sen, T. Z.
Right arrow Articles by Jernigan, R. L.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati  
What's this?

A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining

Taner Z. Sen1,2, Haitao Cheng1,2, Andrzej Kloczkowski2, and Robert L. Jernigan2

1 Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa 50011-3020, USA
2 L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, Iowa 50011-3020, USA

(RECEIVED January 30, 2006; FINAL REVISION May 11, 2006; ACCEPTED July 31, 2006)

The major aim of tertiary structure prediction is to obtain protein models with the highest possible accuracy. Fold recognition, homology modeling, and de novo prediction methods typically use predicted secondary structures as input, and all of these methods may significantly benefit from more accurate secondary structure predictions. Although there are many different secondary structure prediction methods available in the literature, their cross-validated prediction accuracy is generally <80%. In order to increase the prediction accuracy, we developed a novel hybrid algorithm called Consensus Data Mining (CDM) that combines our two previous successful methods: (1) Fragment Database Mining (FDM), which exploits the Protein Data Bank structures, and (2) GOR V, which is based on information theory, Bayesian statistics, and multiple sequence alignments (MSA). In CDM, the target sequence is dissected into smaller fragments that are compared with fragments obtained from related sequences in the PDB. For fragments with a sequence identity above a certain sequence identity threshold, the FDM method is applied for the prediction. The remainder of the fragments are predicted by GOR V. The results of the CDM are provided as a function of the upper sequence identities of aligned fragments and the sequence identity threshold. We observe that the value 50% is the optimum sequence identity threshold, and that the accuracy of the CDM method measured by Q3 ranges from 67.5% to 93.2%, depending on the availability of known structural fragments with sufficiently high sequence identity. As the Protein Data Bank grows, it is anticipated that this consensus method will improve because it will rely more upon the structural fragments.

Keywords: secondary structure prediction; GOR; Fragment Database Mining; structural fragments; multiple sequence alignments; PSIPRED



Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us   Add to Digg Digg   Add to Reddit Reddit   Add to Technorati Technorati    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
H. Cheng, T. Z. Sen, R. L. Jernigan, and A. Kloczkowski
Consensus Data Mining (CDM) Protein Secondary Structure Prediction Server: Combining GOR V and Fragment Database Mining (FDM)
Bioinformatics, October 1, 2007; 23(19): 2628 - 2630.
[Abstract] [Full Text] [PDF]




HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Copyright © 2006 by The Protein Society.