Nestlé Research Center, BioAnalytical Science, CH-1000 Lausanne 26, Switzerland
(RECEIVED August 4, 2004; FINAL REVISION October 5, 2004; ACCEPTED October 14, 2004)
Here, we report a novel protein sequence descriptor-based remote homology identification method that is able to infer fold relationships without explicit knowledge of structure. In a first phase, we individually benchmarked 13 different descriptor types in fold identification experiments on a highly diverse set of protein sequences. The relevant descriptors were related to fold class membership by using simple similarity measures in the descriptor spaces, such as the cosine angle. Our results revealed that the three best-performing sets of descriptors were the sequence-alignment-based descriptor using PSI-BLAST E-values, the descriptors based on the alignment of secondary structural elements (SSEA), and the descriptors based on the occurrence of PROSITE functional motifs. In a second phase, the three top-performing descriptors were combined to obtain a final method with improved performance, which we named DescFold. Class membership was predicted by Support Vector Machine (SVM) learning. In comparison with the individual PSI-BLAST-based descriptor, the rate of remote homology identification increased from 33.7% to 46.3%. We found that the composite set of descriptors was able to identify the true remote homolog for nearly every sixth sequence at the 95% confidence level, or some 10% more than a single PSI-BLAST search. We benchmarked the DescFold method against several other state-of-the-art fold recognition algorithms on the 172 LiveBench-8 targets and concluded that it was able to add value to the existing techniques by providing a confident hit for at least 10% of the sequences not identifiable by the previously known methods.
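As an illustration of the similarity step mentioned above, the following is a minimal sketch of how a query sequence's descriptor vector could be related to fold classes through the cosine measure. It is not the DescFold implementation: the function names `cosine_similarity` and `predict_fold`, the nearest-neighbor assignment rule, and the toy three-dimensional descriptors are all assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two descriptor vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero descriptor is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def predict_fold(query, library):
    """Return the (label, score) of the library entry most similar to the query.

    `library` is a list of (fold_label, descriptor_vector) pairs; this
    nearest-neighbor rule is a hypothetical stand-in for the benchmark setup.
    """
    best_label, best_score = None, -1.0
    for label, vector in library:
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy descriptors (made-up values, for illustration only).
library = [
    ("fold_A", [1.0, 0.0, 2.0]),
    ("fold_B", [0.0, 3.0, 1.0]),
]
label, score = predict_fold([2.0, 0.1, 3.9], library)
```

Here the query is assigned to whichever library entry points in the most similar direction in descriptor space; because the cosine measure ignores vector length, descriptors of different overall magnitude remain comparable.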
Keywords: remote homology; sequence descriptor; secondary structure; sequence alignment; sequence motif; support vector machine
Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.041035505.
Reprint requests to: Ziding Zhang, Nestlé Research Center, BioAnalytical Science, CH-1000 Lausanne 26, Switzerland; e-mail: Ziding.Zhang@rdls.nestle.com; fax: +41-21-785-9486.
This article has been cited by other articles:
Y.-R. Tang, Y.-Z. Chen, C. A. Canchaya, and Z. Zhang. GANNPhos: a new phosphorylation site predictor based on a genetic algorithm integrated neural network. Protein Eng. Des. Sel., August 1, 2007; 20(8): 405-412.