1 Laboratory of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, New York, New York 10021, USA
2 Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California 94143, USA
Reprint requests to: Andrej Sali, Mission Bay Genentech Hall, Ste. N472D, 600 16th St., University of California at San Francisco, San Francisco, CA 94143, USA; e-mail: sali{at}salilab.org; fax: (415) 514-4231.
We developed a variant of the intermediate sequence search method (ISSnew) for detecting and aligning weakly similar pairs of protein sequences. ISSnew relates two query sequences through an intermediate sequence that is potentially homologous to both queries. The improvement was achieved by a more robust overlap score for a match between the queries through an intermediate. The approach was benchmarked on a data set of 2369 sequences of known structure with insignificant sequence similarity to each other (BLAST E-value larger than 0.001); 2050 of these sequences had a related structure in the set. ISSnew performed significantly better than both PSI-BLAST and a previously described intermediate sequence search method. PSI-BLAST could not detect correct homologs for 1619 of the 2369 sequences. In contrast, ISSnew assigned a correct homolog as the top hit for 121 of these 1619 sequences, while incorrectly assigning homologs for only nine targets; it did not assign homologs for the remaining sequences. We estimate that ISSnew may be able to assign the folds of domains in ~29,000 of the ~500,000 sequences unassigned by PSI-BLAST, with 90% specificity (1 - false-positive fraction). In addition, we show that the 15 alignments with the most significant BLAST E-values include the nearly best alignments constructed by ISSnew.
Keywords: protein homology; protein evolution; sequence alignment; comparative protein structure modeling; fold assignment
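The idea of relating two queries through an intermediate, as described in the abstract, can be illustrated with a minimal sketch. It is not the ISSnew implementation: the abstract does not define the overlap score, so the `overlap_fraction` function, the `Hit` record, and the `min_overlap` and `max_e_value` thresholds below are all hypothetical choices made for illustration. The sketch assumes that each query has already been searched against a sequence database and that each hit records the region of the intermediate sequence covered by the alignment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """A local alignment between a query and a candidate intermediate (hypothetical record)."""
    intermediate_id: str
    start: int       # first aligned residue on the intermediate (1-based, inclusive)
    end: int         # last aligned residue on the intermediate (inclusive)
    e_value: float


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Illustrative overlap score (an assumption, not the published one):
    shared length on the intermediate divided by the shorter aligned region."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return shared / shorter


def link_through_intermediates(
    hits_a: List[Hit],
    hits_b: List[Hit],
    min_overlap: float = 0.5,    # assumed threshold
    max_e_value: float = 1e-3,   # assumed threshold
) -> Optional[Tuple[str, float]]:
    """Return (intermediate_id, score) for the best-overlapping shared intermediate, or None."""
    # Index the significant hits of query B by intermediate identifier.
    b_by_id: Dict[str, List[Hit]] = {}
    for hit in hits_b:
        if hit.e_value <= max_e_value:
            b_by_id.setdefault(hit.intermediate_id, []).append(hit)

    best: Optional[Tuple[str, float]] = None
    for hit_a in hits_a:
        if hit_a.e_value > max_e_value:
            continue
        # Only intermediates matched by both queries can link them.
        for hit_b in b_by_id.get(hit_a.intermediate_id, []):
            score = overlap_fraction(hit_a, hit_b)
            if score >= min_overlap and (best is None or score > best[1]):
                best = (hit_a.intermediate_id, score)
    return best


hits_a = [Hit("I1", 10, 109, 1e-5), Hit("I2", 1, 50, 1e-4)]
hits_b = [Hit("I1", 60, 159, 1e-6), Hit("I2", 200, 250, 1e-8)]
print(link_through_intermediates(hits_a, hits_b))
```

In this toy input both queries match intermediate `I1` in regions that share 50 residues, whereas their matches to `I2` fall in disjoint regions, so only `I1` can link the two queries. Requiring the two matches to overlap on the intermediate is what keeps a multidomain intermediate from linking queries that are each related to a different domain.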