Protein Science

Protein Science, Vol 4, Issue 6 1145-1160, Copyright © 1995 by Cold Spring Harbor Laboratory Press


ARTICLE

Comparison of methods for searching protein sequence databases

W. R. PEARSON
Department of Biochemistry, University of Virginia, Charlottesville, Virginia 22908

We have compared commonly used sequence comparison algorithms, scoring matrices, and gap penalties using a method that identifies statistically significant differences in performance. Search sensitivity with either the Smith-Waterman algorithm or FASTA is significantly improved by using modern scoring matrices, such as BLOSUM45-55, and optimized gap penalties instead of the conventional PAM250 matrix. More dramatic improvement can be obtained by scaling similarity scores by the logarithm of the length of the library sequence (ln()-scaling). With the best modern scoring matrix (BLOSUM55 or JO93) and optimal gap penalties (-12 for the first residue in the gap and -2 for additional residues), Smith-Waterman and FASTA performed significantly better than BLASTP. With ln()-scaling and optimal scoring matrices (BLOSUM45 or Gonnet92) and gap penalties (-12, -1), the rigorous Smith-Waterman algorithm performed better than either BLASTP or FASTA, although with the Gonnet92 matrix the difference from FASTA was not significant. Ln()-scaling performed better than normalization based on other simple functions of library sequence length. Ln()-scaling also performed better than scores based on normalized variance, but the differences were not statistically significant for the BLOSUM50 and Gonnet92 matrices. Optimal scoring matrices and gap penalties are reported for Smith-Waterman and FASTA, using conventional or ln()-scaled similarity scores. Searches with no penalty for gap extension, or no penalty for gap opening, or an infinite penalty for gaps performed significantly worse than the best methods. Differences in performance between FASTA and Smith-Waterman were not significant when partial query sequences were used. However, the best performance with complete query sequences was obtained with the Smith-Waterman algorithm and ln()-scaling.
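
The gap-penalty convention and the length scaling described in the abstract can be illustrated with a short Python sketch. It is not the implementation evaluated in the article. The function names are hypothetical, the identity scoring (+5 match, -4 mismatch) is only a stand-in for a real matrix such as BLOSUM55, and dividing the raw score by ln(library length) is an assumed, simplified form of ln()-scaling; the abstract does not give the exact formula.

```python
import math


def gap_score(length, first=-12, additional=-2):
    """Score of one gap: `first` for its first residue, `additional` for each further one."""
    if length <= 0:
        return 0
    return first + (length - 1) * additional


def identity_score(a, b, match=5, mismatch=-4):
    """Placeholder residue scoring (assumption); a real search would use BLOSUM, PAM, etc."""
    return match if a == b else mismatch


def smith_waterman_score(query, subject, pair_score=identity_score,
                         first=-12, additional=-2):
    """Best local alignment score with affine gaps, computed row by row."""
    neg_inf = float("-inf")
    cols = len(subject)
    h_prev = [0] * (cols + 1)      # best scores for the previous query row
    f = [neg_inf] * (cols + 1)     # best score ending in a gap that skips query residues
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * (cols + 1)
        e = neg_inf                # best score ending in a gap that skips subject residues
        for j in range(1, cols + 1):
            # Either open a new gap (first residue) or extend the current one.
            e = max(h_cur[j - 1] + first, e + additional)
            f[j] = max(h_prev[j] + first, f[j] + additional)
            diag = h_prev[j - 1] + pair_score(query[i - 1], subject[j - 1])
            # Local alignment: a score never drops below zero.
            h_cur[j] = max(0, diag, e, f[j])
            best = max(best, h_cur[j])
        h_prev = h_cur
    return best


def ln_scaled(score, library_length):
    """Assumed simple form of ln()-scaling: raw score divided by ln(library sequence length)."""
    if library_length < 2:
        raise ValueError("library sequence length must be at least 2")
    return score / math.log(library_length)


if __name__ == "__main__":
    raw = smith_waterman_score("AAAAWWWW", "AAAACCWWWW")
    print(raw)                     # 8 matches and one 2-residue gap: 40 - 12 - 2 = 26
    print(ln_scaled(raw, 10), ln_scaled(raw, 1000))
```

Under this scaling, the same raw score counts for less against a longer library sequence, which is the intuition behind correcting for the tendency of long sequences to produce high scores by chance.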





Copyright © 1995 by The Protein Society.