Protein Science, Vol 7, Issue 2 445-456, Copyright © 1998 by Cold Spring Harbor Laboratory Press
ARTICLE
M. GERSTEIN and M. LEVITT
Molecular Biophysics & Biochemistry Department, P.O. Box 208114, Yale University, New Haven, Connecticut 06520-8114
We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of its simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematic to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.
Supplementary material is available at http://bioinfo.mbb.yale.edu/align.
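The dynamic-programming half of the cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two backbones are already superposed, and the similarity function `M / (1 + (d / D0)^2)`, the constants `M`, `D0`, and `GAP`, the flat per-residue gap penalty, and the function names are all illustrative assumptions. The least-squares fitting step is omitted.

```python
import math

# Illustrative constants (assumptions, not taken from the abstract).
M = 20.0    # score for a perfectly superposed residue pair
D0 = 5.0    # distance scale in angstroms
GAP = 10.0  # flat penalty per unpaired residue (a simplification)


def similarity(p, q):
    """Distance-based similarity between two backbone atom positions."""
    d = math.dist(p, q)
    return M / (1.0 + (d / D0) ** 2)


def align(a, b):
    """Global dynamic-programming alignment of two superposed coordinate lists.

    Returns the total score and the list of matched index pairs (i in a, j in b).
    """
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -GAP * i
    for j in range(1, m + 1):
        score[0][j] = -GAP * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(a[i - 1], b[j - 1]),  # pair i with j
                score[i - 1][j] - GAP,                                 # skip a residue in a
                score[i][j - 1] - GAP,                                 # skip a residue in b
            )
    # Trace back from the corner to recover the matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + similarity(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] - GAP:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Toy example: b is a with one extra residue inserted off-axis.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.7, 3.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
total, pairs = align(a, b)
print(total, pairs)
```

In this toy case the inserted residue `b[2]` is left unpaired and the other four residues are matched one to one. In the full procedure described above, the matched pairs would then drive a least-squares refit of one structure onto the other, and the alignment would be recomputed on the new coordinates in repeated cycles.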