1 Department of Biochemistry and Molecular Biology, University College London, University of London, London WC1E 6BT, UK
2 Department of Crystallography, Birkbeck College, University of London, London WC1E 7HX, UK
Reprint requests to: Dr. Frances Pearl, Department of Biochemistry and Molecular Biology, University College London, University of London, Gower Street, London WC1E 6BT, UK; e-mail: frances{at}biochem.ucl.ac.uk; fax: 44 (0) 20 7679 7193.
An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs. These relationships are maintained within neighbour tables in the CATH Oracle database, pending additional evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB, and a web server has been provided so that new sequences may be scanned against the profile library and assigned to a structure and homologous superfamily.
Keywords: Structural genomics; fold assignment; homology; sequence profiles; CATH
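The two-tier rule described in the abstract (assign a sequence to a superfamily only when a hit passes a conservative threshold; keep weaker hits as putative neighbours) can be sketched as below. This is a minimal illustration, not the DomainFinder implementation. It assumes PSI-BLAST hits have already been computed and reduced to (domain, superfamily, E-value) records. The cut-off values `STRICT_EVALUE` and `RELAXED_EVALUE`, the `Hit` type and the `assign_superfamily` function are hypothetical, and the identifiers and E-values in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical cut-offs for illustration only: the abstract refers to
# "conservative thresholds" and "less stringent cut-offs" without values.
STRICT_EVALUE = 0.0005
RELAXED_EVALUE = 0.01


@dataclass(frozen=True)
class Hit:
    """One precomputed PSI-BLAST hit of a query against a classified PDB domain."""
    pdb_domain: str   # identifier of the structural domain that was hit (illustrative)
    superfamily: str  # CATH homologous superfamily code of that domain
    evalue: float     # PSI-BLAST E-value of the alignment


def assign_superfamily(hits: List[Hit]) -> Tuple[Optional[str], Set[str]]:
    """Sketch of a two-tier assignment rule (hypothetical, not DomainFinder's code).

    Returns (assigned, neighbours):
      assigned   -- superfamily of the best hit passing the strict cut-off,
                    or None when no hit is clear enough;
      neighbours -- superfamilies whose hits fall only between the strict and
                    relaxed cut-offs, i.e. candidates for a neighbour table.
    """
    strict = [h for h in hits if h.evalue <= STRICT_EVALUE]
    assigned = min(strict, key=lambda h: h.evalue).superfamily if strict else None
    strict_superfamilies = {h.superfamily for h in strict}
    neighbours = {
        h.superfamily
        for h in hits
        if h.evalue <= RELAXED_EVALUE and h.superfamily not in strict_superfamilies
    }
    return assigned, neighbours


# Usage with made-up hits for a single query sequence.
hits = [
    Hit("dom1", "3.40.50.300", 1e-30),  # clear hit: drives the assignment
    Hit("dom2", "3.40.50.300", 2e-4),   # also strict, same superfamily
    Hit("dom3", "1.10.10.10", 5e-3),    # only passes the relaxed cut-off
    Hit("dom4", "2.60.40.10", 0.5),     # too weak to record at all
]
print(assign_superfamily(hits))  # ('3.40.50.300', {'1.10.10.10'})
```

Keeping the relaxed hits in a separate set, rather than discarding them, mirrors the abstract's distinction between firm assignments and putative relationships held pending further evidence.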