Protein Science
Protein Science (2002), 11:233-244.
Copyright © 2002 The Protein Society

The CATH extended protein-family database: Providing structural annotations for genome sequences

Frances M.G. Pearl1,3, David Lee1,2,3, James E. Bray1, Daniel W.A. Buchan1, Adrian J. Shepherd1 and Christine A. Orengo1

1 Department of Biochemistry and Molecular Biology, University College London, University of London, London WC1E 6BT, UK
2 Department of Crystallography, Birkbeck College, University of London, London WC1E 7HX, UK

Reprint requests to: Dr. Frances Pearl, Department of Biochemistry and Molecular Biology, University College London, University of London, Gower Street, London WC1E 6BT, UK; e-mail: frances{at}biochem.ucl.ac.uk; fax: 44 (0) 20 7679 7193.

An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.
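The two-tier assignment rule described in the abstract can be illustrated with a minimal Python sketch. It is not the DomainFinder implementation. The abstract says only that the thresholds are "conservative" and that putative relationships use "less stringent cut-offs", so the E-value cut-offs below, the `Hit` record, the `classify_hits` function and the example identifiers are all hypothetical. The sketch assumes that PSI-BLAST hits against PDB sequences of known superfamily have already been parsed into simple records.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, chosen only for illustration; the abstract does not
# state the actual values used by DomainFinder.
STRICT_EVALUE = 1e-4    # assumed "conservative" threshold
RELAXED_EVALUE = 1e-2   # assumed "less stringent" threshold


@dataclass(frozen=True)
class Hit:
    """One parsed hit of a gene sequence against a PDB sequence."""
    query_id: str      # gene sequence identifier
    superfamily: str   # superfamily of the matched PDB sequence
    evalue: float      # E-value reported for the hit


def classify_hits(hits, strict=STRICT_EVALUE, relaxed=RELAXED_EVALUE):
    """Split hits into confident assignments and putative neighbours.

    A query is assigned to a superfamily if at least one hit passes the
    strict cut-off; the best (lowest E-value) such hit is kept. Hits that
    pass only the relaxed cut-off are collected separately, mirroring the
    neighbour tables mentioned in the abstract. Weaker hits are discarded.
    """
    assigned = {}
    neighbours = []
    for hit in hits:
        if hit.evalue <= strict:
            best = assigned.get(hit.query_id)
            if best is None or hit.evalue < best.evalue:
                assigned[hit.query_id] = hit
        elif hit.evalue <= relaxed:
            neighbours.append(hit)
    return assigned, neighbours


# Illustrative input only; identifiers and E-values are made up.
example_hits = [
    Hit("gene1", "3.40.50.300", 1e-30),
    Hit("gene1", "3.40.50.720", 5e-3),
    Hit("gene2", "1.10.10.10", 1e-3),
    Hit("gene3", "2.60.40.10", 5.0),
]
assigned, neighbours = classify_hits(example_hits)
```

Keeping the weaker hits in a separate list rather than discarding them reflects the abstract's statement that putative relationships are retained pending further evidence.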

Keywords: Structural genomics; fold assignment; homology; sequence profiles; CATH





