Protein Science
Protein Science (2005), 14:13-23. Published by Cold Spring Harbor Laboratory Press. Copyright © 2005 The Protein Society

Automatic generation and evaluation of sparse protein signatures for families of protein structural domains

Matthew J. Blades (1), Jon C. Ison (2), Ranjeeva Ranasinghe (2) and John B.C. Findlay (3)

(1) AstraZeneca R&D Charnwood, Loughborough, Leicestershire LE11 5RH, England
(2) MRC Rosalind Franklin Centre for Genomics Research (formerly the MRC UK HGMP Resource Centre), Hinxton, Cambridgeshire CB10 1SB, United Kingdom
(3) School of Biochemistry and Molecular Biology, University of Leeds, Leeds, LS2 9JT, United Kingdom

(RECEIVED July 6, 2004; FINAL REVISION September 7, 2004; ACCEPTED September 7, 2004)

We identified key residues from structural alignments of families of protein domains from SCOP and represented them as sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to identify key residues automatically, based on several structural and sequence-based criteria. The capacity of the signatures to detect related sequences in the SWISSPROT database was assessed by receiver operating characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are strongly diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%–30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences are removed from the structural alignment and new signatures are generated, the omitted sequences are still detected. The positions highlighted by these jack-knifed signatures often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.
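
For readers unfamiliar with the ROC50 scores quoted in the abstract, the following is a minimal sketch of how such a score is commonly computed from a ranked hit list: for each of the first n false positives, count the true positives ranked above it, then normalize by n times the total number of true family members. The function name roc_n, its parameters, and the padding convention for short hit lists are illustrative assumptions, not taken from the article; the authors' exact procedure may differ.

```python
from typing import Sequence


def roc_n(ranked_labels: Sequence[bool], total_true: int, n: int = 50) -> float:
    """Return a ROC_n score for a hit list ranked best-first.

    ranked_labels[i] is True if the i-th hit is a genuine family member.
    total_true is the number of genuine members in the searched database.
    (Hypothetical helper for illustration; not the SigGen implementation.)
    """
    if total_true <= 0 or n <= 0:
        raise ValueError("total_true and n must be positive")

    true_seen = 0   # true positives encountered so far
    false_seen = 0  # false positives encountered so far
    area = 0        # sum of true positives ranked above each false positive

    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumed convention: if the list ends before n false positives,
    # the missing false positives are treated as ranking below every
    # true positive that was found.
    area += (n - false_seen) * true_seen

    return area / (n * total_true)


if __name__ == "__main__":
    # Toy ranking: two true hits, one false hit, one true hit, one false hit.
    hits = [True, True, False, True, False]
    print(roc_n(hits, total_true=3, n=2))
```

A score of 1.0 means every family member ranks above the first false positive; a score of 0 means no family member is found before the n-th false positive.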

Keywords: sparse protein signature; SCOP; domain; protein family; ROC analysis; EMBOSS

Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04929005.


Reprint requests to: Matthew J. Blades, AstraZeneca R&D Charnwood, Bakewell Road, Loughborough, Leicestershire LE11 5RH, England; e-mail: matthew.blades@astrazeneca.com; fax: +44 (0) 1509-645557.

