Protein Science Attend a BioResearch Product Faire
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


This Article
Right arrow Full Text
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by May, A. C.W.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by May, A. C.W.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati  
What's this?
Protein Science (2002), 11:2825-2835.
Copyright © 2002 The Protein Society

Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies

Alex C.W. May

Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, London NW7 1AA, UK

Requests for reprints to: Alex C.W. May, Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK; e-mail: amay{at}nimr.mrc.ac.uk; fax: 44 (0) 20 8816 2460.

It is often possible to identify sequence motifs that characterize a protein family in terms of its fold and/or function from aligned protein sequences. Such motifs can be used to search for new family members. Partitioning of sequence alignments into regions of similar amino acid variability is usually done by hand. Here, I present a completely automatic method for this purpose: one that is guaranteed to produce globally optimal solutions at all levels of partition granularity. The method is used to compare the tempo of sequence diversity across reliable three-dimensional (3D) structure-based alignments of 209 protein families (HOMSTRAD) and that for 69 superfamilies (CAMPASS). (The mean alignment length for HOMSTRAD and CAMPASS are very similar.) Surprisingly, the optimal segmentation distributions for the closely related proteins and distantly related ones are found to be very similar. Also, optimal segmentation identifies an unusual protein superfamily. Finally, protein 3D structure clues from the tempo of sequence diversity across alignments are examined. The method is general, and could be applied to any area of comparative biological sequence and 3D structure analysis where the constraint of the inherent linear organization of the data imposes an ordering on the set of objects to be clustered.

Keywords: Sequence motif; multiple sequence alignment; optimal segmentation; protein homologous family; protein superfamily


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us   Add to Digg Digg   Add to Reddit Reddit   Add to Technorati Technorati    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
D. Lupyan, A. Leo-Macias, and A. R. Ortiz
A new progressive-iterative algorithm for multiple structure alignment
Bioinformatics, August 1, 2005; 21(15): 3255 - 3263.
[Abstract] [Full Text] [PDF]




HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Copyright © 2002 by The Protein Society.