Protein Science
Published online before print February 2, 2005, 10.1110/ps.041092605
Protein Science (2005), 14:617-625. Published by Cold Spring Harbor Laboratory Press. Copyright © 2005 The Protein Society

Availability of short amino acid sequences in proteins

Joji M. Otaki¹, Shunsuke Ienaka², Tomonori Gotoh² and Haruhiko Yamamoto¹

¹ Department of Biological Sciences and
² Department of Information Science, Kanagawa University, Hiratsuka, Kanagawa 259-1293, Japan

(RECEIVED August 31, 2004; FINAL REVISION November 9, 2004; ACCEPTED November 10, 2004)

Much attention is being paid to protein databases as an important information source for proteome research. Although they are used extensively for similarity searches, protein databases themselves have not been fully characterized. In a systematic attempt to identify database characteristics that could help reveal how protein chains are constructed, the frequency distributions of all possible combinatorial sets of three, four, and five amino acids ("triplets," "quartets," and "pentats"; collectively called constituent sequences) were examined in the nonredundant (nr) protein database, demonstrating a nonrandom bias in their "availability" at the population level. Nonexistent pentats were found, that is, short sequences whose availability in biological proteins is low relative to their expected probabilities of occurrence. Six representative ones were successfully synthesized as peptides in reasonably high yield by a conventional Fmoc method, excluding the possibility that a putative physicochemical energy barrier to their formation is a direct cause of the low availability. They were also expressed as soluble fusion proteins in a conventional Escherichia coli BL21Star(DE3) system in reasonably high yield, likewise excluding a possible difficulty in their biological synthesis. Together, these results suggest that information on the three-dimensional structures and functions of proteins resides in the way short constituent sequences are connected, and that proteins are composed of evolutionarily selected constituent sequences, a selection reflected in the availability differences observed in the database. These results may have biological implications for protein structural studies.
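
The counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the expected probabilities were computed, so the sketch assumes a simple null model in which residues occur independently at their overall database frequencies. The function names (count_kmers, residue_frequencies, availability_table) and the toy input are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_SET = set(AMINO_ACIDS)


def count_kmers(sequences, k):
    """Count every window of length k that contains only standard residues."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(c in AA_SET for c in kmer):
                counts[kmer] += 1
    return counts


def residue_frequencies(sequences):
    """Overall frequency of each standard residue across all sequences."""
    counts = Counter(c for seq in sequences for c in seq if c in AA_SET)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def availability_table(sequences, k):
    """Map each possible k-mer to (observed count, expected count).

    ASSUMPTION: the expected count uses an independent-residue null model,
    which may differ from the model used in the paper.
    """
    observed = count_kmers(sequences, k)
    freqs = residue_frequencies(sequences)
    n_windows = sum(observed.values())
    table = {}
    for letters in product(AMINO_ACIDS, repeat=k):
        kmer = "".join(letters)
        expected = n_windows * prod(freqs[c] for c in kmer)
        table[kmer] = (observed.get(kmer, 0), expected)
    return table


if __name__ == "__main__":
    toy = ["ACDACD", "ACD"]  # toy data, not real database sequences
    table = availability_table(toy, 3)
    # k-mers that never occur, ranked by how often the null model expects them
    absent = sorted((m for m, (obs, _) in table.items() if obs == 0),
                    key=lambda m: -table[m][1])
    print(table["ACD"], absent[:3])
```

With k = 5 the table holds 20^5 = 3,200,000 entries, which still fits comfortably in memory; sequences that are absent yet have a high expected count are the candidates for "low availability" under this assumed model.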

Keywords: protein sequence; database search; sequence availability; constituent sequence; rare short sequence

Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.041092605.


Reprint requests to: Joji M. Otaki, Department of Biological Sciences, Kanagawa University, 2946 Tsuchiya, Hiratsuka, Kanagawa 259-1293, Japan; e-mail: otaki{at}bio.kanagawa-u.ac.jp; fax: 81-463-58-9684.



This article has been cited by other articles:

T. Tuller, B. Chor, and N. Nelson. Forbidden penta-peptides. Protein Sci., October 1, 2007; 16(10): 2251-2259.