1 School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
2 The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
(RECEIVED June 12, 2007; FINAL REVISION July 9, 2007; ACCEPTED July 9, 2007)
There are 3,200,000 (20^5) possible amino acid sequences of length 5 (penta-peptides). Statistically, we expect the distribution of penta-peptides to be determined by the frequencies of the participating amino acids. We show, however, that not only are thousands of such penta-peptides absent from all known proteomes, but many of them are coded for multiple times in the non-coding genomic regions. This suggests a strong selection process that prevents these peptides from being expressed. We also show that the characteristics of these forbidden penta-peptides vary among different phylogenetic groups (e.g., eukaryotes, prokaryotes, and archaea). Our analysis provides the first steps toward understanding the "grammar" of the forbidden penta-peptides.
Keywords: short peptides; proteomes; evolutionary selection; protein grammar; phylogenetic groups
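The counting idea in the abstract can be illustrated with a minimal sketch. It assumes a simple independence model, in which the expected count of a penta-peptide is the product of its residues' frequencies times the number of length-5 windows; the statistical model actually used in the paper may differ. The function names (`kmer_counts`, `residue_frequencies`, `expected_count`, `absent_kmers`) and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AMINO_SET = set(AMINO_ACIDS)


def kmer_counts(sequences, k=5):
    """Count every length-k window made only of standard residues."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(c in AMINO_SET for c in kmer):
                counts[kmer] += 1
    return counts


def residue_frequencies(sequences):
    """Relative frequency of each standard residue across all sequences."""
    counts = Counter(c for seq in sequences for c in seq if c in AMINO_SET)
    total = sum(counts.values())
    return {a: counts[a] / total for a in AMINO_ACIDS}


def expected_count(kmer, freqs, n_windows):
    """Expected occurrences under the (assumed) independence model."""
    p = 1.0
    for c in kmer:
        p *= freqs[c]
    return p * n_windows


def absent_kmers(sequences, k=5):
    """Yield every possible k-mer that never occurs in the sequences."""
    observed = kmer_counts(sequences, k)
    for letters in product(AMINO_ACIDS, repeat=k):
        kmer = "".join(letters)
        if kmer not in observed:
            yield kmer


if __name__ == "__main__":
    toy_proteome = ["ACDEFGHIK", "ACDEF"]  # illustrative data only
    observed = kmer_counts(toy_proteome)
    freqs = residue_frequencies(toy_proteome)
    n_windows = sum(observed.values())
    print(observed["ACDEF"], expected_count("ACDEF", freqs, n_windows))
```

A penta-peptide whose expected count is well above zero yet is never observed would be a candidate "forbidden" peptide under this simplified model. On a real proteome, enumerating all 3,200,000 candidates with `absent_kmers` is feasible in memory, but any significance threshold would need a proper statistical treatment that this sketch does not attempt.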