1 Department of Electronic and Information System Engineering, Faculty of Science and Technology, Hirosaki University, Hirosaki 036-8561, Japan
2 Department of Developmental Biology and Neuroscience, Graduate School of Life Sciences and
3 Department of Molecular Immunology, Institute of Development, Aging and Cancer, Tohoku University, Sendai 980-8577, Japan
(RECEIVED April 15, 2004; FINAL REVISION May 19, 2004; ACCEPTED May 19, 2004)
We propose a new method for classifying and identifying transmembrane (TM) protein functions at proteome scale. The method applies single-linkage clustering to a TM topology similarity measure that is calculated simply by comparing the lengths of loop regions. In this study, we focused on 87 prokaryotic TM proteomes: 31 proteobacteria, 22 gram-positive bacteria, 19 other bacteria, and 15 archaea. Before clustering, we categorized each TM protein sequence as "known," "putative" (similar to "known" sequences), or "unknown" by homology search and sequence similarity comparison against SWISS-PROT, to assess how far the TM proteomes can currently be functionally annotated from sequence similarity alone. More than three-quarters (75.7%) of the TM protein sequences are functionally "unknown," and only 3.8% and 20.5% are classified as "known" and "putative," respectively. With our clustering approach based on TM topology similarity, the proportion of TM protein sequences that are functionally classified and identified increased from 24.3% (the "known" and "putative" sequences combined) to 60.9%. The clusters obtained correspond well to functional superfamilies or families, so functional classification and identification are achieved by this approach. For example, one cluster of TM proteins with six TM segments contains 109 of the 119 sequences annotated as "ATP-binding cassette transporter," together with 122 "unknown" sequences.
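As an illustration of the idea only, the following minimal Python sketch clusters proteins by single linkage over a loop-length similarity. The abstract does not give the similarity formula or the linkage threshold, so `loop_similarity` (a mean ratio of shorter to longer loop length), the rule that proteins with different numbers of loops score zero, the threshold value, and the toy profiles are all assumptions introduced here, not the authors' actual procedure.

```python
from itertools import combinations


def loop_similarity(a, b):
    """Hypothetical similarity in [0, 1] between two loop-length profiles.

    Each profile is a list of loop lengths (in residues) in sequence order.
    Assumption: profiles with different numbers of loops score 0.
    """
    if len(a) != len(b) or not a:
        return 0.0
    # Per-loop ratio of the shorter to the longer length, averaged over loops.
    ratios = [1.0 if x == y else min(x, y) / max(x, y) for x, y in zip(a, b)]
    return sum(ratios) / len(ratios)


def single_linkage(profiles, threshold):
    """Group proteins so that any pair with similarity >= threshold is linked.

    Single linkage means two proteins share a cluster if a chain of
    pairwise links connects them, even if they are not directly similar.
    """
    names = list(profiles)
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for p, q in combinations(names, 2):
        if loop_similarity(profiles[p], profiles[q]) >= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), set()).add(n)
    return list(clusters.values())


# Toy loop-length profiles (invented for illustration).
profiles = {
    "A": [10, 20, 30],
    "B": [11, 19, 33],
    "C": [12, 18, 36],
    "D": [50, 5, 5],
    "E": [10, 20],
}
clusters = single_linkage(profiles, threshold=0.9)
```

In this toy input, A and C fall below the threshold when compared directly, but both are linked to B, so single linkage places all three in one cluster. This chaining behavior is what distinguishes single linkage from stricter schemes such as complete linkage.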
Keywords: transmembrane protein; transmembrane topology similarity; functional classification and identification; proteome-wide analysis; prokaryotic genome
Abbreviations: ABC, ATP-binding cassette; n-tms, with n transmembrane segment(s); ORF, open reading frame; SP, signal peptide; TM, transmembrane; TMS, transmembrane segment.
Reprint requests to: Toshio Shimizu, Department of Electronic and Information System Engineering, Faculty of Science and Technology, Hirosaki University, 3, Bunkyo-cho, Hirosaki 036-8561, Japan; e-mail: slsimi{at}si.hirosaki-u.ac.jp; fax: 81-172-39-3638.
Supplemental material: see www.proteinscience.org
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04814404.