Protein Science (2005), 14:2804-2813. Published by Cold Spring Harbor Laboratory Press. Copyright © 2005 The Protein Society

A novel representation of protein sequences for prediction of subcellular location using support vector machines

Setsuro Matsuda¹, Jean-Philippe Vert², Hiroto Saigo¹, Nobuhisa Ueda¹, Hiroyuki Toh³ and Tatsuya Akutsu¹

¹ Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
² Centre de Géostatistique, Ecole des Mines de Paris, 77300 Fontainebleau, France
³ Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, Fukuoka 812-8582, Japan

(RECEIVED May 20, 2005; FINAL REVISION August 22, 2005; ACCEPTED August 22, 2005)

As the number of complete genomes rapidly increases, accurate methods for automatically predicting the subcellular location of proteins are increasingly useful for their functional annotation. To improve on the predictive accuracy of the many methods developed to date, we propose a novel representation of protein sequences. This representation combines local compositions of amino acids and twin amino acids with local frequencies of the distances between successive basic, hydrophobic, and other amino acids. To compute these local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to account for ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. In fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. We conclude that considering the respective features of the N-terminal, middle, and C-terminal parts helps to predict subcellular location.
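The encoding described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give these details, so all of the following are hypothetical choices: the part lengths (50 residues each for the N- and C-terminal parts), splitting the N-terminal part into four equal consecutive regions, the members of the "basic" and "hydrophobic" classes, reading "twin amino acids" as two identical adjacent residues, capping distances at 5, and computing every feature type in every segment. The resulting fixed-length vectors would then be passed to an SVM classifier, which is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical class definitions: the abstract names "basic, hydrophobic, and
# other" residues but does not list the members, so these sets are assumed.
BASIC = set("KRH")
HYDROPHOBIC = set("AVLIMFWC")
CLASSES = ("basic", "hydrophobic", "other")


def residue_class(aa: str) -> str:
    if aa in BASIC:
        return "basic"
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    return "other"


def composition(seg: str) -> list[float]:
    """Relative frequency of each of the 20 standard amino acids in a segment."""
    counts = Counter(seg)
    n = max(len(seg), 1)  # avoid division by zero on empty segments
    return [counts[a] / n for a in AMINO_ACIDS]


def twin_composition(seg: str) -> list[float]:
    """Relative frequency of adjacent identical pairs (e.g. 'LL'), per residue.

    Assumption: "twin amino acids" is read as two identical consecutive residues.
    """
    twins = [seg[i] for i in range(len(seg) - 1) if seg[i] == seg[i + 1]]
    counts = Counter(twins)
    n = max(len(seg) - 1, 1)  # number of adjacent pairs in the segment
    return [counts[a] / n for a in AMINO_ACIDS]


def distance_frequency(seg: str, max_dist: int = 5) -> list[float]:
    """For each residue class, relative frequency of distances 1..max_dist
    between successive residues of that class (longer gaps share the last bin)."""
    feats = []
    for cls in CLASSES:
        pos = [i for i, a in enumerate(seg) if residue_class(a) == cls]
        gaps = Counter(min(j - i, max_dist) for i, j in zip(pos, pos[1:]))
        total = max(sum(gaps.values()), 1)
        feats.extend(gaps[d] / total for d in range(1, max_dist + 1))
    return feats


def split_sequence(seq: str, n_len: int = 50, c_len: int = 50):
    """Split into N-terminal, middle and C-terminal parts (lengths are assumed).
    Short sequences yield an empty middle part rather than overlapping parts."""
    n_part = seq[:n_len]
    c_start = max(n_len, len(seq) - c_len)
    return n_part, seq[n_len:c_start], seq[c_start:]


def n_terminal_regions(n_part: str, k: int = 4) -> list[str]:
    """Cut the N-terminal part into k consecutive, roughly equal regions
    (a hypothetical scheme; the abstract only says "four regions")."""
    b = [i * len(n_part) // k for i in range(k + 1)]
    return [n_part[b[i]:b[i + 1]] for i in range(k)]


def encode(seq: str) -> list[float]:
    """Concatenate local features of the 4 N-terminal regions, middle and C-terminal part."""
    n_part, middle, c_part = split_sequence(seq.upper())
    vec = []
    for seg in n_terminal_regions(n_part) + [middle, c_part]:
        vec += composition(seg) + twin_composition(seg) + distance_frequency(seg)
    return vec


if __name__ == "__main__":
    toy = "MKKLLLSLVAAALLAGCSSHQE" * 6  # toy sequence, not real data
    print(len(encode(toy)))  # 6 segments x (20 + 20 + 15) features = 330
```

Because each of the six segments always contributes 55 values, every protein maps to a vector of the same length regardless of its sequence length, which is what allows a standard SVM to consume it directly.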

Keywords: subcellular location; signal sequence; amino acid composition; distance frequency; support vector machine; predictive accuracy

Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.051597405.


Reprint requests to: Setsuro Matsuda, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan; e-mail: smatsuda{at}kuicr.kyoto-u.ac.jp; fax: +81-774-38-3022.




This article has been cited by other articles:

H.-B. Shen and K.-C. Chou. Nuc-PLoc: a new web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM. Protein Eng. Des. Sel., November 10, 2007; (2007) gzm057v1.



