Protein Science
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


This Article
Right arrow Full Text
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Farnum, M. A.
Right arrow Articles by Agrafiotis, D. K.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Farnum, M. A.
Right arrow Articles by Agrafiotis, D. K.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati  
What's this?
Protein Science (2003), 12:1604-1612.
Copyright © 2003 The Protein Society

Exploring the nonlinear geometry of protein homology

Michael A. Farnum, Huafeng Xu and Dimitris K. Agrafiotis

3-Dimensional Pharmaceuticals, Inc., Exton, Pennsylvania 19341, USA

Reprint requests to: Dimitris K. Agrafiotis, 3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Exton, PA 19341, USA; e-mail: dimitris.agrafiotis{at}3dp.com; fax: (610) 458-8249.

The explosion of biological data resulting from genomic and proteomic research has created a pressing need for data analysis techniques that work effectively on a large scale. An area of particular interest is the organization and visualization of large families of protein sequences. An increasingly popular approach is to embed the sequences into a low-dimensional Euclidean space in a way that preserves some predefined measure of sequence similarity. This method has been shown to produce maps that exhibit global order and continuity and reveal important evolutionary, structural, and functional relationships between the embedded proteins. However, protein sequences are related by evolutionary pathways that exhibit highly nonlinear geometry, which is invisible to classical embedding procedures such as multidimensional scaling (MDS) and nonlinear mapping (NLM). Here, we describe the use of stochastic proximity embedding (SPE) for producing Euclidean maps that preserve the intrinsic dimensionality and metric structure of the data. SPE extends previous approaches in two important ways: (1) It preserves only local relationships between closely related sequences, thus allowing the map to unfold and reveal its intrinsic dimension, and (2) it scales linearly with the number of sequences and therefore can be applied to very large protein families. The merits of the algorithm are illustrated using examples from the protein kinase and nuclear hormone receptor superfamilies.

Keywords: Stochastic proximity embedding; multidimensional scaling; nonlinear mapping; Sammon mapping; self-organizing; dimensionality reduction; protein kinase; nuclear hormone receptor; phylogenetic analysis


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us   Add to Digg Digg   Add to Reddit Reddit   Add to Technorati Technorati    What's this?





HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Copyright © 2003 by The Protein Society.