Protein Science
Protein Science (2001), 10:1801-1810.
Copyright © 2001 The Protein Society

The sequence determinants of cadherin molecules

Alexander E. Kister1, Michael A Roytberg2, Cyrus Chothia3, Jurii M. Vasiliev4 and Israel M. Gelfand1

1 Department of Mathematics, Rutgers University, Piscataway, New Jersey 08854, USA
2 Institute of Mathematical Problems of Biology, RAS, Pushchino, Moscow Region 142292, Russia
3 MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, United Kingdom
4 Oncological Scientific Center of Russia, Moscow 115478, Russia

Reprint requests to: Alexander E. Kister, Department of Mathematics, Rutgers University, Piscataway, NJ 08854, USA; e-mail: akister{at}math.rutgers.edu; fax: 732-445-55-30.

(RECEIVED August 29, 2000; FINAL REVISION January 23, 2001; ACCEPTED June 11, 2001)

Article and publication are at http://www.proteinscience.org/cgi/doi/10.1101/ps.37001.


    Abstract
The sequence and structural analysis of cadherins allows us to find sequence determinants: a few positions in sequences whose residues are characteristic of, and specific to, the structures of a given family. Comparison of the five extracellular domains of classic cadherins showed that they share the same sequence determinants despite the lack of significant sequence similarity between the N-terminal domain and the other extracellular domains. This allowed us to predict secondary structures and propose three-dimensional structures for domains that have not been structurally analyzed previously. We suggest a new method for assigning a sequence to its proper protein family: analysis of sequence determinants. The main advantage of this method is that it does not require knowing all or almost all residues in a sequence, as traditional classification tools such as BLAST, FASTA, and HMM do. Using only the key positions, that is, the residues that serve as sequence determinants, we unequivocally selected all members of the classic cadherin family from among 80,000 examined proteins. In addition, we propose a model for the secondary structure of the cytoplasmic domain of cadherins based on the relationship between sequence multialignments and secondary structure. The secondary structure patterns of this domain can serve as distinguishing characteristics of cadherins.

Keywords: Classic cadherins; cell adhesion molecules; method for protein family recognition; sequence comparison/classification


    Introduction
In previous communications (Gelfand and Kister 1995, 1997; Chothia et al. 1998; Galitsky et al. 1998, 1999), we described a new method of sequence-structural analysis of protein families. This method allowed us to find a small set of key residues in a sequence that constitutes the amino acid pattern of a given family. In this article, we apply this approach to determine the defining characteristics of the cadherin family.

Cadherins are a group of proteins essential for the formation of stable specialized cell–cell contacts, that is, adherens junctions in various tissues, and therefore for the organization of these tissues and organs. Cadherins are found in many types of animals ranging from nematodes to humans. Humans and other vertebrates have several classes of cadherins, each class being characteristic of a group of tissues (Takeichi 1991, 1995; Gumbiner 1996; Suzuki 1996; Gallin 1998; Shapiro and Colman 1999). For example, E-cadherins are specific for epithelial tissues, P-cadherins are found in placenta and other tissues, and N-cadherins are typical of neural and mesenchymal tissues.

The cadherin-like family comprises five subfamilies: classic cadherins of types I and II, desmosomal cadherins, protocadherins, and cadherin-related proteins (Koch et al. 1999). In this work, we focus on the classic cadherins. The classic cadherins are transmembrane glycoproteins with five extracellular domains, a single membrane-spanning domain, and a single cytoplasmic domain; the cytoplasmic domain is linked to actin microfilaments via several linker proteins such as β-catenin and α-catenin. Cell–cell contacts are formed by homophilic adhesion of the external N-terminal domains of cadherin molecules on the surface of one cell with the corresponding domains of cadherin molecules on another cell. Cadherin adhesion is calcium dependent. Within the extracellular region of cadherins, Ca²⁺ ions bind between domains and rigidify the links between them. In the absence of calcium, these domains move excessively relative to one another, and stable adhesions cannot form.

The goal of this work is to find the sequence determinants: the residues that occupy the conserved positions in classic cadherins. To describe them, we extend the methods of sequence and structural analysis developed in our previous work (Gelfand and Kister 1995; Chothia et al. 1998). We show that the sequence determinants can serve as patterns of the classic cadherins, and we suggest a new method for identifying proteins based on pattern recognition in sequences. Using this method, we were able to distinguish the sequences of classic cadherins in the SWISS-PROT database.

The currently known structures of the first and second domains show that they have the same overall immunoglobulin-like fold (Shapiro et al. 1995; Overduin et al. 1995; Nagar et al. 1996; Pertz et al. 1999). However, the three-dimensional structures of the third, fourth, and fifth domains are unknown. The multialignment of the sequences of all five domains revealed conserved positions common to the entire extracellular part of the classic cadherins. The discovery of these common sequence determinants supports the idea that all the extracellular domains share the immunoglobulin-like structure of the N-terminal domain.

In the second part of this work, we show that secondary structure can be predicted from the results of sequence multialignment. We focus on the cytoplasmic part of cadherins, whose X-ray structures are unknown, and base our analysis on the multialignment of these sequences. Multialignment of sequences from a protein family without strong homology forces one to introduce insertions and deletions to make the sequences align. As a rule, these gaps correspond to the beginning or end of secondary structural units: strands, helices, or loops. On the basis of this observation and of the sequence multialignment of the cytoplasmic part, we propose a model for the secondary structures of the cytoplasmic domains of cadherins.


    Methods and Results
Classic cadherins: Extracellular domains
Secondary structural analysis of the first two domains
Three-dimensional structures have been determined for the N-terminal domains of murine neural cadherins (PDB files: 1NCG, 1NCH, 1NCI, 1NCJ, 2NCM; Shapiro et al. 1995; Pertz et al. 1999) and for two domains of murine epithelial cadherins (PDB files: 1EDH, 1SUH, 3NCM; Overduin et al. 1995; Nagar et al. 1996; Jensen et al. 1999). Structural analysis revealed that these domains form sandwich-like structures with an immunoglobulin-like fold. Each domain consists of ~90–100 amino acids, which form seven β-strands. According to the accepted classification of the immunoglobulin fold, the seven successive strands are termed A', B, C, D, E, F, and G, and the loops between them are named, respectively, A'B, BC, CD, DE, EF, EF', and FG (Chothia and Jones 1997). Strands B, E, and D make up one sheet, and strands A', C, F, and G make up the other (Fig. 1).



Fig. 1. Schematic representation of the strands in the N-terminal domain of 1NCI structure. A', B, C, D, E, F, and G strands form two β-sheets (see text). Residues in the circles are shown with their number in the sequence. The dotted lines represent the hydrogen bonds between the main chain atoms.

 
On the basis of sequence alignments with the known structures, we determined the secondary structures of the 37 classic cadherins in the SWISS-PROT database. They include E-, N-, P-, R-, and other cadherins from various tissues and species, altogether 19 types of cadherins. The sequences of the N-terminal domains were divided into 15 fragments corresponding to the strands, the loops (the loop between strands E and F is divided into two parts, EF and EF'), and a linker that connects the domains (Table 1).


Table 1. The secondary structures of N-terminal domains of cadherins

 
Because the definition of secondary structure units is usually not very accurate, we strove to improve accuracy with a comprehensive multistep procedure that involved multialignment of structure superpositions, as well as of residue–residue contacts, Cα coordinates, H bonds, and accessibility values (for details, see Gelfand and Kister 1995). Because multialignments performed in these several ways give the same results, we can be confident in retrospect that the division of sequences into secondary structure units was essentially accurate. Nonetheless, one cannot be absolutely sure where the border between two secondary structure units lies. We therefore separately examined such borderline regions for conserved positions. Our analysis shows that conserved positions rarely, if ever, are found at the very periphery of strands or loops (Gelfand and Kister 1997). The lack of absolute precision in secondary structure definition therefore appears to have very little effect on the final result.

To classify the conservation of residues, we collected from the various structures all the amino acid fragments that correspond to each strand or loop. Alignment was conducted separately for each set of fragments describing a particular strand or loop. In our approach, the amino acid sequences of the aligned fragments are termed "words" (Gelfand and Kister 1995). From this alignment, each residue in a sequence is assigned to a position in a word. Residues are referred to by an index that contains the letter code of the word and the residue's position within it. For example, A'1 is the address of the first residue in the A' word. This two-part index gives a common numbering system for the various cadherin sequences, which allows us to compare residue occupation at each position across sequences and to determine residue conservation at every position.
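As a minimal sketch of the two-part index, the Python function below turns a sequence into (address, residue) pairs, assuming the word boundaries are already known and are passed in as a hypothetical list of `(word_name, length)` pairs. The example reuses the E-CAD_X fragments VSENE (A' strand) and KGPFP (A'B loop) from the multialignment discussed later; the function name and input format are illustrative, not the authors' software.

```python
def word_addresses(sequence, words):
    """Map each residue to a two-part address such as "A'1".

    sequence: one-letter amino acid string covering the listed words.
    words: hypothetical list of (word_name, length) pairs in sequence order.
    Returns a list of (address, residue) tuples.
    """
    if sum(length for _, length in words) != len(sequence):
        raise ValueError("word lengths must cover the sequence exactly")
    addresses, start = [], 0
    for name, length in words:
        # Positions within a word are numbered from 1, as in A'1, A'2, ...
        for i, residue in enumerate(sequence[start:start + length], start=1):
            addresses.append((f"{name}{i}", residue))
        start += length
    return addresses


pairs = word_addresses("VSENEKGPFP", [("A'", 5), ("A'B", 5)])
print(pairs[4])  # ("A'5", 'E')
```

Because every sequence is indexed by word and position rather than by absolute residue number, the same address (here A'5) can be compared directly across cadherins of different lengths.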

Residue conservation: Patterns of strands and loops of the N-terminal domain
The first step toward defining characteristic patterns of cadherin strands and loops is to analyze residue frequencies at all positions of the words. This analysis reveals the nature and extent of residue conservation at each position. Following the classification of residue conservation in immunoglobulins suggested in our previous article (Chothia et al. 1998), we divided residues into three groups: (1) V, L, I, M, A, F, W, and C; (2) R, K, E, D, Q, and N; and (3) P, H, Y, G, S, and T. This classification is based on two properties: hydrophobicity and the tendency to be on the surface or in the interior of a protein.

Inspection of residue frequencies showed that six positions are occupied by a single residue in almost all sequences, and 23 are occupied by only a few chemically similar residues from the same group (Table 2). For example, Glu is found at position A'5 in all known cadherin sequences (Table 1). These 29 positions are considered the conserved positions. The remaining ~66 positions are variable: they can be occupied by residues from various groups.
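The conservation classes described above can be sketched in Python as follows. The three residue groups are taken from the text; the `min_fraction` cutoff of 0.9 is a hypothetical stand-in for "almost all sequences", since no numeric threshold is stated, and the example columns are invented.

```python
from collections import Counter

# Residue groups as listed in the text (after Chothia et al. 1998).
GROUPS = {
    1: set("VLIMAFWC"),  # hydrophobic and aromatic
    2: set("RKEDQN"),    # polar and charged
    3: set("PHYGST"),    # remaining residues
}


def residue_group(residue):
    for group, members in GROUPS.items():
        if residue in members:
            return group
    return None  # gap or nonstandard letter


def classify_position(column, min_fraction=0.9):
    """Classify one word position from the residues aligned there.

    min_fraction is a hypothetical cutoff for "almost all sequences".
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return "variable"
    # Single-residue conservation: one residue dominates the column.
    top_residue, top_count = Counter(residues).most_common(1)[0]
    if top_count / len(residues) >= min_fraction:
        return f"conserved residue {top_residue}"
    # Group conservation: chemically similar residues from one group.
    top_group, group_count = Counter(residue_group(r) for r in residues).most_common(1)[0]
    if top_group is not None and group_count / len(residues) >= min_fraction:
        return f"conserved group {top_group}"
    return "variable"


print(classify_position("EEEEEEEEQD"))  # conserved group 2
```

A column such as EEEEEEEEQD, where Glu shares the position with Gln and Asp, falls back to group-level conservation, whereas a column drawing on several groups is classed as variable.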


Table 2. Patterns of the strands and EF' loop of the five extracellular domains

 
These data show that all words describing the strands and the EF' loop have several conserved positions. The residues at these positions constitute the pattern of the word. For example, analysis of the set of B words in the first domain (Table 1) shows that in all sequences positions 3 and 6 are occupied by hydrophobic and aromatic residues, assigned to group 1, and position 5 is occupied by polar and charged residues from group 2 (Table 2). Thus, the residues at the conserved positions B3, B5, and B6 constitute the pattern of the B word in domain I (Table 2). As shown below, word patterns can serve as a useful tool for identifying cadherin sequences and predicting their structures.
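A word pattern can be represented as a map from conserved positions to allowed residue sets. The Python sketch below encodes the domain I B-word pattern described above (B3 and B6 from group 1, B5 from group 2) and checks a fragment against it; the representation and the example fragments are assumptions for illustration. The second fragment carries Q at position 6, mirroring the VE-CAD_M case discussed below.

```python
GROUP1 = set("VLIMAFWC")  # hydrophobic and aromatic (group 1)
GROUP2 = set("RKEDQN")    # polar and charged (group 2)

# Hypothetical encoding of the domain I B-word pattern:
# 1-based conserved position -> allowed residues; other positions are free.
B_PATTERN = {3: GROUP1, 5: GROUP2, 6: GROUP1}


def matches_pattern(fragment, pattern):
    """Return True if every conserved position in `pattern` is present in
    `fragment` and occupied by one of its allowed residues."""
    return all(
        pos <= len(fragment) and fragment[pos - 1] in allowed
        for pos, allowed in pattern.items()
    )


print(matches_pattern("SGVTKLS", B_PATTERN))  # True
print(matches_pattern("SGVTKQS", B_PATTERN))  # False: Q at position 6
```

Only three residues of the fragment are inspected; the variable positions play no role in the match, which is what lets a pattern recognize sequences with low overall identity.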

Secondary and three-dimensional structure prediction for five extracellular domains
For most molecules in the cadherin family, the three-dimensional structure is unknown. However, secondary structure predictions for all their extracellular domains can be made from the word patterns of the first two domains. To determine the secondary structures of all domains, we matched the patterns of domains I and II against the sequences of domains III, IV, and V. The patterns of the N-terminal domains fit the sequences of all domains, which allowed us to divide the sequences of domains III, IV, and V into words. Because words describe secondary structural units, dividing an amino acid sequence into words amounts to predicting the secondary structure of the protein.

Because the alignment of cadherins was based on both sequence and structural information, it follows that residues at identical word positions have the same structural role in different molecules. Analysis of the structural role of residues involves determining residue–residue interactions, residue exposure on the surface, and residue coordinates in a coordinate system unified for a given protein family. As this reference frame, we can use, for example, the coordinate system of any known cadherin structure. Thus, it is possible to assign coordinates to the residues of the extracellular domains of all analyzed cadherins. We suppose that the Cα atoms of residues at the same word positions in different domains can be superimposed on each other.

Conserved positions in the strands and loops in five extracellular domains
Inspection of the sequences of different cadherins shows that the nature of residues and the extent of conservation vary greatly from position to position. For example, comparison of the sequences of human E- and K-cadherins shows that ~32% of the residues in domain I are identical. Domains I and II of E-cadherins share only 25% identity. Compared with the sequences of domains I and II, the sequences of domains III, IV, and V show no significant similarity (<20%).
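Identity figures such as these depend on how gapped columns are counted, and the text does not state its convention. A minimal Python sketch of one common convention, identity over the columns where neither aligned sequence has a gap, is shown below; the example strings are toy sequences, not cadherin data.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two rows of the same alignment.

    Counts only columns where neither row has a gap ('-'). Other
    conventions, such as dividing by the shorter ungapped length,
    give somewhat different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


print(percent_identity("AC-DE", "ACGDF"))  # 75.0
```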

The alignment of the words allowed us to calculate residue frequencies at every word position. Analysis of these frequencies in the various domains showed that no position is occupied by a single residue type in all domains. However, there are many positions where residue conservation was found in one or several domains but not in all five. For example, position A'5 is occupied by Glu in all sequences of domains I, II, and III, whereas in domains IV and V Glu shares this position with Gln and Asp. Residues at position A'1 are hydrophobic in all sequences of the first domain, whereas in the second domain Gly and Ala are the most common residues. Position D1 can be considered a conserved hydrophobic position in the first domain and a conserved hydrophobic and aromatic position in domains II and IV, but it is variable in domains III and V. The pattern of residue conservation in the fifth domain often differs from that in the other domains.

The residues at the conserved positions of all strands and the EF' loop in the five extracellular domains are presented in Table 2. Comparison of the conserved positions in the various domains revealed 15 extracellular conserved positions. All but one are occupied by hydrophobic residues in all five domains; the exception is position A'5, which is occupied by polar and charged residues.

Buried and surface positions in cadherins
The role of residues at each position was determined by examining their accessible surface areas. To give an overview of residue positions, we calculated the accessible surface area (ASA) of residues in three structures: domain 1 of N-cadherin and domains 1 and 2 of E-cadherin (Table 3). ASA values are divided into groups 0, 1, 2, ..., 9, where group 0 indicates an ASA of 0–9 Å², group 1 indicates 10–19 Å², and so on. Residues at 12 positions are buried in the protein interior in all structures (ASA groups 0–2). Eight of these positions (A'3, B6, C3, D4, E3, EF'1, F3, F5) are conserved hydrophobic and aromatic positions at the center of the structure.
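The ASA binning and the buried-position criterion can be expressed as a short Python sketch. Placing all values of 90 Å² or more in group 9 is an assumption (the text lists groups 0–9 without saying how larger values are handled), and the input layout and numbers in the example are hypothetical.

```python
def asa_group(asa):
    """Bin an accessible surface area (in square angstroms) into groups 0-9:
    0 for 0-9, 1 for 10-19, and so on. Values of 90 or more are assumed to
    fall into group 9."""
    if asa < 0:
        raise ValueError("ASA cannot be negative")
    return min(int(asa // 10), 9)


def buried_positions(asa_by_structure, max_group=2):
    """Positions whose ASA group is <= max_group in every structure.

    asa_by_structure: {structure_name: {position_address: asa_value}}.
    Only positions present in all structures are considered.
    """
    structures = list(asa_by_structure.values())
    common = set.intersection(*(set(s) for s in structures))
    return sorted(
        pos for pos in common
        if all(asa_group(s[pos]) <= max_group for s in structures)
    )


example = {
    "structure_1": {"B6": 3.0, "C3": 12.0, "A'1": 45.0},
    "structure_2": {"B6": 0.0, "C3": 28.5, "A'1": 8.0},
}
print(buried_positions(example))  # ['B6', 'C3']
```

Requiring the criterion in every structure, rather than on average, matches the text's phrasing that these residues are buried "in all structures".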


Table 3. Structural alignments of the sequences and the residue accessible surface areas of the cadherin domains

 
Method of attributing a protein to a protein family by using patterns
Discovering a small set of key residues that provides the amino acid patterns for the structural units of a protein family allows us to develop a computer algorithm for protein classification.

To assign a query sequence to its proper protein family, we need to match the residues at positions in the query sequence with the residues in the word patterns of family members. In fact, we need not know the residues at all positions of the query sequence. The advantage of our approach is that it identifies a few class-determining positions that uniquely determine a family. We developed a new approach for assigning a protein to a protein family and applied it to the identification of classic cadherins.

Algorithm
A sequence in a protein family can be defined in terms of an ordered set of word patterns. For each word pattern, the following are determined: (1) the number of positions in the word; (2) the conserved positions and the sets of residues that can occupy them; and (3) the interval (the possible number of intervening residues) to the next word in the sequence.

In the search procedure, we matched the word patterns against a query sequence. To do so, we implemented an algorithm based on a modification of dynamic programming. The patterns of all or several secondary structural units are matched with a query sequence in consecutive order, starting from the first pattern. First, we pick out the database sequences that contain a fragment fitting one of the known basic patterns for the first (A') fragment of cadherins. Then we search the entire database again, this time using the patterns for the B fragment as query patterns, and select the sequences containing one of the B patterns. We continue this procedure with the patterns of the other words.

The result of the analysis is the number of words (more precisely, fragments describable by cadherin patterns) found in a given sequence. If fragments matching the patterns of all, or almost all, cadherin words are found in the sequence in question, that sequence is considered to belong to the cadherin family.
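The search procedure above can be approximated by the Python sketch below. It is a greedy simplification, not the authors' dynamic-programming implementation: each word is placed at its first matching fragment after the previous word, within that word's maximum gap, and a word that cannot be placed is skipped. The `WordPattern` class, its fields, and the toy patterns are hypothetical; real cadherin patterns would come from Table 2.

```python
from dataclasses import dataclass

GROUP1 = frozenset("VLIMAFWC")  # hydrophobic and aromatic
GROUP2 = frozenset("RKEDQN")    # polar and charged


@dataclass
class WordPattern:
    name: str
    length: int    # number of positions in the word
    allowed: dict  # 1-based conserved position -> allowed residues
    max_gap: int   # most residues allowed before this word starts

    def matches(self, fragment):
        return len(fragment) == self.length and all(
            fragment[p - 1] in residues for p, residues in self.allowed.items()
        )


def find_words(seq, patterns):
    """Greedily place word patterns in order along seq.

    Returns (word_name, start_index) for every word found. When a word
    cannot be placed, it is skipped and its length plus gap are added
    to the search window of the next word.
    """
    found, pos, slack = [], 0, 0
    for pat in patterns:
        last_start = min(pos + pat.max_gap + slack, len(seq) - pat.length)
        hit = next(
            (s for s in range(pos, last_start + 1)
             if pat.matches(seq[s:s + pat.length])),
            None,
        )
        if hit is None:
            slack += pat.max_gap + pat.length
        else:
            found.append((pat.name, hit))
            pos, slack = hit + pat.length, 0
    return found


def is_family_member(seq, patterns, min_words):
    """Decision rule: enough words were found in order."""
    return len(find_words(seq, patterns)) >= min_words


# Toy patterns and sequences, invented for illustration only.
TOY = [
    WordPattern("W1", 3, {1: GROUP1, 3: GROUP2}, max_gap=5),
    WordPattern("W2", 3, {2: frozenset("P")}, max_gap=4),
    WordPattern("W3", 2, {1: frozenset("W")}, max_gap=4),
]
print(find_words("SSLAKGPGTTWS", TOY))  # [('W1', 2), ('W2', 5), ('W3', 10)]
print(find_words("SSLAKGQGTTWS", TOY))  # [('W1', 2), ('W3', 10)]
```

The second toy sequence loses one word but still places the later ones, which is the behavior the results below rely on when a single position (such as B6 in VE-CAD_M) breaks one pattern. A greedy first match can, however, miss a placement that a full dynamic-programming search would find, for example when a spurious early match for one word blocks the correct match for the next.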

Results of the analysis of sequences in SWISS-PROT database
In the search procedure, we used the patterns of eight words (A', B, C, D, E, EF', F, and G) of the first domain of classic cadherins; these patterns are presented in Table 2. The goal of this test is to show that these patterns are sufficient to identify the classic cadherins. We analyzed the sequences in SWISS-PROT (release 38, 79,909 entries). The results are presented in Table 4. Thirty sequences were found to contain all eight cadherin patterns; that is, each contains eight fragments that sequentially match the A', B, C, D, E, EF', F, and G patterns (first row of the table). According to the SWISS-PROT descriptions, all of these proteins are classic cadherins.


Table 4. The numbers of sequences where cadherins' words are found
 
Six sequences were found to contain seven cadherin patterns; that is, the pattern of one word was not found in them (second row of Table 4). For example, analysis of the VE-CAD_M sequence (Table 1) showed that the patterns of seven words, all except the B word, match the sequence. (No fragment corresponding to the B word was found because position B6 is occupied by Q, which does not match the conserved hydrophobic position in the B-word pattern.) According to the SWISS-PROT descriptions, five of these six sequences are classic cadherins and one is not a classic cadherin. Row 3 shows that seven sequences match the patterns of exactly six words; two of these sequences are classic cadherins. None of the sequences in which 5, 4, 3, 2, 1, or no cadherin words were found is a classic cadherin.

In total, the patterns of at least six words were found in 43 sequences (30 + 6 + 7). Thirty-seven of these proteins are classic cadherins. The other six proteins, in which six or seven patterns were found, can be considered false positives. SWISS-PROT identifies these proteins as desmogleins and desmocollins, which are not classic cadherins but belong to the cadherin family and have sequence homology with classic cadherins. Nevertheless, the classic cadherin patterns developed in this work largely distinguish the classic cadherins from other cadherin-like proteins.

Thus, the search shows that the patterns of at least six words allow us to find all classic cadherins in the database. This gives us a new tool for identifying proteins: if the patterns of eight, seven, or six words are observed in a sequence in question, there is a high probability that the sequence is a classic cadherin. Because the patterns of the eight words contain 27 conserved positions in total, we can classify a protein sequence by knowing the residues at no more than 27 conserved positions.
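The counts reported above can be tallied in a few lines of Python to show how the choice of threshold trades sensitivity against false positives. The rows are the Table 4 figures as given in the text; nothing else is assumed.

```python
# Table 4 rows as described in the text:
# (number of words matched, sequences found, of which classic cadherins).
# Rows with fewer than six words contained no classic cadherins.
TABLE4_ROWS = [(8, 30, 30), (7, 6, 5), (6, 7, 2)]


def tally(rows, min_words):
    """Return (hits, classic cadherins among hits, false positives)."""
    hits = sum(n for words, n, _ in rows if words >= min_words)
    true_hits = sum(c for words, _, c in rows if words >= min_words)
    return hits, true_hits, hits - true_hits


for threshold in (8, 7, 6):
    hits, true_hits, false_hits = tally(TABLE4_ROWS, threshold)
    print(f">= {threshold} words: {hits} hits, {true_hits} classic, "
          f"{false_hits} false positives, precision {true_hits / hits:.2f}")
```

At a threshold of six words this gives 43 hits, 37 of them classic cadherins, and 6 false positives, a precision of 37/43 ≈ 0.86; at eight words all 30 hits are classic cadherins.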

Comparison of secondary structural units with the results of sequence multialignment
Comparison of sequence and structural multialignments shows that gaps (deletions and insertions) in the sequences are almost never found in the middle of strands or helices but rather at their borders. This observation could help predict secondary structure for proteins of unknown three-dimensional structure. Table 5 presents the multialignment of seven cadherin sequences of domain I. The multialignment divides the sequences into 10 ungapped fragments. For example, there are two ungapped fragments at the beginning of the Xenopus laevis E-cadherin (E-CAD_X) sequence: VSENE (fragment 1) and KGPFP (fragment 2). In this manner, the sequences were divided into 10 fragments (Table 5a).
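One way to derive ungapped fragments from a multialignment is sketched below in Python: a fragment is taken to be a maximal run of alignment columns in which no sequence has a gap. This definition is an assumption (the text does not state the exact rule), and the three-row alignment is a toy example whose first row reuses the E-CAD_X fragments VSENE and KGPFP with an invented gap between them.

```python
def ungapped_blocks(alignment):
    """Split a multiple alignment into ungapped column blocks.

    alignment: list of equal-length aligned strings, '-' marking gaps.
    A column joins a block only if no sequence has a gap there; any
    column containing a gap ends the current block.
    Returns (start, end) column ranges with end exclusive.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    blocks, start = [], None
    for col in range(length):
        gap_free = all(row[col] != "-" for row in alignment)
        if gap_free and start is None:
            start = col                  # a new ungapped block begins
        elif not gap_free and start is not None:
            blocks.append((start, col))  # a gap column closes the block
            start = None
    if start is not None:
        blocks.append((start, length))
    return blocks


def fragments_of(row, blocks):
    """Residues of one aligned sequence within each ungapped block."""
    return [row[s:e] for s, e in blocks]


toy = ["VSENE-KGPFP", "ISDNEAKGPYP", "VSEQE-RGPFP"]
blocks = ungapped_blocks(toy)
print(fragments_of(toy[0], blocks))  # ['VSENE', 'KGPFP']
```

The resulting block boundaries are the candidate secondary structure borders; comparing them with the structural assignment is what Table 5 does.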


Table 5. The comparison of the sequence and secondary structural multialignments

 
Comparison of the sequence multialignments with the secondary structures of these proteins obtained from their three-dimensional structures (Table 5b) shows that most fragments coincide with secondary structural units. In the E-CAD_X sequence, for instance, the fragment VSENE corresponds to the A' strand, and the residues KGPFP correspond to the A'B loop (Table 5). This correspondence between ungapped fragments and secondary structural units holds for all fragments except 7, 8, and 10: fragment 7 spans strand D and loop DE, fragment 8 spans strand E and loop EF', and fragment 10 spans loop FG and strand G. Thus, there is a strong relationship between sequence multialignments and the secondary structures of cadherins.

In general, the greater the number of sequences considered in the multialignment, the greater the accuracy of the secondary structure prediction. The classic cadherins are a good example: we analyzed 37 sequences, involving 14 types of cadherins (Table 1). We therefore propose that sequence multialignment gives a reliable basis for predicting secondary structure.

Sequence multialignment also gives important information about three-dimensional structure. Residues that are aligned with each other have approximately the same structural characteristics, such as H bonds between main-chain atoms, residue–residue contacts, and accessibility values. This observation has been made in earlier analyses of proteins (see, e.g., Lesk et al. 1987).

Classic cadherins: Cytoplasmic part
In this part, we describe the results of our investigation of the cytoplasmic domain of cadherins, for which there is currently no structural information. We analyzed the amino acid sequences of 36 cytoplasmic domains, each consisting of ~120 amino acids. Because we found a relationship between sequence and structural alignments for the extracellular domains, sequence alignment is likely to give some information about the secondary structures of the cytoplasmic domains as well. The multialignment of the 36 sequences resulted in 14 ungapped fragments. We speculate that these fragments correspond, to some extent, to the helices or strands and loops in this part of cadherins.

Residue frequencies were calculated at each position of the sequences. We found that 71 of the ~120 positions are occupied by a single residue or by very similar residues in all or almost all sequences (Table 6). Thus, unlike the extracellular domains, the cytoplasmic part is characterized by a high degree of residue conservation. Twenty-four positions are occupied by hydrophobic and/or aromatic residues, 26 by polar and charged residues, and 21 by hydrophilic and neutral residues. The conserved positions are found mostly near the N and C termini. Fragments 4 and 14 have the most conserved positions (13 and 18, respectively), whereas each of the ungapped fragments in the middle of the cytoplasmic part (fragments 6, 7, 8, 9, and 10) has only one conserved position. (Note that residues in fragment 4 are involved in binding β-catenin.)


Table 6. The most common residues in the cytoplasmic domain

 
On the basis of the extent of conservation, we determined the amino acid pattern of each fragment (Table 6). We expected the patterns of several long fragments to be characteristic of the cytoplasmic part; fragment 14, for example, has 18 conserved positions. To test whether the pattern of a single fragment is sufficient for cadherin recognition, we used the pattern-matching method developed for the extracellular part. The patterns of fragments 4, 5, 11, 12, 13, and 14 were matched separately against the sequences of the SWISS-PROT database. The results showed that the pattern of a single fragment, 4, 12, or 14, is sufficient to identify the cadherins (Table 7).


Table 7. Numbers of sequences where patterns of the fragments of the cytoplasmic part are found
 

    Discussion
To find reasonable criteria for classifying proteins into families, one needs invariant characteristics that are shared by all members of the family. Traditional tools for sequence classification, such as BLAST, FASTA, HMMs, and others, use various alignment methods and require knowledge of all or almost all residues in the sequences (Smith and Waterman 1981; Eddy 1996; Pearson 1996; Altschul et al. 1997; Gusfield 1997). The Prosite database (Hofmann et al. 1999) uses another method for dividing proteins into families: it identifies specific sites in the conserved regions of protein families.

We propose another approach to the classification of protein families. An essential feature of the method is that it combines sequence and structural data. By putting together the results of sequence and structural multialignments, we can describe the major structural units of a protein family, and the patterns of strands and loops serve as the defining characteristics of the family. In this work, we applied the method to one particular protein family, the cadherins. The results showed that, on the basis of these defining characteristics, one can unequivocally select all members of the cadherin family from ~80,000 proteins. Qualitatively specific patterns characterize both the extracellular and the cytoplasmic domains, and the patterns of either part can be used independently. Notably, the sequence of the cytoplasmic tail is especially specific: the pattern of one unit is sufficient to determine the family. In contrast, patterns of the transmembrane part cannot assign proteins to the proper family, because they were found in >2000 proteins. These results confirm that defining patterns can be used successfully for reliable assignment of proteins to the proper protein family. We plan to extend the investigation of defining characteristics to protein families of the β fold.

In this work, we found that the gaps introduced into cadherin sequences by insertions and deletions in the sequence multialignment divide the sequences into structural units (strands and loops). Thus, sequence multialignments may give us a clue about secondary structure. The assignment of sequence units to secondary structure has, however, some limitations. The multialignment of highly homologous sequences results in long ungapped fragments that include several structural units. To obtain a more reliable secondary structure assignment for a protein family, one needs to use as many diverse sequences as possible. In further analyses of other protein families, we plan to test the hypothesis of a relationship between sequence and structural alignments.


    Acknowledgments
 
The authors are grateful to Dr. P. Ehrlich for critical review of the manuscript. The authors acknowledge the assistance of L. Pogost in performing computer calculations. We acknowledge with deep gratitude the support of the Gabriella and Paul Rosenbaum Foundation and also thank M. Goldman for continuous encouragement. A.E.K. is supported by the Gabriella and Paul Rosenbaum Foundation. M.A.R. was supported by grants from the Russian Fund of Basic Research (Grants 00-04-48246 and 00-07-90037), the Russian State Scientific Program Human Genome (Grant 1gc/00), INTAS (Grant 99-1476), and the Merck Genome Research Institute (Grant 244).

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.


    References
 
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: 3389–3402.

Chothia, C. and Jones, E.Y. 1997. The molecular structure of cell adhesion molecules. Annu. Rev. Biochem. 66: 823–862.

Chothia, C., Gelfand, I.M., and Kister, A.E. 1998. Structural determinants in the sequences of immunoglobulin variable domain. J. Mol. Biol. 278: 457–479.

Galitsky, B., Gelfand, I.M., and Kister, A.E. 1998. Predicting amino acids sequences of antibody human VH chains from its first several residues. Proc. Natl. Acad. Sci. 95: 5193–5198.

———. 1999. Class-defining characteristics in the mouse heavy chains of variable domains. Protein Eng. 12: 101–107.

Gallin, W.J. 1998. Evolution of the classical cadherin family of cell adhesion molecules in vertebrates. Mol. Biol. Evol. 15: 1099–1107.

Gelfand, I.M. and Kister, A.E. 1995. Analysis of the relation between the sequence and secondary and three dimensional structures of immunoglobulin molecules. Proc. Natl. Acad. Sci. 92: 10884–10888.

———. 1997. A very limited number of keywords (main patterns) describes all sequences of the human variable heavy (VH) and κ (Vκ) domains. Proc. Natl. Acad. Sci. 94: 12562–12567.

Gumbiner, B.M. 1996. Cell adhesion: The molecular basis of tissue architecture and morphogenesis. Cell 84: 345–357.

Gusfield, D. 1997. Algorithms on strings, trees and sequences: Computer science and computational biology. Cambridge University Press, New York.

Eddy, S.R. 1996. Hidden Markov models. Curr. Opin. Struct. Biol. 6: 361–365.

Hill, E., Broadbent, I., Chothia, C., and Pettitt, J. 2001. Cadherin superfamily proteins in Caenorhabditis elegans and Drosophila melanogaster. J. Mol. Biol. 305: 1011–1024.

Hofmann, K., Bucher, P., Falquet, L., and Bairoch, A. 1999. The PROSITE database, its status in 1999. Nucleic Acids Res. 27: 215–219.

Jensen, P.H., Soroka, V., Thomsen, N.K., Ralets, I., Berezin, V., Bock, E., and Poulsen, F.M. 1999. Structure and interactions of NCAM modules 1 and 2, basic elements in neural cell adhesion. Nat. Struct. Biol. 6: 486–493.

Koch, A.W., Bozic, D., Pertz, O., and Engel, J. 1999. Homophilic adhesion by cadherins. Curr. Opin. Struct. Biol. 9: 275–281.

Lesk, A.M., Levitt, M., and Chothia, C. 1987. Alignment of the amino acid sequences of distantly related proteins using variable gap penalties. Protein Eng. 1: 77–78.

Nagar, B., Overduin, M., Ikura, M., and Rini, J.M. 1996. Structural basis of calcium-induced E-cadherin rigidification and dimerization. Nature 380: 360–364.

Overduin, M., Harvey, T.S., Bagby, S., Tong, K.L., Yau, P., Takeichi, M., and Ikura, M. 1995. Solution structure of the epithelial cadherin domain responsible for selective cell adhesion. Science 267: 386–389.

Pearson, W.R. 1996. Effective protein sequence comparison. Methods Enzymol. 266: 227–258.

Pertz, O., Bozic, D., Koch, A.W., Fauser, C., Brancaccio, A., and Engel, J. 1999. A new crystal structure, Ca2+ dependence and mutational analysis reveal molecular details of E-cadherin homoassociation. EMBO J. 18: 1738–1747.

Takeichi, M. 1991. Cadherin cell adhesion receptors as a morphogenetic regulator. Science 251: 1451–1455.

Takeichi, M. 1995. Morphogenetic roles of classic cadherins. Curr. Opin. Cell Biol. 7: 619–627.

Shapiro, L. and Colman, D.R. 1999. The diversity of cadherins and implications for a synaptic adhesive code in the CNS. Neuron 23: 427–430.

Shapiro, L., Fannon, A.M., Kwong, P.D., Thompson, A., Lehmann, M.S., Grubel, G., Legrand, J-F., Als-Nielsen, J., Colman, D.R., and Hendrickson, W.A. 1995. Structural basis of cell-cell adhesion by cadherins. Nature 374: 327–336.

Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147: 195–197.

Suzuki, S.T. 1996. Structural and functional diversity of cadherin superfamily: Are new members of cadherin superfamily involved in signal transduction pathway? J. Cell. Biochem. 61: 531–542.

