Department of Molecular Genetics and Biotechnology, The Hebrew University-Hadassah Medical School, Jerusalem 91120, Israel
Reprint requests to: Hanah Margalit, Department of Molecular Genetics and Biotechnology, The Hebrew University-Hadassah Medical School, POB 12272 Jerusalem 91120, Israel; e-mail: hanah{at}md2.huji.ac.il; fax: 972-2-6784010.
(RECEIVED May 23, 2001; FINAL REVISION November 2, 2001; ACCEPTED November 5, 2001)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.18602
| Abstract |
|---|
Keywords: Molecular evolution; sequence conservation; protein structure; protein folding; bioinformatics
Abbreviations: PC, persistently conserved; MPC, mutually persistently conserved; SSSD, structurally similar, sequence dissimilar
| Introduction |
|---|
Recent computational studies have reached similar conclusions. Mirny and colleagues (Mirny et al. 1998; Mirny and Shakhnovich 1999) and Ptitsyn and Ting (1999) compared the sequences within large protein superfamilies of solved structures to search for positions that conserve a certain type of residues (hydrophobic, charged, etc.). The comparison was done within each family that is contained in the superfamily, and conserved positions were examined across the families within a superfamily. Thus, a position can be conserved across the superfamily but contain different types of residues in the different families. Both studies have identified only a small number of such positions that were also spatially close, and suggested that they form folding nuclei. Stabilization centers that are formed by long-range interactions also were suggested by Dosztányi et al. (1997), who demonstrated their conservation within the corresponding protein families. Similar findings also were described in computational design protocols, where compatible sequences have been sought for a given structural template (Koehl and Levitt 1999), and in lattice models (Mirny et al. 1998). The designed sequences showed high conservation in a relatively small number of specific positions. The potential role of key residues in structure determination was also demonstrated by the Gaussian network model, by which residues involved in the highest frequency fluctuations near the native state coordinates were identified (Demirel et al. 1998). Recently, Reddy et al. (2001) have analyzed common substructures that recur in the protein data bank (PDB), and identified in them conserved key amino acids positions (CKAAPs). All these observations show that a very small number of residues play a key role in fold determination, enabling different sequences that contain these key residues at appropriate positions to assume similar folds. 
These can function either as folding nuclei (Mirny and Shakhnovich 1999, 2001), establish the molecule's active site (e.g., Reddy et al. 2001), or play key roles in critical positions of secondary structure elements (Reddy et al. 2001). In this study, we use a novel automated technique on a global, stringently constructed structural database to identify and characterize such significant positions. A global analysis of proteins from many different folds enables us to draw general conclusions regarding soluble proteins whose structures have been solved. Stringency is applied at the level of database construction and at the level of positional conservation analysis, enabling us to separate those positions that are conserved as a result of nondivergence from a common ancestor, from those that are conserved for structural/functional reasons.
The database that is used in the current analysis is of pairs of proteins that share a common fold but are dissimilar in sequence (12% identical residues on the average). By aligning the two structures of paired proteins, structurally aligned positions are determined. These positions may possess identical residues in the two proteins or different residues. One way to distinguish structurally aligned positions that may be critical for the structure and/or function of the proteins in a pair is to follow their pattern of conservation in their respective protein sequence families. Positions that contain residues that are conserved in both protein families are denoted mutually conserved. Because the two families of the structurally similar, sequence-dissimilar (SSSD) aligned proteins are so remote, mutually conserved positions, by virtue of their intra- and interfamily conservation, may be important for structure or function. More specific information can be obtained by focusing on those mutually conserved positions that are retained conserved as the multiple alignment of family members is expanded to include members that are more and more remote. Positions that are conserved among close family members and remain conserved when remote members are added to the alignment, add further support to the putative role of these positions in structure and function determination. On the one hand, examination of conservation among close members may lead to the inclusion of positions that have possibly not yet diverged. On the other hand, examination of conservation among distant family members may be erroneous because of possible arbitrary drift in the aligned sequences (Park et al. 1998; Friedberg et al. 2000b). Therefore, examination of conservation among both close and distant members may pinpoint the critical residues. Such positions are denoted here "mutually persistently conserved" (MPC), and the current study focuses on them. 
We show that many of these residues are located in positions that play an important role in secondary structure determination, that they are buried in their protein structures, and that many of them form spatial clusters. Moreover, the substitution matrix derived from MPC positions has high relative entropy, indicating that MPC replacements are limited to only certain types of residues.
| Results and Discussion |
|---|
To identify MPC positions, the persistently conserved positions in each protein family must first be determined. As noted in the Introduction, we wish to identify positions where the conservation is maintained as the alignment is advanced and more remote family members are added. To do that, we need to generate multiple sequence alignments containing close and remote family members. This can be achieved with PSI-BLAST, which identifies remote homologs by an iterative process (Altschul et al. 1997).
Figure 1 depicts a flowchart of the analysis. Each sequence in the database was run through PSI-BLAST for five iterations, or until convergence. For each sequence, at each iteration, the conservation at each position was evaluated by calculating the information content (IC) as a relative entropy (see Materials and Methods). This means that the conservation of a given amino-acid type in a given position is evaluated relative to its background frequency in the entire database. The term "conservation" is somewhat ambiguous when applied to scores assigned to positions in a multiple sequence alignment. Most studies use some variant of the Shannon entropy (IC) formula to assess positional conservation; some normalize the scores according to background frequencies, whereas others do not. The use or omission of priors in the computation should depend on the question asked. Here we were interested in identifying positions in the SSSD protein pairs that show distinct amino acids, and we therefore took the background frequencies into account.
Out of 118 protein pairs in our database, 93 pairs had more than a single PSI-BLAST iteration for both pair mates. Seventy-four percent of the positions identified as conserved at the first iteration were persistently conserved. Among all persistently conserved positions, 45% show mutual conservation, while 55% show persistent conservation only in one pair mate. It is evident from the above statistics that the application of the two requirements of persistency and mutuality of conservation directs us to a strictly defined subset of residues. These positions, together with the mutually conserved positions in protein families that converged after the first iteration, make up the MPC positions, 2603 in total. This result is highly statistically significant (p<.0001; see Materials and Methods). Eight hundred thirty-eight of these positions (32%) are occupied by identical residues in the two pair mates, and 1765 positions (68%) show different residues. These proportions deviate significantly from what is observed for all aligned positions in the database, where only 12% (1962/15,566) of the aligned positions are occupied by identical residues and 88% (13,604/15,566) show different residues. The total number of residues in the 2603 MPC positions added up to 3701 (and not to 2603x2, as some proteins appeared in more than one pair and in quite a few cases exhibited the same MPC positions). We assume that the MPC positions were maintained persistently conserved in a corresponding manner in the two remote protein families because they play important roles in structure and/or function determination, and we turn to find out what these roles might be.
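The two requirements discussed above, persistence and mutuality, reduce to set operations once per-iteration conservation scores are available. The following is a minimal sketch, not the authors' implementation: the function names and input layout are hypothetical, the threshold of zero follows the statement that positions scoring above the average are considered conserved, and reading "persistently conserved" as "conserved at every iteration" is an assumption.

```python
from typing import Dict, List, Set, Tuple


def persistently_conserved(
    zic_by_iteration: List[Dict[int, float]], threshold: float = 0.0
) -> Set[int]:
    """Positions whose normalized score exceeds `threshold` in every iteration.

    zic_by_iteration[t] maps a sequence position to its score at iteration t
    (hypothetical layout; one dict per PSI-BLAST iteration).
    """
    if not zic_by_iteration:
        return set()
    conserved = {pos for pos, z in zic_by_iteration[0].items() if z > threshold}
    for scores in zic_by_iteration[1:]:
        # Keep only positions that are still conserved at this iteration.
        conserved &= {pos for pos, z in scores.items() if z > threshold}
    return conserved


def mutually_persistently_conserved(
    pc_a: Set[int], pc_b: Set[int], aligned_pairs: List[Tuple[int, int]]
) -> List[Tuple[int, int]]:
    """Structurally aligned position pairs that are PC in both families."""
    return [(i, j) for i, j in aligned_pairs if i in pc_a and j in pc_b]


# Toy example: protein A has two iterations, protein B converged after one.
pc_a = persistently_conserved([{1: 1.2, 2: 0.5, 3: -0.3}, {1: 0.9, 2: -0.1, 3: -0.5}])
pc_b = persistently_conserved([{10: 2.0, 11: 0.4, 12: -1.0}])
mpc = mutually_persistently_conserved(pc_a, pc_b, [(1, 10), (2, 11), (3, 12)])
```

In the toy example, position 2 of protein A is conserved only at the first iteration, so the aligned pair (2, 11) is not retained even though position 11 is conserved in protein B.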
Over-represented amino acid residues in mutually persistently conserved positions
Comparison of the amino-acid frequency distribution in MPC positions with their frequency distribution in all positions in the data revealed a significant difference (p ≤ .01 by a χ² test). By applying a χ² test to the individual amino acids, we could point out the amino acids that contributed most to the significant deviation and identify those residues that were significantly over-represented (or under-represented) in MPC positions. As illustrated in Figure 2, aspartic acid, isoleucine, glycine, proline, histidine, cysteine, tryptophan, phenylalanine, and tyrosine were found to be significantly over-represented in MPC positions in comparison to their background frequencies. In many cases, those residues were maintained unchanged in the structurally aligned positions of the two pair mates. Previously, we have shown that conserved identical residues in aligned positions have distinct roles, mainly in or near the active sites of the proteins (Friedberg et al. 2000a).
.01). For comparison, the relative entropy of the structurally derived matrix is 0.17 bits. Generally, when a substitution matrix is derived from a multiple sequence alignment, the relative entropy of the matrix decreases as the evolutionary distances among the sequences increase, because the observed and background distributions draw closer. Interestingly, although the MPC-derived matrix is constructed from protein pairs with no detectable sequence similarity, its relative entropy is quite high, comparable to that of matrices built from close homologs, such as BLOSUM 85. In matrices built from close homologs, however, the relative entropy is high because the sequences have not yet diverged and the rate of substitution among them is low. By definition, the majority of substitutions between similar sequences are synonymous. In the MPC-derived matrix, the same high frequency of synonymy prevails, but for a completely different reason: MPC positions tend to be synonymous by virtue of their irreplaceability, not because they have not yet diverged in evolution. The MPC-derived matrix therefore combines features of two very different types of substitution matrices: on the one hand, it is derived from alignments of proteins that are remote in sequence; on the other hand, the data from which it is derived are structurally aligned positions that are extremely well conserved. We propose that by determining the MPC positions, we include in the matrix only those positions that, although substituted between distant proteins, still maintain a necessary role in their respective structures. Inspection of the MPC-derived matrix (Fig. 3a
Mutually persistently conserved positions in secondary structure elements
One role key positions may have in structure determination is in the stabilization of secondary structures. Extensive work has been done in determining the preferences of certain residue types in α helices and their role in determining helix structure, including helix initiation and termination (Aurora and Rose 1998; Kumar and Bansal 1998). Here, we investigate the presence of MPC residues in specific positions along secondary-structure elements, both in α helices and β strands. The MPC frequencies at each position in the vicinity of the termini of secondary structure elements and their flanking regions were determined and compared to the frequencies expected at random (see Materials and Methods). Figure 5 shows the comparison between observed and expected frequencies of MPC positions along the positions of the secondary-structure elements, expressed by their log-odds values. As demonstrated in Figure 5a, there is a clear preference for MPC positions to be present at the flanking regions of α helices, both at their N and C termini. These tendencies were found to be highly statistically significant by a χ² test, especially in positions N''', N', and C'' (p ≤ .01). Thus, these residues probably play a role in helix initiation and termination. Notably, among the MPC positions at N' and N'', there is over-representation of amino-acid residues with hydrogen-bond acceptors in their side chains, consistent with a possible role in determination of the N-terminus of the helix. Similarly, the MPC positions at position C' show over-representation of amino-acid residues with a hydrogen-bond donor in their side chains, consistent with the stabilization of the C-terminus of the helix, which is relatively negatively charged. Specific positioning of MPC residues in β strands also is observed, although less prominently (Fig. 5b). It was found that MPC residues were preferred at the terminal position of the β strand and in its vicinity (p ≤ .01). Thus, we have demonstrated that one of the roles MPC residues may have is in the determination and stabilization of secondary structure elements along the protein sequence. A similar observation was reported by Reddy et al. (2001) for residues in CKAAPs.
0.02). This suggests that MPC positions tend to be located in the protein's interior, lending further support to their possible role as maintainers of structure/function.
At the end of this procedure, each protein pair is assigned a quantitative value that depicts the structural proximity of its MPC positions. To assess the significance of this spatial proximity score, we performed a Monte-Carlo run under the null hypothesis that the spatial proximity score for MPC positions is no better than that expected at random. Given Np MPC positions in a protein pair, we selected from the aligned proteins Np aligned positions at random, and derived their proximity score, as done for the MPC positions. This process was repeated 500 times for each protein pair. If less than 25 Monte-Carlo runs had a score greater than the MPC score (p<.05), then the null hypothesis was rejected, and the spatial proximity of the MPC positions was determined as statistically significant. Out of the 118 protein pairs, 69 pairs were found to have MPC positions whose spatial proximity was better than that expected at random. This suggests that a fraction of the MPC positions form spatial clusters of interacting residues that may have a functional or structural role. Thus, an additional role of these residues may be in establishing folding nuclei and/or special substructures associated with the functional sites.
A case study
For a close impression of the possible roles of MPC positions, we look at one example of a protein pair in our database and its MPC constituents: Lipase B from Candida antarctica (CALB; PDB entry 1tca [Uppenberg et al. 1994]) and haloalkane dehalogenase from Xanthobacter autotrophicus (XADL; PDB entry 1ede [Verschueren et al. 1993]), two enzymes that show a very high structural similarity, yet no detectable sequence similarity. CALB belongs to the lipase family, a diverse group of enzymes that hydrolyze triglycerides at lipid-water interfaces, and which all have a catalytic triad similar to the one found in serine proteases (Ser-His-Asp/Glu). XADL is a haloalkane dehalogenase, which converts 1-haloalkanes into primary alcohols and a halide ion by hydrolytic cleavage of the carbon-halogen bond. CALB and XADL belong to the SCOP superfamily of α/β hydrolases (Murzin et al. 1995), although they are classified in different families, fungal lipases and haloalkane dehalogenases, respectively. In CATH, CALB and XADL are classified in the same homology group despite their sequence dissimilarity, as a result of the high structural similarity of the two structures (Orengo et al. 1997). XADL is of length 310 amino acids and CALB is of length 317 amino acids, and there are 39 positions determined as MPC along the structural alignment. Inspection of the locations of these residues along the structures of the proteins shows the same patterns that were revealed for the whole database: many of the MPC positions are located at the termini of secondary structures, and a relatively large fraction are located in turns, thus their importance in the structure determination is by maintaining the turns that are critical for the overall fold of the molecules. Most interesting are those residues that form spatial clusters. Here, we discuss in detail one such cluster in the two proteins, which is related to the active site (Table 1).
α/β hydrolases. Indeed, its ZIC score is quite high: 3.47 in XADL and 4.65 in CALB. A side chain at this position would create a close contact between its Cβ and the Cβ of the active site position XADL:D124/CALB:S105 and possibly disturb interactions. Therefore, glycine is the most suitable residue at this position. Thus, in this example, spatially close MPC residues have clearly been maintained for a reason: the maintenance and stabilization of the structure of the active site.
Concluding remarks
Evolutionary information has been used in various studies to identify residues that may be important for structure and function determination (Mirny and Shakhnovich 1999, 2001; Ptitsyn and Ting 1999; Reddy et al. 2001). In most of these studies, candidate residues were identified from structural and sequence information, using different data sets for the structural and sequence analysis, and different approaches to estimate the conservation of a suspected residue. It is therefore important to interpret the results in view of the different parameters used in each analysis. In the current study, the data set of protein sequences is constructed in a very stringent fashion. It includes pairs of similar-structure proteins that exhibit only 12% identical residues on average. The positions that are candidates for maintaining important structural/functional characteristics are structurally aligned positions that show residue conservation in their respective protein sequences. It is important to note that conservation is evaluated by calculating the information content of a position, considering the 20 amino acids without clustering and taking into account the background frequencies of the amino acids in the data. Positions with information content above the average of all the sequence positions are determined as conserved. Furthermore, only positions that were found to be conserved in both close and remote family members of the two corresponding protein families are considered in the analysis.
These definitions direct us to a specific subset of residues (MPC) that are shown to be relevant for structure/function determination. Among these residues, the aromatic residues, cysteine, glycine, and proline stand out as appearing above their background frequencies. The bulky aromatic residues play important roles in the packing of the protein, the cysteines are conserved to preserve disulfide bridges, and proline and glycine are located in critical structural positions, most often flanking secondary structures or near active sites. Notably, the aliphatic residues that are frequently found in protein hydrophobic cores and are important for protein stabilization do not stand out. Because these residues are interchangeable among themselves and are highly abundant in the data, they cannot be singled out by our procedure, which uses a 20-letter alphabet for the amino acids and considers their background frequencies in the information content calculation. Thus, the analysis directs us to different types of residues and structural roles. The residues that we have identified are mostly buried within their protein structures, but only in 70% of the proteins do they form spatial clusters more than expected at random. In many of these cases, the clusters are found to be related to the active site of the protein. Thus, residues in MPC positions are important for establishing and maintaining the substructure around the active site. One very distinct feature of the identified MPC positions is their location at the termini of secondary structure elements. Thus, it is important to conserve certain types of residues in these positions to maintain secondary structure elements through evolution.
To obtain the MPC positions, we perform in each family a strict conservation analysis, but we do not require that the same residues be maintained in both proteins. This enables us to capture the interchangeability between MPC positions by analyzing the pairs of residues found in MPC positions of the two structurally aligned proteins. The substitution matrix obtained is very informative (H = 1.015) and defines the restrictions of allowable substitutions in these critical positions. We observe interchangeability only within groups of same-character amino acids, i.e., within the aliphatic, hydroxyl-containing aliphatic, aromatic, and charged amino acids. Interestingly, some interchanges between positively and negatively charged residues are observed, consistent with other structurally derived matrices (Naor et al. 1996; Blake and Cohen 2001). From a predictive perspective, such a matrix provides valuable information regarding the allowable substitutions for maintaining a desired structure. When comparing two multiple alignments of protein families, it may be used to identify those persistently conserved positions that should be aligned as mutually conserved, and serve as the anchor positions for homology modeling.
| Materials and methods |
|---|
... Cα-Cα distance matrices. SSAP aligns proteins as a set of Cβ vectors. We found that ~80% of the paired positions were coaligned when either alignment was used as a standard.
For generation of the SSSD database used in this study, the DAPS database was filtered using the following criteria (Friedberg et al. 2000a): minimal protein length of 30 residues for both pair members; resolution better than 3.5 Å for each pair member; difference in lengths within a protein pair does not exceed 50% of the shorter member; the alignment length is at least 60% of the longer member's length; and the sequences of the pair members should not be well aligned using sequence alignment methods. A good sequence alignment, regardless of compatibility with the FSSP structural alignment, denotes a sequence similarity that we wish to avoid. Each pair was checked for similarity using the BESTFIT program from the GCG package (version 10, Genetics Computer Group), an implementation of the Smith-Waterman algorithm (Smith and Waterman 1981). Statistical significance was evaluated by comparing the actual alignment score to a sample of random scores obtained by alignment of one sequence to shuffled sequences with the same amino-acid composition as the second sequence. Sequence pairs with alignment scores that deviated more than six standard deviations from the average random score were excluded (Z ≥ 6). The average Z score of the sequence pairs in our data was found to be 1.16 with a standard deviation of 1.63. Therefore, we can say that in SSSD, no proteins within a pair are similar. As a population, they are dissimilar enough to be considered unrelated by sequence.
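The shuffling test described above can be sketched as follows. This is an illustration under stated assumptions, not the BESTFIT procedure itself: `shuffle_z_score` and `identity_score` are hypothetical names, the number of shuffles (100) is not given in the text, and the toy scoring function merely stands in for a Smith-Waterman alignment score.

```python
import random
import statistics
from typing import Callable


def shuffle_z_score(
    seq_a: str,
    seq_b: str,
    score_fn: Callable[[str, str], float],
    n_shuffles: int = 100,  # assumption: the sample size is not stated in the text
    seed: int = 0,
) -> float:
    """Z score of the real alignment score against scores for shuffled seq_b."""
    rng = random.Random(seed)
    actual = score_fn(seq_a, seq_b)
    residues = list(seq_b)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # preserves the amino-acid composition of seq_b
        random_scores.append(score_fn(seq_a, "".join(residues)))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (actual - mean) / sd


def identity_score(a: str, b: str) -> float:
    """Toy stand-in for an alignment score: identical residues at equal indices."""
    return float(sum(x == y for x, y in zip(a, b)))


# A sequence compared with itself scores far above its shuffled versions,
# so such a pair would be excluded by the six-standard-deviation cutoff.
seq = "ACDEFGHIKLMNPQRSTVWY" * 3
z_identical = shuffle_z_score(seq, seq, identity_score)
```

Any real alignment scorer with the same two-string signature could be passed as `score_fn`.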
The SSSD database is available at http://bioinfo.md.huji.ac.il/marg/SSSD/.
Assessment of conservation
Generally, per any given PSI-BLAST iteration, the information content for a single position j in a multiple sequence alignment (IC(j)) would be:
[Equation: definition of IC(j)]
Upon obtaining the IC(j) for a given position in a given iteration, the normalized value ZIC(j) was calculated by
[Equation: definition of ZIC(j)]
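The two quantities defined in this section can be illustrated with a short sketch. The form used here is an assumption consistent with the description in Results (a relative entropy against background frequencies, followed by normalization across positions): the base-2 logarithm, the use of raw column frequencies without pseudocounts or sequence weighting, and the z-score normalization are not confirmed by the text, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List


def information_content(column: str, background: Dict[str, float]) -> float:
    """Relative entropy (bits) of one alignment column against the background.

    Characters absent from `background` (for example gaps) are ignored.
    """
    residues = [r for r in column if r in background]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    ic = 0.0
    for aa, n in counts.items():
        f = n / total
        ic += f * math.log2(f / background[aa])
    return ic


def z_normalize(ic_values: List[float]) -> List[float]:
    """Assumed normalization: z score of each IC over all positions."""
    mu = mean(ic_values)
    sigma = pstdev(ic_values)
    if sigma == 0:
        return [0.0 for _ in ic_values]
    return [(v - mu) / sigma for v in ic_values]


# Toy four-letter alphabet with uniform background frequencies.
bg = {"A": 0.25, "C": 0.25, "D": 0.25, "E": 0.25}
ic_invariant = information_content("AAAA", bg)   # fully conserved column
ic_uniform = information_content("ACDE", bg)     # matches the background
zic = z_normalize([ic_invariant, ic_uniform])
```

With this normalization, a position scoring above the average of all positions has a positive ZIC, which matches the conservation criterion stated in the Concluding remarks.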
Statistical significance of number of mutually persistently conserved positions found
To evaluate if the observed number of MPC positions deviates significantly from that expected at random, we applied a normal approximation to a binomial test. The null hypothesis is that the fraction of MPC positions observed is the same as that expected at random. The expected number of MPC positions is calculated as follows: as there is a differential conservation in buried locations compared with exposed locations, we partitioned all positions according to their solvent exposure. Fifty-six percent of the positions were found to be buried (<30% solvent exposure) and 44% exposed. The fraction of PCs in buried and exposed positions is 0.3784 and 0.1914, respectively. The fraction of MPC positions expected at random is therefore 0.56 × 0.3784² + 0.44 × 0.1914² = 0.096. By using a normal approximation to a binomial test, we show that the deviation between observed and expected at random is highly statistically significant (Z = 30; p<.0001).
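The arithmetic of this test can be checked with a few lines of code. The sketch below is a worked example rather than the authors' script: the function names are hypothetical, and taking the 15,566 aligned positions and 2603 MPC positions reported in Results as the number of trials and the observed count is an assumption, since the text does not state which n was used. The squared terms express the chance that both aligned positions are persistently conserved independently, given their burial class.

```python
import math


def expected_mpc_fraction(frac_buried: float, pc_buried: float, pc_exposed: float) -> float:
    """Chance that two aligned positions are both PC, if PC status were independent."""
    frac_exposed = 1.0 - frac_buried
    return frac_buried * pc_buried ** 2 + frac_exposed * pc_exposed ** 2


def binomial_z(observed: int, n: int, p: float) -> float:
    """Normal approximation to the binomial: (observed - np) / sqrt(np(1 - p))."""
    return (observed - n * p) / math.sqrt(n * p * (1.0 - p))


p_expected = expected_mpc_fraction(0.56, 0.3784, 0.1914)
# Assumption: n = 15,566 aligned positions, 2603 of them MPC (figures from Results).
z = binomial_z(2603, 15566, p_expected)
print(f"expected fraction = {p_expected:.3f}, Z = {z:.1f}")
```

Under these assumptions the expected fraction rounds to 0.096 and Z is close to 30, in agreement with the values reported above.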
Residue distribution in secondary structure elements
Analysis of residue distribution in secondary structure elements was carried out as follows: helix and strand locations were determined using DSSP (Kabsch and Sander 1983). Helices or strands whose lengths were less than seven residues were discarded. Each MPC position was assigned to a secondary structure position or to a flanking region. We have named the positions as in Aurora and Rose (1998); the order for flanking and in-element positions is N4', N''', N'', N', Ncap, N1, N2, N3, N4 ... C4, C3, C2, C1, Ccap ... C4'. The flanking regions are marked with apostrophes, the in-element residues with digits, and the initial and terminal (capping) residues with "cap."
We aligned all helices by the determined positions and calculated the relative occurrence of MPC residues in each position. The same was done for ß strands. The occurrence of MPC residues in a position was expressed as log(Nj/Ej), where Nj is the actual number of MPC residues at position j, and Ej is the expected number of MPC residues, based on the fraction of MPC residues in the data.
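The log-odds value per position can be computed as in the sketch below. It is illustrative only: the function name and input dictionaries are hypothetical, the natural logarithm is an assumption (the base is not stated), and Ej is taken as the overall MPC fraction multiplied by the number of residues observed at position j.

```python
import math
from typing import Dict, Optional


def mpc_log_odds(
    mpc_counts: Dict[str, int],
    total_counts: Dict[str, int],
    mpc_fraction: float,
) -> Dict[str, Optional[float]]:
    """log(Nj / Ej) for each secondary-structure position label j.

    mpc_counts[j]   : number of MPC residues observed at position j (Nj)
    total_counts[j] : number of residues of any kind observed at position j
    mpc_fraction    : fraction of MPC residues in the whole data set
    """
    log_odds: Dict[str, Optional[float]] = {}
    for pos, total in total_counts.items():
        expected = mpc_fraction * total  # Ej
        observed = mpc_counts.get(pos, 0)  # Nj
        if observed == 0 or expected == 0:
            log_odds[pos] = None  # the log-odds is undefined for empty cells
        else:
            log_odds[pos] = math.log(observed / expected)
    return log_odds


# Toy counts: MPC residues enriched at Ncap and depleted at N1.
scores = mpc_log_odds({"Ncap": 30, "N1": 10}, {"Ncap": 100, "N1": 100, "N2": 100}, 0.2)
```

Positive values indicate positions where MPC residues occur more often than expected from their overall fraction, and negative values indicate depletion.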
Solvent accessibility
Solvent accessibility (SA) values, in Å², were taken from the FSSP database. For each residue, these were divided by the accessible surface area of the extended conformation of that residue (Miller et al. 1987) and expressed in percentages. The analysis was carried out both by using these values and by clustering the residues into two solvent-accessibility categories: buried (SA < 30%) and exposed (SA ≥ 30%).
Assessing spatial proximity of mutually persistently conserved positions
As described in the Results section, we assess the spatial proximity of MPC positions using a graph representation of the residues in the protein. A quantitative measure of the spatial proximity of residues in an MPC subgraph would be the number of edges in it compared to the number of vertices. However, because we compare the actual measure to those obtained by a Monte-Carlo procedure that uses the same number of vertices, the constant number of vertices is canceled out. In addition, instead of just counting the number of edges, the spatial proximity is better represented by weighting the edges according to the probability of having contacting residues within that sequence distance in the particular fold examined. Generally, an edge drawn between contacting residues distant in sequence receives a higher weight than an edge drawn between contacting residues that are close in sequence. However, because of the different folds of different proteins, the weighting function should not be universal. Therefore, it was constructed according to the contact map of each chain. For example, upon examining a particular fold, it might be shown that residues within a sequence distance of 50 positions have a higher probability of being in contact than residues within a distance of 40 or 60 positions. This phenomenon is a result of regularity in the protein's tertiary structure, and will vary between different fold patterns. Thus, for computing the edge weights, the frequency of contacting residues with the same sequence separation was taken into account.
The calculation in detail is shown below. Given two structurally aligned proteins A and B:
For protein A, build a vector A = <a2, ..., an-2>, where ak is the number of contacting residue pairs that are k residues distant in sequence and n is the chain length (2 ≤ k ≤ n-2). Contacting residues are those with a distance ≤ 7.0 Å between β-carbons. This process is repeated for the second protein in the alignment, generating B = <b2, ..., bm-2> (m being the length of the second protein sequence).
Determine the probability for two residues separated by a sequence distance of k positions to be in contact:
[Equations: contact probability as a function of the sequence separation k]
Ai and Aj are two contacting residues in protein A. They are aligned to Bk and Bl respectively, which are also contacting. Therefore, one vertex would be [Ai,Bk] and the other would be [Aj,Bl].
The edge weight between the vertices [Ai,Bk] and [Aj,Bl] would be
[Equation: edge weight between the vertices [Ai,Bk] and [Aj,Bl]]
Finally, assessment of the spatial proximity of the MPC positions is performed using a Monte-Carlo procedure. For each protein pair we repeat the above analysis with randomly picked aligned positions. The number of those positions is the same as the number of vertices in the MPC graph. The randomization is repeated 500 times. If <25 randomization scores (5% of 500) have a better spatial proximity score than the MPC score, the result is considered significant.
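The overall shape of this procedure can be sketched in code. The example below is a deliberately simplified illustration, not the published method: it works on a single chain rather than on paired vertices from two aligned proteins, and it scores proximity by an unweighted count of contacts instead of the sequence-separation weights described above. All function names are hypothetical; the 7.0 Å cutoff, the minimum separation of two residues, and the 500 randomizations come from the text.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def contact_pairs(cb_coords: Sequence[Coord], cutoff: float = 7.0, min_sep: int = 2) -> Set[Tuple[int, int]]:
    """Index pairs (i, j), j - i >= min_sep, whose beta-carbons lie within cutoff."""
    pairs = set()
    for i, j in combinations(range(len(cb_coords)), 2):
        if j - i >= min_sep and math.dist(cb_coords[i], cb_coords[j]) <= cutoff:
            pairs.add((i, j))
    return pairs


def separation_counts(pairs: Set[Tuple[int, int]]) -> Dict[int, int]:
    """a_k: number of contacting pairs that are k residues apart in sequence."""
    counts: Dict[int, int] = {}
    for i, j in pairs:
        counts[j - i] = counts.get(j - i, 0) + 1
    return counts


def proximity_score(positions: Sequence[int], pairs: Set[Tuple[int, int]]) -> int:
    """Unweighted stand-in score: number of contacts among the chosen positions."""
    chosen = set(positions)
    return sum(1 for i, j in pairs if i in chosen and j in chosen)


def monte_carlo_better(
    mpc_positions: Sequence[int],
    aligned_positions: List[int],
    pairs: Set[Tuple[int, int]],
    n_runs: int = 500,
    seed: int = 0,
) -> Tuple[int, int]:
    """Observed score and the number of random draws that score strictly higher."""
    rng = random.Random(seed)
    observed = proximity_score(mpc_positions, pairs)
    better = 0
    for _ in range(n_runs):
        sample = rng.sample(aligned_positions, len(mpc_positions))
        if proximity_score(sample, pairs) > observed:
            better += 1
    return observed, better


# Toy chain: ten residues on a straight line, 3 Å apart.
coords = [(3.0 * i, 0.0, 0.0) for i in range(10)]
pairs = contact_pairs(coords)
a_k = separation_counts(pairs)
observed, better = monte_carlo_better([0, 2, 4], list(range(10)), pairs)
significant = better < 25  # fewer than 5% of 500 runs score better
```

Replacing `proximity_score` with a weighted sum over edges would bring the sketch closer to the procedure in the text, once a weighting function is chosen.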
Generation of log-odds matrices
Generation of the matrix derived from mutually persistently conserved positions
All the aligned positions that were determined to be MPC were tallied. For each two residues Ai and Aj (1 ≤ i ≤ j ≤ 20), we count the number of times that they appear as aligned in MPC positions. This provides the number of substitutions between Ai and Aj. A substitution matrix was derived as described in Naor et al. (1996). The values that appear in the matrix in Figure 3 were obtained by
[Equation: matrix values]
Generation of the structurally derived matrix
For the structurally derived matrix, all the aligned positions in the SSSD protein pairs were tallied, and the matrix was derived as described above.
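How a log-odds matrix and its relative entropy follow from tallied residue pairs can be shown with a short sketch. It uses the standard construction of Henikoff and Henikoff (1992) with base-2 logarithms and no scaling or rounding, which is an assumption: the exact formula and scaling used for the matrices in this study (following Naor et al. 1996) are not reproduced here. The function name and input format are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def log_odds_matrix(aligned_pairs: Iterable[Pair]) -> Tuple[Dict[Pair, float], float]:
    """Log-odds scores (bits) and relative entropy H from aligned residue pairs."""
    # Treat (a, b) and (b, a) as the same unordered pair.
    pair_counts = Counter((min(a, b), max(a, b)) for a, b in aligned_pairs)
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # Marginal residue frequencies p_i implied by the pair frequencies.
    p: Dict[str, float] = {}
    for (a, b), freq in q.items():
        if a == b:
            p[a] = p.get(a, 0.0) + freq
        else:
            p[a] = p.get(a, 0.0) + freq / 2
            p[b] = p.get(b, 0.0) + freq / 2

    scores: Dict[Pair, float] = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = math.log2(freq / expected)

    # Relative entropy: average score per observed pair, in bits.
    entropy = sum(q[pair] * scores[pair] for pair in q)
    return scores, entropy


# Toy data: two residue types that never substitute for each other.
toy_scores, toy_h = log_odds_matrix([("A", "A")] * 4 + [("G", "G")] * 4)
```

In the toy data every aligned pair is an identity, so each diagonal score is 1 bit and the relative entropy is 1 bit; mixing in off-diagonal pairs lowers H toward zero.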
Comparing frequency distributions by the Jensen-Shannon divergence
The Jensen-Shannon (JS) divergence of two distributions $p_1$ and $p_2$ is defined as in Lin (1991):

$$JS = H(\pi_1 p_1 + \pi_2 p_2) - \pi_1 H(p_1) - \pi_2 H(p_2),$$

where $H(p_i)$ is the entropy of distribution $p_i$, and $\pi_i$ is the weight given to that distribution, with $\pi_1, \pi_2 > 0$ and $\pi_1 + \pi_2 = 1$. We used JS divergence with $\pi_1 = \pi_2 = 0.5$ to compare between the observed amino-acid pair frequency distributions in the BLOSUM matrices and the MPC and structurally derived matrices.
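The definition translates directly into code. The sketch below is a minimal illustration with hypothetical function names; the base-2 logarithm is an assumption, chosen so that the divergence is expressed in bits.

```python
import math
from typing import Sequence


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def js_divergence(p1: Sequence[float], p2: Sequence[float], w1: float = 0.5, w2: float = 0.5) -> float:
    """Jensen-Shannon divergence: H(w1*p1 + w2*p2) - w1*H(p1) - w2*H(p2)."""
    if len(p1) != len(p2):
        raise ValueError("distributions must have the same length")
    if w1 <= 0 or w2 <= 0 or abs(w1 + w2 - 1.0) > 1e-12:
        raise ValueError("weights must be positive and sum to 1")
    mixture = [w1 * a + w2 * b for a, b in zip(p1, p2)]
    return entropy(mixture) - w1 * entropy(p1) - w2 * entropy(p2)


js_disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])   # no overlap at all
js_same = js_divergence([0.5, 0.5], [0.5, 0.5])       # identical distributions
```

With equal weights and base-2 logarithms the divergence ranges from 0 for identical distributions to 1 bit for distributions with no overlap.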
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Aurora, R. and Rose, G.D. 1998. Helix capping. Protein Sci. 7: 21–38.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The protein data bank. Nucleic Acids Res. 28: 235–242.
Blake, J.D. and Cohen, F.E. 2001. Pairwise sequence alignment below the twilight zone. J. Mol. Biol. 307: 721–735.
Bowie, J.U., Reidhaar-Olson, J.F., Lim, W.A., and Sauer, R.T. 1990. Deciphering the message in protein sequences: Tolerance to amino acid substitutions. Science 247: 1306–1310.
Brenner, S.E. and Levitt, M. 2000. Expectations from structural genomics. Protein Sci. 9: 197–200.
Demirel, M.C., Atilgan, A.R., Jernigan, R.L., Erman, B., and Bahar, I. 1998. Identification of kinetically hot residues in proteins. Protein Sci. 7: 2522–2532.
Dosztányi, Z., Fiser, A., and Simon, I. 1997. Stabilization centers in proteins: Identification, characterization and predictions. J. Mol. Biol. 272: 597–612.
Friedberg, I., Kaplan, T., and Margalit, H. 2000a. Glimmers in the midnight zone: Characterization of aligned identical residues in sequence-dissimilar proteins sharing a common fold. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8: 162–170.
Friedberg, I., Kaplan, T., and Margalit, H. 2000b. Evaluation of PSI-BLAST alignment accuracy in comparison to structural alignments. Protein Sci. 9: 2278–2284.
Henikoff, S. and Henikoff, J.G. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. 89: 10915–10919.
Hobohm, U. and Sander, C. 1994. Enlarged representative set of protein structures. Protein Sci. 3: 522–524.
Holm, L. and Sander, C. 1996. The FSSP database: Fold classification based on structure-structure alignment of proteins. Nucleic Acids Res. 24: 206–209.
Jaroszewski, L. and Godzik, A. 2000. Search for a new description of protein topology and local structure. Proc. Int. Conf. Intell. Syst. Mol. Biol. 8: 211–217.
Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.
Kannan, N. and Vishveshwara, S. 1999. Identification of side-chain clusters in protein structures by a graph spectral method. J. Mol. Biol. 292: 441–464.
Kennes, C., Pries, F., Krooshof, G.H., Bokma, E., Kingma, J., and Janssen, D.B. 1995. Replacement of tryptophan residues in haloalkane dehalogenase reduces halide binding and catalytic activity. Eur. J. Biochem. 228: 403–407.
Koehl, P. and Levitt, M. 1999. Structure-based conformational preferences of amino acids. Proc. Natl. Acad. Sci. 96: 12524–12529.
Koppensteiner, W.A., Lackner, P., Wiederstein, M., and Sippl, M.J. 2000. Characterization of novel proteins based on known protein structures. J. Mol. Biol. 296: 1139–1152.
Kumar, S. and Bansal, M. 1998. Dissecting α-helices: Position-specific analysis of α-helices in globular proteins. Proteins 31: 460–476.
Lim, W.A. and Sauer, R.T. 1989. Alternative packing arrangements in the hydrophobic core of lambda repressor. Nature 339: 31–36.
Lin, J. 1991. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37: 145–151.
Markiewicz, P., Kleina, L.G., Cruz, C., Ehret, S., and Miller, J.H. 1994. Genetic studies of the lac repressor. XIV. Analysis of 4000 altered Escherichia coli lac repressors reveals essential and non-essential residues, as well as "spacers" which do not require a specific sequence. J. Mol. Biol. 240: 421–433.
Milla, M.E., Brown, B.M., and Sauer, R.T. 1994. Protein stability effects of a complete set of alanine substitutions in Arc repressor. Nat. Struct. Biol. 1: 518–523.
Miller, S., Janin, J., Lesk, A.M., and Chothia, C. 1987. Interior and surface of monomeric proteins. J. Mol. Biol. 196: 641–656.
Mirny, L. and Shakhnovich, E. 2001. Evolutionary conservation of the folding nucleus. J. Mol. Biol. 308: 123–129.
Mirny, L.A. and Shakhnovich, E.I. 1999. Universally conserved positions in protein folds: Reading evolutionary signals about stability, folding kinetics and function. J. Mol. Biol. 291: 177–196.
Mirny, L.A., Abkevich, V.I., and Shakhnovich, E.I. 1998. How evolution makes proteins fold quickly. Proc. Natl. Acad. Sci. 95: 4976–4981.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.
Naor, D., Fischer, D., Jernigan, R.L., Wolfson, H.J., and Nussinov, R. 1996. Amino acid pair interchanges at spatially conserved locations. J. Mol. Biol. 256: 924–938.
Orengo, C.A. and Taylor, W.R. 1996. SSAP: Sequential structure alignment program for protein structure comparison. Methods Enzymol. 266: 617–635.
Orengo, C.A., Jones, D.T., and Thornton, J.M. 1994. Protein superfamilies and domain superfolds. Nature 372: 631–634.
Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., and Thornton, J.M. 1997. CATH – a hierarchic classification of protein domain structures. Structure 5: 1093–1108.
Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D., Hubbard, T., and Chothia, C. 1998. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol. 284: 1201–1210.
Prlić, A., Domingues, F.S., and Sippl, M.J. 2000. Structure-derived substitution matrices for alignment of distantly related sequences. Protein Eng. 13: 545–550.
Ptitsyn, O.B. and Ting, K.L. 1999. Non-functional conserved residues in globins and their possible role as a folding nucleus. J. Mol. Biol. 291: 671–682.
Reddy, B.V., Li, W.W., Shindyalov, I.N., and Bourne, P.E. 2001. Conserved key amino acid positions (CKAAPs) derived from the analysis of common substructures in proteins. Proteins 42: 148–163.
Rennell, D., Bouvier, S.E., Hardy, L.W., and Poteete, A.R. 1991. Systematic mutation of bacteriophage T4 lysozyme. J. Mol. Biol. 222: 67–88.
Samudrala, R. and Moult, J. 1998. A graph-theoretic algorithm for comparative modeling of protein structure. J. Mol. Biol. 279: 287–302.
Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147: 195–197.
Suckow, J., Markiewicz, P., Kleina, L.G., Miller, J., Kisters-Woike, B., and Müller-Hill, B. 1996. Genetic studies of the Lac repressor. XV: 4000 single amino acid substitutions and analysis of the resulting phenotypes on the basis of the protein structure. J. Mol. Biol. 261: 509–523.
Uppenberg, J., Hansen, M.T., Patkar, S., and Jones, T.A. 1994. The sequence, crystal structure determination and refinement of two crystal forms of lipase B from Candida antarctica. Structure 2: 293–308.
Verschueren, K.H., Franken, S.M., Rozeboom, H.J., Kalk, K.H., and Dijkstra, B.W. 1993. Refined x-ray structures of haloalkane dehalogenase at pH 6.2 and pH 8.2 and implications for the reaction mechanism. J. Mol. Biol. 232: 856–872.