Department of Human Biological Chemistry and Genetics, University of Texas Medical Branch, Galveston, Texas 77555-1068, USA
Reprint requests to: Vincent J. Hilser, Department of Human Biological Chemistry and Genetics, 5.162 Medical Research Bldg., University of Texas Medical Branch, Galveston, TX 77555-1068, USA; e-mail: vince{at}hbcg.utmb.edu; fax: (409) 747-6816.
(RECEIVED February 23, 2004; FINAL REVISION April 23, 2004; ACCEPTED April 23, 2004)
| Abstract |
|---|
Keywords: native state ensemble; sequential cooperative segments; fold recognition; protein structure prediction; position-specific thermodynamics; protein stability
Abbreviations: PDB, Protein Data Bank; PAM, Partitioning Around Medoids; ASA, accessible surface area; SCOP, structural classification of proteins; FSSP, families of structurally similar proteins
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04706204.
| Introduction |
|---|
… (α-helix, β-sheet, etc.), each structural unit is, in turn, part of a higher order structural motif (e.g., α/β), and the motifs are arranged to form unique folds. Although structural descriptions of fold space have proven to be effective in fold recognition as well as homology studies (Bowie et al. 1991; Godzik and Skolnick 1992; Jones et al. 1992; Bryant and Lawrence 1993; Defay and Cohen 1996; Huang et al. 1996; Rost et al. 1997; Kelley et al. 2000; Mallick et al. 2002), such approaches do not account (at least not explicitly) for the well-known experimental observation that proteins display regional differences in conformational heterogeneity, even under native conditions (Wuthrich 1989; Bai and Englander 1996). This result suggests that the canonical structure alone may not provide the required determinants for fold specificity, and that a classification scheme that accounts explicitly for this heterogeneity could be of significant value. In the early 1970s Anfinsen (1973) reported that under the proper solvent conditions, amino acid sequences fold spontaneously into functional three-dimensional protein structures, thus introducing the "thermodynamic hypothesis." An important implication of the thermodynamic hypothesis is that all of the information required for specifying a protein fold is contained in the primary sequence, and that the information is thermodynamic in nature. An extension, or perhaps even a consequence, of the thermodynamic hypothesis is that in addition to considering a protein as a sequence of structural building blocks (i.e., secondary structure), "a parallel view can be adopted, wherein a protein can be represented as a sequence of thermodynamic building blocks." Indeed, as shown previously (Wrabl et al. 2002), a database of proteins can be represented in purely thermodynamic terms, and the thermodynamic environments can be implemented successfully into a fold recognition approach, thus providing a proof of principle for the notion of an entirely thermodynamic description of protein folds.
The success of these initial studies (Wrabl et al. 2001, 2002) and the unique nature of the environmental descriptors leave open the possibility that a hierarchical thermodynamic classification scheme similar to SCOP (Murzin et al. 1995) or FSSP (Holm and Sander 1996), but independent of structure, can be developed that will serve as the basis for evaluating thermodynamic similarities between folds. Toward this end, the following questions must be addressed: How many distinct energetic environments are present across a database of proteins? Do hierarchical thermodynamic elements exist that are analogous to secondary structure? What is the relationship between the structural and the thermodynamic building blocks? How many amino acid types are needed to encode the thermodynamic environments across the entire structural database? In the present study, cluster analysis and fold recognition are used as tools to address these questions, which constitute the cornerstone of a thermodynamic classification scheme that can be used as the basis for comparison between folds (Holm and Sander 1996).
| Results and Discussion |
|---|
$$P_i = \frac{K_i}{\sum_k K_k} = \frac{K_i}{Q} \qquad (1)$$
where Ki = exp(−ΔGi/RT) is the statistical weight of each microstate and the summation in the denominator is the partition function, Q, for the system (Wrabl et al. 2002).
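The relation between state free energies and state probabilities can be illustrated with a short sketch. It assumes free energies in kcal/mol and a three-state toy ensemble; the function name and the example values are hypothetical and are not taken from the COREX implementation.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def state_probabilities(delta_g, temperature=298.15):
    """Return P_i = K_i / Q for a list of state free energies (kcal/mol)."""
    rt = R * temperature
    # Shift by the lowest free energy for numerical stability;
    # the constant factor cancels in the ratio K_i / Q.
    g_min = min(delta_g)
    weights = [math.exp(-(g - g_min) / rt) for g in delta_g]
    q = sum(weights)  # partition function (up to the cancelled constant)
    return [w / q for w in weights]

# Hypothetical ensemble: native, partially unfolded, and unfolded states
probs = state_probabilities([0.0, 1.5, 6.0])
```

Lower free energy states receive exponentially larger weights, so the native state dominates this toy ensemble while the higher energy states retain small but nonzero probabilities.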
An important feature of the COREX algorithm is that it provides a means of describing a protein structure by position-specific values that can be ascertained directly from the probabilities described in equation 1. One such quantity, known as the stability constant, κf,j, is the ratio of the summed probability of states in the ensemble in which a particular position, j, is folded (ΣPf,j) to the summed probability of states in which that position is not folded (ΣPnf,j):
$$\kappa_{f,j} = \frac{\sum P_{f,j}}{\sum P_{nf,j}} \qquad (2A)$$
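As a minimal sketch of this definition, the stability constant at each position can be computed from the state probabilities and a per-state record of which residues are folded. The data layout (a list of boolean lists) and the toy ensemble are assumptions made for illustration only.

```python
import math

def stability_constants(probabilities, folded):
    """Return kappa_f,j for every residue j.

    probabilities: P_i for each state i.
    folded: folded[i][j] is True if residue j is folded in state i.
    """
    n_res = len(folded[0])
    kappas = []
    for j in range(n_res):
        p_f = sum(p for p, state in zip(probabilities, folded) if state[j])
        p_nf = sum(p for p, state in zip(probabilities, folded) if not state[j])
        # A residue that is folded in every state has no finite ratio
        kappas.append(p_f / p_nf if p_nf > 0 else math.inf)
    return kappas

# Hypothetical 3-state, 2-residue ensemble
probs = [0.90, 0.08, 0.02]
folded = [[True, True], [True, False], [False, False]]
kappa = stability_constants(probs, folded)  # approximately [49, 9]
```

In this toy case residue 0 is unfolded only in the least probable state, so its stability constant is larger than that of residue 1.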
The importance of the stability constant is twofold. First, it can be compared directly to hydrogen exchange protection factors, thus representing an experimentally verifiable energetic description of the protein (Hilser and Freire 1996; Hilser et al. 1998). The good agreement between calculated and experimental protection factors demonstrates that the native state ensemble, as calculated by the COREX algorithm, provides a reasonable representation of the actual native state ensemble (Wrabl et al. 2001).
The most important aspect of the stability constant, however, is that it provides a means of characterizing the regional differences in stability within the protein, at the level of each residue position. In energetic terms, the stability constant reports on the difference in energy between the subensemble of states in which position j is in a folded region and the subensemble of states in which position j is in a nonfolded region (Fig. 1):
![]() | (2B) |
![]() | (2C) |
Likewise, position-specific reporters of the component thermodynamic functions can also be defined; the polar enthalpy ([ΔH]pol,j), apolar enthalpy ([ΔH]ap,j), and conformational entropy ([TΔS]conf,j), like the stability constant, report on the difference in energetics between the folded and nonfolded subensembles for each position (Materials and Methods; Wrabl et al. 2002). The unique and quintessential feature of these quantities, which is shown in Figure 1, is that they are ensemble averaged thermodynamic reporters of the energetics at each position, which implicitly account for the effects of all regions of the protein on the energetics at a particular position (Wrabl et al. 2002). In contrast, they do not represent the energetic contribution of an amino acid to the stability of the molecule. This is highlighted in Figure 2 and Table 1, which show the relationship between the position-specific descriptors of the proteins in the H. sapiens database and the contribution of the amino acid at that position to the accessible surface area (ASA) of the native structure. Because the energetic contribution of each amino acid is calculated from the ΔASA, as described in Materials and Methods, the absence of a correlation between the position-specific descriptors and the energetic contributions indicates that position-specific quantities provide a means of characterizing the fold of a protein in a way that effectively separates the amino acid at a position in the protein from the position itself. As such, the position-specific energetics are a property of the ensemble as a whole, and the sequence of properties constitutes the thermodynamic signature of that fold.
Previous studies from this laboratory have revealed that the propensities of amino acids for empirically defined thermodynamic environments can provide significant structure encoding information (Wrabl et al. 2002). This was demonstrated by successfully matching sequences to folds using a thermodynamics-based threading approach. In this study, separate experiments were performed, using the different clustering results, wherein the propensities of the 20 amino acids for each environment cluster were determined. The resultant log-odds probabilities were used in fold recognition experiments to determine the minimum number of thermodynamic environments necessary to sufficiently describe the structure encoding energetics of the proteins analyzed in the database.
Figure 3 shows fold recognition results obtained by threading a library of sequences onto protein folds that have been defined by different numbers of thermodynamic environment clusters. Fold recognition success is represented by the percent of proteins in which the correct sequence scored in the top 1 percentile (i.e., was among the top four scoring sequences out of 431 decoys) when matched with its corresponding fold. Two features are apparent in Figure 3. First, fold recognition success saturates at ~84% (dotted line) as the number of environments increases. Second, eight thermodynamic environments provide more than 95% (80%/84%) of the structure encoding information with 80% (128/159) of the sequences correctly matched with structure. Of note is that the choice of criteria for success does not dramatically impact the results. Defining success as scoring in the top 5th and 10th percentiles increases the fraction of proteins that are correctly matched to 87% (139/159) and 91% (145/159), respectively. These results are reproducible using randomly and nonrandomly jackknifed data sets (not shown), indicating that the results are not sensitive to the choice of proteins used. As no size- or structure-related bias in the analysis has been identified (Wrabl et al. 2002), these results suggest that within the database of H. sapiens proteins, eight distinct thermodynamic environments are sufficient to account for virtually all of the thermodynamic diversity captured by this analysis.
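The percentages quoted above follow directly from the reported counts. The short check below simply reproduces that arithmetic; the variable names are illustrative.

```python
n_proteins = 159
successes = {"top 1%": 128, "top 5%": 139, "top 10%": 145}

# Fraction of proteins whose native sequence met each success criterion
rates = {label: count / n_proteins for label, count in successes.items()}

# Share of the ~84% plateau that is already reached with eight environments
fraction_of_plateau = 0.80 / 0.84

# The top 1 percentile of a 431-sequence decoy library is about four sequences
top_percentile_cutoff = int(0.01 * 431)
```

The ratio 0.80/0.84 is slightly above 0.95, which is the basis for the statement that eight environments capture more than 95% of the attainable fold recognition success.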
… ([ΔH]ap/[ΔH]pol), which provides a metric of the relative polarity of a position-specific environment, appears to oscillate as a function of thermodynamic environment. This means that the cluster analysis is discriminating between apolar and polar environments at each level of stability, and suggests that proteins have evolved multiple energetic mechanisms to achieve a particular stability.
Interestingly, comparison of the boundaries for the sequential cooperative segments and the boundaries for secondary structure elements reveals that although some segments correspond directly to structural elements, most segments are independent of traditional structural classifications. In several cases the sequential cooperative segments correspond to the ends of β-strands or α-helices and the adjacent loops. In short, sequential cooperative segments can bridge multiple structural elements, and structural elements can span multiple sequential cooperative segments. The lack of correspondence between the two is important because it demonstrates that each secondary structural element does not obligatorily behave as a cooperative unit. Instead, the cooperative building blocks in proteins are more accurately represented by the segments depicted in Figure 5. In essence, the sequential cooperative segments identified here are the thermodynamic counterpart to secondary structure, as they represent the first level of thermodynamic organization in proteins.
Comparison of the sequential cooperative segments to secondary structure is useful because it highlights several important aspects of the segments. First, like secondary structure, which reports on the local structure in the context of the overall fold, the sequential cooperative segments report on the local energetics in the context of the entire conformational manifold of the protein. As such, they are a representation of the overall structure but are merely defined in energetic terms at the level of groups of amino acids. This leads to a second similarity, which is that the sequential cooperative segments are not reporting only on the intrinsic properties of the local sequence. Rather, the boundaries and thermodynamic properties of the segments are influenced by a combination of local and global factors. Finally, not all residues in a sequence are found to be part of sequential cooperative segments. Much like secondary structure, which can be flanked by residues with more or less nonregular structure, the energetically defined segments are often abutted by amino acids with no discernible energetic similarity to neighboring positions. The qualitative similarities between the segments described here and secondary structure are therefore compelling, as they appear to illuminate a novel way of dissecting proteins into their elementary building blocks.
Hierarchical clustering of amino acids in thermodynamic environments
As the position-specific thermodynamic descriptors in the H. sapiens protein database are independent of the contributions of the amino acids at each site (Fig. 2; Table 1), the propensities of each amino acid for the different environments cannot be predicted a priori from the properties (i.e., size, charge, hydrophobicity, etc.) of the amino acids. It is therefore of significant interest to know the distributions of amino acids in each environment, as well as which amino acids share similar propensities across all environments. To address these issues, the probabilities of the 20 amino acids for the eight thermodynamic environments were subjected to double hierarchical clustering as described in Materials and Methods. The resultant hierarchical groupings (i.e., dendrograms) and heat map illustrate amino acid propensities for the eight thermodynamic environments (Fig. 6). Inspection of the row dendrogram shows that the first separation of amino acid clusters is based on hydrophobicity. The aromatic amino acids (Trp, Phe, and Tyr) and the branched aliphatic amino acids (Leu, Ile, and Val) make up the hydrophobic group, and the remaining amino acids make up the hydrophilic group. Although noted above, it should be emphasized that the separation into hydrophobic and hydrophilic groups is not predetermined by the method of analysis. As the contribution of each amino acid is not correlated to the thermodynamics of the environment to which it belongs (Fig. 2; Table 1), the hierarchical cluster analysis is reporting on a selection mechanism that is not specifically determined by the chemistry of the amino acid at that position.
… [ΔH]ap/[ΔH]pol ratios (i.e., TE1, TE2, and TE4). Once again, this discrimination cannot be predicted on the basis of side chain properties, suggesting that the results are not a simple consequence of the energy function used to determine the ensemble. Further inspection of the row dendrogram reveals that the propensity of Pro is unique, as it is found often in low stability environments (i.e., TE2) at the expense of high stability (i.e., TE7 and TE8). Gly, Thr, and Ala form a fourth cluster, trending with the stability dimension and being found more often in medium- to low-stability environments. The fifth cluster consists of Met, His, Glu, and Arg residues, which are found in medium- to high-stability and high-enthalpy-ratio environments (i.e., TE5 and TE7). The sixth and final cluster is composed of amino acids with charged and uncharged polar side chains (Ser, Asp, Asn, Lys, Glu, and Cys). The frequency of occurrence of these residues does not track with the stability of a cluster, but they are found frequently in environments with high [ΔH]ap/[ΔH]pol ratios (i.e., TE3, TE5, and TE7).
Interestingly, comparison of the propensities of chemically and structurally similar amino acids such as Lys and Arg reveals distinct differences in environmental preferences (Fig. 6). TE3 illustrates one of the differentiating factors between Lys and Arg; Arg is seldom found in the low-stability environment that has a high enthalpy ratio, whereas Lys shows no preference. Indeed, throughout the entire database, numerous differences in thermodynamic usage are found for amino acids with apparently similar chemistry.
One of the most compelling features of the pattern of propensities (Fig. 6; Table 2) is that the propensities of some amino acids are strongly influenced by the stability of the particular region of the protein, whereas others are more strongly influenced by the polarity of the environment and are independent of stability. The latter result is especially noteworthy as it further indicates that the propensity of an amino acid for an environment is not simply recapitulating the stability contribution of that amino acid to the environment (as implemented in the energy function). Although we have no definitive explanation for these results, it is, nonetheless, further indication of a degree of independence between the position in a fold and the amino acid that is encoded at that position. If this is indeed the case, then it would leave open the possibility that the thermodynamic signature of a fold is coded in the primary sequence, but not at the level of the individual residue. In other words, it would appear to suggest that the thermodynamic signature of a fold is encoded at the level of "groups" of residues.
Statistically derived amino acid clusters
The double hierarchical clustering of amino acids in the eight thermodynamic environments (Fig. 6) reveals both traditional and nontraditional groupings of amino acid types. Underlying questions are whether these groupings provide sufficient resolution to encode structure, and if so, how many amino acid clusters are required to describe the eight thermodynamic environments of the proteins in the H. sapiens structural database. To determine the thermodynamic information content of the hierarchical clustering analysis, simple fold recognition experiments were performed based on the observed amino acid distributions within the eight thermodynamic environments in a manner similar to the analysis of the environment clusters described above. Figure 7 shows fold recognition results obtained by threading sequences, which are defined as having 2–20 amino acid clusters (in separate experiments), into folds that are defined in terms of the eight thermodynamic environment clusters. The amino acid clusters came directly from the nodal divisions of the hierarchical clustering analysis (row dendrogram in Fig. 6). As stated previously, the first division of the 20 amino acids was correlated with hydrophobicity (Fig. 6), in which one group consisted of the aromatic and branched aliphatic amino acids, and the second group was made up of the remaining 14 amino acids. Fold recognition based on this binary hydrophobicity scale was poor (~50% success), indicating that simple hydrophobic versus hydrophilic was not sufficient to match sequence to fold, even with the eight thermodynamic environments (Fig. 7). Indeed, fold recognition based on the threading of the two amino acid clusters into just two environmental clusters (which, as seen in the column dendrograms in Fig. 6, reduces to polar versus apolar) results in no fold recognition success (open symbols in Fig. 7). This result shows that the environmental resolution provided by the current thermodynamic descriptors represents a dramatic improvement over classification schemes that simply identify hydrophobic versus hydrophilic or inside versus outside, and it does so without detailed structural specifications at each position.
Position-specific thermodynamics of chaperone Hsp90
The use of the ensemble-based, position-specific energetics is an important facet of the current analysis, and represents a critical element in the interpretation of the data. As noted previously, the position-specific environmental descriptors are more appropriately viewed as reporters rather than contributors to the energetics at that position. However, of particular significance is that the information is not captured through a structural definition of the environments. Figure 8 highlights the Phe residues in the chaperone Hsp90 protein (PDB: 1BYQ). As noted, F34, F108, and F160 are nearly identical in terms of their accessible surface area contribution to the energetic calculations as provided by the static representation. The ensemble-based characterization reveals, however, that these three "structurally" similar positions have significantly different thermodynamics in the native state ensemble. The differences in the natural log of the stability constants illustrate this point. F160 has a ln κf of ~13 whereas F34 has a ln κf of ~24 (i.e., F34 is in a more stable environment). Contrary to the canonical representation of these two residues afforded by the high-resolution structure, F160 is far more dynamic than F34 and has a much higher probability of being unfolded in the native state ensemble. Similarly, residues F160 and F203 are found in different regions of the Hsp90 chaperone protein. F160 is buried in the core of the protein and has nearly zero surface area exposure. F203 is located on the surface of the protein with approximately 70% surface area exposure. The ensemble-based characterization of these structurally different residues reveals the energetics of these positions to be similar in the native state ensemble.
… β-strand, whereas F203 is 70% exposed and is located in a turn just C-terminal to an α-helix. F10 and F108 have analogous thermodynamic environments, but F10 is surface exposed in an extended β-strand and F108 is completely buried in an α-helix. Furthermore, F10 and F160 are both found in extended β-strands but belong to different thermodynamic environments. Finally, F34 and F108 are both located in α-helices, but exhibit vastly different energetics in the native state ensemble.
Although anecdotal, Figure 8 illustrates the fundamental difference between the thermodynamic descriptors and structural classifications of each position. By definition, classic structural descriptions of the positions within proteins are describing some facet of the structure itself. The descriptors used here provide a metric of the thermodynamic susceptibility of each position. This is an essential aspect of the ensemble-based description because it provides a means of quantitatively accounting for the fact that although all regions of the molecule are seen in a unique conformation within the context of the canonical structure, some regions are more dynamic and have a higher proclivity to adopt other conformations. Just as important, proteins utilize different thermodynamic mechanisms (eight, to be precise; Fig. 3) to achieve the regional differences in stability and dynamics, and the approach described here captures these differences. Finally, the fact that the environments are described in units of energy means that this approach provides a direct quantitative connection with the biophysical and functional properties, opening an avenue for experimental validation (Hilser and Freire 1996; Pan et al. 2000; Babu et al. 2004).
Conclusions
The results presented here reveal that a database of H. sapiens protein structures can be represented as sequences of eight thermodynamic environment descriptors, which, when applied to the high-resolution structure, reveal sequential cooperative segments within the proteins. As these segments represent the first level of thermodynamic organization in proteins, they can be considered the thermodynamic equivalent to secondary structure. Interestingly, the boundaries for the sequential cooperative segments and traditional secondary structural elements are not identical, suggesting that secondary structures, although representing elementary structural units, are not the thermodynamic building blocks. Identification of the number and types of thermodynamic building blocks in proteins, as well as the pattern of these building blocks within the protein structure, is a prerequisite to a classification scheme that can be used to compare thermodynamic similarity between folds.
Finally, the results indicate that almost all of the structure encoding information in the thermodynamic analysis can be conferred with six amino acid clusters. This result is intriguing as it implies that the full spectrum of thermodynamic diversity could have been achieved with a much simpler genetic code. In effect, the results indicate that although proteins (and the genetic code) may have increased in complexity during the evolution process, the thermodynamic architecture of the resultant proteins can nonetheless be explained in the context of a primordial library.
| Materials and methods |
|---|
The COREX algorithm and accessible surface area calculations
Each of the 159 proteins in the database was analyzed using the COREX algorithm (Hilser and Freire 1996), which models the native state ensemble of a protein molecule in solution. In the present analysis a Monte Carlo sampling method was used to select states, in order to accommodate large ensembles that would be computationally intractable with a full COREX enumeration. The total number of states saved was 50,000 per partition, except for proteins shorter than 80 residues, for which the Monte Carlo sampling method was still used but the total number of saved states was lowered accordingly. The Monte Carlo sampling method preferentially selects lower energy states at the expense of high-energy states. The selection subroutine uses the free energy of the completely unfolded state as a reference. The probability of selecting a state with an energy equal to that of the reference state is 75%. The probability of selecting a higher energy state drops exponentially to 1%. Similarly, the probability of selecting a state lower in energy than the reference state increases exponentially to 100%.
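The selection rule described above can be sketched as follows. The text gives only the three anchor values (75% at the reference energy, a 1% floor, and a 100% ceiling), so the exponential form, the `scale` parameter, the function names, and sampling with replacement are all assumptions made for illustration and should not be read as the COREX subroutine itself.

```python
import math
import random

def selection_probability(delta_e, scale=1.0):
    """Acceptance probability for a state whose free energy differs from the
    unfolded reference by delta_e (same units as scale).

    Assumed form: 0.75 * exp(-delta_e / scale), clamped to [0.01, 1.0].
    """
    p = 0.75 * math.exp(-delta_e / scale)
    return min(1.0, max(0.01, p))

def sample_states(energies, reference_energy, n_keep, rng, scale=1.0):
    """Draw candidate states at random (with replacement) and keep each one
    with its selection probability until n_keep states are saved."""
    kept = []
    while len(kept) < n_keep:
        idx = rng.randrange(len(energies))
        if rng.random() < selection_probability(energies[idx] - reference_energy, scale):
            kept.append(idx)
    return kept

# Hypothetical energies (arbitrary units) relative to an unfolded reference of 0.0
saved = sample_states([-3.0, -1.0, 0.0, 2.0, 5.0], 0.0, 10, random.Random(0))
```

Because the floor is 1% rather than zero, high-energy states are still visited occasionally, which is consistent with the stated goal of preferring, but not exclusively sampling, low-energy states.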
The free energy for any state in the ensemble relative to the fully folded state is calculated using equation 3:
![]() | (3) |
COREX uses accessible surface-area-based parameterizations to calculate the relative apolar and polar free energies of each enumerated state (Gomez et al. 1995; Hilser and Freire 1996):
|
| (4) |
|
| (5) |
The conformational entropy (ΔSconf) for each state has three contributing factors: (1) ΔSbu→ex, the entropy change associated with the transfer of a side chain that is buried in the interior of the protein to its surface; (2) ΔSex→u, the entropy change gained by a surface-exposed side chain when the peptide backbone unfolds; and (3) ΔSbb, the entropy change gained by the backbone itself upon unfolding (D'Aquino et al. 1996; Hilser and Freire 1996). The simulated temperature of all the analyses was 25°C, the window size was 5, and the entropy weighting factor (W) was 0.500. The entropy weighting factor is a scaling variable used in the fold recognition experiments to minimize contributions of the completely unfolded states from the position-specific calculations (equation 3). Inquiries or requests for the COREX algorithm can be made to the corresponding author.
Ensemble-averaged thermodynamic descriptors
To arrive at the position-specific descriptors, an average excess quantity is first defined, which represents the probability distribution of all states in the ensemble (Wrabl et al. 2002):
![]() | (6) |
![]() | (7) |
![]() | (8) |
Taking the difference in average excess quantities of the folded (Qf) and unfolded (Qnf) subensembles yields the position-specific values as described previously (Wrabl et al. 2002):
![]() | (9) |
![]() | (10) |
![]() | (11) |
Equations 9–11 reflect the average thermodynamic environments of a particular position in the protein, accounting implicitly for the contribution of all the amino acids over all the states in the ensemble.
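A minimal sketch of such a position-specific descriptor is given below. It assumes that each subensemble average is a probability-weighted mean normalized by the total probability of that subensemble, and that the descriptor is the folded average minus the nonfolded average; the normalization, the sign convention, and all names are assumptions and may differ from the published equations.

```python
def position_specific_descriptor(values, probabilities, folded, j):
    """Difference between the folded and nonfolded subensemble averages of a
    per-state quantity (e.g., an apolar enthalpy) at residue j.

    values[i]: quantity for state i; probabilities[i]: P_i;
    folded[i][j]: True if residue j is folded in state i.
    """
    p_f = sum(p for p, s in zip(probabilities, folded) if s[j])
    p_nf = sum(p for p, s in zip(probabilities, folded) if not s[j])
    if p_f == 0 or p_nf == 0:
        raise ValueError("both subensembles must contain at least one populated state")
    avg_f = sum(p * v for p, v, s in zip(probabilities, values, folded) if s[j]) / p_f
    avg_nf = sum(p * v for p, v, s in zip(probabilities, values, folded) if not s[j]) / p_nf
    return avg_f - avg_nf

# Hypothetical one-residue, three-state example
descriptor = position_specific_descriptor(
    values=[0.0, 10.0, 20.0],
    probabilities=[0.5, 0.3, 0.2],
    folded=[[True], [True], [False]],
    j=0,
)  # folded average 3.75 minus nonfolded average 20.0
```

Every state contributes to one of the two averages, which is how the descriptor at a single position comes to reflect the energetics of the whole ensemble rather than of the residue alone.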
Statistical derivation of the thermodynamic environments
The thermodynamic environments were statistically defined using S-Plus 6.0 professional software. The clustering analysis algorithm used was Partitioning Around Medoids (PAM; Kaufman and Rousseeuw 1990). The position-specific [ΔG]j, [ΔH]ap,j, [ΔH]pol,j, and [TΔS]conf,j were the variables used to cluster all 23,944 residue positions in the database. The dissimilarity metric was Manhattan distance, which is the sum of the absolute differences between two observations across the clustering variables. The number of medoids was set to 2, 4, 6, 8, 10, 12, 14, 16, and 18 in separate cluster analyses.
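To make the clustering step concrete, the sketch below implements a simplified k-medoids procedure with Manhattan distance. It uses a naive initialization and a greedy swap search rather than the BUILD and SWAP phases of the S-Plus PAM routine, and the six four-dimensional points are invented stand-ins for the four position-specific descriptors.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences between two observations."""
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(manhattan(p, points[m]) for m in medoids) for p in points)

def k_medoids(points, k):
    """Greedy swap-based k-medoids; returns (medoid indices, cluster labels)."""
    medoids = list(range(k))  # naive initialization: the first k points
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(points, trial)
                if cost < best:  # accept any swap that lowers the total cost
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: manhattan(p, points[medoids[c]]))
              for p in points]
    return medoids, labels

# Hypothetical descriptors for six positions forming two obvious groups
points = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0),
          (10, 10, 10, 10), (11, 10, 10, 10), (10, 11, 10, 10)]
medoids, labels = k_medoids(points, 2)
```

Because medoids are actual observations, each cluster is represented by a real residue position, which keeps the environment definitions interpretable in terms of the original descriptors.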
Statistics for the 20 amino acids as a function of thermodynamic environment cluster number were tabulated for all the residue positions in the database (data not shown). The differential distribution of the 20 amino acids within the thermodynamic environments was used to calculate the log-odds probability of finding an amino acid type within a particular thermodynamic environment cluster. The log-odds probability (LOP) is double normalized to account for differences in amino acid and thermodynamic environment counts as calculated below:
![]() | (12) |
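A hedged sketch of a double-normalized log-odds calculation is shown below. It assumes the common form ln[(N_aa,env / N_env) / (N_aa / N_total)], which normalizes by both the environment count and the amino acid count; the exact published expression may differ, and the observations are invented.

```python
import math
from collections import Counter

def log_odds(observations):
    """observations: list of (amino_acid, environment) pairs, one per residue position.

    Returns a dict mapping (amino_acid, environment) to the assumed log-odds
    ln[(N_aa,env / N_env) / (N_aa / N_total)].
    """
    pair_counts = Counter(observations)
    aa_counts = Counter(aa for aa, _ in observations)
    env_counts = Counter(env for _, env in observations)
    total = len(observations)
    lop = {}
    for (aa, env), n in pair_counts.items():
        observed = n / env_counts[env]    # frequency of aa within this environment
        expected = aa_counts[aa] / total  # background frequency of aa
        lop[(aa, env)] = math.log(observed / expected)
    return lop

# Hypothetical counts: Leu enriched in TE8, Gly enriched in TE1
obs = [("L", "TE8")] * 3 + [("L", "TE1")] + [("G", "TE1")] * 3 + [("G", "TE8")]
lop = log_odds(obs)
```

Positive values indicate that an amino acid is found in an environment more often than its background frequency would predict, and negative values indicate depletion. Pairs that are never observed are omitted from the result rather than assigned negative infinity.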
Double hierarchical cluster analysis
SpotFire DecisionSite Statistics 7.2 software was used to visualize and cluster the calculated log-odds probabilities of the 20 amino acids for the eight thermodynamic environments. The heatmap (Fig. 6) illustrates the relative intensities of the calculated log-odds probabilities. The color range is set to continuous coloring and spans from green to black to red. The range is set so that log-odds probabilities equal to zero are colored black, log-odds probabilities less than zero are green, and log-odds probabilities greater than zero are colored red. The relative intensity of the colors reflects their distance from zero.
SpotFire uses agglomerative hierarchical clustering to generate dendrograms showing the similarity between rows of the heatmap (amino acids) and columns of the heatmap (thermodynamic environments). The agglomerative approach iteratively merges the closest pair of records according to the selected clustering method and dissimilarity measure. The clustering method is complete linkage, which computes the distance between any two clusters, x and y, as the maximum distance between a member of cluster x and a member of cluster y. The dissimilarity measure is city block (Manhattan) distance, which is the distance between two points measured along axes at right angles.
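The agglomerative procedure can be illustrated with a small pure-Python sketch of complete-linkage clustering under Manhattan distance. It is not the SpotFire implementation, and the two-environment propensity vectors are invented for the example.

```python
def manhattan(a, b):
    """City block distance between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def complete_linkage(rows):
    """rows: dict mapping a label to its vector.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are tuples of labels.
    """
    clusters = [(label,) for label in rows]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest
                # pairwise distance between their members.
                d = max(manhattan(rows[a], rows[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical log-odds vectors over two environments
rows = {"Leu": [1.0, -1.0], "Ile": [0.9, -0.8],
        "Lys": [-1.0, 1.0], "Asp": [-0.7, 0.9]}
merges = complete_linkage(rows)
```

Applying the same function to the transposed table (environments as labels, amino acids as coordinates) would give the column dendrogram, which is what makes the clustering "double."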
Fold recognition experiments based on amino acid propensities for thermodynamic environments
Fold recognition experiments were performed using PROFILESEARCH of Eisenberg and coworkers (Bowie et al. 1991), which implements the Smith–Waterman local alignment algorithm (Smith and Waterman 1981) as described previously (Wrabl et al. 2002). The three-dimensional profiling method was used as a proof-of-principle assessment of the amino acid propensities for the thermodynamic environments as seen in Figure 6. The three-dimensional profiling method characterizes the high-resolution structure of a protein as a one-dimensional string of "environmental classes" as a function of residue position (Bowie et al. 1991). There were 431 decoy sequences in each fold recognition experiment, all obtained from the Protein Data Bank (Berman et al. 2000). The sequence library was inclusive for all H. sapiens fold types coding for experimentally solved structures ranging from 50 to 250 residues in length and having a maximum sequence identity of 50% (Berman et al. 2000). The PROFILESEARCH algorithm dynamically aligns each decoy amino acid sequence plus the native sequence to the one-dimensional string of thermodynamic environments. Each combination of amino acid and thermodynamic environment in the alignment receives a score from a scoring matrix derived from the log-odds probabilities calculated by equation 12. The cumulative score over all positions in the alignment is the score for a particular sequence to a target protein. A successful fold recognition experiment is one in which the native sequence had a greater cumulative score than 99% of the sequences in the sequence library.
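The scoring and success criterion can be sketched as below. This is not PROFILESEARCH: it is a bare Smith–Waterman score with an assumed linear gap penalty of −2.0, an invented four-entry scoring table, and a rank-based reading of the 99% criterion (native rank within the top 1% of the library, i.e., the top four of 432 sequences).

```python
def local_alignment_score(sequence, environments, score, gap=-2.0):
    """Best Smith-Waterman local alignment score of an amino acid sequence
    against a string of thermodynamic environments (linear gap penalty)."""
    prev = [0.0] * (len(environments) + 1)
    best = 0.0
    for aa in sequence:
        curr = [0.0]
        for j, env in enumerate(environments, start=1):
            match = prev[j - 1] + score.get((aa, env), 0.0)
            # Local alignment: a cell never drops below zero
            cell = max(0.0, match, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

def is_success(native_score, decoy_scores, percentile=0.01):
    """True if the native sequence ranks within the top `percentile`
    of the library (decoys plus the native sequence)."""
    rank = 1 + sum(1 for s in decoy_scores if s >= native_score)
    return rank <= percentile * (len(decoy_scores) + 1)

# Hypothetical log-odds scores and a four-position fold
score = {("L", "TE8"): 2.0, ("G", "TE1"): 1.5,
         ("L", "TE1"): -1.0, ("G", "TE8"): -1.0}
fold = ["TE8", "TE8", "TE1", "TE1"]
native = local_alignment_score("LLGG", fold, score)
decoy = local_alignment_score("GGLL", fold, score)
```

In this toy case the native sequence places each residue in its preferred environment and therefore outscores the shuffled decoy, which is the behavior the fold recognition experiments test at the scale of the full library.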
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Babu, C.R., Hilser, V.J., and Wand, A.J. 2004. Direct access to the cooperative substructure of proteins and the protein ensemble via cold denaturation. Nat. Struct. Mol. Biol. 11: 352-357.
Bai, Y. and Englander, S.W. 1996. Future directions in folding: The multi-state nature of protein structure. Proteins 24: 145-151.
Baldwin, R.L. 1986. Temperature dependence of the hydrophobic interaction in protein folding. Proc. Natl. Acad. Sci. 83: 8069-8072.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235-242.
Bowie, J.U., Luthy, R., and Eisenberg, D. 1991. A method to identify protein sequences that fold into a known three-dimensional structure. Science 253: 164-170.
Bryant, S.H. and Lawrence, C.E. 1993. An empirical energy function for threading protein sequence through the folding motif. Proteins 16: 92-112.
D'Aquino, J.A., Gomez, J., Hilser, V.J., Lee, K.H., Amzel, L.M., and Freire, E. 1996. The magnitude of the backbone conformational entropy change in protein folding. Proteins 25: 143-156.
Defay, T.R. and Cohen, F.E. 1996. Multiple sequence information for threading algorithms. J. Mol. Biol. 262: 314-323.
Godzik, A. and Skolnick, J. 1992. Sequence-structure matching in globular proteins: Application to supersecondary and tertiary structure determination. Proc. Natl. Acad. Sci. 89: 12098-12102.
Gomez, J., Hilser, V.J., Xie, D., and Freire, E. 1995. The heat capacity of proteins. Proteins 22: 404-412.
Grantham, R., Gautier, C., Gouy, M., Mercier, R., and Pave, A. 1980. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8: r49-r62.
Habermann, S.M. and Murphy, K.P. 1996. Energetics of hydrogen bonding in proteins: A model compound study. Protein Sci. 5: 1229-1239.
Hilser, V.J. and Freire, E. 1996. Structure-based calculation of the equilibrium folding pathway of proteins. Correlation with hydrogen exchange protection factors. J. Mol. Biol. 262: 756-772.
Hilser, V.J., Dowdy, D., Oas, T.G., and Freire, E. 1998. The structural distribution of cooperative interactions in proteins: Analysis of the native state ensemble. Proc. Natl. Acad. Sci. 95: 9903-9908.
Holm, L. and Sander, C. 1996. Mapping the protein universe. Science 273: 595-603.
Huang, E.S., Subbiah, S., Tsai, J., and Levitt, M. 1996. Using a hydrophobic contact potential to evaluate native and near-native folds generated by molecular dynamics simulations. J. Mol. Biol. 257: 716-725.
Jones, D.T., Taylor, W.R., and Thornton, J.M. 1992. A new approach to protein fold recognition. Nature 358: 86-89.
Kaufman, L. and Rousseeuw, P.J. 1990. Finding groups in data. An introduction to cluster analysis. John Wiley and Sons, New York.
Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. 2000. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299: 499-520.
Lee, K.H., Xie, D., Freire, E., and Amzel, L.M. 1994. Estimation of changes in side chain configurational entropy in binding and folding: General methods and application to helix formation. Proteins 20: 68-84.
Mallick, P., Weiss, R., and Eisenberg, D. 2002. The directional atomic solvation energy: An atom-based potential for the assignment of protein sequences to known folds. Proc. Natl. Acad. Sci. 99: 16041-16046.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536-540.
Pan, H., Lee, J.C., and Hilser, V.J. 2000. Binding sites in Escherichia coli dihydrofolate reductase communicate by modulating the conformational ensemble. Proc. Natl. Acad. Sci. 97: 12020-12025.
Rost, B., Schneider, R., and Sander, C. 1997. Protein fold recognition by prediction-based threading. J. Mol. Biol. 270: 471-480.
Sharp, P.M., Cowe, E., Higgins, D.G., Shields, D.C., Wolfe, K.H., and Wright, F. 1988. Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens: A review of the considerable within-species diversity. Nucleic Acids Res. 16: 8207-8211.
Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147: 195-197.
Wrabl, J.O., Larson, S.A., and Hilser, V.J. 2001. Thermodynamic propensities of amino acids in the native state ensemble: Implications for fold recognition. Protein Sci. 10: 1032-1045.
Wrabl, J.O., Larson, S.A., and Hilser, V.J. 2002. Thermodynamic environments in proteins: Fundamental determinants of fold specificity. Protein Sci. 11: 1945-1957.
Wuthrich, K. 1989. Determination of three-dimensional protein structures in solution by nuclear magnetic resonance: An overview. Methods Enzymol. 177: 125-131.
Xie, D. and Freire, E. 1994. Structure based prediction of protein folding intermediates. J. Mol. Biol. 242: 62-80.