|
|
||||||||
College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109-1065, USA
Reprint requests to: Andrei L. Lomize, College of Pharmacy University of Michigan, 428 Church St., Ann Arbor, MI 48109-1065; e-mail: almz{at}umich.edu; fax: 734-763-5595.
(RECEIVED March 11, 2002; FINAL REVISION May 10, 2002; ACCEPTED May 20, 2002)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0307002.
| Abstract |
|---|
|
|
|---|
-helices and ß-sheets of T4, human, and hen lysozymes and HI ribonuclease. The determined energies of vdW interactions and H-bonds were smaller than in molecular mechanics and followed the "like dissolves like" rule, as expected in condensed media but not in vacuum. The depths of modified Lennard-Jones potentials were -0.34, -0.12, and -0.06 kcal/mole for similar atom types (polarpolar, aromaticaromatic, and aliphaticaliphatic interactions, respectively) and -0.10, -0.08, -0.06, -0.02, and nearly 0 kcal/mole for different types (sulfurpolar, sulfuraromatic, sulfuraliphatic, aliphaticaromatic, and carbonpolar, respectively), whereas the depths of H-bond potentials were -1.5 to -1.8 kcal/mole. The obtained solvation parameters, that is, transfer energies from water to the protein interior, were 19, 7, -1, -21, and -66 cal/moleÅ2 for aliphatic carbon, aromatic carbon, sulfur, nitrogen, and oxygen, respectively, which is close to the cyclohexane scale for aliphatic and aromatic groups but intermediate between octanol and cyclohexane for others. An analysis of additional replacements at the waterprotein interface indicates that vdW interactions between protein atoms are reduced when they occur across water. Keywords: Free energy; protein engineering; solvation; energy functions; protein stability; secondary structure; protein folding
Abbreviations: vdW, van der Waals ASA, accessible surface areas ASP, atomic solvation parameter PDB, Protein Data Bank, H-bond, hydrogen bond, BPTI, bovine pancreatic trypsin inhibitor r.m.s.d., root mean square deviation
| Introduction |
|---|
|
|
|---|
It is accepted that the appropriate energy functions would reproduce the thermodynamic stabilities, that is, free energies of proteins and proteinligand complexes (Kollman 1993; Fersht 1999). However, the calculation of free energy is a difficult undertaking. For example, molecular mechanics potentials have been designed for calculation of conformational energies (enthalpies) of organic molecules in vacuum (Halgren 1995; Lazaridis et al. 1995). The conformational energy of a protein (several hundreds or thousands of kcal/mole) does not reflect its thermodynamic stability for many reasons. First, all interactions of electrostatic origin, including van der Waals (vdW) dispersion attraction and hydrogen bonds (H-bonds), depend on dielectric properties of the environment (Wood and Thompson 1990; Israelachvili 1992; Leckband and Israelachvili 2001), and therefore their energies are expected to be different in vacuum, within the interior of proteins, and in water. Second, the contributions of ion pairs, H-bonds, or hydrophobic contacts to the protein stability can be significantly reduced when they are formed by flexible side chains (Shortle 1992; Fersht and Serrano 1993; Scholtz et al. 1993; Matthews 1995a), which must be taken into account. Third, the intramolecular energy of the polypeptide chain must be supplemented by conformational entropy (Brady and Sharp 1997) and solvation free energy. The corresponding atomic solvation parameters (ASPs) for different atom types can be derived from experimental transfer energies of model compounds from water to octanol or cyclohexane. However, these organic solvents can serve only as a crude approximation of the protein interior. Moreover, many published solvation parameter sets are poorly consistent with each other (Juffer et al. 1995). Fourth, the quantity of interest is not the energy of the folded structure alone but rather the difference of free energies of folded and unfolded states (Shortle 1992, 1996; Koehl and Levitt 1999).
There are two important questions here: What theory would be appropriate for free-energy calculations, and how to determine parameters of energy functions. Experimental thermodynamics studies provide a basis to resolve these questions. This rapidly evolving field includes measurements and interpretation of free energy, entropy, enthalpy, and heat capacity changes associated with peptide and protein folding, mutations, conformational transitions, ligand binding, or transfer of model compounds (Fersht and Serrano 1993; Matthews 1993, 1995b; Chakrabartty and Baldwin 1995; Makhatadze and Privalov 1995; Robertson and Murphy 1997; Luque and Freire 1998; Gohlke and Klebe 2001; Rees and Robertson 2001). The accumulated data provided energetic estimates of hydrophobic interactions, ion pairs, H-bonds, conformational entropy, and other contributions to the protein stability. However, application of these estimates to the practical modeling of protein structure, such as fold recognition or de novo design, is not straightforward. For example, vdW and solvation energies, which are crucially important for protein modeling, have been estimated using rough approximations. Indeed, all "nonpolar" atoms (sulfur, aliphatic, and aromatic groups) were usually described by a single solvation parameter, even though this was not supported by studies of organic compounds (Makhatadze and Privalov 1994). Also, vdW interactions in proteins were treated as cavity formation or packing density terms (Eriksson et al. 1992; Jackson et al. 1993; Funahashi et al. 2001), that is, not using specific energies for different types of atoms and without explicit distance-dependent potentials.
In the present work, we determined the energies of vdW interactions between specific atom types in the protein interior and ASPs on the basis of published protein-engineering data. To bypass a number of difficulties, we designed a specific model for calculation of the free-energy differences and selected the set of mutants using very restrictive criteria. First, only mutants with high-resolution crystal structures available were used to take into account the conformational relaxation of protein structure after the substitutions. Second, the replaced residues had to be buried from water, because interactions at the protein surface may be different (Shortle 1992; Matthews 1995a). It is important that most of the buried residues had a limited conformational flexibility, judging from the low B-values of their atoms. Third, substitutions of charged residues were not included in the set to omit the explicit Coulomb electrostatics that need to be studied separately. Finally, we used only replacements in
-helices and ß-sheets, because the secondary structures can serve as well-defined reference states for calculation of 
G values, whereas the properties of coil are poorly defined and understood (Shortle 1992, 1996; Creamer et al. 1995, 1997; Plaxco and Gross 2001).
| Theory |
|---|
|
|
|---|

Gclc, were represented as a combination of contributions of residues that are identical in the wild-type and mutant proteins (
Gident) and those that have been replaced (
Grepl):
![]() | ((1)) |
The contribution of identical residues was given by
![]() | ((2)) |
i is the ASP for the corresponding type of atom (Juffer et al. 1995); (eijmut - eijwt) is the difference between interaction energies of atoms i and j (i
j) in the proteins, and (Eltors,mut - Eltors,wt) is the difference of torsion potentials for angle l.
The contribution of replaced residues 
Grepl was calculated using a model that was designed for substitutions in the regular secondary structures. The model was based on a thermodynamic cycle, in which an
-helix or ß-sheet in aqueous solution serves as a reference state (Fig. 1
). For replacement of N
-helical or ß-sheet residues (X1
Y1, X2
Y2, . . . XN
YN),
![]() | ((3)) |
Gmuttransition and
Gwttransition are free-energy changes, for the sets of residues (X) and (Y), during conformational transition from the reference to the folded state (Fig. 1
Gprop,mut and 
Gprop,wt are experimental secondary structure propensities, that is, free-energy changes associated with replacement of the "host" Ala residue by (X) and (Y) residues in the model
-helix or ß-sheet (Chakrabartty and Baldwin 1995).
|
Gwttransition, Fig. 1
![]() | ((4)) |
-helix or ß-sheet to the protein interior; the second term is the difference of interaction energies of side-chain atoms in the protein structure and the reference state; the third term is the conformational entropy contribution that originates from restraining
angles in the protein structure relative to the model
-helix or ß-sheet; and the last term is change of torsion energy during the reference to folded state transition. All substituted side chains of selected mutants had only one allowed conformation in the protein structure, that is, Skprotein = 0. The transition energy of mutant protein (
Gmuttransition, Fig. 1
Thus, theoretical 
G values were calculated as the sum of the following contributions: (1) the enthalpic term describing changes of vdW interaction, H-bond and torsion energies of the protein caused by substitutions of the amino acid residues, (2) conformational entropies of the replaced side chains, (3) transfer energies of protein atoms from water to the protein interior, and (4) the secondary structure propensities (the last two terms include both enthalpic and entropic components). It is important that free energies in Equations 2 and 4![]()
represented a difference between two conformational states of the same chemical structure, because comparisons of dissimilar molecules involve serious problems (Mark and van Gunsteren 1994; Boresch and Karplus 1995; Brady and Sharp 1995).
The described approach is not very different from other models applied for interpretation of protein-engineering data. However, none of these models use explicit vdW interaction and torsion potentials, and they all treat the reference state differently. The most common approximation simply does not consider any reference or unfolded protein states and implies that 
G values arise from the differences of surface areas, cavity volumes, or other structural parameters of the wild-type and mutant folded structures (Shortle 1992, 1996; Matthews 1993). In terms of our model, this means elimination of 
ereferij term from Equation 4
and omitting conformational entropy, torsion potentials, and secondary structure propensities. Apparently, these and other neglected factors do not hinder determination of specific average energies, for example, for buried H-bonds in proteins (Myers and Pace 1996), but they make all the determined energetic parameters highly context dependent. An alternative model uses coil as a reference for calculation of buried surfaces and side-chain conformational entropies (Funahashi et al. 2001); however, it combines the parameters of coil with
-helix and ß-sheet propensities, which is a mixture of different reference states.
Our model is also similar to molecular mechanics calculations, with the solvation and conformational entropy terms included (e.g., Morris et al. 1998; Koehl and Levitt 1999; de la Paz et al. 2001). However, these studies use potentials that were originally developed for conformational analysis of organic molecules in vacuum and therefore describe a different energy (see Discussion). Moreover, even the functional form of the standard potentials needed to be modified for our purpose. The following deviations from molecular mechanics were found to be important to reproduce the experimental 
G values.
(1) All interatomic repulsions were omitted, because the experimental atomic coordinates were determined with a precision of
0.20.3 Å, which would produce significant errors in the calculated energies at short interatomic distances. The elimination of repulsions worked well after excluding a few mutants with artificially introduced strong sterical clashes (Liu et al. 2000) and when applied in combination with approximations (5) and (7) below. The 612 and 1012 potentials were modified as follows:
![]() | ((5)) |
![]() |
![]() |
![]() |
(2) Explicit Coulomb electrostatics were omitted, which is a possible approximation for uncharged groups (Dunitz and Gavezzotti 1999). Therefore, all electrostatic effects were implicitly included in the adjustable vdW and H-bond potentials.
(3) Hydrogen atoms were not included, as usual, in analysis of packing and accessible surfaces in proteins (Tsai et al. 1999). This was required to reduce the number of different atom types and the corresponding adjustable parameters.
(4) H-bonds were identified on the basis of the types of participating atoms and on the angles in A-B . . . C-D systems, where A, B, C, and D are nonhydrogen atoms: At least one of A-B . . . C or B . . . C-D angles had to be >90° in H-bond (otherwise the polar B and C atoms were considered as forming vdW interaction). The spatially closest H-bond partners of polar atoms in each replaced residue were identified automatically, assuming only one acceptor for each NH or OH group and two possible donors for each oxygen, in accordance with statistics of H-bonds in proteins (McDonald and Thornton 1994). This is a departure from molecular mechanics, in which each donor, for example NH group, would form H-bonds with all surrounding acceptor atoms (e.g., all oxygens in a radius of 8 Å), even though it actually could participate in only one H-bond.
(5) All backbonebackbone interactions within
-helices and ß-sheets were excluded from Equations 2 and 4![]()
, because these interactions were assumed to be the same in wild-type and mutant proteins.
(6) Equation 4
includes energies of interactions between each replaced side chain and backbone of its
-helix or ß-sheet in aqueous solution (eijrefer). This term is difficult to calculate precisely because it depends on conformational averaging, which is more significant for some residues than others. Moreover, the dynamic averaging can significantly weaken H-bonds, because they are geometrically allowed only in a very narrow range of side-chain torsion angles, whereas vdW contacts are much less specific. It has been empirically found that two approximations, when applied together, provide a good fit of 
G values: (a) All H-bonds between the side chain of the replaced residue (typically Ser or Thr) and the backbone of
-helix or ß-sheet containing this residue are absent in the reference secondary structures but appear when the residue is buried in the protein; and (b) the total energy of vdW interactions between the side chain of the replaced residue and the backbone of its own regular secondary structure does not change during the reference to folded state transition (Fig. 1
), so the (eijprotein - eijrefer) difference in Equation 4
is zero. Therefore, the eijrefer sum was omitted, and the corresponding side-chain-backbone vdW interactions (but not H-bonds) were simultaneously excluded from eijprotein term.
(7) Summation over a large set of interactions for atoms with imprecise coordinates causes accumulation of errors. Therefore, only interactions of the replaced or strongly shifted residues with each other and with the surrounding atoms, excluding flexible side chains, were taken into account as described in Materials and Methods.
(8) Three buried water molecules that are present in almost all crystal structures of T4 lysozyme and its mutants were included as a part of the protein core, that is, they contributed to the sums of eij. The entropic cost of bound solvent (Dunitz 1994) was not included, because these water molecules were present in both the wild-type and mutant structures. Three similar solvent molecules in human lysozyme structure were treated in the same way.
(9) Side-chain torsion potentials were applied using an adjustable energy barrier that was the same for all "aliphatic" C-C and C-S bonds (Momany et al. 1975). The potential was "softened" to account for the imprecisely determined atomic coordinates (see Materials and Methods). Torsion energy in the model
-helix or ß-sheet (Eltors,refer in Equation 4
), originating from dynamic averaging of
angles in the reference structures, was neglected.
| Results |
|---|
|
|
|---|
-helix and ß-sheet (Table 1
-helix and ß-sheet, as described in Materials and Methods. The positions in the middle and C-turns of
-helices and in the central and edge ß-strands of ß-sheets were considered as separate reference states, because secondary structure propensities and ASA in these positions are different (Table 1
|
-helix and ß-sheet (Table 1
-helix and coil, respectively (Lee et al. 1994; Creamer 2000; <0.3 kcal/mole differences for individual residues). Thus, the extended polypeptide chain probably puts very similar limitations on the conformational freedom of side chains in ß-sheet and coil. The entropies of long linear side chains (Glu, Gln, Met, Lys, and Arg) were nearly identical in
-helix and ß-sheet (Table 1
1 = +60 are generally forbidden in
-helices but allowed in ß-sheets. This happens because of a compensatory effect:
2 conformers of the long side chains are less restricted in
-helices than in ß-sheets.
Adjustable interaction energies and solvation parameters
The adjustable energetic parameters for the protein interior were determined using 
G values and crystal structures of 106 mutants of four proteins (T4, human and chicken lysozymes, and ribonucleases HI) with substitutions of buried, uncharged residues in
-helices and ß-sheets, including all appropriate multiple replacements. Several small-to-large substitutions with significant sterical clashes (e.g., T152I, A98V, A98L, A98M, A129F, A129W, and A42V in T4 lysozyme and A32L and A96M in human lysozyme) were excluded from the set, as well as almost all cases in which bound water molecules spatially substitute for the replaced side chains. All the applied 
G values were measured in thermal unfolding experiments. The list of mutants can be found in Materials and Methods and Supplementary materials; it includes 77, 24, 2, and 3 replacements for T4, human and chicken lysozymes, and ribonuclease HI, respectively.
Eighteen adjustable parameters of the model represented five atomic solvation constants, 12 depths of interatomic potentials (nine types of vdW interactions and three types of H-bonds), and a torsion potential barrier for rotation around "aliphatic" C-C and C-S bonds (Table 2
). These parameters were determined by a least squares fit as described in Materials and Methods. The relatively small set of adjustable parameters was achieved by excluding hydrogens and considering only five types of atoms: aliphatic carbon, carbon of aromatic and carbonyl groups, oxygen, nitrogen, and sulfur. In addition, all interactions between similar atoms, whose energies could not be reliably distinguished, were combined together. The unification of interactions involving N and O atoms can be justified by their similar polarizabilities: 0.85, 0.78, and 0.74 Å3 for amine nitrogen, hydroxyl oxygen, and carbonyl oxygen, respectively, whereas polarizabilities of carbon and sulfur are in the 1.12 to 3.06 Å3 range (Miller 1990). The parameters obtained were determined by two factors: (1) appearance or disappearance of atoms in the mutants (e.g., Ala-Ser replacement yields a number of new O
interactions) and (2) conformational relaxation in the mutants that changed distances and accessible surfaces even for atoms that were identical in the wild-type and mutant structures.
|
angles. The obtained depths of H-bond potentials (1.51.8 kcal/mole) were consistent with other experimental estimates (see Discussion).
The determined solvation parameters (
in Table 2
) represent transfer energies per unit area of different atom types from water to the protein interior. In general, they do not correspond closely to any previously published scale describing transfer from water to organic solvents (Eisenberg and McLachlan 1986; Juffer et al. 1995; Vajda et al. 1995; Efremov et al. 1999; Wang et al. 2001). Only the parameter for aliphatics (19 cal/moleÅ2) agrees with the value generally suggested for nonpolar groups in proteins, 2030 cal/moleÅ2 (Richards 1977).
Different adjustable parameters were unequally represented in the set of mutants. The effective numbers of vdW interactions in the system of linear equations (Cl in Equation 9
) varied from 1600 for aliphaticaromatic to 132 for sulfurpolar pairs, whereas the two least represented categories were polarpolar (53) and sulfursulfur (10) interactions. Because of the relatively small number of sulfursulfur contacts, which included significant noise, the S . . . S energy was undefined. Unlike other adjustable parameters this energy significantly drifted (-0.30.4 kcal/mole) on removal or addition of a few mutants to the set.
Performance of the model
The calculated and observed 
G values were in very good agreement (Fig. 2
): Their overall root mean square deviation (r.m.s.d.) was 0.41 kcal/mole, which is not much higher than the errors in measurement of free-energy changes in the thermal unfolding experiments (
0.2 kcal/mole). The fit was significantly better than in any other theoretical methods that predict mutational effects in proteins (Miyazawa and Jernigan 1994; Gilis and Rooman 1996, 1997; Topham et al. 1997; Reddy et al. 1998; Carter et al. 2001) For example, the most recent study of Funahashi et al. (2001) yields an r.m.s.d. of 1.8 kcal/mole for 110 mutants of T4 and human lysozymes. The better precision in our case can be explained by two reasons. First, the mutants were selected using very restrictive criteria. Indeed, the deviations for water-accessible residues would be higher as discussed below. Second, we used a more elaborate and accurate model in which all included energy terms and the approximations described in Theory and Materials and Methods were essential.
|
and
+ ß, subdomains), and ribonuclease HI (
+ ß protein). Moreover, we conducted an additional test for 10 barnase mutants that were not included in the main set because their thermodynamic stabilities were measured by chemical denaturaton. The r.m.s.d. of 
G values for 10 barnase mutants was higher (0.57 kcal/mole), possibly because of less precise measurement of the mutant stabilities. (The crystal structures of barnase and its mutants [L14A, I51V, I76A, I76V, I88A, I88V, L89V, S91A, I96A, and I96V] were from 1a2p, 1brh, 1bsa, 1bri, 1bsb, 1brj, 1bsc, 1bse, 1ban, 1brk, and 1bsd Protein Data Bank (PDB) files, respectively, and 
G50%urea values were from the work of Serrano et al. 1992.) Thus, all interaction and solvation parameters obtained here probably reflect properties of an average protein interior rather than a specific protein site. The agreement was also nearly identical for replacements in
-helices (91 mutant) and ß-sheets (15 mutants) and in different positions within the secondary structures, including N-turn, middle, and C-turn of
-helices, and the central and edge ß-strands (Fig. 2
and ß propensities seem to be sufficiently general and precise.
Also, the standard deviations for the individual parameters of our model (e.g., ±0.15 kcal/mole for H-bond energies, Table 2
) were significantly lower than could be achieved by a simple averaging of the experimental energies in large sets of mutants (
±1 kcal/mole for H-bonds, Myers and Pace 1996). The relatively low standard deviations and the high precision of 
G calculations (0.4 kcal/mole) indicate that a majority of "context-dependence" factors were taken into account in the model, and that the determined parameter set (Table 2
) may be widely applicable.
Comparison of ASPs for the protein interior, octanol, and cyclohexane
It has been often suggested that protein core can be approximated by nonpolar solvents (Baldwin 1986; Juffer et al. 1995; Vajda et al. 1995). To examine this issue, we compared ASPs for the protein interior, gas phase, wet octanol, and cyclohexane (first four columns in Table 3
). The "polarity" of different atom types, that is, the rank order of their transfer energies from water to the different media, was identical for protein and organic solvents: Cali < Caro < S < N < O. However, the absolute values and even the signs of ASP were strongly environment dependent. For example, the ASPs of sulfur, nitrogen, and oxygen for the protein interior,
W
P, were intermediate between those for octanol and cyclohexane (
W
C, and
W
O in Table 3
). On the other hand, the ASPs of aliphatic groups were indeed identical for the protein interior and cyclohexane.
|
V
P, are intermediate between the affinities to water and cyclohexane (-
W
V and
V
C, respectively;
W
V must be considered with a minus sign because it describes transfer in the opposite direction), whereas for sulfur,
V
P
-
W
V, and for aliphatic groups,
V
P
V
C (Table 3
Reduced vdW interactions between protein atoms across water
All parameters in Table 2
describe energetics of the protein core. However, interactions in water could be different. To check this possibility, we calculated the thermodynamic stabilities for an additional set of mutants with replacements of partially water-exposed, inflexible residues using parameters for the protein interior (Table 4
). The discrepancies obtained were larger than for buried residues and reflected two different cases.
|

Gcalc - 
Gexp deviations are negative). Some of the newly appearing cavities were wide open to the solvent (F104A and Q105G in T4 lysozyme and Y23A and F45A in BPTI), whereas the others were small and included bound water molecules spatially substituting for the replaced side chain (S117A and V149 mutants). A significant additional destabilization in this case arises from reduced vdW interactions across the water-filled pockets. This destabilizing effect was present even for small cavities containing single water molecules with low B-values, for example, in T4 lysozyme mutants with substituted V149 (Table 4
Gcalc = 
Gexp), which probably is a common situation for mutants with artificially incorporated solvent (Xu et al. 2001).
The cavity-forming replacements (Case I) change media in the mutation site, which affects interactions of surrounding atoms. However, the situation is different for the residues situated at the surface in both the wild-type and mutant proteins (Case II in Table 4
). In this case, all discrepancies arise from overestimated interactions at the waterprotein interface: Their energy was calculated with parameters for the protein core, whereas the actual interactions are weaker. As a result, any large-to-small replacements, in which the mutant loses vdW interactions and H-bonds, were predicted as strongly destabilizing, whereas the actual destabilization was smaller (Case IIa in Table 4
, all 
Gcalc - 
Gexp deviations are positive). The deviation for a small-to-large replacement (Case IIb) was obviously of opposite sign.
Apparently, vdW attractions of protein atoms became weaker when the atoms were separated by water, which is in agreement with many published theoretical and experimental studies (McLachlan 1963; Wood and Thompson 1990; Leckband and Israelachvili 2001). For example, it was shown that vdW interaction between nonconducting solids and liquids across water are an order of magnitude smaller than that across vacuum (Israelachvili 1992; Leckband and Israelachvili 2001).
| Discussion |
|---|
|
|
|---|

ASA and (2) interactions between spatially fixed atoms, eij (Equations 2 and 4
G values are considered as arising from two corresponding processes: (1) transfer of protein atoms from water to the liquid-like or uniform medium with specific solvation properties, which is similar to formation of a liquid nonpolar droplet in the aqueous solution (Baldwin 1986, 1989), and (2) the liquid to solid transition, when all protein atoms adopt certain spatial positions, which gives rise to the distance-dependent interactions, torsion energies, and the decrease of conformational entropy of the system (eij + Etorsion - T
S in Equations 2 and 4Both transfer and freezing processes may actually occur during protein folding and therefore reflected in the experimental unfolding free energies. The transfer process may be related to formation of the molten globule state whose nonpolar residues are buried from water but can move more or less freely, whereas the freezing represents first-order transition from the molten globule to the native structure (Shaknovich and Finkelstein 1989). The cooperative freezing transition must be driven by the increasing packing density of atoms, for example, from 0.44 (the fraction of space occupied by atoms in liquid cyclohexane) to 0.75 in the folded protein (Pace 2001). Indeed, the molten globule states of many proteins have a loosely packed hydrophobic core and gyration radii of 10% to 30% larger than that of the native structure (Arai and Kuwajima 2000).
It is also important to realize that all interatomic interactions occur in a condensed protein medium, not in vacuum, and therefore are expected to follow the "like dissolves like" rule (Israelachvili 1992). Indeed the following trends are obvious. First, the energies of interactions between same atom types increase with "polarities" of the participating atoms (Cali . . . Cali < Caro . . . Caro < N/O . . . N/O), that is, exactly in the same order as transfer energies of aliphatic, aromatic, sulfur, and polar groups from water to the protein interior, vacuum, or cyclohexane (Tables 2 and 3![]()
). Second, the interactions of different atom types are usually weaker than interactions of same atoms. For example, aliphaticaromatic energies are smaller than aromaticaromatic and aliphaticaliphatic ones (Table 2
). Third, the weakest interactions are observed between atoms with the most dissimilar polarities, whereas sulfur, with an intermediate polarity, interacts equally well with all other atoms. This is consistent with formation of "polarity gradients" in proteins, in which sulfur and aromatic groups often separate polar and aliphatic clusters (Pogozheva et al. 1997; Lomize et al. 1999b).
The determined interaction energies also correlate with atomic polarizabilities, which are directly related to the strength of vdW forces. The polarizabilities of aliphatic carbon, aromatic carbon, and sulfur are 1.12, 1.37, and 3.06 Å3, respectively (Miller 1990). Therefore, aromaticaromatic interactions are stronger than aliphaticaliphatic ones (a comparison for same atom types), whereas the alphaticsulfur interactions are stronger than aliphaticaromatic ones. However, this trend is violated for polar atoms that have the lowest polarizabilities (0.78 and 0.85 Å3 for hydroxyl oxygen and amine nitrogen, respectively; Miller 1990). Although the polarcarbon interaction is indeed the weakest, as could be expected from the corresponding polarizabilities, the polarpolar energy is the highest (Table 2
). This situation can be explained by the hidden electrostatic attractions of NH, OH, and C = O dipoles. The energies of such interactions depend on the mutual orientation of two dipoles. However, after averaging over different rotational orientations, these interactions are attractive and proportional to r-6 (Israelachvili 1992).
The affinities of atoms to the protein interior (vacuum-protein transfer energies,
V
P in Table 3
) follow the same general trend as interactions of same atom types, that is, Cali < Caro < S < N
O. The rank order for aliphatics, aromatics, and sulfur correlates with their polarizabilities and therefore can be explained by the increasing dispersion attraction forces. The affinity of polar groups is much higher, possibly because of the electrostatic self-energy of NH, OH, and C = O dipoles in the protein environment, whose effective dielectric constant may be relatively high (412; Warshell and Papazyan 1998; Dwyer et al. 2000) as a result of the presence of polar polypeptide backbone, polar side chains, and bound water.
Comparison with independent experimental estimates
The energies obtained for H-bonds (-1.5 -1.8 kcal/mole) were consistent with results of protein-engineering studies, which range from -1 to -2 kcal/mole (Fersht and Serrano 1993; Matthews 1993; Thorson et al. 1995; Myers and Pace 1996; Funahashi et al. 1999; Takano et al. 1999c). They are also in agreement with enthalpy of N-H . . . O = C H-bond in
-helix (-1.3 kcal/mole; Scholtz et al. 1991) and with the contribution of an H-bond to fusion enthalpy of N-acetyl amines of amino acids (-1.3 -1.6 kcal/mole; Graziano et al. 1996). The determined torsion potential barrier (3.0 kcal/mole) was close to 2.7 kcal/mole in ECEPP/2, a value that was derived from spectroscopic studies of organic molecules (Momany et al. 1975).
The determined vdW interaction energies indicate a very strong tendency to formation of aliphatic, aromatic, or polar clusters in proteins, whereas sulfur can serve as an "adhesive" between the polar and nonpolar groups, because it interacts favorably with both. This is in agreement with observations of significant aromaticaromatic and sulfuraromatic interactions in peptides and proteins (Fersht and Serrano 1993; Viguera and Serrano 1995). Moreover, the destabilization energy on removal of buried aliphatic side chains correlates with packing densities of surrounding aliphatic groups rather than with densities of any atoms (Serrano et al. 1992; Jackson et al. 1993; Fersht 1999).
The energies of vdW interactions obtained here are easier to compare with the independent estimates for aliphatics, because there are significantly less data for other groups. Moreover, most of the published estimates reflect a combination of vdW and transfer energies. To make a relevant comparison, we excluded all the explicit vdW and H-bond potentials from calculations, exactly as in other studies. Such a simplified approach produces a worse fit with the experimental free energies (r.m.s.d. of 
G increases from 0.41 to 0.83 kcal/mole) and yields a different set of "solvation parameters" that describe transfer from water to the solid protein interior (
W
Psolid in Table 5
). The obtained
solid parameter for aliphatic groups, 41 cal/moleÅ2, is in agreement with results of other studies that did not use the explicit vdW potentials (42 cal/moleÅ2, Funahashi et al. 1999; >50 cal/moleÅ2, Kellis et al. 1989; or 55 cal/moleÅ2, Jackson et al. 1993); however, it is two times greater than the actual transfer energy of aliphatic groups from water to the protein interior (
W
P = 19 kcal/moleÅ2, Table 5
), because it includes implicitly the additional vdW interactions in proteins. Thus,
50% of the stabilizing energy for aliphatics originates from vdW interactions, whereas the remainder comes from the transfer energy, that is, hydrophobic effect. This share correlates very closely with fusion enthalpy of alkanes, which is
0.59 kcal/mole per CH2 group (Nicholls et al. 1991), that is, also
50% of the average contribution of a buried CH2 group to the protein stability (
1.27 kcal/mole, Pace 1992). Therefore, the vdW interactions under discussion may indeed represent melting enthalpy of the protein core. This enthalpy for aliphatic groups, (
W
Psolid -
W
P) = 22 cal/moleÅ2, is close to the energy losses induced by formation of water-inaccessible cavities in proteins, which is 20 cal/mole per Å2 of the nonpolar (aliphatic) cavity surface created (Eriksson et al. 1992).
|
W
Psolid -
W
P), is even higher for aromatic groups and sulfur:
80% and
100% of
W
Psolid, respectively (Table 5
W
Psolid -
W
P 44 and 100 cal/moleÅ2). Hence vdW interactions and H-bonds represent a crucial contribution to the protein stability (Eriksson et al. 1992; Serrano et al. 1992; Makhatadze and Privalov 1995; Ratnaparkhi and Varadarajan 2000; Pace 2001), whereas the hydrophobic effect provides 50% of stabilizing energy for aliphatic groups, 20% for aromatics, and nothing for sulfur and polar groups. This situation does not contradict the large positive changes in heat capacity of protein unfolding, which is usually attributed to the hydrophobic effect, because the large heat capacity changes are characteristic for any melting transitions, especially in the presence of hydrogen-bonding networks (Cooper 2000).
The determined
W
Psolid "solvation parameters" indicate that burial of polar groups in proteins is energetically favorable (
30 cal/moleÅ2), because their vdW interactions and H-bonds in the protein core outweigh the dynamically averaged hydration by liquid water (Pace 2001). However, this is true only for the native protein structures with saturated H-bonding potential (McDonald and Thornton 1994). The
W
Psolid parameters are inappropriate for protein-modeling studies, because all buried polar groups would always be energetically favorable, even when they do not form any H-bonds in incorrect protein models.
Comparison with molecular mechanics potentials
The interatomic energies determined here (Table 2
) are generally smaller than in molecular mechanics force fields that describe interactions in vacuum. For example, the H-bond energy is only
-1.5 kcal/mole, which is smaller than -4 to -6 kcal/mole values that are generally accepted for H-bonds in vacuum and are applied in many computational studies (Momany et al. 1974; Hermans et al. 1984; Rose and Wolfenden 1993; Gavezzotti and Filippini 1994). The estimated aliphaticaliphatic energy (e0 = -0.06 kcal/mole, Table 2
) is also less than the depth of the corresponding CH2 . . . CH2 "united atom" potential (-0.14-0.11 kcal/mole, Lazaridis et al. 1995). Moreover, the depths of all interatomic potentials in Table 2
are smaller than in ECEPP/2, OPLS, and CFF force fields (Momany et al. 1974; Jorgensen et al. 1996; Ewig et al. 1999), except for the polarpolar interactions, which are probably reinforced by the hidden electrostatic attractions in our model. These discrepancies can be explained by two related reasons: (1) The protein interior is a condensed medium in which all forces of electrostatic origin, including the vdW interactions, are expected to be weaker than in vacuum (Israelachvili 1992) and (2) the energies determined here reflect melting enthalpy of the protein core rather than sublimation enthalpy of molecular crystals, which is greater and can be calculated as a sum of the corresponding eij potentials in molecular mechanics (Momany et al. 1974; Gavezzotti and Filippini 1997; Ewig et al. 1999).
It is important that the interactions in the protein core follow the "like dissolves like" pattern that has been predicted by the general theory of vdW forces in media (McLachlan, 1963, Israelachvili 1992). For example, the aliphaticpolar energy was determined as nearly zero, which means it does not exceed the average attraction of the participating aliphatic and polar groups to the protein interior, a contribution that is included in the waterprotein transfer energies of the aliphatic and polar atoms. However, the situation in vacuum is very different. Here, the molecular mechanics calculations produce a very significant energy of polarnonpolar interactions that is larger than energies of polarpolar and nonpolarnonpolar interactions combined (Lazaridis et al. 1995), because the number of polarnonpolar contacts is greater. The large nonpolarpolar energy originates from the Slater-Kirkwood equation or "combinatorial rules," which state that interaction energy of any pair of dissimilar (e.g., C and O) atoms is an intermediate value between the energies of the corresponding same type (i.e., O . . . O and C . . . C) interactions. However, this approximation was based on the original London theory of dispersion forces that can be applied only in vacuum (Israelachvili 1992).
The application of a universal (in vacuum) vdW parameter set in different media, during the molecular mechanics or dynamics simulations, seems to be a problematic approach, although this is not commonly admitted. The environment-dependence of vdW forces has been justified previously in theoretical and experimental studies of intermolecular interactions in media (McLachlan 1963; Wood and Thompson 1990; Israelachvili 1992; Leckband and Israelachvili 2001). Moreover, it has also been found that vdW energies, when calculated with ECEPP/2 or AMBER potentials, must be reduced several fold to reproduce experimental ligand binding constants (Morris et al. 1998) or stabilities of designed peptides (de la Paz et al. 2001). The reduced interactions across water may be also one of the reasons why surface residues contribute less to protein stability (Shortle 1992; Scholtz et al. 1993; Fersht and Serrano 1993).
| Conclusions |
|---|
|
|
|---|

G values. The proposed approach works successfully, judging from the low r.m.s.d. of observed and calculated free-energy changes for 106 mutants (0.4 kcal/mole), whereas the number of adjustable energetic parameters of the model was six times smaller than the number of experimental data. The derived interaction and solvation energies seem to be generally applicable for uncharged groups buried in the protein interior on the basis of the following criteria: (1) The model works equally well for a wide variety of microenvironments in four different proteins, (2) the parameters were successfully tested for 10 barnase mutants that were not used for the parameterization, and (3) the results obtained were perfectly consistent with independent experimental estimates of torsion potential barriers, energies of H-bonds, melting enthalpies of alkanes, contributions of aliphatic groups to the protein stability, and energy losses associated with formation of water-free cavities. At the same time, the proposed potentials overestimate vdW interactions across water-filled cavities.
The developed parameters and theoretical model are expected to be helpful for the improvement of modeling methods, including ligand binding, ab initio prediction of protein structure, fold recognition, and computational de novo design, which all should be based on optimization of free energy. However, a number of problems must first be addressed, including quantification of interactions at the waterprotein interface and contributions of charged groups, a better treatment of interatomic repulsions, implementation of coil as a reference state, and operating with
G rather than 
G values.
| Materials and methods |
|---|
|
|
|---|
-helices of T4 lysozyme, L99I, L99V, L99F, L99M, L99A/F153A, F153L, F153M, F153I, and F153V (Eriksson et al. 1993); I50A and F67A (Xu et al. 1998); M6A, I50M, L66M, I78M, I78A, L84M, L84A, V87A, V87M, L99A, I100A, I100M, M102A, V103A, V103M, F104M, M102A/M106A, M106A, V111A, V111M, L118A, L118M, L121A, L121M, A129M, and F153A (Gassner et al. 1999); L133A (Eriksson et al. 1992); L121A/A129L, L121A/A129M, A129L, and A129M/F153A (Baldwin et al. 1996); M106L and M120L (Lipscomb et al. 1998); L99F/M102L/F153L, L99F/M102L/V111I, L99F/V111I, and M102L (Hurley et al. 1992); L121A/A129M/V149I, L121A/A129M/F153L, L121M/L133V/F153L, L121A/A129V/L133A/F153L, L121A/A129V/L133M/F153L, L121I/A129L/L133M/F153W, and L121M/A129L/L133M/V149I/F153W (Baldwin et al. 1993); A98C (Liu et al. 2000); A42S, V75T, V87T, A98S, and A130S (Blaber et al. 1993); V149C and T152S (Dao-Pin et al. 1991); S117F (Anderson et al. 1993); M6I (Faber and Matthews 1990); N101A, V149I, V149C, T152A, T152S, and T152V (Xu et al. 2001); L99G (Wray et al. 1999); M120A (Blaber et al. 1995); and L46A (Matthews 1995b); human lysozyme, A9S, A92S, V93T, A96S, V99T, and V100T (Takano et al. 2001); V93A, V99A, an