|
|
||||||||
Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599-7260, USA
Reprint requests to: Jan Hermans, Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599-7260, USA; e-mail: hermans{at}med.unc.edu; fax: (919) 843-9244.
(RECEIVED June 20, 2003; FINAL REVISION August 29, 2003; ACCEPTED August 29, 2003)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.03273303.
| Abstract |
|---|
|
|
|---|
3) correlates well with the Boltzmann factor of the torsion energy, exp(-ßE) of the model compounds C2H5C2H5 and C2H5SCH3. An exponential relation was again found between the relative occurrence of g+, g- and t conformations for C
Cß bonds in long side chains and the energy differences of rotamers of
-amino n-butyric acid, when dependence on backbone conformation was taken into account. The distribution of all 27 rotamers of methionine was correlated with the energy differences between the models rotamers, corrected for clashes with nearby residues, the correlation being good for a set with backbone in the ß-conformation, but less clear for backbone
-conformation. In all correlations, the value of the coefficient ß corresponds to a temperature of circa 300 K. These results can be interpreted with a model that considers the structure of a folded protein as resulting from packing imperfectly complementary parts, with a requirement of an overall low energy. Compromises are required to optimize the fit of nonbonded contacts with surrounding groups, and side chains assume conformations away from the energy minimum. An exponential distribution is a most probable distribution, and this can be established easily under conditions other than thermal equilibrium. Keywords: Torsion angle distribution; relation between distribution and energy; Boltzmann-type distribution; exponential distribution; methionine side chains; small molecules as models
| Introduction |
|---|
|
|
|---|
The folded structures are stable globally, but also locally the structures are close to optimal. Overall energy will be low if components of the structure have low-energy conformation; local deviations from minimum energy conformation, that may be thought of as structural imperfections, should be small. However, the structure of a folded protein contains a limited variety of structural elements that do not all fit perfectly, even with a highly optimized amino acid sequence, and packing imperfections appear unavoidable.
Structural elements that are sensitive to conformational details include internal coordinates (bond lengths, bond angles, and torsion angles), contact distances between nonbonded atoms, and hydrogen bond geometry, any of which might be chosen to assess the extent of imperfections. When the geometry of a structural element is not optimal, then the element is strained by exterior forces. Strained elements of different types are coupled: For example, if a side chain is strained to improve a nonbonded contact, then that nonbonded contact is strained in turn. Accordingly, by evaluating strain of one type, one may draw inferences about strain of other types. Here, strain is evaluated in terms of deviations of torsion angles from minimum-energy values.
The conformations of side chains in proteins of known structure, when characterized in terms of torsion angles, distribute into mostly well-separated clusters, called rotamers (Ponder and Richards 1987; Dunbrack and Karplus 1993), that correspond to expected low-energy conformations. The internal energies of the different rotamers are not the same; the conformer with lowest internal energy is preferred; conformers with higher internal energy do occur, but with lower frequency.
This study focuses on the frequency of deviations from minimum-energy conformations and on the relative population of different rotamers, and asks how these correlate with energies of representative model molecules.
We have relied on a recently developed and updated database of residues in high-resolution X-ray structures (Lovell et al. 2000, 2003). Elsewhere, we have shown that the inherent variation of single-bond torsion angles is found by considering only atoms with atomic B-factors below 20 Å2, which effectively eliminates the effects of statistical error and positional uncertainty (G. Butterfoss, J. Richardson, and J. Hermans, in prep.). The model molecules selected for calculations of the energy are small, so that structures can be carefully optimized with accurate energy functions based on high-level quantum mechanics.
To compare the statistics of the database with model energies, one typically assumes an exponential relation between the density of instances in the database, P and the energy, E
![]() | (1) |
where ß is a constant. The form of equation 1
is that of a Boltzmann distribution; its application to distributions in native protein structures was first proposed by Pohl (1971). A Boltzmann distribution applies in an ensemble at thermal equilibrium, in which case ß has its usual form
![]() | (2) |
where kB is Boltzmanns constant, and T the absolute temperature.
Pohl proposed an exponential relation of the form of equation 1
for side-chain torsion angles
1 of residues with aromatic side chains (Pohl 1971), and such relations have subsequently been proposed for distribution of ion pairs in proteins (Bryant and Lawrence 1991), for occurrence of residues in secondary structures (Chou and Fasman 1978; ONeil and DeGrado 1990; Chakrabartty et al. 1994), for buried-exposed distributions of side chains (Finkelstein et al. 1995), and for size distribution of cavities (Rashin et al. 1997). The present article benefits from a number of circumstances: the availability of a growing data base of high-resolution crystal structures, the elimination of effects of experimental error and positional uncertainty by use of atoms with low B-factors (G. Butterfoss, J. Richardson, and J. Hermans, in prep.), and the possibility of computing accurate energies of simple models that reproduce components of proteins in isolation. As a result, we have been able to establish instances in which the distributions closely follow the exponential of the energy of the isolated models. In one instance in which this is not true, the deviations can be reconciled by considering effects from parts of the protein outside the simple model.
Being apolar, methionine side chains are predominantly part of the cores of proteins, where they tend to be surrounded by nonpolar side chains, and where the structure is most closely defined. The methione side chain has two CC and one CS single bonds, and can assume 27 conformations of locally minimum energy. In isolation, each conformation has a preferred geometry, but when a residues side chain is fit into a particular protein structure, its conformation may change by internal rotation to achieve whatever balance between packing and torsional forces is required to minimize the energy. Analysis of statistics of lysine side chains is included for comparison.
Studies with high-level QM of models of structural features that occur in proteins are not uncommon. For example, these calculations include studies of diethyl disulfide, CH3CH2SSCH2CH3, as a model of the disulfide bridge (Qian and Krimm 1993; Görbitz 1994), models of hydrogen bonding by and to aromatic rings in proteins (Scheiner et al. 2002), a study of geometry of hydrogen bonding between NH and CO groups (Lipsitz et al. 2002), a study of preferred geometry for interactions between S and O atoms (Iwaoka et al. 2002), and calculations of the energy of the benzene dimer (Sinnokrot et al. 2002; Tsuzuki et al. 2002) that indicate as the most stable forms the T-shaped dimer, which is similar to a common mode of packing of phenylalanine side chains inside proteins (Burley and Petsko 1985), and the parallel-displaced dimer, which is uncommon in proteins. A recent study cataloged the low-energy geometries of the common amino acids computed at the Hartree-Fock level (Matta and Bader 2002).
In most of these papers, only a qualitative connection between database and energetics is sought. Sometimes this is understandable because the available databases were small and not restricted to well-defined residues; in other cases, the energy difference between alternative structures is large, and the frequency of the high-energy form, if it occurs at all in the database, cannot be determined with sufficient precision ("forbidden conformations"; Ramachandran and Sasisekharan 1968). In contrast, the work described below has made extensive use of a recently updated database of high-resolution protein structures, and has sought to establish form and extent of correlations between database and energetics including instances in which the energy differences are relatively small.
| Results |
|---|
|
|
|---|
3 of methionine and lysine
|
Tsuzuki et al. (1996) studied the effects of electron correlation methods on the energies of the EMS stationary point structures and found that, in general, the skew and eclipsed barrier heights tend to decrease with increasing consideration of electron correlation. This is consistent with a systematic study of butane indicating that electron correlation effects are more significant in the eclipsed and skew conformations than in the other geometries (Allinger et al. 1997). In agreement, here MP2 predicts lower barrier heights for both transitions than HF. SCCDFTB predicts much lower energy barriers than the ab initio calculations, a feature that seems rather systematic to SCCDFTB (G.L. Butterfoss, unpubl.).
The energy profile for internal rotation of butane has been computed by Allinger et al. (1997) at a very high level of theory with a very large basis set. The trans conformation is more stable than the gauche conformation by 0.62 kcal/mole; the barrier between gauche and trans is 3.31 kcal/mole, and that between the two gauche conformations, 5.51 kcal/mole. Very similar values are obtained at the MP2 level of theory with a smaller basis set, the most different being the value of 5.98 for the barrier between the gauche conformations.
For comparison with the distribution in the database, the MP2 energies for torsion about the CS bond in EMS and CC bond in butane have been converted to probability distributions according to equations 1
and 2
(for T = 300 K; Fig. 2
). (The distribution was generated as follows: A relative probability was calculated for each energy, E; the resulting curve was integrated using the trapezoid rule, and the probability was normalized by dividing each value by the integral.) These figures also contain plots of the distribution of
3 for, respectively, well-ordered methionine and well-ordered lysine residues in the database.
|
near ±120°), is low but not zero. On the other hand, the population of eclipsed conformers of methionine (
near 0°), is negligible, as is true for lysine near all three minima. Predicted peak widths are close to the experimental values for the gauche conformations of methionine and slightly narrower for the trans conformation of methionine and all conformations of lysine. Use of a value of T smaller than 300 K to compute the theoretical distributions in equation 2
For EMS, the trans conformation is somewhat underrepresented in the database, relative to the distribution based on the models energies. Also, there appear to be "deficits" in the database distribution for
near 50° and for
between 150° and 180°. Finally, the spike in the database distribution near 100° is due to the presence of the mmp rotamer, which has a mean value of
3 in this range, due to interactions with the protein backbone (Word et al. 1999). (Following Lovell et al. [2000], we designated rotamers of methionine with three-letter acronyms, the first letter representing torsion about the C
Cß bond, the second torsion about the CßC
bond, and the third torsion about the C
C
bond, with the letter m standing for gauche-, the letter t for trans, and the letter p for gauche+.) Clearly, the symmetry of the model is not perfectly reflected in the database. The asymmetry of the latter must be due to the asymmetry of attachment of the side chain to the backbone and the resulting asymmetry also of preferred regular secondary structure. Even though several steps removed from the backbone, the side-chain torsion angle,
3 of methionine is influenced by the asymmetry of its environment. No significant asymmetry is noticeable for the distribution of
3 of lysine, but this may be due also to the omission of side chains with gauche conformation at
2.
As determined from solvent-accessible surface area (Connolly 1983a, b), methionine and lysine side chains with low atomic B-factors are, on average, the least exposed. For any range of B-factors, the lysine side chains have greater exposed surface area: Considering only the Cß, C
, C
, or S
, and C
atoms, the average surface area is three times larger for lysine than for methionine for atoms with B-factors below 20 Å3. The solvent-accessible surface area of the lysine side chains with high B is slightly over twice that of the set with largest B, while for methionine the ratio is near 7. Clearly, as expected, methionine side chains are predominantly part of the protein cores, while this is rarely the case for a lysine side chain. On the other hand, those lysines with low B-factors are screened from solvent by protein side chains and backbone, although not often completely.
Distribution of
1 rotamers of Aba
When analyzing the distribution over different rotameric states, it is important to take into account a correlation between preferred side chain and backbone conformation that was first established by Dunbrack and coworkers (Dunbrack and Karplus 1993, 1994; Dunbrack and Cohen 1997), who were able to rationalize the dependence of rotamer preference on backbone conformation in terms of a qualitative conformational analysis (Dunbrack and Karplus 1994). Here, these preferences are reanalyzed in terms of the energies of the dipeptide of
-amino n-butyric acid, Ace-Aba-Nme, which has a single CC torsion in the side chain.
Contour plots of the conformational prevalence of each of the three rotamers are shown as Ramachandran plots (Fig. 3AC
). Figure 3D
shows which rotamer has highest density, and indicates the points selected for comparison of experimental data and energies. In most of the "allowed" regions, m is the most favorable conformation. The t conformation is the most favorable rotamer in the "ideal"
R region and in a roughly hourglass-shaped strip across the ß-sheet region with
< 150°. The p rotamer is most favorable in a small slice near the upper left-hand corner of the Ramachandran plot and in two other small patches.
|
(Met, Glu, Gln, Arg, Lys). Excellent correlations with slopes close to unity are obtained when ß is given by equation 2
|
![]() | (3) |
where y stands for -(1/ß) ln (Pi/Pj) and x for the energy difference, Ei - Ej.
Methionine dipeptide
We have attempted also to optimize the geometry of all 27 possible methionine rotamers with two different fixed backbone conformations:
-helix and ß-sheet, chosen to represent the approximate average of the
and
values in these sets in the database. Probably because of severe atomic overlaps, the optimization failed in the case of a very few conformers; these conformers are not represented in the corresponding database sets.
Figures 5
and 6
show plots of the populations of the rotamers in the data set as a function of the MP2 energies of the dipeptide structures; the data are tabulated in Table 1
. Each of these figures contains an exponential distribution in the computed energy, E of the form Aexp(-E/kBT), equation 1
, with A chosen to produce an acceptable fit, and T = 300 K.
|
|
|
= -120,
= 140] and [
= -130,
= 120] ß-sheet conformations, with
3 = 131.2 and 138.7, respectively. These two conformations have both been termed mmt/mmp, and their energies have been compared to the sum of the mmp and mmt fractions in the database.) Qualitatively, these instances can largely be explained by considering interactions that are expected within the context of the folded protein. The tpp rotamer is most favorable in the ß-sheet backbone conformation. In this rotamer, the sulfur sits in a pocket formed by H
and the carbonyl oxygen atoms of the (same) residue. With such a geometry, the sulfur atom, with its large van der Waals radius, interferes with hydrogen bonding of polar groups to the carbonyl oxygen (Fig. 7
|
Excluding these low-energy anomalies, the energies follow an exponential distribution rather well. A likely source of scatter is that rather strong nonbonded interactions of, especially, the sulfur atom with other atoms of the dipeptide favor compact conformations of the model, while inside a folded protein nonbonded contacts with other side chains will favor more extended conformations.
-Helix rotamers
In the case of methionine side chains in residues in
-helices, the regular structure of the surrounding
-helix precludes some side-chain conformations that would otherwise be allowed in the dipeptide. We accounted for such clashes by using the Reduce program (Word et al. 1999) to test for atomic overlaps in an (Ala)9-Met-(Ala)11
-helix with each of the energy-minimized methionine side-chain geometries. Several conformations that are apparently energetically favorable in the dipeptide produce clear clashes in the
-helical structure, and these rotamers are essentially absent in the experimental database. An exception is the tpm rotamer, which has rather low energy but forms no internal clashes in the model helix. In this conformation the side chain sits in a space between the Cß of following residues (i+3 and i+4; Fig. 7
). It is likely that any residues larger than alanine in these positions would cause a clash. Once the obviously clashing rotamers and the tpm rotamer have been excluded, the remaining conformers contain no anomalously low-energy rotamers; however, the correlation is less clear than for the set having the backbone in a ß-conformation.
| Discussion |
|---|
|
|
|---|
S
torsion angle of methionine and the C
C
torsion angle of lysine (
3) and the energetics of the respective models, EMS and butane (Fig. 2
1 and the energy difference between rotational isomers of the model, the Aba dipeptide (Ace-Aba-Nme; Fig. 4
The correlations are clear instances illustrating the relation between the conformation of parts of proteins and local energetics. Also, the results (Fig. 2
) establish the validity of the "Boltzmann hypothesis" (Shortle 2003), according to which imperfections in protein structure follow an exponential distribution in the energy, exp (-ßE), equation 1
with a factor ß, equation 2
for a temperature not very different from 300 K. The distribution is one over many different environments, that are individually well determined. The observation that the distribution over many instances approaches an exponential in the energy of the isolated part is evidence that, on average, the environment neither favors nor disfavors particular instances. In actual fact, the environment is not isotropic, because every instance of a given type of structural element in a protein is bonded in identical manner to the rest of the protein, and the constant part of the environment may have a strong influence on the elements conformational preferences.
It requires some effort to find structural elements whose distributions are not systematically skewed. The C
C
bond in lysine and the C
S
bond in methionine, both in linear side chains, are two bonds removed from the branch point at C
, and for that reason are good candidates for this study. On closer inspection, the agreement between distribution and energy is imperfect even for
3 of methionine, as the symmetry of the model C2H5SCH3 is nearly, but not exactly, reproduced in the distribution (Fig. 2
). For the C
Cß torsion angle, it was possible to reconcile distribution and energy by only considering also backbone geometry.
The sharp maxima in the distribution of
3 of lysine and the absence of instances in the database of values at the energy maxima for butane at ±120° are characteristic of most aliphatic side-chain CC bonds in proteins (Lovell et al. 2000) and the distribution of the C
C
torsion angle in lysine is, therefore, quite representative. Apparent exceptions in leucine and valine can convincingly be attributed to misinterpretation of the electron density with models having reversed geometry (Lovell et al. 2000), and, otherwise, only the few instances of the mp rotamer of isoleucine suggest a distribution that is spread over a somewhat broader range. The CS bond in methionine is special among side chains in allowing a small, but significant fraction of instances at intermediate values of
. Torsion angles for bonds connecting sp2 and sp3 carbon atoms have very different distributions. The CßC
bond in Asp and Asn and the C
C
bond in Glu and Gln distribute very broadly about a value of 0° (or 180°), while those for the CßC
bond in phenylalanine and other residues with aromatic and heterocyclic side chains distribute broadly about ±90°. Although this is qualitatively as expected from the energy profile of models (data not shown), the presence of long-range interactions between the polar side chain of the first set, and the proximity of the two
-atoms to the backbone of the second set have led us to exclude these from consideration at this stage of our study.
We briefly mention two effects that could bias the distribution of torsion angle deviations. In the first place, side chains located on the protein surface, because of an absence of packing restraints tending to force the conformation away from the energy minimum, will tend to have more narrow distributions. However, as we have shown above, side chains exposed to solvent have higher B-values, and these are not considered in obtaining the distributions of Figure 2
. In the second place, the (common) application of torsional restraints during crystallographic refinement procedures will tend to restrict the distributions to canonical values of ±60 and 180° and also to shift the mean of each of the peaks of these distributions towards its canonical value. Results presented elsewhere show that deviations of the mean values from canonical as large as 15° are reproduced by deviations from canonical values in conformations of models at the energy minima, which indicates that the effect of torsional restraints imposed during refinement is not large, at least not for well-defined atoms (G. Butterfoss, J. Richardson, and J. Hermans, in prep.).
The exponential form corresponds to a most probable distribution of the energy, and this is also the form of a Boltzmann distribution. However, the ensemble of folded proteins cannot be considered as constituting a thermal equilibrium ensemble. In an ensemble at thermal equilibrium, the Boltzmann distribution describes deviations from a mean equilibrium value, while every geometric parameter observed in any one folded protein is an equilibrium value (Bryant and Lawrence 1991; Thomas and Dill 1996; Finkelstein and Ptitsyn 2002). (An opposing view has been expressed as well [Sippl 1993].) Bürgi and Dunitz have made an analogous statement concerning geometric deviations in crystal structures of small molecules (Bürgi and Dunitz 1988), while Finkelstein et al. (1995; Finkelstein and Ptitsyn 2002) have presented one plausible rationalization of why such distributions follow an exponential dependence on the energy. A shortcoming of this particular model is that it does not permit an evaluation of the factor ß in equations 1
and 2
, required to fit exponentials to the observed distributions; the fact that this factor corresponds to a value of T of 300 K (which in any case is not established precisely by these results) is, at present, unexplained.
A satisfactory argument relating distributions and energy can be made on the basis of the premise that all observed folded proteins are in a conformation that is stable relative to the unfolded state, which means that the free energy of each is lower than that of the ensemble of all other conformations. The free energy of a stable folded protein structure must be below an upper bound, or else the protein remains unfolded; this requirement is met by combining a specific backbone fold with a particularly suited set of side chains. At each position some choice(s) of side chain and of rotameric state will give a lower overall free energy than others. The packing of imperfectly complementary side chains to produce the protein interior is accompanied by small adjustments of the parts that raise the internal energy of the side chain, but are globally favorable because they lower the nonbonded energy more. If the global energy cost of making all the necessary choices may not exceed a set limit, then relatively many solutions are possible that require many small adjustments, while solutions that require a few large and, consequently, many very small adjustments are rare. Qualitatively, small imperfections are more probable. An exponential distribution results if a set amount of energy is divided over an ensemble of many independent members; this is true of the Boltzmann distribution, for which the average energy determines the temperature. For the above-drawn picture of the ensemble of folded proteins, the total energy is not set, but has an upper bound. Nevertheless, an exponential distribution is a good approximation because the distribution is dominated by structures that just meet the upper bound (Taverna and Goldstein 2002). This model implies an assumption that the actually observed protein folds and their amino acid sequences are a representative sample of all possible stable globular proteins. Ramifications of the model are discussed in Finkelstein and Ptitsyn (2002).
For an exponential distribution according to equation 1
, the mean energy is of the order of 1/ß and with ß given by equation 2
and T = 300 K, of the order of 0.6 kcal/mole. (The exact value depends on the distribution of states of different energy.) Assuming that deviations from canonical values of the torsion angles about the single bonds in a protein are independent, the overall strain energy is simply this amount times the number of CC and CS single bonds in side chains that are subject to packing constraints. The estimate is of the order of many kcal/mole for even a small protein, more than the net stability as characterized by the free energy of unfolding. The strain energy stored in torsion of side chains is actually only one part of the story, because, presumably, nonbonded contacts provide the forces that hold the side chains in their particular conformations, and thus the nonbonded energy is also not as low as it could be if the contact distances were optimal.
The extent of the imperfections will vary from one side chain to the next. The internal energy of a methionine or lysine side chain can vary considerably due to different amounts of strain of torsion angles and different choice of rotamer, and this will constitute one source of uncertainty in, for example, the outcome of experiments in which individual residues are replaced with alanine.
Structural details that in isolation have much higher energy than the minimum are correspondingly rare. A relation between energy and frequency of occurrence, such as derived in this study, provides a rational approach to questioning the accuracy of uncommon structural details introduced as part of structure determination, and seeking alternatives of lower energy or detecting additional interactions that stabilize the rare high-energy detail. An example of the former is provided by two side-chain conformers of leucine, denoted tt* and mp*. Analysis of structural detail indicates that these are "imposter" conformations, erroneously assigned in the model-building stage in place of the actual tp and mp conformations (Lovell et al. 2000). In agreement with that conclusion, we have found the tt* and mp* conformers of leucine to have high calculated conformational energies and also to not cluster about minima of the energy, as is the case for other observed conformers (results not shown).
Methods used in protein design and sequence-based protein structure prediction frequently make use of statistical properties of known proteins as a basis for structure optimization. This is often cast in terms of optimization of a (free) energy function obtained by inverting equation 1
, expressed in terms of statistical properties, P of the ensemble of proteins of known structure (Sippl 1993). In these cases, minimization of -kBT ln P in fact amounts to maximizing a log likelihood representing the "resemblance" of the target structure with known protein structures, and the correspondence between the used free energy function and physical energetics (and the choice of value of the factor kBT) is unimportant. The situation is very different if a hybrid energy function is used, which consists in part of physical energy terms and in part of terms related to data base statistics, which must now be expressed in common units on a common scale. The results presented here provide a good basis for converting statistical distributions of side-chain conformation to corresponding energies that can be combined with other energy terms to produce a global energy function for structure optimization or protein design (Desjarlais and Handel 1995; Simons et al. 1999). (When combining such disparate energy terms, care must be taken to avoid including the same effect in more than one contribution.) Side-chain distributions for well-defined residues (atomic B-factors below 20; G. Butterfoss, J. Richardson, and J. Hermans, in prep.) in high-resolution protein structures (Lovell et al. 2000) can be converted to reliable potentials that may be useful in these applications.
| Materials and methods |
|---|
|
|
|---|
Hartree-Fock (HF)
This is an ab initio self-consistent field method. Each electron sees the other electrons as net wave functions without consideration of correlations. Each wave function is iteratively adjusted in terms of the other electrons wave functions (hence, the term "self-consistent"). The wave functions are linear combinations of a basis set of Gaussian functions, the complexity of which is indicated by "6-311+G(d,p)." This is interpreted as follows: Each atoms nonvalence electrons are represented in terms of a function ("contraction"), which itself is a linear combination of six Gaussians (6); valence electrons are represented in terms of a linear combination of a three-Gaussian and two one-Gaussian functions (311); a diffuse Gaussian is added for nonhydrogen atoms (+); to account for polarization, a d-orbital shaped term is added for nonhydrogens and a p-orbital shaped term is added for hydrogens (G[d,p]).
MP2
This method is similar to HF, but in addition includes electron correlation through second-order perturbation theory. It is the only method used here that includes long-range van der Waals attraction (dispersion) energy. This method also requires by far the longest computations. As the length of the calculations scales as the third or higher power of the number of electrons, it was possible to perform geometry optimizations with MP2 for the smaller models (butane and C2H5SCH3), while for the larger models (Ace-Met-Nme and Ace-Aba-Nme) geometry optimizations were done with HF, and the energy of the optimized models was then recalculated with MP2.
B3LYP
This method uses a hybrid exchange-correlation functional in density functional theory, which generally gives energies and geometries of similar accuracy as MP2, despite an absence of long-range dispersion terms, and of greater accuracy than HF with the indicated basis set, while calculation time is comparable to that for HF.
SCCDFTB
This is a very fast density-functional theory-based approximate method, and is among the highest level of quantum theory that can be reasonably applied to molecular dynamic simulations of small molecules at present (Elstner et al. 1998, 2000). We have here used this method to calibrate its results by comparison with the energies obtained with more accurate methods.
SCCDFTB has recently been used to model crambin in a 350-psec simulation and reproduced conformational details of the 0.83 Å resolution crystal structure better than similar simulations using popular molecular mechanics (MM) force fields (Liu et al. 2001). This method was also used in QM/MM simulations of short peptide helices (Cui et al. 2001) and of alanine and glycine dipeptides (Hu et al. 2003).
Model molecules
Four model structures were used: butane; ethyl methyl sulfide (EMS); the methionine dipeptide, Ace-Met-Nme; and the dipeptide of
-amino n-butyric acid, Ace-Aba-Nme. The EMS molecule is a model of the end of the methionine side chain, and its principal degree of freedom, torsion about the CS bond, corresponds to the torsion about the CS bond in methione (torsion angle
3), that is, rotation about the bond farthest removed from the polypeptide backbone (ignoring torsion angles whose value depends on hydrogen coordinates), while Ace-Aba-Nme is a model of the part of the side chain closest to the backbone. The butane molecule models the energy for torsion about single CC bonds, in the absence of substituents.
The backbone has two principal degrees of freedom. It is well known that dipeptides modeled in isolation assume preferred conformations with an internal hydrogen bond between the two peptide groups (Tobias and Brooks 1992). This C7eq conformation is not common in folded proteins, where the peptide groups generally make hydrogen bonds with peptides of residues farther removed along the chain, or with solvent molecules. To study the dipeptides, the backbone torsion angles
and
were kept fixed while the side-chain conformation was varied, and this was done with several different backbone conformations typical of residues in folded proteins, as described below.
Potential energy surface of butane and C2H5SCH3 (EMS)
We determined the minimum potential energy surface for varying the CCSC torsion angle of EMS at various levels of theory. The HF energies are reported as part of the optimization at the MP2 level, and accordingly are for HF/6-311+G(d,p), rather than HF/6-31G(d). For SCCDFTB, a harmonic restraint of the form
![]() | (4) |
with Kf equal to 600 kcal/(mole rad2) was applied to the torsion angle,
. The value of
o was changed in 5° increments. After each change of
o, the energy of the structure was optimized with seven cycles each of 20 steps of conjugate gradient minimization. After the minimization process, the value of
differed from that of
o by less than 1° in all cases. The SCCDFTB calculations used a self-consistent field tolerance of 2x10-9, and the gradient of the energy was estimated with a 0.0002 Å shift in the atomic coordinates.
For the higher level methods, the geometry was optimized at 10° intervals of the CCSC torsion angle, with the torsion angle held fixed. The defaults of the Gaussian program were used as convergence criteria for all optimizations.
Calculations for butane were done only at the MP2 level. The resulting energies at minima and maxima were very similar to those obtained by Allinger et al. (1997) with a very large basis set.
Rotameric states of methionine
Torsion about any one of the single C
Cß, CßC
, and C
S
bonds in methionine produces three local energy minima. Thus, there are 27 distinct side-chain rotamers to consider. For each torsion angle, the minima are near -60°, 180°, and 60°, designated as, respectively, gauche- (m), trans (t), and gauche+ (p) conformations, the single-letter abbreviations having been adopted by Lovell et al. (2000). The energies of the 27 rotamers of the methionine dipeptide (Ace-Met-Nme) were calculated at the B3LYP and MP2 levels of theory as described above. These calculations were done for two different fixed backbone conformations,
-helical (
= -60°,
= -40°), and ß sheet (
= -120°,
= 140°). Prior to optimization, initial coordinates of the dipeptide were prepared with the Sigma program using standard molecular mechanics geometry with "ideal" side-chain torsion angles of ±60° or 180°.
The results of calculations with SCCDFTB are not reported because we noted that currently available integral and spline tables for SCCDFTB produced an anomalously large attraction between the S and N atoms of the methionine dipeptide.
Rotameric states of Aba
The dipeptide of
-amino-butyric acid (Ace-Aba-Nme) was selected as the model structure for computing the energies of rotameric states about
1. At each backbone conformation (
,
) the energy of the model structure was minimized in each of three side-chain conformations (m, p, and t) with fixed values of
and
at the HF level of theory, and the energy of the minimized structure was then calculated at the MP2 level (see above).
This calculation was done for backbone conformations for which at least two rotamer density surfaces were highly populated in the database; for a given backbone conformation the energies were computed only given adequate densities of at least two rotameric states, as indicated by the red areas in the first three panels of Figure 3
. The selected backbone conformations are indicated in the fourth panel of this figure. In several regions in which m and t are common, the p density surface is below the cutoff: The p conformation is uncommon in the area between the
-helical and ß-sheet regions, and is sterically excluded in the majority of both right- and left-handed
-helical structures. Energy differences have been computed for (
,
) = (-160,170), (-120,140), (-145,150), (-130,120), (-100,155), (-60,135), (-80,145), (-130,75)*, (-90,100)*, (-100,0)*, (-60,-40)*, and (55,45)*; however, energy differences for p versus m or t have not been computed for the conformations marked with *.
Database of well-ordered residues in high-resolution structures
The database of Lovell et al. (2000, 2003) was used to extract statistics of side-chain conformation in folded proteins. This is based on 500 protein structures of 1.8 Å or better resolution; sets of well-ordered residues are extracted by restricting the atoms B-factors to below 20 Å2 (G. Butterfoss, J. Richardson, and J. Hermans, in prep.).
Methionine is a fairly uncommon residue; thanks to a rapidly growing number of high-resolution crystal structures, the number of methionine residues in the database used here is 2145, of which 998 remain when only those containing atoms with B-factors below 20 Å2 are included.
Two sets of methionine residues with specific backbone structure were selected from the entire database. A set of 680
-helix rotamers was obtained by selecting all residues listed as having an "
-helix" or "
-helix ext" secondary structure as assigned by the DSSP program (Kabsch and Sander 1983) and for which -80 >
> -40 and -60 >
> -20. A set of 562 ß-sheet rotamers was obtained by selecting all residues that had backbone conformations within 40° of (
,
) = (-120,140). (DSSP secondary structure assignments were not considered in this selection).
The database was used to extract a set containing lysine residues with atomic B-factors below 20 Å2 and of these a subset containing those for which the conformation about the CßC
and C
C
bonds is trans (522 residues).
The database was also used to extract a set containing all well-defined Met, Glu, Gln, Arg, and Lys (23,620 residues).
Surface exposure of methionine and lysine
Surface exposure of individual side chains was determined from the size of solvent-accessible surface area determined with the MS program (Connolly 1983a,b). A small probe of radius 0.2 Å was used, and contact points for each of the four atoms defining the torsion angle (Cß, C
, S
, C
for methionine and Cß, C
, C
, C
for lysine) were counted and the numbers added. This was done for all residues in the database; the results were sorted according to atomic B-factors, and averages were calculated for successive ranges of B.
Density surfaces for
1
The experimental data set consisted of the side-chain conformations of amino acids having methylene ß and
carbons (Met, Glu, Gln, Arg, Lys) in the library of high-resolution protein structures (Lovell et al. 2003). Each individual rotamer was assigned to one of three subsets, depending on its value of
1, and this produced three sets of, respectively, 13,971 m conformations, 2009 p conformations, and 7640 t conformations. For each of these subsets, a smooth density surface over Ramachandran space was calculated with a previously described density-dependent smoothing algorithm (Lovell et al. 2003). Briefly, an initial density surface is generated as a sum of identical locally restricted cosine "masks" centered on each conformation in the set,
![]() |
with
a normalization factor, and in a second round of iteration the width of each mask is adjusted, depending on the local density calculated in the first round. The final density surface was calculated on a 5 x 5° grid.
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
|
|
|---|
Bryant, S.H. and Lawrence, C.E. 1991. The frequency of ion-pair substructures in proteins is quantitatively related to electrostatic potential: A statistical model for nonbonded interactions. Proteins 9: 108119.[CrossRef][Medline]
Bürgi, H.B. and Dunitz, J.D. 1988. Can statistical analysis of structural parameters from different crystal environments lead to quantitative energy relationships? Acta Crystallogr. B 44: 445448.[CrossRef]
Burley, S.K. and Petsko, G.A. 1985. Aromaticaromatic interaction: A mechanism of protein structure stabilization. Science 229: 2328.
Chakrabartty, A., Kortemme, T., and Baldwin, R.L. 1994. Helix propensities of the amino acids measured in alanine-based peptides without helix-stabilizing side-chain interactions. Protein Sci. 3: 843852.[Abstract]
Chou, P.Y. and Fasman, G.D. 1978. Prediction of the secondary structure of proteins from their amino acid sequence. Adv. Enzymol. 47: 45148.
Connolly, M.L. 1983a. Analytical molecular surface calculation. J. Appl. Crystallogr. 16: 548558.[CrossRef]
. 1983b. Solvent-accessible surfaces of proteins and nucleic acids. Science 221: 709713.
Cui, Q., Elstner, M., Kaxiras, E., Frauenheim, T., and Karplus, M. 2001. A QM/MM implementation of the self-consistent-charge density functional tight binding (SCC-DFTB) method. J. Phys. Chem. B 105: 569585.[CrossRef]
Desjarlais, J.R. and Handel, T.M. 1995. De novo design of the hydrophobic cores of proteins. Protein Sci. 4: 20062018.[Abstract]
Dunbrack, R.L. and Cohen, F.E. 1997. Bayesian statistical analysis of protein side chain rotamers in proteins. Protein Sci. 6: 16611681.[Abstract]
Dunbrack, R.L. and Karplus, M. 1993. Backbone-dependent rotamer library for proteins: Application to side chain prediction. J. Mol. Biol. 230: 543571.[CrossRef][Medline]
. 1994. Conformational analysis of the backbone-dependent rotamer preference of protein side chains. Nat. Struct. Biol. 1: 334340.[CrossRef][Medline]
Durig, J.R., Compton, D.A.C., and Jailian, M.R. 1979. Analysis of torsional spectra of molecules with two internal C3v rotors. 11. Low frequency vibrational spectra, methyl torsional potential functions, and internal rotation of ethyl methyl sulfide. J. Phys. Chem. 83: 511515.[CrossRef]
Durig, J.R., Rollins, M.S., and Phan, H.V. 1991. Conformational stability, barriers to internal rotation, ab initio calculations and vibrational assignment of ethyl methyl sulfide. J. Mol. Struct. 263: 95122.[CrossRef]
Elstner, M., Porezag, D., Jungnickel, G., Elsner, J., Haugk, M., Frauenheim, T., Suhai, S., and Seifert, G. 1998. Self-consistent charge density functional tight-binding method for simulation of complex material properties. Phys. Rev. B 58: 72607268.[CrossRef]
Elstner, M., Frauenheim, T., Kaxiras, E., Seifert, G., and Suhai, S. 2000. A self-consistent charge density-functional based tight-binding scheme for large biomolecules. Phys. Status Solidi B 217: 357376.[CrossRef]
Finkelstein, A.V. and Ptitsyn, O.B. 2002. Lecture 16. In Protein physics, pp. 189203. Academic Press-Elsevier Science, Amsterdam.
Finkelstein, A.V., Batretdinov, A.Y., and Gutin, A.M. 1995. Why do protein architectures have Boltzmann-like statistics? Proteins 23: 142150.[CrossRef][Medline]
Frauenheim, T., Seifert, G., Elstner, M., Hajnal, Z., Jungnickel, G., Porezag, D., Suhai, S., and Scholz, R. 2000. A self-consistent charge density-functional based tight-binding method for predictive materials simulations in physics, chemistry and biology. Phys. Status Solidi B 217: 4162.[CrossRef]
Frisch, M.J., Trucks, G.W., Schlegel, H.B., Scuseria, G.E., Robb, M.A., Cheeseman, J.R., Zakrzewski, V.G., Montgomery, J.A., Stratmann, R.E., Burant, J.C., et al. 1998. Gaussian 98. Gaussian, Inc., Pittsburgh, PA.
Görbitz, C.H. 1994. Conformational properties of disulphide bridges. 2. Rotational potentials of diethyl disulphide. J. Phys. Org. Chem. 7: 259267.[CrossRef]
Hayashi, H., Shimanuchi, T., and Mizushima, S. 1957. Raman and infrared spectra of 2-thiabutane. J. Chem. Phys. 26: 608612.[CrossRef]
Hu, H., Elstner, M., and Hermans, J. 2003. Comparison of a QM/MM force field and molecular mechanics force fields in simulations of alanine and glycine "dipeptides" (Ace-Ala-Nme and Ace-Gly-Nme) in water in relation to the problem of modeling the unfolded peptide backbone in solution. Proteins 50: 451463.[CrossRef][Medline]
Iwaoka, M., Takemoto, S., and Tomoda, S. 2002. Statistical and theoretical investigations on the directionality of nonbonded SO interactions. Implications for molecular design and protein engineering. J. Am. Chem. Soc. 43: 1061310620.
Kabsch, W. and Sander, C. 1983. Dictionary of protein structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 25772637.[CrossRef][Medline]
Lipsitz, R.S., Sharma, Y., Brooks, B.R., and Tjandra, N. 2002. Hydrogen bonding in high-resolution protein structures: A new method to assess NMR protein geometry. J. Am. Chem. Soc. 43: 1062110626.
Liu, H., Elstner, M., Kaxiras, E., Frauenheim, T., Hermans, J., and Yang, W. 2001. Quantum mechanics simulation of protein dynamics on long time scale. Proteins 44: 484489.[CrossRef][Medline]
Lovell, S.C., Word, J.M., Richardson, J.S., and Richardson, D.C. 2000. The penultimate rotamer library. Proteins 40: 389408.[CrossRef][Medline]
Lovell, S.C., Davis, I.W., Arendall, W.B., de Bakker, P.I.W., Word, J.M., Prisant, M.G., Richardson, J.S., and Richardson, D.C. 2003. Structure validation by C
geometry:
,
and Cß deviation. Proteins 50: 437450.[CrossRef][M