|
|
||||||||
1 Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, 33100 Udine, Italy2 Dipartimento di Biologia and CRIBI Biotech Centre, Università di Padova, 35131 Padova, Italy
Reprint requests to: Federico Fogolari, Dipartimento di Scienze e Tecnologie Biomediche, Università di Udine, P.le Kolbe 4, 33100 Udine, Italy; e-mail: ffogolari{at}mail.dstb.uniud.it; fax: ++39-0432-494301.
(RECEIVED July 23, 2004; FINAL REVISION December 20, 2004; ACCEPTED December 20, 2004)
| Abstract |
|---|
|
|
|---|
Keywords: loop decays; MM/PBSA; implicit solvent; Poisson-Boltzmann; colony energy
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.041004105.
| Introduction |
|---|
|
|
|---|
Basically predicting a stable molecular conformation may be viewed as a two-step procedure where first a large number of plausible molecular models are generated, possibly involving some rough free energy estimation, and, second, their free energy is accurately evaluated (see, e.g., Bonneau and Baker 2001).
Accurate free energy estimation is difficult because the environment (usually aqueous) must be taken into account and because it is difficult to estimate entropic effects, which depend on the energy landscape.
The progress made in recent years in protein structure prediction based on free energy estimation (see, e.g., Lee et al. 2001) seems of wider impact than just that related to the protein predictors community.
Designing good (relative) free energy estimators for protein conformation is particularly challenging because only the most stable structure is available from experiments. Higher free energy structures (often termed decoys) can be generated by computational methods, but their relative free energy is not available. Thus, predicted free energies cannot be compared with experimental values and the only possible test is to see whether the relative free energy of the decoys is indeed higher than that of the native structure.
The quality of the decoys is not known in advance but may be judged by how many free energy estimators fail to recognize the native structure. Decoy sets conversely may be used to test which free energy estimators perform consistently better than others (Park and Levitt 1996; Samudrala and Levitt 2000). A trade-off between performance and computational time is, however, expected.
Since the native structure is assumed to be at a thermodynamic minimum (Anfinsen 1973), the recognition of a native structure among well-constructed decoys is a prerequisite of any (free) energy function that aims at reproducing protein folding thermodynamics. It should be clear that this task is, however, only loosely related to the problem of protein structure prediction where the native structure is not available and, at best, native-like structures are available in the pool of predicted models. For this reason the necessary stronger requirement for a good free energy estimator is that structures assigned the lowest free energy, in a decoy set containing at least one native-like structure, must be native-like.
The relationship between free energy and similarity to the native structure (measured for instance by backbone coordinate root mean square deviation [RMSD]) is complicated by the fact that small changes in atomic coordinates (e.g., of sidechain atoms) may imply rather large changes in various energy terms, e.g., covalent, van der Waals, and electrostatic terms. This fact led some groups (see, e.g., Dominy and Brooks 2002) to neglect some energy terms, more sensitive to poor local geometry.
Lazaridis and Karplus (2000) classified the approaches to free energy estimation in two broad classes, those based on statistical effective energy functions (SEEF) and those based on physical effective energy functions (PEEF). Evidence has been presented in the literature in favor of one or the other class of approaches whose effectiveness depends on the functions, protocols, and test sets employed. Recent results show that physical effective energy functions based on molecular mechanics force fields and accurate estimation of solvation effects are well suited for estimating the relative free energy of atomic detail molecular models (see, e.g., Gatchell et al. 2000; Lee and Kollman 2001; Feig and Brooks 2002; de Bakker et al. 2003; Hsieh and Luo 2004).
SEEFs employing more or less detailed representations of protein chains have also been developed and used successfully on many decoy sets (see, e.g., Zhou and Zhou 2002; Berrera et al. 2003 and references cited therein; Zhang et al. 2004a,b).
The advantage of PEEFs over SEEFs is that they should be transferable to any system (if the methods they employ are themselves transferable, as they usually are). If methods are accurate they should be able to discard energetically disfavored conformations that are not recognized by many successful SEEFs.
One disadvantage of PEEFs based on molecular mechanics forcefields is that they are extremely sensitive to model-building errors that lead to bad atomic contacts. If not removed by minimization steps or partial relaxation, these lead to higher conformational energy.
The large number of successful molecular dynamics simulations in explicit solvent demonstrate that accurate molecular mechanics forcefields are available and the intra-molecular energy of a protein conformation may be estimated (at least in a relative sense) accurately (see, e.g., MacKerell et al. 1998). Although molecular mechanics energies have been used to discriminate plausible versus non-plausible protein conformations (for an early reference, see Novotny et al. 1984), they perform poorly (Wang et al. 1995) because of the following difficulties:
Many models with varying degrees of accuracy have been proposed for estimation of solvent effects. A minimal, but efficient, model prescribes that solvation free energy is proportional to the solvent-accessible surface area (SASA) distinguishing polar and nonpolar surfaces (Eisenberg and MacLachlan 1986; Fraternali and van Gunsteren 1996). In general, it is widely accepted that solvation effects may be decomposed in a "hydrophobic" component and a "polar" or electrostatic component, and that the two components may be treated, at least approximately, independently (Honig et al. 1993; Roux and Simonson 1999). A large amount of experimental data on hydrocarbon solubility supports the notion that the free energy of solvation (transfer from vacuum or from a nonpolar medium to water) for nonpolar molecules is proportional to SASA, through a surface tension coefficient
(Honig et al. 1993; Roux and Simonson 1999; but see also Blokzijl and Engberts 1993; Levy et al. 2003).
The polar part of the solvation free energy may be estimated in different analytical ways that approximate the computationally expensive solution of the Poisson-Boltzmann equation, which is the underlying model (Davis and McCammon 1991a; Honig and Nicholls 1995; Bashford and Case 2000; Fogolari et al. 2002).
The model based on molecular mechanics forcefields, Poisson-Boltzmann (PB) equation, and SASA (MM/PBSA) is thoroughly described in the Materials and Methods section.
Estimation of entropic effects requires that the ensemble of conformations in the proximity of the conformation under scrutiny is evaluated. Some approaches approximate the energy function close to the minimum with a harmonic oscillator for which entropy can be analytically computed (Srinivasan et al. 1998). Others assign conformational restrictions to torsional degrees of freedom (Doig and Sternberg 1995). A recently proposed approach uses the same ensemble of generated conformations (and their effective energy) to assess the number of conformations belonging to the "colony" of which each conformation is a sample (Xiang et al. 2002). Although this approach has several advantages that are described later, it has not been widely applied.
It is likely that procedures that build a large number of structures do not properly minimize the energy of predictive models and therefore the effective energy is dominated by poor geometry. An energy minimization protocol and/or a short-time molecular dynamics simulation (Vorobjev et al. 1998; Vorobjev and Hermans 2001) should therefore be applied to any predictive model before subjecting the same to more accurate effective energy estimation (see, e.g., Feig and Brooks 2002). Since the energy minimization will be performed without explicit solvent molecules, some care must be taken to include solvent effects, at least approximately. Due to the approximated model that is necessarily adopted, the minimization leads to a minimum that may be quite different from both the starting model and the effective energy minimum. For these reasons, extensive minimization, depending obviously on the quality of starting models, should be avoided.
In this paper, we design a protocol for the estimation of the free energy of molecular conformations by combining the MM/PBSA and the colony energy (Xiang et al. 2002) approaches. For the test sets studied, we find a good correlation between the estimated free energy and the similarity to the native structure, even with a rather conservative choice of parameters. These results suggest that MM/PBSA free energy estimation (and perhaps other MM/implicit solvent methods) may be made more robust, i.e., less sensitive to minor conformational changes, by supplementing it with the colony free energy approach.
| Results |
|---|
|
|
|---|
Then, the MM/PBSA colony free energy estimation protocol has been applied on two sets containing a large number of decoys, a prerequisite for colony energy approach application:
3. The abm_dataset (available at http://dd.stanford.edu/) (Samudrala and Levitt 2000), which includes four eight-and nine-residue decoy sets from an antibody. This test set allowed us to apply a general version of the colony energy approach because the set includes a large number of decoys;
4. A set of almost 1000 decoys generated by a fast divide-and-conquer method (Tosatto et al. 2002), which includes 20 decoys for all eight-residue stretches of a small protein.
Misfold dataset
A first test has been performed on the misfold decoy set (Holm and Sander 1992), mainly for comparison with other published reports and to check the protocols employed. Results (reported in Table 1
) are consistent with the results of Petrey and Honig (2000) and Dominy and Brooks (2002), although free energy differences here are larger probably due to the different model and protocol used. Structures 1rei.pdb and 2ts1
[PDB]
.pdb have been excluded from analysis because they are not monomeric or contain a break in chain connectivity as revealed by the program WHATIF (Vriend 1990).
|
It should be noted that for this decoy set molecular mechanics (MM) energy alone is able to discriminate all but one native structure (decoy 1ppton1cbh) and, with the protocol employed here, it performs slightly better than MM/GBSA energy. For almost all decoys, Coulombic energy is more favorable in the native structure than in the decoy, while PB solvation energy favors the decoy versus the native structure. Overall, the sum of the two contributions, i.e., of direct and solvation electrostatic interactions, is able to recognize the native structure in all but one case (1cbhon1ppt).
This quick test confirmed that the protocol designed to remove possible local poor geometry and at the same time take into account solvation effects is effective.
de Bakker et al. best eight-residue loop decoys
The decoy set of de Bakker et al. (2003) has been extensively studied by the same authors with excellent results, using MM/GBSA, as implemented in AMBER. Before discussing results on this set, a few observations must be made. Models might be incomplete due to missing groups, similar to the structure of ferredoxin in the misfold set. Here, 25 of the 32 structures have been crystallized with heterogroups (different from water) present in the PDB file. These heterogroups are not represented in the analyzed models. Seventeen out of 32 structures have one or more cis-peptide bonds and 14 structures have one or more residues falling in unfavorable regions of the Ramachandran plot. The energetics of these structural features is not guaranteed to be accurately represented by forcefields that have been designed to reproduce small molecule crystal structures.
Related to this observation, a very recent report by Hsieh and Luo (2004) indicates that inaccuracies in the AMBER forcefield are related with the 14 electrostatic energy term, which is subject to large fluctuations. The same authors are developing a modified forcefield that is able to improve the accuracy of the MM/PBSA energy estimation.
Another issue that is worth noting is that the ranking of the native structure among decoys might not be an appropriate way to assess the quality of the free energy estimator, because for some decoy sets many or most structures have very low RMSDs from the native structure. In view of the approximations involved in the methodology, we should tolerate mis-ranking of the native structure compared with very similar structures. Since the native structure is not available in a real prediction test, we should be more concerned with ranking near-native structures at the first places, rather than ranking the native structure first. This principle also conforms to the spirit of the colony energy approach.
Application of MM/PBSA free energy estimator to the 32 decoy sets, modeling eight-residue loops, ranks the native structure first in 18 sets, second in five other sets, within the first five in three other sets. Overall, the native structure is ranked among the lowest five energy structures in 81% of the sets.
When compared with the results of de Bakker et al. (2003), the performance of the MM/PBSA model is slightly worse than that of the AMBER/GBSA model. The protocol employed here is, however, different and these small differences in performance may be possibly attributed to the fact that, contrary to the cited work, we also allow the framework to move during minimization.
The superposition of loops on native structures performed for the lowest energy structure in each decoy set, however, shows that the RMSD is rather low for almost all best-ranked structures (<1.0 Å for 26 out of 32 decoy sets). A number of exceptions are found and closer inspection showed that for three of them (1phf-76, 1thw-18, and 1poa-71) unusual backbone conformations (falling in disallowed Ramachandran regions) are found in the best energy minimized structures and not in the native minimized structure. These unusual conformations are not properly penalized by MM/PBSA free energy estimators. As a remedy, each backbone conformation falling in disallowed regions and in generously allowed regions were weighted 40 kcal mol1 and 5 kcal mol1, respectively. These figures were empirically found to work, but the problem of correct backbone energy estimation must be definitely solved in a more accurate way. The work of Hsieh and Luo (2004) is going in this direction. Besides ad hoc countermeasures, the presence of bad geometry or unusual conformations points out inaccuracies of the force field and nonrobustness of the minimization protocol.
When unfavorable conformations are weighted as mentioned above, results are similar as before, as far as ranking of the native structure is concerned but improve drastically as far as RMSD from the native structure is concerned. This can be appreciated from results reported in Table 2
. In 28 out of 32 decoy sets, the lowest free energy structure has a RMSD from native, computed by superimposing the modeled loop, lower than 0.7 Å. The highest RMSD among all 32 sets is, however, as low as 2.0 Å.
|
|
To show how difficult native loop structure recognition is in this set (and in general), we plotted the contact energy estimated by a successful SEEF versus the RMSD computed on loop C
atoms, keeping framework residues fixed (Fig. 1
). It should be noted that the same empirical contact energies performed well on widely used decoy sets (Berrera et al 2003).
|
restrained structure, followed by 150 steepest descent and 150 conjugate gradients minimization steps. For all conjugate gradient steps, the conjugate gradient cycle was set to 10. The plot of MM/PBSA energy versus the RMSD, computed on loop C
atoms, after superposition of C
atoms not in the loop, is reported in Figure 2
|
When free energy components are analyzed, it is seen that, for this decoy set, the native structure is also correctly discriminated by MM energy. In general, a correlation between MM energy and RMSD is found. The gap in energy between the native structure and the best decoy is larger for MM/PBSA than for MM energy (29.6 and 23.5 kcal mol1, respectively) and slightly larger than that found with MM/GBSA energy (6.8 kcal mol1).
Both MM/PBSA and MM energy rankings place the lowest RMSD structures among the lowest free energy ones. For instance, of the five lowest MM/PBSA and MM free energy structures none exceeds a RMSD (over loop C
carbons) of 0.7 Å and 1.1 Å from the native structure, respectively.
MM/GBSA, on this particular decoy set, performs worse than both the MM and the MM/PBSA energy estimators. The PBSA free energy versus RMSD plot shows a weak anticorrelation. The solvent accessibility distribution shows a weak correlation with RMSD.
The opposing effects of direct Coulombic and solvation electrostatic terms, which have been noted before, are also found for this decoy set with correlation coefficient 0.84.
The total electrostatic component of the free energy shows a weak correlation with RMSD. For this particular decoy set, the free energy is dominated by MM terms. Due to the large uncertainty on parameters employed for continuum calculation, we test the application of different scaling factors on the electrostatic contributions to the free energy (direct Coulombic and PB solvation components) and on the term proportional to solvent accessibility. In this way we determine a plausible range for both dielectric constant and surface tension coefficient, to be used in this approach.
The solvation free energy estimated by the PB equation scales approximately with the inverse of the dielectric constant, just like the direct Coulombic term. We can therefore multiply both terms, for consistency, by a range of constants and assess what values are feasible or unfeasible. For this particular decoy set, the surface tension coefficient can range between 0 and 1200 cal mol1 Å2 with optimal separation between the lowest free energy native structure and the next lowest free energy decoy achieved for 0 cal mol1 Å2. The dielectric constant can range between 1.0 and 5.0 with the largest energy separation, between native and next lowest energy structure, obtained at 1.0.
Although the separation in energy of the native structure from the best decoy may not be a good criterion, overall these observations show that for standard choices of force-field parameters and minimization protocols, the dielectric constant values usually employed for electrostatic calculations (between 1.0 and 4.0) are able to discriminate native structures from decoys and are therefore reliable. The assessment of reasonable values for the surface tension coefficient requires a different test set.
The colony energy approach proposed by Honig and coworkers (Xiang et al. 2002), modified as described in the Materials and Methods section, has been applied on this set. Pairwise RMSDs have been computed on all protein C
carbons because different loops have been modeled. For this reason the computed pairwise RMSDs are extremely low, ranging from 0.2 Å to 1.25 Å. The correlation between the RMSD from the native structure computed on all protein C
atoms and on loop C
atoms are approximately linearly dependent with a correlation coefficient of 0.994. To prevent that all colony energies collapse toward the lowest energy, as discussed in the Materials and Methods section, a 0.2 Å threshold, t, lower than the lowest RMSD of minimized structures from native, has been chosen. The native structure was removed from the ensemble so that no information about RMSD from native is used to compute the colony energy reported in Figure 3
. Only after colony energies are computed, they are plotted against the RMSD of the modeled loop from the native loop, computed after superposition of the remaining residues. The correlation of colony energy with RMSD from the native structure is therefore not artifactual and witnesses the quality of the method and protocol used.
|
|
Anticorrelation is also found here between Coulombic and electrostatic solvation energy with correlation coefficient 0.87.
Application of the colony energy approach is somewhat less effective here, as far as the smooth appearance of the curve is concerned, because the set contains 20 decoys for all octamers in the sequence and there are therefore many uncorrelated structures. For this reason, it may happen that no low energy structure is present in the "colony." The effect of the threshold value may be appreciated in this particular case. The choice of threshold 0.5 Å and 1.0 Å (Figs. 5
7
) produces increasingly regularized curves.
|
|
In this set, the colony energy approach improves the choice of a plausible predictive model significantly. Since the native structure is completely ignored in computing the colony energy for each model, the predictive properties of the approach should not be considered artifactual.
Conclusions
The MM/PBSA approach has been clearly and completely formalized by Kollman et al. (2000), although similar approaches to the estimation of free energies from molecular structures have been used by many other authors (see, e.g., Baginski et al. 1997; Vorobjev et al. 1998). Two problems related to free energy estimation from molecular structures are accurate estimation of solvent effects and entropic contributions.
Recent work of our and other groups has demonstrated that the approach based on molecular mechanics for estimation of intramolecular energy and on PB equation and SASA for estimating solvation effects is accurate (see, e.g., Kollman et al. 2000; Feig and Brooks 2002; Fogolari et al. 2003). Estimation of entropic effects is more difficult because it requires exploration of conformational space.
Moreover, the strong dependence of the estimated energy on small coordinate variations poses a question on the robustness of the approach.
The results described above show that estimation of a solute potential of mean force (intramolecular energy and solvation effects) by the MM/PBSA approach, using for instance the protocol implemented here, is overall accurate. Possible sources of inaccuracies appear to be related to the MM energy estimation of unusual backbone conformations.
As far as the MM/PBSA free energy estimation is concerned, our results compare well with similar works. In particular, the comparison with the work of de Bakker et al. (2003), who studied the same decoy sets using the AMBER/GBSA method, shows that the two approaches have very similar performances. The average RMSD from native of the best model (computed on all backbone atoms, after superposition of framework residues) is slightly better here (by ~0.1 Å), although the native structure is found among the five lowest free energy models in 75% of the sets (~15% less than using the AMBER/GBSA method).
Comparison with the results obtained using the AMBER/GBSA method suggests that the GBSA model is as accurate as the PBSA model, at least at the level of accuracy required by the decoy sets studied. Native structure for almost all decoys considered is found in a deep energy well as expected.
Although native-like structures for most cases have low free energy, it appears that application of the colony energy approach considerably improves the relationship between the RMSD from the native structure and estimated free energy. In particular, the approach has the beneficial effect to greatly hamper the dependence of the estimated free energy on conformational variations. The plots reported in this work, built without making use of information on the native structure, show a much smoother appearance than the corresponding raw MM/PBSA versus RMSD plots.
A similar behavior has been observed by Honig and coworkers (Xiang et al. 2002). Here, the application of the colony energy approach is even more effective due to the more detailed MM/PBSA free energy estimator.
In summary, our results suggest that the MM/PBSA approach with prior minimization using accurate estimation of solvent effects, coupled with the colony approach for estimating entropic effects and for smoothing free energy predictions, is well suited for evaluating predictive protein models, and for the analysis of protein folding simulations.
| Materials and methods |
|---|
|
|
|---|
![]() |
where U is the potential energy and indices 1, . . . , n run over solute coordinates and indices n + 1, . . . , N run on solvent coordinates. Now we may consider rewriting the above equation in terms of the protein potential of mean force W(
1, . . . ,
n), which is by definition (Hill 1956) such that:
![]() |
where V is the volume.
With this notation the free energy of the macroscopic state C, unless a constant term, is
![]() |
We assume that each predictive model is a sample from a macroscopic state C. To estimate the free energy of the macroscopic state C we must be able to (1) reasonably estimate the potential of mean force W(
1, . . . ,
n); (2) treat the integral in a reasonable fashion.
It is often assumed that the potential of mean force W(
1, . . . ,
n) may be written as the sum of a solute energy term U(
1, . . . ,
n) and a solvation free energy term that is further split in a polar (electrostatic) and a nonpolar (hydrophobic) term (Honig et al. 1993; Roux and Simonson 1999).
![]() |
The free energy of the macroscopic state C is often estimated as the sum of the potential of mean force for one or more sampled microscopic states and an entropic contribution arising from restrictions in conformational and/or translational and rotational freedom (see, e.g., Finkelstein and Janin 1989; Doig and Sternberg 1995):
![]() |
If we assume that each conformation is a sample from a basin of the same number of conformations possessing the same potential of mean force, i.e., that
S is the same for all different conformations sampled, then the potential of mean force may be used to discriminate the native (stable) versus unfolded or misfolded (non-stable) conformations. The rationale behind this choice is that folded conformations are usually more compact and therefore conformationally restricted than unfolded states. As a consequence, the entropic term will favor unfolded states. The potential of mean force component of the free energy should therefore be, a fortiori, lower for folded states than for unfolded states.
For folded or misfolded states, the entropic restrictions should be comparable and therefore the enthalpic component should be sufficient to discriminate stable versus less stable, but equally compact, states.
We adopted, where possible, the colony energy approach of Honig and coworkers (Xiang et al. 2002), described later.
The solute energy term U(
1, . . . ,
n) has been computed using CHARMM (version 27b2), a classical and well-tested molecular mechanics forcefield (Brooks et al. 1983; MacKerell et al. 1998). All structures have been energy minimized in a two-step fashion, except where indicated. First, all C
coordinates have been fixed and the energy minimized with 20 steepest descent and 30 conjugate gradients steps to remove high energy spots, like van der Waals clashes. During this minimization, a distance-dependent dielectric constant has been employed (
= 1r) with a cutoff of 12 Å. In the second step, the structure is totally unrestrained and a more accurate solvent model (the Generalized Born Solvent Accessible model) has been employed using the parameters suggested by Qiu et al. (1997), to reduce artifacts ensuing from missing solvent. First, 50 steepest descent minimization steps have been performed followed by 200 conjugate gradients minimization steps. After minimization, the energy was evaluated with no cutoff and with dielectric constant 1.0, in order to be consistent with the forcefield used (see the discussion below).
Gpolar(
1, . . . ,
n) has been computed according to the PB theoretical framework (Marcus 1955; Sharp and Honig 1990; Zhou 1994; Fogolari and Briggs 1997) as the difference in free energy for the hypothetical charging process of the solute in vacuo and in ionic solvent. To get rid of self energy contributions, which are strongly dependent on grid mesh and positioning, two identical computations are performed differing only in the solvent dielectric constant and ionic strength (80 and 100 mM, respectively, for solvent calculation and 1 and 0 mM, respectively, for the in vacuo calculation). All calculations have been performed using the program UHBD (Madura et al. 1994, 1995).
The choice of 1.0 for inner dielectric constant requires a word of caution. In most studies involving the MM/PBSA approach or some of its variants, the inner dielectric constant is assumed to be in the range 24, typical of organic molecules. In applications exploring molecular dynamics within the MM/PBSA approach it was found that substantially higher dielectric constant values (4.017.0) must be employed to achieve reasonable molecular dynamics trajectories, thus posing serious problems of consistency between the MM and PB energy estimations (Fogolari et al. 2001, 2003; Lu et al. 2002). A clear discussion about the choice of dielectric constant has been given by Schutz and Warshel (2001). Simonson (1999) has accurately estimated the dielectric constant resulting from relaxation of protein dipoles and net charges. In the protocol presented here, structures are minimized and therefore some relaxation is taken into account. Moreover, we notice that Ramachandran maps generated with the protocol chosen here and using dielectric constant values larger than 1.0 do not show the low energy region corresponding to typical backbone conformations. Thus, we decided to adopt the value of 1.0 for the dielectric constant more consistent with the MM energy estimation and correctly estimating the energy of backbone conformations for model compounds.
The following protocol has been used: First, the linear PB equation is solved on a large grid, with 403 grid points spaced 2.5 Å, centered on the protein to obtain boundary conditions for the following focusing steps. Then the same equation is solved on smaller and finer grids (333 points spaced 0.45 Å) centered on each aminoacid center of geometry. In each of the focusing steps, the potential at each atomic position of the amino acid is stored and later used to estimate the electrostatic free energy of the protein, which is expressed for the linear PB equation as:
![]() |
where qi is the charge of the ith atom and
i is the potential at the same atom. The linear PB equation is appropriate for moderately charged biomolecules (Fogolari et al. 1999). To speed up calculations, the inner dielectric constant was assigned to space regions within van der Waals radii. The solvent-accessible surface was generated by inflating all atomic radii by 1.4 Å and considering the inflated van der Waals surface.
Dielectric boundaries have been smoothed according to the procedure of Davis and McCammon (1991b).
Gnonpolar(
1, . . . ,
n) is taken to be proportional to the SASA A, that is,
Gnonpolar(
1, . . . ,
n) =
A. Although a large range of values have been used in the literature for the surface tension coefficient
, we have used a value of 20 cal mol1 Å2 lower than the 50 cal/mol Å2 value proposed by Nicholls et al. (1991) for proteins. The rationale for this choice is that, in the absence of solvent molecules, unbalanced intramolecular van der Waals forces provide a strong tendency toward collapse anyway. The employed surface tension value has been previously empirically determined by considering change in van der Waals energy upon folding of a small protein (Fogolari et al. 2003).
The colony energy approach
The problem of estimating the entropic term due to the solute is difficult to solve because the number of conformations accessible from each microscopic state depends on the local energy landscape, and this is not easily available. A simple approach is just to neglect entropic effects assuming that for compact structures they should be similar and therefore cancel out in comparison. This approach has been implicitly assumed in many studies. A few other approaches have been presented that deal explicitly with entropy and deserve being mentioned. A simple line of thought is to consider how many rotamers for each torsion angle can be explored in each state and consider the number of microstates compatible with each macrostate simply as the product of the number of rotamers. The resulting entropy is therefore:
![]() |
Typically for side-chain torsion angles upon folding, an entropic restriction of
![]() |
is assumed (Doig and Sternberg 1995).
Kollman et al. (2000) estimated the entropy of each conformation as the entropy computed for the energy minimized conformation in the harmonic approximation of the potential close to the minimum. This procedure faces the drawback of necessary extensive minimization, which is known to introduce artifacts and should employ some approximation of solvation effects.
A different approach, requiring molecular dynamics simulation, has been used by Baginski et al. (1997), who estimated the solute entropy via thermodynamic integration with reference to a model where all atoms are not interacting and harmonically constrained to the initial position.
Vorobjev et al. (1998) estimated solute entropy from the covariance fluctuation matrix computed on a molecular dynamics trajectory. This approach, however, also requires molecular dynamics simulation.
Honig and coworkers (Xiang et al. 2002) adopted a different approach that is particularly useful for free energy estimation of predictive models. The basic idea, applied to loop modeling, is that each conformation is a sample of a colony of similar conformations. If we assume for the time being that all these conformations are reachable within the colony, the free energy corresponding to the colony would be
![]() |
To estimate the contributions of mostly nonsampled conformations to the colony free energy, the same set of models under scrutiny is used. In practice, (1) it is assumed that the set of generated structures samples conformational space in a statistically significant way (hence the requirement of a large number of structures) and that the sum over all conformations defined as macrostate C is restricted to the sampled conformations; and (2) a weight function ranging from 1 (e.g., for identical backbone conformations) to 0 (for unrelated backbone conformations) is used rather than precisely defining which microstates should be included in the sum. The weight function empirically determined by Honig and coworkers (Xiang et al. 2002) takes the general form
![]() |
where i is the index of the reference structure, j is the index of each other sampled conformation, and t is a threshold RMSD that defines the similarity between two conformations. With this weight function the free energy corresponding to the ith conformation is:
![]() |
Honig and coworkers (Xiang et al. 2002) used a value for t tailored to the loop modeling problem, in particular t was chosen equal to
![]() |
where L is the length of the modeled loop, corresponding, for a loop of eight residues, to a value of 3.6 Å. As already noted in the original paper, besides modeling entropic effects, the approach has the further beneficial effect of smoothing the energy versus RMSD plot. This is particularly important because often, due to minor errors in placing side-chain atoms, the computed energy may be very high. If a low energy structure has a significant weight in the sum over other structures, the colony energy of the reference structure will be lower. Smoothing the appearance of the plot allows to immediately pick up the reliability of a free energy estimator. In particular, if the free energy estimator does not consistently assign lower energy to native-like structures, the plot will be smooth, but it will not show the RMSDenergy correlation that is the ultimate goal of all free energy estimators. With these premises how should the parameter t be chosen? We suggest that the choice should roughly match or be less than the attempted accuracy of the prediction. In other words, t should be set equal to or less than the RMSD value we accept for defining two structures similar. For instance, if the goal is abinitio protein prediction, perhaps a value of 4.06.0 Å should be appropriate. For accurate loop modeling, values in the range 1.02.0 Å should be appropriate.
For the sake of clarity let us consider two extreme cases: If a t value of 0 Å is chosen, then the weight is 0.0 for all conformations except for the reference conformation i, which has a weight of 1.0. The colony approach in this case is simply not applied to the data. If a very large t value is chosen, all conformations will receive the same weight close to 1.0 and all conformations will have the same colony energy. In this case the plot will display a single horizontal line.
1igd octamer decoy generation
The 1igd octamer decoys were generated with a fast divide-and-conquer method (Tosatto et al. 2002). A sliding window running from the N to the C terminus was placed on protein G immunoglobulin binding domain (PDB code: 1igd
[PDB]
), a small 61-residue protein containing both
-helices and
-strands. All eight residue fragments under the sliding window were individually predicted using the fast loop modeling method. A total of 500 conformations were constructed and ranked according to coarse geometric and energy criteria.
The top 20 solutions were retained as part of the decoy set. Since the loop modeling method does not predict the conformation of side-chain atoms, these were predicted using the SCWRL3 method (Canutescu et al. 2003). The whole procedure guarantees the generation of an ensemble of decoys that differ from the native structure in a single eight-residue fragment. Because of the selection process, all decoys are also ensured to have at least acceptable backbone geometry and interaction energies.
|
| Acknowledgments |
|---|
| References |
|---|
|
|
|---|
Anfinsen, C.B. 1973. Principles that govern the folding of protein chains. Science 181: 23230.
Baginski, M., Fogolari, F., and Briggs, J.M. 1997. Electrostatic and non-electrostatic contributions to the binding free energies of anthracycline antibiotics to DNA. J. Mol. Biol. 274: 253267.[CrossRef][Medline]
Bashford, D. and Case, D.A. 2000. Generalized born models of macromolecular solvation effects. Ann. Rev. Phys. Chem. 51: 129152.[CrossRef][Medline]
Berrera, M., Molinari, H., and Fogolari, F. 2003. Amino acid empirical contact energy definitions for fold recognition in the space of contact maps. BMC Bioinformatics 4: 8.[CrossRef][Medline]
Blokzijl, W. and Engberts, J.B.F.N. 1993. Hydrophobic effects. Opinions and facts. Angew. Chem. Int. Ed. Engl. 32: 15451579.[CrossRef]
Bonneau, R. and Baker, D. 2001. Ab initio protein structure prediction: Progress and prospects. Ann. Rev. Biophys. Biomol. Struct. 30: 173189.[CrossRef][Medline]
Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., and Karplus, M. 1983. Charmm: A program for macromolecular energy minimization and dynamics calculations. J. Comput. Chem. 4: 187217.
Canutescu, A.A., Shelenkov, A.A., and Dunbrack, R.L. 2003. A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci. 12: 20012014.
Davis, M.E. and McCammon, J.A. 1991a. Electrostatic in biomolecular structure and dynamics. Chem. Rev. 90: 509521.
. 1991b. Dielectric boundary smoothing in finite difference solutions of the Poisson equation: An approach to improve accuracy and convergence. J. Comp. Chem. 12: 909912.[CrossRef]
de Bakker, P.I.W., De Pristo, M.A., Burke, D.F., and Blundell, T.L. 2003. Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the amber force field with the generalized born solvation model. Proteins 51: 2140.[CrossRef][Medline]
Doig, A.J. and Sternberg, M.J. 1995. Side-chain conformational entropy in protein folding. Protein Sci. 4: 22472251.[Abstract]
Dominy, B.N. and Brooks III, C.L. 2002. Identifying native-like protein structures using physics-based potentials. J. Comput. Chem. 23: 147160.[CrossRef][Medline]
Eisenberg, D. and McLachlan, A.D. 1986. Solvation energy in protein folding and binding. Nature 319: 199203.[CrossRef][Medline]
Feig, M. and Brooks III, C.L. 2002. Evaluating casp4 predictions with physical energy functions. Proteins 49: 232245.[CrossRef][Medline]
Finkelstein, A.V. and Janin, J. 1989. The price of lost freedom: Entropy of bimolecular complex formation. Protein Eng. 3: 13.
Fogolari, F. and Briggs, J.M. 1997. On the variational approach to the Poisson-Boltzmann free energies. Chem. Phys. Lett. 281: 135139.[CrossRef]
Fogolari, F., Zuccato, P., Esposito, G., and Viglino, P. 1999. Biomolecular electrostatics with the linearized Poisson-Boltzmann equation. Biophys. J. 76: 116.
Fogolari, F., Esposito, G., Viglino, P., and Molinari, H. 2001. Molecular mechanics and dynamics of biomolecules using a solvent continuum model. J. Comput. Chem. 22: 18301842.[CrossRef][Medline]
Fogolari, F., Brigo, A., and Molinari, H. 2002. The Poisson-Boltzmann equation for biomolecular electrostatics: A tool for structural biology. J. Mol. Recogn. 15: 377392.[CrossRef][Medline]
. 2003. Protocol for MM/PBSA molecular dynamics simulations of proteins. Biophys. J. 85: 159166.
Fraternali, F. and Van Gunsteren, W.F. 1996. An efficient mean solvation force model for use in molecular dynamics simulations of proteins in aqueous solution. J. Mol. Biol. 256: 939948.[CrossRef][Medline]
Gatchell, D.W., Dennis, S., and Vajda, S. 2000. Discrimination of near native protein structures from misfolded models by empirical free energy functions. Proteins 41: 518534.[CrossRef][Medline]
Gilson, M.K., Given, J.A., Bush, B.L., and McCammon, J.A. 1997. The statistical-thermodynamic basis for computation of binding affinities: A critical review. Biophys J. 72: 10471069.
Hill, T. 1956. An introduction to statistical mechanics. Dover Publications, New York.
Holm, L. and Sander, C. 1992. Evaluation of protein models by atomic solvation preference. J. Mol. Biol. 225: 93105.[CrossRef][Medline]
Honig, B. and Nicholls, A. 1995. Classical electrostatic in biology and chemistry. Science 268: 11441149.
Honig, B. and Yang, A.-S. 1995. Free energy balance in protein folding. Adv. Prot. Chem. 46: 2758.[Medline]
Honig, B., Sharp, K., and Yang, A.-S. 1993. Macroscopic models of aqueous solution: Biological and chemical applications. J. Phys. Chem. 97: 11011109.[CrossRef]
Hsieh, M.-J. and Luo, R. 2004. Physical scoring function based on amber force field and Poisson-Boltzmann implicit solvent for protein structure prediction. Proteins 56:475486.[CrossRef][Medline]
Kollman, P.A., Massova, I., Reyes, C., Kuhn, B., Huo, S., Chong, L., Lee, M., Duan, Y., Wang, W., Donini, O., et al. 2000. Calculating structures and free energies of complex molecules: Combining molecular mechanics and continuum models. Acc. Chem. Res. 33: 889897.[CrossRef][Medline]
Lazaridis, T. and Karplus, M. 2000. Effective energy functions for protein structure prediction. Curr. Op. Struct. Biol. 10: 139145.[CrossRef][Medline]
Lee, M.R. and Kollman, P.A. 2001. Free-energy calculations highlight differences in accuracy between x-ray and NMR structures and add value to protein structure prediction. Structure 9: 905916.[Medline]
Lee, M.R., Baker, D., and Kollman, P.A. 2001. 2.1 and 1.8 Å average c
RMSD structure predictions on two small proteins, hp-36 and s15. J. Am. Chem. Soc. 123: 10401046.[CrossRef][Medline]
Levy, R.M., Zhang, L.Y., Gallicchio, E., and Felts, A.K. 2003. On the nonpolar hydration free energy of proteins: Surface area and continuum solvent models for the solute-solvent interaction energy. J. Am. Chem. Soc. 125: 95239530.[CrossRef][Medline]
Lu, B.Z., Chen, W.Z., Wang, C.X., and Xu, X.-J. 2002. Protein molecular dynamics with electrostatic force entirely determined by a single Poisson-Boltzmann calculation. Proteins 48: 497504.[CrossRef][Medline]
MacKerell Jr., A.D., Bashford, D., Bellott, M., Dunbrack Jr., R.L., Evanseck, J.D., Field, M.J., Fischer, S., Gao, J., Guo, H., Ha, S., et al. 1998. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 102: 35863616.[CrossRef]
Madura, J.D., Davis, M.E., Gilson, M.K., Wade, R., Luty, B.A, and McCammon, J.A. 1994. Biological applications of electrostatics calculations and Brownian dynamics simulations. Rev. Comp. Chem. 5: 229267.
Madura, J.D., Briggs, J.M., Wade, R., Davis, M.E., Luty, B.A., Ilin, A., Antosiewicz, J.A., Gilson, M.K., Bagheri, B., Ridgway Scott, L., et al. 1995. Electrostatics and diffusion of molecules in solution: Simulations with the University of Houston Brownian Dynamics program. Comput. Commun. Phys. 91: 5795.[CrossRef]
Marcus, R.A. 1955. Calculation of thermodynamic properties of polyelectrolytes. J. Chem. Phys. 23: 10571068.[CrossRef]
Nicholls, A., Sharp, K.A., and Honig, B. 1991. Protein folding and association: Insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins 11: 281296.[CrossRef][Medline]
Novotny, J., Bruccoleri, R., and Karplus, M. 1984. An analysis of incorrectly folded protein models. Implications for structure predictions. J. Mol. Biol. 177: 787818.[CrossRef][Medline]
Park, B. and Levitt, M. 1996. Energy functions that discriminate x-ray and near-native folds from well-constructed decoys. J. Mol. Biol. 258: 367392.[CrossRef][Medline]
Petrey, D. and Honig, B. 2000. Free energy determinants of tertiary structure and the evaluation of protein models. Protein Sci. 9: 21812191.[Abstract]
Qiu, D., Shenkin, P., Hollinger, F., and Still, W. 1997. The GB/SA continuum model for solvation. A fast analytical method for the calculation of approximate born radii. J. Phys. Chem. 101: 30053014.
Roux, B. and Simonson, T. 1999. Implicit solvent models. Biophys. Chem. 78: 120.
Samudrala, R. and Levitt, M. 2000. Decoys r us: A database of incorrect protein conformations to improve protein structure prediction. Protein Sci. 9:13991401.[Abstract]
Samudrala, R. and Moult, J. 1998. A graph-theoretic algorithm for comparative modelling of protein structure. J. Mol. Biol. 279: 287302.[CrossRef][Medline]
Schapira, M., Totrov, M., and Abagyan, R. 1999. Prediction of the binding energy for small molecules, peptides, and proteins. J. Mol. Recogn. 12: 177190.[CrossRef][Medline]
Schutz, C.N. and Warshel, A. 2001. What are the dielectric "constants" of proteins and how to validate electrostatic models? Proteins 44: 400417.[CrossRef][Medline]
Sharp, K.A., and Honig, B. 1990. Calculating total electrostatic energies with the non-linear Poisson-Boltzmann equation. J. Phys. Chem. 94: 76847692.[CrossRef]