|
|
||||||||
1 Department of Chemistry, and W.M. Keck Center for Computational and Structural Biology, Rice University, Houston, Texas 77005, USA
2 Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA
3 Department of Physics and Astronomy, University of British Columbia, Vancouver, BC V6T1Z1, Canada
Reprint requests to: Cecilia Clementi, Department of Chemistry, Rice University, 6100 Main Street, Houston, TX 77005, USA; e-mail: cecilia{at}rice.edu; fax: (713) 348-5155; or Steven S. Plotkin, Department of Physics and Astronomy, University of British Columbia, 6224 Agricultural Road, Vancouver, BC V6T1Z1, Canada; e-mail: steve{at}physics.ubc.ca; fax: (604) 822-5324.
(RECEIVED December 17, 2003; FINAL REVISION March 22, 2004; ACCEPTED March 25, 2004)
| Abstract |
|---|
|
|
|---|
model of the src-SH3 domain. The strong agreement between results from simulations and theory confirm the nontrivial result that a relatively small amount of nonnative interaction energy can actually assist the folding to the native structure. Keywords: protein folding; frustration; free energy landscape; folding rate; minimalist model; molecular dynamics
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.03580104.
Supplemental material: see www.proteinscience.org
| Introduction |
|---|
|
|
|---|
Theoretical and computational studies have recently achieved noticeable success in reproducing various features of the folding mechanisms of several small- to medium-sized fast-folding proteins (see, e.g., Shoemaker et al. 1999; Sorenson and Head-Gordon 2000 Sorenson and Head-Gordon 2002; Shea and Brooks III 2001; Shimada and Shakhnovich 2002; Clementi et al. 2003; Karanicolas and Brooks 2003; Kaya and Chan 2003); at the same time, the improved spatial and temporal resolution of recent experimental techniques is now allowing researchers to combine theoretical and experimental data to give a more robust characterization of the folding free energy landscape (Lapidus et al. 2000; Ervin and Gruebele 2002; Schuler et al. 2002; Snow et al. 2002; Kubelka et al. 2003; Pande 2003). However, in spite of these recent successes, a microscopically detailed observation of the individual conformational motions that occur during folding remains elusive. Knowledge of the time-dependence of every degree of freedom in the system is, however, not of inherent interest, because no additional insight to the underlying physics of the folding process is gained from this information by itself. Nor is any particular degree of freedom especially important to folding, because the transition involves the cooperation of many weakly (noncovalently) interacting constituents. For these reasons, a statistical description of the process of folding, in terms of the behavior of an ensemble of systems, is appropriate for distinguishing general (self-averaging) properties from sequence-specific ones (Bryngelson et al. 1995). The characterization of the folding process in statistical mechanical terms can pinpoint crucial questions that may be computationally or experimentally addressed in more detail.
The idea of considering ensemble properties to characterize the folding landscape underpinned studies of the transition state and folding mechanism as arising from the native-state topology (Shoemaker et al. 1997; Alm and Baker 1999; Galzitskaya and Finkelstein 1999; Munoz and Eaton 1999; Clementi et al. 2000a, b, 2001, 2003; Shea et al. 2000). As a general rule, the transition-state structure does not differ dramatically between homologous proteins (Baker 2000; Plaxco et al. 2000a; Gunasekaran et al. 2001), and any exceptions are fairly readily explained (Ferguson et al. 1999; Kim et al. 2000). Consistent with the above-mentioned notion of self-averaging, folding rates of homologous proteins are seldom seen to differ by more than an order of magnitude when tuned to the same stability (Mines et al. 1996; Plaxco et al. 2000b). This indicates that the folding free energy barrier is not particularly sensitive to the details of sequences folding to a given native structure, but depends rather on more general features of that ensemble of sequences, including the kinetic accessibility of that native structure. In this sense, the topology of the native structure largely determines the folding free energy barrier for those homologous sequences (Plaxco et al. 2000b).
These ideas motivated many studies of folding rates and mechanisms using so-called G
models (Ueda et al. 1975), which neglect interactions not present in the native state. In these studies, the possibility of structure prediction is traded for the possibility of rate and mechanism prediction. Moreover, because of the robustness of rate and mechanism for homologous proteins, the coarse graining of the G
model (i.e., removing the molecular details of side chains and solvent) is often assumed a reasonable approximation.
Topology-based approaches seek to predict mechanism by calculating
-values (Fersht et al. 1986; Matouschek et al. 1990) or analogous quantities, which in an accurate theory give values that correlate with experiment for the measured cases. Occasionally one finds residues whose
-values are negative. This is most likely caused by the presence of nonnative contacts that stabilize the transition state, but cannot be present in the native state. The presence of nonnative interactions in the transition state is supported by all-atom simulations using a Charmm-based effective energy function, where it was found that ~20%25% of the energy in the transition state arose from nonnative contacts (Paci et al. 2002).
Hence, for a more realistic protein potential energy function, nonnative interactions must be taken into account. In this paper, we analyze the effect of increasing the strength of nonnative interactions on the folding rate as well as the free energy barrier. Nonnative interactions are introduced as additional contacts between pairs of residues not in contact in the native structure, which are allowed to have a nonzero mean and a nonzero variance. The nonnative interactions are added perturbatively to the G
model: All nonnative contacts are given a random energy with mean
NN and a variance b2, which is progressively increased to examine more frustrated proteins, while the native contact energies are all held fixed to the same number. The limiting case of
NN = 0 and b = 0 corresponds to the plain G
model. This procedure essentially preserves the stability of the native state, where approximately no nonnative interactions are present. However, the stability of the unfolded state is lowered (as shown below).
At first glance, one would expect that introducing progressively larger nonnative contact energies to an otherwise energetically unfrustrated G
protein would slow the folding rate, for straightforward reasons: It would seem that "noise" in the system would make the native basin harder to recognize. One might argue by analogy that it is easier to read a page of text without random misspellings. However, the folding rate has been predicted to initially increase under the introduction of weak, nonnative interactions, added as a perturbation to the strong native interactions driving folding (Plotkin 2001). This was a fold-independent result derived from general principles of energy landscape theory. This prediction was subsequently verified in simulations of a 36-mer lattice model (Fan et al. 2002), as well as off-lattice molecular dynamics simulations of Crambin, in which attractive nonnative contacts were successively added (Cieplak and Hoang 2002). Independently, it was found that nonnative interactions were present in the transition state of a 28-mer lattice-model protein with side chains, and increased the folding rate when strengthened (Li et al. 2000). Similar observations were also seen in two-dimensional 24-mer lattice models (Treptow et al. 2002). A different computational study on a 36-mer lattice-model protein found that at the temperature of fastest folding in simulation models, the folding rate monotonically decreases with increasing ruggedness (Fan et al. 2002; the temperature of fastest folding, of course, varies with the ruggedness). However, this typically barrierless regime is rarely seen in the laboratory (Gruebele 1999; Sabelko et al. 1999).
The prediction that strengthening nonnative interactions that were initially weak would accelerate folding is also consistent with experimental observations that strengthening nonspecific hydrophobic stabilization in the
-spectrin Src homology 3 (SH3) domain sped up folding (and unfolding) for that protein (Viguera et al. 2002). This result was significantly nontrivial, to the extent that the experimental observation was originally interpreted (mistakenly) as evidence against the energy landscape theory.
In this paper, we test this prediction with simulations of a coarse-grained protein model, by using an off-lattice C
model (see e.g., Honeycutt and Thirumalai 1992; Clementi et al. 2000b) of the SH3 domain of src tyrosine-protein kinase (src SH3). We use a Hamiltonian function that has tunable amounts of nonnative energy (see Supplemental Material for details). The results from simulations are compared with the predictions of an improved version of the existing theory (Plotkin 2001). The theory is improved by introducing a finite-size treatment of packing fraction as a function of polymer length, which takes better account of the polymer physics involved in collapse as folding progresses. Moreover, the previous study treated the rate enhancement at fixed stability. Here, we show a perhaps even less intuitive result, namely, that the rate enhancement can happen at fixed temperature, and we derive the conditions required for this to happen.
As the strength of nonnative interactions is increased to larger values, we find that eventually the folding rate decreases drastically, as expected. In the limit of large non-native contact energies, the chain behaves like a random heteropolymer, having misfolded structures more stable than the native state.
The folding mechanism is also nontrivially affected by the introduction of nonnative interactions. In this regard, the analysis of the robustness of the folding mechanism against an increasingly strong perturbation on the non-native interactions can provide a critical assessment on the validity of unfrustrated protein models for the prediction of folding mechanism, for different protein topologies. This analysis goes beyond the scope of the present paper, and it will be addressed separately (S.S. Plotkin and C. Clementi, unpubl.).
The paper is organized as follows: In the next section, we present the theory. After presenting the general ideas and overall strategy, we discuss in detail how an explicit expression for the conformational entropy can be obtained in terms of the packing fraction. We then use this result to show how the thermodynamic free energy barrier is lowered by the presence of nonnative interactions. In the third section, we test the theoretical predictions with direct simulation of the src-SH3 domain. We first compare the definition of reaction coordinates and the relative approximations of theory and simulations; thermodynamic and kinetic quantities obtained from simulations are then quantitatively compared with the corresponding theoretical predictions.
The strong agreement between results from simulations and theory confirms the nontrivial result that a relatively small amount of nonnative interaction energy can actually assist the folding to the native structure.
| Theory of folding with nonnative interactions |
|---|
|
|
|---|
There are several relevant energy and entropy scales governing the thermodynamics of folding. Let the energy of the native structure be given by EN. Let the total number of contact interactions in a fully collapsed polymer globule be given by M. Asymptotically, M scales like the total number of residues in the chain, N, essentially because surface terms are negligible compared with the bulk. However, for a finite-size system, the mean number of contacts per residue (native or nonnative), that is, the coordination number z, is itself a function of N. We can write the native energy as
![]() | (1) |
where
is then defined as the mean native attraction energy
(
< 0); that is, the native state is assumed to be fully collapsed with the maximal number of contacts, and this is the maximal number of total contacts of a fully collapsed polymer globule. We neglect here the separate effects that arise from the variance in the native interaction energies: 
2 = 0
Let the conformational entropy of an ensemble of polymer structures characterized by the order parameters Q and A be given by Sc(Q,A). We can write the entropy in terms of the entropy per residue Sc(Q,A) as:
![]() | (2) |
In addition to the energy scales
and 
2 governing native contacts, there are also two energy scales governing nonnative interactions. One is the mean energy of a nonnative interaction
NN, and the other is the energetic variance of nonnative interactions b2. We keep both of these terms, as they enter the analysis on essentially the same footing. For configurations with MA nonnative contacts, the total nonnative energy is taken to be Gaussianly distributed with mean MA
NN and variance MAb2. Both of these terms contribute to the overall ruggedness of the energy landscape by favoring nonnative configurations.
The strength of nonnative interactions is taken to be weak, so that
![]() | (3a) |
![]() | (3b) |
are both satisfied. Condition 3a implies that the ratio of the folding transition temperature TF to the thermodynamic glass temperature TG is large (Goldstein et al. 1992),
![]() | (4) |
that is, the proteins we consider are strongly (but not infinitely) unfrustratedwe are perturbing away from the G
model. Condition 3b implies that collapse and folding occur concurrently (Klimov and Thirumalai 1996), that is,
![]() | (5) |
where T
is the temperature below which nonnative states tend to be collapsed. For a given choice of nonnative interaction energies, the energies of configurations for the ensemble of states characterized by (Q,A) is assumed Gaussianly distributed with a mean of QM
+ AM
NN and a variance of AMb2. Then the extensive part of the log number of states having energy E and order parameters (Q,A) is given by:
![]() | (6) |
From the definition of equilibrium temperature T1 =
S/
E, one can then find the thermal energy, entropy, and free energy, which are given by (in units where kB = 1):
![]() | (7a) |
![]() | (7b) |
![]() | (7c) |
These expressions can be understood straightforwardly. In the absence of nonnative interactions (
NN = b = 0), the thermal energy is just the energy of native contacts times the number of native contacts, and the entropy is just the configurational entropy. When nonnative energies are present, just as
couples the order parameter Q, so does
NN couple the order parameter A. When nonnative energies have a variance, the lower energy conformations (with stronger nonnative contacts) tend to be thermally occupied. This is why
NN and -b2/T enter on the same footing in the energy. The fact that the system spends more time in fewer states means that the thermal entropy is reduced. However, the entropy (times temperature) is only reduced by half as much as the energy, so that there is a residual contribution to the free energy E - TS due to the variance of nonnative interactions.
A plot of the free energy at the folding temperature of the G
model TF° as a function of (Q,A) is shown in the first row of Figure 1
, for equation 7c together with the analytical model of the configurational entropy SC(Q,A) described below.
|
model, for several different values of b indicated.
Conformational entropy in terms of packing fraction
The fraction of nonnative contacts A is not independent of Q. As more native interactions are present, less nonnative interactions are allowable, and eventually there can be no nonnative contacts in the native structure. Previous studies that investigated the folding rate at fixed stability have explicitly included this Q-dependence in equation 7c (Plotkin 2001). Here, our intention is to plot the folding rate at fixed temperature rather than at fixed stability. For this purpose, it is formally more convenient to keep this Q-dependence implicit in A. Again this manifests itself only as a region of allowed values of (Q,A), which can be seen in Figure 1
.
The entropy loss due to native contacts is of a different functional form than the entropy loss due to nonnative contacts. The entropy loss caused by native contacts arises from a specific set of polymeric constraints. The entropy loss caused by nonnative contact formation arises from an increase in polymer density, a nonspecific constraint. There are many collapsed unfolded states with nonnative interactions present, but only one folded state (neglecting the much smaller entropy caused by native conformational fluctuations).
We note that the conformational entropy SC(Q,A) takes into account the extent to which polymer configurations tend to have residue pairs in proximity, such that if they interacted, that interaction would be considered a nonnative contact. However, the strength of the typical nonnative interaction (~
NN ± b) is controlled by two free parameters in the theory. When both
NN and b are set to zero, the thermal entropy reduces to that of the putative G
model, with the configurational entropy SC(Q,A) remaining unchanged.
The A-dependence in SC(Q,A) is related to the physics of collapse, because at a given value of Q, the fraction of nonnative contacts A depends on the packing fraction
of nonnative polymer. When MQ native contacts are present, AMAX
M(1 - Q) nonnative contacts are allowable, and AMAX nonnative contacts are present when
= 1.
As detailed in the Supplemental Material, a mean field approximation allows one to estimate the conformational entropy Sc(Q,
) of a disordered polymer at Q with packing fraction
as:
|
| (8) |
Here
(Q) =
(Q)1/2 = [nL(Q)/N(1 - Q)]1/2, where
is the mean loop length formed by native contacts at Q (see equation A.14 in the Supplemental Material), and nL(Q) is the total number of loops at Q (see equation A.15 in the Supplemental Material). In equation 8, the quantity in curly brackets is the entropy per residue for the remaining disordered polymer at Q.
Figure 2
shows a plot of the entropy per disordered residue at Q, Snn(Q,
) = S(Q,
)/N(1 - Q), as a function of
, for various values of Q. This shows that the nonnative polymer density where most of the states are [where Snn(
) is maximal] is an increasing function of nativeness, Q.
|
![]() | (9) |
The entropy per residue sc(Q,A) in equation 7b is then obtained from equation 8 using
![]() | (10) |
(see equation A.3 in the Supplemental Material). The free energy surface on which dynamics occurs can then be obtained from equation 7c, and is plotted in the first row of Figure 1
. This is the reaction surface for the coordinates (Q,A).
Effect of nonnative interactions on the free energy barrier and folding rate
In the G
model, nonnative contacts are given coupling energies of zero. The G
folding temperature TF° is taken to be the temperature at which the unfolded and folded thermodynamic states have equal probability. This is given through equation 7c when F(0,A)
F(1,0) and
NN = b2 = 0. We are taking Q
0 in the unfolded state and A = 0 in the folded state (see Fig. 1
). This yields a G
folding temperature of
![]() | (11) |
where A*(Q) is the most probable value of A at a given Q, as determined below.
When considering the simulation data, the folding temperature is taken to be the temperature in the G
model at which the unfolded and folded thermodynamic minima have equal free energies (these minima need not be precisely at Q = 0 and Q = 1).
The most probable value of A at a given Q for a protein in thermal equilibrium, A*(Q), is obtained from:
![]() | (12) |
Using equations 7c and 8 this gives
![]() | (13) |
where
*(Q) is the most probable packing fraction at a given value of Q.
Using the following definitions,
![]() |
![]() |
the minimal free energy at Q, F(Q,A*(Q)), relative to the minimal free energy F(0,A*(0)) in the unfolded state, is obtained from equation 7c:
![]() | (14) |
![]() | (15) |
With the temperature set to the G
transition temperature TF°, the first term in brackets in equation 15 vanishes. The free energy barrier (over TF°) at the G
transition temperature can then be written as
![]() | (16) |
where
F°
is the barrier height at TF° with
NN = b2 = 0, that is, the putative G
barrier height, and is given by
![]() | (17) |
where the saddle point is located at (Q
,A
A*(Q
)). Note that
snn(Q) < 0 because disordered polymer dressing larger native cores is more collapsed than that for smaller native cores. One can see that the barriers scale extensively as a result of the mean field approximations made above.
Thus we see from equation 16 that the folding barrier lowers with increasing nonnative interaction strength, namely, if
NN < 0 (b2 > 0 always), so long as
A*(Q
)
A
> 0. We therefore investigate the conditions for which
A
> 0.
From equation A.1 in the Supplemental Material, the condition
A
> 0 is equivalent to
![]() | (18) |
where
* is determined from equation 13.
We are interested in the effect on the barrier when non-native interactions are imagined to initially increase from zero. For
NN, b
0, the most probable packing fraction is interpreted geometrically through equation 13 as the value of
where the entropy per disordered residue is maximal, that is, the maximum of the curves in Figure 2
. When
NN < 0 and/or b > 0, 3* is determined as the value of 3 slightly to the right of the maximum in the curves in Figure 2
. The most probable packing fraction as a function of Q is plotted in Figure 3
.
|
*(Q) is certainly a monotonically increasing function of Q as can be seen from Figure 3
(Q) by itself (Plotkin 2001).
The derivative of snn in equation 13 can be straightforwardly determined from equation 8, and equation 13 then becomes a nonlinear equation for
*(Q) that can be solved numerically. The result is shown in Figure 3
. The packing fraction increases as the length of disordered loops becomes shorter (inset of Fig. 3
), and thus increases monotonically with nativeness Q.
Once
*(Q) is known,
A*(Q) can be obtained from equation 18. This determines the trend in the barrier height in equation 16. A plot of
A*(Q) is shown in Figure 4
. We can see that if the barrier position Q
resides in a window of Q where
A(Q) > 0, the barrier decreases with increasing nonnative interaction strength, for weak nonnative interactions. Otherwise, the barrier increases with increasing nonnative interaction strength.
|
![]() | (19) |
Increasing the strength of nonnative interactions slows the prefactor k0, owing to the effects of transient trapping. However, as
NN and b are increased from zero, this slowing effect on k0 does not become significant until a nonzero characteristic value, which would indicate the onset of a dynamic glass transition in an infinite-sized system (see Takada et al. 1997; Wang et al. 1997; Eastwood and Wolynes 2001; Plotkin 2001 for more detailed treatments of this effect). In a finite system, the activation time ~k0 1 increases dramatically but only when b bA or
NN >
ANN. The values of the energy scales bA and 
are of order T, thus there is a fairly large window upon increasing b,
NN from zero where the prefactor k0 is unaffected to the first approximation. In this regime, the effects on rate are governed solely by the effects on barrier height. Hence, the decrease in barrier height shown above as
NN, b are increased from zero may be associated with an increase in folding rate.
In the next section we test the theoretical prediction directly with simulations of a model protein. The upshot is shown in Figure 10
, B and C (below), which shows, indeed, an increase in folding rate with increasing nonnative interaction strength.
|
| Comparison between theoretical prediction and simulation results |
|---|
|
|
|---|
model increasingly perturbed by the addition of nonnative interactions (see Supplemental Material for details on simulation). A close and quantitative comparison of the results from theory and simulations is possible if corresponding thermodynamic quantities and reaction coordinates are identified. For this purpose, before we proceed to test the prediction on rate enhancement, three main points of the theory have to be examined in comparison with the results from simulations:
These points are clearly interconnected, and all affect the detailed shape of the free energy landscape, the value of the folding temperature, and the identification of the folded, unfolded, and transition state ensembles. We expect that the assumptions we have made in the analytical theory do not qualitatively change the theoretical predictions; nevertheless, a careful dissection of the basic ingredients we have used is needed for a quantitative assessment of the results.
In the following, we discuss in detail each of the points above. Unless otherwise specified, the following results are all obtained from simulations at the G
folding temperature Tf°, for all values of b/
.
Definition of reaction coordinates
The reaction coordinate Q, defined as the fraction of native contacts formed in a given protein configuration, is readily associated to configurations sampled by simulations (see Supplemental Material). More care has to be used in transposing the other reaction coordinate we have used in the theory, A (defined as the fraction of nonnative contacts formed), to the analysis of simulations data. In the analytical theory, we have assumed that the maximum number of nonnative contacts that can be formed at a certain stage of the folding reaction does not depend on the perturbation strength, and is a function of the degree of nativeness, Q, that is, Amax(Q) = 1 - Q,
b (see equation A.1 in the Supplemental Material). This implies that no nonnative contacts can be formed in the native state (Amax ~ 0 if Q ~ 1), and vice versa (Amax ~ 1 if Q ~ 0). This assumption in the theory allows us to simplify the analytical calculations but does not qualitatively affect the results. The dependence on Q of the maximum number of nonnative contacts can be directly checked in simulations. In this regard, an important difference between theory and simulations is that a certain number (typically ~5) of nonnative contacts can be accommodated in a protein configuration with Q ~ 1 and minimal (<1 Å) RMSD from the PDB native structure. The increased number of contacts around the native configurations arises mainly from the fact that native or nonnative contacts are considered formed in a small but finite length range (typically ~1 Å) around the minimum of the interaction potential. This leads to probable formation of some nonnative contacts as the protein undergoes fluctuations around the native state.
Figure 5
shows that a subset of six nonnative contacts is formed with probability >0.25 in the native state ensemble for b/
= 1.3. Similar results are obtained for all values of b/
used in this study, although the particular set of nonnative contacts formed in each case depends on the choice of nonnative interactions (data not shown).
|
-like Hamiltonian. In fact, contacts that can be made in the native state are not competing against the formation of the native structure; rather, they are assisting it. To remove this effect, we introduce a new reaction coordinate A', defined as the fraction of nonnative contacts formed, restricting the list of nonnative interactions only to the ones with a probability of contact formation in the native state ensemble smaller than a cutoff value pc. The native ensemble for each sequence is identified as all configurations with Q > 0.9 sampled in simulation for that sequence. The results presented in the following are all obtained with a probability cutoff pc = 0.1. Smaller values of pc yield essentially the same results. The reaction coordinate A' is then used in this study to compare results from simulation with the theoretical predictions.
Another approximation that can be directly checked in simulation is on the maximum number of nonnative contacts that can be formed at different stages of the folding reaction. In the analytical theory, the fraction of nonnative contacts, A, is a function of the fraction of native contacts formed in a configuration, Q, and of the packing fraction
of the nonnative part of the protein: MA =
(1 - Q)M (see equation A.1 in the Supplemental Material), with 0
1,
Q. The maximum number of nonnative contacts is then AmaxM = (1 - Q)M, and the maximum total fraction of all contacts (native and nonnative) is (A + Q)max = 1,
Q. Indeed, the maximum number of all contacts (both native and nonnative) recorded in simulations is close to the number of native contacts formed in the native state, that is, M(Q + A')max
M, for all values of the parameter b examined in this study (see Fig. 6A
). Figure 6B
shows the behavior of the average number of nonnative contacts formed in simulation (both coordinates A and A' are plotted), as a function ofQ, for a perturbation b/
= 0.5 (right panel), and the value of Q corresponding to the maximum of
A'
(the corresponding Q for the uncorrected coordinate
A
is also shown). Interestingly, the peak in the average number of nonnative contacts is detected for a value of Q corresponding to a pretransition state stage of the folding. A pre-TS peak is observed in both theory and simulations, although in the theory it is closer to the unfolded state than what is detected in simulations (see Figs. 6B
and 7A
).
|
|
, it is clear from Figures 6A
yield a larger number of nonnative contacts formed, particularly in the unfolded ensemble. Interestingly, however, the number of nonnative interactions rapidly decreases to zero in regions with very small Q. The cause of this effect is not contained in the analytical expressions 7c, where it is assumed that Amax = 1 - Q. This result is caused partly by coupling between nonnative contacts and the angle and dihedral terms in the simulation Hamiltonian (which are not present in the theory). This is a finite size effect that tends to increase the polymer stiffness relative to that in the theory, which used a bulk approximation for thermodynamic quantities. Compact states with
~ 1 in which only nonnative interactions are present have large energetic cost and are formed very rarely. Another source of this effect is that forming collapsed conformations induces some native contacts to be formed, owing to the finite range of interactions. This effect is particularly important for short-range contacts among residues closely separated in sequence, and does not necessarily go away as one considers larger-size systems. This is the complementary effect to the already mentioned fact that in simulations nonnative interactions are formed in the native state (that has led us to a redefinition of the simulation reaction coordinate A').
To quantify this effect, we have generated a large (50) set of nonnative energy distributions with high and very high variance (b/
2 and b/
>> 2). Sequences with these high values of b/
are not able to fold to the selected native structure, but are useful to explore the region of the configurational space corresponding to compact structures with the maximum number of nonnative contacts formed. We expect the glass temperature of these sequences to be higher than their folding temperature (see the next section). After an initialization at very high temperature (T >> Tf°), a large number (>1000) of quenching simulations (T << Tf°) has been performed for each sequence to generate a representative sample of compact misfolded structures. The maximum fraction of nonnative contacts that can be formed, A'max, is thus defined as the largest values of A' among the vast pool of structures obtained by adding the results from the quenching simulations for high b/
values to all configurations collected in simulations at any temperature and for any value of b/
used in this study. Figure 7B
shows the behavior of A'max as a function of the fraction of native contacts present in the structures. The theoretical assumption on the maximum fraction of nonnative contacts Amax = 1 - Q holds remarkably well up to the values of Q
0.15 that correspond to the unfolded state minimum in the free energy landscape (see Fig. 1
). From these results, we then expect the unfolded region of a free energy landscape associated with the simulated protein Hamiltonian to be somewhat compressed toward smaller values of A with respect to the theoretical prediction.
Energy, entropy, and free energy landscape
Figure 1
presents the energy, entropy, and free energy profiles obtained from simulations, as a function of the reaction coordinates Q and A', for three different values of the perturbation parameter b/
. The corresponding quantities obtained from the analytical theory, with all the parameters set equal to the simulations parameters (i.e., rightmost column in Table 1
) and b = 0.3
, is also shown for comparison. For a more direct comparison with the results from simulations, the thermodynamic quantities from theory are only plotted in regions populated with probability >2 x 107, as we have typical samplings of ~5 x 106 configurations in folding/unfolding simulations. It is apparent from Figure 1
that the region of the (Q,A') space populated with high probability in simulations differs somewhat from the (Q,A) region predicted by theory. Several factors are responsible for this difference and have to be considered before one tests the predictive power of the theory with the simulation results:
|
), and shifts the most populated regions of the folding landscape toward lower values of A. This effect also accounts for most of the differences in the energy landscape between theory and simulation results (see Fig. 1
Overall, the destabilization of the folding free energy landscape upon introduction of nonnative energy perturbation is strongly reduced in simulations with respect to what is predicted by the theory. In fact, although in the theory a perturbation of b/
~ 1 renders a protein unfoldable (i.e., Tf / Tg ~ 1; see equation 21 and Plotkin 2001), it is found in simulations that all sequences generated with a perturbation parameter b/
1.7 (entering the Hamiltonian, equation D.33 in the Supplemental Material) are able to reversibly fold/unfold at the G
transition temperature Tf°. The next section quantifies this difference in the destabilization of the folding mechanism by comparing the folding and glass temperatures computed in simulations with their corresponding theoretical predictions.
Folding temperature and glass temperature
The folding temperature Tf of each protein model is estimated in simulations as the temperature corresponding to the peak in the specific heat curve (see Fig. 8A
). This value is in good agreement (within the error bars) with the value obtained from the alternative definition of Tf as the temperature at which the folding and unfolding states have the same free energy (see Fig. 8B
).
|
NN| with
NN < 0), the free energy F(Q) lowers with respect to the G
free energy at fixed temperature T = TF°. During this change of Hamiltonian, the free energy of the native structure remains roughly constant at EN (see Fig. 8C
The thermodynamic glass temperature Tg can be estimated by using the results obtained in the framework of the random energy model (REM; Derrida 1981; Bryngelson and Wolynes 1987). As the energetic frustration of the system arises from randomly assigned nonnative interactions, we assume that the energy of compact (misfolded) structures in the unfolded ensemble is Gaussianly distributed, with mean value
Enn (Qu)
= MAmax
NN and variance
Enn2(Qu) = b2AmaxM, where MAmax is the maximum number of nonnative contacts the protein can form. In the theory, the maximum number of nonnative contacts was approximated at Q ~ 0 as the total number of native contacts MAmax = M. As we have already discussed above, the actual maximum number of nonnative contacts detected in simulations is smaller than the theoretical value, and it is expected to (slightly) vary with different realizations of the nonnative noise (see Fig. 6
).
The REM glass temperature is defined by the vanishing of the thermal entropy (Derrida 1981; Bryngelson and Wolynes 1987), which corresponds to setting equation 7b to 0:
![]() | (20) |
However, here we let Amax be a new parameter. This gives for the glass temperature:
![]() | (21) |
where
Enn(Qu) = MAmaxb2 is the energetic variance over the set of misfolded structures. For each protein model (i.e., each value of b/
), we have performed several (>500) short quenching simulations to explore the compact configurations in the unfolded ensemble. A different open configuration is initially created by means of ancillary high-temperature simulations (with T >Tf), then rapidly quenched to very low temperatures (T ~ Tf/10, T ~ Tf/25, and T ~ Tf/50). The fluctuations of the nonnative energy in the compact misfolded configurations recorded during the quenching simulations are used to compute
Enn(Qu) entering expression 21.
Figure 9A
shows the folding temperature Tf and the glass temperature Tg obtained from simulation, as a function of the strength of the nonnative energy perturbation, b (in units of the native energy per contact,
). The folding temperature is almost constant in the range shown, while the glass temperature rises from zero (b = 0 corresponds to the plain G
-like model with no energetic frustration; see equation D.3 in the Supplemental Material), to values close to Tf for large nonnative perturbations (b
1.6). When Tg/Tf
1, many low-energy misfolded structures compete with the native state, and folding is dramatically slowed down. As the ratio Tg/Tf increases beyond unity, the system is no longer self-averaging, and different realizations of the non-native perturbation can lead to different folding mechanisms consistent with the same native topology. This point is discussed in a separate publication (S.S. Plotkin and C. Clementi, unpu