Protein Science Sheba protein
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


Protein Science (2004), 13:1750-1766. Published by Cold Spring Harbor Laboratory Press. Copyright © 2004 The Protein Society
This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Supplemental Research Data
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Clementi, C.
Right arrow Articles by Plotkin, S. S.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Clementi, C.
Right arrow Articles by Plotkin, S. S.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati  
What's this?

The effects of nonnative interactions on protein folding rates: Theory and simulation

Cecilia Clementi1,2 and Steven S. Plotkin3

1 Department of Chemistry, and W.M. Keck Center for Computational and Structural Biology, Rice University, Houston, Texas 77005, USA
2 Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas 77030, USA
3 Department of Physics and Astronomy, University of British Columbia, Vancouver, BC V6T1Z1, Canada

Reprint requests to: Cecilia Clementi, Department of Chemistry, Rice University, 6100 Main Street, Houston, TX 77005, USA; e-mail: cecilia{at}rice.edu; fax: (713) 348-5155; or Steven S. Plotkin, Department of Physics and Astronomy, University of British Columbia, 6224 Agricultural Road, Vancouver, BC V6T1Z1, Canada; e-mail: steve{at}physics.ubc.ca; fax: (604) 822-5324.

(RECEIVED December 17, 2003; FINAL REVISION March 22, 2004; ACCEPTED March 25, 2004)


    Abstract
 TOP
 Abstract
 Introduction
 Theory of folding with...
 Comparison between theoretical...
 Conclusions
 References
 
Proteins are minimally frustrated polymers. However, for realistic protein models, nonnative interactions must be taken into account. In this paper, we analyze the effect of nonnative interactions on the folding rate and on the folding free energy barrier. We present an analytic theory to account for the modification on the free energy landscape upon introduction of nonnative contacts, added as a perturbation to the strong native interactions driving folding. Our theory predicts a rate-enhancement regime at fixed temperature, under the introduction of weak, nonnative interactions. We have thoroughly tested this theoretical prediction with simulations of a coarse-grained protein model, by using an off-lattice C{alpha}model of the src-SH3 domain. The strong agreement between results from simulations and theory confirm the nontrivial result that a relatively small amount of nonnative interaction energy can actually assist the folding to the native structure.

Keywords: protein folding; frustration; free energy landscape; folding rate; minimalist model; molecular dynamics

Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.03580104.

Supplemental material: see www.proteinscience.org


    Introduction
 TOP
 Abstract
 Introduction
 Theory of folding with...
 Comparison between theoretical...
 Conclusions
 References
 
The mechanism of protein folding is of central importance to structural and functional biology (see, e.g., Winkler and Gray 1998; Fersht 2000; Creighton 2002; Plotkin and Onuchic 2002a, b). An understanding of the fundamental physical–chemical factors regulating the folding process may help provide answers to some of the long-outstanding problems in both functional genomics and biotechnology: Rational design of drugs and enzymes, potential control of genetic diseases, and a deeper understanding of the connection between biological structure and function are among the applications that may benefit from advances in protein folding.

Theoretical and computational studies have recently achieved noticeable success in reproducing various features of the folding mechanisms of several small- to medium-sized fast-folding proteins (see, e.g., Shoemaker et al. 1999; Sorenson and Head-Gordon 2000 Sorenson and Head-Gordon 2002; Shea and Brooks III 2001; Shimada and Shakhnovich 2002; Clementi et al. 2003; Karanicolas and Brooks 2003; Kaya and Chan 2003); at the same time, the improved spatial and temporal resolution of recent experimental techniques is now allowing researchers to combine theoretical and experimental data to give a more robust characterization of the folding free energy landscape (Lapidus et al. 2000; Ervin and Gruebele 2002; Schuler et al. 2002; Snow et al. 2002; Kubelka et al. 2003; Pande 2003). However, in spite of these recent successes, a microscopically detailed observation of the individual conformational motions that occur during folding remains elusive. Knowledge of the time-dependence of every degree of freedom in the system is, however, not of inherent interest, because no additional insight to the underlying physics of the folding process is gained from this information by itself. Nor is any particular degree of freedom especially important to folding, because the transition involves the cooperation of many weakly (noncovalently) interacting constituents. For these reasons, a statistical description of the process of folding, in terms of the behavior of an ensemble of systems, is appropriate for distinguishing general (self-averaging) properties from sequence-specific ones (Bryngelson et al. 1995). The characterization of the folding process in statistical mechanical terms can pinpoint crucial questions that may be computationally or experimentally addressed in more detail.

The idea of considering ensemble properties to characterize the folding landscape underpinned studies of the transition state and folding mechanism as arising from the native-state topology (Shoemaker et al. 1997; Alm and Baker 1999; Galzitskaya and Finkelstein 1999; Munoz and Eaton 1999; Clementi et al. 2000a, b, 2001, 2003; Shea et al. 2000). As a general rule, the transition-state structure does not differ dramatically between homologous proteins (Baker 2000; Plaxco et al. 2000a; Gunasekaran et al. 2001), and any exceptions are fairly readily explained (Ferguson et al. 1999; Kim et al. 2000). Consistent with the above-mentioned notion of self-averaging, folding rates of homologous proteins are seldom seen to differ by more than an order of magnitude when tuned to the same stability (Mines et al. 1996; Plaxco et al. 2000b). This indicates that the folding free energy barrier is not particularly sensitive to the details of sequences folding to a given native structure, but depends rather on more general features of that ensemble of sequences, including the kinetic accessibility of that native structure. In this sense, the topology of the native structure largely determines the folding free energy barrier for those homologous sequences (Plaxco et al. 2000b).

These ideas motivated many studies of folding rates and mechanisms using so-called Go models (Ueda et al. 1975), which neglect interactions not present in the native state. In these studies, the possibility of structure prediction is traded for the possibility of rate and mechanism prediction. Moreover, because of the robustness of rate and mechanism for homologous proteins, the coarse graining of the Go model (i.e., removing the molecular details of side chains and solvent) is often assumed a reasonable approximation.

Topology-based approaches seek to predict mechanism by calculating {phi}-values (Fersht et al. 1986; Matouschek et al. 1990) or analogous quantities, which in an accurate theory give values that correlate with experiment for the measured cases. Occasionally one finds residues whose {phi}-values are negative. This is most likely caused by the presence of nonnative contacts that stabilize the transition state, but cannot be present in the native state. The presence of nonnative interactions in the transition state is supported by all-atom simulations using a Charmm-based effective energy function, where it was found that ~20%–25% of the energy in the transition state arose from nonnative contacts (Paci et al. 2002).

Hence, for a more realistic protein potential energy function, nonnative interactions must be taken into account. In this paper, we analyze the effect of increasing the strength of nonnative interactions on the folding rate as well as the free energy barrier. Nonnative interactions are introduced as additional contacts between pairs of residues not in contact in the native structure, which are allowed to have a nonzero mean and a nonzero variance. The nonnative interactions are added perturbatively to the Go model: All nonnative contacts are given a random energy with mean {epsilon}NN and a variance b2, which is progressively increased to examine more frustrated proteins, while the native contact energies are all held fixed to the same number. The limiting case of {epsilon}NN = 0 and b = 0 corresponds to the plain Go model. This procedure essentially preserves the stability of the native state, where approximately no nonnative interactions are present. However, the stability of the unfolded state is lowered (as shown below).

At first glance, one would expect that introducing progressively larger nonnative contact energies to an otherwise energetically unfrustrated Go protein would slow the folding rate, for straightforward reasons: It would seem that "noise" in the system would make the native basin harder to recognize. One might argue by analogy that it is easier to read a page of text without random misspellings. However, the folding rate has been predicted to initially increase under the introduction of weak, nonnative interactions, added as a perturbation to the strong native interactions driving folding (Plotkin 2001). This was a fold-independent result derived from general principles of energy landscape theory. This prediction was subsequently verified in simulations of a 36-mer lattice model (Fan et al. 2002), as well as off-lattice molecular dynamics simulations of Crambin, in which attractive nonnative contacts were successively added (Cieplak and Hoang 2002). Independently, it was found that nonnative interactions were present in the transition state of a 28-mer lattice-model protein with side chains, and increased the folding rate when strengthened (Li et al. 2000). Similar observations were also seen in two-dimensional 24-mer lattice models (Treptow et al. 2002). A different computational study on a 36-mer lattice-model protein found that at the temperature of fastest folding in simulation models, the folding rate monotonically decreases with increasing ruggedness (Fan et al. 2002; the temperature of fastest folding, of course, varies with the ruggedness). However, this typically barrierless regime is rarely seen in the laboratory (Gruebele 1999; Sabelko et al. 1999).

The prediction that strengthening nonnative interactions that were initially weak would accelerate folding is also consistent with experimental observations that strengthening nonspecific hydrophobic stabilization in the {alpha}-spectrin Src homology 3 (SH3) domain sped up folding (and unfolding) for that protein (Viguera et al. 2002). This result was significantly nontrivial, to the extent that the experimental observation was originally interpreted (mistakenly) as evidence against the energy landscape theory.

In this paper, we test this prediction with simulations of a coarse-grained protein model, by using an off-lattice C{alpha} model (see e.g., Honeycutt and Thirumalai 1992; Clementi et al. 2000b) of the SH3 domain of src tyrosine-protein kinase (src SH3). We use a Hamiltonian function that has tunable amounts of nonnative energy (see Supplemental Material for details). The results from simulations are compared with the predictions of an improved version of the existing theory (Plotkin 2001). The theory is improved by introducing a finite-size treatment of packing fraction as a function of polymer length, which takes better account of the polymer physics involved in collapse as folding progresses. Moreover, the previous study treated the rate enhancement at fixed stability. Here, we show a perhaps even less intuitive result, namely, that the rate enhancement can happen at fixed temperature, and we derive the conditions required for this to happen.

As the strength of nonnative interactions is increased to larger values, we find that eventually the folding rate decreases drastically, as expected. In the limit of large non-native contact energies, the chain behaves like a random heteropolymer, having misfolded structures more stable than the native state.

The folding mechanism is also nontrivially affected by the introduction of nonnative interactions. In this regard, the analysis of the robustness of the folding mechanism against an increasingly strong perturbation on the non-native interactions can provide a critical assessment on the validity of unfrustrated protein models for the prediction of folding mechanism, for different protein topologies. This analysis goes beyond the scope of the present paper, and it will be addressed separately (S.S. Plotkin and C. Clementi, unpubl.).

The paper is organized as follows: In the next section, we present the theory. After presenting the general ideas and overall strategy, we discuss in detail how an explicit expression for the conformational entropy can be obtained in terms of the packing fraction. We then use this result to show how the thermodynamic free energy barrier is lowered by the presence of nonnative interactions. In the third section, we test the theoretical predictions with direct simulation of the src-SH3 domain. We first compare the definition of reaction coordinates and the relative approximations of theory and simulations; thermodynamic and kinetic quantities obtained from simulations are then quantitatively compared with the corresponding theoretical predictions.

The strong agreement between results from simulations and theory confirms the nontrivial result that a relatively small amount of nonnative interaction energy can actually assist the folding to the native structure.


    Theory of folding with nonnative interactions
 TOP
 Abstract
 Introduction
 Theory of folding with...
 Comparison between theoretical...
 Conclusions
 References
 
Definition of the general strategy
Thermodynamic quantities relevant to folding may be obtained from an analysis of the density of states in the presence of energetic correlations (Plotkin and Onuchic 2002a, b,b). In this context, we introduce two order parameters. We let Q be the fraction of contacts shared between an arbitrary structure and the native structure, and we let A be the fraction of possible nonnative contacts present in that structure, that is, the number of nonnative contacts divided by the total possible number of nonnative contacts. These two order parameters are natural for the study of nonnative interactions in protein folding. Both take on values between zero and unity.

There are several relevant energy and entropy scales governing the thermodynamics of folding. Let the energy of the native structure be given by EN. Let the total number of contact interactions in a fully collapsed polymer globule be given by M. Asymptotically, M scales like the total number of residues in the chain, N, essentially because surface terms are negligible compared with the bulk. However, for a finite-size system, the mean number of contacts per residue (native or nonnative), that is, the coordination number z, is itself a function of N. We can write the native energy as


(1)

where {epsilon} is then defined as the mean native attraction energy {epsilon} ({epsilon} < 0); that is, the native state is assumed to be fully collapsed with the maximal number of contacts, and this is the maximal number of total contacts of a fully collapsed polymer globule. We neglect here the separate effects that arise from the variance in the native interaction energies: {delta}{epsilon}2 = 0

Let the conformational entropy of an ensemble of polymer structures characterized by the order parameters Q and A be given by Sc(Q,A). We can write the entropy in terms of the entropy per residue Sc(Q,A) as:


(2)

In addition to the energy scales {epsilon} and {delta}{epsilon}2 governing native contacts, there are also two energy scales governing nonnative interactions. One is the mean energy of a nonnative interaction {epsilon}NN, and the other is the energetic variance of nonnative interactions b2. We keep both of these terms, as they enter the analysis on essentially the same footing. For configurations with MA nonnative contacts, the total nonnative energy is taken to be Gaussianly distributed with mean MA{epsilon}NN and variance MAb2. Both of these terms contribute to the overall ruggedness of the energy landscape by favoring nonnative configurations.

The strength of nonnative interactions is taken to be weak, so that


(3a)


(3b)

are both satisfied. Condition 3a implies that the ratio of the folding transition temperature TF to the thermodynamic glass temperature TG is large (Goldstein et al. 1992),


(4)

that is, the proteins we consider are strongly (but not infinitely) unfrustrated—we are perturbing away from the Go model. Condition 3b implies that collapse and folding occur concurrently (Klimov and Thirumalai 1996), that is,


(5)

where T{theta} is the temperature below which nonnative states tend to be collapsed. For a given choice of nonnative interaction energies, the energies of configurations for the ensemble of states characterized by (Q,A) is assumed Gaussianly distributed with a mean of QM{epsilon}+ AM{epsilon}NN and a variance of AMb2. Then the extensive part of the log number of states having energy E and order parameters (Q,A) is given by:


(6)

From the definition of equilibrium temperature T–1 = {partial}S/ {partial}E, one can then find the thermal energy, entropy, and free energy, which are given by (in units where kB = 1):


(7a)


(7b)


(7c)

These expressions can be understood straightforwardly. In the absence of nonnative interactions ({epsilon}NN = b = 0), the thermal energy is just the energy of native contacts times the number of native contacts, and the entropy is just the configurational entropy. When nonnative energies are present, just as {epsilon} couples the order parameter Q, so does {epsilon}NN couple the order parameter A. When nonnative energies have a variance, the lower energy conformations (with stronger nonnative contacts) tend to be thermally occupied. This is why {epsilon}NN and -b2/T enter on the same footing in the energy. The fact that the system spends more time in fewer states means that the thermal entropy is reduced. However, the entropy (times temperature) is only reduced by half as much as the energy, so that there is a residual contribution to the free energy E - TS due to the variance of nonnative interactions.

A plot of the free energy at the folding temperature of the Go model TF° as a function of (Q,A) is shown in the first row of Figure 1Go, for equation 7c together with the analytical model of the configurational entropy SC(Q,A) described below.



View larger version (82K):
[in this window]
[in a new window]
 
Figure 1. Free energy (first column from the left), energy (second column), and entropy (third column) surfaces as functions of the fraction Q of native contacts, and the fraction A of nonnative contacts, as obtained from theory and simulations. Also shown is the fraction of states, n(E), populated as a function of the energy E. The distribution of the energy in the unfolded ensemble is shown, along with the distribution in the native state (fourth column). The distribution n(E) is normalized; that is, the integral of n(E) over all energies is 1 in all the cases plotted here. All free energy contours are spaced at ~1{epsilon} (where {epsilon} is the energy per native contact). Values of the parameters are given in Table 1Go. (Top row) Theoretical free energy, energy, and entropy surface at the folding temperature, obtained from equation 7c and equation A.16 in the Supplemental Material with all parameters set equal to the corresponding simulation values (see Table 1Go) and btheory = 0.3{epsilon}, where {epsilon} is the energy per native contact (this corresponds to 0.9{epsilon} < b < 1.3{epsilon} in the simulations; see text for detail). The transition state has more nonnative contacts than the unfolded state. The difference in the theoretical model is {Delta}A{ddagger} ~ 0.035. This amounts to an increase in the total number of nonnative contacts of MA ~ 5. The barrier height is ~3.47{epsilon} ~ 3.3kBTf. (Bottom three rows) Corresponding results obtained from simulations, for three different values of the nonnative energy perturbation parameter b: b = 0.5{epsilon} (second row), b = 0.9{epsilon} (third row), and b = 1.3{epsilon} (bottom row). The barrier heights and values of {Delta}A{ddagger} obtained in simulations are plotted in Figure 10Go as a function of the nonnative energy perturbation parameter b.

 
Figure 1Go also shows plots of E(Q,A), S(Q,A), and F(Q,A), as well as the number of states at energy E, taken from the simulation data for the off-lattice model (see below). Plots are at the folding temperature TF° of the Go model, for several different values of b indicated.

Conformational entropy in terms of packing fraction
The fraction of nonnative contacts A is not independent of Q. As more native interactions are present, less nonnative interactions are allowable, and eventually there can be no nonnative contacts in the native structure. Previous studies that investigated the folding rate at fixed stability have explicitly included this Q-dependence in equation 7c (Plotkin 2001). Here, our intention is to plot the folding rate at fixed temperature rather than at fixed stability. For this purpose, it is formally more convenient to keep this Q-dependence implicit in A. Again this manifests itself only as a region of allowed values of (Q,A), which can be seen in Figure 1Go.

The entropy loss due to native contacts is of a different functional form than the entropy loss due to nonnative contacts. The entropy loss caused by native contacts arises from a specific set of polymeric constraints. The entropy loss caused by nonnative contact formation arises from an increase in polymer density, a nonspecific constraint. There are many collapsed unfolded states with nonnative interactions present, but only one folded state (neglecting the much smaller entropy caused by native conformational fluctuations).

We note that the conformational entropy SC(Q,A) takes into account the extent to which polymer configurations tend to have residue pairs in proximity, such that if they interacted, that interaction would be considered a nonnative contact. However, the strength of the typical nonnative interaction (~{epsilon}NN ± b) is controlled by two free parameters in the theory. When both {epsilon}NN and b are set to zero, the thermal entropy reduces to that of the putative Go model, with the configurational entropy SC(Q,A) remaining unchanged.

The A-dependence in SC(Q,A) is related to the physics of collapse, because at a given value of Q, the fraction of nonnative contacts A depends on the packing fraction {eta} of nonnative polymer. When MQ native contacts are present, AMAX {equiv} M(1 - Q) nonnative contacts are allowable, and AMAX nonnative contacts are present when {eta} = 1.

As detailed in the Supplemental Material, a mean field approximation allows one to estimate the conformational entropy Sc(Q,{eta} ) of a disordered polymer at Q with packing fraction {eta} as:



(8)

Here (Q) = (Q)–1/2 = [nL(Q)/N(1 - Q)]1/2, where is the mean loop length formed by native contacts at Q (see equation A.14 in the Supplemental Material), and nL(Q) is the total number of loops at Q (see equation A.15 in the Supplemental Material). In equation 8, the quantity in curly brackets is the entropy per residue for the remaining disordered polymer at Q.

Figure 2Go shows a plot of the entropy per disordered residue at Q, Snn(Q,{eta}) = S(Q, {eta})/N(1 - Q), as a function of {eta}, for various values of Q. This shows that the nonnative polymer density where most of the states are [where Snn({eta}) is maximal] is an increasing function of nativeness, Q.



View larger version (21K):
[in this window]
[in a new window]
 
Figure 2. The entropy per residue Snn(Q, {eta}) = Sc(q, {eta})/N(1 - Q) in equation A.16 in the Supplemental Material, for the disordered part of a protein of nativeness Q, as a function of the disordered polymer’s packing fraction {eta}.

 
From equations 2 and 8,


(9)

The entropy per residue sc(Q,A) in equation 7b is then obtained from equation 8 using


(10)

(see equation A.3 in the Supplemental Material). The free energy surface on which dynamics occurs can then be obtained from equation 7c, and is plotted in the first row of Figure 1Go. This is the reaction surface for the coordinates (Q,A).

Effect of nonnative interactions on the free energy barrier and folding rate
In the Go model, nonnative contacts are given coupling energies of zero. The Go folding temperature TF° is taken to be the temperature at which the unfolded and folded thermodynamic states have equal probability. This is given through equation 7c when F(0,A) {approx} F(1,0) and {epsilon}NN = b2 = 0. We are taking Q {approx} 0 in the unfolded state and A = 0 in the folded state (see Fig. 1Go). This yields a Go folding temperature of


(11)

where A*(Q) is the most probable value of A at a given Q, as determined below.

When considering the simulation data, the folding temperature is taken to be the temperature in the Go model at which the unfolded and folded thermodynamic minima have equal free energies (these minima need not be precisely at Q = 0 and Q = 1).

The most probable value of A at a given Q for a protein in thermal equilibrium, A*(Q), is obtained from:


(12)

Using equations 7c and 8 this gives


(13)

where {eta}*(Q) is the most probable packing fraction at a given value of Q.

Using the following definitions,



the minimal free energy at Q, F(Q,A*(Q)), relative to the minimal free energy F(0,A*(0)) in the unfolded state, is obtained from equation 7c:


(14)


(15)

With the temperature set to the Go transition temperature TF°, the first term in brackets in equation 15 vanishes. The free energy barrier (over TF°) at the Go transition temperature can then be written as


(16)

where {Delta}F° != is the barrier height at TF° with {epsilon}NN = b2 = 0, that is, the putative Go barrier height, and is given by


(17)

where the saddle point is located at (Q!=,A!= {equiv} A*(Q!=)). Note that {Delta}snn(Q) < 0 because disordered polymer dressing larger native cores is more collapsed than that for smaller native cores. One can see that the barriers scale extensively as a result of the mean field approximations made above.

Thus we see from equation 16 that the folding barrier lowers with increasing nonnative interaction strength, namely, if {epsilon}NN < 0 (b2 > 0 always), so long as {Delta}A*(Q!=) {equiv} {Delta}A!= > 0. We therefore investigate the conditions for which {Delta}A!= > 0.

From equation A.1 in the Supplemental Material, the condition {Delta}A!= > 0 is equivalent to


(18)

where {eta} * is determined from equation 13.

We are interested in the effect on the barrier when non-native interactions are imagined to initially increase from zero. For {epsilon}NN, b {approx} 0, the most probable packing fraction is interpreted geometrically through equation 13 as the value of {eta} where the entropy per disordered residue is maximal, that is, the maximum of the curves in Figure 2Go. When {epsilon}NN < 0 and/or b > 0, 3* is determined as the value of 3 slightly to the right of the maximum in the curves in Figure 2Go. The most probable packing fraction as a function of Q is plotted in Figure 3Go.



View larger version (20K):
[in this window]
[in a new window]
 
Figure 3. The most probable packing fraction {eta} * is a monotonically increasing function of nativeness Q. The dashed curve shows the characteristic packing fraction when the disordered loops are assumed to obey ideal chain statistics. The solid curve accounts for the effects of excluded volume, which are included in equation 8. (Inset) The most probable packing fraction is a decreasing function of the mean disordered loop length in equation A.14 in the Supplemental Material.

 
Equation 18 is not a particularly robust condition. Whereas {eta} *(Q) is certainly a monotonically increasing function of Q as can be seen from Figure 3Go, the factor of (1 - Q) in equation 18 de-emphasizes, or may reverse, the trend in A*(Q). In the earlier work addressing the trend in rates at fixed stability rather than fixed temperature, the factor determining whether rates would increase was merely the increase in packing fraction {Delta}{eta} (Q) by itself (Plotkin 2001).

The derivative of snn in equation 13 can be straightforwardly determined from equation 8, and equation 13 then becomes a nonlinear equation for {eta}*(Q) that can be solved numerically. The result is shown in Figure 3Go. The packing fraction increases as the length of disordered loops becomes shorter (inset of Fig. 3Go), and thus increases monotonically with nativeness Q.

Once {eta}*(Q) is known, {Delta}A*(Q) can be obtained from equation 18. This determines the trend in the barrier height in equation 16. A plot of {Delta}A*(Q) is shown in Figure 4Go. We can see that if the barrier position Q!= resides in a window of Q where {Delta}A(Q) > 0, the barrier decreases with increasing nonnative interaction strength, for weak nonnative interactions. Otherwise, the barrier increases with increasing nonnative interaction strength.



View larger version (17K):
[in this window]
[in a new window]
 
Figure 4. Fractional change in the number of nonnative interactions as a function of nativeness Q, for the theoretical model. We can see that the number of nonnative interactions initially increases before decreasing. For the model considered here, the barrier position Q!= is well within this region of values where the number of nonnative interactions has increased. There are generically more nonnative interactions present in the transition state than in the unfolded state, for strongly minimally frustrated proteins. The effect is fairly modest—for a 100-residue protein, there are about six more nonnative interactions in the transition state. The shape of the curve is obtained from setting {partial}F/{partial}A|Q = 0 in the first row of Figure 1Go.

 
When nonnative interactions are weak, the folding kinetics are single exponential:


(19)

Increasing the strength of nonnative interactions slows the prefactor k0, owing to the effects of transient trapping. However, as {epsilon}NN and b are increased from zero, this slowing effect on k0 does not become significant until a nonzero characteristic value, which would indicate the onset of a dynamic glass transition in an infinite-sized system (see Takada et al. 1997; Wang et al. 1997; Eastwood and Wolynes 2001; Plotkin 2001 for more detailed treatments of this effect). In a finite system, the activation time ~k0 –1 increases dramatically but only when b bA or {epsilon}NN > {epsilon}ANN. The values of the energy scales bA and {epsilon} are of order T, thus there is a fairly large window upon increasing b, {epsilon}NN from zero where the prefactor k0 is unaffected to the first approximation. In this regime, the effects on rate are governed solely by the effects on barrier height. Hence, the decrease in barrier height shown above as {epsilon}NN, b are increased from zero may be associated with an increase in folding rate.

In the next section we test the theoretical prediction directly with simulations of a model protein. The upshot is shown in Figure 10Go, B and C (below), which shows, indeed, an increase in folding rate with increasing nonnative interaction strength.



View larger version (20K):
[in this window]
[in a new window]
 
Figure 10. (A) Average difference in the number of nonnative contacts in the transition state and unfolded state ensemble, as detected from simulation, as a function of the perturbation parameter b/{epsilon}. The values obtained by considering all nonnative contacts are shown as open circles, whereas filled gray circles correspond to the corrected values (i.e., considering only nonnative contacts that are not formed in the folded state; see text for details). The average of these quantities over all the considered values of b/{epsilon} are plotted as continuous straight lines of the same color. These numbers are comparable to the theoretical estimates of approximately six more nonnative contacts in the transition state for a 100-residue protein (see Fig. 4Go). (B) Barrier height {Delta}F{dagger} (open circles) and log folding rate k (filled gray circles) as a function of the nonnative energy perturbation strength b (for {epsilon}NN = 0), and (C) log folding rate k as a function of the average nonnative energy {epsilon}NN (for b = 0). The parameters controlling the strength of the nonnative energy, b2/2T and {epsilon}NN, enter the free energy at the same footing in the theoretical model (see equation 7c). Results from simulations are in very good agreement with the theoretical prediction (dashed black curve in B and C) obtained when the value of {Delta}A'!= —shown in A—is used as an estimate of {Delta}A!= {equiv} {Delta} A*(Q!=) entering equation 16 predicted by the theory. Values of lnk and {Delta}F{dagger} are normalized to the corresponding values for the unperturbed case (lnk0 and {Delta}F0{dagger}). For large nonnative energy perturbations (b > 1, or {epsilon}NN > 0.5), both {Delta}F{dagger}(Tf°) and lnk rapidly decrease (see also Figure 8B,CGo). The energy parameters b and {epsilon}NN are measured in units of native energy per contact, {epsilon}. Barrier heights are measured in units of the folding temperature for the unperturbed case (kBT0).

 

    Comparison between theoretical prediction and simulation results
 TOP
 Abstract
 Introduction
 Theory of folding with...
 Comparison between theoretical...
 Conclusions
 References
 
We have thoroughly explored the range of validity of the approximations made in the analytic theory by comparing the predictions with the results obtained from simulations on a Go model increasingly perturbed by the addition of nonnative interactions (see Supplemental Material for details on simulation).

A close and quantitative comparison of the results from theory and simulations is possible if corresponding thermodynamic quantities and reaction coordinates are identified. For this purpose, before we proceed to test the prediction on rate enhancement, three main points of the theory have to be examined in comparison with the results from simulations:

These points are clearly interconnected, and all affect the detailed shape of the free energy landscape, the value of the folding temperature, and the identification of the folded, unfolded, and transition state ensembles. We expect that the assumptions we have made in the analytical theory do not qualitatively change the theoretical predictions; nevertheless, a careful dissection of the basic ingredients we have used is needed for a quantitative assessment of the results.

In the following, we discuss in detail each of the points above. Unless otherwise specified, the following results are all obtained from simulations at the Go folding temperature Tf°, for all values of b/{epsilon}.

Definition of reaction coordinates
The reaction coordinate Q, defined as the fraction of native contacts formed in a given protein configuration, is readily associated to configurations sampled by simulations (see Supplemental Material). More care has to be used in transposing the other reaction coordinate we have used in the theory, A (defined as the fraction of nonnative contacts formed), to the analysis of simulations data. In the analytical theory, we have assumed that the maximum number of nonnative contacts that can be formed at a certain stage of the folding reaction does not depend on the perturbation strength, and is a function of the degree of nativeness, Q, that is, Amax(Q) = 1 - Q, {forall}b (see equation A.1 in the Supplemental Material). This implies that no nonnative contacts can be formed in the native state (Amax ~ 0 if Q ~ 1), and vice versa (Amax ~ 1 if Q ~ 0). This assumption in the theory allows us to simplify the analytical calculations but does not qualitatively affect the results. The dependence on Q of the maximum number of nonnative contacts can be directly checked in simulations. In this regard, an important difference between theory and simulations is that a certain number (typically ~5) of nonnative contacts can be accommodated in a protein configuration with Q ~ 1 and minimal (<1 Å) RMSD from the PDB native structure. The increased number of contacts around the native configurations arises mainly from the fact that native or nonnative contacts are considered formed in a small but finite length range (typically ~1 Å) around the minimum of the interaction potential. This leads to probable formation of some nonnative contacts as the protein undergoes fluctuations around the native state.

Figure 5Go shows that a subset of six nonnative contacts is formed with probability >0.25 in the native state ensemble for b/{epsilon} = 1.3. Similar results are obtained for all values of b/{epsilon} used in this study, although the particular set of nonnative contacts formed in each case depends on the choice of nonnative interactions (data not shown).



View larger version (97K):
[in this window]
[in a new window]
 
Figure 5. Probability of formation of nonnative contacts in the native configuration of SH3. Black dots in the contact map represent native contacts, whereas nonnative contacts formed with probability >0.25 are shown with probabilities given by the gray scale on top. Probability values are computed by averaging the formation of nonnative contacts over {gtrsim} 50,000 configurations with Q > 0.9 from folding/unfolding simulations. The data shown in this figure are for a nonnative perturbation strength b/{epsilon} = 1.3. Similar results are obtained for different values of the parameter b (see Fig. 6Go).

 
Contacts that are easily formed in the native state cannot be considered nonnative, even when they are not listed as native contacts in the unperturbed Go-like Hamiltonian. In fact, contacts that can be made in the native state are not competing against the formation of the native structure; rather, they are assisting it. To remove this effect, we introduce a new reaction coordinate A', defined as the fraction of nonnative contacts formed, restricting the list of nonnative interactions only to the ones with a probability of contact formation in the native state ensemble smaller than a cutoff value pc. The native ensemble for each sequence is identified as all configurations with Q > 0.9 sampled in simulation for that sequence. The results presented in the following are all obtained with a probability cutoff pc = 0.1. Smaller values of pc yield essentially the same results. The reaction coordinate A' is then used in this study to compare results from simulation with the theoretical predictions.

Another approximation that can be directly checked in simulation is on the maximum number of nonnative contacts that can be formed at different stages of the folding reaction. In the analytical theory, the fraction of nonnative contacts, A, is a function of the fraction of native contacts formed in a configuration, Q, and of the packing fraction {eta} of the nonnative part of the protein: MA = {eta} (1 - Q)M (see equation A.1 in the Supplemental Material), with 0 ≤ {eta} ≤ 1, {forall}Q. The maximum number of nonnative contacts is then AmaxM = (1 - Q)M, and the maximum total fraction of all contacts (native and nonnative) is (A + Q)max = 1, {forall}Q. Indeed, the maximum number of all contacts (both native and nonnative) recorded in simulations is close to the number of native contacts formed in the native state, that is, M(Q + A')max ~= M, for all values of the parameter b examined in this study (see Fig. 6AGo). Figure 6BGo shows the behavior of the average number of nonnative contacts formed in simulation (both coordinates A and A' are plotted), as a function ofQ, for a perturbation b/{epsilon} = 0.5 (right panel), and the value of Q corresponding to the maximum of <A'> (the corresponding Q for the uncorrected coordinate <A> is also shown). Interestingly, the peak in the average number of nonnative contacts is detected for a value of Q corresponding to a pretransition state stage of the folding. A pre-TS peak is observed in both theory and simulations, although in the theory it is closer to the unfolded state than what is detected in simulations (see Figs. 6BGo and 7AGo).



View larger version (28K):
[in this window]
[in a new window]
 
Figure 6. (A, lower panel) The maximum number of nonnative contacts registered in simulations for different values of the perturbation parameter b (in units of {epsilon}). Open circles indicate the maximum in the reaction coordinate A, whereas filled gray circles correspond to the corrected coordinate A' (see text for details). (Upper panel) The maximum number of all contacts (both native and nonnative) for different values of b. Open squares indicate the maximum value obtained when all contacts separated by at least three residues along the sequence are considered (i.e., Q + A); filled gray squares correspond to the values obtained when nonnative contacts likely to be formed in the native structure are removed (i.e., Q + A'). (B, right panel) Average number of nonnative contacts <A> formed in simulations as a function of the number of native contacts formed, for a perturbation parameter b/{epsilon} = 0.5. Horizontal bars at the maximum of <A> correspond to the standard deviation around the average at the peak value. The black curve corresponds to the coordinate A, and the gray line to A'. (Left panel) The values of Q corresponding to the maximum of <A> and <A'> for different values of the perturbation parameter b (open circles and filled gray circles, respectively).

 


View larger version (27K):
[in this window]
[in a new window]
 
Figure 7. (A, upper panel) Continuous curves illustrate the behavior of <A> versus Q as predicted by the theory (with all parameters set to the simulation values; see Table 1Go). The different curves correspond to values of b/{epsilon} = 0.1, 0.3, 0.5, 0.7, 0.9 (increasing values of b lead to higher values of <A>). The thick black line represents the maximum value of A allowed in the theory at different values of Q (independent of the value b). (Lower panel) <A'> vs. Q (continuous curves) as obtained from simulations, for values of b/{epsilon} = [0.2, 1.6] (increasing values of b lead to higher values of <A'>). Dotted curves represent the highest values of A' found in simulations at different values of Q, for the same values of b/{epsilon}. The maximum value of A allowed in the theory is also plotted for comparison (thick black line). (B) Filled gray circles show the maximum value of A' detected in all equilibrium and quenched simulations for many values of b (see text), as a function of the fraction of native contacts formed, Q. Open circles correspond to the maximum packing fraction of the nonnative part of the protein, as obtained by using equation A.1 in the Supplemental Material, that is, {eta}max = A'max/(1 - Q). Dotted lines show the maximum values for A (in black) and {eta} (in gray) allowed in the theory. Continuous lines in the corresponding colors represent the best fit of the data to a phenomenological exponential decay of A'max at small values of Q: A'max = (1 - Q)[1 - exp(-Q/Qc)]. Regression analysis yields Qc = 0.12. The best fit for A'max is shown in gray, and in black for {eta}max.

 
Figures 6Go and 7Go present a thorough comparison between the allowed and most probable values for the fraction of nonnative contacts at different stages of the folding, as obtained from theory and simulations. Although the maximum number of nonnative contacts is always detected in a pre-TS region, independently on the value of b/{epsilon}, it is clear from Figures 6AGo and 7AGo that larger values of b/{epsilon} yield a larger number of nonnative contacts formed, particularly in the unfolded ensemble. Interestingly, however, the number of nonnative interactions rapidly decreases to zero in regions with very small Q. The cause of this effect is not contained in the analytical expressions 7c, where it is assumed that Amax = 1 - Q. This result is caused partly by coupling between nonnative contacts and the angle and dihedral terms in the simulation Hamiltonian (which are not present in the theory). This is a finite size effect that tends to increase the polymer stiffness relative to that in the theory, which used a bulk approximation for thermodynamic quantities. Compact states with {eta} ~ 1 in which only nonnative interactions are present have large energetic cost and are formed very rarely. Another source of this effect is that forming collapsed conformations induces some native contacts to be formed, owing to the finite range of interactions. This effect is particularly important for short-range contacts among residues closely separated in sequence, and does not necessarily go away as one considers larger-size systems. This is the complementary effect to the already mentioned fact that in simulations nonnative interactions are formed in the native state (that has led us to a redefinition of the simulation reaction coordinate A').

To quantify this effect, we have generated a large (50) set of nonnative energy distributions with high and very high variance (b/{epsilon} ≥ 2 and b/{epsilon} >> 2). Sequences with these high values of b/{epsilon} are not able to fold to the selected native structure, but are useful to explore the region of the configurational space corresponding to compact structures with the maximum number of nonnative contacts formed. We expect the glass temperature of these sequences to be higher than their folding temperature (see the next section). After an initialization at very high temperature (T >> Tf°), a large number (>1000) of quenching simulations (T << Tf°) has been performed for each sequence to generate a representative sample of compact misfolded structures. The maximum fraction of nonnative contacts that can be formed, A'max, is thus defined as the largest values of A' among the vast pool of structures obtained by adding the results from the quenching simulations for high b/{epsilon} values to all configurations collected in simulations at any temperature and for any value of b/{epsilon} used in this study. Figure 7BGo shows the behavior of A'max as a function of the fraction of native contacts present in the structures. The theoretical assumption on the maximum fraction of nonnative contacts Amax = 1 - Q holds remarkably well up to the values of Q 0.15 that correspond to the unfolded state minimum in the free energy landscape (see Fig. 1Go). From these results, we then expect the unfolded region of a free energy landscape associated with the simulated protein Hamiltonian to be somewhat compressed toward smaller values of A with respect to the theoretical prediction.

Energy, entropy, and free energy landscape
Figure 1Go presents the energy, entropy, and free energy profiles obtained from simulations, as a function of the reaction coordinates Q and A', for three different values of the perturbation parameter b/{epsilon}. The corresponding quantities obtained from the analytical theory, with all the parameters set equal to the simulations parameters (i.e., rightmost column in Table 1Go) and b = 0.3{epsilon}, is also shown for comparison. For a more direct comparison with the results from simulations, the thermodynamic quantities from theory are only plotted in regions populated with probability >2 x 10–7, as we have typical samplings of ~5 x 106 configurations in folding/unfolding simulations. It is apparent from Figure 1Go that the region of the (Q,A') space populated with high probability in simulations differs somewhat from the (Q,A) region predicted by theory. Several factors are responsible for this difference and have to be considered before one tests the predictive power of the theory with the simulation results:


View this table:
[in this window]
[in a new window]
 
Table 1. Table of values for parameters in the model
 
  1. The unperturbed energy function used in simulations includes a self-avoiding term for all nonnative contacts, which is maintained in the perturbed Hamiltonian (see equations D.32 and D.33 in the Supplemental Material). This energy term is not explicitly considered in the theory. The short-distance repulsive interactions limit the formation of nonnative contacts (especially for small values of b/{epsilon}), and shifts the most populated regions of the folding landscape toward lower values of A. This effect also accounts for most of the differences in the energy landscape between theory and simulation results (see Fig. 1Go).
  2. The analytic expressions are obtained in the thermodynamic limit, whereas simulations are performed for a small protein (57 residues). The theoretical expressions do not explicitly keep track of finite-size effects due to polymer stiffness. However, the extra effect of polymer stiffness seen in the simulations only enhances the theoretically predicted rate acceleration effect (see below).
  3. The functional form for the entropy is approximated in the theory, and it is not expected to quantitatively reproduce the simulation results exactly. Particularly, the theoretical assumption on the allowed values of A at different Q (i.e., equation A.1 in the Supplemental Material) directly enters the derivation of the entropy (see Supplemental Material for details), and contributes to the relative "distortion" of the theoretical free energy landscape with respect to the landscape in the simulations. Nevertheless, the overall qualitative behavior of the entropy is correctly captured by the theory (see third column of Fig. 1Go).
  4. The position of the folded and unfolded free energy minima emerging from simulation data differs from Q = 0 and Q = 1, as assumed in the theory (see above).

Overall, the destabilization of the folding free energy landscape upon introduction of nonnative energy perturbation is strongly reduced in simulations with respect to what is predicted by the theory. In fact, although in the theory a perturbation of b/{epsilon}~ 1 renders a protein unfoldable (i.e., Tf / Tg ~ 1; see equation 21 and Plotkin 2001), it is found in simulations that all sequences generated with a perturbation parameter b/{epsilon} 1.7 (entering the Hamiltonian, equation D.33 in the Supplemental Material) are able to reversibly fold/unfold at the Go transition temperature Tf°. The next section quantifies this difference in the destabilization of the folding mechanism by comparing the folding and glass temperatures computed in simulations with their corresponding theoretical predictions.

Folding temperature and glass temperature
The folding temperature Tf of each protein model is estimated in simulations as the temperature corresponding to the peak in the specific heat curve (see Fig. 8AGo). This value is in good agreement (within the error bars) with the value obtained from the alternative definition of Tf as the temperature at which the folding and unfolding states have the same free energy (see Fig. 8BGo).



View larger version (20K):
[in this window]
[in a new window]
 
Figure 8. (A) Heat capacity as a function of temperature, as obtained from simulations for different values of the parameter b. Temperature is measured in units of native energy per contact, {epsilon}. (B) Free energy as a function of the fraction Q of native contacts, as obtained from simulations for several different values of the nonnative energy perturbation parameter b. Free energy curves for all values of b shown in B are obtained at their corresponding folding temperatures Tf(b) (estimated from the heat capacity curves, plotted in A), whereas all curves in C are at the folding temperature of the unperturbed case Tf° = Tf(b = 0).

 
From equation 7c, upon increasing nonnative interaction strength (increasing b, and/or increasing |{epsilon}NN| with {epsilon}NN < 0), the free energy F(Q) lowers with respect to the Go free energy at fixed temperature T = TF°. During this change of Hamiltonian, the free energy of the native structure remains roughly constant at EN (see Fig. 8CGo). Even though the unfolded state is stabilized with respect to the folded state during the process of increasing nonnative interaction strength at fixed temperature, the folding rate nevertheless accelerates, because the free energy of the transition state lowers more than the unfolded state does. This is described in more detail below, with the result shown in Figure 10BGo.

The thermodynamic glass temperature Tg can be estimated by using the results obtained in the framework of the random energy model (REM; Derrida 1981; Bryngelson and Wolynes 1987). As the energetic frustration of the system arises from randomly assigned nonnative interactions, we assume that the energy of compact (misfolded) structures in the unfolded ensemble is Gaussianly distributed, with mean value <Enn (Qu)> = MAmax{epsilon}NN and variance {delta} Enn2(Qu) = b2AmaxM, where MAmax is the maximum number of nonnative contacts the protein can form. In the theory, the maximum number of nonnative contacts was approximated at Q ~ 0 as the total number of native contacts MAmax = M. As we have already discussed above, the actual maximum number of nonnative contacts detected in simulations is smaller than the theoretical value, and it is expected to (slightly) vary with different realizations of the nonnative noise (see Fig. 6Go).

The REM glass temperature is defined by the vanishing of the thermal entropy (Derrida 1981; Bryngelson and Wolynes 1987), which corresponds to setting equation 7b to 0:


(20)

However, here we let Amax be a new parameter. This gives for the glass temperature:


(21)

where {delta}Enn(Qu) = MAmaxb2 is the energetic variance over the set of misfolded structures. For each protein model (i.e., each value of b/{epsilon}), we have performed several (>500) short quenching simulations to explore the compact configurations in the unfolded ensemble. A different open configuration is initially created by means of ancillary high-temperature simulations (with T >Tf), then rapidly quenched to very low temperatures (T ~ Tf/10, T ~ Tf/25, and T ~ Tf/50). The fluctuations of the nonnative energy in the compact misfolded configurations recorded during the quenching simulations are used to compute {delta}Enn(Qu) entering expression 21.

Figure 9AGo shows the folding temperature Tf and the glass temperature Tg obtained from simulation, as a function of the strength of the nonnative energy perturbation, b (in units of the native energy per contact, {epsilon}). The folding temperature is almost constant in the range shown, while the glass temperature rises from zero (b = 0 corresponds to the plain Go-like model with no energetic frustration; see equation D.3 in the Supplemental Material), to values close to Tf for large nonnative perturbations (b {gtrsim} 1.6). When Tg/Tf {approx} 1, many low-energy misfolded structures compete with the native state, and folding is dramatically slowed down. As the ratio Tg/Tf increases beyond unity, the system is no longer self-averaging, and different realizations of the non-native perturbation can lead to different folding mechanisms consistent with the same native topology. This point is discussed in a separate publication (S.S. Plotkin and C. Clementi, unpu