1 Department of Chemistry and Biochemistry and Institute for Theoretical Chemistry, University of Texas at Austin, Austin, Texas 78712, USA
2 Department of Chemistry and Biochemistry and Interdepartmental Program in Biomolecular Science and Engineering, University of California, Santa Barbara, California 93106, USA
Reprint requests to: Kevin W. Plaxco, Department of Chemistry and Biochemistry and Interdepartmental Program in Biomolecular Science and Engineering, University of California, Santa Barbara, CA 93106, USA; e-mail: kwp{at}chem.ucsb.edu; fax: (805) 893-4120.
(RECEIVED June 20, 2002; FINAL REVISION September 24, 2002; ACCEPTED October 3, 2002)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0220003.
Abstract
Most small, single-domain proteins fold with the uncomplicated, single-exponential kinetics expected for diffusion on a smooth energy landscape. Despite this energetic smoothness, the folding rates of these two-state proteins span a remarkable million-fold range. Here, we review the evidence in favor of a simple, mechanistic description, the topomer search model, which quantitatively accounts for the broad scope of observed two-state folding rates. The model, which stipulates that the search for those unfolded conformations with a grossly correct topology is the rate-limiting step in folding, fits observed rates with a correlation coefficient of ≈0.9 using just two free parameters. The fitted values of these parameters, the pre-exponential attempt frequency and a measure of the difficulty of ordering an unfolded chain, are consistent with previously reported experimental constraints. These results suggest that the topomer search process may dominate the relative barrier heights of two-state protein-folding reactions.
Keywords: Contact order; diffusion-collision; nucleation
The folding kinetics of most simple, single-domain proteins is well fitted as a single-exponential, two-state process (Jackson and Fersht 1991; Guijarro et al. 1998; Jackson 1998; Plaxco et al. 1999), even at the lowest temperatures accessible to experiment (Gillespie and Plaxco 2000). This observation confirms that the rapid, biologically relevant folding rates that distinguish naturally occurring proteins are associated with a smooth energy landscape lacking both significant discrete traps (well-populated intermediates and misfolded states) and fine-scale heterogeneous roughness (Bryngelson and Wolynes 1987; Bryngelson et al. 1995; Onuchic et al. 1997). In the absence of these complications, folding is free to progress unimpeded to the native state with the greatest possible speed (Dill and Chan 1997; Dobson et al. 1998; Dinner et al. 2000). But this observation begs the question, if the folding energy landscapes of two-state proteins are generally smooth, why do some simple proteins fold a million times more rapidly than others (van Nuland et al. 1998; Wittung-Stafshede et al. 1999)? Here, we describe a simple, near-first principles model that quantitatively accounts for this well-established experimental observation.
Although evolution can presumably smooth the energy landscape arbitrarily, there is one aspect of protein chemistry that selective pressures cannot optimize, namely, a polypeptide is a covalent chain that cannot cross through itself. An unavoidable consequence of this connectivity is that the rate with which unfolded polypeptides diffuse between distinct topologies is limited, and thus, even imaginary proteins with perfectly smooth energy landscapes will exhibit varying folding rates due to topological frustration (Clementi et al. 2000) and the difficulty of diffusing into the correct, native topology. Consistent with this hypothesis, by the mid- to late nineties, numerous authors had suggested the search for the correct gross topology may be an important contributor to the folding barrier (Sosnick et al. 1994; Gross 1996; Sosnick et al. 1996; Guo et al. 1997; Kolinski et al. 1998; Sheinerman and Brooks 1998; Socci et al. 1998; Bergasa-Caceres et al. 1999; Debe et al. 1999; Shea et al. 1999).
In 1998, we serendipitously discovered that a simple, empirical measure of topological complexity is highly correlated with the experimentally observed folding rates of two-state proteins (Plaxco et al. 1998, 2000). The measure of topology in question, termed relative contact order, is simply the average sequence separation between all pairs of residues in contact in the native structure relative to the total length of the protein. The surprising strength of this correlation [r ≈ 0.9; all correlation coefficients in this review are for linearized equations. This correlation coefficient, r, thus reflects the significance of the linear relationship between log (kf) and contact order. The square of the correlation coefficient, r², is a measure of the fraction of all of the variance in the data set that is captured by the model.] demonstrates that, against the background of the smooth energy landscapes of two-state proteins, this perhaps naïve measure captures in excess of 3/4 of the variance in reported (log) folding rates.
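The definition of relative contact order given above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the representation of contacts as residue-index pairs, and the toy contact list are all assumptions made for the example, and the criterion for deciding which residues are "in contact" is left to the caller.

```python
def relative_contact_order(contacts, n_residues):
    """Average sequence separation of contacting pairs, divided by chain length.

    contacts   -- iterable of (i, j) residue-index pairs in contact in the
                  native structure (how contacts are detected is not specified here)
    n_residues -- total number of residues in the protein
    """
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (len(contacts) * n_residues)


# Hypothetical toy contact list for a 10-residue chain (not a real protein).
toy_contacts = [(0, 3), (1, 9), (2, 6)]
rco = relative_contact_order(toy_contacts, n_residues=10)
print(f"relative contact order = {rco:.2f}")
```

Dividing by chain length is what distinguishes the relative measure from the absolute average separation of contacting residues.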
Whereas the contact order-rate relationship hints at the mechanistic underpinnings of the folding reaction, the measure has not lent itself to any simple, quantitative reconciliation with first principles models of the process. For example, because contact order is related to the sequence separation between contacting residues, it has been suggested that it relates to the entropic cost of the loop closures required to surmount the rate-limiting step in folding (Plaxco et al. 1998; Alm and Baker 1999a; Galzitskaya and Finkelstein 1999; Fersht 2000). Unfortunately, however, loop closure entropy is proportional to the logarithm of loop length rather than loop length per se (Jacobson and Stockmayer 1950) and the average log (separation) between contacting residues is more poorly correlated with rates than is contact order as originally defined (K.W. Plaxco, unpubl.). Similarly, relative contact order (the average contact separation in terms of fraction of total peptide length) predicts rates significantly more accurately than absolute measures of the average sequence separation of contacting residues (Grantcharova et al. 2001; Ivankov et al. 2002). This produces the counterintuitive result that, of two proteins with the same average contact separation, the longer protein folds faster. Observations such as these lead inevitably to the possibility that contact order predicts rates, not because it is directly related to the underlying mechanism of folding, but because it is a proxy for some other, physically more reasonable parameter. Consistent with this suggestion, a number of additional, empirical measures of topology correlate approximately equally well with folding rates. These include the number of sequence-distant contacts per residue (Gromiha and Selvaraj 2001), the fraction of contacts that are sequence distant (Mirny and Shakhnovich 2001), and the total contact distance (Zhou and Zhou 2002).
Motivated by the quantitative dependence of kinetics on topology, several groups have attempted to define mechanistic models of folding that predict rates with accuracy equal to or surpassing that of these empirical relationships. One approach is based on calculating the loop-entropy cost of sequentially creating the stabilizing interactions that define the native state (Alm and Baker 1999b; Muñoz and Eaton 1999; Grantcharova et al. 2001; Ivankov and Finkelstein 2001). Whereas these models have achieved real success in predicting folding kinetics, their relative complexity and slightly poorer correlation with experiment again emphasize the question of whether the entropic cost of specific loop closures really underlies the (perhaps deceptively) simple relationship between a protein's topology and the rate with which it folds.
The topomer search model
The topomer search model provides a simple, alternative explanation for the topology-rate relationship. This model postulates that relative barrier heights are dominated by the diffusive search for the set of unfolded conformations that share a common, global topology with the native state (i.e., are in the native topomer; Debe et al. 1999), and that once this is achieved, the rate-limiting step has been surmounted and specific native contacts rapidly zipper to form the fully folded protein (Fig. 1). The model implies that the various empirical, topological metrics correlate with rates because they correlate with the probability of the unfolded chain diffusing into this native topomer. Here, we review the simple, quantitative arguments in support of the topomer search model of two-state folding.
The demonstration that contact order (or any of the many related topological parameters) correlates with the probability of finding the native topomer would provide critical support for this hypothesis. On first inspection, however, one might think that determining the probability of a chain diffusing into a given topomer is an overwhelmingly complex exercise in conditional probabilities (Fig. 2A); the probability of bringing a given residue pair into proximity may depend acutely on which other pairs are already ordered (Chan and Dill 1990). The critical question is whether a simple mathematical description exists that reasonably approximates this complex set of conditional probabilities and accurately predicts the probability of achieving the native topomer.
0.8–0.9), the folding rates of the non-helical two-state proteins (Debe and Goddard 1999). However, this model fails to predict the folding rates of predominantly helical, two-state proteins. More recently, we have described a rather simpler and still more general version of the topomer search model that accurately predicts the folding rates of all classes of two-state proteins (Makarov et al. 2002).
Our model stems from simulations of the properties of inert, Gaussian chains. These simulations demonstrate that, due to two simplifying effects, a straightforward approximation describes the probability of a random-coil polymer adopting a given gross topology. The first simplifying effect is that, because the probability of sequence-neighboring residues being in proximity is high (their locations are highly correlated), the probability of forming the native topomer is dominated by pairs of residues that are distant in the sequence. Thus, sequence-local interactions contribute little to the probability of being in the native topomer. The second is that the probability of ordering the chain is well described by a mean-field approximation. That is, once a sufficient number of sequence-distant pairs of residues are brought into proximity, the remaining ordering events become independent of the precise nature of the pre-existing order (i.e., become independent of one another), and the probability of each of these orderings becomes approximately constant. The nature of this approximation can be understood in qualitative terms by considering the ordering of a native pair in a chain with a significant amount of pre-existing, native-like order (Fig. 2B). In such a situation, it is plausible that the entropic cost of bringing such an additional native pair into proximity to form a bundle of residues in the native topomer could be described in terms of the bulk characteristics of this bundle instead of its precise structure (Flory 1956; Gutin and Shakhnovich 1994; Plotkin et al. 1996; Shoemaker and Wolynes 1999). The simplest approximation of the probability of forming the native topomer would then be to replace the unique probability of ordering each specific pair by the average probability of ordering all pairs. Numerical simulations of the exactly solvable Gaussian chain model provide quantitative support for this qualitative argument (Makarov and Metiu 2002; Makarov et al. 2002).
If, as suggested by Gaussian chain simulations, the probability of bringing each additional sequence-distant pair into proximity in the unfolded state is constant, then the probability that the unfolded polypeptide is in a given topomer, P(QD), is given by

$$P(Q_D) \approx \gamma \langle K \rangle^{Q_D} \qquad (1)$$

in which QD is the number of sequence-distant pairs whose proximity defines the topomer, <K> is the average equilibrium constant for residue pairs being in proximity (and is less than unity), and γ is a proportionality constant. We note that this probability is proportional to $\langle K \rangle^{Q_D}$ rather than equal to it; this is because, as suggested by the Gaussian chain studies, the additional entropic cost associated with the formation of the first few ordered pairs results in a prefactor that is less than unity (Makarov et al. 2002). Because of this, Equation 1 is only a valid approximation when QD is sufficiently large (in practice greater than ≈3). We also note that <K> may depend generally on the length of the chain; that is, whereas P(QD) has approximately an exponential dependence on the number of sequence-distant pairs that must be brought into proximity, this dependence may be different for different chain lengths. For the present, we will ignore the length dependence of <K>, and will return to these considerations later in the review.
The topomer search model predicts that the rate-limiting step in two-state folding is the formation of a conformation in which every residue is roughly in proximity to the residues that it contacts in the native state. We thus have the prediction that, by analogy to transition state theory, folding rates (kf) should scale approximately as

$$k_f \approx \kappa \, Q_D \, \gamma \langle K \rangle^{Q_D} \qquad (2)$$

in which κQD is the attempt frequency (proportional to QD due to the QD possible pairs of native residues that can be ordered), and $\gamma \langle K \rangle^{Q_D} = \exp(-\Delta G / k_B T)$ is the equilibrium constant for the formation of the native topomer. This relationship is reminiscent of the contact order-rate relationship (kf exponentially related to contact order). Nevertheless, the physical meaning of QD (the number of sequence-distant native pairings that define the native topomer) differs fundamentally from that of the earlier, entirely empirical measure.
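As a rough illustration of the scaling described above, in which the rate is the product of an attempt frequency proportional to QD and a topomer probability that falls off as <K> raised to the power QD, the following sketch evaluates the predicted rate for a few values of QD. The function and argument names are assumptions made for this example. The numerical values (a combined prefactor of 3800 s^-1 and <K> = 0.85) are taken from the ranges quoted elsewhere in this review and are used purely for illustration.

```python
def topomer_folding_rate(q_d, mean_k, prefactor):
    """Predicted folding rate, assuming k_f ~ prefactor * Q_D * <K>**Q_D.

    q_d       -- number of sequence-distant native pairs (Q_D)
    mean_k    -- average equilibrium constant <K> for a pair being in proximity
    prefactor -- combined pre-exponential constant, in s^-1
    """
    if q_d <= 0:
        raise ValueError("q_d must be positive")
    if not 0.0 < mean_k < 1.0:
        raise ValueError("mean_k is expected to lie between 0 and 1")
    return prefactor * q_d * mean_k ** q_d


# Illustrative parameter values only; see the lead-in for their origin.
ILLUSTRATIVE_PREFACTOR = 3800.0  # s^-1
ILLUSTRATIVE_MEAN_K = 0.85

for q_d in (10, 25, 50, 75):
    rate = topomer_folding_rate(q_d, ILLUSTRATIVE_MEAN_K, ILLUSTRATIVE_PREFACTOR)
    print(f"Q_D = {q_d:3d}  ->  k_f ~ {rate:.3g} s^-1")
```

Because the exponential term dominates the linear factor of QD once QD is large, the predicted rate falls steeply as the number of sequence-distant pairs grows.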
Testing the topomer search model
The prediction that folding rates relate to QD provides a means of testing the topomer search model. To perform this test, however, we must define QD in terms of experimental observables. This is readily performed if we assume that any pair of sequence-distant residues (separated by more than lc residues) that are in contact in the native state (i.e., within a cutoff distance, rc) must be in proximity to form the native topomer. The precise values of rc and lc, however, are not well constrained by the model. Typical choices are for rc to reflect pairs of Cα atoms (the model is independent of specific chemical interactions and thus ignores side chains) that approach to within 6 Å–8 Å in the native state and for lc in the range of 4–12 residues [i.e., 0.5–1.5 times reported persistence lengths (Schwalbe et al. 1997; Penkett et al. 1998)]. Fortunately, the topomer search model is rather insensitive to the precise details of how these native pairs (and thus how QD) are defined; the QD values that correspond to this wide range of parameters are all strongly correlated with one another, and critically, with experimentally observed folding rates. For example, if lc = 12 residues and rc = 6 Å, we obtain the statistically significant (r = 0.88), predictive correlation illustrated in Figure 3. Thus, this simple model captures in excess of three-fourths of the variance in our kinetic data set using only two fitted parameters (<K> and the prefactor product κγ).
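The counting rule described above (pairs separated by more than lc residues whose Cα atoms lie within rc of one another in the native state) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, coordinates are assumed to be supplied as one (x, y, z) tuple per residue in ångströms, "within" is taken to mean less than or equal to rc, and the toy coordinates describe an idealized hairpin rather than any real protein.

```python
import math


def count_sequence_distant_pairs(ca_coords, l_c=12, r_c=6.0):
    """Count residue pairs (i, j) with |i - j| > l_c and C-alpha distance <= r_c.

    ca_coords -- list of (x, y, z) C-alpha coordinates in angstroms, one per residue
    l_c       -- sequence-separation cutoff, in residues
    r_c       -- distance cutoff, in angstroms
    """
    n = len(ca_coords)
    q_d = 0
    for i in range(n):
        # Start at i + l_c + 1 so the pair is separated by MORE than l_c residues.
        for j in range(i + l_c + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= r_c:
                q_d += 1
    return q_d


# Hypothetical six-residue "hairpin" with 3.8 angstrom spacing (illustration only).
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
print(count_sequence_distant_pairs(toy_coords, l_c=2, r_c=6.0))
```

Varying l_c and r_c over the ranges quoted in the text is a simple way to check how sensitive QD is to these choices for a given structure.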
... 3. Within the remaining set of two-state proteins, Equation 2 ...
The model parameters
The fitted parameters in the topomer search model are physically reasonable. The relationship between QD and folding rates stems from first principles arguments that allow us to assign meaning to the slope and intercept of the relationship and to test their validity experimentally. The value of these fitted parameters depends only weakly on how QD is defined, and the range of these parameters suggested by the model are consistent with a number of experimental and simulations-based studies of folding and the denatured ensemble.
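To make the meaning of the slope and intercept concrete, the sketch below assumes the rate law takes the form kf ≈ prefactor × QD × <K>^QD, so that ln(kf/QD) is linear in QD with slope ln<K> and intercept ln(prefactor). The function name is hypothetical, and the data are synthetic, generated without noise from assumed parameter values; they are not experimental folding rates.

```python
import math


def fit_topomer_parameters(q_values, rates):
    """Least-squares line through ln(k_f / Q_D) versus Q_D.

    Returns (mean_k, prefactor), where slope = ln(mean_k) and
    intercept = ln(prefactor) under the assumed rate law.
    """
    xs = [float(q) for q in q_values]
    ys = [math.log(k / q) for q, k in zip(q_values, rates)]
    n = len(xs)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("Q_D values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(slope), math.exp(intercept)


# Synthetic, noise-free data from assumed parameters (not measurements).
TRUE_MEAN_K = 0.85
TRUE_PREFACTOR = 3800.0  # s^-1
q_values = [5, 10, 20, 40, 80]
rates = [TRUE_PREFACTOR * q * TRUE_MEAN_K ** q for q in q_values]

fitted_k, fitted_prefactor = fit_topomer_parameters(q_values, rates)
print(f"<K> = {fitted_k:.3f}, prefactor = {fitted_prefactor:.0f} s^-1")
```

With real data the points scatter about the line, and the correlation coefficient of this linearized relationship is the r value quoted throughout the review.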
Despite the potentially significant approximation that all two-state folding reactions exhibit the same prefactor product κγ irrespective of, for example, chain length (Portman et al. 2001; Kaya and Chan 2002), the pre-exponential produced by the model is physically reasonable. We base this assertion on the following first-principles argument. The attempt frequency, κQD, is the rate of moving residue pairs into or out of proximity (Makarov et al. 2002). Assuming this is a purely entropic event, κ is the rate with which sequence-distant pairs diffuse apart and is given by (Szabo et al. 1980)
$$\kappa \sim \frac{D}{d^{2}} \qquad (3)$$
in which D ≈ 4 × 10^-7 cm²/s is the loop-closure diffusion coefficient (Hagen et al. 1997) and d is the characteristic distance at which a residue pair is no longer in sufficient proximity to rapidly zipper. Whereas the precise value of d is unclear, it must lie between 6 Å and 24 Å (respectively, the typical distance between residues in physical contact and the typical dimensions of a single domain protein). Across this range of d, the rate κ ≈ 10^8 s^-1, and as the fitted value of the product κγ is ≈3800 s^-1 (Fig. 3), the proportionality constant γ ≈ 4 × 10^-5. Because γ arises due to the extra entropy associated with the first few ordering events, this suggests an additional R ln γ ≈ -85 J/(mol·K) can be assigned to this step in the topomer search process. Consistent with the arguments presented above (that the mean-field approximation becomes valid after ≈3 sequence-distant pairs have been ordered), this is comparable with approximately three times the entropic cost of closing a typical 12–25 residue loop (Poland and Scheraga 1965).
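The arithmetic in this paragraph can be checked with a short calculation. The sketch below rests on an explicit assumption: that the escape rate scales as D/d² with no additional numerical factor, which is the simplest dimensionally consistent form and may differ from the exact expression in the cited work by a factor of order unity. The variable names (kappa for the escape rate, gamma for the entropic prefactor) are labels chosen for this example.

```python
import math

GAS_CONSTANT = 8.314          # J/(mol K)
DIFFUSION_COEFF = 4e-7        # cm^2/s, loop-closure diffusion coefficient from the text
FITTED_PRODUCT = 3800.0       # s^-1, fitted pre-exponential product from the text


def escape_rate(d_angstrom):
    """Rate at which a pair diffuses apart, ASSUMING kappa ~ D / d**2."""
    d_cm = d_angstrom * 1e-8  # 1 angstrom = 1e-8 cm
    return DIFFUSION_COEFF / d_cm ** 2


def prefactor_entropy(kappa):
    """Return (gamma, R ln gamma) for a given escape rate kappa in s^-1."""
    gamma = FITTED_PRODUCT / kappa
    return gamma, GAS_CONSTANT * math.log(gamma)


# Bounds on d quoted in the text: 6 and 24 angstroms.
for d in (6.0, 24.0):
    print(f"d = {d:4.1f} A  ->  kappa ~ {escape_rate(d):.2g} s^-1")

# Using the round value of 1e8 s^-1 adopted in the text.
gamma, entropy = prefactor_entropy(1e8)
print(f"gamma ~ {gamma:.1e}, R ln(gamma) ~ {entropy:.0f} J/(mol K)")
```

Under this assumed form, the smaller bound on d reproduces the order of magnitude used in the text, while the larger bound gives a rate roughly an order of magnitude lower.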
It is more difficult to ascertain whether the value of <K> is reasonable. The value obtained from fitting experimental folding rates is ≈0.8–0.9, depending on the choice of lc and rc. These values imply that, once more than approximately three sequence-distant native pairs have been brought into proximity (and Equation 2 becomes a valid approximation), any remaining native pairs have an ≈45% chance (corresponding to an equilibrium constant of 0.8–0.9) of being in proximity in the unfolded molecule. Whereas this may suggest that the unfolded state is relatively well-ordered, an important consideration is that the model defines proximity as any orientation in which elements can collide to form native contacts more rapidly than the rate-limiting step in folding. As the rate-limiting step in folding is orders of magnitude slower than the rate of loop closure, proximity need not imply that two residues are particularly close in space. Indeed, this is precisely how the topomer search model solves Levinthal's paradox; whereas the number of conformations in the native topomer is small relative to the total number of conformations available in the unfolded ensemble, it is enormously larger than unity. Because of this, the entropic cost of finding the native topomer may be reasonable even in the absence of native-like interactions that may favor this set of conformations. That said, recent experimental (Hodsdon and Frieden 2001; Plaxco and Gross 2001; Shortle and Ackerman 2001; Baldwin 2002; Klein-Seetharaman et al. 2002) and simulations-based (Choy and Forman-Kay 2000; Zagrovic et al. 2002) reports of residual long-range order in the equilibrium denatured state are consistent with the seemingly high value of <K>. If, as suggested by these studies, the denatured state adopts a native-like topology, then it is perhaps not surprising that any given sequence-distant native pair has, on average, a ≈45% chance of being in proximity.
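The step from an equilibrium constant of 0.8–0.9 to a chance of roughly 45% can be made explicit. The short derivation below assumes that <K> is the ratio of the population with a pair in proximity to the population with that pair out of proximity, so that the fraction in proximity follows from a simple two-state partition; the symbol p is introduced here only for this illustration.

```latex
% Assumption: <K> = [in proximity] / [not in proximity] for a single native pair.
\[
  p = \frac{\langle K \rangle}{1 + \langle K \rangle},
  \qquad
  p\big|_{\langle K \rangle = 0.8} = \frac{0.8}{1.8} \approx 0.44,
  \qquad
  p\big|_{\langle K \rangle = 0.9} = \frac{0.9}{1.9} \approx 0.47 .
\]
```

Both ends of the fitted range therefore correspond to a probability close to the 45% quoted in the text.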
With these considerations, we now have all of the elements required to draw a complete picture of the topomer search model of two-state protein folding. The fitted value of <K> argues that, once the first few sequence-distant native pairs are in proximity, about half of the remaining sequence-distant pairs are likely to be in the correct topomeric state. That is, they are in sufficient proximity that they rapidly sample (and because of the relative instability of partially folded states, "unsample") their native interactions. The pre-exponential suggests that these correctly oriented elements will be rapidly fluctuating out of (and incorrectly oriented elements back into) the correct topomeric state. The rate-limiting step in folding is then the set of random fluctuations that simultaneously brings every element in the chain into the native topomer. Once this is achieved, the rate-limiting step is surmounted and specific native contacts rapidly and productively zipper to form the fully folded protein.
The relationship between rates and contact order
The relationship between rates and contact order thus appears to arise indirectly. That is, the behavior of Gaussian chains suggests that QD defines the probability of achieving the native topomer (and thus defines folding rates) and that contact order predicts rates not because it is related to the folding mechanism per se but because it is a proxy for QD. Although a strong correlation between contact order and QD for most proteins renders it difficult to prove this hypothesis directly, recent counterexamples provide significant evidence in support of it. For example, circular permutation of the S6 domain allows us to distinguish between the two parameters: whereas permutation significantly alters the protein's contact order, it does not significantly alter QD (Miller et al. 2002). Consistent with the predictions of the topomer search model, it has been reported recently that these permutations do not significantly alter folding rates (Lindberg et al. 2001). Similarly, the covalent circularization of a protein should significantly alter its contact order (presumably, one counts the shortest covalent path between contacting residues), leading to orders of magnitude rate accelerations. The topomer search model, in contrast, predicts relatively small rate accelerations, because circularization pre-orders only one sequence-distant native pair. This will reduce the entropic cost of the first few ordering events by, at most, about one-third, increasing the proportionality constant γ (and thus folding rates) by no more than a factor of 10. Consistent with this prediction, the relevant, reported circularizations produce only three- to sevenfold rate enhancements (Otzen and Fersht 1998; Grantcharova and Baker 2001; Camarero et al. 2001).
Native interactions and the topomer search
The topomer search model ignores the contributions of native-like interactions to the rate-limiting step in folding, obviously a potentially significant omission. For example, the strong, perfectly exponential denaturant dependencies of folding rates demonstrate that the folding transition state contains interactions similar to those that stabilize the native state (Plaxco et al. 2000). This suggestion is further supported by reports that native-state stability is an important determinant of the relative folding rates of topologically similar proteins (Guijarro et al. 1998; Clarke et al. 1999). Moreover, exhaustive mutagenesis studies (termed Φ-value analysis) have firmly established that many side chains are in near-native environments during the rate-limiting step in folding (for review, see Fersht 1997). It is thus abundantly clear that, in addition to the topomer search process, the formation of specific, native interactions also contributes to the relative free energy of the folding transition state. However, despite its studied lack of specific, nucleating interactions (it ignores all chemistry), the topomer search model captures three-fourths of the variance in the log of relative two-state folding rates. This suggests that, although specific, native-like interactions are an obligatory feature of the folding transition state (Fig. 1, C to D transition), these interactions are neither sufficient to ensure folding nor the dominant determinant of relative barrier heights.
Of course, the topomer search model need not completely ignore the energetically favorable interactions that may exist in the folding transition state; they are spun into the factor <K> (Makarov et al. 2002). That is, any stabilizing interactions that bias the chain toward the native geometry will increase the average probability of a native-like orientation of structural elements. As noted above, however, the observed value of <K> may be reasonable even in the absence of significant stabilizing interactions simply because proximity only implies "close enough to collide more rapidly than the rate-limiting step." As the rate-limiting step in folding is slow (relative to loop closure rates), "close enough" may, in reality, be rather distant, and thus, energetically favorable interactions are not necessarily required to generate <K> ≈ 0.8–0.9 and the rapid folding rates this produces.
Relationship to previous folding models
The topomer search model unifies several previous models of protein-folding kinetics. For example, the topomer search model is grounded in the energy landscape picture of protein folding (Bryngelson and Wolynes 1987; Bryngelson et al. 1995; Dill and Chan 1997; Onuchic et al. 1997; Dobson et al. 1998; Dinner et al. 2000); it is precisely because the energy landscapes of two-state proteins are exceedingly smooth that the topomer search, rather than diffusion over a rough landscape or escape from discrete traps, defines the folding barrier (Sosnick et al. 1994; Debe et al. 1999; Gillespie and Plaxco 2000; Millet et al. 2002). Notably, the energy landscape of the topomer search process itself is smooth; recent studies of the rate with which sequence-distant residue pairs are brought into proximity in unfolded cytochrome c demonstrate that inter-residue interactions (i.e., energetic roughness) do not control large-scale conformational diffusion even under native conditions (Hagen et al. 2001).
The topomer search model can also be considered a limiting (albeit simple, general, and easily quantified) case of the hierarchical folding models (Rose 1979; Baldwin and Rose 1999). The diffusion-collision model, for example, stipulates that protein folding occurs via the diffusive, hierarchical assembly of more-or-less preformed elements of secondary structure (Karplus and Weaver 1979; Zhou and Karplus 1999; Myers and Oas 2001). The topomer search model, in contrast, stipulates that, except for those few, rapidly folding proteins for which QD < 4 (see Islam et al. 2002), the sampling of local structure is orders of magnitude more rapid than the sampling of topomers. For most two-state proteins, the barrier is thus largely defined by the latter, with the sampling of local structural elements playing a much lesser role in determining relative folding rates.
How can the topomer search model be improved?
The strong correlation between Equation 2 and observed two-state folding rates suggests that, despite its seemingly excessive simplicity, the topomer search model captures the dominant contributor to relative barrier heights. There is, nevertheless, clearly room to improve the model's accuracy and generalizability. Here, we discuss likely future efforts in these directions.
Chain-length dependence
Numerous theoretical studies suggest that both the pre-exponential (via the diffusion coefficient) and the activation barrier (due to the entropic cost of the search) of folding are strong functions of chain length, N (Thirumalai 1995; Gutin et al. 1996; Zhdanov 1998; Debe et al. 1999). Most models predict that folding rates scale exponentially with N with a large, negative exponent (i.e., longer chains fold more slowly). No statistically significant length dependence is evident, however, in the experimentally observed folding rates of simple, single-domain proteins (Plaxco et al. 1998, 2000), perhaps because the effects of differing topologies overwhelm the more subtle, length-rate relationship. The topomer search model provides a convenient opportunity to account for the effects of topological variations and thus investigate the length dependence of folding independently of topology. When this is performed, a statistically significant relationship between rates and N arises, but in the counterintuitive direction; longer proteins tend to fold more rapidly than predicted. This leads to a small, but statistically significant improvement in the relationship between log (kf) and $Q_D N^{\alpha}$ versus QD alone (Fig. 4) via the equation
$$k_f \approx \beta \, Q_D \, J^{Q_D N^{\alpha}} \qquad (4)$$
in which the exponent α is a negative number in the range of -0.5 to -1.0 (r = 0.92–0.93 over this range), J is a constant of magnitude < 1, and β is a constant analogous to the prefactor product κγ. Because J and α are interdependent variables, it is impossible to pinpoint the value of α more precisely. It is clear, however, that α is negative, leading to the counterintuitive result that, all other parameters being equal, longer proteins fold more rapidly.
It is not hard to rationalize this length dependence in the context of the topomer search model. It is consistent with the generalization of the model in which the mean equilibrium constant for the ordering of native pairs is dependent on chain length
$$\langle K \rangle = J^{N^{\alpha}} \qquad (5)$$
with an exponent, α, that is negative. A possible origin of this relationship (and the counterintuitive length dependence it gives rise to) is crowding effects. That is, if a sequence-distant interaction occurs, on average, once every 5 residues along the chain, steric and geometric constraints may render the native topomer more difficult to achieve than if, on average, sequence-distant interactions occur only every 10 residues. Critically, the Gaussian chain model is unlikely to capture crowding correctly, as it rather poorly mimics the stiffness of an unfolded polypeptide and entirely ignores excluded volume interactions. This suggests that simulations of more realistic chains are in order if we are to verify the validity of this currently empirical correction.
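The direction of this length dependence can be illustrated numerically. The sketch below assumes that the mean equilibrium constant depends on chain length as <K> = J^(N^α), with J below unity and a negative exponent α; this functional form is inferred from the surrounding description and is an assumption of the example, as are the function names. The values J = 0.2, α = -0.5 and β = 3800 s^-1 are hypothetical, chosen only so that <K> falls near the 0.8–0.9 range discussed earlier.

```python
def length_dependent_mean_k(n_residues, j, alpha):
    """Mean equilibrium constant, ASSUMING <K> = J ** (N ** alpha)."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    if not 0.0 < j < 1.0:
        raise ValueError("j is expected to lie between 0 and 1")
    return j ** (n_residues ** alpha)


def length_corrected_rate(q_d, n_residues, beta, j, alpha):
    """Rate under the assumed form k_f ~ beta * Q_D * <K>(N) ** Q_D."""
    mean_k = length_dependent_mean_k(n_residues, j, alpha)
    return beta * q_d * mean_k ** q_d


# Hypothetical parameter values, for illustration only.
J_CONST, ALPHA, BETA = 0.2, -0.5, 3800.0

for n in (64, 100, 144):
    mean_k = length_dependent_mean_k(n, J_CONST, ALPHA)
    rate = length_corrected_rate(30, n, BETA, J_CONST, ALPHA)
    print(f"N = {n:3d}: <K> = {mean_k:.3f}, k_f(Q_D = 30) ~ {rate:.3g} s^-1")
```

With a negative exponent, N^α shrinks as the chain lengthens, which pushes <K> toward unity and so raises the predicted rate at fixed QD, in line with the trend described in the text.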
The mean-field approximation
A second concern is that the mean-field approximation is simply that, an approximation. It is certain that the inclusion of additional parameters (beyond simply counting the number of sequence-distant native pairs) will be required in order to define the probability of achieving a given topomer more accurately. The equilibrium constant for bringing sequence-distant native pairs into proximity, for example, is at least a weak function of the chain length separating the pair from itself and from other, preordered pairs. This effect may be illustrated by studies in which the extension of solvent-exposed loops slows folding rates; such extension does not significantly alter QD, but does change the accuracy of the approximation that <K> is a constant. That said, the effect of extending a loop by less than lc residues is relatively subtle; extensions of 10–13 residues reduce rates by less than a factor of 4 (Ladurner and Fersht 1997; Viguera and Serrano 1997; Grantcharova et al. 2000). Only the longest reported loop extensions (e.g., a 59-residue loop inserted in an artificially engineered, monomeric arc repressor) produce significant changes in two-state folding rates (Robinson and Sauer 1998).
Native interactions
A potentially more serious omission is that the topomer search model ignores all of the detailed chemical interactions that define the native state. As noted above, it is abundantly clear that the topomer search is only part of the folding barrier and the inclusion of specific, stabilizing interactions is clearly critical if we are to develop a more predictive model of folding kinetics. Recent experimental results, however, suggest that the native-like interactions occurring in the folding transition state are rather plastic (i.e., can be altered significantly without significantly altering folding rates), and thus, their effect on folding kinetics may prove difficult to model accurately (Grantcharova and Baker 2001; Nauli et al. 2001). Nevertheless, progress has already been reported on this front for the folding of the topologically simple proteins (QD < 4), for which native-like interactions play the greatest role in defining relative rates (Myers and Oas 2001; Islam et al. 2002).
Non-two-state folding
Further generalization of the model to fit non-two-state proteins may also prove difficult. The topomer search model is rooted in the observation that the folding energy landscape of two-state proteins is extremely smooth and, in the absence of energetic roughness, the connectivity-induced difficulty of the topomer search dominates relative barrier heights. In contrast, non-two-state folding necessarily implies that well-populated intermediates dominate the folding landscape, leading to deviations from single-exponential kinetics. Under these circumstances, folding kinetics could be defined by the rate of escape from these intermediate states rather than by the rate of topomer sampling (Sosnick et al. 1994; Debe et al. 1999; Millet et al. 2002). As the free energy of these states is defined by specific chemical interactions, predicting the kinetics with which they are escaped will probably not prove as simple as describing the kinetics of the topomer search.
Conclusions
The topomer search model stipulates that the random, diffusive process by which an unfolded polypeptide achieves its native topomer dominates the relative folding rates of two-state proteins. This native topomer is defined as the set of conformations in which every pair of residues in contact in the native state is in sufficient proximity that the pair can collide and form native interactions more rapidly than the (relatively slow) rate-limiting step in folding. Simulations of the diffusion of an inert, Gaussian chain indicate that the probability of such an occurrence relates simply to the number of sequence-distant residue pairs required to define the native topomer. Consistent with this result, the experimentally observed folding rates of two-state proteins correlate strongly with QD, the number of sequence-distant residue pairs in contact in the native state. The predictive value of this result supports the argument that the topomer search process is the dominant contributor to the relative barrier heights of two-state protein folding reactions.
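As a concrete illustration of the quantity QD used above, the following minimal sketch counts sequence-distant residue pairs in contact from a list of per-residue coordinates. The distance cutoff, the minimum sequence separation, the choice of one representative point per residue, and the function name are all assumptions made for this example; they are not the values or definitions used in the work reviewed here.

```python
import math
from itertools import combinations


def count_sequence_distant_pairs(coords, contact_cutoff=6.0, min_separation=12):
    """Count residue pairs (i, j) with j - i > min_separation whose
    representative points lie within contact_cutoff of each other.

    coords: one (x, y, z) tuple per residue, in sequence order.
    contact_cutoff and min_separation are illustrative defaults only.
    """
    q_d = 0
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if j - i > min_separation and math.dist(a, b) <= contact_cutoff:
            q_d += 1
    return q_d


if __name__ == "__main__":
    # Toy four-residue "structure" with deliberately small thresholds,
    # chosen only so the pair counting is easy to follow by hand.
    toy = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0)]
    print(count_sequence_distant_pairs(toy, contact_cutoff=1.5, min_separation=1))
```

In practice the coordinates would come from a native structure, and the resulting count would be compared against measured folding rates; the thresholds strongly affect the count and would need to be chosen and justified explicitly.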
Acknowledgments
The quantitative topomer search model was originally developed in collaboration with our colleagues Horia Metiu and Craig Keller and was motivated in part by the pioneering work of Derek Debe and William Goddard. The authors would also like to acknowledge numerous informative discussions with David Baker, Buzz Baldwin, Hue Sun Chan, Ken Dill, Chris Dobson, Carl Frieden, Blake Gillespie, Michael Gross, Jim Hu, Bob Matthews, Vijay Pande, Rohit Pappu, George Rose, David Shortle, and Tobin Sosnick.
References
Alm, E. and Baker, D. 1999a. Matching theory and experiment in protein folding. Curr. Opin. Struct. Biol. 9: 189–196.
Alm, E. and Baker, D. 1999b. Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures. Proc. Natl. Acad. Sci. 96: 11305–11310.
Baldwin, R.L. 2002. Protein folding: Making a network of hydrophobic clusters. Science 295: 1657–1658.
Baldwin, R.L. and Rose, G.D. 1999. Is protein folding hierarchic? II. Folding intermediates and transition states. Trends Biochem. Sci. 24: 77–83.
Bergasa-Caceres, F., Ronneberg, T.A., and Rabitz, H.A. 1999. Sequential collapse model for protein folding pathways. J. Phys. Chem. B 103: 9749–9758.
Bieri, O., Wirz, J., Hellrung, B., Schutkowski, M., Drewello, M., and Kiefhaber, T. 1999. The speed limit for protein folding measured by triplet-triplet energy transfer. Proc. Natl. Acad. Sci. 96: 9597–9601.
Bryngelson, J.D. and Wolynes, P.G. 1987. Spin-glasses and the statistical-mechanics of protein folding. Proc. Natl. Acad. Sci. 84: 7524–7528.
Bryngelson, J.D., Onuchic, J.N., Socci, N.D., and Wolynes, P.G. 1995. Funnels, pathways, and the energy landscape of protein-folding: A synthesis. Prot. Struct. Func. Gen. 21: 167–195.
Camarero, J.A., Fushman, D., Sato, S., Giriat, I., Cowburn, D., Raleigh, D.P., and Muir, T.W. 2001. Rescuing a destabilized protein fold through backbone cyclization. J. Mol. Biol. 308: 1045–1062.
Chan, H.S. and Dill, K.A. 1990. The effects of internal constraints on the configurations of chain molecules. J. Chem. Phys. 92: 3118–3135.
Choy, W.Y. and Forman-Kay, J. 2000. Calculation of ensembles of structures representing the unfolded state of an SH3 domain. J. Mol. Biol. 308: 1011–1032.
Clarke, J., Cota, E., Fowler, S.B., and Hamill, S.J. 1999. Folding studies of immunoglobulin-like β-sandwich proteins suggest that they share a common folding pathway. Structure 7: 1145–1153.
Clementi, C., Nymeyer, H., and Onuchic, J.N. 2000. Topological and energetic factors: What determines the structural details of the transition state ensemble and en-route intermediates for protein folding? An investigation for small globular proteins. J. Mol. Biol. 298: 937–953.
Debe, D.A. and Goddard, W.A. 1999. First principles prediction of protein folding rates. J. Mol. Biol. 294: 619–625.
Debe, D.A., Carlson, M.J., and Goddard, W.A. 1999. The topomer-sampling model of protein folding. Proc. Natl. Acad. Sci. 96: 2596–2601.
Dill, K.A. and Chan, H.S. 1997. From Levinthal to pathways to funnels. Nat. Struct. Biol. 4: 10–19.
Dinner, A.R., Sali, A., Smith, L.J., Dobson, C.M., and Karplus, M. 2000. Understanding protein folding via free-energy surfaces from theory and experiment. Trends Biochem. Sci. 25: 331–339.
Dobson, C.M., Sali, A., and Karplus, M. 1998. Protein folding: A perspective from theory and experiment. Ang. Chem. Int. Ed. 37: 868–893.
Eaton, W.A., Munoz, V., Hagen, S.J., Jas, G.S., Lapidus, L.J., Henry, E.R., and Hofrichter, J. 2000. Fast kinetics and mechanisms in protein folding. Annu. Rev. Biophys. Biomol. Struct. 29: 327–359.
Fersht, A.R. 1997. Nucleation mechanisms in protein folding. Curr. Opin. Struct. Biol. 7: 3–9.
Fersht, A.R. 2000. Transition-state structure as a unifying basis in protein-folding mechanisms: Contact order, chain topology, stability, and the extended nucleus mechanism. Proc. Natl. Acad. Sci. 97: 1525–1529.
Flanagan, J.M., Kataoka, M., Shortle, D., and Engelman, D.M. 1992. Truncated staphylococcal nuclease is compact but disordered. Proc. Natl. Acad. Sci. 89: 748–752.
Flory, P.J. 1956. Theory of elastic mechanisms in fibrous proteins. J. Am. Chem. Soc. 78: 5222–5234.
Galzitskaya, O.V. and Finkelstein, A.V. 1999. A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc. Natl. Acad. Sci. 96: 11299–11304.
Gillespie, B. and Plaxco, K.W. 2000. Non-glassy kinetics in the folding of a simple, single domain protein. Proc. Natl. Acad. Sci. 97: 12014–12019.
Grantcharova, V.P. and Baker, D. 2001. Circularization changes the folding transition state of the src SH3 domain. J. Mol. Biol. 306: 555–563.
Grantcharova, V.P., Riddle, D.S., and Baker, D. 2000. Long-range order in the src SH3 folding transition state. Proc. Natl. Acad. Sci. 97: 7084–7089.
Grantcharova, V.P., Alm, E.J., Baker, D., and Horowitz, A.L. 2001. Mechanisms of protein folding. Curr. Opin. Struct. Biol. 11: 70–82.
Gromiha, M.M. and Selvaraj, S. 2001. Comparison between long-range interactions and contact order in determining the folding rate of two-state proteins: Application of long-range order to folding rate prediction. J. Mol. Biol. 310: 27–32.
Gross, M. 1996. Linguistic analysis of protein folding. FEBS Lett. 390: 249–252.
Guijarro, J.I., Morton, C.J., Plaxco, K.W., Campbell, I.D., and Dobson, C.M. 1998. Folding kinetics of the SH3 domain of PI3 by real-time NMR and optical techniques. J. Mol. Biol. 275: 657–667.
Guo, Z.Y., Brooks, C.L., and Boczko, E.M. 1997. Exploring the folding free energy surface of a three-helix bundle protein. Proc. Natl. Acad. Sci. 94: 10161–10166.
Gutin, A.M. and Shakhnovich, E.I. 1994. Statistical mechanics of polymers with distance constraints. J. Chem. Phys. 100: 5290–5293.
Gutin, A.M., Abkevich, V.I., and Shakhnovich, E.I. 1996. Chain length scaling of protein folding time. Phys. Rev. Lett. 77: 5433–5436.
Hagen, S.J., Hofrichter, J., Szabo, A., and Eaton, W.A. 1996. Diffusion-limited contact formation in unfolded cytochrome c: Estimating the maximum rate of protein folding. Proc. Natl. Acad. Sci. 93: 11615–11617.
Hagen, S.J., Hofrichter, J., and Eaton, W.A. 1997. Rate of intrachain diffusion of unfolded cytochrome c. J. Phys. Chem. B 101: 2352–2365.
Hagen, S.J., Carswell, C.W., and Sjolander, E.M. 2001. Rate of intrachain contact formation in an unfolded protein: Temperature and denaturant effects. J. Mol. Biol. 305: 1161–1171.
Hodsdon, M.E. and Frieden, C. 2001. Intestinal fatty acid binding protein: The folding mechanism as determined by NMR studies. Biochemistry 40: 732–742.
Islam, S.A., Karplus, M., and Weaver, D.L. 2002. Application of the diffusion-collision model to the folding of three-helix bundle proteins. J. Mol. Biol. 318: 199–215.
Ivankov, D.N. and Finkelstein, A.V. 2001. Theoretical study of a landscape of protein folding-unfolding pathways. Folding rates at midtransition. Biochemistry 40: 9957–9961.
Jackson, S.E. 1998. How do small single domain proteins fold? Fold. Des. 3: R81–R91.
Jackson, S.E. and Fersht, A.R. 1991. The folding of chymotrypsin inhibitor-2. 1. Evidence for a two-state transition. Biochemistry 30: 10428–10435.
Jacobson, H. and Stockmayer, W.H. 1950. Intramolecular reaction in polycondensations. I. The theory of linear systems. J. Chem. Phys. 18: 1600–1606.
Karplus, M. and Weaver, D.L. 1979. Diffusion-collision model for protein folding. Biopolymers 18: 1421–1437.
Kaya, H. and Chan, H.S. 2002. Towards a consistent modeling of protein thermodynamic and kinetic cooperativity: How applicable is the transition state picture to folding and unfolding? J. Mol. Biol. 315: 899–909.
Klein-Seetharaman, J., Oikawa, M., Grimshaw, S.B., Wirmer, J., Duchardt, E., Ueda, T., Imoto, T., Smith, L.J., Dobson, C.M., and Schwalbe, H. 2002. Long-range interactions within a nonnative protein. Science 295: 1719–1722.
Kolinski, A., Galazka, W., and Skolnick, J. 1998. Monte Carlo studies of the thermodynamics and kinetics of reduced protein models: Application to small helical, β, and α/β proteins. J. Chem. Phys. 108: 2608–2617.
Ladurner, A.G. and Fersht, A.R. 1997. Glutamine, alanine, or glycine repeats inserted into the loop of a protein have minimal effects on stability and folding rates. J. Mol. Biol. 273: 330–337.
Ladurner, A.G., Itzhaki, L.S., Gay, G.D., and Fersht, A.R. 1997. Complementation of peptide fragments of the single domain protein chymotrypsin inhibitor 2. J. Mol. Biol. 273: 317–329.
Lapidus, L.J., Eaton, W.A., and Hofrichter, J. 2000. Measuring the rate of intramolecular contact formation in polypeptides. Proc. Natl. Acad. Sci. 97: 7220–7225.
Lindberg, M.O., Tangrot, J., Otzen, D.E., Dolgikh, D.A., Finkelstein, A.V., and Oliveberg, M. 2001. Folding of circular permutants with decreased contact order: General trend balanced by protein stability. J. Mol. Biol. 314: 891–900.
Makarov, D.E. and Metiu, H. 2002. A model for the kinetics of protein folding: Kinetic Monte Carlo simulations and analytical results. J. Chem. Phys. 116: 5205–5216.
Makarov, D.E., Keller, C.A., Plaxco, K.W., and Metiu, H. 2002. How the folding rate constant of simple, single-domain proteins depends on number of native contacts. Proc. Natl. Acad. Sci. 99: 3535–3539.
Mayor, U., Johnson, C.M., Daggett, V., and Fersht, A.R. 2000. Protein folding and unfolding in microseconds to nanoseconds by experiment and simulation. Proc. Natl. Acad. Sci. 97: 13518–13522.
Miller, E.J., Fischer, K.F., and Marqusee, S. 2002. Experimental evaluation of topological parameters determining protein-folding rates. Proc. Natl. Acad. Sci. 99: 10359–10363.
Millet, I.S., Townsley, L., Chiti, F., Doniach, S., and Plaxco, K.W. 2002. Equilibrium collapse and the kinetic foldability of proteins. Biochemistry 41: 321–325.
Mirny, L. and Shakhnovich, E. 2001. Protein folding theory: From lattice to all-atom models. Annu. Rev. Biophys. Biomol. Struct. 30: 361–396.
Muñoz, V. and Eaton, W.A. 1999. A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc. Natl. Acad. Sci. 96: 11311–11316.
Muñoz, V., Thompson, P.A., Hofrichter, J., and Eaton, W.A. 1997. Folding dynamics and mechanism of β-hairpin formation. Nature 390: 196–199.
Myers, J.K. and Oas, T.G. 2001. Preorganized secondary structure as an important determinant of fast protein folding. Nat. Struct. Biol. 8: 552–558.
Nauli, S., Kuhlman, B., and Baker, D. 2001. Computer-based redesign of a protein folding pathway. Nat. Struct. Biol. 8: 602–605.
Onuchic, J.N., Luthey-Schulten, Z., and Wolynes, P.G. 1997. Theory of protein folding: The energy landscape perspective. Annu. Rev. Phys. Chem. 48: 545–600.
Otzen, D.E. and Fersht, A.R. 1998. Folding of circular and permuted chymotrypsin inhibitor 2: Retention of the folding nucleus. Biochemistry 37: 8139–8146.
Penkett, C.J., Redfield, C., Jones, J.A., Dodd, I., Hubbard, J., Smith, R.A.G., Smith, L.J., and Dobson, C.M. 1998. Structural and dynamical characterization of a biologically active unfolded fibronectin-binding protein from Staphylococcus aureus. Biochemistry 37: 17054–17067.
Plaxco, K.W. and Gross, M. 2001. Unfolded, yes, but random? Never! Nat. Struct. Biol. 8: 659–670.
Plaxco, K.W., Simons, K.T., and Baker, D. 1998. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277: 985–994.
Plaxco, K.W., Millett, I.S., Segel, D.J., Doniach, S., and Baker, D. 1999. Polypeptide chain collapse can occur concomitantly with the rate limiting step in protein folding. Nat. Struct. Biol. 6: 554–557.
Plaxco, K.W., Simons, K.T., Ruczinski, I., and Baker, D. 2000. Topology, stability, sequence, and length: Defining the determinants of two-state protein folding kinetics. Biochemistry 37: 11177–11183.
Plotkin, S.S., Wang, J., and Wolynes, P.G. 1996. Correlated energy landscape model for finite, random heteropolymers. Phys. Rev. E 53: 6271–6296.
Poland, D.C. and Scheraga, H.A. 1965. Statistical mechanics of noncovalent bonds in polyamino acids. 8. Covalent loops in proteins. Biopolymers 3: 379–385.
Portman, J.J., Takada, S., and Wolynes, P.G. 2001. Microscopic theory of protein folding rates. II. Local reaction coordinates and chain dynamics. J. Chem. Phys. 114: 5082–5096.
Robinson, C.R. and Sauer, R.T. 1998. Optimizing the stability of single-chain proteins by linker length and composition mutagenesis. Proc. Natl. Acad. Sci. 95: 5929–5934.
Rose, G.D. 1979. Hierarchic organization of domains in globular-proteins. J. Mol. Biol. 134: 447–470.
Schwalbe, H., Fiebig, J.M., Buck, M., Jones, J.A., Grimshaw, S.B., Spencer, A., Glaser, S.J., Smith, L.J., and Dobson, C.M. 1997. Structural and dynamical properties of a denatured protein. Heteronuclear 3D NMR experiments and theoretical simulations of lysozyme in 8 M urea. Biochemistry 36: 8977–8991.
Shea, J.E., Onuchic, J.N., and Brooks, C.L. 1999. Exploring the origins of topological frustration: Design of a minimally frustrated model of fragment B of protein A. Proc. Natl. Acad. Sci. 96: 12512–12517.
Sheinerman, F.B. and Brooks, C.L. 1998. Molecular picture of folding of a small α/β protein. Proc. Natl. Acad. Sci. 95: 1562–1567.
Shoemaker, B.A. and Wolynes, P.G. 1999. Exploring structures in protein folding funnels with free energy functionals: The denatured ensemble. J. Mol. Biol. 287: 657–674.
Shortle, D. and Ackerman, M.S. 2001. Persistence of native-like topology in a denatured protein in 8 M urea. Science 293: 487–489.
Socci, N.D., Onuchic, J.N., and Wolynes, P.G. 1998. Protein folding mechanisms and the multidimensional folding funnel. Prot. Struct. Func. Gen. 32: 136–158.
Sosnick, T.R., Mayne, L., Hiller, R., and Englander, S.W. 1994. The barriers in protein folding. Nat. Struct. Biol. 1: 149–156.
Sosnick, T.R., Mayne, L., and Englander, S.W. 1996. Molecular collapse: The rate-limiting step in two-state cytochrome c folding. Proteins 24: 413–426.
Szabo, A., Schulten, K., and Schulten, Z. 1980. First passage time approach to diffusion controlled reactions. J. Chem. Phys. 72: 4350–4357.
Thirumalai, D. 1995. From minimal models to real proteins: Time scales for protein folding. J. Physique I 5: 1457–1467.
Thompson, P.A., Eaton, W.A., and Hofrichter, J. 1997. Laser temperature jump study of the helix-coil kinetics of an alanine peptide interpreted with a kinetic zipper model. Biochemistry 36: 9200–9210.
Van Nuland, N.A.J., Chiti, F., Taddei, N., Raugei, G., Ramponi, G., and Dobson, C.M. 1998. Slow folding of muscle acylphosphatase in the absence of intermediates. J. Mol. Biol. 283: 883–891.
Viguera, A.R. and Serrano, L. 1997. Loop length, intramolecular diffusion and protein folding. Nat. Struct. Biol. 4: 939–946.
Wittung-Stafshede, P., Lee, J.C., Winkler, J.R., and Gray, H.B. 1999. Cytochrome b562 folding triggered by electron transfer: Approaching the speed limit for formation of a four-helix-bundle protein. Proc. Natl. Acad. Sci. 96: 6587–6590.
Zagrovic, B., Snow, C., Khaliq, S., Shirts, M., and Pande, V. 2002. Native-like mean structure in the unfolded ensemble of small proteins. J. Mol. Biol. (in press).
Zhdanov, V.P. 1998. Folding time of ideal β-sheets vs. chain length. Europhys. Lett. 42: 577–581.
Zhou, H.Y. and Zhou, Y.Q. 2002. Folding rate prediction using total contact distance. Biophys. J. 82: 458–463.
Zhou, Y.Q. and Karplus, M. 1999. Interpreting the folding kinetics of helical proteins. Nature 401: 400–403.