|
|
||||||||
Department of Biochemistry and Department of Medical Genetics & Microbiology, University of Toronto, Toronto, Ontario M5S 1A8, Canada
Reprint requests to: Hue Sun Chan, Department of Biochemistry, University of Toronto, 1 Kings College Circle, Toronto, Ontario M5S 1A8, Canada; e-mail: chan{at}arrhenius.med.utoronto.ca; fax: 1-416- 978-8548.
(RECEIVED December 16, 2004; FINAL REVISION February 23, 2005; ACCEPTED February 23, 2005)
| Abstract |
|---|
|
|
|---|
chain models. The theoretical native topomers of four experimentally well-studied two-state proteins are characterized. Consistent with the TSM perspective, we found that the sizes of the native topomers increase with experimental folding rate. However, a careful determination of the corresponding probabilities that the native topomers are populated during a random search fails to reproduce the previously predicted folding rates. Instead, our results indicate that an unbiased TSM search for the native topomer amounts to a Levinthal-like process that would take an impossibly long average time to complete. Furthermore, intraprotein contacts in all four native topomers considered exhibit no apparent correlation with the experimental
-values determined from the folding kinetics of these proteins. Thus, the present findings suggest that certain basic, generic yet essential energetic features in protein folding are not accounted for by TSM scenarios to date. Keywords: protein folding; topomer search model; topomer-sampling model; Levinthal search; explicit-chain modeling
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.041317705.
| Introduction |
|---|
|
|
|---|
-helical proteins) were found to fold faster than proteins with high CO values (typically
/
- and
-proteins). Since this seminal finding, other topology-based parameters have been found that exhibit similar correlation with folding rates. These include long-range order (Selvaraj and Gromiha 2001), total contact distance (Zhou and Zhou 2002), "cliquishness" (Micheletti 2003), local secondary structure content (Gong et al. 2003), and the topomer-derived parameter considered in this work (Makarov and Plaxco 2003). Several theoretical constructs that did not directly consider explicit chain representations have had notable successes in reproducing this empirically observed rate topology relationship (Alm and Baker 1999; Galzitskaya and Finkelstein 1999; Muñoz and Eaton 1999; Weikl and Dill 2003). For more realistic explicit-chain models that involve direct simulations of folding kinetics, however, it has proven less straightforward (Koga and Takada 2001; Faisca and Ball 2002; Cieplak and Hoang 2003); nonetheless, significant recent advances have been made (Jewett et al. 2003; Kaya and Chan 2003c; Chavez et al. 2004; Ejtehadi et al. 2004). Among the theoretical efforts aiming to elucidate the remarkable contact order-dependent folding rates, a series of closely related models under the name of "topomer search model" or "topomer-sampling model" (TSM) has been proposed in the past five years as a possible mechanistic basis for the empirically observed ratetopology relationship (Debe and Goddard 1999; Debe et al. 1999; Makarov and Metiu 2002; Makarov et al. 2002; Makarov and Plaxco 2003; Gillespie and Plaxco 2004). These topomer constructs share the idea that the folding of small single-domain proteins is a two-step process: The first step is an essentially unbiased, diffusive search for an overall native-like topological state termed the "native topomer state." In the second step, after the chain has formed the correct gross topology, the native contacts rapidly zipper to form the completed native structure. Since this second step is assumed to occur very rapidly, the rate-limiting step in folding is the essentially unbiased search process in the first step, and folding rates are therefore determined by the "size," or extent in conformational space, of the native topomer state.
The topomer folding picture may be schematically illustrated by the highly simplified energy landscape caricatures in Figure 1
. Here the energy landscape "outside" of the native topomer (the part represented by a steep funnel in Fig. 1
) has been drawn flat to highlight the unbiased nature of the hypothetical TSM process (Debe and Goddard 1999). This corresponds to the case in Makarov et al. (2002) for which "the barrier to folding is purely entropic," although the possibility of a general energy favoring contact formation has also been considered as part of a TSM process (Makarov et al. 2002, p. 3538; see Discussion below). Thus, in a proposed unbiased topomer-search scenario, the first step of the folding process is a search on this flat region of the energy landscape. Then, as soon as the native topomer is found, folding proceeds quickly and downhill to the native state until all native contacts are formed. As a consequence of this folding picture, the folding rate of a protein is controlled by a quantity Ptop, the probability that the native topomer state is populated during a random search. Slowand fast-folding proteins thus have "tight" and "loose" native topomers, respectively, corresponding to the two landscapes in Figure 1
with smaller and larger "golf holes."
|
- and
/
-proteins using a similar chain growth method (all-
proteins were excluded from this data set). The folding rates that resulted from their calculations showed a good correlation with experimental data (correlation coefficient 0.78).
In all TSM scenarios, the proposed folding process is quickly completed once the native topomer is located. It follows that if TSM is to provide a rationalization of apparent two-state folding mechanisms, the precise definition of the native topomer state is of critical importance. In the original TSMstudy by Debe and colleagues, "topomers are tubes of topologically equivalent conformations" (Debe and Goddard 1999): Two structures (conformations) were classified as topomeric (i.e., belonging to the same topomer state) based on a test procedure in which equal-strength harmonic forces were applied between corresponding C
atoms of two optimally superimposed structures. If a conjugate gradient minimization procedure could completely relax the spring forces, that is, if small backbone moves of one structure could bring it onto the other without getting trapped in a local minimum, the two structures were classified as belonging to the same topomer (Debe et al. 1999).
In the subsequent development of the TSM approach by Makarov and colleagues, however, the native topomer state was defined by specific collections of native contacts, as the set of conformations in which all sequence-distant native contacts are formed. In other words, TSM is the conformational ensemble in which all sequence-distant residues (as defined by the formalism) that are in contact in the native state are in close spatial proximity. Hence, the number of sequence-distant native contacts,
D, plays a central role in this more recent TSM formulation, as will be detailed below. This approach is different in many respects from the original TSM of Debe and Goddard (1999) and Debe et al. (1999). Certain features introduced in the later formulation of Makarov and colleagues are seen as instrumental in extending the TSMs ability to predict folding rates to include also purely
-helical proteins, as opposed to only
- and
/
-proteins (Makarov and Metiu 2002; Makarov et al. 2002; Makarov and Plaxco 2003).
The kinetic process of topomer sampling was envisioned by Debe and colleagues to be mainly among relatively compact conformations with low (favorable) solvation energies, as was depicted by the kinetic path on their "Rose Bowl" (a stadium in Pasadena, California) landscape (see Debe et al. 1999, Fig. 5
). This amounts to postulating a folding mechanism that involves a fast, "burst-phase" collapse to an ensemble of compact conformations that are partitioned into different topomers. In that case, the flat areas in Figure 1
represent only such compact conformations but not the open conformations that presumably have higher solvation energies. The bulk of the search time for the native topomer is then spent in different compact conformations. But such a folding mechanism does not appear to correspond to that of many real, apparent two-state proteinstypified by chymotrypsin inhibitor 2 (Jackson and Fersht 1991)that exhibit no significantly populated compact folding intermediates. In contrast, the topomer search approach of Makarov and colleagues seeks the probability of locating the native topomer among all possible conformations, most of which are presumably open (Makarov et al. 2002), and is therefore conceptually more in line with the principles of cooperative protein folding (Chan et al. 2004; Gillespie and Plaxco 2004).
|
D). The parameter
D (which is denoted QD in Makarov and Plaxco 2003) is closely related to the long-range order parameter LRO proposed previously by Selvaraj and Gromiha (2001), namely,
D/N=LRO provided that the same definition of a contact is used. Based on results from inert Gaussian chains (without consideration of polymer excluded volume), Makarov and Plaxco (2003) introduced the approximate formula P(
D)=
a
D where a and
are constants. This expression leads to a simple topology-based rate-prediction formula that was tested on a set of 24 two-state proteins (Makarov and Plaxco 2003), a diverse set that did not exclude all-
proteins as had been done previously (Debe and Goddard 1999). The rate-prediction formula showed excellent correlation with experimental folding rate data with a correlation coefficient ~0.9 (Makarov and Plaxco 2003), even though folding rates predicted using
D can sometimes be drastically different from those predicted using CO (Jones and Wittung-Stafshede 2003).
Because of this empirical success and the potential physical insights it may offer, it is imperative to take an in-depth look at the TSM perspective as several key aspects of it remain to be better elucidated. As it stands, TSM is not a self-contained, explicit-chain model. TSM investigations thus far have not used direct simulations of folding kinetics. Instead, kinetic behaviors and folding rates were deduced from presumed thermodynamicskinetics relationships. In this basic respect, TSM is akin to other non-explicit-chain modeling of CO-dependent folding that have also enjoyed high degrees of success (Alm and Baker 1999; Muñoz and Eaton 1999). However, the relationship between the thermodynamics and kinetics of chain molecules can be subtle, and explicit-chain results do not always agree with expectations from non-explicitchain considerations (Karanicolas and Brooks 2003; Chan et al. 2004). Therefore, ultimately, TSM or any other non-explicit-chain model of protein folding has to be evaluated by ascertaining whether and how the model assumptions can emerge from polymer physics. With this in mind, this article explores the feasibility of the proposed TSM folding mechanism by performing extensive Langevin molecular dynamics simulations using an explicit C
chain model. We focus on four apparent two-state proteins with diverse CO values. These chain models allow us to assess the logic of the physical picture afforded by TSM, and to perform structural comparisons with experiments.
In the analysis below, we first derive a clear structural characterization of the native topomers for the four proteins. Then, by using standard histogram reweighting techniques, we estimate their P(
D) values. We find them to be much smaller than that stipulated by the TSM approximation P(
D) =
a
D. In fact, our calculations show that finding the correct native topomer state in a random, unbiased TSM search is so unlikely that it is comparable to the hypothetical Levinthal search process. Finally, to connect TSM with common understanding of folding kinetics, we compare experimental
-values with TSM-predicted
-values calculated by taking the ensemble of conformations constituting a native topomer state to be the TSM-prescribed folding transition state.
| Results |
|---|
|
|
|---|
|
s of amino acids i and j are separated by a spatial distance rnij<rc and their sequence separation |i j| > lc. Thus, the number of contacts
D in this set depends on the choice of lc and rc. In Table 2
D values for our four proteins, for several different choices of lc and rc within the limits 4
lc
12, and 6 Å
rc
8 Å . Table 2
D depends rather strongly on lc and rc, and this dependence is particularly strong for 1lmb. For this protein,
D varies almost by a factor 10 between the different values of lc and rc considered here.
|
D amino acid pair has formed a "contact" or is otherwise in close spatial proximity. To simplify the terminology, in the following we will use "contact" to indicate simply that two C
atoms are close in space. For this purpose, two definitions of contact are used. Unless stated otherwise, we use the contact criterion rij<1.2rnij , where rij is the C
C
distance between i and j. This criterion has been used in previous studies of native-centric C
models (Clementi et al. 2000; Kaya and Chan 2003a). In certain considerations, we find it useful to use a slightly more permissive contact criterion, rij<rnij+3.0 Å , which entails a broader definition of a native topomer. It should be noted that both of these contact criteria are independent of the cutoff parameter rc.
Chain simulation of the native topomer search process
To assess the TSM folding picture by an explicit-chain approach, it is necessary to (1) construct an overall conformational ensemble that is essentially unbiased, (2) determine the size of the conformational space covered by the native topomer as a sub-ensemble of the overall conformational ensemble, and (3) estimate by explicit construction the probability that this conformational sub-ensemble can be located by a diffusive search, as envisioned by TSM. We adopt a simple C
chain model for this purpose, with the following potential energy function:
![]() | (1) |
where bi,
i,
i, and rij are the virtual bond lengths, bond angles, torsion angles, and C
C
distances, respectively, and bni,
ni,
ni, and rnij are the corresponding native values. This energy function is designed to capture the basic polymer statistics of a generic self-avoiding polypeptide chain. The use of the
ni and
ni values in the last Here we focus on the cutoff values lc and rc within the limits suggested by Makarov and Plaxco (2003). two terms introduces a small bias toward the native state. These terms are included here for computational efficiency. Their presence does not alter our main conclusion, based on the results from Equation 1, that the probability of locating the native topomer by an unbiased search is vanishingly small (see below). This is because the native-centric nature of these terms in Equation 1 implies that the relative size of the native topomer sub-ensemble would be even smaller in an entirely unbiased overall conformational ensemble. The first term of the above potential function accounts for polymer excluded volume. Its summation is over all amino acid pairs ij, and the parameters
ij are chosen in the following manner:
ij=0.85 rnij for pairs ij in contact in the native state (rnij <rc); otherwise,
ij=4.0 Å . To speed up the calculations, the repulsive energy term is evaluated using a cutoff procedure with cutoff radius 2
ij. This choice of the
ij parameters ensures that the native conformation would not be made inaccessible by unphysical repulsive forces.
A motivation for adopting a simplified C
model in this study rather than using chain representations that account for more structural details is computational efficiency: A simplified chain model allows for extensive conformational sampling and reliable probability estimations that are essential in addressing issues pertinent to the TSM. In this regard, it is noteworthy that the TSM itselflike several other topology-based folding scenariosis also an inherently structurally low-resolution formulation that does not refer directly to sequence-specific interactions (Makarov and Plaxco 2003). Hence, a C
chain model enjoys the advantage of making direct connections to the underlying mathematical approximations of the TSM. Continuum (off-lattice) C
models have been used extensively in the literature for studying protein folding, mostly in combination with G
-type energy functions (Clementi et al. 2000) but also for structure prediction (Nanias et al. 2003). In particular, our model is similar to the one introduced in Clementi et al. (2000) and further studied in Kaya and Chan (2003a). Following Kaya and Chan (2003a), we choose the strengths of energy terms in Equation 1 to be krep=
, kbon=100
, kben=20
, and ktor=0.5
, respectively, where the energy unit
is set to 1, and use Langevin dynamics to calculate the thermodynamic behavior. The temperature is kept constant at T=1.0, and snapshots are taken every 100 time steps. Simulation details are otherwise the same as in Kaya and Chan (2003a).
The native topomers
The energy function in Equation 1 describes a disordered and "floppy" protein chain that samples many different conformations. To obtain a subset of these conformations that corresponds to the native topomer state of each of the four proteins studied here, we impose harmonic constraints representing the sequence-distant topomer contact pairs on the C
chain. This is achieved by using the energy function
![]() | (2) |
where the summation shown is over the
D sequence-distant native contacts as defined by the cutoff parameters lc and rc (see Table 1
). The conformational sampling procedure here is similar to the determination of ensembles of partially disordered protein conformational states that uses experimental data as constraints (Choy and Forman- Kay 2001; Vendruscolo et al. 2001), the main difference being that the constraints in this study are supplied by the TSM hypothesis instead of experimental measurements. In Equation 2, the topomer constraint strength is taken to be ktop=10
, making these harmonic forces one order of magnitude weaker than the strength kbon of the virtual bonds between sequentially adjacent C
atoms. We perform Langevin dynamics simulations of the energy function Etop for each of our four proteins and different choices of the parameters lc and rc. Each simulation is started from the native structure. The system is first equilibrated for 108 simulation time steps, after which sampling is performed for 109 time steps.
In general, a smaller lc and a larger rc will produce a larger number of topomer constraints, resulting in a topomer ensemble that more closely resembles the native pdb structure. This trend can be seen from Table 3
, which shows thermodynamic averages of the radius of gyration Rg, the root-mean-square deviation (rmsd) from the corresponding native structure, and the fraction of native contacts Q for four different parameter choices of lc and rc. Smaller lc and larger rc naturally lead to smaller
rmsd
and larger
Q
values, consistent with more native-like ensembles. We note also that
Rg
remains fairly close to the corresponding native values (see Table 1
) across the various choices of lc and rc.
|
lc
12 and 6 Å
rc
8 Å studied here.
|
|
|
DP(
D), the probability of populating the native topomer during a random, diffusive search, since TSM considers this to be the rate-limiting step in the folding of small, single-domain proteins (Makarov and Plaxco 2003).
As mentioned briefly above, Makarov and Plaxco (2003) made the approximation, based on considerations of Gaussian inert chains, that P(
D)=
a
D, where
and a are constants applicable to all proteins. This means that bringing each additional sequence-distant native pair together contributes on average a constant multiplicative factor a (<1) to the overall probability P(
D) of the native topomer state among the full ensemble of all accessible conformations. (The first three contact pairs are assumed to contribute with somewhat different factors, which are taken into account by the overall multiplicative factor
1.) The topomer search rate was then postulated to be given approximately by the formula kf=
DP(
D), where kf is the folding rate, and 
D is the frequency of attempted contact formation. This formula was fitted to experimental folding rate data for a set of 24 twostate proteins yielding an excellent correlation coefficient of
0.88 (Makarov and Plaxco 2003). It should be noted that the dominant, exponential dependence on
D in this Makarov-Plaxco expression is in P(
D). Thus, this rate correlation formula resembles closely the ln kf
LRO hypothesis proposed previously by Selvaraj and Gromiha (2001) because LRO=
D/N.
The success of this empirical two-parameter fit implies that the structural quantity
D or LROlike the original CO parameter (Plaxco et al. 1998)is capturing significant aspects of the physics of apparent two-state protein folding. Obviously, the more important issue, then, is whether the physical picture offered by the TSM perspective is supported by this remarkable folding rate correlation as well. Does the empirical success of the TSM kf (
D) formula necessarily imply that the proposed process of topomer search is correct or even physically plausible? To provide a logical answer to this question, an essential first step is to determine the mathematical validity of the P(
D)=
a
D formula; more specifically, whether this formula, indeed, provides the native topomer probability it purports to describe. Obviously, unless the answer to this basic question is affirmative, the empirical success of the TSM kf(
D) formula can only be compared with empirical rate correlation with LRO and other proposed parameters but cannot logically be translated into support for the broader topomer-search discourse.
We therefore first focus on the TSM formula P(
D)=
a
D. By fitting experimental data, the two constants
and a have been determined to be a=0.86 and
=4x 105 for lc=12 and rc=6 Å (Makarov and Plaxco 2003). Using these fitted values of Makarov and Plaxco in conjunction with the
D values for lc=12 and rc=6 Å in Table 2
, the
a
D values for our four proteins are found to be ranging from 105 to 1010 (Table 4
). Thus, according to this Makarov-Plaxco formula, populating the native topomer state during a random conformational search is about 3 x 104 times more likely for llmb than for 1aps, for example; and this predicted probability difference is approximately equal to the folding rate difference of these two proteins.
|
D) formula adequately describe the relative size of the native topomer subensemble relative to the full conformational ensemble? To tackle this question, we compute the probability P(
D) for our four proteins in explicit-chain conformational ensembles defined by the energy function E0 in Equation 1. To err on the side of affording a higher probability to the native topomer, in this analysis we use the more permissive contact criterion rij<rnij+3.0 Å to determine if amino acid residues (C
positions) are in contact. The most direct way to ascertain P(
D) would be to perform Langevin dynamics simulations with E0 and count the fraction of conformations that fulfill the native topomer criterion, that is, having the
D amino acid pairs close in space. But such a bruteforce approach is not computationally feasible because of the smallness of P(
D), making the native topomer very unlikely to be visited by unbiased conformational sampling. Thus, to enhance sampling of the native topomer, we introduce the auxiliary energy function
![]() | (3) |
where Ebias is a potential with a bias toward the native topomer and
is a bias strength parameter. Two-dimensional histograms P
(nD, Ebias) are recorded for a series of
, where nD denotes the number of sequencedistant native contacts formed. A standard histogram reweighting technique (Ferrenberg and Swendsen 1989) is then applied to obtain an improved estimate of P0(nD, Ebias), and hence of P0(nD=
D)=P(
D). Further details of this reweighting procedure, including the precise form of Ebias, are given in Materials and Methods. It should be noted that by design Ebias is not a function of all C
positions, but only of those that are part of the native topomer constraint set, such that the added bias is toward the somewhat disordered native topomer state (whose sampling we aim to enhance) rather than toward the fully native pdb structure.
Table 4
shows the results of our P(
D) calculations for four different choices of lc and rc. For the four proteins studied, our explicit-chain estimates of P(
D) lie in the range ~1086 to 1018, in contrast to the much larger values produced by the formula P(
D)=
a
D using the particular a and
parameters of Makarov and Plaxco. In fact, some of the computed explicitchain P(
D) probabilities in Table 4
are as small as what would be expected for a Levinthal search. For instance, assuming that each Ramachandran torsion angle can occupy three discrete values, the total number of conformations for a 64-amino-acid protein (such as 2ci2
[PDB]
) would be roughly (3x3)64
1061. (The essential exponential increase with chain length N of the number ~ µN of all possible conformationsmost of which are noncompactapplies to chains with excluded volume as well. The effect of excluded volume in three dimensions leads to an appreciable but not large decrease in µ. For instance, for chains configured on simple cubic lattices, µ decreases from 6 to
4.68; e.g, see Barber and Ninham 1970; Chan and Dill 1991). From a polymer physics perspective, the results in Table 4
are not surprising. This is because the native topomer ensembles, especially those with relatively large
Ds (e.g., 1aps), P(
D) is the probability that the
D sequence-distant native contacts are formed during a search process described by the energy function E0 (see Equation 1). Two amino acid residues i and j are assumed to contact each other if rij<rnij +3.0 Å . For comparison, we have included the corresponding values from the Makarov-Plaxco P(
D)=
a
D formula with a=0.86 and
=4 x 105. are constituted of conformations quite closely resembling that of the native pdb structure itself (cf. Figs. 3
and 4
). A probable cause of the gross overestimation of native topomer probabilities by the Makarov-Plaxco
a
D formula is its neglect of excluded volume. Excluded volume places enormous restrictions on the conformational freedom of compact chains (Chan and Dill 1990), but excluded volume effects are not taken into account in the Makarov-Plaxco TSM formulation. We note that the relative values of the P(
D) probabilities computed here mostly follow the same order as that provided by the P(
D)=
a
D expression, suggesting that this formula might apply approximately for a different set of a and
. For example, fitting this formula to our explicitchain P(
D) results for lc=12 and rc=8 Å yields the parameter values of a=0.60 and
=4.5 x 1033; but in that case the exceedingly small value of
would not conform to the TSM picture of Makarov and Plaxco (2003), who have related a
~ 105 value to "the extra entropy associated with the first few ordering events" (Makarov and Plaxco 2003, p. 21). More fundamentally, however, the very small values of the present explicitchain P(
D) values strongly support the argument against the TSM speculation that the "entropic cost of finding the native topomer may be reasonable even in the absence of native-like interactions that may favor this set of conformations" (Makarov and Plaxco 2003, p. 22). Quite to the contrary, the vanishingly small native topomer probabilities we obtained imply that an unbiased, diffusive search for the native topomer among all accessible conformations, like the hypothetical Levinthal search, is highly unlikely to succeed and is therefore not a viable mechanism for apparent twostate folding.
-values and rate-limiting formation of the native topomer
In the TSM picture, the rate-limiting step of folding is the search for the native topomer. Thus, TSM predicts how the conformational ensemble of any given protein should look at the point when the rate-limiting step is achieved during the folding process. In principle, these structural predictions of TSM can be independently verified or falsified by experiments, irrespective of the mathematical/physical question raised above regarding whether the TSM formula provides the correct probability for the hypothetical native topomer state.
The native topomer state has been identified with the folding transition state. According to Makarov et al. (2002), the TSM model "assumes that the folding rate is controlled by the rate of forming all the native contacts observed in the folded protein" (note that these "contacts" refer to native topomer contacts that require a contacting pair to be separated by at least a certain number of residues along the chain), and that the state created when all contacts that define the native topomer is formed is a "transition state" of the model (Makarov et al. 2002, pp. 3536, 3539). Therefore, it is instructive to compare the structural characteristics of the native topomer state with the folding transition state properties inferred by experimental techniques.
A common approach to characterize the transition state ensemble for two-state proteins is through
-value analysis (Fersht et al. 1992). In these experiments, collection of single amino acid mutations is performed and their effects on the folding rate and stability are measured. The goal of
-value analysis is to map out the degree of "interaction" of the mutated amino acids in the transition state. The final result for an amino acid is expressed as a
-value, which states whether the amino acid is "ordered" (
1) or "disordered" (
0) in the transition state ensemble. Notwithstanding the recent controversy about the interpretation of experimental
-values (Sanchez and Kiefhaber 2003; Fersht 2004; Hubner et al. 2004),
-value analysis is a widely adopted approach. Thus, it is worthwhile to delineate the logical relationship between experimental
-values and the corresponding quantities implied by TSM, even though
-values were not computed by Makarov et al. themselves.
It is relatively straightforward to use the conformational ensembles from our simulations of the hypothetical native topomers to calculate TSM-predicted
-values, which can then be compared with available experimental
-value data for the four proteins studied here. Similar procedures have been applied before by other researchers to compare experimental
-values and theoretical
-values predicted by non-explicit-chain (Alm and Baker 1999; Galzitskaya and Finkelstein 1999; Muñoz and Eaton 1999) as well as explicit-chain (Clementi et al. 2000, 2003; Ejtehadi et al. 2004) models.
Before turning to more detailed calculations, several salient conclusions about TSM-predicted
-values can be drawn by considering the general properties of the native topomer state (cf. Fig. 2
). Any given amino acid residue making mostly sequence-distant native contacts in a proteins pdb structurewill, because of the topomer constraints, havemanyof its spatially neighboringamino acid residues in the native structure also in proximity in the native topomer state. Consequently, TSM would predict a high
-value for the given amino acid residue. In fact, if all of this amino acid residues native pdb contacts were sequence-distant,
would be predicted byTSMto be
1.Onthe other hand, an aminoacid residuemakingmostlylocal contacts, as mightbe the case for residues in
-helical regions, is likely to have a lower TSM-predicted
-value because by definition the amino acid residues it contacted in the native pdb structure are not required to be in spatial proximity in the native topomer state.
To calculate the TSM-predicted
-values, we use the following definition of
-value for an amino acid residue i:
![]() | (4) |
where
Ni
is the average number of native contacts made by amino acid residue i in the native topomer conformational ensemble, and Nn i is the total number of native pdb contacts for this amino acid residue. This definition is identical to that introduced by Vendruscolo et al. (2001) to construct transition-state ensembles from experimental data. In the present calculation, a contact is considered formed if rij<1.2rnij. Figure 5
compares
icalc and
-values obtained from experiments (Itzhaki et al. 1995; Burton et al. 1997; Chiti et al. 1999; Ternström et al. 1999). As expected, the TSM-predicted
icalc-values display a tendency to be lower in
-helical regions than in
-sheet regions. It is quite clear from Figure 5
that the TSM-predicted and experimental
-values do not follow the same trend along the chain sequences. The correlation coefficient between
icalc and
iexp is <0.1 in each of the four cases, although there are isolated individual calculated
-values that match the corresponding experimental values well. These results indicate that, at least for the four proteins studied, the TSM predictions about the structural characteristics of the rate-limiting transition- state conformational ensemble are very different from that obtained from common experimental
-value analysis.
TSM "
-values"
A more informative way to characterize the folding transition state than the
-values for individual amino acid residues is to consider the participation of pairs of contacting amino acid residues in the transition state. We apply this to the conformational ensembles of the four native topomer states, and determine the probability
ijcalc for each amino acid residue pair ij in the native contact set to be in contact in these topomer ensembles (Fig. 6
). As before, a contact in these ensembles is defined by rij<1.2rnij. These TSM-predicted quantities are analogous to the -values in experimental analysis, wherein bi-histidine metal binding sites are introduced for selected pairs of amino acid residue positions, and the stability of folding pathways with the metal binding pair formed relative to other pathways is regulated by metal ion concentration. The resulting -value of a native contact ij, a number between 0 and 1, then represents the ratio of protein molecules that have the given metal binding site in the transition state (Sosnick and Krantz 2001).
|
ijcalc values for the lc=12, rc=8 Å native topomer ensembles. By TSM definition, the probability of being in contact is close to 1 for pairs ij that are more than lc steps from the diagonal i=j. It is thus a general TSM prediction that
ijTSM
for amino acid pairs ij that are distant in sequence. Not surprisingly, in line with the calculated
-values, the TSM-predicted -values in Figure 6
-helical regions are relatively more disordered in the native topomer states.
-value analysis is a relatively novel method, and experimental data are quite limited thus far. None of our four proteins has been studied experimentally with this technique. Nonetheless, TSM predictions in Figure 6
TSM and native-centric explicit-chain folding models
In the above, we have shown by explicit-chain sampling that the population of the hypothetical native topomer state as a fraction of all accessible conformations is dramatically lower than that originally stipulated by Makarov and Plaxco (2003), and that the structural properties of the native topomer states of the four proteins considered in this study do not appear to resemble that of the corresponding folding transition states inferred from conventional experimental
-value analysis. To further elucidate the relationship between the TSM picture and explicit-chain dynamics, we now consider a class of native-centric models that allow for direct kinetic simulations of folding rates (Clementi et al. 2000; Koga and Takada 2001; Kaya and Chan 2003a).
One of the main justifications for the use of nativecentric models to study the folding process has been the empirically observed relationship between native topology and folding rate, emphasizing the important energetic information embodied in the native topology of natural proteins. This principle is, naturally, also central to the TSM. We therefore find it worthwhile to compare and to put into context these two types of approaches. In particular, we choose to compare the TSM with the Langevin dynamics version (Kaya and Chan 2003a) of the continuum C
model of Clementi et al. (2000), because a recent variation of this explicit-chain modeling construct has been shown to provide a good correlation between model-predicted and real folding rates of 16 apparent two-state proteins (Chavez et al. 2004).
Conformational search in these native-centric models is directed, not unbiased. The main energy term driving chain collapse and folding in this native-centric model consists of Lennard-Jones-like interactions between every pair of amino acid residues ij that satisfies |i j|>3 and forms a contact in the native pdb structure (Clementi et al. 2000). We consider two different definitions for two residues to be a native contact pair in this model: (1) if their C
C
separation in the native pdb structure is less than a certain cutoff distance rc; (2) if any two heavy (non-hydrogen) atoms, one from each residue, are within 4.5 Å ; this is identical to the definition used recently by Chavez et al. (2004). Definition (1) is sensitive to the choice of rc. It turns out that the thermodynamic behaviors of three of our four proteins do not appear to be two-state-like for rc=6 Å and rc=7 Å (data not shown). Here we focus only on the better-behaved rc=8 Å case. Computational runs of 109 simulation time steps are executed for each of the four proteins for the two different native contact sets we just described, using the potential energy function and Langevin dynamics simulation protocol detailed elsewhere (Equations 1, 2; see related discussion in Kaya and Chan 2003a). In the present investigation, the simulations are performed at these model proteins respective folding temperature Tf, the temperature at which the given chain models specific heat capacity is at its maximum.
Figure 7
shows our simulated free-energy profiles in the fraction of native contacts Q. The location of the free-energy barrier along these profiles (the single peak between the unfolded and folded minima at low and high Q, respectively) varies slightly between the four proteins and the two different contact sets. The chain structures in these peak regions are commonly taken as the transition state conformations for these and similar native-centric explicit-chain models (Clementi et al. 2000; Nymeyer et al. 2000; Chavez et al. 2004). However, it has been pointed out that not all structural details of transition-state conformations can be captured by such simple procedures, because the factors determining whether a conformation belongs to the folding transition state can be complex (Du et al. 1998; Hubner et al. 2004). In other words, some true transition- state conformations of the model may reside outside the Q-based free-energy peak region. Nonetheless, inasmuch as kinetic transitions between different Q-values are not too discontinuous, conformations populating the free-energy barrier region are indicative of general features of the conformations associated with the rate-limiting step in folding (Kaya and Chan 2002). With this in mind, in the following discussion we refer to the conformations in the free-energy barrier regions in Figure 7
as the putative transition states of the native-centric explicit-chain models.
|
0.40.6, and that they are quite dissimilar to the corresponding native topomer ensembles, which have very different
Q
-values (Fig. 7
Q
on the cutoff parameters lc and rc. Taken together, these observations suggest strongly that the native topomer states do not generally correspond to the folding transition state in these models. Instead, they are more representative of conformations that are visited after the rate-limiting step in the folding of these explicit-chain models.
Finally, we compare the TSM rate-prediction formula kf
Da
D and the folding rate results from the native-centric models. Following Chavez et al. (2004), we determine folding rates in the native-centric model at Tf using the direct Langevin dynamics simulation method of Kaya and Chan (2003a). Folding and unfolding rates are identical at the transition midpoint. Thus, the folding rate kf nc of the native-centric models at Tf is defined as the reciprocal of the mean first passage time (MFPT) from the beginning of an unfolded "phase" to the initiation of a folded phase, or vice versa, during the Langevin dynamics simulation [i.e., kf nc=(MFPT)1]. The unfolded and folded phases correspond to Q
Qu and Q
Qf, respectively, where Qu and Qf are the unfolded (low Q) and folded (high Q) free-energy minima. The MFPT was determined by averaging over 40 (1aps) to 3225 (1lmb) folding/unfolding events recorded during 109 simulation time steps. Figure 8
shows that the correlation between
Da
D and kf nc is good, which is perhaps not surprising given that both sets of results have been shown previously to correlate strongly with real folding rate data (Makarov and Plaxco 2003; Chavez et al. 2004). We note, however, that both the topomer search rate-prediction formula and the native-centric model predict 1urn to fold roughly an order of magnitude slower than 2ci2, but in fact the opposite is true (see Table 1
).
|
For the native-centric explicit-chain model considered here, every point in the conformational space has an energetic bias toward the native state. Frustration in the model arises solely from chain connectivity and excluded-volume effects. As Q increases, the combined effects of these incremental energetic favorabilities and the reduced conformational freedom as the chain acquires an increasing number of favorable contacts result in a free-energy slope (with respect to Q) that becomes first zero and then negative at intermediate values of Q. In contrast, TSM predicts conformational characteristics that are much more native-like at the rate-limiting step of folding. In this connection, it has been pointed out that, as a consequence of experimentally observed protein folding cooperativity, "an excess of 90% of the native structure is required for the free energy of a typical single-domain protein to drop below zero" (Gillespie and Plaxco 2004, p. 855). However, the configurational position of the transition state is determined by the gradient of free energy with respect to the reaction coordinate(s), not by the value of free energy itself. As an illustration of this basic principle, we note that a chain models stability and cooperativity can be arbitrarily enhanced by adding an ad hoc favorable energy to the unique native structure as a whole without affecting the folding kinetics (Kaya and Chan 2003b). Therefore, the above observation of Gillespie and Plaxco does not necessarily imply that transition states of real, cooperative proteins have to be very close to the native state. In fact, such a proposal would appear to be inconsistent with many chevron predictions of intermediate solvent exposure of folding transition states (P