Institute for Physical Science and Technology, Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, USA
Reprint requests to: D. Thirumalai, Institute for Physical Science and Technology, Department of Chemistry and Biochemistry, University of Maryland, College Park, MD 20742, USA; e-mail: thirum{at}glue.umd.edu; fax: (301) 314-9404.
(RECEIVED October 17, 2001; FINAL REVISION January 29, 2002; ACCEPTED January 29, 2002)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.4220102.
| Abstract |
|---|
Rn (R is the aggregation-prone state and G is either U, the unfolded state, or N, the native state of the monomer) is increased when polymerization occurs in the presence of a "seed" (a dimer). These results support the seeded nucleated-polymerization model of fibril formation in amyloid peptides. To probe generic aspects of aggregation in two-state proteins, we use lattice models with side chains. The phase diagram in the (T,C) plane (T is the temperature and C is the polypeptide concentration) reveals a bewildering array of "phases" or structures. Explicit computations for dimers show that there are at least six phases including ordered structures and amorphous aggregates. In the ordered region of the phase diagram there are three distinct structures. We find ordered dimers (OD) in which each monomer is in the folded state and the interaction between the monomers occurs via a well-defined interface. In the domain-swapped structures a certain fraction of intrachain contacts are replaced by interchain contacts. In the parallel dimers the interface is stabilized by favorable intermolecular hydrophobic interactions. The kinetics of folding to OD shows that aggregation proceeds directly from U in a dynamically cooperative manner without populating partially structured intermediates. These results support the experimental observation that ordered aggregation in the two-state folders U1A and CI2 takes place from U. The contrasting aggregation processes in the two models suggest that there are several distinct mechanisms for polymerization that depend not only on the polypeptide sequence but also on external conditions (such as C, T, pH, and salt concentration).
Keywords: Protein aggregation; prions; amyloids; phase diagram; Monte Carlo simulations
| Introduction |
|---|
α-helical state in the normal isoform of prion protein to a β-sheet conformation in PrPSc, which is the aggregated state of the pathogenic scrapie form (Jarrett and Lansbury 1993; Prusiner 1998). Similarly, the oligomeric form of Aβ-peptides (implicated in Alzheimer's disease) has a predominantly β-sheet architecture even though, in its monomeric form, it is a random coil over a wide range of external conditions (Kelly 1996; Harper and Lansbury 1997). Neither the mechanism of the conformational change nor the propagation leading to oligomers is fully understood at the molecular level. Although the aggregated forms of disease-causing proteins have been studied intensely, only recently has it been appreciated that almost any protein can form aggregates under appropriate conditions (Booth et al. 1997). More surprisingly, Dobson et al. (Chiti et al. 1999; Jimenez et al. 1999) found that even the structures of the aggregates of "normal" proteins (formed under nonphysiological conditions) are similar to the fibrils that are implicated in the neurodegenerative diseases. These findings suggest that there may be generic mechanisms by which aggregation takes place. Because there are a number of distinct folding mechanisms of monomeric proteins, it is likely that, at the molecular level, there are also several distinct scenarios for protein aggregation.
A goal of this paper is to explore the generic mechanism by which oligomerization of proteins takes place using lattice models. An obvious variable, besides the polypeptide sequence, that controls aggregation is protein concentration, C. Interactions between the polypeptide chains become important if C exceeds the overlap concentration $C^{*}$, which is proportional to $N/V$ with a constant prefactor, where N is approximately the number of residues in the polypeptide chain and V is the volume associated with one polypeptide chain. If electrostatic interactions are important, V can exceed $\frac{4}{3}\pi R_g^{3}$, where $R_g$ is the radius of gyration of the polypeptide chain (see Materials and Methods). Besides C, other factors can also influence aggregation. The equilibrium between the unfolded (U) and the native (N) states of a monomeric protein, which folds in a single step, depends on the sequence of the polypeptide chain as well as on the external conditions (pH, temperature, and salt concentration). The response of U and N to external conditions can be different (e.g., denaturants can stabilize U, whereas N is destabilized). Such opposing tendencies as well as sequence-dependent variations of folding of the monomer make the study of protein aggregation difficult.
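As a rough numerical illustration of the overlap criterion, the sketch below evaluates N/V with V taken as the volume of a sphere of radius Rg. The function name, the `prefactor` argument (set to 1 here because the constant is not specified in this section), and the example values of N and Rg are assumptions made only for illustration.

```python
import math


def overlap_concentration(n_residues, radius_of_gyration, prefactor=1.0):
    """Estimate C* = prefactor * N / V, with V = (4/3) * pi * Rg**3.

    The prefactor is a hypothetical placeholder for the constant in the text.
    """
    if radius_of_gyration <= 0:
        raise ValueError("radius_of_gyration must be positive")
    volume = (4.0 / 3.0) * math.pi * radius_of_gyration ** 3
    return prefactor * n_residues / volume


# Illustrative values only (lattice units), not taken from the paper.
c_star = overlap_concentration(n_residues=16, radius_of_gyration=2.0)
print(f"C* estimate: {c_star:.4f} residues per unit volume")
```

Because V grows as the cube of Rg, doubling Rg lowers the estimated overlap concentration by a factor of eight, which is why an expanded (for example, electrostatically swollen) chain begins to interact with its neighbors at much lower C.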
Our goal is to decipher general scenarios for protein association using lattice models. We consider two different models. The first one, introduced recently by Harrison, Chan, Prusiner, and Cohen (HCPC) (Harrison et al. 2001), is used to probe a number of aspects of self-propagation that involves conformational change from a compact state in the monomer to an extended "β-sheet" conformation in the oligomer phase. This model was designed to be a toy model for probing prion-like behavior. Aggregation and chain propagation in this system occur from a conformation other than U or N. This is in accord with the popular proposal that aggregation in amyloid forming peptides and proteins takes place by populating (at least transiently) partially folded intermediates (Kelly 1996).
In contrast to the previous examples in which oligomer assembly is accompanied by a large conformational transition, several studies (Silow and Oliveberg 1997; Silow et al. 1999) suggest that proteins, which fold by two-state kinetics at infinite dilution ($C \rightarrow 0$), can undergo reversible aggregation directly from the U state. To decipher the general principles that govern aggregation in these two-state proteins, we use three-dimensional lattice models with side chains (LMSC). Using LMSC and temperature (T) and C as variables, we address the following questions: What are the "phases" in the (T,C) plane? How does the assembly into some of these states occur starting from U?
Because of the simplicity of the models used, only general questions about principles can be addressed. Nevertheless, we show that questions of experimental interest, such as the role of partially folded intermediates in facilitating aggregation, can be addressed using simple models. The present work, which expands on previous studies of aggregation using on- and off-lattice models (Broglia et al. 1998; Gupta et al. 1998; Istrail et al. 1999; Giugliarelli et al. 2000; Smith and Hall 2001), provides a framework for obtaining insights into the scenarios for oligomerization.
| Results |
|---|
The most remarkable aspect of the HCPC model (see Materials and Methods) is that, using a simple two-dimensional lattice model with four kinds of beads, its authors obtained a sequence (Fig. 1a) that propagates upon interaction with other monomers. In the process, the chain undergoes a large conformational change from a compact native state to an extended "β-sheet"-like conformation. The energy spectrum of the 16-mer (Fig. 1b) and the thermodynamics of the monomer can be computed using exact enumeration of all possible conformations. The collapse transition temperature $T_\theta$ and the folding temperature $T_F$, determined by the standard methods (Camacho and Thirumalai 1993), are 1.90 and 1.10, respectively. We measure temperature in reduced units (see Materials and Methods). Because of the clear separation between $T_\theta$ and $T_F$ we expect that, under typical folding conditions ($T \lesssim T_F$), fluctuations could populate partially folded intermediates that could make this sequence susceptible to aggregation. The folding time $\tau_F$ at $T = T_F$, which is computed from the distribution of first passage times (Camacho and Thirumalai 1993), is $8 \times 10^{6}$ MCS, while at $T_H = 1.41$ (mildly denaturing condition) it is $\tau_F/4$.
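The exact enumeration mentioned above can be sketched for a short chain as follows. This is a minimal illustration, not the HCPC model: the energy function (one unit of attraction per nonbonded nearest-neighbor contact, identical for all beads), the function names, and the chain lengths are assumptions chosen so that the enumeration finishes instantly. A 16-mer with four bead types would use the same loop with a sequence-dependent contact energy.

```python
import math

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def enumerate_walks(n_beads):
    """Yield every self-avoiding conformation of n_beads on a square lattice.

    The first bead is fixed at the origin; conformations related by rotation
    or reflection are not removed.
    """
    def grow(path, occupied):
        if len(path) == n_beads:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            site = (x + dx, y + dy)
            if site not in occupied:
                path.append(site)
                occupied.add(site)
                yield from grow(path, occupied)
                occupied.remove(site)
                path.pop()

    yield from grow([(0, 0)], {(0, 0)})


def contact_energy(path, eps=-1.0):
    """Toy energy (assumption): eps per pair of nonbonded beads on adjacent sites."""
    index = {site: i for i, site in enumerate(path)}
    energy = 0.0
    for i, (x, y) in enumerate(path):
        for dx, dy in ((1, 0), (0, 1)):  # visit each adjacent pair once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                energy += eps
    return energy


def thermal_average_energy(n_beads, temperature):
    """Boltzmann-weighted mean energy over all enumerated conformations (k_B = 1)."""
    z = 0.0
    e_sum = 0.0
    for path in enumerate_walks(n_beads):
        e = contact_energy(path)
        w = math.exp(-e / temperature)
        z += w
        e_sum += e * w
    return e_sum / z


n_conformations = sum(1 for _ in enumerate_walks(8))
print(n_conformations, thermal_average_energy(8, 1.0))
```

Once every conformation and its energy are available, any thermodynamic average (and hence quantities such as the collapse and folding temperatures) follows from the same weighted sums, which is what makes short two-dimensional chains attractive as exactly solvable reference systems.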
[Equation (1)]
We equilibrated the chains at an elevated temperature (T = 10.0) for about $5 \times 10^{6}$ Monte Carlo steps (MCS). After equilibration we set $T = T_H$, and the dynamics of 100 trajectories was followed until $R_2$ is reached for the first time. The dimer appears in all trajectories. From the distribution of the first passage times, one can compute $P_u^{R_2}(t)$, the fraction of molecules that have not reached the $R_2$ state by time t. The time for forming $R_2$, which is obtained from the double exponential decay of $P_u^{R_2}(t)$, is about $2 \times 10^{7}$ MCS ($2.5\,\tau_F$).
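The fraction of unreacted molecules described above can be estimated directly from a list of first passage times, one per trajectory. The sketch below is only an illustration: the function names and the sample first passage times are hypothetical, and the fitting of the decay to a double exponential is not shown.

```python
def unreacted_fraction(first_passage_times, t):
    """Fraction of trajectories whose first passage time exceeds t."""
    if not first_passage_times:
        raise ValueError("need at least one first passage time")
    remaining = sum(1 for fpt in first_passage_times if fpt > t)
    return remaining / len(first_passage_times)


def mean_first_passage_time(first_passage_times):
    """Arithmetic mean of the first passage times."""
    return sum(first_passage_times) / len(first_passage_times)


def in_units_of_tau_f(time_mcs, tau_f_mcs):
    """Express a time given in MCS as a multiple of the folding time."""
    return time_mcs / tau_f_mcs


# Hypothetical first passage times in MCS, for illustration only.
sample_fpts = [1.0e7, 1.5e7, 2.0e7, 3.5e7]
print(unreacted_fraction(sample_fpts, 1.8e7))
print(mean_first_passage_time(sample_fpts))

# Values quoted in the text: 2 x 10^7 MCS with tau_F = 8 x 10^6 MCS.
print(in_units_of_tau_f(2.0e7, 8.0e6))
```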
By following the dynamics of the individual trajectories, we find that there are three routes to the $R_2$ state (Fig. 2a). In pathway I (Fig. 2a), through which the maximum flow to $R_2$ occurs at $T = T_H$, the monomers initially fold to their native states on a time scale that is approximately $\tau_F$. Subsequent assembly, involving the major conformational change, requires near-global unfolding of both chains (see below). For this pathway, unfolding of the chains and subsequent aggregation is the rate-limiting step (Fig. 2a). In pathway II (Fig. 2a), one of the chains folds rapidly to the native conformation, whereas the other remains unfolded. The formation of $R_2$ in this pathway requires global unfolding of only one of the chains, which is followed by assembly. Consequently, the mean time for dimerization in pathway II is less than that in pathway I. In pathway III, both chains interconvert among the ensemble of unfolded conformations until the dimer is formed. In this pathway (Fig. 2a) conformational changes and assembly occur nearly simultaneously. From the distribution of first passage times we find that the shortest time to reach the dimer is in pathway III, in which $R_2$ is reached directly from the ensemble of denatured states of chains A and B.
The partitioning of the pool of initial population of molecules into three distinct routes with drastically different oligomerization times is reminiscent of the kinetic partitioning mechanism (KPM) for monomeric folding of lysozyme (Guo and Thirumalai 1995; Kiefhaber 1995; Matagne et al. 1997). A generalization of KPM to describe amyloidogenesis kinetics has been proposed recently (Massi and Straub 2001). The present computations lend support to the notion that oligomerization kinetics involves parallel routes to aggregation.
Templated polymerization
A number of years ago Griffith (1967) proposed that an autocatalytic polymerization of proteins can occur provided "suitable components are available," that is, growth of the oligomers by propagation is possible if a template already exists. In the templated assembly (TA) model the propagating conformation of the monomer is presumed to be different from the native state (Griffith 1967). The propagating state R in the HCPC model is the sixth "excited state" of the chain (Fig. 1b). There are 43 other conformations with the same energy as R. Upon forming the template, recruitment of additional monomers (TA) becomes possible, whereas spontaneous aggregation of many monomers is less likely. The propagation of the $R_2$ dimer has already been addressed by HCPC. These authors showed that it is kinetically easier to add a third monomer to a preformed dimer (Harrison et al. 2001). The reaction $N + R_2 \rightarrow R_3$ occurs much more rapidly than spontaneous formation of $R_3$ starting from three monomers in their native states. Thus, the presence of the template autocatalytically enables propagation of the oligomers. Assuming that the template exists, the propagation of $R_2$ for this model protein seems possible.
We consider two models for TA. In the first, referred to as directed TA (DTA), the polypeptide chain is only allowed to move to the right of the template, which breaks the translational symmetry. This might physically correspond to assembly when a minimal template unit (say a dimer) is immobilized on a glass plate. This appears to be the situation considered by HCPC. The other case corresponds to assembly in solution, namely, TA. This is more akin to seeded nucleated polymerization (Harper and Lansbury 1997) provided that the dimer is the "nucleus" (see below).
To explore TA for the model protein we undertook a series of simulations (at $T = T_H$) to study the reaction
$R_{n-1} + U \rightarrow R_n$ (2)
In all cases we find that the addition of the nth monomer to an already existing template of $R_{n-1}$ occurs by two major pathways (Fig. 2b). Addition occurs most rapidly when the monomer in the U state is added to the template. In this situation, growth of the oligomer and conformational change occur nearly simultaneously. For this sequence, in the DTA, the amplitude of this kinetically efficient pathway (II) is smaller than that of the major pathway (I) (Fig. 2b). In the latter case, the unfolded monomer initially folds. Subsequent growth and TA can occur only upon global unfolding of the chain, which explains the slower kinetics along this pathway (see below). In the DTA simulations, we restricted the motion of the chains by allowing each chain to move only in the half space where it is placed initially and by forcing the free chain to move only to the right side of the template. The time scales in the DTA for $R_2 + U \rightarrow R_3$ are comparable to those of Harrison et al. (2001). The restriction placed on the motion of the chains breaks the symmetry of the space and, in essence, produces an entropy barrier for assembly.
To probe TA directly, we allowed the chains to move freely in space. This is appropriate for examining propagation of oligomers in solution. For the dimer formation, we ran 122 trajectories at $T = T_H$ until $R_2$ is first reached. The average folding time is $\approx 3\tau_F$ (compared to $\approx 10\tau_F$ in the DTA). The three pathways found above (Fig. 2a) are present here also, but with two important changes: (1) the dominant route is $U_A + U_B \rightarrow$ (chain A denatured) $+\ \tilde{N}_B \rightarrow R_2$, which is pathway II in Figure 2a; and (2) the average conversion time of the $U_A + U_B \rightarrow \tilde{N}_A + \tilde{N}_B \rightarrow R_2$ pathway is shorter than in the DTA simulations. For the $R_2 + U \rightarrow R_3$ reaction we ran 100 trajectories at $T = T_H$ until $R_3$ is reached. The free chain is restricted in its motion only by the constraint between its center of mass and the center of mass of the dimer (Materials and Methods). The average assembly time is $\approx 10\tau_F$ (compared to $100\tau_F$ in DTA). The two pathways found before (Fig. 2b) are still present with only minor changes in the amplitudes. However, there is a drastic change in the average assembly time for $R_2 + U \rightarrow R_2 + N \rightarrow R_3$. It goes from $\approx 100\tau_F$ (DTA) to $\approx 10\tau_F$ (TA).
From the DTA and TA simulations it is clear that the conformational change $N \rightarrow R$ can occur efficiently only if a template is already present. However, the present simulations cannot distinguish between TA and the nucleated-polymerization (NP) model (Harper and Lansbury 1997). According to the NP model, the formation of the critical nucleus is the rate-determining step. This is usually discerned by the observation of a lag phase in polymer growth. Such a lag phase is not observed in the TA. Because in this model the nucleus is a dimer, TA and NP appear to be superficially similar. To distinguish between the two, simulations at a fixed seed concentration (a dimer in this case) but varying monomer concentration should be carried out. The growth rate would be independent of C in the TA model but would increase as a power of C (with an unknown exponent) in the NP model. Although not conclusive, the near constancy of the conversion time for $R_{n-1} + U \rightarrow R_n$ ($n \le 5$) seems to suggest that self-propagation in the HCPC model is consistent with TA.
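The concentration test proposed above amounts to estimating an exponent from growth rates measured at several monomer concentrations: an exponent near zero would point to TA, and a clearly positive one to NP. A minimal sketch of that estimate is a least-squares slope on a log-log scale. The function name and the synthetic data are assumptions for illustration; no such simulations are reported in this work.

```python
import math


def scaling_exponent(concentrations, rates):
    """Least-squares slope of log(rate) versus log(concentration)."""
    if len(concentrations) != len(rates) or len(rates) < 2:
        raise ValueError("need at least two matched (C, rate) pairs")
    xs = [math.log(c) for c in concentrations]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("concentrations must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data (hypothetical): a rate that does not depend on C,
# and a rate that grows as the square of C.
conc = [0.01, 0.02, 0.04, 0.08]
flat_rates = [5.0e-8 for _ in conc]
power_rates = [3.0e-6 * c ** 2 for c in conc]

print(scaling_exponent(conc, flat_rates))
print(scaling_exponent(conc, power_rates))
```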
Minimum size of the propagating oligomer
It is important to determine the minimal number of chains in the template that leads to fast assembly, that is, the size of the "nucleus." To determine this, we followed 150 trajectories for the reaction $R + N \rightarrow R_2$ (that is, one chain is fixed in the R conformation while the other one is free to move around it starting from its native state conformation) at $T = T_H$. The average folding time, obtained from the one-exponential fit of the fraction $P_u(t)$ of molecules that have not dimerized, is $3.2 \times 10^{7}$ MCS, which is almost identical to the conversion time obtained by HCPC for the reaction $N + N \rightarrow R_2$. By comparison, the average conversion time for $R_2 + N \rightarrow R_3$ is one order of magnitude smaller than that for $N + N + N \rightarrow R_3$ (Harrison et al. 2001). Therefore, for the HCPC model sequence we infer that the minimal number of chains in the template that promotes fast TA is 2.
Transient unfolding precedes assembly
For the model proteins the propagatable species, namely the R conformation (Fig. 1c), is a high energy state. For the monomer the ratio $f_R/f_N = e^{-\Delta E/k_B T} = 0.007$ at $T = T_H$, where $\Delta E = E_R - E_N$, and $f_R$ and $f_N$ are the occupation fractions of the R and N states, respectively. This shows that the R state is not significantly populated at the simulation temperature. Nevertheless, the R state has to be populated at least transiently for polymerization to occur. To monitor the dynamics of formation of the R state, we computed for each trajectory
[Equation (3)]
The correct interface is formed only when $N_{nn}^{\mathrm{int}} = 0$ and $N_{n}^{\mathrm{int}} = 8$. The time evolution of these two quantities along a trajectory from each of the three pathways shown in Figure 2a can be found in Figures 3, 4, and 5. These graphs show that in the kinetically efficient pathway III (Fig. 5) there is always a certain number of native interface contacts present. This is not the case in pathways I and II (Figs. 3, 4), where native interface contacts are formed, then are broken and reformed until the $R_2$ interface is finally reached. In pathway I there are instances (even late in the trajectory) where there are zero native interface contacts and a substantial value of $N_{nn}^{\mathrm{int}}$. To form $R_2$, structural rearrangements that lead to $N_{nn}^{\mathrm{int}} = 0$ and $N_{n}^{\mathrm{int}} = 8$ are required. This is possible only in the presence of substantial chain unfolding.
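The population ratio of 0.007 quoted earlier in this section can be inverted to give the energy gap between the R and N conformations. The sketch below assumes that the ratio is a pure Boltzmann factor between two single conformations and that temperature is expressed in reduced units with the Boltzmann constant set to 1; the function names are hypothetical.

```python
import math


def boltzmann_ratio(energy_gap, temperature):
    """Population ratio f_R / f_N = exp(-dE / T), with k_B = 1 (assumption)."""
    return math.exp(-energy_gap / temperature)


def energy_gap_from_ratio(ratio, temperature):
    """Invert f_R / f_N = exp(-dE / T) to obtain dE = E_R - E_N."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return -temperature * math.log(ratio)


T_H = 1.41  # simulation temperature quoted in the text
gap = energy_gap_from_ratio(0.007, T_H)
print(f"E_R - E_N is roughly {gap:.2f} in reduced energy units")

# Round trip: the gap reproduces the quoted ratio at T_H.
print(boltzmann_ratio(gap, T_H))
```

Under these assumptions the gap comes out close to 7 reduced energy units, about five times the thermal energy at $T_H$, which is why R is visited only transiently by an isolated monomer.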
The $R_3^{DS}$ conformation was not found in $2.7 \times 10^{8}$ MCS when all chains were allowed to move. The two chains that are initially in the $R_2^{DS}$ conformation evolve rapidly from it, and they fold into their native state (Fig. 1a). The third chain, which starts in the native state, reaches the $R^{DS}$ state, but there is no instance when all three chains are in the $R^{DS}$ state simultaneously. This result is not necessarily surprising in light of the fact that even the formation of the $R_3$ conformation from three free chains takes much longer than through TA.
In the simulations of TA for propagation by the domain-swapped mode, we ran three trajectories for $3.4 \times 10^{8}$ MCS at $T = T_H$, which is nearly an order of magnitude longer than the time needed to form $R_3$ (Fig. 2b). The $R_3^{DS}$ conformation is not found in any of these trajectories. In all trajectories, the free chain passes through the $R^{DS}$ state relatively early in the assembly process and through its native state at a later time. None of the native ($R_3^{DS}$) interface contacts are found. These results suggest that the $R_2^{DS}$ conformation does not propagate for this sequence. It would be interesting to find the combination of sequence and external conditions for which both $R_n$ and $R_n^{DS}$ form.
Lattice models with side chains: Phase diagram
To obtain the phase diagram of a generic two-state folder we use the cubic lattice model with side chains (LMSC) for the sequence given in Figure 6a. Previous studies (Klimov and Thirumalai 1998; Klimov and Thirumalai 2001) have established that this sequence folds by two-state kinetics, and the folding thermodynamics is cooperative. Thus, at $T \lesssim T_\theta \approx T_F$, aggregation prone states are not populated. However, at high values of C, interactions between chains are possible. This may lead to oligomerization of the various species.
At $T \lesssim T_F \approx T_\theta$ and low C, independently folded conformations should be populated. These are the most stable states at infinite dilution. As in CI2, the transition between the two states takes place by nucleation collapse or a condensation mechanism (Fersht 1997; Klimov and Thirumalai 2001). As the concentration is increased beyond a typical overlap value $C^{*}$ (see Materials and Methods), interactions between chains enable aggregation. At $T > T_F$ the oligomers are made of fluctuating unfolded monomers. Because of the interactions, the conformations of these unfolded monomers are different from noninteracting random coils. Such species, which may be thought of as the analog of the molten globule for monomeric proteins, may be precursors to fibril formation in amyloidogenic proteins. At $T < T_F$ we expect an ordered oligomer with many favorable interface contacts that stabilize it with respect to independently folded monomers. As the concentration is increased still further, it is possible to get amorphous or disordered oligomers that do not order for long times, that is, they exhibit "glassy" behavior. The structures of these disordered complexes are expected to be different at high and low temperature. Thus, based on physical considerations alone, we expect a rather rich phase diagram in the (T,C) plane even for a protein that, at infinite dilution, folds by two-state kinetics.
We obtained the phase diagram numerically by computing the lowest free-energy structures for a two-chain system (see Materials and Methods). The results, along with the theoretical predictions, are displayed in Figure 7. The six "phases" predicted above are also found in our simulations, thus confirming the richness of the predicted phase behavior of interacting two-state folders. However, numerical simulations show unexpected structures in the ordered phase of oligomers (region IV of Fig. 7). In the "ordered oligomers" area of the phase diagram, we also found an analog of a domain-swapped (DS) dimer (Fig. 6b) that was not anticipated based on the theoretical considerations given above. An appropriate "order parameter" that distinguishes DS from the ordered dimer (OD; Fig. 6a) is the fraction of native interchain contacts, $\chi_{int}$ (eq. 10). The value of $\chi_{int}$ for the DS dimer is 0.7, signifying that its interface is very different from that of the OD, for which $\chi_{int} = 0$. In the DS (Fig. 6b) there are 14 side-chain to side-chain interface contacts, out of which 12 (86%) are among the 20 side-chain to side-chain contacts that are present in the native state of the monomer. As expected (Bennett et al. 1995), the energy of the DS dimer is higher than that of the OD.
TF, where large conformational fluctuations are possible. Because fluctuations of the native state resulting in partially folded or globally unfolded states facilitate aggregation, it is logical to suggest that in some proteins domain swapping might be a mechanism for fibrillization. In the examples studied by Eisenberg et al. (Bennett et al. 1995), domain swapping involves the replacement of a portion of the tertiary structure of a protein by an identical piece from a second chain. This occurs if the native protein structure possesses a part at the beginning/end of the sequence that is stabilized only by local contacts. The native-state conformation of sequence A does not have this characteristic, and therefore, in the LMSC model domain swapping occurs only when all interchain contacts are identical to the native intrachain contacts. In our model, the majority of these contacts are formed between beads located in the middle of the chain. This special structure leads to a "dead-end" DS dimer in the LMSC.
The contiguous presence of a series of hydrophobic side chains in the model sequence (see caption to Fig. 6) also presents the possibility of forming dimers in which the interacting residues between the two chains are parallel to each other. The resulting parallel dimer (PD) (Fig. 6c), whose ground-state energy is slightly lower than that of the DS structure, shows an interface that is stabilized by strong contacts between the hydrophobic moieties. PD can form in naturally occurring sequences such as Aβ peptides that are unstructured as monomers. They form parallel and antiparallel β-sheets that are stabilized by interchain contacts between the hydrophobic patch (LVFFA) as well as complementary electrostatic interactions (Balbach et al. 2000). The PD we observe may be thought of as an analog of the dimer of Aβ peptides. The formation of a stable PD requires sequences in which there are many contiguous hydrophobic residues.
If topologically forbidden contacts (see Materials and Methods) are allowed to form, then additional low-energy conformations such as the variant dimer VD (Fig. 6d), which has an energy of -33.4, can form. The monomers in the VD state are compact and highly ordered. For reasons explained in Materials and Methods, we believe that the VD conformation is an artifact of lattice models.
Dimerization kinetics: Dependence on T and C
The aggregation kinetics in the nonpropagating LMSC model protein should be qualitatively different from the two-dimensional prion-like behavior found in the HCPC model. In the HCPC model the formation of the R2 state either from U or N requires conformational change to the R state in each chain prior to assembly. In contrast, the formation of OD in the LMSC model requires that both chains fold to the native conformation with precise relative orientation so that the native interface may be created. Because large conformational changes do not occur at the (T,C) values that we have examined, it is likely that higher order oligomers do not form for this sequence.
The most popular hypothesis for protein aggregation (London et al. 1974; Speed et al. 1995; Fink 1998) is that, in the folding process or due to fluctuations, partially folded intermediates are populated. These structures, which typically have some exposed hydrophobic residues, can aggregate. The morphology of such aggregates could be either ordered (amyloid fibrils) or amorphous (inclusion bodies). The hallmark of this hypothesis is that the formation of an aggregation-prone intermediate is a prerequisite for polymerization. Experiments on a number of proteins under a wide range of conditions support this model (Fink 1998).
Recently, Silow et al. (1999) have shown that aggregates, presumably ordered ones, can form in CI2. It is well known that this small single-domain protein folds by two-state kinetics when $C \rightarrow 0$. Silow et al. (1999) asserted that CI2 aggregation takes place directly from U because there are no intermediates that form under a wide range of conditions. If oligomerization takes place directly from U, this should be reflected in the dynamic cooperativity of the aggregation kinetics. To provide a picture of the assembly process, we simulated the dimerization kinetics for the three-dimensional sequence which, like CI2, folds kinetically in a single step with $\tau_F = 2 \times 10^{6}$ MCS at $T = T_F$ (0.26) (Klimov and Thirumalai 1998).
Because very long times are required to probe aggregation kinetics in LMSC we have not performed simulations as exhaustive as those for the HCPC model. Typically, we generated between 10 and 50 trajectories for a time period ranging from 50 to 500 $\tau_F$, from which meaningful conclusions can be drawn. Although we have explored the dimerization kinetics for a range of (T,C) values, here we focus on conditions under which the ordered dimer (OD) forms (Fig. 7). The OD (Fig. 6a), which has an energy of -33.4, consists of the two chains in their native conformation (each with energy -14.5 and 20 side chain [sc]-side chain contacts) with a well-defined interface made of eight sc-sc contacts. To monitor the formation of the OD, we followed the time evolution of $\chi_s$ and $\chi_{int}$ (eqs. 8 and 10 in Materials and Methods). The transition to OD occurs when both $\chi_s$ and $\chi_{int} \approx 0$ for a particular trajectory. Among the 40 trajectories with a maximum duration of $4 \times 10^{8}$ MCS (that is, $200\tau_F$), only 65% reach the OD. In the remaining trajectories either one or both chains fold. There are also trajectories when neither chain folds. Thus, the pathways to the OD are similar to those in Figure 2a except that the time scales in the LMSC are extremely long. Among the remaining trajectories, 15% reach the DS dimer and 20% reach the PD conformation. A high degree of order is present in the PD conformation where each chain has 18 sc-sc intrachain contacts (out of which 15 are native). A characteristic of all these three types of conformations (OD, DS, and PD) is that each chain has a total (intra- and interchain) of 28 or 29 sc-sc contacts.
Although the yield of OD during the above-mentioned duration is only 65%, the analysis of the trajectories that reach the OD reveals that there are at least two pathways in the assembly of the oligomer: (1) the two chains fold simultaneously, and the OD is found immediately afterwards (that is, the folded chains have proper relative orientation leading to the formation of the native interface); (2) there is a considerable difference between the times when each of the two chains first reaches its native state and between the time the slowest chain folds and the formation of the OD. The first pathway is the minor one (35%), with an average assembly time of $\approx 34\tau_F$, while the second pathway corresponds to an average conversion time of $\approx 100\tau_F$.
The time evolution of $\chi_s$ and $\chi_{int}$ at T = 0.24 (i.e., $< T_F$) (Fig. 8a,b) shows that there is dynamic cooperativity in the formation of OD. At $t \approx 3 \times 10^{7}$ MCS both $\chi_s$ and $\chi_{int} \approx 0$, indicating the formation of the OD. Thus, the formation of the folded monomer and that of the proper native interface occur simultaneously. Although there are fluctuations in the OD after its formation, they are relatively small. The OD is stable under these conditions of (T,C). Our computations support the finding by Silow et al. (1999) that aggregation can proceed directly from U without the need to populate partially folded intermediates.
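One simple way to quantify the dynamic cooperativity described above is to record when each of the two order parameters (the side-chain one and the interface one) first falls close to zero along a trajectory and to compare the two times. The sketch below is a hedged illustration: the threshold of 0.05, the function names, and the short synthetic time series are assumptions, not quantities from the simulations.

```python
def first_time_below(series, threshold):
    """Index of the first sample at or below threshold, or None if never reached."""
    for i, value in enumerate(series):
        if value <= threshold:
            return i
    return None


def ordering_lag(overlap_s, overlap_int, threshold=0.05):
    """Samples separating the first ordering of the two parameters.

    Returns None if either parameter never drops to the threshold.
    A lag near zero indicates that folding and interface formation coincide.
    """
    t_s = first_time_below(overlap_s, threshold)
    t_int = first_time_below(overlap_int, threshold)
    if t_s is None or t_int is None:
        return None
    return abs(t_s - t_int)


# Synthetic traces (hypothetical), sampled at equal intervals.
cooperative_s = [0.90, 0.80, 0.75, 0.02, 0.01]
cooperative_int = [1.00, 0.95, 0.90, 0.03, 0.00]
sequential_int = [1.00, 1.00, 1.00, 0.90, 0.04]

print(ordering_lag(cooperative_s, cooperative_int))
print(ordering_lag(cooperative_s, sequential_int))
```

In this toy example the first pair of traces orders in the same sample, while in the second pair the interface lags the monomer by one sample; applied to real trajectories, a distribution of such lags would separate simultaneous assembly from fold-then-dock events.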
| Discussion |
|---|
In the oligomerization process, especially in the three-dimensional model with side chains, we found several distinct structures for the dimer. The three basins of attraction correspond to OD, DS, and PD. The simulations show that interbasin transitions do not occur on the long time scale of the simulations, suggesting that these structures are separated by fairly substantial barriers. We also found that kinetic partitioning into these structures occurs very early in the assembly process, and depends on the initial conditions. Although a direct comparison to prion strains using these toy models is a stretch, finding that distinct conformers assemble on distinct time scales with vastly different structures is not inconsistent with the explanation of scrapie strains. A similar suggestion has been made before using lattice models (Harrison et al. 1999). Bessen et al. (1995) have suggested, based on biochemical studies of hyper and drowsy strains in mink, that different strains reflect distinct ways in which the altered conformation of normal PrPC packs. If the time for nucleating a particular structure is shorter than for transitions between various conformers, then the growth of the polymer will depend on the structure of the crystal seed. In our model systems the OD, despite having the lowest free energy, will not be formed if conditions favor the formation of an alternative packing (PD or DS), that is, if the initial partitioning leads to DS or PD.
Generic aspects of protein aggregation
One of the most significant findings in this work is the richness of the phase diagram even in simple toy models. For both models there are a number of structures that form depending on the (T,C) values. The simulations also show that aggregability critically depends on the sequence. For the HCPC sequence the most likely structure of the polymer, $R_n$, resembles β-sheet architecture. Although $R_n^{DS}$ is, in principle, possible, it appears to be kinetically and thermodynamically unstable for this sequence. A bewildering number of "phases" are possible in the lattice model with side chains. There are six phases in all (Fig. 7). More remarkably, in region IV (ordered oligomers in Fig. 7) there are additional structures that emerge. The amazing array of structures can be experimentally probed in the (D,C) plane where D is the concentration of denaturants.
The present study and the one by Harrison et al. (2001) show that the phase diagram depends on the precise sequence. A single-point mutation in the HCPC sequence can result in nonpropagating oligomers (Harrison et al. 2001). This remarkable observation can be rationalized in terms of the stability of the monomeric native state or the topology of N. Similarly, the emergence of PD in the LMSC, which is reminiscent of the Aβ dimer, is related to the presence of a large number of contiguous hydrophobic residues. For example, we find that for another sequence (HWFIGYQRWFRKEWM) in the LMSC model the oligomeric structures are different. In particular, this sequence, which has the same native state as in Figure 6a, undergoes a conformational change upon dimerization that is similar to that seen in HCPC and amyloid forming peptides (R.I. Dima and D. Thirumalai, unpubl.).
Aggregation mechanisms
The two models provide contrasting mechanisms for oligomerization kinetics. In the HCPC model, dimerization and subsequent growth occur only when a high-energy intermediate is transiently populated. For this case, the growth of the polymer is facilitated by the presence of a template made of the minimal dimer propagating unit. Although a complete test of nucleation polymerization including the presence of lag time has not been demonstrated, our calculations show that growth of the polymer occurs if a dimer "seed" is already present. This is in conformity with the current model for fibril formation in amyloids and prions (Harper and Lansbury 1997). Our computations also support the notion that aggregation occurs from intermediates.
On the other hand, the LMSC model sequence, which is a two-state folder, dimerizes directly from U. The formation of OD is dynamically cooperative. This finding supports the experiments on U1A and CI2 (Silow et al. 1999) in which ordered aggregates form from U rather than from partially structured intermediates. Our calculations and a growing body of experimental work show that many scenarios for protein aggregation are possible. Besides the patterns of hydrophobic-hydrophilic residues in the sequence (West et al. 1999), external conditions (pH, protein concentration, denaturants, presence of a seed, and temperature) can alter the aggregation mechanism.
Folding and aggregation: Evolutionary implications
It is clear that oligomers (ordered or amorphous) can form, depending on the (T,C) values. Because aggregation can take place directly from U, just as seen in CI2 (Silow et al. 1999), the relationship between aggregability and folding depends not only on the sequence, but also on the external conditions. Designing sequences that avoid aggregation over a broad range of external conditions might require optimizing the folding rates such that they are considerably larger than the equivalent pseudo first-order rates for oligomerization. We also find that fluctuations in the native conformation can result in aggregation and, therefore, a designed sequence must also satisfy certain stability requirements. Thus, natural sequences may have evolved to simultaneously optimize both folding rates and stability for a range of (T,C) values. The two requirements might not be simultaneously satisfied for certain folds, and therefore, the evolved sequences could be a compromise. These arguments emphasize the need for considering aggregation in conjunction with folding (Smith and Hall 2001) to decipher the mechanisms by which a monomeric protein reaches the native state.
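The competition between folding and oligomerization described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that an unfolded chain either folds with a first-order rate `k_fold` or is captured by association with a pseudo first-order rate `k_assoc * concentration`, and that both channels are irreversible. The function name, the parameter names, and the two-channel scheme are assumptions and are not taken from the simulations reported here.

```python
def folded_fraction(k_fold, k_assoc, concentration):
    """Fraction of unfolded chains that fold rather than oligomerize.

    Assumed scheme (illustrative only): two parallel, irreversible
    channels out of U. Folding has rate k_fold; oligomerization has the
    pseudo first-order rate k_assoc * concentration.
    """
    if k_fold < 0 or k_assoc < 0 or concentration < 0:
        raise ValueError("rates and concentration must be non-negative")
    k_olig = k_assoc * concentration  # pseudo first-order rate
    total = k_fold + k_olig
    if total == 0:
        raise ValueError("at least one channel must have a nonzero rate")
    return k_fold / total


if __name__ == "__main__":
    # Hypothetical numbers: raising the concentration shifts the balance
    # toward oligomerization at a fixed folding rate.
    for c in (0.1, 1.0, 10.0):
        print(c, folded_fraction(k_fold=10.0, k_assoc=1.0, concentration=c))
```

In this simplified picture the folded fraction approaches one only when the folding rate greatly exceeds the pseudo first-order oligomerization rate over the whole concentration range of interest.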
| Materials and methods |
|---|
(4)

where the length appearing in equation 4 is the lattice spacing. The contact energies, expressed in units of a positive energy scale, are $e_{H,H} = -4$, $e_{H,P} = -2$, $e_{H,A} = -1$, $e_{H,B} = -1$, $e_{P,P} = -3$, $e_{P,A} = -2$, $e_{P,B} = -2$, $e_{A,A} = 0$, $e_{A,B} = -5$, and $e_{B,B} = 0$. Following Harrison et al., we chose the sequence in Figure 1.
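As an illustration of how the contact energies listed above might be used, the following minimal sketch sums them over the contacts of a single chain on a square lattice. The contact rule (non-bonded beads on adjacent lattice sites), the function name `chain_energy`, and the example sequence `HPPH` are assumptions made for this sketch; the actual sequence is the one shown in Figure 1, and intermolecular contacts are not treated here.

```python
from itertools import combinations

# Contact energies from the text, in units of a positive energy scale.
_PAIR_ENERGIES = {
    ("H", "H"): -4, ("H", "P"): -2, ("H", "A"): -1, ("H", "B"): -1,
    ("P", "P"): -3, ("P", "A"): -2, ("P", "B"): -2,
    ("A", "A"): 0, ("A", "B"): -5, ("B", "B"): 0,
}

# Store both orderings so lookups do not depend on residue order.
CONTACT = {}
for (first, second), energy in _PAIR_ENERGIES.items():
    CONTACT[(first, second)] = energy
    CONTACT[(second, first)] = energy


def chain_energy(sequence, coords):
    """Total contact energy of one chain on a 2D square lattice.

    Assumed rule: two beads are in contact when they occupy adjacent
    lattice sites and are not consecutive along the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have equal length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    total = 0
    for i, j in combinations(range(len(coords)), 2):
        if j - i < 2:
            continue  # skip covalently bonded neighbors
        (xi, yi), (xj, yj) = coords[i], coords[j]
        if abs(xi - xj) + abs(yi - yj) == 1:
            total += CONTACT[(sequence[i], sequence[j])]
    return total


if __name__ == "__main__":
    # Hypothetical four-bead chain folded around a unit square.
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(chain_energy("HPPH", square))
```

For the hypothetical square conformation only the first and last beads are in contact, so the energy is a single pair term.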
Three-dimensional lattice models with side chains (LMSC)
To compute the generic phase diagram and oligomerization kinetics of two-state proteins, we consider a more realistic representation of the polypeptide chain. In the LMSC model each amino acid residue is represented as two beads: one corresponding to the α-carbon, and the other to the center of mass of the side chain (Bromberg and Dill 1994; Klimov and Thirumalai 1998). The α-carbon beads, representing the backbone, are connected, and each side chain bead is connected by a covalent linkage to one backbone bead. The energy of a conformation is given by

(5)

where the pairwise parameters indexed by ij are taken from Table III of Kolinski, Godzik, and Skolnick (KGS) (Kolinski et al. 1993). For this three-dimensional model, enumeration of all the conformations even for N = 15 is not possible. The thermodynamic properties for the sequence, labeled A in Klimov and Thirumalai (1998), are calculated using multiple histogram methods.
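The two-bead representation described above can be sketched as a simple data check. The sketch assumes a cubic lattice with unit spacing, requires every bead to occupy its own site, and lists side chain pairs on adjacent sites as candidate contacts. The function names and the choice to list all side chain pairs, including those on neighboring residues, are assumptions; the interaction parameters themselves are not reproduced here.

```python
def is_adjacent(p, q):
    """True when two lattice sites are exactly one lattice spacing apart."""
    return sum(abs(a - b) for a, b in zip(p, q)) == 1


def is_valid_conformation(backbone, side_chains):
    """Check an assumed set of LMSC-style geometric constraints.

    backbone[i] and side_chains[i] are integer (x, y, z) tuples for
    residue i. The conformation is accepted when all beads occupy
    distinct sites, consecutive backbone beads are adjacent, and each
    side chain bead is adjacent to its own backbone bead.
    """
    if len(backbone) != len(side_chains):
        return False
    sites = list(backbone) + list(side_chains)
    if len(set(sites)) != len(sites):
        return False
    for i in range(len(backbone) - 1):
        if not is_adjacent(backbone[i], backbone[i + 1]):
            return False
    return all(is_adjacent(b, s) for b, s in zip(backbone, side_chains))


def side_chain_contacts(side_chains):
    """List index pairs (i, j), i < j, of side chains on adjacent sites."""
    n = len(side_chains)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if is_adjacent(side_chains[i], side_chains[j])
    ]


if __name__ == "__main__":
    # Hypothetical three-residue fragment with side chains on one face.
    backbone = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
    side_chains = [(0, 1, 0), (1, 1, 0), (2, 1, 0)]
    print(is_valid_conformation(backbone, side_chains))
    print(side_chain_contacts(side_chains))
```

A contact list of this kind is what a pair-energy table would be summed over; which pairs are counted in the actual model is set by equation 5.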
Mimicking the effects of concentration of the polypeptide chain
Protein aggregation occurs only when two polypeptide chains interact. To obtain accurate phase diagrams and the kinetics of association, multichain simulations are needed. Because such simulations are computationally intensive, we devise a simple protocol to ensure that two chains interact. To describe our method it is useful to recall that whenever the protein concentration C exceeds the overlap concentration $C^* \approx N/R_g^3$ ($R_g$ is the radius of gyration of the polypeptide chain), there is protein-protein interaction. This estimate for $C^*$ assumes that the interaction between proteins is short ranged. If electrostatic interactions play a role, then $R_g$ should be replaced by $R_g^{app}$, where $R_g^{app}$ ($> R_g$) is the range over which the electrostatic potential is significant. For $C < C^*$, the solution is dilute enough that one can only observe monomeric folding. For a given value of C the mean distance between the centers of mass of two chains is $R^*_{cm} \sim C^{-1/3}$. Thus, the effect of concentration can be approximately mimicked by considering the restricted partition function

(6)
Rg chain association is possible. This way of incorporating the effect of C in the simulations allows us to obtain the qualitative features of the phase diagram in the (T,C) plane. In the simulations, the δ-function constraint is replaced by
|
| ((7)) |
Notice that according to equation 7
the fluctuations in R are of the order
, which need not be an integer. In practice, to probe dimer formation in the 2D case, we let Rcm = 3.50, which is large enough to allow each chain to fold independently to its monomeric native state. This value also allows for interaction between chains so that dimerization can occur. Typically,
= 1.0. In the case of the TA calculations, Rcm is the distance between the centers of mass of the template and of the free monomer. A direct way of including concentration effects in simulations is to have many chains in a box, as has been done by Gupta et al. (1998) in their two-dimensional HP lattice model (Dill et al. 1995). For a similar model Istrail et al. (1999) probed aggregation by allowing two independently folding chains to collide. By measuring the energies in the kinetics simulations for aggregation, they computed the probability of dimer formation. In the simulations of HCPC (Harrison et al. 2001), the chains are forced to interact by recentering them periodically. This is achieved by applying a center of mass translation every $10^6$ MCS. Because of the forced interactions, molecular crowding effects are mimicked. Our method of including concentration effects may be viewed as a mean-field limit of simulation in a periodic box. The effect of other chains at finite concentration is to force any two chains to move within $R_{cm} \sim C^{-1/3}$.
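The scaling estimates and the softened distance constraint described in this section can be sketched as follows. The overlap concentration and mean separation follow the relations quoted in the text, with prefactors of one assumed. The Gaussian weight around the target center-of-mass distance is only one possible way to soften a δ-function constraint; its functional form, the `width` parameter, and all function names are assumptions and may differ from equation 7.

```python
import math
import random


def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid."""
    n = len(coords)
    dim = len(coords[0])
    center = [sum(c[k] for c in coords) / n for k in range(dim)]
    mean_sq = sum(
        sum((c[k] - center[k]) ** 2 for k in range(dim)) for c in coords
    ) / n
    return math.sqrt(mean_sq)


def overlap_concentration(n_residues, rg):
    """Scaling estimate C* ~ N / Rg^3 (prefactor of one assumed)."""
    return n_residues / rg ** 3


def mean_separation(concentration):
    """Scaling estimate R*cm ~ C^(-1/3) (prefactor of one assumed)."""
    return concentration ** (-1.0 / 3.0)


def log_weight(r, r_cm, width=1.0):
    """Log of an assumed Gaussian weight centered on the target distance."""
    return -0.5 * ((r - r_cm) / width) ** 2


def accept_move(delta_energy, r_old, r_new, r_cm, temperature,
                width=1.0, rng=random):
    """Metropolis test with the Boltzmann factor times the assumed weight."""
    log_ratio = (-delta_energy / temperature
                 + log_weight(r_new, r_cm, width)
                 - log_weight(r_old, r_cm, width))
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)


if __name__ == "__main__":
    # Hypothetical 15-residue chain with Rg = 2 lattice units.
    c_star = overlap_concentration(15, 2.0)
    print(c_star, mean_separation(c_star))
    # A move that brings the chains back to the target distance of 3.50.
    print(accept_move(0.0, r_old=5.0, r_new=3.5, r_cm=3.5, temperature=1.0))
```

Under this assumed weight, moves that bring the center-of-mass distance closer to the target are favored, while the distance can still fluctuate by roughly `width` lattice units.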
Probes of aggregation kinetics
To characterize the evolution of the ensemble of chains towards its ground state conformation, we defined two order parameters analogous to the o