Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
Reprint requests to: Dr. Yaoqi Zhou, Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA; e-mail: yqzhou{at}buffalo.edu; fax: (716) 829-2344.
(RECEIVED February 19, 2002; FINAL REVISION April 10, 2002; ACCEPTED April 10, 2002)
1 These authors contributed equally to this work.
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0205002.
| Abstract |
|---|
α-Helices, β-strands, and loops are the basic building blocks of protein structure. The folding kinetics of α-helices and β-strands have been investigated extensively. However, little is known about the formation of loops. Experimental studies show that for some proteins, the formation of a single loop is the rate-determining step for folding, whereas for others, a loop (or turn) can misfold to serve as the hinge loop region for domain-swapped species. Computer simulations of an all-atom model of fragment B of Staphylococcal protein A found that the formation of a single loop initiates the dominant folding pathway. On the other hand, the stability analysis of intermediates suggests that the same loop is a likely candidate to serve as a hinge loop for domain swapping. To interpret the simulation result, we developed a simple structural parameter: the loop contact distance (LCD), or the sequence distance of contacting residues between a loop and the rest of the protein. The parameter is applied to a number of other proteins, including SH3 domains and prion protein. The results suggest that a locally interacting loop (low LCD) can either promote folding or serve as the hinge region for domain swapping. Thus, there is an intimate connection between folding and domain swapping, a possible cause of misfolding and aggregation.
Keywords: Total contact distance; loop contact distance; protein folding; domain-swapping; loop formation; fragment B of protein A
| Introduction |
|---|
All-atom folding simulations using empirical force fields are not yet feasible with available computing power. We overcome this limitation by using a structure-based, all-atom (except nonpolar hydrogen atoms) model in which the atoms interact by discontinuous Gō potentials (Zhou and Linhananta 2002a, b). The goal is to determine the folding kinetics from the native structure. Thermodynamic analysis of the model (Zhou and Linhananta 2002a) found that the inclusion of side chains eliminates the molten-globule-like state often encountered in the Cα-based model (Zhou and Karplus 1997; Pande and Rokhsar 1998). The same all-atom model has also yielded a collapse-initiated folding mechanism for the second β-hairpin fragment of the Ig-binding domain B of Streptococcal protein G and revealed the essential role of both hydrophobic and hydrophilic residues. The results are highly consistent with available experimental data, as well as with other all-atom unfolding and equilibrium simulation studies of the same β-hairpin (Zhou and Linhananta 2002b).
The new model allows us to fold the 46-residue, 459-atom BpA (fragment B of Staphylococcal protein A). At a reduced temperature (T*) of 2.5, 80 out of 197 (41%) independent trajectories folded to the native state in ~16 µsec or ~50 h each on a 1-GHz Pentium PC. Folding kinetic results suggest two folding pathways mediated by loop formation. In the dominant, fast-folding pathway, L2 forms first, resulting in an H2-H3 intermediate that rapidly folds to the native state. In the slower pathway, L1 initiates the formation of an H1-H2 intermediate, leaving L2 unformed. The long lifetime of H1-H2 suggests that it is a possible template for a domain-swapped dimer in which L2 is the hinge loop. These remarkable properties are attributed to the fact that L2 interacts weakly with the rest of the protein. To generalize, we quantify the topological connectivity between a loop and the rest of the protein by a parameter called loop contact distance (LCD). Applications of LCD reveal that the different roles of a weakly interacting loop (low LCD) in the folding of the model BpA are also observed in many other proteins.
| Results and Discussion |
|---|
Although this study suggests a specific pathway that differs from other theoretical works, some important aspects have been previously observed. For example, I12 was also found in the equilibrium free-energy surface analysis based on an all-atom CHARMM model in explicit solvent (Boczko and Brooks 1995; Guo et al. 1997). The faster formation of H3 in our model is consistent with the slower unfolding of H3 in an all-atom unfolding simulation study of BpA in explicit solvent (Alonso and Daggett 2000). In addition, loop formation as a rate-determining step was found in other theoretical studies of BpA as well (Shea et al. 1999; Berriz and Shakhnovich 2001). However, the focus of this paper is not on the accuracy of our results, but rather on the understanding of the high-resolution details obtained from the folding simulations of the model BpA. We will show that our interpretation of the results has important general implications for the folding and misfolding of proteins.
The all-atom folding simulations reveal that L2 forms first most of the time. To interpret this, we define loop contact distance as the total contact distance (TCD) between the loop and the rest of the protein (see Materials and Methods). Similar to the observed correlation between high folding rates and low TCD values, loops with lower LCD values are expected to form earlier. Indeed, we found that the LCD value for L2 of BpA (0.22) is 4.3 times smaller than the value for L1 (0.94). This explains the dominance of the I23 pathway by a factor of 3.7 over the I12 pathway. L1 has many nonlocal long-distance contacts (|i - j| > 3) with H2 and H3, whereas L2 has only two short-distance, nonlocal contacts (Pro 39 with Ser 34 and Leu 35). A loop with a large number of nonlocal contacts (high LCD) must overcome a large entropic barrier, associated with the conformational search to make the contacts, and hence will take longer to form. The early formation of L2 was also observed in a Cα-based model of BpA that has an orientationally dependent native potential (Berriz and Shakhnovich 2001).
To further prove the utility of LCD, one needs to know the rates of the formation of different loops in a protein. Such information is not yet directly available. The role of loop formation in protein folding has been mostly derived from protein engineering experiments (Fersht 1995). In protein engineering experiments, the transition-state ensemble of a protein-folding reaction is characterized by Φ values. For any residue, Φ ≈ 1 means that native contacts involving the residue are mostly formed at the transition state, whereas the opposite is true for Φ ≈ 0.
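As a rough illustration of how a Φ value is commonly obtained in protein engineering analysis, the sketch below uses the standard ratio of the mutation-induced change in activation free energy of folding to the change in equilibrium stability. The function name, parameter names, units (kcal/mol), and default temperature are assumptions made for this example; they are not taken from this study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def phi_value(kf_wildtype: float, kf_mutant: float,
              ddg_equilibrium: float, temperature: float = 298.0) -> float:
    """Return Phi = ddG(transition state - unfolded) / ddG(native - unfolded).

    kf_wildtype, kf_mutant: folding rate constants (same units).
    ddg_equilibrium: destabilization of the native state by the mutation,
        in kcal/mol (positive for a destabilizing mutation; assumed nonzero).
    """
    # Change in the folding activation free energy caused by the mutation.
    ddg_transition = R_KCAL * temperature * math.log(kf_wildtype / kf_mutant)
    return ddg_transition / ddg_equilibrium
```

A mutation that leaves the folding rate unchanged gives Φ = 0 (the contacts of that residue are not yet formed at the transition state), whereas a mutation that slows folding by the full equilibrium destabilization gives Φ = 1.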
To address the question of whether or not a loop with the lowest LCD value is most likely to have the highest Φ value, we compiled a list of proteins with high Φ values (Φ ≈ 1) in the loop regions (Table 1). That is, we only surveyed proteins with a loop that is known to be formed at the transition state. We should emphasize that only a qualitative comparison can be made because a Φ value is not a direct indicator for the rate of folding. The results are mixed.
[...] Φ ≈ 1, indeed, has the lowest LCD value (WW domain, α-spectrin SH3, src-SH3) or a value close to the lowest (Sso7d-SH3 domain). There is no experimental result for all-α proteins. For α/β proteins, however, LCD values are predictive for loops with Φ ≈ 1 with the stipulation that the comparison of LCD values is between the loops connecting β strands only (EE loops of Table 1 [...] value and the EE loop with the lowest LCD value occurred by chance is 0.003.
It is not clear why LCD values are predictive only for the loop between two β strands (and possibly between two α helices for which there is no data). It could be that the transition state is more structured than indicated by Φ-value analysis (Bulaj and Goldenberg 2001; Ozkan et al. 2001). Or, more likely, it suggests the limitation of the LCD parameter. After all, the TCD parameter (see Materials and Methods), from which the LCD parameter is derived, is an empirical parameter that has an approximate correlation with folding rate (Zhou and Zhou 2002). It cannot accurately describe the mutation-induced change in folding rate. Nevertheless, the results verify the concept that for many proteins, β-sheet proteins in particular, the loop with the lowest LCD value plays a key role in folding.
The results presented above also highlight the important role of the detailed native structure in determining folding pathways (Zhou and Linhananta 2002b), even for proteins with identical topology. For example, both proteins G and L are made of a four-stranded β-sheet that packs with an α helix. The difference in their folding pathways was attributed (McCallister et al. 2000) to the difference between the internal stabilities of the two β-hairpin loops. This is consistent with our findings here. A loop with a low LCD value is somewhat isolated from the rest of the protein and thus is likely to be more stable in isolation.
Another interesting observation regarding the folding simulations of BpA is that the structures of I12 and I23 have domain-swapped forms, similar to those found in a simpler Cα-based model (Zhou and Karplus 1999). Domain swapping (Schlunegger et al. 1997) refers to the exchange of a domain of a protein with the same domain of a second identical protein. An interesting question is, what would be the most probable dimeric domain-swapped structure for the model BpA if one exists? Clearly, a longer-lived intermediate would have a higher probability of colliding with another intermediate to form a domain-swapped dimer. Because I12 has a higher barrier for folding (see Fig. 1b), it is more likely to serve as a template for a domain-swapped dimer. Physically, this can be understood as the lack of a strong driving force for the formation of the weakly interacting L2 in the late stage of folding. This picture is consistent with the view that in the early stage of protein folding, an unfolded protein must overcome entropic barriers, whereas in the later stage, the barriers are more energetic (Šali et al. 1994).
The above interpretation suggests that a loop with low LCD may serve as the hinge region for domain swapping. To verify this hypothesis, we performed a literature search for proteins that have both monomeric and dimeric domain-swapped structures. Proteins with dimeric domain-swapped structures but without corresponding structures for same-sequence monomers are not included in the database. For these proteins, we decided not to use monomer structures from homologous proteins or mutants because LCD values may be significantly different even for proteins within the same structural family, as shown in Table 1. For example, the domain-swapping human cystatin C (Janowski et al. 2001) is not included in the database because only the monomer structure of chicken cystatin C is available.
The LCD values of proteins with both monomeric and domain-swapped dimeric structures are shown in Table 2. The LCD values correctly predict the hinge regions of prion, D9k, Eps8-SH3 domain, and Grb2-SH2 domain. For RNase A, the loop with the lowest LCD value corresponds to the hinge region of the major domain-swapped dimer (Liu et al. 2001). For Cyanovirin-N, the hinge region has the second lowest LCD, with a value close to the lowest value. One exception is the hinge region of the minor domain-swapped dimer of RNase A (Liu et al. 1998), which has an intermediate LCD value. This may be due to the fact that one of the hinges becomes helical in the dimeric form. A more serious exception is the protein L V49A mutant, for which the EH loop (the loop between the helix and strand 3) has the lowest LCD value but the second EE loop (the second β-hairpin) is the hinge region. Nevertheless, the analysis shows that in most cases (subject to the limitation of the availability of experimental data), a loop with the lowest LCD value is prone to domain swapping. The probability that the observed correlation between the hinge loop and the loop with the lowest LCD value occurred by chance is 0.0003 (excluding the minor domain-swapped dimer of RNase A).
Hence, a weakly interacting loop can play dual, and opposite, roles. It can promote folding by bringing connecting secondary structure units together to form a transition state. Alternatively, it can participate in misfolding by serving as the hinge region for a domain-swapped dimer if it does not form in the initial stage of folding. More remarkable is the fact that these specific behaviors are encoded in the native structure and can be predicted in most cases by a simple structural parameter, even for proteins with identical native topology. Moreover, the evidence reported in this work suggests that the mechanism of misfolding, as manifested in the formation of a domain-swapped dimer, is also largely determined by the native structure.
| Materials and methods |
|---|
A Gō model was used in which the square-well depth is -ε for atomic pairs with overlap in the native structure and 0 otherwise. The hard-core and square-well diameters were 0.8 and 1.2 times the van der Waals diameters obtained from the CHARMM parameter set 19 (Neria et al. 1996).
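The discontinuous pair potential described above can be sketched as a simple step function. This is a minimal illustration, not the simulation code: the function name, its arguments, and the idea of passing a single van der Waals diameter per atom pair (`sigma_vdw`) are assumptions made for the example.

```python
import math


def go_pair_energy(r: float, sigma_vdw: float,
                   is_native_pair: bool, epsilon: float = 1.0) -> float:
    """Square-well Go-type energy for one atom pair at separation r.

    sigma_vdw: van der Waals diameter assigned to this pair (hypothetical
        input; how per-atom values are combined is not specified here).
    is_native_pair: True if the pair overlaps in the native structure.
    """
    hard_core = 0.8 * sigma_vdw   # atoms cannot approach closer than this
    well_edge = 1.2 * sigma_vdw   # outer edge of the attractive well
    if r < hard_core:
        return math.inf           # hard-core repulsion
    if is_native_pair and r < well_edge:
        return -epsilon           # attractive well for native contacts only
    return 0.0                    # non-native pairs, or pairs out of range
```

Because the energy changes only at the two distances 0.8 and 1.2 times the van der Waals diameter, the dynamics between those events are ballistic, which is what makes a discontinuous molecular-dynamics scheme applicable.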
Kinetics
Discontinuous molecular-dynamics techniques were used for the simulations. There were 197 folding simulations started from the equilibrated coil-like state at T* = kBT/ε = 4 and quenched to T* = 2.5. (The folding transition temperature Tf* is 3.3, obtained by a weighted histogram analysis of equilibrium simulation data at different temperatures [Zhou and Linhananta 2002a]. No significant changes in folding kinetics were observed at T* = 3.) The physical time unit was obtained by setting 100 reduced time units, the approximate contact time between any two residues that are two residues apart, to be equal to the experimentally measured time of ~20 nsec (Bieri et al. 1999). Structures were saved every 100 reduced time units, for a total of 800 recorded structures for each simulation. The fractions of native H1-H2 and H2-H3 atomic contacts (i.e., the ratio of the number of native contacts in the structure to the number of native contacts in the native structure) were calculated for each structure. The two-dimensional probability distribution in Figure 1a was calculated by binning the fraction values of the 197 × 800 configurations into bins with widths of 0.05.
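The time-unit conversion and the binning step can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the analysis code used for the figure: the function names are hypothetical, the 20 nsec per 100 reduced units is treated as exact, and a fraction of exactly 1.0 is assigned to the last bin.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def reduced_time_to_microseconds(t_reduced: float) -> float:
    """Convert reduced time to microseconds (100 reduced units = 20 nsec)."""
    return t_reduced / 100.0 * 20.0 / 1000.0


def bin_index(fraction: float, width: float = 0.05) -> int:
    """Index of the bin containing a contact fraction in [0, 1]."""
    n_bins = round(1.0 / width)
    # Small offset guards against floating-point error at bin edges.
    index = int(fraction / width + 1e-9)
    return min(index, n_bins - 1)  # a fraction of 1.0 goes in the last bin


def contact_fraction_distribution(
        pairs: Iterable[Tuple[float, float]],
        width: float = 0.05) -> Dict[Tuple[int, int], float]:
    """Normalized 2D histogram of (H1-H2 fraction, H2-H3 fraction) pairs."""
    counts = Counter((bin_index(q12, width), bin_index(q23, width))
                     for q12, q23 in pairs)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}
```

With these conventions, 800 recorded structures spaced 100 reduced time units apart correspond to `reduced_time_to_microseconds(800 * 100)`, or 16 µsec per trajectory.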
Loop contact distance
Loop contact distance (LCD) is derived from total contact distance (TCD; Zhou and Zhou 2002), which improves on contact order (Plaxco et al. 1998) in predicting folding rates. TCD is defined as the contribution to the average sequence separation by contacting residues within a cutoff distance Rcut.
[Equation 1]
[Equation 2]
loops, no such restriction was applied.
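To make the two parameters concrete, the sketch below computes a TCD-like and an LCD-like quantity from a list of contacting residue pairs. It is a hedged illustration only: the normalizations (the square of the number of residues for TCD, and loop length times the number of residues for LCD), the minimum-separation filter, and all names are assumptions chosen for this example, not a reproduction of Equations 1 and 2.

```python
from typing import Iterable, Set, Tuple

Contact = Tuple[int, int]  # residue indices (i, j) of one contacting pair


def total_contact_distance(contacts: Iterable[Contact], n_residues: int,
                           min_separation: int = 1) -> float:
    """Sum of |i - j| over contacts, divided by n_residues**2 (assumed)."""
    total = sum(abs(i - j) for i, j in contacts
                if abs(i - j) >= min_separation)
    return total / n_residues ** 2


def loop_contact_distance(contacts: Iterable[Contact], loop: Set[int],
                          n_residues: int) -> float:
    """Sum of |i - j| over contacts between the loop and the rest.

    Only pairs with exactly one residue inside the loop are counted, and
    no minimum sequence separation is imposed. The normalization by
    len(loop) * n_residues is an assumption.
    """
    total = sum(abs(i - j) for i, j in contacts
                if (i in loop) != (j in loop))
    return total / (len(loop) * n_residues)


# Toy example: 10 residues, loop made of residues 2 and 3.
toy_contacts = [(1, 2), (2, 10), (3, 4), (5, 9)]
toy_tcd = total_contact_distance(toy_contacts, n_residues=10)
toy_lcd = loop_contact_distance(toy_contacts, loop={2, 3}, n_residues=10)
```

In the toy example the single long-range contact (2, 10) dominates the loop's value, which mirrors the qualitative point made in the Results: a loop with many nonlocal contacts has a high LCD, and a locally interacting loop has a low one.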
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Bai, Y.W., Karimi, A., Dyson, H.J., and Wright, P.E. 1997. Absence of a stable intermediate on the folding pathway of protein A. Protein Sci. 6: 1449-1457.
Berriz, G.F. and Shakhnovich, E.I. 2001. Characterization of the folding kinetics of a three-helix bundle protein via a minimalist Langevin model. J. Mol. Biol. 310: 673-685.
Bieri, O., Wirz, J., Hellrung, B., Schutkowski, M., and Drewello, M. 1999. The speed limit for protein folding measured by triplet-triplet energy transfer. Proc. Natl. Acad. Sci. 96: 9597-9601.
Boczko, E.M. and Brooks III, C.L. 1995. First principles calculation of the folding free energy of a three-helix bundle protein. Science 269: 393-396.
Bottomley, S.P., Popplewell, A.G., Scawen, M., Wan, T., Sutton, B.J., and Gore, M.G. 1994. The stability and unfolding of an IgG binding protein based upon the B domain of protein A from Staphylococcus aureus probed by tryptophan substitution and fluorescence spectroscopy. Protein Eng. 7: 1463-1470.
Bulaj, G. and Goldenberg, D.P. 2001. Φ-Values for BPTI folding intermediates and implications for transition state analysis. Nat. Struct. Biol. 8: 326-330.
Favrin, G., Irbäck, A., and Wallin, S. 2002. Folding of a small helical protein using hydrogen bonds and hydrophobicity forces. Proteins 42: 99-105.
Fersht, A.R. 1995. Characterizing transition states in protein folding: An essential step in the puzzle. Curr. Opin. Struc. Biol. 5: 79-84.
Guerois, R. and Serrano, L. 2000. The SH3-fold family: Experimental evidence and prediction of variations in the folding pathways. J. Mol. Biol. 304: 967-982.
Guo, Z.Y., Brooks III, C.L., and Boczko, E.M. 1997. Exploring the folding free energy surface of a three-helix bundle protein. Proc. Natl. Acad. Sci. 94: 10161-10166.
Hakansson, M., Svensson, A., Fast, J., and Linse, S. 2001. An extended hydrophobic core induces EF-hand swapping. Protein Sci. 10: 927-933.
Jager, M., Nguyen, H., Crane, J.C., Kelly, J.W., and Gruebele, M. 2001. The folding mechanism of a β-sheet: The WW domain. J. Mol. Biol. 311: 373-393.
Janowski, R., Kozak, M., Jankowska, E., Grzonka, Z., Grubb, A., Abrahamson, M., and Jaskolski, M. 2001. Human cystatin C, an amyloidogenic protein, dimerizes through three-dimensional domain swapping. Nat. Struct. Biol. 8: 316-320.
Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637.
Karplus, M. and Weaver, D.L. 1976. Protein-folding dynamics. Nature 260: 404-406.
Kim, D.E., Fisher, C., and Baker, D. 2000. A breakdown of symmetry in the folding transition state of protein L. J. Mol. Biol. 298: 971-984.
Kishan, K.R., Newcomer, M.E., Rhodes, T.H., and Guilliot, S.D. 2001. Effect of pH and salt bridges on structural assembly: Molecular structures of the monomer and intertwined dimer of the Eps8 SH3 domain. Protein Sci. 10: 1046-1055.
Knaus, K.J., Morillas, M., Swietnicki, W., Malone, M., Surewicz, W.K., and Yee, V.C. 2001. Crystal structure of the human prion protein reveals a mechanism for oligomerization. Nat. Struct. Biol. 8: 770-774.
Kolinski, A., Galazka, W., and Skolnick, J. 1998. Monte Carlo studies of the thermodynamics and kinetics of reduced protein models: Application to small helical, β, and α/β proteins. J. Chem. Phys. 108: 2608-2617.
Kraulis, P. 1991. MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures. J. Applied Cryst. 24: 946-950.
Kuhlman, B., O'Neill, J.W., Kim, D.E., Zhang, K.Y.J., and Baker, D. 2001. Conversion of monomeric protein L to an obligate dimer by computational protein design. Proc. Natl. Acad. Sci. 98: 10687-10691.
Liu, Y., Hart, P.J., Schlunegger, M.P., and Eisenberg, D. 1998. The crystal structure of a 3D domain-swapped dimer of RNase A at a 2.1 Å resolution. Proc. Natl. Acad. Sci. 95: 3437-3442.
Liu, Y., Gotte, G., Libonati, M., and Eisenberg, D. 2001. A domain-swapped RNase A dimer with implications for amyloid formation. Nat. Struct. Biol. 8: 211-214.
Martínez, J.C. and Serrano, L. 1999. The folding transition state between SH3 domains is conformationally restricted and evolutionarily conserved. Nat. Struct. Biol. 6: 1010-1016.
McCallister, E.L., Alm, E., and Baker, D. 2000. Critical role of β-hairpin formation in protein G folding. Nat. Struct. Biol. 7: 669-673.
Myers, J.K. and Oas, T.G. 2001. Preorganized secondary structure as an important determinant of fast protein folding. Nat. Struct. Biol. 8: 552-558.
Nauli, S., Kuhlman, B., and Baker, D. 2001. Computer-based redesign of a protein folding pathway. Nat. Struct. Biol. 8: 602-605.
Neria, E., Fischer, S., and Karplus, M. 1996. Simulation of activation free energies in molecular systems. J. Chem. Phys. 105: 1902-1921.
O'Neill, J.W., Kim, D.E., Johnsen, K., Baker, D., and Zhang, K.Y.J. 2001. Single-site mutations induce 3D domain swapping in the B1 domain of protein L from Peptostreptococcus magnus. Structure 9: 1017-1027.
Ozkan, S.B., Bahar, I., and Dill, K.A. 2001. Transition states and the meaning of Φ-values in protein folding kinetics. Nat. Struct. Biol. 8: 765-769.
Pande, V.S. and Rokhsar, D.S. 1998. Is the molten globule a third phase of proteins? Proc. Natl. Acad. Sci. 95: 1490-1494.
Plaxco, K.W., Simons, K.T., and Baker, D. 1998. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277: 985-994.
Riddle, D.S., Grantcharova, V.P., Santiago, J.V., Alm, E., Ruczinski, I., and Baker, D. 1999. Experiment and theory highlight role of native state topology in SH3 folding. Nat. Struct. Biol. 6: 1016-1024.
Šali, A., Shakhnovich, E.I., and Karplus, M. 1994. How does a protein fold? Nature 369: 248-251.
Schiering, N., Casale, E., Caccia, P., Giordano, P., and Battistini, C. 2000. Dimer formation through domain swapping in the crystal structure of the Grb2-SH2-Ac-pYVNV complex. Biochemistry 39: 13376-13382.
Schlunegger, M., Bennett, M., and Eisenberg, D. 1997. Oligomer formation by 3D domain swapping: A model for protein assembly and misassembly. Adv. Protein Chem. 50: 61-122.
Shea, J.E., Onuchic, J.N., and Brooks, C.L. 1999. Exploring the origins of topological frustration: Design of a minimally frustrated model of fragment B of protein A. Proc. Natl. Acad. Sci. 96: 12512-12517.
Shea, J.-E., Onuchic, J.N., and Brooks III, C.L. 2000. Energetic frustration and the nature of the transition state in protein folding. J. Chem. Phys. 113: 7663-7671.
Yang, F., Bewley, C.A., Louis, J.M., Gustafson, K.R., Boyd, M.R., Gronenborn, A.M., Clore, G.M., and Wlodawer, A. 1999. Crystal structure of Cyanovirin-N, a potent HIV-inactivating protein, shows unexpected domain swapping. J. Mol. Biol. 288: 403-412.
Zhou, Y. and Karplus, M. 1997. Folding thermodynamics of a model three-helix-bundle protein. Proc. Natl. Acad. Sci. 94: 14429-14432.
Zhou, Y. and Karplus, M. 1999. Interpreting the folding kinetics of helical proteins. Nature 401: 400-403.
Zhou, Y. and Linhananta, A. 2002a. Thermodynamics of an all-atom off-lattice model of the fragment B of staphylococcal protein A: Implication for the origin of the cooperativity of protein folding. J. Phys. Chem. B 106: 1481-1485.
Zhou, Y. and Linhananta, A. 2002b. Role of hydrophilic and hydrophobic contacts in folding of the second β-hairpin fragment of protein G: Molecular dynamics simulation studies of an all-atom model. Proteins 47: 154-162.
Zhou, H. and Zhou, Y. 2002. Folding rate prediction using total contact distance. Biophys. J. 82: 458-463.