1 Basic Research Program, Science Applications International Corp. (SAIC)-Frederick, Inc., Laboratory of Experimental and Computational Biology, Frederick, Maryland 21702, USA
2 Sackler Institute of Molecular Medicine, Department of Human Genetics, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Reprint requests to: Ruth Nussinov, Basic Research Program, SAIC-Frederick, Inc., Laboratory of Experimental and Computational Biology, NCI-Frederick, Building 469, Room 145, Frederick, MD 21702, USA; e-mail: ruthn{at}ncifcrf.gov; fax: (301) 846-5598.
(RECEIVED March 29, 2004; FINAL REVISION May 23, 2004; ACCEPTED July 12, 2004)
| Abstract |
|---|
Keywords: protein building block; computational protein design; combinatorial assembly; protein G; ubiquitin; molecular dynamics simulation
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04774004.
| Introduction |
|---|
The computational procedures are outlined in Figure 1.
In our algorithm, the hydrophobicity criterion ensures that the candidates have a hydrophobic/hydrophilic pattern similar to that of the original building blocks, whereas the small-RMSD criterion restricts the candidates to those with a topology similar to that of the original building blocks. In the combinatorial assembly procedure, candidates are superimposed onto their corresponding building blocks in the native protein. This procedure therefore ensures that the engineered protein has a fold and a hydrophobic/hydrophilic pattern similar to those of the native protein, but low sequence identity. Any minor nonequilibrium strain in the initially assembled engineered protein is removed by force field energy minimization.
The algorithm proposed here is very similar to experiments of protein domain swapping and combinatorial shuffling of polypeptide segments except that the "domain" is defined by building blocks in our algorithm. In the computational and experimental domain swapping design study, Voigt et al. (2002) defined the protein building blocks by minimum disturbance of the integrity of the protein 3D structure using concepts of schema theory of genetic algorithms. Building blocks defined either by minimum disturbance or by fold independence can be regarded as relatively stable protein fragments in a given protein. Mayo and Arnold (Meyer et al. 2003) have further constructed a combinatorial library to estimate the disruption caused upon substitution of schemas due to altered interactions in the 3D structures upon schema shuffling. Other fragment-based approaches include protein design by phage display libraries. This strategy has been employed to computationally and experimentally design a four-helix bundle protein (Chu et al. 2002), coupling phage display and proteolysis. Interestingly, the authors find that the positions of the cutting sites of the protease may significantly influence the selection of structures. Pioneering studies of limited proteolysis by Fontana et al. (1997, 1999) have long shown that fragments obtained through a limited proteolysis strategy can be combined to yield the native protein. This suggests that fragments obtained through such applications can be used both for studies of protein folding pathways and for protein design. The number of potential combinations in protein design is huge, as shown in the first pioneering completely automated zinc finger redesign by Mayo and his colleagues (Dahiyat and Mayo 1997; Dahiyat et al. 1997). Fragment-based approaches reduce the number of combinations in a designed protein. An alternate algorithm to reduce the huge number of degrees of freedom involves a statistical computationally assisted design strategy. 
This method has recently been used to design water-soluble analogs of a potassium channel (Slovic et al. 2004) and a monomeric helical dinuclear metalloprotein (Calhoun et al. 2003). Still another promising strategy involves an application of the Rosetta Design algorithm (Dantas et al. 2003). Additionally, new protein engineering techniques using multiple stabilizing substitutions were recently employed by Peng and coworkers (Cammett et al. 2003). These techniques enhanced the stability of a cyclin-dependent kinase inhibitor and restored the Cdk4 binding activity of several defective cancer-associated mutant proteins.
Recombination is a powerful tool for the engineering and optimization of proteins in vitro (Crameri et al. 1998; Riechmann and Winter 2000). It enhances design by combining fragments from different proteins to form a new protein with a potential new function. Here, rather than substituting a single residue at each location, our approach substitutes fragments. Importantly, the fragment size varies, depending on its identification as a local minimum along the polypeptide chain. The minimum size is 15 amino acids, and there is no maximum size. A fragment-based approach reduces the computational cost dramatically. At the same time, criteria such as those defined above ensure that the topology and hydrophobic/hydrophilic pattern of the engineered protein are similar to those of the native protein. This similarity between an engineered protein and its parent native protein gives the engineered protein a good chance of being stable.
Two proteins, protein G B1 domain (PDB code: 2gb1) and ubiquitin (PDB code: 1ubq), were selected for engineering. The two engineered proteins share ~20% and ~25% amino acid identity, respectively, with their native counterparts. Like native proteins, the engineered proteins have a hydrophobic core, and their hydrophilic side chains are exposed at the protein surface. In addition, the engineered proteins have folds similar to those of their corresponding native proteins. As controls, two "nonproteins" with inverted polar/nonpolar residue patterns (with no or poor hydrophobic cores) were also engineered based on the topologies of protein G B1 domain and ubiquitin. The stabilities of the engineered and control proteins were tested by explicit water molecular dynamics simulations. Employing this computational algorithm, we are able to engineer new, similar-fold, low-homology proteins based on a selected native protein, and to examine whether the building blocks are stand-alone fragments. The computational methods developed here may assist in the combinatorial design of new functional proteins.
Computational algorithm
The computational protein engineering algorithm comprises three major procedures: (1) the building block (BB) cutting algorithm, (2) the candidate BB search and in silico protein engineering algorithms, and (3) stability tests by molecular dynamics simulations. The tertiary structure of the selected native protein is partitioned into a set of building blocks by estimating their compactness, degree of isolation, and hydrophobicity. The building blocks are regarded as relatively stable and highly populated fragments. Based on the structure, sequence, and hydrophobicity pattern of the building blocks, the Protein Data Bank (Berman et al. 2000) is searched for candidate BBs with similar structure, low sequence identity, and a similar hydrophobic/hydrophilic pattern. The best candidate BBs are superimposed onto the Cα atoms of the corresponding BBs in the native protein architecture. Finally, the stability of the engineered protein is examined by molecular simulations. The three procedures are described below.
Building block cutting algorithm
The detailed description of the building block cutting algorithm has been published elsewhere (Tsai et al. 2000), and is only briefly outlined here. A scoring function estimates the relative stability of a candidate building block. The scoring function is expressed as:
![]() | (1) |
where Z, H, and I are the compactness, hydrophobicity, and degree of isolation, respectively. The hydrophobicity score (H) is defined as the fraction of the buried nonpolar surface area over the total nonpolar surface area,
$$H = \frac{\mathrm{NonASA}_{\mathrm{Buried}}}{\mathrm{NonASA}_{\mathrm{Buried}} + \mathrm{NonASA}_{\mathrm{Surf}}} \qquad (2)$$
where $\mathrm{NonASA}_{\mathrm{Buried}}$ and $\mathrm{NonASA}_{\mathrm{Surf}}$ are the buried and the exposed nonpolar surface areas (Tsai et al. 2000). The subscripts avg and dev denote the arithmetic average and the standard deviation, respectively, obtained from a nonredundant data set of 930 representative single-chain proteins. Quantities with superscripts 1 and 2 are calculated with respect to fragment size and as a function of the ratio of fragment size to the size of the entire protein, respectively. The selected candidate BB has a high stability score as estimated from equation 1, which represents the minimum deviation from the averaged values. Stability scores are evaluated for fragments of various lengths (minimum 15 residues). The procedure is carried out iteratively until the building blocks can no longer be cut. The resulting spanning tree delineates the most likely protein folding pathways.
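As a minimal sketch of the hydrophobicity score in equation 2 and of the 15-residue minimum fragment length, the following Python fragment may help. The function names and the exhaustive enumeration of fragments are assumptions made for illustration; the source does not describe how fragments are enumerated, and the surface areas used in the example are made-up inputs.

```python
def hydrophobicity_score(nonpolar_asa_buried: float, nonpolar_asa_exposed: float) -> float:
    """Fraction of the nonpolar surface area that is buried (equation 2)."""
    total = nonpolar_asa_buried + nonpolar_asa_exposed
    if total <= 0.0:
        raise ValueError("total nonpolar surface area must be positive")
    return nonpolar_asa_buried / total


def candidate_fragments(chain_length: int, min_length: int = 15):
    """Yield (start, end) residue ranges, 1-based and inclusive, for every
    contiguous fragment with at least min_length residues.

    Exhaustive enumeration is an assumption made for this sketch.
    """
    for start in range(1, chain_length - min_length + 2):
        for end in range(start + min_length - 1, chain_length + 1):
            yield start, end


# Made-up areas (in square angstroms): 300 buried, 100 exposed.
example_h = hydrophobicity_score(300.0, 100.0)
```

With these made-up areas, three quarters of the nonpolar surface is buried, so `example_h` is 0.75.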
Candidate BB search and in silico protein engineering algorithms
Once a native target protein has been cut into its building blocks, the structures and sequences of its BBs are used to search the PDB for substitute fragments. Four criteria are used in the candidate BB search:
Cα-RMSD: The Cα-RMSD (original vs. candidate BB) is expected to be as small as possible (<2.5 Å).
![]() | (3) |
The expectation value in equation 3 is the expected experimental hydrophobicity scale (EHS) difference between the 20 amino acids. The experimental hydrophobicity scale is taken from the work of Fauchere and Pliska, and the expectation value is 1.151 based on this scale (Fauchere and Pliska 1983). Therefore, for a candidate without any similarity to the original BB, we expect its hydrophobicity pattern to be equal to unity. A selected candidate has a smaller hydrophobicity pattern.
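The screening of a candidate fragment might be sketched as follows. Several points here are assumptions rather than the published method: the hydrophobicity pattern is taken to be the mean absolute per-residue scale difference divided by the expectation value 1.151, the identity threshold of 25% is borrowed from the Conclusions, and `TOY_SCALE` holds placeholder numbers, not the Fauchere and Pliska values. All function names are hypothetical.

```python
EXPECTED_EHS_DIFFERENCE = 1.151  # expectation value quoted in the text

# Placeholder hydrophobicity values for illustration only; these are NOT the
# published Fauchere-Pliska numbers.
TOY_SCALE = {"A": 0.3, "D": -0.8, "G": 0.0, "K": -1.0, "L": 1.7, "V": 1.2}


def _check_aligned(original: str, candidate: str) -> None:
    if not original or len(original) != len(candidate):
        raise ValueError("fragments must be non-empty and of equal length")


def sequence_identity(original: str, candidate: str) -> float:
    """Fraction of aligned positions carrying the same residue."""
    _check_aligned(original, candidate)
    matches = sum(a == b for a, b in zip(original, candidate))
    return matches / len(original)


def hydrophobicity_pattern(original, candidate, scale,
                           expected=EXPECTED_EHS_DIFFERENCE):
    """Assumed form: mean absolute scale difference, normalized so that an
    unrelated candidate is expected to score about 1."""
    _check_aligned(original, candidate)
    total = sum(abs(scale[a] - scale[b]) for a, b in zip(original, candidate))
    return total / (len(original) * expected)


def passes_filters(rmsd, original, candidate, scale,
                   max_rmsd=2.5, max_identity=0.25, max_pattern=1.0):
    """Apply the RMSD, sequence identity, and hydrophobicity thresholds."""
    return (rmsd < max_rmsd
            and sequence_identity(original, candidate) < max_identity
            and hydrophobicity_pattern(original, candidate, scale) < max_pattern)
```

For example, `passes_filters(1.8, "LVAK", "VLGD", TOY_SCALE)` accepts a candidate that shares no residues with the original but follows the same nonpolar/polar order, whereas the same candidate with an RMSD of 3.0 Å is rejected.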
In this study, 19,294 protein structures with a total of 36,653 chains (chain length >15) deposited in the Protein Data Bank were searched. Finally, the engineered protein is assembled by superimposing the candidate BBs onto the native protein architecture. To ensure that two connected BBs are covalently joined properly, larger (10 times) weighting factors are used for the N- and C-terminal Cα atoms of each candidate BB in the superimposition and assembly procedures. The unassigned fragments (i.e., those between BBs) are kept in the engineered protein. These criteria and procedures ensure that the engineered protein will have a fold similar to that of the native protein but low sequence identity. It will also have a well-formed hydrophobic core.
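The weighted superimposition step can be illustrated with a weighted least-squares rigid fit. The sketch below uses the Kabsch algorithm with NumPy, which is an assumption: the source does not state which fitting algorithm was used. The names `terminal_weights`, `weighted_superpose`, and `ca_rmsd` are hypothetical, and only the 10-fold terminal weighting comes from the text.

```python
import numpy as np


def terminal_weights(n_atoms: int, factor: float = 10.0) -> np.ndarray:
    """Unit weights, with the first and last C-alpha atoms weighted by factor."""
    weights = np.ones(n_atoms)
    weights[0] = weights[-1] = factor
    return weights


def weighted_superpose(mobile, target, weights) -> np.ndarray:
    """Return the mobile coordinates after a weighted rigid-body fit onto target."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Weighted centroids, so that heavily weighted atoms dominate the fit.
    mobile_center = w @ mobile
    target_center = w @ target
    p = mobile - mobile_center
    q = target - target_center
    # Weighted 3x3 covariance matrix and its singular value decomposition.
    covariance = (p * w[:, None]).T @ q
    u, _, vt = np.linalg.svd(covariance)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + target_center


def ca_rmsd(a, b) -> float:
    """Unweighted root-mean-square deviation between two coordinate sets."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Giving the terminal atoms a larger weight pulls the ends of each candidate fragment toward the ends of the native building block, which is the stated purpose of the weighting: adjacent fragments should meet closely enough to be joined.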
Stability test by MD simulations
The stability of the engineered proteins is tested by molecular dynamics (MD) simulations. Assessing by computer simulation whether a protein is stable and folded is a challenging task. It is limited not only by the accuracy of the theory (e.g., the force field) but also by computer power (i.e., the accessible simulation time). Protein folding occurs on the microsecond to second time scale, and current computers cannot routinely provide such long simulations. However, the engineered proteins constructed with the algorithms proposed above are assumed to start from structures similar to those of their native counterparts. Therefore, explicit water MD simulations on the order of nanoseconds might be long enough to serve as a first test of the stability of the engineered proteins.
All simulations were performed with CHARMM (Brooks et al. 1983). The system was treated explicitly with the all-atom model using the CHARMM-22 force field (MacKerell et al. 1998). A series of MD simulations was performed for the native, engineered, and nonproteins at room temperature with the explicit TIP3P water model (Jorgensen et al. 1983). The proteins were solvated with explicit water molecules in a cubic box. The size of the box depends on the size of the protein to preserve infinite dilution. All simulations were performed using the NVT ensemble under periodic boundary conditions with the minimum image convention. The systems were energy-minimized by the Adopted Basis Newton-Raphson (ABNR) method prior to the MD simulations. A group-based distance cutoff was applied at 12 Å and 13 Å when generating the list of pairs. The force switching function was used to smooth the electrostatic potential energy (pair-wise distances between 8 and 12 Å), whereas the van der Waals shift function was used to smooth the van der Waals potential energy (Steinbach and Brooks 1994). The nonbonded neighbor list was updated every 20 steps. In the simulations, the Cα-RMSD of the native proteins was expected to be lower than that of the engineered proteins and was used as the lower-bound reference. In contrast, in the absence of a compact hydrophobic core, the Cα-RMSD of the nonproteins was expected to be higher. Thus, the Cα-RMSD of a nonprotein was employed as the upper-bound reference.
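The lower-bound and upper-bound comparison can be written as a simple bracketing check on time-averaged RMSD values. This is only a sketch: using the mean over the trajectory as the summary statistic is an assumption, the function names are hypothetical, and the RMSD series shown are made-up numbers rather than simulation output.

```python
from statistics import mean


def relative_position(native, engineered, nonprotein) -> float:
    """Place the engineered protein between the two references:
    0 corresponds to the native mean RMSD, 1 to the nonprotein mean RMSD."""
    lower = mean(native)
    upper = mean(nonprotein)
    if upper <= lower:
        raise ValueError("the nonprotein reference must exceed the native reference")
    return (mean(engineered) - lower) / (upper - lower)


def is_bracketed(native, engineered, nonprotein) -> bool:
    """True when the engineered mean RMSD lies between the two bounds."""
    return 0.0 <= relative_position(native, engineered, nonprotein) <= 1.0


# Made-up RMSD series in angstroms, one value per saved frame.
native_series = [1.0, 1.1, 0.9, 1.0]
engineered_series = [2.4, 2.6, 2.5, 2.5]
nonprotein_series = [3.0, 3.8, 4.4, 4.8]
position = relative_position(native_series, engineered_series, nonprotein_series)
```

A value near 0 would suggest native-like behavior, and a value near 1 would suggest behavior close to that of the nonprotein control.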
| Results |
|---|
Protein G B1 domain
Protein G B1 domain consists of 56 residues with two building blocks (Fig. 2A). Building block-I (BB-I) has 38 residues (residues 2–39) and building block-II (BB-II) has 20 residues (residues 37–56). Residue 1 at the N terminus is unassigned and is kept in the engineered protein. For convenience, the three overlapping residues (37–39) between building blocks-I and -II are assigned to BB-II only. Therefore, the adjusted BB-I has 35 residues (from 2 to 36) and BB-II has its original 20 residues (from 37 to 56). The sequence of native-2gb1 is shown in Table 1.
Ubiquitin
Ubiquitin has 76 residues with three building blocks, BB-I (residues 1–21), BB-II (residues 21–41), and BB-III (residues 42–68). For simplicity, the overlapping residue 21 is assigned to BB-II. Residues 69–76 at the C terminus are unassigned. BB-I is a β-hairpin; BB-II is an α-helix-rich fragment; BB-III is a large loop. The structure of native ubiquitin (nat-1ubq) along with its three building block sources is shown in Figure 4.
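The residue bookkeeping described for protein G B1 domain and ubiquitin can be reproduced with a short sketch. The rule that overlapping residues go to the later building block follows the text; the function names and the tuple layout are hypothetical.

```python
def resolve_overlaps(blocks):
    """Trim each block so that residues shared with the next block are
    assigned to the later one. Blocks are (name, start, end) tuples with
    1-based inclusive ranges, ordered along the chain."""
    resolved = []
    for index, (name, start, end) in enumerate(blocks):
        if index + 1 < len(blocks):
            next_start = blocks[index + 1][1]
            end = min(end, next_start - 1)
        resolved.append((name, start, end))
    return resolved


def unassigned_residues(blocks, chain_length):
    """Residues not covered by any block; these are kept as they are."""
    assigned = set()
    for _, start, end in blocks:
        assigned.update(range(start, end + 1))
    return [r for r in range(1, chain_length + 1) if r not in assigned]


protein_g = resolve_overlaps([("BB-I", 2, 39), ("BB-II", 37, 56)])
ubiquitin = resolve_overlaps([("BB-I", 1, 21), ("BB-II", 21, 41), ("BB-III", 42, 68)])
```

For protein G B1 domain this gives BB-I as residues 2–36 and BB-II as residues 37–56, with residue 1 unassigned; for ubiquitin, BB-I becomes residues 1–20 and residues 69–76 remain unassigned.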
| Discussion |
|---|
Cα-RMSDs versus their energy-minimized structures.
The Cα-RMSDs of nat-2gb1, eng-2gb1, and non-2gb1 in 8.0-nsec explicit water MD simulations are shown in Figure 6. The Cα-RMSD of nat-2gb1 fluctuates around 1.0 Å during the entire course of the simulation (Fig. 6A). In contrast, the Cα-RMSD of non-2gb1, which has an inverted hydrophobic core, increases with simulation time, indicating that its energy-minimized structure cannot be maintained. The engineered protein (eng-2gb1) fluctuates around its energy-minimized structure (with a compact core) with a Cα-RMSD of ~2.5 Å during the simulation. As expected, the Cα-RMSD of eng-2gb1 lies between the lower-bound Cα-RMSD of the native protein (nat-2gb1) and the upper-bound Cα-RMSD of the nonprotein (non-2gb1), suggesting that the engineered protein is potentially stable in vitro. Figure 6B shows the averaged Cα-RMSD of nat-2gb1, eng-2gb1, and non-2gb1 as a function of residue position. Again, the Cα-RMSD of eng-2gb1 lies between those of nat-2gb1 and non-2gb1. To further analyze the stability of each building block in the individual proteins, their Cα-RMSDs were calculated as a function of time (Fig. 6C,D). The Cα-RMSD of building block-I of the nonprotein increases with simulation time, whereas the others are stable. Surprisingly, building block-II of the nonprotein is also stable in the simulation, indicating that this fragment can be a stand-alone building block and that mutual stabilization from other fragments may not be important.
Cα-RMSDs of nat-1ubq, eng-1ubq, and non-1ubq were obtained from 9-nsec explicit water MD simulations. Similar to the behavior of eng-2gb1, the Cα-RMSD of eng-1ubq lies between those of non-1ubq and nat-1ubq. Nevertheless, it is only slightly lower than that of non-1ubq, indicating that the engineered protein is unlikely to be very stable. To investigate why eng-1ubq is not very stable, the Cα-RMSDs of each building block as a function of time were calculated. The Cα-RMSDs of the whole proteins as a function of residue number were also computed. The results are as expected: The Cα-RMSD of building block-II of eng-1ubq is stable, nearly overlapping that of nat-1ubq. The Cα-RMSD of building block-I of eng-1ubq fluctuates, but with relatively low magnitude. In contrast, the Cα-RMSD of building block-III of eng-1ubq increases rapidly with simulation time. The BB-III large loop is much more flexible than the helical and β-stranded structures. In addition, few qualified candidates can be found for this loop building block (see Fig. 5).
Conclusions and future work
In this study, a de novo computational algorithm is proposed to engineer proteins in terms of protein building blocks. This approach is similar to combinatorial experiments, where protein building blocks are used as "shuffling domains." Here, BBs are defined as fragments that form local minima along the polypeptide chain. As such, they have relatively high population times. Because protein building blocks are conformationally independent entities (Haspel et al. 2003a,b), we test the feasibility of partitioning proteins into building blocks and exchanging between BBs with similar conformations and hydrophobic/hydrophilic patterns taken from different proteins. The sequence identities of the selected fragments are chosen to be as low as possible (<25%) to avoid a homology bias. Based on these criteria, a new protein can be assembled with a similar fold and low sequence identity compared to the selected native protein. Two proteins (protein G B1 domain and ubiquitin) are selected to illustrate this engineering algorithm. The stability of the engineered proteins is tested by simulations. The MD simulations show that the fold of one engineered protein (protein G B1 domain, denoted as eng-2gb1) is maintained during the 8-nsec explicit water simulations. The RMSD of eng-2gb1 lies between the lower-bound RMSD of the native protein and the upper-bound RMSD of the "nonprotein" (with an inverted hydrophobic core). However, the newly engineered ubiquitin is much less stable because BB-III contains a large flexible loop. Our searches of the PDB found only a few candidate large-loop BBs with similar static conformations. Because a crystal structure is an average structure and large loops are particularly flexible, it is quite possible that the structure we have captured by picking the crystal coordinates does not represent the optimal conformation for this building block.
We conclude that in a fragment-based engineering strategy, engineering large loops is very challenging. Overall, our study suggests that it is potentially feasible to engineer proteins in terms of protein building blocks.
Here, we have demonstrated that proteins can be engineered in terms of protein building blocks in silico. The next essential step is to synthesize the engineered proteins and validate their stability by in vitro experiments. The scoring function used to select the candidate fragments should also be improved to enhance the stability of engineered proteins. For example, deletion and insertion of residues might be included in the candidate fragment search in an attempt to find additional, possibly better candidates. The volume of amino acids may further be considered explicitly in the matching, even though it is already partly reflected in the scoring function through the hydrophobicity scale difference. Moreover, the packing of the hydrophobic core may be further optimized (Lazar and Handel 1998; Malakauskas and Mayo 1998). Computational engineering and experimental stability tests are best performed iteratively.
| Footnotes |
|---|
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Baldwin, R.L. and Rose, G.D. 1999a. Is protein folding hierarchic? I. Local structure and peptide folding. Trends Biochem. Sci. 24: 26–33.
Baldwin, R.L. and Rose, G.D. 1999b. Is protein folding hierarchic? II. Folding intermediates and transition states. Trends Biochem. Sci. 24: 77–83.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.
Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., and Karplus, M. 1983. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4: 187–217.
Brooks, C.L., Gruebele, M., Onuchic, J.N., and Wolynes, P.G. 1998. Chemical physics of protein folding. Proc. Natl. Acad. Sci. 95: 11037–11038.
Calhoun, J.R., Kono, H., Lahr, S., Wang, W., DeGrado, W.F., and Saven, J.G. 2003. Computational design and characterization of a monomeric helical dinuclear metalloprotein. J. Mol. Biol. 334: 1101–1115.
Cammett, T.J., Luo, L., and Peng, Z.Y. 2003. Design and characterization of a hyperstable p16(INK4a) that restores Cdk4 binding activity when combined with oncogenic mutations. J. Mol. Biol. 327: 285–297.
Chu, R., Takei, J., Knowlton, J.R., Andrykovitch, M., Pei, W.H., Kajava, A.V., Steinbach, P.J., Ji, X.H., and Bai, Y.W. 2002. Redesign of a four-helix bundle protein by phage display coupled with proteolysis and structural characterization by NMR and X-ray crystallography. J. Mol. Biol. 323: 253–262.
Crameri, A., Raillard, S.A., Bermudez, E., and Stemmer, W.P.C. 1998. DNA shuffling of a family of genes from diverse species accelerates directed evolution. Nature 391: 288–291.
Dahiyat, B.I. and Mayo, S.L. 1997. De novo protein design: Fully automated sequence selection. Science 278: 82–87.
Dahiyat, B.I., Sarisky, C.A., and Mayo, S.L. 1997. De novo protein design: Towards fully automated sequence selection. J. Mol. Biol. 273: 789–796.
Dantas, G., Kuhlman, B., Callender, D., Wong, M., and Baker, D. 2003. A large scale test of computational protein design: Folding and stability of nine completely redesigned globular proteins. J. Mol. Biol. 332: 449–460.
Dill, K.A. and Chan, H.S. 1997. From Levinthal to pathways to funnels. Nat. Struct. Biol. 4: 10–19.
Dobson, C.M., Sali, A., and Karplus, M. 1998. Protein folding: A perspective from theory and experiment. Angew. Chem. (Intl. Ed.) 37: 868–893.
Fauchere, J.L. and Pliska, V. 1983. Hydrophobic parameters-Pi of amino-acid side-chains from the partitioning of N-acetyl-amino-acid amides. Eur. J. Med. Chem. 18: 369–375.
Fontana, A., Polverino deLaureto, P., DeFilippis, V., Scaramella, E., and Zambonin, M. 1997. Probing the partly folded states of proteins by limited proteolysis. Fold. Des. 2: R17–R26.
Fontana, A., Polverino deLaureto, P., DeFilippis, V., Scaramella, E., and Zambonin, M. 1999. Limited proteolysis in the study of protein conformation. In Proteolytic enzymes: Tools and targets (eds. E.E. Sterchi and W. Stocker), pp. 257–284. Springer Verlag, Heidelberg, Germany.
Haspel, N., Tsai, C.J., Wolfson, H., and Nussinov, R. 2003a. Hierarchical protein folding pathways: A computational study of protein fragments. Proteins 51: 203–215.
Haspel, N., Tsai, C.J., Wolfson, H., and Nussinov, R. 2003b. Reducing the computational complexity of protein folding via fragment folding and assembly. Protein Sci. 12: 1177–1187.
Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W., and Klein, M.L. 1983. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79: 926–935.
Lazar, G.A. and Handel, T.M. 1998. Hydrophobic core packing and protein design. Curr. Opin. Chem. Biol. 2: 675–679.
Lesk, A.M. and Rose, G.D. 1981. Folding units in globular proteins. Proc. Natl. Acad. Sci. 78: 4304–4308.
Levinthal, C. 1968. Are there pathways for protein folding? J. Chim. Phys. 65: 44–45.
MacKerell, A.D., Bashford, D., Bellott, M., Dunbrack, R.L., Evanseck, J.D., Field, M.J., Fischer, S., Gao, J., Guo, H., Ha, S., et al. 1998. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 102: 3586–3616.
Malakauskas, S.M. and Mayo, S.L. 1998. Design, structure and stability of a hyperthermophilic protein variant. Nat. Struct. Biol. 5: 470–475.
Meyer, M.M., Silberg, J.J., Voigt, C.A., Endelman, J.B., Mayo, S.L., Wang, Z.G., and Arnold, F.H. 2003. Library analysis of SCHEMA-guided protein recombination. Protein Sci. 12: 1686–1693.
Onuchic, J.N., Luthey-Schulten, Z., and Wolynes, P.G. 1997. Theory of protein folding: The energy landscape perspective. Annu. Rev. Phys. Chem. 48: 545–600.
Riechmann, L. and Winter, G. 2000. Novel folded protein domains generated by combinatorial shuffling of polypeptide segments. Proc. Natl. Acad. Sci. 97: 10068–10073.
Slovic, A.M., Kono, H., Lear, J.D., Saven, J.G., and DeGrado, W.F. 2004. Computational design of water-soluble analogues of the potassium channel KcsA. Proc. Natl. Acad. Sci. 101: 1828–1833.
Steinbach, P.J. and Brooks, B.R. 1994. New spherical-cutoff methods for long-range forces in macromolecular simulation. J. Comput. Chem. 15: 667–683.
Tsai, C.J. and Nussinov, R. 2001a. The building block folding model and the kinetics of protein folding. Protein Eng. 14: 723–733.
Tsai, C.J. and Nussinov, R. 2001b. Transient, highly populated, building blocks folding model. Cell Biochem. Biophys. 34: 209–235.
Tsai, C.J., Maizel, J.V., and Nussinov, R. 2000. Anatomy of protein structures: Visualizing how a one-dimensional protein chain folds into a three-dimensional shape. Proc. Natl. Acad. Sci. 97: 12038–12043.
Tsai, C.J., de Laureto, P.P., Fontana, A., and Nussinov, R. 2002. Comparison of protein fragments identified by limited proteolysis and by computational cutting of proteins. Protein Sci. 11: 1753–1770.
Voigt, C.A., Martinez, C., Wang, Z.G., Mayo, S.L., and Arnold, F.H. 2002. Protein building blocks preserved by recombination. Nat. Struct. Biol. 9: 553–558.
Wolynes, P.G., Onuchic, J.N., and Thirumalai, D. 1995. Navigating the folding routes. Science 267: 1619–1620.