1 Chemistry Department and Biophysics Program, Stanford University, Stanford, California 94305, USA
2 Biochemical Sciences Program, Harvard College, Cambridge, Massachusetts 02138, USA
3 Xencor, Inc., Monrovia, California 91016, USA
Reprint requests to: Vijay S. Pande, Chemistry Department, Stanford University, Stanford, CA 94305, USA; e-mail: pande{at}stanford.edu; fax: (650) 723-4817.
(RECEIVED February 5, 2002; FINAL REVISION August 16, 2002; ACCEPTED September 4, 2002)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0203902.
| Abstract |
|---|
Keywords: Protein design; sequence space; designability; backbone flexibility; distributed computing
Abbreviations: RMSD, root-mean-square deviation; PDB, Protein Data Bank
| Introduction |
|---|
An important practical use of protein design is in the stabilization of known protein folds (Dahiyat 1999). The optimization schemes used in most protein design algorithms are written to find local or globally optimized sequences, with the lowest or near-lowest free energy of folding for an existing target structure; much recent work has addressed this topic (Desjarlais and Clarke 1998; Shakhnovich 1998; Koehl and Levitt 1999a; Voigt et al. 2000; Wernisch et al. 2000; Pokala and Handel 2001). Finding sequences that will form a given structure often results in sequences with increased stability over the wild type (Malakauskas and Mayo 1998). An exciting potential direction for protein design lies in creating totally novel protein structures. The successful design of a family of right-handed coiled coils demonstrated the capability of computational protein design to create novel structures, and highlighted the importance of allowing for backbone flexibility in the design process (Harbury et al. 1998).
Experimental techniques for protein design have also enjoyed much success in the last decade. Rational design, based on structural analysis and site-directed mutagenesis, has been used extensively in redesigning enzymes for increased stability and/or altered function (Cedrone et al. 2000; Kazlauskas 2000). The most successful experimental methods for protein design involve directed protein evolution, using genetic recombination of natural diversity and in vitro functional assays to explore sequence space (Tobin et al. 2000; Bornscheuer and Pohl 2001). Directed protein evolution generates a diversity of functional sequences through iterations of mutation and recombination, allowing the exploration of areas of sequence space that are not accessible using rational design or random mutagenesis techniques. However, because current methods for in vitro protein evolution are limited to searching spaces in the range of 10³ to 10⁶ sequences, computational techniques for reducing the search space for experimental protein design are of great importance and current relevance (Kono and Saven 2001; Voigt et al. 2001). By computationally designing large libraries of viable sequences, favorable and unfavorable regions of sequence space could be identified and combinatorial libraries could be greatly constrained by tailoring the range of diversity allowed at each position of the protein.
Most computational studies to date have produced designed sequences that tend to resemble the native sequence of the protein structure (Koehl and Levitt 1999b, 2002a; Kuhlman and Baker 2000; Raha et al. 2000). This result has generally been attributed to the constraints imposed by using fixed backbones. Backbone flexibility in the target structure is desirable when computationally designing amino acid sequences, because it is well known that natural proteins use small backbone adjustments to accommodate disruptive mutations (Eriksson et al. 1992; Baldwin et al. 1993). Indeed, when designing sequences to a structure, one does not expect these sequences to fold to exactly the target structure with zero deviation, but rather some ensemble of highly similar structures. Incorporating backbone flexibility into computational protein design more realistically models real proteins, and is a critical prerequisite for de novo protein design, where the exact structure of the resulting protein cannot be known (Desjarlais and Handel 1999).
Some recent studies have described methods incorporating some form of backbone flexibility, with excellent success in designing sequences that stably fold to the target structure (Su and Mayo 1997; Harbury et al. 1998; Desjarlais and Handel 1999). However, due to the extreme computational demands of including backbone flexibility in the design process, previous work has been limited to coarse-grained variation of backbone structure parameters (e.g., relative arrangement of secondary or supersecondary structure element; Su and Mayo 1997; Harbury et al. 1998; or designing only a subset of residues in the target protein; Desjarlais and Handel 1999). In all cases, only a small number of minimum-energy sequences for several proteins of interest were identified. Some recent work of note (Zou and Saven 2000; Kono and Saven 2001) has developed a generally applicable statistical theory for exploring protein sequence space, analogous to other mean-field methods used in protein design (Koehl and Delarue 1994; Lee 1994; Koehl and Levitt 1999a), which does not require the explicit articulation of minimum-energy sequences. Instead, this approach estimates amino acid probabilities at each residue position, which are energetically consistent with a given protein structure. In designing sequence profiles for protein L, backbone flexibility was incorporated by considering those sequence properties that were robust with respect to 21 backbone variants in an NMR ensemble (Kono and Saven 2001).
In this study, using a distributed computing network (Shirts and Pande 2000) of over 3000 processors has allowed us to design hundreds of minimum-energy sequences per structure, with the incorporation of fine-grain backbone variability, for the set of all protein structures in the Protein Data Bank (Berman et al. 2000) of length less than 100 residues, solved by X-ray crystallography: 253 structures in total. Designing to an ensemble of slight structural variants of the target structure produces a large diversity of high-quality sequences, allowing for the exploration of a much broader range of sequence space than previous studies, and leading to novel insights into the determinants of protein sequence space.
| Results |
|---|
3000 active processors over the 62-day course of data collection), almost 200,000 distinct sequences were returned. These overall figures agree well with tests that show a protein of 100 amino acids requiring roughly 24 h for completion of one full sequence design on a 500-MHz Celeron workstation.
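As a rough consistency check on these figures, the arithmetic can be sketched as follows. The sketch assumes that every processor performs like the reference workstation and runs without interruption for the whole collection period; neither assumption is stated in the text, so the result is only an order-of-magnitude estimate.

```python
# Back-of-the-envelope throughput estimate.
# Assumptions (not from the text): all processors match the reference
# workstation and are busy for the full collection period.
processors = 3000          # active processors, as reported
days = 62                  # length of data collection, as reported
hours_per_design = 24.0    # one design of a 100-residue protein, as reported

processor_hours = processors * days * 24
estimated_designs = processor_hours / hours_per_design

print(int(estimated_designs))  # 186000
```

An estimate of about 186,000 designs is of the same order as the nearly 200,000 sequences returned; targets shorter than 100 residues would be expected to finish faster than the 24-h reference case.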
RMSD, specifically, is used throughout this study) of each variant is no more than 1 Å from the native target structure. For example, Figure 1
Increased sequence diversity with structural ensembles
To assess the amount of diversity generated by our method, the entropy of the designed sequences for each structure was calculated (Shenkin et al. 1991). Figure 2a displays the distribution of residue entropies for each position in the total set of 253 structures. The residue entropies range from 1.0 to 14.4, with a mean of 6.6. As a control, between 70 and 100 sequences were designed for the fixed native backbone of each of the 253 target structures (i.e., no structural ensembles were used). The residue entropies of these sequences range from 1.0 to 3.3, with a mean of 2.4.
Figure 2a shows that the residue entropy distribution of the sterically allowed set of rotamers is more sharply peaked and shifted higher than the residue entropy distribution of the final designed sequences. This shows the effects of the other terms in the energy function, such as hydrogen bonding and solvation, in constraining the sequence space of a protein structure. Figure 2b plots the distribution of sequence entropies (i.e., mean residue entropy over an entire sequence) for the 253 structures. The sequence entropies for the designed sequences have a more sharply peaked distribution than the overall pool of designed residue entropies, and the separation between the designed and sterically allowed distributions is even greater than in the case of residue entropy.
Previous studies have reported designed sequences retaining a high degree of similarity with the native sequence of the target structure (Koehl and Levitt 1999b; Kuhlman and Baker 2000; Raha et al. 2000). In a study on a set of 108 proteins, Kuhlman and Baker found that 51% of the core residues in designed sequences and 27% of all residues matched those found in the native sequence of the target structure. Koehl and Levitt found a 36% average identity to the native sequence over ten independent designs of 1ctf, but only a 16% average identity to native in 13 designed TIM sequences. In a study using a very slightly modified version of the design algorithm used here, Desjarlais and colleagues (Raha et al. 2000) found a 24%–28% identity to the native target structure. The results of applying our method to single, fixed, native backbones agree well with results such as these. When only the native fixed backbone is used for design (as described above), average identity to the native sequence of the target structure ranges from 1% to 40%, with a mean of 24% (Fig. 3). For buried positions, this value ranges from 0% to 75%, with an average of 43%. These distributions, both in mean and range, are strikingly similar to those produced by Kuhlman and Baker. When structural ensembles of 100 structural variants are used as design targets, the average identity of the resulting sequences to the native sequence drops to 17%, and the average pairwise identity of the sequences is 29%. The distributions of identity to the native sequence for both full sequences and core positions alone also narrow dramatically when structural ensembles are used. This suggests that the inclusion of backbone flexibility, even in the fairly simple manner used here, allows for the design of a much greater diversity of sequences compatible with the target structure.
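The two identity measures quoted above (identity to the native sequence and pairwise identity among designs) can be illustrated with a minimal sketch. It assumes that designed sequences have the same length as the native sequence and are compared position by position without alignment; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_identity_to_native(designs, native):
    """Average identity of each designed sequence to the native sequence."""
    return sum(identity(d, native) for d in designs) / len(designs)


def mean_pairwise_identity(designs):
    """Average identity over all unordered pairs of designed sequences."""
    pairs = list(combinations(designs, 2))
    return sum(identity(a, b) for a, b in pairs) / len(pairs)


# Toy data, for illustration only.
native = "ACDE"
designs = ["ACDF", "ACGH", "AKLM"]
print(mean_identity_to_native(designs, native))  # 0.5
```

Restricting the comparison to buried positions would only require filtering both sequences to those positions before calling `identity`.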
All six groups of structures, defined initially by structural similarity, corresponded to PFAM families of natural sequences (Bateman et al. 2000), which are defined solely by sequence similarity. The full alignment of natural sequences for each fold was obtained from PFAM. To reduce the inherent biases in natural sequence alignments, the alignments were reduced to 90% sequence redundancy, and were weighted according to the Henikoff algorithm (Henikoff and Henikoff 1994). These measures are critical in compensating for the artefactually low diversity of natural sequence alignments arising from the evolutionary relatedness of natural sequences (Larson et al. 2000). This weighting is unnecessary for designed sequences because each is completely independent of the others; the sampling of sequence space is not biased by any evolutionary constraints. Summary statistics for the designed and natural sequence sets for each of the six folds are tabulated in Table 2. In all cases, the designed sequence sets had greater overall sequence entropy than the natural sequence alignments. Surprisingly, there seems to be no correlation between the diversity (as measured by sequence entropy) of natural sequence alignments and the diversity of corresponding sets of designed sequences, perhaps stemming from the aforementioned sampling biases of natural sequence diversity.
| Discussion |
|---|
The use of protein folds, as opposed to individual structures, as landmarks in sequence space facilitates meaningful comparisons between experimental or computational explorations of sequence space and those regions of sequence space known to be inhabited by natural protein sequences. As computational protein design has become more tractable, a number of recent studies have sought to compare sets of designed sequences to their natural counterparts, by looking at the identity of designed sequences to the native sequence of the target structure. Instead of comparing designed sequences to the native sequence alone, it is more meaningful to make comparisons against the natural sequence alignment of structural homologs (see, e.g., Koehl and Levitt 2002b). Natural sequence alignments are a reliable, albeit small, sample of sequence space, to which we can compare larger computationally predicted samples of the same sequence space. By broadening the boundaries of sequence space to encompass larger ensembles of similar structures, meaningful comparisons to natural sequences and structures can be made, while taking into account the known plasticity of proteins.
Backbone flexibility in protein design
Incorporating backbone flexibility is of general importance to computational protein design, and is certainly a prerequisite for de novo structure design, where the exact structure of the target is not known. Although computational protein design does not seek to directly simulate a physical process, it is highly desirable to build the realistic behavior of proteins (i.e., backbone relaxation to accommodate mutations) into design algorithms. Previous studies incorporating backbone flexibility, although quite successful, have been hindered by the increased computational complexity of annealing in conformation space on top of annealing in sequence space (Su and Mayo 1997; Harbury et al. 1998; Desjarlais and Handel 1999). By utilizing a distributed computing architecture, we have been able to incorporate fine-grained backbone flexibility in a large-scale protein design effort.
Designing to a structural ensemble is a fairly simple way of incorporating backbone flexibility, but we see that it allows for a much broader search of sequence space than fixed-backbone methods. Designing to a single, fixed backbone produces results very similar to other recently published studies. Designing to a structural ensemble, however, produces a much greater diversity of sequences, and allows movement away from the region of sequence space immediately surrounding the native sequence. Homology searches against natural sequence databases (a method used by a number of recent studies to confirm relevance of their designed sequences) show that the quality of these sequences is not diminished. In fact, the increased diversity of the sequence set improves the utility of designed sequence libraries in fold recognition for structural and functional genomics (S.M. Larson, A. Garg, J.R. Desjarlais, V.S. Pande, in prep.).
Designability
The concept of designability (Li et al. 1996, 1998; Helling et al. 2001) has been proposed as an explanation for the oft-noted observation that certain protein structures or folds are more commonly seen in nature than others (Chothia 1992; Orengo et al. 1994; Murzin et al. 1995; Brenner et al. 1997). Designability is defined simply as the number of sequences that can fold into a specific structure. Numerous theoretical studies have investigated this property through complete enumeration of sequences and structures of lattice (Buchler and Goldstein 1999, and references therein) and off-lattice models (Miller et al. 2002). In this study, we can estimate designability of real protein structures by comparing the sequence entropies of large sets of diverse designed sequences for different folds (see Fig. 6; Table 2). Recall that the entropies of designed sequences for structures within a fold cluster together tightly. The range of allowed amino acids at any one position varies from structure to structure within a fold, but the overall diversity of the allowed sequence space seems to be defined by the structural properties of the fold. The relatively tight clustering of sequence entropies within a fold and the separation of sequence entropy distributions for different folds suggest (a) that the diversity of the designed sequences for a structure is primarily determined by some structural characteristics of its overall fold, and (b) that the designability principle postulated from studies of simple models may hold in real proteins.
The results of this study are, of course, based on our particular model of the protein sequence–structure relationship, and it would be of great interest to see how the results of other theoretical and/or experimental protein design studies of a similar scale might compare. Most importantly, further theoretical and experimental work is needed to identify the specific structural characteristics that determine a fold's sequence space.
| Materials and methods |
|---|
Protein sequence design
Sequences were designed using SPA (Raha et al. 2000). Briefly, protein structures are created by modeling the placement of amino acid side-chain rotamers onto a fixed target backbone. Models are scored using a combination of the Amber potential function (Weiner et al. 1984) with OPLS nonbonded parameters (Jorgensen and Tirado-Rives 1988), a surface-area term that accounts implicitly for solvation effects (Eisenberg and McLachlan 1986), and a set of amino acid baseline corrections, which are critical for maintaining reasonable amino acid compositions. The models are optimized by a sequence selection process that involves initial filtering of rotamers, and a genetic algorithm for finding an optimal sequence for the target structure. A diversity of sequences can be designed for the same target backbone, as the initial population of 300 models is randomly assigned from a filtered rotamer library, analogous to starting in a random point of sequence space. Two hundred rounds of model building and evaluation, selective recombination, and a small amount of random mutagenesis are performed, and the entire cycle is repeated 30 times.
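The genetic-algorithm stage described above can be illustrated with a toy sketch. This is not SPA: it operates on amino acid letters rather than side-chain rotamers, omits the rotamer filtering step and the 30 repeated cycles, and takes the scoring function from the caller. The names `design_sequence` and `toy_energy`, the burial pattern, the hydrophobic set, and the mutation rate are all hypothetical; only the population size (300) and number of rounds (200) come from the text.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def design_sequence(length, energy, pop_size=300, rounds=200,
                    mutation_rate=0.01, seed=None):
    """Toy genetic algorithm: keep the better half, recombine, mutate."""
    rng = random.Random(seed)
    # Random starting population, analogous to a random point in sequence space.
    population = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
                  for _ in range(pop_size)]
    for _ in range(rounds):
        population.sort(key=energy)            # lowest energy first
        parents = population[: pop_size // 2]  # selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)     # single-point recombination
            child = a[:cut] + b[cut:]
            # Small amount of random mutagenesis.
            child = "".join(rng.choice(AMINO_ACIDS)
                            if rng.random() < mutation_rate else aa
                            for aa in child)
            children.append(child)
        population = parents + children
    return min(population, key=energy)


# Hypothetical stand-in for a physical energy function: penalize
# hydrophobic residues at exposed (E) sites and polar ones at buried (B) sites.
PATTERN = "BEBEBEBE"
HYDROPHOBIC = set("ACFILMVW")


def toy_energy(seq):
    return sum((aa in HYDROPHOBIC) != (site == "B")
               for aa, site in zip(seq, PATTERN))


best = design_sequence(len(PATTERN), toy_energy, pop_size=40, rounds=20, seed=1)
print(best, toy_energy(best))
```

Because each run starts from a different random population, repeated runs with different seeds can return different low-energy sequences, which mirrors how a diversity of sequences arises for a single target backbone.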
To create an ensemble of 100 target backbones for each structure, a Monte Carlo expansion and contraction algorithm was used to gently perturb the dihedral angles of the target backbone. The algorithm works by creating random perturbations of up to 5 degrees to the dihedral angles of the target structure, followed by simple Monte Carlo with smaller random perturbations until the target RMSD from the native structure is reached. In this study, the perturbation was constrained such that no two backbones in the ensemble differ by more than 1.0 Å RMSD. Studying an ensemble of such slightly varying structures is justified by the fact that structure determination techniques, NMR and X-ray crystallography, are generally accurate to about the 1.0-Å level. Each work unit of sequence design is done against a fixed backbone (i.e., one of the 100 variants of the target structure), and the designed sequences for all 100 variants are included in the resulting overall sequence set for the target structure.
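The perturbation procedure can be sketched as follows, under strong simplifying assumptions. The deviation measure is supplied by the caller; in practice it would rebuild Cartesian coordinates from the dihedral angles and compute RMSD to the native structure. The sketch accepts only moves that bring the deviation closer to the target (a zero-temperature simplification of Monte Carlo), and the step size, tolerance, and the angular stand-in `angular_rms` are hypothetical choices made for illustration.

```python
import math
import random


def perturb_to_target(angles, deviation, target, first_kick=5.0, step=0.5,
                      tol=0.05, max_steps=10000, seed=None):
    """Perturb dihedral angles until deviation(angles) is within tol of target."""
    rng = random.Random(seed)
    # Initial random perturbation of up to `first_kick` degrees per angle.
    current = [a + rng.uniform(-first_kick, first_kick) for a in angles]
    for _ in range(max_steps):
        gap = abs(deviation(current) - target)
        if gap <= tol:
            return current
        # Smaller random perturbation of one randomly chosen angle.
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step, step)
        # Accept only if the trial is closer to the target deviation,
        # whether that means expanding or contracting.
        if abs(deviation(trial) - target) < gap:
            current = trial
    raise RuntimeError("target deviation not reached")


def angular_rms(reference):
    """Stand-in deviation: RMS angular difference in degrees, NOT Cartesian RMSD."""
    def deviation(angles):
        return math.sqrt(sum((a - r) ** 2 for a, r in zip(angles, reference))
                         / len(reference))
    return deviation


native = [-60.0] * 10 + [120.0] * 10   # toy dihedral angles
dev = angular_rms(native)
variant = perturb_to_target(native, dev, target=1.0, seed=0)
print(round(dev(variant), 2))
```

Calling `perturb_to_target` repeatedly with different seeds would yield an ensemble of variants, each of which could then serve as a fixed backbone for one work unit of sequence design.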
Structure and sequence analyses
The set of protein structures used for this study consisted of all records in the Protein Data Bank (Berman et al. 2000) that contained only one chain, less than 100 amino acids long, solved by X-ray crystallography; a total of 292 structures. A sufficient amount of data was returned to complete the described analyses for 253 of these structures. The complete set of designed sequences for each structure can be obtained at http://gah.stanford.edu/cgi-bin/results/SqlCgi.pl.
Residue entropy was calculated according to the standard formulation:
[Equation image not recovered]
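A minimal sketch of a per-position entropy calculation is given here, assuming the Shannon form over observed amino acid frequencies. The reported residue entropies (minimum 1.0, maximum 14.4) are consistent with an exponentiated entropy, that is, an effective number of amino acids per position with an upper bound of 20; this reading is an assumption, and the exact expression used in the study may differ. Function names are hypothetical.

```python
import math
from collections import Counter


def residue_entropy(column):
    """Shannon entropy (natural log) of the amino acids observed at one position."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def effective_residue_count(column):
    """Exponentiated entropy: 1.0 for a fully conserved position, at most 20."""
    return math.exp(residue_entropy(column))


def sequence_entropy(sequences, per_position=effective_residue_count):
    """Mean of the per-position measure over all positions of aligned sequences."""
    values = [per_position(column) for column in zip(*sequences)]
    return sum(values) / len(values)


# Toy data: position 1 is conserved, position 2 is split evenly between A and C.
print(sequence_entropy(["AA", "AC"]))
```

Passing `per_position=residue_entropy` gives the mean of the unexponentiated entropies instead.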
Structures were grouped into folds using VAST (Madej et al. 1995); all 253 structures were clustered into their assigned structural groupings from MMDB (Wang et al. 2000). Natural sequence alignments corresponding to the VAST structural groupings were obtained from PFAM (Bateman et al. 2000). To reduce sequence bias and increase the relative diversity of the natural sequence sets, the alignments were reduced to 90% redundancy and weighted according to the Henikoff algorithm (Henikoff and Henikoff 1994).
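The weighting step can be sketched as below, assuming that "the Henikoff algorithm" refers to position-based sequence weights, in which each alignment column distributes equal weight among the residue types present and then among the sequences sharing each type. The text does not spell out the scheme, so this is an illustration rather than the exact procedure used; gap characters are treated here as ordinary symbols, which is also an assumption.

```python
from collections import Counter


def position_based_weights(alignment):
    """Position-based sequence weights, normalized to sum to 1."""
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)
        k = len(counts)  # number of distinct residue types in this column
        for s, residue in enumerate(column):
            # Each residue type gets 1/k, shared among sequences carrying it.
            weights[s] += 1.0 / (k * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]


# Toy alignment: two identical sequences and one divergent sequence.
print(position_based_weights(["AA", "AA", "CC"]))  # [0.25, 0.25, 0.5]
```

The divergent sequence receives twice the weight of each redundant one, which is the intended effect: closely related natural sequences count less toward the estimated diversity.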
α-Helix and β-sheet character for each structure was defined as the fraction of residues assigned to the corresponding secondary structure by DSSP (Kabsch and Sander 1983). DSSP was also used to automate the identification of buried residues (i.e., less than 10% exposed side-chain surface area). Contact order was calculated as described by Plaxco and colleagues (1998).
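Two of these per-structure quantities can be sketched briefly. Relative contact order follows the definition of Plaxco et al. (1998): the mean sequence separation of contacting residue pairs divided by chain length. The contact list is taken as input because the contact criterion is not restated here, and the exposure values are assumed to be fractional side-chain exposures already derived from DSSP output; function names and the toy inputs are hypothetical.

```python
def relative_contact_order(contacts, length):
    """Mean sequence separation of contacting pairs, divided by chain length."""
    if not contacts:
        raise ValueError("at least one contact is required")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))


def buried_positions(exposed_fraction, cutoff=0.10):
    """Indices of residues with less than `cutoff` exposed side-chain area."""
    return [i for i, f in enumerate(exposed_fraction) if f < cutoff]


# Toy inputs, for illustration only.
print(relative_contact_order([(1, 5), (2, 10)], length=10))  # 0.6
print(buried_positions([0.02, 0.45, 0.09, 0.10]))            # [0, 2]
```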
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Baldwin, E.P., Hajiseyedjavadi, O., Baase, W.A., and Matthews, B.W. 1993. The role of backbone flexibility in the accommodation of variants that repack the core of T4 lysozyme. Science 262: 1715–1718.
Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Howe, K.L., and Sonnhammer, E.L. 2000. The Pfam protein families database. Nucleic Acids Res. 28: 263–266.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.
Bornscheuer, U.T. and Pohl, M. 2001. Improved biocatalysts by directed evolution and rational protein design. Curr. Opin. Chem. Biol. 5: 137–143.
Brenner, S.E., Chothia, C., and Hubbard, T.J. 1997. Population statistics of protein structures: Lessons from structural classifications. Curr. Opin. Struct. Biol. 7: 369–376.
Buchler, N.E. and Goldstein, R.A. 1999. Effect of alphabet size and foldability requirements on protein structure designability. Proteins 34: 113–124.
Cedrone, F., Menez, A., and Quemeneur, E. 2000. Tailoring new enzyme functions by rational redesign. Curr. Opin. Struct. Biol. 10: 405–410.
Chothia, C. 1992. Proteins. One thousand families for the molecular biologist. Nature 357: 543–544.
Dahiyat, B.I. 1999. In silico design for protein stabilization. Curr. Opin. Biotechnol. 10: 387–390.
Desjarlais, J.R. and Clarke, N.D. 1998. Computer search algorithms in protein modification and design. Curr. Opin. Struct. Biol. 8: 471–475.
Desjarlais, J.R. and Handel, T.M. 1995. De novo design of the hydrophobic cores of proteins. Protein Sci. 4: 2006–2018.
Desjarlais, J.R. and Handel, T.M. 1999. Side-chain and backbone flexibility in protein core design. J. Mol. Biol. 290: 305–318.
Eisenberg, D. and McLachlan, A.D. 1986. Solvation energy in protein folding and binding. Nature 319: 199–203.
Eriksson, A.E., Baase, W.A., Zhang, X.J., Heinz, D.W., Blaber, M., Baldwin, E.P., and Matthews, B.W. 1992. Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science 255: 178–183.
Harbury, P.B., Plecs, J.J., Tidor, B., Alber, T., and Kim, P.S. 1998. High-resolution protein design with backbone freedom. Science 282: 1462–1467.
Helling, R., Li, H., Melin, R., Miller, J., Wingreen, N., Zeng, C., and Tang, C. 2001. The designability of protein structures. J. Mol. Graph. Model. 19: 157–167.
Henikoff, S. and Henikoff, J.G. 1994. Protein family classification based on searching a database of blocks. Genomics 19: 97–107.
Jorgensen, W.L. and Tirado-Rives, J. 1988. The OPLS potential functions for proteins. Energy minimizations for crystals of cyclic peptides and crambin. J. Am. Chem. Soc. 110: 1657–1666.
Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.
Kazlauskas, R.J. 2000. Molecular modeling and biocatalysis: Explanations, predictions, limitations, and opportunities. Curr. Opin. Chem. Biol. 4: 81–88.
Koehl, P. and Delarue, M. 1994. Application of a self-consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy. J. Mol. Biol. 239: 249–275.
Koehl, P. and Levitt, M. 1999a. De novo protein design. I. In search of stability and specificity. J. Mol. Biol. 293: 1161–1181.
Koehl, P. and Levitt, M. 1999b. De novo protein design. II. Plasticity in sequence space. J. Mol. Biol. 293: 1183–1193.
Koehl, P. and Levitt, M. 2002a. Improved recognition of native-like protein structures using a family of designed sequences. Proc. Natl. Acad. Sci. 99: 691–696.
Koehl, P. and Levitt, M. 2002b. Protein topology and stability define the space of allowed sequences. Proc. Natl. Acad. Sci. 99: 1280–1285.
Kono, H. and Saven, J.G. 2001. Statistical theory for protein combinatorial libraries. Packing interactions, backbone flexibility, and the sequence variability of a main-chain structure. J. Mol. Biol. 306: 607–628.
Kuhlman, B. and Baker, D. 2000. Native protein sequences are close to optimal for their structures. Proc. Natl. Acad. Sci. 97: 10383–10388.
Larson, S.M., Di Nardo, A.A., and Davidson, A.R. 2000. Analysis of covariation in an SH3 domain sequence alignment: Applications in tertiary contact prediction and the design of compensating hydrophobic core substitutions. J. Mol. Biol. 303: 433–446.
Lee, C. 1994. Predicting protein mutant energetics by self-consistent ensemble optimization. J. Mol. Biol. 236: 918–939.
Li, H., Helling, R., Tang, C., and Wingreen, N. 1996. Emergence of preferred structures in a simple model of protein folding. Science 273: 666–669.
Li, H., Tang, C., and Wingreen, N.S. 1998. Are protein folds atypical? Proc. Natl. Acad. Sci. 95: 4987–4990.
Madej, T., Gibrat, J.F., and Bryant, S.H. 1995. Threading a database of protein cores. Proteins 23: 356–369.
Malakauskas, S.M. and Mayo, S.L. 1998. Design, structure and stability of a hyperthermophilic protein variant. Nat. Struct. Biol. 5: 470–475.
Miller, J., Zeng, C., Wingreen, N.S., and Tang, C. 2002. Emergence of highly designable protein-backbone conformations in an off-lattice model. Proteins 47: 506–512.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.
Musacchio, A., Saraste, M., and Wilmanns, M. 1994. High-resolution crystal structures of tyrosine kinase SH3 domains complexed with proline-rich peptides. Nat. Struct. Biol. 1: 546–551.
Orengo, C.A., Jones, D.T., and Thornton, J.M. 1994. Protein superfamilies and domain superfolds. Nature 372: 631–634.
Pabo, C. 1983. Molecular technology. Designing proteins and peptides. Nature 301: 200.
Pande, V.S., Grosberg, A.Y., and Tanaka, T. 1997. Statistical mechanics of simple models of protein folding and design. Biophys. J. 73: 3192–3210.
Plaxco, K.W., Simons, K.T., and Baker, D. 1998. Contact order, transition state placement and the refolding rates of single domain proteins. J. Mol. Biol. 277: 985–994.
Pokala, N. and Handel, T.M. 2001. Review: Protein design – Where we were, where we are, where we're going. J. Struct. Biol. 134: 269–281.
Raha, K., Wollacott, A.M., Italia, M.J., and Desjarlais, J.R. 2000. Prediction of amino acid sequence from structure. Protein Sci. 9: 1106–1119.
Shakhnovich, E.I. 1998. Protein design: A perspective from simple tractable models. Fold. Des. 3: R45–R58.
Shenkin, P.S., Erman, B., and Mastrandrea, L.D. 1991. Information-theoretical entropy as a measure of sequence variability. Proteins 11: 297–313.
Shirts, M. and Pande, V.S. 2000. COMPUTING: Screen savers of the world unite! Science 290: 1903–1904.
Su, A. and Mayo, S.L. 1997. Coupling backbone flexibility and amino acid sequence selection in protein design. Protein Sci. 6: 1701–1707.
Tobin, M.B., Gustafsson, C., and Huisman, G.W. 2000. Directed evolution: The "rational" basis for "irrational" design. Curr. Opin. Struct. Biol. 10: 421–427.
Voigt, C.A., Gordon, D.B., and Mayo, S.L. 2000. Trading accuracy for speed: A quantitative comparison of search algorithms in protein sequence design. J. Mol. Biol. 299: 789–803.
Voigt, C.A., Mayo, S.L., Arnold, F.H., and Wang, Z.G. 2001. Computational method to reduce the search space for directed protein evolution. Proc. Natl. Acad. Sci. 98: 3778–3783.
Wang, Y., Addess, K.J., Geer, L., Madej, T., Marchler-Bauer, A., Zimmerman, D., and Bryant, S.H. 2000. MMDB: 3D structure data in Entrez. Nucleic Acids Res. 28: 243–245.
Weiner, S.J., Kollman, P.A., Case, D.A., Singh, U.C., Ghio, C., Alagona, G., Profeta, S., and Weiner, P. 1984. A new force field for molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc. 106: 765–784.
Wernisch, L., Hery, S., and Wodak, S.J. 2000. Automatic protein design with all atom force-fields by exact and heuristic optimization. J. Mol. Biol. 301: 713–736.
Zou, J. and Saven, J.G. 2000. Statistical theory of combinatorial libraries of folding proteins: Energetic discrimination of a target structure. J. Mol. Biol. 296: 281–294.