1 T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
2 Laboratory of Chemical Physics, NIDDK, National Institutes of Health, Bethesda, Maryland 20892, USA
(RECEIVED May 6, 2007; FINAL REVISION May 14, 2007; ACCEPTED May 15, 2007)
| Abstract |
|---|
Keywords: protein structure/folding; NMR spectroscopy; computational analysis
| Introduction |
|---|
Approximate backbone torsion angles are the only experimental data required by these previous exercises (Gong and Rose 2005; Gong et al. 2005; Fleming et al. 2006), and distributions of backbone torsions can be obtained readily from NMR chemical shifts (Cornilescu et al. 1999). Combining these approaches, we present a new, three-stage algorithm to build native topology from chemical shifts alone.
In the first stage, experimentally determined chemical shifts are used to search the protein database (Berman et al. 2000) for backbone fragments with similar chemical shifts. In the second stage, these backbone fragments are "stitched" together in a self-consistent manner, using Monte Carlo simulation (Gong et al. 2005). In the third stage, side chains are added to the lowest energy backbone conformation, using a side-chain rotamer library (Canutescu et al. 2003) followed by conjugate gradient energy minimization (Phillips et al. 2005). Results are presented for calbindin, CspA, GB3, ubiquitin, and DinI, five proteins with uncomplicated topologies (Table 1). This three-stage algorithm, which was motivated by ideas about the mechanism of protein folding (Rose et al. 2006), holds promise for further development as a high-throughput method for NMR structure determination.
| Results |
|---|
Owing to the presence of occasional confounding outliers in Figure 2, the positive correlation between energy and backbone RMSD is not quite sufficient to identify the most stable conformation. Such outliers are an unavoidable consequence of the highly approximate energy function used here. However, this problem can be overcome using structural clustering to recognize and retain all large conformational clusters while eliminating any sparsely distributed, uncorrelated conformations.
The major conformational clusters for all proteins are listed in Table 1. In each case, the lowest energy structures and the native structure are subsumed within the largest cluster (Table 1).
For each protein, the representative structural cluster is selected by total simulation energy. Of note, the hydrogen bond potential alone is sufficient to discriminate among them, underscoring the guiding importance of hydrogen bond satisfaction in protein folding (Myers and Pace 1996; Fleming and Rose 2005; Street et al. 2006).
The selected clusters are shown in Table 1. All are native-like, with low mean backbone RMSD (<4.2 Å) from the native conformation. Within each, the single most stable conformation was chosen as the final structural model. For the five proteins, the average backbone RMSD from the native conformation is 3.6 Å. Stereoviews of these chosen models are shown superimposed on their native counterparts in Figure 3, visual corroboration that our protocol successfully identifies the native topology. These backbone models were then decorated with side chains, starting with a backbone-dependent side-chain rotamer library (Canutescu et al. 2003) followed by energy minimization (Phillips et al. 2005), resulting in an average all-atom RMSD from the native conformation of 4.1 Å for the five proteins (Table 1).
| Discussion |
|---|
More recently, residual dipolar couplings (RDC) have been used to facilitate rapid structure determination by NMR. RDCs depend on the orientation of internuclear vectors with respect to a global axis system, and they can be measured in weakly aligned proteins dissolved in anisotropic media. The acquisition of RDCs requires comparatively little additional data collection time beyond backbone resonance assignments but provides considerable additional structural information (Prestegard et al. 2000; Tolman et al. 2001). Several structure determination methods based on RDC-restrained models have been developed (Delaglio et al. 2000; Andrec et al. 2001; Rohl and Baker 2002; Kontaxis et al. 2005; Mayer et al. 2006). However, methods that rely primarily on RDC restraints require multiple data sets collected in different media to overcome restraint degeneracy. Alternatively, chemical shifts and NOEs can be used to supplement RDC restraints for full structure determination (Mayer et al. 2006). Yet another approach involves post-processing database analysis to distinguish between true and false positives (Andrec et al. 2001).
In related work, Baker and colleagues extended their Rosetta programs to identify the native fold using RDCs together with chemical shifts (Rohl and Baker 2002). They concluded that the addition of RDC data helps to filter out false positives in fragment selection. Our simulations, which rely solely on data from chemical shifts, seek instead to filter false positives during fragment assembly by using a simple potential function followed by clustering.
Previous work demonstrated the strong correlation between chemical shifts and protein backbone conformation (Cornilescu et al. 1999). Markley and coworkers developed PECAN (http://bija.nmrfam.wisc.edu/PECAN/), a program for conformational analysis from chemical shifts that emphasizes secondary structure prediction (Eghbalnia et al. 2005). Wishart and coworkers developed ShiftX and related programs (http://redpoll.pharmacy.ualberta.ca/shiftx/) that focus on the converse problem of predicting chemical shifts from known protein structure (Neal et al. 2003).
Extending these earlier ideas, the five examples presented here raise the possibility that chemical shifts may be sufficient to decipher native protein backbone conformation, at least for proteins with uncomplicated topologies. Our approach affords further possibilities as well. For example, recent work of Avbelj et al. (2004) has potential for use as a fragment filter by incorporating the under-realized relationship between chemical shifts and solvent accessibility. In sum, the algorithm described here represents a promising, open-ended approach to rapid and high-throughput protein structure determination by measuring backbone chemical shifts and running short simulations.
| Materials and Methods |
|---|
Stage I: Fragment library construction and fragment searching
A fragment library was constructed using 5665 protein chains from the PISCES server (Wang and Dunbrack Jr. 2003), all with sequence identity <40%, resolution <2.5 Å, and an R factor of 1.0 or better. Chains were split into consecutively overlapping six-residue fragments, and the chemical shifts of each fragment were calculated using the SPARTA program (Shen and Bax 2007), available from http://spin.niddk.nih.gov/bax. Chemical shifts of target proteins were downloaded from the BioMagResBank (BMRB) server (http://www.bmrb.wisc.edu). Suitable candidates for fragment substitution were identified by comparing experimentally determined chemical shifts of a target protein against calculated chemical shifts of library fragments, after first eliminating any fragments from proteins having the same topology as the target protein. For every consecutive six-residue segment in the target protein, the 20 most similar fragments were selected from the library. Fragment similarity was scored based on both chemical shifts and primary structure.
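The candidate selection described above can be illustrated with a minimal sketch. It is not the scoring function used in this work: it assumes a single chemical shift value per residue, uses a plain root-mean-square difference as the similarity measure, and omits the primary-structure term entirely. The names `shift_distance` and `select_candidates` and the `(fragment_id, shifts)` library layout are hypothetical.

```python
import heapq
import math


def shift_distance(target_shifts, fragment_shifts):
    """RMS difference (ppm) between two equal-length lists of chemical shifts.

    Assumption: one shift value per residue. A real score would combine
    several nuclei and a sequence-similarity term.
    """
    if len(target_shifts) != len(fragment_shifts):
        raise ValueError("shift lists must have the same length")
    squared = sum((t - f) ** 2 for t, f in zip(target_shifts, fragment_shifts))
    return math.sqrt(squared / len(target_shifts))


def select_candidates(target_shifts, library, window=6, top_k=20):
    """Return, for every consecutive `window`-residue segment of the target,
    the ids of the `top_k` library fragments with the smallest shift distance.

    `library` is a list of (fragment_id, shifts) pairs, each with `window` shifts.
    """
    candidates = []
    for start in range(len(target_shifts) - window + 1):
        segment = target_shifts[start:start + window]
        best = heapq.nsmallest(
            top_k, library, key=lambda frag: shift_distance(segment, frag[1])
        )
        candidates.append([frag_id for frag_id, _ in best])
    return candidates
```

A chain of length n yields n – 5 candidate lists with the default six-residue window, one per overlapping segment.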
Stage II: Fragment assembly Monte Carlo simulation
Simulations were initiated with the protein backbone in an extended conformation; all side-chain atoms beyond beta carbons were discarded. Standard van der Waals radii and hydrogen-bond criteria were used, as described in Gong et al. (2005). Similar to that earlier protocol (Gong et al. 2005), 50,000 cycles of Metropolis Monte Carlo simulation (Metropolis et al. 1953) were performed, preceded by 5000 relaxation cycles; each cycle consisted of n – 5 steps for a chain of length n. At each step, a randomly chosen six-residue segment of the target peptide was replaced by a randomly chosen library fragment from the list of 20 candidates. Simulated annealing was introduced into the simulation by systematically incrementing $\beta$ in the Metropolis criterion, $e^{-\beta \Delta E}$, over the range [0.5–4.0], where $\beta = 1/RT \approx 2$ at 300 K. Typical processing times are 2–3 h on a desktop computer for each simulation.
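As a rough illustration of the annealing scheme, the sketch below applies the Metropolis acceptance test while raising the inverse temperature from 0.5 to 4.0. The linear schedule is an assumption, since the text states only that the parameter was incremented systematically. The function names and the generic `energy` and `propose` callables are hypothetical stand-ins for the fragment-replacement move and energy function of the actual protocol.

```python
import math
import random


def beta_schedule(n_cycles, beta_min=0.5, beta_max=4.0):
    """Inverse temperatures rising linearly from beta_min to beta_max.

    Assumption: a linear ramp; the source does not specify the increment rule.
    """
    if n_cycles == 1:
        return [beta_max]
    step = (beta_max - beta_min) / (n_cycles - 1)
    return [beta_min + i * step for i in range(n_cycles)]


def metropolis_accept(delta_e, beta, rng=random):
    """Accept downhill moves always, uphill moves with probability exp(-beta * dE)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-beta * delta_e)


def anneal(state, energy, propose, n_cycles, steps_per_cycle, rng=random):
    """Run Metropolis Monte Carlo with one beta value per cycle.

    Returns the lowest-energy state visited and its energy.
    """
    current_e = energy(state)
    best_state, best_e = state, current_e
    for beta in beta_schedule(n_cycles):
        for _ in range(steps_per_cycle):
            trial = propose(state, rng)
            trial_e = energy(trial)
            if metropolis_accept(trial_e - current_e, beta, rng):
                state, current_e = trial, trial_e
                if current_e < best_e:
                    best_state, best_e = state, current_e
    return best_state, best_e
```

In the protocol described here, `steps_per_cycle` would correspond to n – 5 for a chain of length n, and `propose` would swap one six-residue segment for a fragment drawn from its candidate list.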
The Metropolis criterion was applied using an energy function with four simple terms: (1) steric exclusion ($E_{\mathrm{soft\_debump}}$), (2) hydrogen bonding ($E_{\mathrm{HB}}$), (3) global compaction ($E_{\mathrm{confine}}$), and (4) contact energy ($E_{\mathrm{contact}}$). The first three terms are identical to those described in our earlier protocol (Gong et al. 2005). Here, a small additional contact energy term, $E_{\mathrm{contact}}$, was introduced to bias structures toward forming an interior hydrophobic core by assigning one of four discrete values (0.5, –0.25, –0.5, –1.0) to pairwise spatial neighbors based on their polarity (polar:apolar, small apolar:small apolar, small apolar:large apolar, and large apolar:large apolar, respectively). As was the case previously (Gong et al. 2005), hydrogen bonding remains the dominant term in the simulation energy.
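The four discrete contact values can be written as a small lookup table, sketched below. Several details are assumptions that the text does not state: polar:polar pairs are given zero energy, the assignment of residues to the three polarity classes is left to the caller, and the list of spatial neighbor pairs is taken as given rather than derived from a distance cutoff. The names `CONTACT_ENERGY`, `pair_energy`, and `contact_energy` are hypothetical.

```python
# Pairwise contact values from the text, keyed by the unordered pair of classes.
# A frozenset with one element represents a pair of identical classes.
CONTACT_ENERGY = {
    frozenset(["polar", "small_apolar"]): 0.5,
    frozenset(["polar", "large_apolar"]): 0.5,
    frozenset(["small_apolar"]): -0.25,
    frozenset(["small_apolar", "large_apolar"]): -0.5,
    frozenset(["large_apolar"]): -1.0,
}


def pair_energy(class_a, class_b):
    """Contact value for one residue pair.

    Assumption: pairs not listed in the text (polar:polar) contribute 0.0.
    """
    return CONTACT_ENERGY.get(frozenset([class_a, class_b]), 0.0)


def contact_energy(classes, neighbor_pairs):
    """Sum contact values over all (i, j) index pairs that are spatial neighbors.

    `classes[i]` is the polarity class of residue i; how neighbors are
    identified is outside the scope of this sketch.
    """
    return sum(pair_energy(classes[i], classes[j]) for i, j in neighbor_pairs)
```

Keying on an unordered pair makes the lookup symmetric, so the order in which two residues are listed does not matter.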
Post-simulation processing
This protocol was applied to each target protein in 400 independent simulations. Conformations with the lowest simulation energy from every simulation were collected and used to construct a representative ensemble. Members of this ensemble were then clustered by structure (viz., an α-carbon distance matrix) using Pycluster (de Hoon et al. 2004). Clusters with a Pearson correlation coefficient >95% and spanning >5% of the ensemble were retained for further analysis, and their energy distribution and RMSD from the native conformation were calculated.
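The ingredients of this clustering step can be sketched in plain Python. This is not the Pycluster workflow itself: it shows only how an α-carbon distance matrix might be flattened into a vector, how two such vectors can be compared by Pearson correlation, and how clusters spanning more than 5% of the ensemble can be retained once cluster labels are available. All function names are hypothetical, and the clustering algorithm that produces the labels is not shown.

```python
import math
from collections import Counter


def distance_vector(ca_coords):
    """Upper triangle of the pairwise alpha-carbon distance matrix, flattened."""
    n = len(ca_coords)
    return [
        math.dist(ca_coords[i], ca_coords[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors.

    Raises ZeroDivisionError if either vector has zero variance.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def large_clusters(labels, min_fraction=0.05):
    """Labels of clusters holding more than `min_fraction` of the ensemble."""
    cutoff = min_fraction * len(labels)
    return {label for label, count in Counter(labels).items() if count > cutoff}
```

Because the distance vector is invariant to rotation and translation, conformations can be compared this way without first superimposing them.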
Side-chain decoration
SCWRL3.0 (Canutescu et al. 2003) was used to add side chains to the lowest energy backbone conformation within the most stable cluster, as specified by the amino acid sequence. Steric clashes, an unavoidable byproduct of SCWRL side-chain decoration, were relieved by 1000 steps of side-chain torsional angle conjugate gradient minimization, using a soft-sphere potential. The model was further optimized by an additional 1000 steps of conjugate gradient minimization using the program NAMD-2.6 (http://www.ks.uiuc.edu/Research/namd) (Phillips et al. 2005), with parameters from the CHARMM22 all-hydrogen force field.
| Footnotes |
|---|
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.072988407.
| Acknowledgments |
|---|
| References |
|---|
Avbelj, F., Kocjan, D., and Baldwin, R.L. 2004. Protein chemical shifts arising from α-helices and β-sheets depend on solvent exposure. Proc. Natl. Acad. Sci. 101: 17394–17397.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.
Canutescu, A.A., Shelenkov, A.A., and Dunbrack Jr, R.L. 2003. A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci. 12: 2001–2014.
Cornilescu, G., Delaglio, F., and Bax, A. 1999. Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J. Biomol. NMR 13: 289–302.
de Hoon, M.J., Imoto, S., Nolan, J., and Miyano, S. 2004. Open source clustering software. Bioinformatics 20: 1453–1454.
Delaglio, F., Kontaxis, G., and Bax, A. 2000. Protein structure determination using molecular fragment replacement and NMR dipolar couplings. J. Am. Chem. Soc. 122: 2142–2143.
Dill, K.A. 1985. Theory for the folding and stability of globular proteins. Biochemistry 24: 1501–1509.
Eghbalnia, H.R., Wang, L., Bahrami, A., Assadi, A., and Markley, J.L. 2005. Protein energetic conformational analysis from NMR chemical shifts (PECAN) and its use in determining secondary structural elements. J. Biomol. NMR 32: 71–81.
Fleming, P.J. and Rose, G.D. 2005. Do all backbone polar groups in proteins form hydrogen bonds? Protein Sci. 14: 1911–1917.
Fleming, P.J., Gong, H., and Rose, G.D. 2006. Secondary structure determines protein topology. Protein Sci. 15: 1829–1834.
Gong, H. and Rose, G.D. 2005. Does secondary structure determine tertiary structure in proteins? Proteins 61: 338–343.
Gong, H., Fleming, P.J., and Rose, G.D. 2005. Building native protein conformation from highly approximate backbone torsion angles. Proc. Natl. Acad. Sci. 102: 16227–16232.
Kamat, A.P. and Lesk, A.M. 2007. Contact patterns between helices and strands of sheet define protein folding patterns. Proteins 66: 869–876.
Kauzmann, W. 1959. Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14: 1–63.
Kontaxis, G., Delaglio, F., and Bax, A. 2005. Molecular fragment replacement approach to protein structure determination by chemical shift and dipolar homology database mining. Methods Enzymol. 394: 42–78.
Mayer, K.L., Qu, Y., Bansal, S., LeBlond, P.D., Jenney Jr, F.E., Brereton, P.S., Adams, M.W., Xu, Y., and Prestegard, J.H. 2006. Structure determination of a new protein from backbone-centered NMR data and NMR-assisted structure prediction. Proteins 65: 480–489.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21: 1087–1092.
Montelione, G.T., Zheng, D., Huang, Y.J., Gunsalus, K.C., and Szyperski, T. 2000. Protein NMR spectroscopy in structural genomics. Nat. Struct. Biol. 7 (Suppl.): 982–985.
Moseley, H.N. and Montelione, G.T. 1999. Automated analysis of NMR assignments and structures for proteins. Curr. Opin. Struct. Biol. 9: 635–642.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.
Myers, J.K. and Pace, C.N. 1996. Hydrogen bonding stabilizes globular proteins. Biophys. J. 71: 2033–2039.
Neal, S., Nip, A.M., Zhang, H., and Wishart, D.S. 2003. Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts. J. Biomol. NMR 26: 215–240.
Orengo, C.A., Michie, A.D., Jones, D.T., Swindells, M.B., and Thornton, J.M. 1997. CATH—A hierarchic classification of protein domain structures. Structure 5: 1093–1108.
Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kale, L., and Schulten, K. 2005. Scalable molecular dynamics with NAMD. J. Comput. Chem. 26: 1781–1802.
Prestegard, J.H., al-Hashimi, H.M., and Tolman, J.R. 2000. NMR structures of biomolecules using field oriented media and residual dipolar couplings. Q. Rev. Biophys. 33: 371–424.
Rohl, C.A. and Baker, D. 2002. De novo determination of protein backbone structure from residual dipolar couplings using Rosetta. J. Am. Chem. Soc. 124: 2723–2729.
Rose, G.D., Fleming, P.J., Banavar, J.R., and Maritan, A. 2006. A backbone-based theory of protein folding. Proc. Natl. Acad. Sci. 103: 16623–16633.
Shen, Y. and Bax, A. 2007. Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology. J. Biomol. NMR (in press).
Street, T.O., Bolen, D.W., and Rose, G.D. 2006. A molecular mechanism for osmolyte-induced protein stability. Proc. Natl. Acad. Sci. 103: 13997–14002.
Tolman, J.R., Al-Hashimi, H.M., Kay, L.E., and Prestegard, J.H. 2001. Structural and dynamic analysis of residual dipolar coupling data for proteins. J. Am. Chem. Soc. 123: 1416–1424.
Wang, G. and Dunbrack Jr, R.L. 2003. PISCES: A protein sequence culling server. Bioinformatics 19: 1589–1591.