1 Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health (NIH), Bethesda, Maryland 20892, USA
2 Biological Sciences Program, Institute for Physical Science and Technology and Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, USA
Reprint requests to: D. Thirumalai, Biological Sciences Program, Institute for Physical Science and Technology and Department of Chemistry and Biochemistry, University of Maryland, College Park, MD 20742, USA; e-mail: thirum{at}glue.umd.edu; fax: (301) 314-9404.
(RECEIVED June 14, 2004; FINAL REVISION September 7, 2004; ACCEPTED September 8, 2004)
| Abstract |
|---|
NB
4. A limited analysis of the predicted binding sequences shows that they do not adopt any preferred secondary structure. Our method also predicts the putative binding regions in the identified SPs. The results of our study show that a variety of SPs, associated with diverse functions, can interact with GroEL.
Keywords: chaperonins; protein recognition; E. coli; yeast genomes
Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04933205.
| Introduction |
|---|
Although GroEL interacts with a vast array of proteins, including random polypeptide sequences, an important unresolved question is "What are the natural substrates in a genome that require the chaperonin machinery to reach their functionally competent states?" Under in vivo conditions, the answer depends on a number of factors, including the growth conditions of E. coli. Based on reasonable assumptions about the amounts of GroEL and GroES in E. coli and on estimates of the rates of assisted folding in vitro, it has been argued (Lorimer 1996) that only ~2% to 5% of the proteins can afford to recruit the GroEL machinery. Here, we identify the set of proteins in five genomes, including E. coli and Saccharomyces cerevisiae, that are capable of forming relatively stable complexes with chaperonins. Using the binding patterns of proteins that are known to form complexes with GroEL, we show that a large number of E. coli proteins can interact with GroEL in the same manner as GroES or the strongly binding peptide (SBP) (Chen and Sigler 1999). Our method also identifies the putative stretches in each protein sequence that are responsible for this interaction. These binding patterns have diverse secondary structures, even when found within the same protein.
| Hypothesis |
|---|
β-hairpin conformation adopted by the GroEL-bound GroES mobile-loop peptide and the SBP, as well as their sequence similarities, it has been suggested that peptides that interact strongly with GroEL belong to the co-chaperonin class (Shewmaker et al. 2001).
2) patterns of residues similar to those found in the mobile loop of GroES or in the SBP. The presence of such patterns ensures that these proteins can be accommodated in the groove between helices H and I. The requirement that natural SPs have at least two such patterns in a sequence guarantees that the SP can interact with multiple GroEL-binding sites simultaneously; stringent substrates apparently interact with at least three consecutive subunits (Farr et al. 2000). The stability of the GroEL-SP complex is determined not only by binding to multiple sites, but also by specific sequence-dependent interactions between the SP and the apical domain: strong binding to a few sites can yield a more stable GroEL-SP complex than weak binding to several sites. Interaction with multiple subunits imparts stability to the GroEL-SP complex and ensures a dynamic role for GroEL-assisted folding. Using structural complementarity to the residues in the binding pocket of GroEL, we identify protein sequences that exhibit the same complementarity to the GroEL-binding site as GroES and SBP. Among this set of putative GroEL substrates, we find several proteins that have been shown to interact strongly with GroEL in vitro (Houry et al. 1999), suggesting that other GroEL substrates might share the binding features of GroES. Because no structures of SP-GroEL complexes are available, it is difficult to estimate the typical number of contacts, NC, that natural SPs make with the apical domain. We therefore also counted the potential SPs while varying NC and the number of distinct GroEL interaction sites, NB (see Materials and Methods for details). By varying NC between four and five, we identify most of the currently known SPs that recognize GroEL. This minimal hypothesis allows us to identify a vast number of potential natural SPs for chaperonins. The computational details are given in Materials and Methods.
| Putative E. coli substrate proteins |
|---|
64%) contain at least one binding pattern that exactly matches the one in GroES (Table 1).
| Comparison between predictions and experiments |
|---|
The preceding comparisons were made by searching for the exact GroES pattern, which was determined by noting that the mobile loop makes six contacts with helices H and I of the apical domain (see Materials and Methods). As explained in Materials and Methods, the typical number of contacts (NC) that a polypeptide chain makes with GroEL during the capture process is not known. The value of NC is likely to depend on the SP, and even for a given SP, NC is likely to fluctuate. To generalize our method, we counted the matching sequences as a function of NC and NB (see Materials and Methods). If NC is too small, virtually every protein would be identified as a natural SP; however, small NC would lead to a highly unstable SP-GroEL complex. On the other hand, if NC is large, as in the case of the SBP, which makes eight contacts with the apical domain, then very few proteins would qualify as natural SPs (see Table 1), and those that did would form hyperstable GroEL-SP complexes from which unbinding of the SP is improbable. Thus, there should be a range of NC values that gives rise to a stable, but not hyperstable, SP-GroEL complex. In terms of the variables (NC, NB), we find that as long as 4 < NC < 6, interaction with about two to four binding sites suffices to identify the expected third of the E. coli proteins as natural SPs. From Figure 2, it follows that more than 80% of the 52 proteins identified by Houry et al. (1999) can be predicted using the two-dimensional map, as long as NC and NB are in the range given above.
| S. cerevisiae and E. coli genomes have similar percentages of substrate proteins that interact with chaperonins |
|---|
| Putative binding regions do not have a preferred secondary structure |
|---|
α-helical, but 776-784 is a strand. Of the two threonine synthase binding regions, segment 27-35 contains a short strand flanked by two coil regions, whereas segment 137-145 is predicted to form an α-helix. Similar variations occur among the four binding regions of the UDP-glucose lipid carrier transferase: 97-105 and 121-129 are α-helical, 151-159 includes a coil region and a helix, and 207-215 forms a β-strand and a coil region. These anecdotal examples support the experimental studies of Yoshida and coworkers (Aoki et al. 2000), who found that "random" sequences with no preference for any specific secondary structure can bind to GroEL.
| The eubacterium Ureaplasma urealyticum contains putative GroEL substrates |
|---|
| Thermophilic and hyperthermophilic bacteria contain different percentages of putative GroEL substrates |
|---|
| A set of membrane proteins contains GroES-like binding patterns |
|---|
| Assessing the sensitivity of results to variations in binding patterns |
|---|
We also considered the consequences of varying the number of contiguous hydrophobic residues in our search pattern. Such a pattern could define an SP-binding pattern in view of the importance of hydrophobic interactions for SP recognition by GroEL. The number of sequence matches as a function of the length of the hydrophobic stretch (Fig. 3) shows that short hydrophobic patterns (≤4 residues) are present in most protein sequences. The number of sequences that contain a continuous stretch of six hydrophobic residues, 2488, is close to the number that contain the GroES-binding pattern. The similarity between the results for these two different patterns is also observed in the number of sequences that contain each pattern multiple times. It follows that polypeptide chains containing a contiguous stretch of a minimum number (between five and six) of hydrophobic residues (Fig. 3) can interact favorably with GroEL. To assess whether this conclusion is specific to natural sequences, we generated a database of random sequences, each 314 residues long (the average E. coli sequence length). The sequences are composed of the four residue types, with probabilities of occurrence equal to those in E. coli (Materials and Methods). Analysis of this database shows that the number of random sequences containing a continuous stretch of five or six hydrophobic residues is nearly the same as in E. coli (White and Jacobs 1990). We therefore conclude that GroEL can form complexes with random sequences as long as they possess a continuous stretch of at least five hydrophobic amino acid residues.
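The random-sequence control described above can be sketched as follows. This is a minimal illustration, assuming that residue classes are drawn independently with the E. coli class frequencies quoted in Materials and Methods; the database size (1000 sequences), the seed, and all function names are hypothetical choices, not the authors' code. Because the printed E. coli frequencies sum to 1.05, they are used here only as relative weights.

```python
import random
from itertools import groupby

# E. coli class frequencies as quoted in Materials and Methods. As printed they
# sum to 1.05, so they serve only as relative weights (random.choices treats
# weights as relative, which effectively renormalizes them).
ECOLI_CLASS_FREQ = {"H": 0.46, "P": 0.38, "+": 0.10, "-": 0.11}


def random_class_sequence(length, freqs, rng):
    """Draw `length` residue classes independently, weighted by `freqs`."""
    classes = list(freqs)
    weights = [freqs[c] for c in classes]
    return "".join(rng.choices(classes, weights=weights, k=length))


def longest_hydrophobic_run(class_seq):
    """Length of the longest contiguous run of hydrophobic ('H') positions."""
    return max((len(list(run)) for cls, run in groupby(class_seq) if cls == "H"), default=0)


def fraction_with_run(class_seqs, k):
    """Fraction of sequences containing at least k contiguous hydrophobic residues."""
    return sum(longest_hydrophobic_run(s) >= k for s in class_seqs) / len(class_seqs)


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed, illustrative only
    n_seqs, length = 1000, 314  # hypothetical database size; 314 = mean E. coli length
    db = [random_class_sequence(length, ECOLI_CLASS_FREQ, rng) for _ in range(n_seqs)]
    for k in range(3, 9):
        print(f"run >= {k}: {fraction_with_run(db, k):.3f}")
```

Applying the same `fraction_with_run` to a real proteome, after translating it into the four-class alphabet, gives the natural-sequence counterpart for comparison.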
| Discussion |
|---|
2) of these patterns, because they can form stable complexes (not hyperstable, as in the SBP-apical domain complex) with GroEL through interactions at multiple binding sites. Several experimentally identified GroEL SPs are found to contain these patterns, suggesting that GroES- and SBP-like complementarity is a feature of a large class of substrates, including ones that are considered to be stringent SPs. The putative SPs have diverse functions and no common structural feature. Our method is a physically based procedure for identifying natural SPs for GroEL and its analogs in other organisms. The good agreement between the predictions and the experimentally identified GroEL substrates (provided NC and NB are allowed to vary) validates the methodology. However, the sequence-based approach alone clearly cannot identify all of the SPs that interact transiently with GroEL in a cellular environment. This may especially be the case for large SPs (those that cannot be fully encapsulated in the GroEL cavity) that form only marginally stable complexes with GroEL. Even proteins such as aconitase, which may not interact directly with the groove between helices H and I, could have binding patterns of the GroES mobile loop. Despite this limitation, the present method provides, for the first time, a basis for identifying natural substrate proteins that require the chaperonin machinery. The predictions of this bioinformatics-based approach are amenable to experimental tests.
| Materials and methods |
|---|
To perform a general search of the genomes for putative chaperonin SPs, we divided the 20 amino acids into four classes: hydrophobic (H), polar (P), positively charged (+), and negatively charged (−). The four classes are H (C, F, I, L, W, V, M, Y, and A), P (G, P, N, T, S, Q, and H), + (R and K), and − (D and E). We previously showed (Stan et al. 2003) that, for GroEL and GroES function, the chemical class of a residue, but not its identity, is conserved. When the binding sequences are translated into the four residue types, the residues complementary to the strongly conserved residues in the GroEL-binding sites form the patterns HH_ _ _HPHHPP for SBP and P_HHH_P_H for GroES, where "_" marks a position that any residue may occupy. Neither of these sequence patterns contains charged residues, which is consistent with the notion that predominantly hydrophobic attraction (however, see Buckle et al. 1997) helps the apical domain ensnare SPs. The most likely substrates are those that contain at least two sequence patterns like those found in the mobile loop of GroES or in the SBP. Accordingly, we searched for these two patterns in five genomes: Escherichia coli K-12 (NCBI accession code NC_000913; Blattner et al. 1997), Saccharomyces cerevisiae (the version current as of May 1, 2004, at the Saccharomyces Genome Database; Goffeau et al. 1996), Ureaplasma urealyticum (NCBI accession code NC_002162; Glass et al. 2000), Thermoplasma acidophilum (NCBI accession code NC_002578; Ruepp et al. 2000), and Methanopyrus kandleri (NCBI accession code NC_003551; Slesarev et al. 2002).
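The class translation and pattern search described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes that "_" in a pattern matches any residue, drops the spaces printed in the SBP pattern, and reports every (possibly overlapping) start position; the function names are hypothetical. Note that the class label "H" (hydrophobic) is distinct from the residue letter H (His), which belongs to the polar class.

```python
import re

# Four chemical classes from Materials and Methods. The class label "H"
# (hydrophobic) is not the residue letter H (His), which is polar.
CLASS_MEMBERS = {
    "H": "CFILWVMYA",
    "P": "GPNTSQH",
    "+": "RK",
    "-": "DE",
}
CLASS_OF = {aa: cls for cls, members in CLASS_MEMBERS.items() for aa in members}

GROES_PATTERN = "P_HHH_P_H"  # "_" = gap position, any residue allowed
SBP_PATTERN = "HH___HPHHPP"  # printed as HH_ _ _HPHHPP in the text


def to_classes(seq):
    """Translate a one-letter sequence into the H/P/+/- alphabet.
    Letters outside the 20 standard residues map to '?' and never match a class."""
    return "".join(CLASS_OF.get(aa, "?") for aa in seq.upper())


def pattern_to_regex(pattern):
    """Compile a class pattern: '_' matches any single position, other symbols match literally."""
    return re.compile("".join("." if c == "_" else re.escape(c) for c in pattern))


def match_starts(seq, pattern):
    """0-based start positions of all, possibly overlapping, matches of `pattern` in `seq`."""
    rx = pattern_to_regex(pattern)
    cls = to_classes(seq)
    return [i for i in range(len(cls) - len(pattern) + 1) if rx.match(cls, i)]


if __name__ == "__main__":
    # GGIVLTGSA contains the GroES core residues G_IVL quoted in the text.
    print(match_starts("GGIVLTGSA", GROES_PATTERN))
```

Scanning each protein with both `GROES_PATTERN` and `SBP_PATTERN` and keeping those with at least two hits would implement the "at least two patterns" criterion before any spacing constraint is applied.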
The patterns identified using GroES complementarity can vary to some extent, depending on the strength of the interaction between the binding sites and the apical domain. For efficient annealing, it has been argued (Orland and Thirumalai 1997) that the average stabilizing energy per residue between SP binding sites and GroEL should be on the order of (1-2) kBT. The length of the GroES pattern, which is the natural complement to helices H and I, can change while still satisfying this stability criterion. However, it is crucial for the SP to contain a set of core residues that serve as recognition sites for the hydrophobic residues in the GroEL apical domain. Our previous study (Stan et al. 2003) showed that the core residues in GroES are G_IVL.
Sequences containing multiple matches are more likely to be natural SPs, because each substrate protein can bind to up to seven GroEL-binding sites. Simultaneous binding to many (say, >4) sites is unlikely, because the resulting SP-GroEL complex would have a very low dissociation constant. Successive matches must be separated along the sequence by a minimal distance corresponding to the spatial separation between adjacent binding sites. In the T state of GroEL, the binding sites of neighboring subunits are separated by 25 Å. To estimate the sequence length, l, corresponding to this distance, we used the Flory formula $R \approx b\, l^{3/5}$, where R is the end-to-end distance and b ≈ 3.8 Å is the distance between adjacent Cα atoms. This gives l ≈ 23 residues between the end of one binding pattern and the beginning of the next. The value l ≈ 23 is approximate; sequences whose binding sites are separated by stiff loops (l < 23) can probably also serve as natural substrates. The most probable loops have l ≈ 10 (Camacho and Thirumalai 1995). Using 10 ≤ l ≤ 23 in the search for multiple binding patterns changes the number of multiple-pattern matches by ~5%, and increasing l up to 60 changes the number of patterns by only ~10%.
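The spacing estimate and the multiple-match criterion above can be sketched as follows. This assumes the Flory exponent ν = 3/5 and a greedy left-to-right selection of matches that respects the minimal gap; the greedy rule and all function names are our illustrative choices, not necessarily how the original search was implemented.

```python
def flory_loop_length(r_angstrom, b_angstrom=3.8, nu=0.6):
    """Invert the Flory scaling R ~ b * l**nu (nu = 3/5) to get the number of
    residues l whose end-to-end distance is R."""
    return (r_angstrom / b_angstrom) ** (1.0 / nu)


def count_separated_matches(starts, pattern_len, min_gap):
    """Greedily count matches such that at least `min_gap` residues lie between
    the end of one accepted match and the start of the next.
    `starts` must be sorted 0-based start positions of equal-length matches;
    for equal-length intervals, earliest-first selection maximizes the count."""
    count, next_allowed = 0, None
    for s in starts:
        if next_allowed is None or s >= next_allowed:
            count += 1
            # The next match may start only after pattern_len + min_gap positions.
            next_allowed = s + pattern_len + min_gap
    return count


def meets_nb_threshold(starts, pattern_len, min_gap, nb_min=2):
    """True if the sequence has at least `nb_min` well-separated binding patterns."""
    return count_separated_matches(starts, pattern_len, min_gap) >= nb_min


if __name__ == "__main__":
    print(f"estimated loop length: {flory_loop_length(25.0):.1f} residues")
    starts = [0, 10, 32, 70]  # hypothetical match positions of a 9-residue pattern
    for gap in (10, 23, 60):
        print(gap, count_separated_matches(starts, 9, gap))
```

With R = 25 Å and b = 3.8 Å, (25/3.8)^(5/3) evaluates to roughly 23, matching the value quoted in the text. Rerunning the count for several `min_gap` values mirrors the sensitivity check over 10 ≤ l ≤ 23 and l up to 60.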
The patterns that we search for are based on the number of contacts, NC, that the mobile loop of GroES or the SBP makes with helices H and I of the apical domain. Because no SP-GroEL structure for a natural SP is available, we first searched for occurrences of these exact patterns in the various genomes. Until several SP-GroEL structures are available, it remains uncertain whether the number of contacts these peptides make is typical. In the absence of this information, we performed a general two-parameter search. We varied NC and located stretches of the polypeptide chain that can interact with the apical domain; the pattern changes as NC is changed. For each fixed pattern, we also counted the number of binding sites (NB) in each sequence. A typical range of NB is likely to be 2 ≤ NB ≤ 4.
Statistical significance of sequence matches
To establish the statistical significance of our results, it is necessary to ascertain whether the observed number of patterns could arise by chance. The probability that a stretch of L randomly arranged residues matches a given pattern is (Karlin 1995)
$$P_L = \prod_{i \in \{H,\,P,\,+,\,-\}} P_i^{\,l_i} \qquad (1)$$
where Pi, i = H, P, +, −, is the probability of a residue being of type i, and li is the number of residues of type i in the pattern. The number of random arrangements of the four residue types and of gaps (g) of length lg in the given pattern is
$$N_A = \frac{L!}{l_H!\; l_P!\; l_+!\; l_-!\; l_g!} \qquad (2)$$
where L = lH + lP + l+ + l− + lg. The expected number of sequences in a database that contain at least one pattern is
![]() | (3) |
where $L_1 = \bar{L}_S - L + 1$, and NS is the number of sequences. We assumed that the average sequence length, $\bar{L}_S$, satisfies the condition $\bar{L}_S > L$. The frequencies of the four chemical types in the E. coli genome are PH = 0.46, PP = 0.38, P+ = 0.10, and P− = 0.11. Excluding sequences of <20 residues, $\bar{L}_S$ for E. coli is 314. For S. cerevisiae, the corresponding values are PH = 0.40, PP = 0.37, P+ = 0.12, P− = 0.12, and $\bar{L}_S$ = 458, whereas for U. urealyticum they are PH = 0.43, PP = 0.32, P+ = 0.13, P− = 0.12, and $\bar{L}_S$ = 372.
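The quantities defined in this subsection can be sketched numerically: the match probability as the product of P_i^{l_i} over the four classes (gap positions match anything), the multinomial number of arrangements of classes and gaps, and an expected number of sequences with at least one match. The expected-count formula used here, N_S[1 − (1 − p)^{L_1}], is our assumption (it treats the L_1 start positions as independent trials) and is not necessarily the authors' Eq. (3); function names are hypothetical.

```python
from math import factorial, prod

CLASSES = "HP+-"  # "_" denotes a gap position that matches any residue


def class_counts(pattern):
    """Number of positions of each class (H, P, +, -) and of gaps '_' in a pattern."""
    counts = {c: pattern.count(c) for c in CLASSES + "_"}
    if sum(counts.values()) != len(pattern):
        raise ValueError(f"unexpected symbol in pattern {pattern!r}")
    return counts


def match_probability(pattern, freqs):
    """Probability that L random residues match: product of P_i**l_i over classes."""
    counts = class_counts(pattern)
    return prod(freqs[c] ** counts[c] for c in CLASSES)


def n_arrangements(pattern):
    """Multinomial count L! / (l_H! l_P! l_+! l_-! l_g!) of distinct orderings."""
    counts = class_counts(pattern)
    return factorial(len(pattern)) // prod(factorial(n) for n in counts.values())


def expected_sequences_with_match(pattern, freqs, n_seqs, mean_len):
    """ASSUMED form: N_S * [1 - (1 - p)**L1], with L1 = mean_len - L + 1
    start positions treated as independent trials."""
    p = match_probability(pattern, freqs)
    l1 = mean_len - len(pattern) + 1
    return n_seqs * (1.0 - (1.0 - p) ** l1)


if __name__ == "__main__":
    ecoli = {"H": 0.46, "P": 0.38, "+": 0.10, "-": 0.11}  # from the text
    groes = "P_HHH_P_H"
    print(match_probability(groes, ecoli), n_arrangements(groes))
    # n_seqs = 1000 is a hypothetical database size, not the E. coli count.
    print(expected_sequences_with_match(groes, ecoli, 1000, 314))
```

Comparing this expectation with the observed number of matching sequences indicates whether a pattern is over- or under-represented relative to a random background with the same class composition.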
| Footnotes |
|---|
| Acknowledgments |
|---|
| References |
|---|
Blattner, F.A., Plunkett III, G., Bloch, C.A., Perna, N.T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., Mayhew, G.F., et al. 1997. The complete genome sequence of Escherichia coli K-12. Science 277: 1453-1462.
Bochkareva, E., Seluanov, A., Bibi, E., and Girshovich, A. 1996. Chaperonin-promoted post-translational membrane insertion of a multispanning membrane protein lactose permease. J. Biol. Chem. 271: 22256-22261.
Buckle, A.M., Zahn, R., and Fersht, A.R. 1997. A structural model for GroEL-polypeptide recognition. Proc. Natl. Acad. Sci. 94: 3571-3575.
Camacho, C. and Thirumalai, D. 1995. Theoretical predictions of folding pathways by using the proximity rule, with applications to bovine pancreatic trypsin inhibitor. Proc. Natl. Acad. Sci. 92: 1277-1281.
Chaudhuri, T.K., Farr, G.W., Fenton, W.A., Rospert, S., and Horwich, A.L. 2001. GroEL/GroES-mediated folding of a protein too large to be encapsulated. Cell 107: 223-233.
Chen, L.L. and Sigler, P.B. 1999. The crystal structure of a GroEL/peptide complex: Plasticity as a basis for substrate diversity. Cell 99: 757-768.
Deaton, J., Sun, J., Holzenburg, A., Struck, D.K., Berry, J., and Young, R. 2004. Functional bacteriorhodopsin is efficiently solubilized and delivered to membranes by the chaperonin GroEL. Proc. Natl. Acad. Sci. 101: 2281-2286.
Farr, G.W., Furtak, K., Rowland, M.R., Ranson, N.A., Saibil, H.R., Kirchhausen, T., and Horwich, A.L. 2000. Multivalent binding of nonnative substrate proteins by the chaperonin GroEL. Cell 100: 561-573.
Fayet, O., Louarn, J.M., and Georgopoulos, C. 1986. Suppression of Escherichia coli dnaA46 mutation by amplification of the groES and groEL genes. Mol. Gen. Genet. 202: 434-445.
Fenton, W.A. and Horwich, A.L. 2003. Chaperonin-mediated protein folding: Fate of substrate polypeptide. Q. Rev. Biophys. 36: 229-256.
Fenton, W.A., Kashi, Y., Furtak, K., and Horwich, A.L. 1994. Residues in chaperonin GroEL required for polypeptide binding and release. Nature 371: 614-619.
Glass, J.I., Lefkowitz, E.J., Glass, J.S., Heiner, C.R., Chen, E.Y., and Cassell, G.H. 2000. The complete sequence of the mucosal pathogen Ureaplasma urealyticum. Nature 407: 757-762.
Goffeau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M., et al. 1996. Life with 6000 genes. Science 274: 546-567.
Gómez-Puertas, P., Martín-Benito, J., Carrascosa, J.L., Willison, K.R., and Valpuesta, J.M. 2004. The substrate recognition mechanisms in chaperonins. J. Mol. Recognit. 17: 85-94.
Gordon, C.L., Sather, S.K., Casjens, S., and King, J. 1994. Selective in vivo rescue by GroEL/ES of thermolabile folding intermediates to phage P22 structural proteins. J. Biol. Chem. 269: 27941-27951.
Hemmingsen, S.M., Woolford, V., van der Vies, S.M., Tilly, K., Dennis, D.T., Georgopoulos, C.P., Hendrix, R.W., and Ellis, R.J. 1988. Homologous plant and bacterial proteins chaperone oligomeric protein assembly. Nature 333: 330-334.
Horovitz, A., Fridmann, Y., Kafri, G., and Yifrach, O. 2001. Allostery in chaperonins. J. Struct. Biol. 135: 104-114.
Houry, W.A., Frishman, D., Eckerskorn, C., Lottspeich, F., and Hartl, F.U. 1999. Identification of in vivo substrates of the chaperonin GroEL. Nature 402: 147-154.
Humphrey, W., Dalke, A., and Schulten, K. 1996. VMD: Visual molecular dynamics. J. Mol. Graphics 14: 33-38.
Jones, D.T. 1999. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292: 195-202.
Karlin, S. 1995. Statistical significance of sequence patterns in proteins. Curr. Opin. Struct. Biol. 5: 360-371.
Landry, S.J., Taher, A., Georgopoulos, C., and van der Vies, S.M. 1996. Interplay of structure and disorder in cochaperonin mobile loops. Proc. Natl. Acad. Sci. 93: 11622-11627.
Lorimer, G.H. 1996. A quantitative assessment of the role of the chaperonin proteins in protein folding in vivo. FASEB J. 10: 5-9.
Orland, H. and Thirumalai, D. 1997. A kinetic model for chaperonin assisted protein folding. J. Phys. 7: 533-560.
Reading, D.S., Hallberg, R.L., and Myers, A.M. 1989. Characterization of the yeast HSP60 gene coding for a mitochondrial assembly factor. Nature 337: 655-659.
Ruepp, A., Graml, W., Santos-Martinez, M.L., Koretke, K.K., Volker, C., Mewes, H.W., Frishman, D., Stocker, S., Lupas, A.N., and Baumeister, W. 2000. The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum. Nature 407: 508-513.
Shewmaker, F., Maskos, K., Simmerling, C., and Landry, S.J. 2001. The disordered mobile loop of GroES folds into a defined β-hairpin upon binding GroEL. J. Biol. Chem. 276: 31257-31264.
Slesarev, A.I., Mezhevaya, K.V., Makarova, K.S., Polushin, N.N., Shcherbinina, O.V., Shakhova, V.V., Belova, G.I., Natale, L.A.D.A., Rogozin, I.B., Tatusov, R.L., et al. 2002. The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc. Natl. Acad. Sci. 99: 4644-4649.
Sobolev, V., Sorokine, A., Prilusky, J., Abola, E.E., and Edelman, M. 1999. Automated analysis of interatomic contacts in proteins. Bioinformatics 15: 327-332.
Stan, G., Thirumalai, D., Lorimer, G.H., and Brooks, B.R. 2003. Annealing function of GroEL: Structural and bioinformatic analysis. Biophys. Chem. 100: 453-467.
Thirumalai, D. and Lorimer, G.H. 2001. Chaperonin-mediated protein folding. Annu. Rev. Biophys. Biomol. Struct. 30: 245-269.
van Dyk, T.K., Gatenby, A.A., and LaRossa, R.A. 1989. Demonstration by genetic suppression of interaction of GroE products with many proteins. Nature 342: 451-453.
Viitanen, P.V., Gatenby, A.A., and Lorimer, G.H. 1992. Purified chaperonin 60 (GroEL) interacts with the non-native states of a multitude of Escherichia coli proteins. Protein Sci. 1: 363-369.
White, S.H. and Jacobs, R.E. 1990. Statistical distribution of hydrophobic residues along the length of protein chains: Implications for protein folding and evolution. Biophys. J. 57: 911-921.
Xu, Z. and Sigler, P.B. 1998. GroEL/GroES: Structure and function of a two-stroke folding machine. J. Struct. Biol. 124: 129-141.
Xu, Z., Horwich, A.L., and Sigler, P.B. 1997. The crystal structure of the asymmetric GroEL-GroES-(ADP)7 chaperonin complex. Nature 388: 741-750.