1 Nagoya University, Department of Applied Physics, Graduate School of Engineering, Nagoya, Aichi 464-8603, Japan
2 Tokyo University of Agriculture and Technology, Department of Biotechnology, Tokyo 184-8588, Japan
Reprint requests to: Nobuyuki Uchikoga, Department of Applied Physics, Graduate School of Engineering, Nagoya University, Furocho, Chikusaku, Nagoya, Aichi 464-8603, Japan; e-mail: uchikoga{at}bp.nuap.nagoya-u.ac.jp; fax: +81-52-789-3708.
(RECEIVED July 19, 2004; FINAL REVISION August 27, 2004; ACCEPTED August 27, 2004)
| Abstract |
|---|
Keywords: structural classification; extended protein; bioinformatics; structural genomics; mechanism of structural stabilization; physical properties of amino acid residues
Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04984505.
| Introduction |
|---|
Extended proteins pose particularly interesting problems concerning structure and structural change. First, extended proteins lack some of the physical properties of globular proteins, and vice versa. A single extended protein molecule, as exemplified by calmodulin and troponin C, consists of separate domains near each terminus linked by a central segment exposed to water (Babu et al. 1988; Houdusse et al. 1997; Chou et al. 2001). In contrast, globular proteins are stabilized by a hydrophobic core (Kauzmann 1959).
Second, extended proteins often contain a flexible segment that allows their structure to change. For example, the central part of the region linking the terminal domains of calmodulin is flexible (Zhang et al. 1995; Chou et al. 2001). Because of this flexible segment, an extended protein such as calmodulin often adopts a globular form when it binds a short peptide (Ikura et al. 1992; Meador et al. 1992). In contrast, a zinc finger protein maintains its extended structure when it binds double-stranded DNA (Pavletich and Pabo 1993). In both cases, the extended proteins involved can undergo various structural changes. The mechanisms of stabilization and structural change of extended proteins pose interesting physical problems, because the physical conditions that stabilize an extended structure must give way to those that stabilize a globular one. These two aspects of extended proteins can provide clues useful for their classification. They also raise the following question: What is the most effective method for classifying extended proteins whose structure can change?
The most widely used approaches to structural classification relating the structures and functions of proteins are based on comparative methods. Disordered or unstructured regions are predicted with genome-scale comparative methods, e.g., the prediction tool DISOPRED (Jones and Ward 2003; Ward et al. 2004). However, completely orphan sequences cannot be analyzed by comparative methods. Ab initio prediction is potentially applicable to genome-scale analysis of extended proteins, but the method appears to require further development before it can handle long amino acid sequences (Hardin et al. 2002). Because structure can be partly inferred from function, classification of extended structures at low resolution can be effective in complete genome analysis. If physical properties that distinguish extended proteins from globular proteins can be identified from amino acid sequences, they can also provide clues to the physical principles involved in protein folding and structural change. Membrane proteins, for example, have the following physical properties: at least one very hydrophobic segment that spans the hydrophobic region of the membrane, and clusters of amphiphilic residues at the membrane-water interface. On the basis of these physical characteristics, a highly accurate computational method for classifying membrane proteins has been developed (Hirokawa et al. 1998). This indicates that investigating the physical properties of amino acid sequences is useful for classifying proteins.
In the present study, we examined structures and amino acid sequences in the Protein Data Bank (PDB; Berman et al. 2000) to elucidate stabilization mechanisms of extended proteins. We found that the most important factor in discrimination of extended proteins from globular proteins was long-range electrostatic repulsion between separated charges within an extended molecule. The second most important factor was the enhanced propensities of amphiphilic amino acids in the central region exposed to water. In this paper, we also discuss the structural changes that occur when extended proteins bind other molecules. Furthermore, we developed a software tool (SOSUIdumbbell) for predicting extended proteins, based on the two factors described above. This software tool can be applied to genome-scale data sets.
| Results |
|---|
Charge balance
The net charges of extended proteins were either greater than +20 or less than −20 ($|Q_{\mathrm{Total}}| > 20$). If electrostatic repulsion is a dominant factor in stabilizing the extended structure, the amino acid sequences of these proteins should all show the same pattern of charge distribution.
When the amino acid sequence of calmodulin is divided into N- and C-terminal globular domains and a linking helix (as per PDB annotation), these three parts have net negative charges of 12, 9, and 2, respectively. Troponin C shows a very similar distribution: 14, 12, and 2, respectively. This characteristic charge distribution suggests that a repulsion mechanism is responsible for the structural stability of these extended proteins; i.e., repulsive interaction between the two globular domains can prevent collapse of the extended structure. Even if the central helix in these proteins is flexible (Zhang et al. 1995; Chou et al. 2001), moderate repulsive interaction between its end points can maintain stability. To elucidate the repulsion mechanism of extended proteins, we examined the net charges of the N- and C-terminal halves of various proteins. The distribution of net charge was generally random around the origin of the scatter plot (Fig. 2C).
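Splitting a sequence into N- and C-terminal halves and summing residue charges can be sketched in a few lines of Python. This is a minimal illustration, not the SOSUIdumbbell implementation: the charge assignment (Lys and Arg +1, Asp and Glu −1, His and the chain termini neutral), the handling of odd-length sequences, and the function names are all assumptions.

```python
# Hypothetical sketch. Charge assignment is an assumption, not taken from the paper:
# Lys/Arg count +1, Asp/Glu count -1; His and the chain termini are treated as neutral.
RESIDUE_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(seq: str) -> int:
    """Sum of formal charges over a one-letter amino acid sequence."""
    return sum(RESIDUE_CHARGE.get(aa, 0) for aa in seq.upper())


def half_charges(seq: str) -> tuple[int, int, int]:
    """Return (Q_N, Q_C, Q_Total) for the N- and C-terminal halves of `seq`.

    For odd lengths, the middle residue is assigned to the C-terminal half.
    """
    mid = len(seq) // 2
    q_n = net_charge(seq[:mid])
    q_c = net_charge(seq[mid:])
    return q_n, q_c, q_n + q_c
```

For a toy 12-residue sequence, `half_charges("DEEKALGGRKDE")` returns `(-2, 0, -2)`. Plotting $Q_N$ against $Q_C$ for many sequences gives a scatter plot of the kind described for Fig. 2C.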
We then estimated the strength of the charge-balance mechanism involved in the structural stability of extended proteins. For two domains with a net charge of 10 each, separated by 2.5 nm, and with the dielectric constant of water set at 80, we obtained a repulsive interaction energy of $1.2 \times 10^{-19}$ J. This value is about 30 times the thermal energy that can affect the conformation of a protein ($k_BT \approx 4 \times 10^{-21}$ J). Because this energy is proportional to the product of the two domain charges, proteins whose terminal domains repel each other strongly enough to affect conformation lie in the region of the graph where $Q_N \times Q_C > 100$ (Fig. 2C), i.e., beyond the product for the $10 \times 10$ example. Solid circles in Figure 2C indicate proteins with a charge density greater than 0.14; many proteins in the region where $Q_N \times Q_C > 100$ have a charge density ($D_Q$) below 0.14. Only 58 of the original 26,075 amino acid sequences satisfied all three inequality conditions for electric charge distribution: $|Q_{\mathrm{Total}}| > 20$, $Q_N \times Q_C > 100$, and $D_Q > 0.14$.
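The energy estimate and the three inequality conditions can be checked with a short Python sketch. It treats each terminal domain as a point charge in a uniform dielectric (Coulomb's law with a relative permittivity of 80), the simplification implied by the estimate. The function names are hypothetical, the 300 K temperature is an assumption, and the charge density $D_Q$ is taken as an input because its exact definition is not given in this section.

```python
import math

# SI constants (exact or CODATA 2018 values).
ELEMENTARY_CHARGE = 1.602176634e-19  # C
VACUUM_PERMITTIVITY = 8.8541878128e-12  # F/m
BOLTZMANN = 1.380649e-23  # J/K


def coulomb_energy(q1: float, q2: float, r_nm: float, eps_r: float = 80.0) -> float:
    """Coulomb energy (J) between point charges q1 and q2 (in units of e)
    separated by r_nm nanometres in a medium of relative permittivity eps_r.
    Positive values mean repulsion."""
    r = r_nm * 1e-9
    return (q1 * q2 * ELEMENTARY_CHARGE**2) / (
        4 * math.pi * VACUUM_PERMITTIVITY * eps_r * r
    )


def passes_charge_balance(q_total: float, q_n: float, q_c: float, d_q: float,
                          total_min: float = 20, product_min: float = 100,
                          density_min: float = 0.14) -> bool:
    """Hypothetical filter applying the three strict inequalities from the text:
    |Q_Total| > 20, Q_N * Q_C > 100 and D_Q > 0.14."""
    return (abs(q_total) > total_min
            and q_n * q_c > product_min
            and d_q > density_min)


energy = coulomb_energy(10, 10, 2.5)  # two domains of charge 10 at 2.5 nm
thermal = BOLTZMANN * 300.0           # k_B T at an assumed 300 K
ratio = energy / thermal
```

With these constants, `energy` comes out at about 1.15 × 10^−19 J and `ratio` at about 28, consistent with the rounded figures in the text. A positive product $Q_N \times Q_C$ means both halves carry charge of the same sign, so the interaction between them is repulsive.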
Central domain exposed to water
Some of the 58 proteins that satisfied the three inequality conditions of charge balance had a compact structure. This suggests that interaction between amino acid residues and water is also an important factor in stabilization of the extended structure. We examined the amino acid sequences of extended proteins to identify domains exposed to water.
We calculated the relative propensities of amino acids in two parts of the domain exposed to water: its center and its termini. The center and termini of the central helix linking the N- and C-terminal domains in typical extended proteins have complementary amino acid compositions (Fig. 3A). The center mainly contains amphiphilic amino acids (Lys, Arg, His, Glu, and Gln, in order of value), whose side chains contain a polar group and a hydrophobic stem (Hirokawa et al. 1998; Mitaku et al. 2002). In contrast, hydrophobic residues were abundant at the termini of the central helix (Fig. 3A).
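Relative propensity is commonly computed as the frequency of an amino acid in a region divided by its frequency in a reference set. The Python sketch below assumes that definition and uses whole sequences as the reference; the normalization actually used for Fig. 3A is not spelled out here, so both choices and the function names are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seqs: list[str]) -> dict[str, float]:
    """Fraction of each standard amino acid over all residues in `seqs`."""
    counts = Counter(aa for s in seqs for aa in s.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def relative_propensity(region_seqs: list[str],
                        reference_seqs: list[str]) -> dict[str, float]:
    """Region frequency divided by reference frequency (assumed definition),
    for every amino acid that occurs in the reference set.
    Values above 1 indicate enrichment in the region."""
    region = composition(region_seqs)
    reference = composition(reference_seqs)
    return {aa: region[aa] / reference[aa]
            for aa in AMINO_ACIDS if reference[aa] > 0}
```

To compare the two parts of a central helix, one would call `relative_propensity` once with the helix centers and once with the helix termini as `region_seqs`, using the same full sequences as `reference_seqs`; amphiphilic residues should then score above 1 in the first result and hydrophobic residues above 1 in the second.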
| Discussion |
|---|
All calmodulins and troponin Cs in our data set were predicted to be extended by our criteria. We also identified a transcription factor whose amino acid sequence had low homology with calmodulin and troponin C. Our genome analysis identified more than 150 proteins in the human genome that were predicted to be extended, and about half of these were DNA- or RNA-binding proteins annotated as transcription factors, histones, or ribosomal proteins. The four conditions in the preceding paragraph can therefore identify single proteins with extended structures and no sequence similarity to calmodulin or troponin C. Further results of genome analysis using this algorithm will be reported and discussed in future papers. The present results indicate that extended proteins are stabilized by long-range interaction between the terminal domains and short-range interaction between the central domain and water molecules.
Most of the predicted extended proteins contained the EF-hand motif, which suggests that the EF-hand motif could be responsible for the charge balance of extended proteins. However, recoverin (1REC) and yeast frequenin (1FPW), which have globular forms, each have two EF-hand domains (one near the N terminus and one near the C terminus). The charge balance of 1REC is 0 and 3, and that of 1FPW is 3 and 9; in both cases the product of the two values is far below 100. This indicates that charge balance involves features of amino acid sequences other than the EF-hand motif.
We have developed a method for predicting extended proteins from amino acid sequences. This algorithm, SOSUIdumbbell, is available on the Web at http://bp.nuap.nagoya-u.ac.jp/sosui/sosuidumbbell/dumbbell_submit.html and is described in Figure 4. Because this tool requires only the amino acid sequence of a protein to determine whether it has an extended structure, it can be applied to the complete set of amino acid sequences of an entire genome. We plan to use SOSUIdumbbell to analyze complete genomes in future studies.
Structural change of extended proteins
Analysis of structural data revealed two types of large conformational change in extended proteins that form molecular complexes. In each complex, the partner molecule carries a charge opposite in sign to that of the extended protein. For example, when calmodulin binds a polypeptide, its extended structure (Fig. 5, middle left) changes to a collapsed form (Fig. 5, bottom left). Fluctuation of the structure of calmodulin bound to a peptide has been observed in analysis of 3D NMR structural data (Ikura et al. 1992; Meador et al. 1992). In contrast, transcription factors with a large positive net charge bind double-stranded DNA in an extended form when they recognize a specific nucleotide sequence (Pavletich and Pabo 1993), and the structures of most such extended proteins in the single (unbound) state are unknown (Fig. 5, middle and bottom right). Calmodulin, which has a large negative net charge, binds four calcium ions (two in each terminal domain); because each Ca2+ ion carries a charge of +2, this binding partially neutralizes the negative charge and implies that the long-range repulsion between the two terminal domains is weakened. Conversely, when zinc finger motifs bind zinc ions, the protein becomes more positively charged, which increases its ability to bind DNA. In each of these cases, long-range electrostatic interaction appears to be the dominant factor in the structural change of the extended protein. Investigating the physical properties of amino acid sequences can therefore provide useful information not only about protein folding but also about structural changes.
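The direction of the calcium effect can be illustrated with the calmodulin domain charges given under Charge balance (net charge −12 for the N-terminal domain and −9 for the C-terminal domain). As a simplifying assumption, each bound Ca2+ is taken to add +2 to the net charge of its domain, with no other change (no counterion screening and no protonation changes):

```latex
\begin{aligned}
Q_N &: \; -12 \;\to\; -12 + 2(+2) = -8, \\
Q_C &: \; -9 \;\to\; -9 + 2(+2) = -5, \\
Q_N Q_C &: \; (-12)(-9) = 108 \;\to\; (-8)(-5) = 40 .
\end{aligned}
```

At a fixed separation and dielectric constant the Coulomb repulsion scales with this product, so under these assumptions it would fall to roughly 40/108 ≈ 0.37 of its calcium-free value. These are domain charges rather than the half-sequence charges used in the selection criteria, so the numbers are only indicative.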
| Materials and methods |
|---|
Definition of extended protein structure
First, we defined the structure of extended proteins and selected typical extended structures from a PDB data set. The structural data of extended proteins were compared with the physical properties of their amino acid sequences. Then, we used the 3D structures of single proteins (7234 proteins) in PDB release 96 to construct complementary data sets of extended-type proteins and other types of proteins. In this analysis, proteins in complexes were excluded because their conformation may be stabilized by protein-protein interaction, which apparently involves interaction between specific domains. Protein size was limited to 100–500 residues. A typical extended protein is composed of three segments: the N- and C-terminal segments and a central helix. The central helix is longer than 19 residues and lies near the midpoint of the amino acid sequence. The domains in the N- and C-terminal regions each contain more than 50 residues. Selection of extended-type proteins was also based on the spatial relationship between the central helix and the N- and C-terminal domains (Fig. 1A). We considered a virtual plane perpendicular to the central helix at its midpoint. For a protein that is globular overall, this plane passes through both the N- and C-terminal domains. Proteins were classified as extended type only if the plane passed through neither the N- nor the C-terminal segment; i.e., $W_{NN} = W_{CC} = 1.0$ and $W_{NC} = W_{CN} = 0$ (Fig. 1B). This scheme selects proteins with typical extended structures. Eleven of the 7234 single proteins satisfied these structural criteria for extended proteins: calmodulin (1CLL, 1CLM, 1OSA, 4CLN) and troponin C (1NCX, 1NCY, 1NCZ, 1TN4, 1TOP, 2TN4, 4TNC).
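The structural selection criteria can be sketched in Python as follows. The weights $W_{XY}$ are not defined explicitly in this section, so the sketch assumes that $W_{XY}$ is the fraction of atoms (for example, Cα atoms) of terminal domain X lying on the Y side of the virtual plane, where the N side is the side containing the helix's N-terminal end. It also assumes that all residues outside the helix belong to the terminal domains, and it omits the "near the midpoint" check because no tolerance is stated. All function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def side_fractions(points: list[Vec], helix_start: Vec,
                   helix_end: Vec) -> tuple[float, float]:
    """Fractions of `points` on the N side and on the C side of the plane
    perpendicular to the helix axis at its midpoint. Points lying exactly
    on the plane count toward neither side."""
    axis = _sub(helix_end, helix_start)
    length = math.sqrt(_dot(axis, axis))
    normal = (axis[0] / length, axis[1] / length, axis[2] / length)
    mid = tuple((s + e) / 2.0 for s, e in zip(helix_start, helix_end))
    signed = [_dot(_sub(p, mid), normal) for p in points]
    n = len(signed)
    return sum(d < 0 for d in signed) / n, sum(d > 0 for d in signed) / n


def is_extended_geometry(n_domain: list[Vec], c_domain: list[Vec],
                         helix_start: Vec, helix_end: Vec) -> bool:
    """Extended if W_NN = W_CC = 1 and W_NC = W_CN = 0 (assumed definitions)."""
    w_nn, w_nc = side_fractions(n_domain, helix_start, helix_end)
    w_cn, w_cc = side_fractions(c_domain, helix_start, helix_end)
    return w_nn == 1.0 and w_cc == 1.0 and w_nc == 0.0 and w_cn == 0.0


def passes_segment_criteria(length: int, helix_first: int, helix_last: int) -> bool:
    """Sequence-level filters with 1-based residue numbers: 100-500 residues,
    a central helix longer than 19 residues, and more than 50 residues
    on each side of the helix."""
    helix_len = helix_last - helix_first + 1
    n_len = helix_first - 1
    c_len = length - helix_last
    return 100 <= length <= 500 and helix_len > 19 and n_len > 50 and c_len > 50
```

In a dumbbell-like toy model, with all N-domain points below and all C-domain points above the helix midpoint along the axis, `is_extended_geometry` returns `True`; moving a single N-domain point past the midpoint makes it return `False`. Requiring exact fractions of 1 and 0 mirrors the strict criterion in the text.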
Physicochemical properties of central region exposed to water
To examine the central domain, we used the hydropathy and amphiphilicity indices of amino acids determined by Kyte and Doolittle (1982) and Mitaku et al. (2002), respectively. Minimum amphiphilicity was set at 0.9, and maximum hydrophobicity was set at 1.15. The side lobes of hydrophobicity peaks were > 0.25, indicating sequences at least 10 amino acids long. Using these conditions, we were able to locate the edges of the central region exposed to water. The central region was determined as the area in which the proportion of the lateral segments in the terminals was <1.50 for hydrophobicity and >0.67 for amphiphilicity.
| References |
|---|
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.
Chou, J.J., Li, S., Klee, C.B., and Bax, A. 2001. Solution structure of Ca2+-calmodulin reveals flexible hand-like properties of its domains. Nat. Struct. Biol. 8: 990–997.
Hardin, C., Pogorelov, T.V., and Luthey-Schulten, Z. 2002. Ab initio protein structure prediction. Curr. Opin. Struct. Biol. 12: 176–181.
Hirokawa, T., Boon-Chieng, S., and Mitaku, S. 1998. SOSUI: Classification and secondary structure prediction system for membrane proteins. Bioinformatics 14: 378–379.
Houdusse, A., Love, M.L., Dominguez, R., Grabarek, Z., and Cohen, C. 1997. Structures of four Ca2+-bound troponin C at 2.0 Å resolution: Further insights into the Ca2+-switch in the calmodulin superfamily. Structure 5: 1695–1711.
Ikura, M., Clore, G.M., Gronenborn, A.M., Zhu, G., Klee, C.B., and Bax, A. 1992. Solution structure of a calmodulin-target peptide complex by multidimensional NMR. Science 256: 632–638.
Jones, D.T. and Ward, J.J. 2003. Prediction of disordered regions in proteins from position specific score matrices. Proteins 53: 573–578.
Kauzmann, W. 1959. Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14: 1–63.
Kyte, J. and Doolittle, R.F. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157: 105–132.
Meador, W.R., Means, A.R., and Quiocho, F.A. 1992. Target enzyme recognition by calmodulin: 2.4 Å structure of a calmodulin-peptide complex. Science 257: 1251–1255.
Mitaku, S., Hirokawa, T., and Tsuji, T. 2002. Amphiphilicity index of polar amino acids as an aid in the characterization of amino acid preference at membrane-water interfaces. Bioinformatics 18: 608–616.
Pavletich, N.P. and Pabo, C.O. 1993. Crystal structure of a five-finger GLI-DNA complex: New perspectives on zinc fingers. Science 261: 1701–1707.
Ward, J.J., Sodhi, J.S., McGuffin, L.J., Buxton, B.F., and Jones, D.T. 2004. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol. 337: 635–645.
Wright, P.E. and Dyson, H.J. 1999. Intrinsically unstructured proteins: Reassessing the protein structure-function paradigm. J. Mol. Biol. 293: 321–331.
Zhang, M., Tanaka, T., and Ikura, M. 1995. Calcium-induced conformational transition revealed by the solution structure of apo calmodulin. Nat. Struct. Biol. 2: 758–767.