1 CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
2 Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, New York, NY 10032, USA
3 North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
Reprint requests to: Burkhard Rost, Columbia University, New York, NY 10032, USA; e-mail: rost{at}columbia.edu; fax: (212) 305-7932.
(RECEIVED May 5, 2002; FINAL REVISION September 16, 2002; ACCEPTED September 16, 2002)
Terminology: Advanced prediction methods: all methods that do not exclusively use a hydrophobicity scale; simple prediction methods: membrane prediction methods exclusively based on hydrophobicity scales; loop: referring to the region that connects two transmembrane helices in sequence; in particular, such loops could consist of entire structural domains.
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0214602.
| Abstract |
|---|
Keywords: Membrane proteins; protein structure prediction; predicting transmembrane helices; bioinformatics
Abbreviations: 3D, three-dimensional; DSSP, program assigning secondary structure (Kabsch and Sander 1983); HMM, hidden Markov model; PDB, Protein Data Bank of experimentally determined 3D structures of proteins (Bernstein et al. 1977; Berman et al. 2000); SWISS-PROT, database of protein sequences (Bairoch and Apweiler 2000); TM, transmembrane; TMH, transmembrane helix.
| Introduction |
|---|
20 percentage points less accurate.
Distribution of membrane helix length: a crucial parameter for prediction.
Prediction methods typically exploit the observation that TMH are predominantly apolar and the belief that they are between 17 and 25 residues long (von Heijne 1996). Most prediction methods use upper and lower bounds for the length of membrane helices explicitly, in one of two ways. (1) Some methods identify as membrane helices only those hydrophobic regions whose length falls into the typical interval (von Heijne 1992; Casadio et al. 1996; Persson and Argos 1996; Hirokawa et al. 1998; Ikeda et al. 2001; Jayasinghe et al. 2001). (2) Other methods search for the best path through a predicted landscape of membrane helix propensity that is compatible with such upper and lower bounds (Jones et al. 1994; Rost et al. 1996a, b; Krogh et al. 2001; Tusnady and Simon 2001). James Bowie found that the helix length distribution in three high-resolution structures was shifted toward longer helices (Bowie 1997).
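Approach (1) can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle scale (Kyte and Doolittle 1982), a centered sliding window of 7 residues, and a hydropathy threshold of 1.6. These are illustrative choices, not the parameters of any method cited above; only the 17-25 residue length interval comes from the text. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def smoothed_profile(seq, scale, window):
    """Mean scale value in a centered window (truncated at the sequence ends)."""
    half = window // 2
    values = [scale.get(aa, 0.0) for aa in seq]  # unknown residues count as 0
    profile = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile


def predict_tmh(seq, scale=KD, window=7, threshold=1.6, min_len=17, max_len=25):
    """Return (start, end) of hydrophobic runs whose length lies in [min_len, max_len]."""
    profile = smoothed_profile(seq, scale, window)
    segments, start = [], None
    # A -inf sentinel closes a run that reaches the C-terminus.
    for i, value in enumerate(profile + [float("-inf")]):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if min_len <= i - start <= max_len:
                segments.append((start, i - 1))
            start = None
    return segments


print(predict_tmh("D" * 15 + "L" * 20 + "D" * 15))  # [(16, 33)]
print(predict_tmh("D" * 15 + "L" * 40 + "D" * 15))  # []
```

In the second call the hydrophobic run is 38 residues long, so the hard upper bound of 25 discards it entirely. A fixed length interval of this kind has no way to represent membrane helices longer than its upper bound.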
Here, we re-evaluated the length distributions of TMH and of the loops between helices, based on a significantly larger dataset than previously used (Bowie 1997). We then analyzed 28 prediction methods in terms of their performance on short loops and long membrane helices.
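Re-evaluating the two length distributions amounts to collecting, for every protein, the lengths of its membrane helices and of the loops between consecutive helices. The following is a minimal sketch. It assumes each protein is given as a per-residue string using only 'T' for TM residues and '-' for everything else, which is the notation of the example in Materials and Methods. Following the Terminology, only regions between two TMH count as loops, so N- and C-terminal tails are dropped. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def segment_lengths(topology):
    """Split a 'T'/'-' string into helix lengths and interior loop lengths."""
    runs = [(state, len(list(group))) for state, group in groupby(topology)]
    helices = [n for state, n in runs if state == "T"]
    # With only two symbols, a non-TM run that is neither first nor last
    # is flanked by helices on both sides, i.e. it is a loop.
    loops = [n for k, (state, n) in enumerate(runs)
             if state != "T" and 0 < k < len(runs) - 1]
    return helices, loops


def length_histograms(topologies):
    """Pool helix and loop lengths over many proteins."""
    helix_hist, loop_hist = Counter(), Counter()
    for topology in topologies:
        helices, loops = segment_lengths(topology)
        helix_hist.update(helices)
        loop_hist.update(loops)
    return helix_hist, loop_hist


print(segment_lengths("--TTTT---TTT-TTTTT--"))  # ([4, 3, 5], [3, 1])
```

From the pooled loop histogram, the share of very short loops is then simply the count of loops at or below a chosen length divided by the total number of loops.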
| Results and Discussion |
|---|
5 residues long. Obviously, we cannot expect the 36 sequence-unique high-resolution chains used in our study (see Materials and Methods) to be fully representative of all helical membrane proteins. Given that we predict about 20,000 helical membrane proteins in the five entirely sequenced eukaryotes alone (Liu and Rost 2001, 2002), we also doubt that the 165 sequence-unique low-resolution proteins (see Materials and Methods) are more representative. Clearly, the high-resolution data are more accurate than the low-resolution data. Thus, our data suggest that a considerable percentage of all loops between membrane helices are very short.
90% of the loops longer than 15 residues were correctly detected by the advanced prediction methods, <60% of the loops of ≤5 residues were identified (Fig. 4, left graph).
32 residues used to group membrane helices into short and long, helices shorter than N were predicted less accurately than helices longer than N; (2) the trend was inverted for helices longer than 32 residues (only available for high-resolution data): these very long helices were predicted less accurately than all other helices; and (3) helices shorter than 17–20 residues posed an even stronger challenge to prediction methods than shorter helices (a significant drop in accuracy is shown in Figure 5).
20% of all membrane helices are longer than 32 residues (Fig. 1).
5000 proteins with a helix longer than 32 residues (Liu and Rost 2001, 2002).
33 residues) as being (1) correct, (2) incorrectly cut into two membrane helices, or (3) not predicted at all (Fig. 6).
71% of the long helices correctly. In contrast to the advanced methods, the simple methods incorrectly split long helices six times more often than they missed them. This may suggest that the difficulty simple methods have with long helices is primarily due to overpredicting helical regions. This is supported by the fact that simple methods have high sensitivity but poor specificity in detecting TMH (Chen et al. 2002). Advanced methods, in contrast, detect TMH with better specificity, as indicated by their high accuracy in predicting even long helices. The price of this specificity, however, is that they can miss some TMH.
| Materials and methods |
|---|
Advanced prediction methods.
We referred to prediction methods as advanced when they implement more than simple hydrophobicity scales. We tested the following programs: DAS, HMMTOP (version 2), PHDhtm, PHDpsihtm, PRED-TMR, SOSUI, TMHMM (version 2), and TopPred2. TopPred2 averages the GES hydrophobicity scale (Engelman et al. 1986) over a trapezoid window (von Heijne 1992; Sipos and von Heijne 1993). PHDhtm combines a neural network using evolutionary information with a dynamic programming optimization of the final prediction (Rost et al. 1995, 1996b). PHDpsihtm uses PSI-BLAST (Altschul et al. 1997) alignments as input (B. Rost, unpubl.). DAS optimizes the use of hydrophobicity plots (Cserzö et al. 1997). SOSUI (Hirokawa et al. 1998) uses a combination of hydrophobicity and amphiphilicity preferences to predict membrane helices. TMHMM is the most advanced (and seemingly most accurate) current method for predicting membrane helices (Sonnhammer et al. 1998). It embeds a number of statistical preferences and rules into a hidden Markov model to optimize the prediction of the location of membrane helices and their orientation. (Note: Similar concepts are used in HMMTOP; Tusnady and Simon 1998.) PRED-TMR uses a standard hydrophobicity analysis with emphasis on detecting the beginnings and ends of membrane helices (Pasquier et al. 1999).
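The trapezoid window mentioned for TopPred2 gives full weight to a central core of residues and progressively less weight to the flanks. The sketch below is a generic version of that idea and should not be read as TopPred2's actual implementation. Several details are assumptions for illustration: the default widths (a 21-residue window with an 11-residue core), the linear ramp, and the truncation of the window at the sequence ends. The scale is passed in as a plain dictionary, for example with GES values, which are not reproduced here.

```python
def trapezoid_weights(full=21, core=11):
    """Weights: 1.0 over the core, linear ramps on both flanks (assumed shape)."""
    if full % 2 == 0 or core % 2 == 0 or core > full:
        raise ValueError("full and core must be odd, with core <= full")
    flank = (full - core) // 2
    ramp = [(j + 1) / (flank + 1) for j in range(flank)]  # strictly between 0 and 1
    return ramp + [1.0] * core + ramp[::-1]


def trapezoid_profile(seq, scale, full=21, core=11):
    """Weighted mean of scale values around each residue."""
    weights = trapezoid_weights(full, core)
    half = full // 2
    values = [scale.get(aa, 0.0) for aa in seq]
    profile = []
    for i in range(len(seq)):
        num = den = 0.0
        for offset, weight in zip(range(-half, half + 1), weights):
            j = i + offset
            if 0 <= j < len(seq):  # window truncated at the sequence ends
                num += weight * values[j]
                den += weight
        profile.append(num / den)
    return profile


print(trapezoid_weights(7, 3))  # ramp of 1/3, 2/3, then core of 1.0, then mirrored
```

Compared with the flat window of a plain average, the tapered flanks make the profile change more smoothly as a hydrophobic segment enters or leaves the window.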
Simple methods exclusively based on hydrophobicity scales.
We also implemented in-house prediction methods based exclusively on hydrophobicity scales. In particular, we tested the following scales: A-Cid, normalized hydrophobicity scale for proteins (Cid et al. 1992); Av-Cid, normalized average hydrophobicity scale (Cid et al. 1992); Ben-Tal, hydrophobicity scale representing the free energy of transfer of an amino acid from water into the center of the hydrocarbon region of a model lipid bilayer (Kessel and Ben-Tal 2002); Bull-Breese, Bull-Breese hydrophobicity scale (Bull and Breese 1974); Eisenberg, normalized consensus hydrophobicity scale (Eisenberg et al. 1984); EM, solvation free energy (Eisenberg and McLachlan 1986); Fauchere, hydrophobic parameter from the partitioning of N-acetyl-amino acid amides (Fauchere and Pliska 1983); GES, hydrophobicity property (Engelman et al. 1986); Heijne, transfer free energy to a lipophilic phase (von Heijne and Blomberg 1979); Hopp-Woods, Hopp-Woods hydrophilicity value (Hopp and Woods 1981); KD, Kyte-Doolittle hydropathy index (Kyte and Doolittle 1982); Lawson, transfer free energy (Lawson et al. 1984); Levitt, hydrophobic parameter (Levitt 1976); Nakashima, normalized composition of membrane proteins (Nakashima et al. 1990); Radzicka, transfer free energy from 1-octanol to water (Radzicka and Wolfenden 1988); Roseman, solvation-corrected side chain hydropathy (Roseman 1988); Sweet, optimal matching hydrophobicity (Sweet and Eisenberg 1983); Wolfenden, hydration potential (Wolfenden et al. 1981); and WW, Wimley-White scale (Jayasinghe et al. 2001). To evaluate the predictive performance of each index, we ran the WW algorithm with the WW scale replaced by that index.
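Swapping each index into one algorithm requires the scales to be comparable. They differ in units and range, and Hopp-Woods measures hydrophilicity rather than hydrophobicity. The text does not say how the scales were aligned. The sketch below assumes z-score normalization plus a sign flip for scales in which larger values mean more polar. Beyond Hopp-Woods, which the text labels a hydrophilicity scale, the need for a flip has to be checked scale by scale. The function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def normalize_scale(scale, hydrophilic=False):
    """Z-score a residue scale; flip the sign so larger always means more hydrophobic."""
    values = list(scale.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("a constant scale cannot be normalized")
    sign = -1.0 if hydrophilic else 1.0
    return {aa: sign * (v - mu) / sd for aa, v in scale.items()}


def normalize_all(scales, hydrophilic_names=()):
    """Normalize a dict of named scales; names in hydrophilic_names are flipped."""
    return {name: normalize_scale(s, name in hydrophilic_names)
            for name, s in scales.items()}


toy = {"toy_hydrophobic": {"A": 1.0, "B": 3.0},
       "toy_hydrophilic": {"A": 1.0, "B": 3.0}}
print(normalize_all(toy, {"toy_hydrophilic"}))
```

After this step every scale has mean 0 and unit spread, so a single threshold in the shared algorithm means roughly the same thing for every index.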
Measuring accuracy.
To establish whether short loops and long membrane helices pose particular problems for prediction methods, we had to deviate from the scores commonly used to evaluate the performance of membrane prediction methods (Chen et al. 2002). In particular, we introduced the following scores, which describe the difference in performance between short and long loops (QL(N), Eq. 1) and between short and long TMH (QT(N), Eq. 2).
(1) Short loops.
We evaluated the performance of predicting short loops, that is, regions of ≤N residues connecting two membrane helices, by computing the difference between the accuracy in predicting short and long loops:
(Eq. 1)
QL(N) ranges from -100 to 100; negative values indicate that longer loops are predicted more accurately than shorter ones.
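Written out, one form consistent with this description is shown below. It is a reconstruction under stated assumptions, not the published Eq. 1. It assumes that short loops have at most N residues and long loops more than N, and it uses the hypothetical symbols $n^{\mathrm{id}}$ and $n^{\mathrm{obs}}$ for the numbers of correctly identified and observed loops in each class.

```latex
Q_L(N) = 100 \left( \frac{n^{\mathrm{id}}_{\le N}}{n^{\mathrm{obs}}_{\le N}}
                  - \frac{n^{\mathrm{id}}_{> N}}{n^{\mathrm{obs}}_{> N}} \right)
```

Both ratios lie between 0 and 1, which yields the stated range of -100 to 100. The score turns negative exactly when long loops are identified at a higher rate than short ones.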
(2) Long helices.
In analogy to the score for short loops, we evaluated the performance of predicting long TMH by computing the difference between the accuracy in predicting short and long helices:
(Eq. 2)
Here, Nidentified is the number of TMH in a given length class (shorter than N residues, or N residues and longer) that were correctly predicted, and Ntm is the number of TMH observed in that class. We considered a helix to be correctly predicted if it overlapped the observed helix by at least 3 residues and was predicted as one continuous helix over the region of the observed helix. This measure is illustrated by the following example (T = TM; - = loop):
Observed:  ---------TTTTTTTTTTTTTTTTTTTTT--------------
Predict 1: -------------------------TTTTTTTTTT--------
Predict 2: ---TTTTTTTTTTTTTT-TTTTTTTTTTTTTTTT---------
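In this example, Predict 1 counts as correct and Predict 2 as wrong. The measure is designed only to capture whether methods tend to split long TMH, so a single continuous prediction that covers only part of the observed helix is accepted, whereas a prediction that breaks the helix into two pieces is not.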
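The correctness criterion and the three outcome classes used for long helices in Results (correct, cut into two, not predicted) can be sketched together. The 3-residue overlap comes from the text. The remaining rules are assumptions: two or more predicted segments touching the observed helix count as a split, and a single segment overlapping by fewer than 3 residues counts as a miss. The test strings are modeled on the example above and built with explicit run lengths. The function names are hypothetical.

```python
import re


def segments(topology, symbol="T"):
    """Inclusive (start, end) indices of each maximal run of `symbol`."""
    pattern = re.escape(symbol) + "+"
    return [(m.start(), m.end() - 1) for m in re.finditer(pattern, topology)]


def classify_helix(observed_helix, predicted, min_overlap=3):
    """Label one observed TMH as 'correct', 'split', or 'missed'."""
    start, end = observed_helix
    overlaps = []
    for p_start, p_end in segments(predicted):
        shared = min(end, p_end) - max(start, p_start) + 1
        if shared > 0:
            overlaps.append(shared)
    if len(overlaps) >= 2:
        return "split"  # helix broken into several predicted pieces
    if overlaps and overlaps[0] >= min_overlap:
        return "correct"  # one continuous prediction with enough overlap
    return "missed"


observed = "-" * 9 + "T" * 21 + "-" * 14
predict_1 = "-" * 25 + "T" * 10 + "-" * 8
predict_2 = "-" * 3 + "T" * 14 + "-" + "T" * 16 + "-" * 9
helix = segments(observed)[0]  # (9, 29)
print(classify_helix(helix, predict_1))  # correct
print(classify_helix(helix, predict_2))  # split
```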
QT(N) ranges from -100 to 100; it becomes negative if helices shorter than N residues are predicted more accurately than helices of N or more residues.
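Given per-helix outcomes, QT(N) can be computed for any threshold N. The following is a minimal sketch. It assumes that short helices have fewer than N residues and long helices N or more, which is consistent with the wording above. The helper names and the example data are hypothetical, and the score is returned as None when one length class is empty.

```python
from typing import Iterable, Optional, Tuple


def _accuracy(flags):
    """Fraction of True values, or None for an empty class."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else None


def q_t(helices: Iterable[Tuple[int, bool]], n: int) -> Optional[float]:
    """QT(N): accuracy on helices >= n minus accuracy on helices < n, in percent.

    `helices` holds (observed length, correctly predicted?) pairs.
    """
    helices = list(helices)
    short = _accuracy(ok for length, ok in helices if length < n)
    long_ = _accuracy(ok for length, ok in helices if length >= n)
    if short is None or long_ is None:
        return None  # undefined when one length class is empty
    return 100.0 * (long_ - short)


# Hypothetical outcomes: (helix length, correctly predicted?)
outcomes = [(18, True), (20, True), (22, False), (35, False), (40, True)]
print(q_t(outcomes, 33))  # about -16.67: short helices predicted more accurately
```

Sweeping N over a range of thresholds gives the kind of threshold dependence discussed in Results. The loop score QL(N) would follow the same pattern with loop lengths, but with the sign reversed (short minus long), as its definition requires.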
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Arkin, I.T., Brünger, A.T., and Engelman, D.M. 1997. Are there dominant membrane protein families with a given number of helices? Proteins 28: 465–466.
Bairoch, A. and Apweiler, R. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28: 45–48.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.
Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F., Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and Tasumi, M. 1977. The Protein Data Bank: A computer based archival file for macromolecular structures. J. Mol. Biol. 112: 535–542.
Bowie, J.U. 1997. Helix packing in membrane proteins. J. Mol. Biol. 272: 780–799.
Bull, H.B. and Breese, K. 1974. Surface tension of amino acid solutions: A hydrophobicity scale of the amino acid residues. Arch. Biochem. Biophys. 161: 665–670.
Casadio, R., Fariselli, P., Taroni, C., and Compiani, M. 1996. A predictor of transmembrane α-helix domains of proteins based on neural networks. Eur. J. Biophys. 24: 165–178.
Chen, C.P., Kernytsky, A., and Rost, B. 2002. Transmembrane helix predictions revisited. Protein Sci. (this issue).
Cid, H., Bunster, M., Canales, M., and Gazitua, F. 1992. Hydrophobicity and structural classes in proteins. Prot. Engin. 5: 373–375.
Cserzö, M., Wallin, E., Simon, I., von Heijne, G., and Elofsson, A. 1997. Prediction of transmembrane α-helices in prokaryotic membrane proteins: The dense alignment surface method. Prot. Engin. 10: 673–676.
Eisenberg, D. and McLachlan, A.D. 1986. Solvation energy in protein folding and binding. Nature 319: 199–203.
Eisenberg, D., Weiss, R.M., and Terwilliger, T.C. 1984. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc. Natl. Acad. Sci. 81: 140–144.
Engelman, D.M., Steitz, T.A., and Goldman, A. 1986. Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem. 15: 321–353.
Fauchere, J.L. and Pliska, V. 1983. Hydrophobic parameters π of amino-acid side chains from the partitioning of N-acetyl-amino-acid amides. Eur. J. Med. Chem. 18: 369–375.
Frishman, D. and Mewes, H.W. 1997. Protein structural classes in five complete genomes. Nature Struct. Biol. 4: 626–628.
Goffeau, A., Nakai, K., Slonimski, P., and Risler, J.-L. 1993. The membrane proteins encoded by yeast chromosome III genes. FEBS Lett. 325: 112–117.
Gupta, R., Jung, E., Gooley, A.A., Williams, K.L., Brunak, S., and Hansen, J. 1999. Scanning the available Dictyostelium discoideum proteome for O-linked GlcNAc glycosylation sites using neural networks. Glycobiology 9: 1009–1022.
Hirokawa, T., Boon-Chieng, S., and Mitaku, S. 1998. SOSUI: Classification and secondary structure prediction system for membrane proteins. Bioinformatics 14: 378–379.
Hopp, T.P. and Woods, K.R. 1981. Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. 78: 3824–3828.
Ikeda, M., Arai, M., Lao, D.M., and Shimizu, T. 2001. Transmembrane topology prediction methods: A reassessment and improvement by a consensus method using a dataset of experimentally-characterized transmembrane topologies. In Silico Biol. 1: http://www.bioinfo.de/isb/2001/2002/0003/.
Iverson, T.M., Luna-Chavez, C., Cecchini, G., and Rees, D.C. 1999. Structure of the E. coli fumarate reductase respiratory complex. Science 284: 1961.
Iwata, S., Lee, J.W., Okada, K., Lee, J.K., Iwata, M., Rasmussen, B., Link, T.A., Ramaswamy, S., and Jap, B.K. 1998. Complete structure of the 11-subunit bovine mitochondrial cytochrome BC1 complex. Science 281: 64–71.
Jayasinghe, S., Hristova, K., and White, S.H. 2001. Energetics, stability, and prediction of transmembrane helices. J. Mol. Biol. 312: 927–934.
Jones, D.T. 1998. Do transmembrane protein superfolds exist? FEBS Lett. 423: 281–285.
Jones, D.T., Taylor, W.R., and Thornton, J.M. 1994. A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochem. 33: 3038–3049.
Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen bonded and geometrical features. Biopolymers 22: 2577–2637.
Kessel, A. and Ben-Tal, N. 2002. Free energy determinants of peptide association with lipid bilayers. In Peptide–lipid interactions (eds. S. Simon and T. McIntosh). Academic Press, San Diego, CA (in press).
Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L. 2001. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 305: 567–580.
Kyte, J. and Doolittle, R.F. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157: 105–132.
Lawson, E.Q., Sadler, A.J., Harmatz, D., Brandau, D.T., Micanovic, R., MacElroy, R.D., and Middaugh, C.R. 1984. A simple experimental model for hydrophobic interactions in proteins. J. Biol. Chem. 259: 2910–2912.
Levitt, M. 1976. A simplified representation of protein conformations for rapid simulation of protein folding. J. Mol. Biol. 104: 59–107.
Liu, J. and Rost, B. 2001. Comparing function and structure between entire proteomes. Protein Sci. 10: 1970–1979.
Liu, J. and Rost, B. 2002. Target space for structural genomics revisited. Bioinformatics 18: 922–933.
Möller, S., Kriventseva, E.V., and Apweiler, R. 2000. A collection of well characterised integral membrane proteins. Bioinformatics 16: 1159–1160.
Möller, S., Croning, D.R., and Apweiler, R. 2001. Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17: 646–653.
Monne, M. and von Heijne, G. 2001. Effects of 'hydrophobic mismatch' on the location of transmembrane helices in the ER membrane. FEBS Lett. 496: 96–100.
Monne, M., Hermansson, M., and von Heijne, G. 1999a. A turn propensity scale for transmembrane helices. J. Mol. Biol. 288: 141–145.
Monne, M., Nilsson, I., Elofsson, A., and von Heijne, G. 1999b. Turns in transmembrane helices: Determination of the minimal length of a "helical hairpin" and derivation of a fine-grained turn propensity scale. J. Mol. Biol. 293: 807–814.
Nakashima, H., Nishikawa, K., and Ooi, T. 1990. Distinct character in hydrophobicity of amino acid composition of mitochondrial proteins. Proteins 8: 173–178.
Pasquier, C., Promponas, V.J., Palaios, G.A., Hamodrakas, J.S., and Hamodrakas, S.J. 1999. A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm. Protein Eng. 12: 381–385.
Persson, B. and Argos, P. 1996. Topology prediction of membrane proteins. Protein Sci. 5: 363–371.
Radzicka, A. and Wolfenden, R. 1988. Comparing the polarities of the amino acids: Side-chain distribution coefficients between the vapor phase, cyclohexane, 1-octanol, and neutral aqueous solution. Biochem. 27: 1664–1670.
Roseman, M.A. 1988. Hydrophilicity of polar amino acid side-chains is markedly reduced by flanking peptide bonds. J. Mol. Biol. 200: 513–522.
Rost, B. 1996. PHD: Predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol. 266: 525–539.
Rost, B. 1999. Twilight zone of protein sequence alignments. Protein Eng. 12: 85–94.
Rost, B. 2001. Protein secondary structure prediction continues to rise. J. Struct. Biol. 134: 204–218.
Rost, B., Casadio, R., Fariselli, P., and Sander, C. 1995. Prediction of helical transmembrane segments at 95% accuracy. Protein Sci. 4: 521–533.
Rost, B., Casadio, R., and Fariselli, P. 1996a. Refining neural network predictions for helical transmembrane proteins by dynamic programming. In Fourth International Conference on Intelligent Systems for Molecular Biology (eds. D. States), pp. 192–200. AAAI Press, St. Louis, MO, Menlo Park, CA.
Rost, B., Casadio, R., and Fariselli, P. 1996b. Topology prediction for helical transmembrane proteins at 86% accuracy. Protein Sci. 5: 1704–1718.
Sayle, R.A. and Milner-White, E.J. 1995. RASMOL: Biomolecular graphics for all. Trends Biochem. Sci. 20: 37.
Sipos, L. and von Heijne, G. 1993. Predicting the topology of eukaryotic membrane proteins. Eur. J. Biochem. 213: 1333–1340.
Sonnhammer, E.L.L., von Heijne, G., and Krogh, A. 1998. A hidden Markov model for predicting transmembrane helices in protein sequences. In Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB98) (eds. J. Glasgow), pp. 175–182. AAAI Press, Montreal, Canada.
Sweet, R.M. and Eisenberg, D. 1983. Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. J. Mol. Biol. 171: 479–488.
Toyoshima, C., Nakasako, M., Nomura, H., and Ogawa, H. 2000. Crystal structure of the calcium pump of sarcoplasmic reticulum at 2.6 Å resolution. Nature 405: 647.
Tsukihara, T., Aoyama, H., Yamashita, E., Tomizaki, T., Yamaguchi, H., Shinzawa-Itoh, K., Nakashima, R., Yaono, R., and Yoshikawa, S. 1996. The whole structure of the 13-subunit oxidized cytochrome C oxidase at 2.8 Å. Science 272: 1136.
Tusnady, G.E. and Simon, I. 1998. Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol. 283: 489–506.
Tusnady, G.E. and Simon, I. 2001. Topology of membrane proteins. J. Chem. Inf. Comput. Sci. 41: 364–368.
von Heijne, G. 1992. Membrane protein structure prediction. J. Mol. Biol. 225: 487–494.
von Heijne, G. 1996. Prediction of transmembrane protein topology. In Protein structure prediction (ed. M.J.E. Sternberg), pp. 101–110. Oxford Univ. Press, Oxford, UK.
von Heijne, G. and Blomberg, C. 1979. Trans-membrane translocation of proteins: The direct transfer model. Eur. J. Biochem. 97: 175–181.
Wallin, E. and von Heijne, G. 1998. Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 7: 1029–1038.
Wolfenden, R., Andersson, L., Cullis, P.M., and Southgate, C.C.B. 1981. Affinities of amino acid side chains for solvent water. Biochemistry 20: 849–855.