School of Biochemistry and Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK
Reprint requests to: David R. Westhead, School of Biochemistry and Molecular Biology, University of Leeds, Leeds, LS2 9JT, UK; e-mail: westhead{at}bmb.leeds.ac.uk; fax: 44-113-2333167.
(RECEIVED February 14, 2003; FINAL REVISION May 23, 2003; ACCEPTED May 28, 2003)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0306303.
| Abstract |
|---|
Keywords: Evolution; conservation; protein surface; protein interactions; bioinformatics; computational
| Introduction |
|---|
All the above studies were based on information from sequence alone, with inhibitory residues drawn from studies such as site-directed mutagenesis. Without structural information about enzyme-inhibitor complexes, it is impossible to know precisely which residues are involved in molecular recognition at the interface. Here, we extend the work of Valdar and Thornton (2001) on homodimers by providing a thorough and systematic sequence-structure study of how the pattern of conservation at the interface differs from the noninteracting surface in seven proteases and their inhibitors, all of which have been used to test docking algorithms in the past. We then discuss the relevance of the results to binding site prediction and their impact on attempts to solve the docking problem.
We were well aware that conservation at enzyme interfaces had already been thoroughly investigated; therefore, our main focus in this study was on the inhibitor interfaces. With this in mind, our priority in choosing the test cases was to represent as many different inhibitor families as possible, regardless of the enzymes with which they were in complex. As a consequence, there was some redundancy between the enzyme test cases, but seven different inhibitor families were represented, including two previously uncharacterized interfaces of the leguminous Kunitz and the Streptomyces subtilisin inhibitor (SSI) families (Table 1).
| Results |
|---|
| Discussion |
|---|
The most significant factor making binding site prediction easier for proteases than inhibitors is the lack of conserved vertices at the noninteracting surface of the protease. Even though the cystatin, Kunitz, and potato chymotrypsin I type inhibitors all follow the "rules" of binding site prediction to some extent, there is still a significant amount of conservation outside the interface, which makes identifying the conserved cluster of vertices at the actual interface difficult. This also supports the hypothesis that the residues in the interface are not wholly responsible for the specificity of the inhibitor, and there are some functionally important residues elsewhere (Pritchard and Dufton 1999).
The results described here have implications for the protein-protein docking problem. Docking algorithms are often facilitated by first identifying the binding sites on the two interacting proteins, assuming that the largest cluster of conserved residues occurs at the interface. Many test cases involve enzyme-inhibitor complexes, but in almost all the inhibitors studied so far, binding site prediction would fail. When the inhibitor is small, locating the binding site on the enzyme alone may suffice, but in instances where the inhibitor is large, binding site prediction for both the enzyme and the inhibitor could be important in reducing computing time for the docking calculation.
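The "largest cluster of conserved residues" assumption mentioned above can be illustrated with a minimal sketch. It groups conserved surface residues by single-linkage clustering on a distance cutoff and returns the largest group. The function name, the use of one representative point per residue, and the cutoff value are all assumptions made for illustration; they are not taken from any particular docking program or from this study.

```python
from math import dist  # Python 3.8+


def largest_conserved_cluster(points, cutoff):
    """Return the names in the largest single-linkage cluster.

    points: dict mapping a residue name to an (x, y, z) tuple for a
            conserved surface residue (hypothetical representative point).
    cutoff: maximum distance (same units as the coordinates) for two
            residues to be linked; an assumed, user-chosen parameter.
    """
    unseen = set(points)
    best = set()
    while unseen:
        seed = unseen.pop()
        cluster = {seed}
        stack = [seed]
        # Grow the cluster by repeatedly pulling in unvisited neighbours.
        while stack:
            current = stack.pop()
            near = {n for n in unseen
                    if dist(points[current], points[n]) <= cutoff}
            unseen -= near
            cluster |= near
            stack.extend(near)
        if len(cluster) > len(best):
            best = cluster
    return best
```

If an inhibitor carries substantial conservation away from the interface, as reported here, the largest cluster found this way need not coincide with the true binding site.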
| Materials and methods |
|---|
BLAST search
For each test case, a BLAST search (Altschul et al. 1997) for close homologous sequences of both the enzyme and inhibitor sequences was carried out against the Swissprot v40.38 database. All query sequences were extracted from the ATOM records of the PDB file rather than from the SEQRES records. SEQRES records often differ from the ATOM records, so their use may have produced errors when mapping the evolution rates calculated at each amino acid position onto the protein structure.
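As a rough illustration of reading a query sequence from ATOM records, the sketch below collects one residue per unique residue number (plus insertion code) for a given chain, so every position in the sequence has coordinates in the structure. It assumes standard fixed-column PDB formatting and only the 20 standard amino acids; the function name `sequence_from_atom_records` is hypothetical and this is not the code used in the study.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_atom_records(pdb_lines, chain_id):
    """Build a one-letter sequence from the ATOM records of one chain."""
    sequence = []
    seen = set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # use only the first model of a multi-model file
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain_id:  # column 22: chain identifier
            continue
        residue_key = line[22:27]  # columns 23-27: number + insertion code
        if residue_key in seen:
            continue
        seen.add(residue_key)
        # Columns 18-20 hold the residue name; unknown names become "X".
        sequence.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(sequence)
```

Residues present in SEQRES but missing from the coordinates simply do not appear in the result, which is the property that keeps per-position scores aligned with the structure.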
Each BLAST hit was validated manually before being included in the multiple sequence alignment. Only hits with an E-value of less than 0.001 and containing the domain characteristic of the family to which the query sequence belonged were accepted. If a protein contained more than one such domain, then each domain was isolated and treated as a separate sequence. Hits not experimentally proven to perform a similar function to the query, such as probable or hypothetical sequences, were removed, as were fragments and proteins containing ambiguities within the sequence.
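The acceptance criteria above were applied manually, but they can be summarized as a simple filter. The sketch below is only an approximation of that manual process: the `Hit` record, its fields, and the keyword list used to spot unproven annotations are hypothetical, and the domain and fragment flags are assumed to have been determined beforehand.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.001
# Assumed annotation keywords marking hits without experimental support.
UNPROVEN_WORDS = ("probable", "hypothetical")
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class Hit:
    description: str
    e_value: float
    sequence: str
    has_family_domain: bool  # contains the family's characteristic domain
    is_fragment: bool


def accept_hit(hit):
    """Return True if a hit passes all of the stated criteria."""
    if hit.e_value >= E_VALUE_CUTOFF:
        return False
    if not hit.has_family_domain or hit.is_fragment:
        return False
    text = hit.description.lower()
    if any(word in text for word in UNPROVEN_WORDS):
        return False
    # Reject sequences with ambiguity codes such as X, B, or Z.
    return set(hit.sequence.upper()) <= STANDARD_RESIDUES
```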
Multiple sequence alignment
The homologous sequences from the BLAST search and the query sequence were written to a single file in FASTA format. This file was then used as input to CLUSTAL W, a program that will globally align all the protein sequences and output a multiple sequence alignment file (Thompson et al. 1994).
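For readers unfamiliar with the format, a minimal sketch of assembling the query and its accepted homologues into FASTA text is given below. The function name and the 60-character line width are illustrative choices, not details from the study.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    records: iterable of (name, sequence) tuples, query first.
    width:   number of residues per line (an arbitrary, common choice).
    """
    lines = []
    for name, sequence in records:
        lines.append(">" + name)
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"
```

The resulting text can be written to a single file and passed to an alignment program such as CLUSTAL W.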
Calculating the rate of evolution
The rate of evolution was calculated for each residue in the protein sequence using the Rate4Site algorithm (Pupko et al. 2002) accessed via the Consurf server (Glaser et al. 2003). Rate4Site is an extension of the evolutionary trace method devised by Lichtarge et al. (1996) but utilizes an improved tree-building approach. For each multiple sequence alignment, Rate4Site builds an evolutionary tree and then calculates a conservation score for each residue position using either a maximum likelihood or maximum parsimony method. We used the maximum likelihood method. Each score is normalized, so that the average score for all residues is zero, and the standard deviation is 1. This means that the lowest scoring position is not always absolutely conserved, but is considered to be the most conserved residue in that particular protein. The scores are then divided into 4.5 equal intervals above and below zero, thus producing nine levels of conservation. Level one contains residues undergoing the fastest rates of evolution in the protein, whereas level nine contains residues undergoing the slowest rates of evolution. For this study, residues were further classified as "variable" (levels 1-3), "intermediate" (levels 4-6), or "conserved" (levels 7-9).
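The normalization and binning step can be sketched as follows. This does not implement Rate4Site itself; it takes per-residue rates as given and shows one plausible reading of "4.5 equal intervals above and below zero", in which the central level straddles zero and four further levels lie on each side. The exact bin boundaries, the use of the population standard deviation, and the function names are assumptions rather than details confirmed by the source.

```python
from math import sqrt


def conservation_levels(rates):
    """Map raw per-residue rates to levels 1 (fastest) to 9 (slowest)."""
    n = len(rates)
    mean = sum(rates) / n
    sd = sqrt(sum((r - mean) ** 2 for r in rates) / n)
    if sd == 0:
        raise ValueError("all rates are identical; cannot normalize")
    scores = [(r - mean) / sd for r in rates]  # mean 0, SD 1
    # 4.5 equal intervals on each side of zero (assumed boundaries).
    width_neg = -min(scores) / 4.5
    width_pos = max(scores) / 4.5
    levels = []
    for s in scores:
        if s < 0:  # slower than average: more conserved
            steps = min(int(-s / width_neg + 0.5), 4)
            levels.append(5 + steps)
        else:      # faster than average: more variable
            steps = min(int(s / width_pos + 0.5), 4)
            levels.append(5 - steps)
    return levels


def conservation_class(level):
    """Collapse the nine levels into the three classes used in the text."""
    if level <= 3:
        return "variable"
    if level <= 6:
        return "intermediate"
    return "conserved"
```

Because the intervals are scaled to the minimum and maximum scores of each protein, level nine marks the most conserved positions of that protein rather than absolute invariance, as noted above.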
Protein surface generation and interface definition
All the protein surfaces used in this study were solvent-excluded surfaces (Connolly 1983) generated with a probe sphere of radius 1.5 Å using code developed by Sanner and Olson (1996). As well as the coordinates of the surface vertices, the program also supplied the solvent-accessible surface area of each atom. An atom was defined as part of the interface if it lost more than 99% of its solvent-accessible surface area upon complex formation. Although thresholds below 99% did not make any significant difference to the results, a strict threshold ensured that only the innermost interface atoms were selected. Any atom not allocated to the interface was deemed part of the "noninteracting" surface. Given that each surface atom corresponded to at least one surface vertex, if an atom was part of the interface, then all the surface vertices associated with that atom were assigned to the interface as well. Likewise, given that each surface residue was equivalent to many surface vertices (usually around 30), each surface vertex was labeled with the conservation class associated with its corresponding residue.
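The interface definition and vertex labeling can be expressed compactly. In the sketch below, per-atom solvent-accessible surface areas for the free and complexed protein are assumed to be available already (for example, from a surface program); the data layout and function names are hypothetical and chosen only for illustration.

```python
def interface_atoms(sasa_free, sasa_complex, threshold=0.99):
    """Return atoms losing more than `threshold` of their accessible area.

    sasa_free, sasa_complex: dicts mapping an atom identifier to its
    solvent-accessible surface area in the free and complexed states.
    """
    selected = set()
    for atom, free_area in sasa_free.items():
        if free_area <= 0:
            continue  # buried atoms are not on the surface at all
        fraction_lost = (free_area - sasa_complex[atom]) / free_area
        if fraction_lost > threshold:
            selected.add(atom)
    return selected


def label_vertices(vertices, interface, residue_class):
    """Attach a region and a conservation class to each surface vertex.

    vertices:      list of (vertex_id, atom_id, residue_id) tuples.
    interface:     set of atom identifiers from interface_atoms().
    residue_class: dict mapping residue_id to its conservation class.
    """
    labelled = []
    for vertex_id, atom_id, residue_id in vertices:
        region = "interface" if atom_id in interface else "noninteracting"
        labelled.append((vertex_id, region, residue_class[residue_id]))
    return labelled
```

Counting labelled vertices by region and class then gives the kind of interface versus noninteracting comparison described in this study.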
| Electronic supplemental material |
|---|
Diskette 2 contains the seven multiple sequence alignment files of the inhibitor test cases.
Diskette 1 filenames: 1avw_A.aln, 1cho_E.aln, 1stf_E.aln, 1tab_E.aln, 2ptc_E.aln, 2sic_E.aln, 2sni_E.aln, supp_tables.doc. Diskette 2 filenames: 1avw_B.aln, 1cho_I.aln, 1stf_I.aln, 1tab_I.aln, 2ptc_I.aln, 2sic_I.aln, 2sni_I.aln.
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30: 276-280.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The protein data bank. Nucleic Acids Res. 28: 235-242.
Beuning, L.L., Spriggs, T.W., and Christeller, J.T. 1994. Evolution of the proteinase inhibitor I family and apparent lack of hypervariability in the proteinase contact group. J. Mol. Evol. 39: 644-654.
Borriello, F. and Krauter, K.S. 1991. Multiple murine α1-protease inhibitor genes show unusual evolutionary divergence. Proc. Natl. Acad. Sci. 88: 9417-9421.
Chen, R., Mintseris, J., Janin, J., and Weng, Z. 2003. A protein-protein docking benchmark. Proteins 52: 88-91.
Connolly, M.L. 1983. Analytical molecular surface calculation. J. Appl. Crystallogr. 16: 548-558.
Creighton, T.E. and Charles, I.G. 1987. Biosynthesis, processing, and evolution of bovine pancreatic trypsin inhibitor. Cold Spring Harbor Symposium on Quantitative Biology. LII: 511-519.
Creighton, T.E. and Darby, N.J. 1989. Functional evolutionary divergence of proteolytic enzymes and their inhibitors. Trends Biosci. 14: 319-324.
Fujinaga, M., Sielecki, A.R., Read, R.J., Ardelt, W., Laskowski Jr., M., and James, M.N. 1987. Crystal and molecular structures of the complex of α-chymotrypsin with its inhibitor turkey ovomucoid third domain at 1.8 Å resolution. J. Mol. Biol. 195: 397-418.
Glaser, F., Pupko, T., Paz, I., Bell, R.E., Bechor-Shental, D., Martz, E., and Ben-Tal, N. 2003. Consurf: Identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 19: 163-164.
Goodwin, R.L., Baumann, H., and Berger, F.G. 1996. Patterns of divergence during evolution of α1-proteinase inhibitors in mammals. Mol. Biol. Evol. 13: 346-358.
Halperin, I., Ma, B., Wolfson, H., and Nussinov, R. 2002. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins 47: 409-443.
Hilder, V.A., Barker, R.F., Samour, R.A., Gatehouse, A.M.R., Gatehouse, J.A., and Boulter, D. 1989. Protein and cDNA sequences of Bowman-Birk protease inhibitors from the cowpea (Vigna unguiculata Walp.). Plant Mol. Biol. 13: 701-710.
Hill, R.E. and Hastie, N.D. 1987. Accelerated evolution in the reactive centre regions of serine protease inhibitors. Nature 326: 96-99.
Inglis, J.D. and Hill, R.E. 1991. The murine Spi-2 proteinase inhibitor locus: A multigene family with a hypervariable reactive site domain. EMBO J. 10: 255-261.
Laskowski, M., Kato, I., Ardelt, W., Cook, J., Denton, A., Empie, M.W., Kohr, W.J., Park, S.J., Parks, K., and Schatzley, B.L. 1987. Ovomucoid third from 100 avian species: Isolation, sequences, and hypervariability of enzyme-inhibitor contact residues. Biochemistry 26: 202-221.
Laskowski Jr., M. and Kato, I. 1980. Protein inhibitors of proteinases. Annu. Rev. Biochem. 49: 593-626.
Laskowski Jr., M., Kato, I., Kohr, W.J., Park, S.J., Tashiro, M., and Whatley, H.E. 1987. Positive Darwinian selection in evolution of protein inhibitors of serine proteases. Cold Spring Harb. Symp. Quant. Biol. 52: 545-553.
Lichtarge, O. and Sowa, M.E. 2002. Evolutionary predictions of binding surfaces and interactions. Curr. Opin. Struct. Biol. 12: 21-27.
Lichtarge, O., Bourne, H.R., and Cohen, F.E. 1996. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257: 342-358.
Marquart, M., Walter, J., Deisenhofer, J., Bode, W., and Huber, R. 1983. The geometry of the reactive site and of the peptide groups in trypsin, trypsinogen and its complexes with inhibitors. Acta Crystallogr. Sect. B 39: 480-490.
McPhalen, C.A. and James, M.N. 1988. Structural comparison of two serine proteinase-protein inhibitor complexes: Eglin-c-subtilisin Carlsberg and CI-2-subtilisin Novo. Biochemistry 27: 6582-6598.
Pritchard, L. and Dufton, M.J. 1999. Evolutionary trace analysis of the Kunitz/BPTI family of proteins: Functional divergence may have been based on conformational adjustment. J. Mol. Biol. 285: 1589-1607.
Pupko, T., Bell, R.E., Mayrose, I., Glaser, F., and Ben-Tal, N. 2002. Rate4Site: An algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18 (Suppl 1): S71-S77.
Rheaume, C., Goodwin, R.L., Latimer, J.J., Baumann, H., and Berger, F.G. 1994. Evolution of murine α1-proteinase inhibitors: Gene amplification and reactive centre divergence. J. Mol. Evol. 38: 121-131.
Rypniewski, W.R., Perrakis, A., Vorgias, C.E., and Wilson, K.S. 1994. Evolutionary divergence and conservation of trypsin. Protein Eng. 7: 57-64.
Sanner, M.F. and Olson, A.J. 1996. Reduced surface: An efficient way to compute molecular surfaces. Biopolymers 38: 305-320.
Song, H.K. and Suh, S.W. 1998. Kunitz-type soybean trypsin inhibitor revisited: Refined structure of its complex with porcine trypsin reveals an insight into the interaction between a homologous inhibitor from Erythrina caffra and tissue-type plasminogen activator. J. Mol. Biol. 275: 347-363.
Stubbs, M.T., Laber, B., Bode, W., Huber, R., Jerala, R., Lenarcic, B., and Turk, V. 1990. The refined 2.4 Å X-ray crystal structure of recombinant human stefin B in complex with the cysteine proteinase papain: A novel type of proteinase inhibitor interaction. EMBO J. 9: 1939-1947.
Takeuchi, Y., Satow, Y., Nakamura, K.T., and Mitsui, Y. 1991. Refined crystal structure of the complex of subtilisin BPN' and Streptomyces subtilisin inhibitor at 1.8 Å resolution. J. Mol. Biol. 221: 309-325.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673-4680.
Tsunogae, Y., Tanaka, I., Yamane, T., Kikkawa, J., Ashida, T., Ishikawa, C., Watanabe, K., Nakamura, S., and Takahashi, K. 1986. Structure of the trypsin-binding domain of Bowman-Birk type protease inhibitor and its interaction with trypsin. J. Biochem. (Tokyo) 100: 1637-1646.
Valdar, W.S.J. and Thornton, J.M. 2001. Protein-protein interfaces: Analysis of amino acid conservation in homodimers. Proteins 42: 108-124.
Zang, X. and Maizels, R.M. 2001. Serine proteinase inhibitors from nematodes and the arms race between host and pathogen. Trends Biosci. 26: 191-197.