1 Bioinformatics Research Unit, Research and Development Division, Fujirebio, Inc., Hachioji-shi, Tokyo 192-0031, Japan
2 Basic Research Program, SAIC-Frederick, Inc., Center for Cancer Research Nanobiology Program, National Cancer Institute, Frederick, Maryland 21702, USA
3 Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Tel Aviv University, Tel Aviv 69978, Israel
(RECEIVED March 28, 2006; FINAL REVISION May 19, 2006; ACCEPTED May 24, 2006)
| Abstract |
|---|
Keywords: network; closeness centrality; characteristic path length; conserved central positions; active sites
| Introduction |
|---|
The identification of functionally important residues in proteins remains a difficult task. Different methods that combine sequence evolutionary considerations with structural information have been proposed and have successfully predicted active sites in various proteins (Lichtarge et al. 1996; Aloy et al. 2001; Landgraf et al. 2001; Ondrechen et al. 2001; del Sol Mesa et al. 2003). However, new approaches to predict active site residues are still needed, especially for proteins of known structure that have no sequence homologs in the databases. Given the link between protein fold and function, it is worthwhile to continue exploring the functional information embodied in the topology of the structure.
Recently, Amitai et al. (2004) showed that for a large set of enzymes, active site residues have high centrality values. This result suggested that residue centrality, as an inherent characteristic of the fold, may be evolutionarily maintained to guarantee protein function. Indeed, experimental studies have shown that mutations of most residues have little functional effect, while perturbation of a few residues, which are probably centrally located in the interaction network, impairs protein function (Terwillinger et al. 1994; Reddy et al. 1998; Taverna and Goldstein 2002). This led to two questions. First, how well does residue centrality perform in identifying active/binding site amino acids in different non-enzyme protein families? Second, what is the origin of the correlation between residue centrality and the amino acids important for protein function? Here, we address these questions by treating residue centrality as a fold characteristic conserved in protein families. Our analysis relied on structural alignments of a set of 46 protein families, including enzyme and non-enzyme families. For each family, we sought to identify aligned positions that are central in the structures of most family members (below, these residues are termed "centrally conserved positions"). These centrally conserved positions showed significant correlation with active site residues. This correlation was particularly strong for enzyme families. Different types of functional annotations were analyzed, and the catalytic site residues were consistently the best correlated. On the other hand, as expected, other binding sites with flatter shapes were not correlated as well. This could imply that the performance of residue centrality in identifying active site amino acids in enzymes relates to the geometry of the active site clefts.
Enzyme clefts have already been shown in a number of studies to constitute the largest holes on the protein surface (Laskowski et al. 1996). Indeed, a detailed analysis of the location of the centrally conserved residues indicated that most tend to be clustered with functionally important amino acids situated in protein surface clefts or cavities. However, closeness centrality, as a global topological characteristic, identifies specifically those cavities or clefts that contain residues important for protein function. This suggests that central residues are likely to fulfill important roles in network communication. The study of the PDZ domain and the HIV-1 protease families shows that some centrally conserved residues are key amino acids for allosteric communication. A detailed analysis revealed that residue centrality is more conserved than sequence in protein families, highlighting the robustness of protein structures.
| Results |
|---|
2.0) in at least 70% of the structures of the family members (Fig. 2). These cutoffs were established to guarantee a significant number of predicted centrally conserved residues (Supplemental Material, Fig. 2), allowing certain flexibility in the analysis when dealing with low-resolution or possible inaccuracies in the structures.
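The selection rule can be illustrated with a minimal sketch. It assumes that the 2.0 cutoff refers to the closeness z-score defined in Materials and methods, that the 70% fraction is taken over all family members (gaps count as non-central), and that per-position z-scores have already been collected from the structural alignment. The function name, the input layout, and the toy values are hypothetical.

```python
Z_CUTOFF = 2.0       # assumed z-score threshold for a "central" residue
MIN_FRACTION = 0.70  # minimum fraction of family structures required


def centrally_conserved_positions(alignment_z, z_cutoff=Z_CUTOFF,
                                  min_fraction=MIN_FRACTION):
    """Return aligned positions that are central in enough family members.

    alignment_z maps an aligned position to a list of z-scores, one per
    family structure; None marks a gap in that structure.
    """
    conserved = []
    for position, scores in alignment_z.items():
        # Count the structures in which this position passes the cutoff.
        central = sum(1 for z in scores if z is not None and z >= z_cutoff)
        if scores and central / len(scores) >= min_fraction:
            conserved.append(position)
    return conserved


# Toy family of four structures (values are made up for illustration).
family = {
    10: [2.4, 2.1, 3.0, 1.9],   # central in 3 of 4 structures (75%)
    11: [2.5, None, 1.0, 2.2],  # central in 2 of 4 structures (50%)
    12: [2.0, 2.0, 2.0, 2.0],   # central in 4 of 4 structures (100%)
}
conserved = centrally_conserved_positions(family)
print(conserved)
```

Counting gaps in the denominator is one possible convention; dividing by the number of structures that actually have a residue at the position would be an equally plausible reading.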
β-lactamase (Fig. 5). Thus, closeness centrality, as a global topological characteristic, provides more information than just a local analysis of protein cavities and clefts. Indeed, among all protein surface cavities, high closeness residues tend to be clustered around those cavities containing functionally important residues. This finding suggests that protein topology is closely related to the transmission of information from high-centrality residues to the rest of the protein. A detailed study of the PDZ domain family identified two centrally conserved residues (Leu379 and Phe325) in contact with each other and forming the ligand-binding site (Fig. 6A). Residue Phe325 has been experimentally reported as a key amino acid for maintaining the allosteric communications between the two distantly coupled sites (Lockless and Ranganathan 1999; Ota and Agard 2005). The HIV-1 protease (Fig. 6B) is another illustrative example of the key role of central residues in long-range interactions (del Sol et al. 2006). Our network analysis identified five centrally conserved residues: Leu23, Asp25, Thr26, Ile85, and Arg87. Three of these amino acids are part of the active site: Leu23, Asp25, and Thr26 (Perryman et al. 2004). Ile85 is in contact with three important active site residues: Leu23, Asp25, and Ile84. On the other hand, Ile85 interacts with the nonactive site residues Leu24, Val64, Leu90, and Ile93, whose substitutions were reported to confer drug resistance on the HIV-1 protease (Olsen et al. 1999). Arg87 also interacts with Asp25 and Leu90. Thus, our results suggest that central amino acids are important for the interconnections between all residues in the structure. Interestingly, only about half of the centrally conserved residues of all families (54%) are conserved in sequence (Supplemental Material, Table 4).
This indicates that residue centrality, as a topological characteristic of the fold, is more conserved than sequence, reflecting the robustness of protein structures.
| Discussion |
|---|
To pursue this goal, we compiled a set of 46 protein families, including a wide variety of biological cases. Based on the family structural alignments and the closeness parameter as a measure of residue centrality, we determined the central positions associated with family folds (centrally conserved positions). A total of 80% of the centrally conserved positions in all of the analyzed families were located in active sites. These predictions were significantly better for enzymes (91%) than for non-enzyme families (48%). A more detailed analysis revealed that centrally conserved positions were much better predictors for catalytic site residues and residues binding hetero-atoms than for protein-protein binding sites. We attribute these findings to the geometrical differences between the functional sites. Active sites in enzymes are often characterized by large clefts, and hetero-atom binding residues are located in cavities on the protein's surface. On the other hand, protein-protein interaction interfaces exhibit a range of shapes, depending on the biological case. Antibodies usually present a large cleft for antigen binding, while homodimers tend to have planar interfaces (Laskowski et al. 1996; Valdar and Thornton 2001; Ma et al. 2003). Indeed, our results show that many centrally conserved amino acids are clustered with active site residues in cavities or clefts. Nevertheless, we note that unlike local geometrical considerations on the protein surface, closeness centrality is a global topological characteristic reflecting the effect of all protein residues on single amino acids. It identifies those cavities or clefts comprised of functionally important residues. Thus, centrally conserved residues are assumed to integrate and propagate information to the rest of the protein. The examples of the PDZ domain and HIV-1 protease illustrate the key role of centrally conserved amino acids in long-range communication.
Residue centrality, as a topological characteristic of the protein fold, is more conserved than the sequence in protein families. This manifests the robustness of protein structures.
| Materials and methods |
|---|
Protein sequence analysis
The study of the protein sequence conservation was carried out using the ConSurf server (Glaser et al. 2003).
Graph representation of protein structures
Each protein structure was modeled as an undirected graph, where amino acid residues corresponded to vertices, and contacts between them were represented as edges. Residues i and j were considered to be in contact if at least one atom of residue i was within 5.0 Å of an atom of residue j.
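As a rough illustration of this graph construction, the sketch below builds an adjacency map from per-residue atom coordinates. It assumes the 5.0 Å cutoff is inclusive and that coordinates have already been parsed from a structure file; the function name, the input layout, and the toy coordinates are hypothetical, and the all-pairs loop is chosen for clarity rather than speed.

```python
from itertools import combinations
from math import dist

CUTOFF = 5.0  # Å; treated as inclusive here, which is an assumption


def build_contact_graph(residues, cutoff=CUTOFF):
    """Build an undirected residue contact graph.

    residues maps a residue id to a list of (x, y, z) atom coordinates.
    Returns a dict mapping each residue id to the set of residues it contacts.
    """
    graph = {res_id: set() for res_id in residues}
    for (ri, atoms_i), (rj, atoms_j) in combinations(residues.items(), 2):
        # One atom pair within the cutoff is enough to create an edge.
        if any(dist(a, b) <= cutoff for a in atoms_i for b in atoms_j):
            graph[ri].add(rj)
            graph[rj].add(ri)
    return graph


# Toy input: three residues placed along the x axis (made-up coordinates).
toy = {
    "A1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "A2": [(5.0, 0.0, 0.0)],
    "A3": [(12.0, 0.0, 0.0)],
}
contacts = build_contact_graph(toy)
print(contacts)
```

For real structures, a spatial index (for example a cell list or k-d tree) would avoid comparing every pair of atoms.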
The closeness centrality value Ck for residue k is defined as
$$C_k = \frac{n-1}{\sum_{i \neq k} d(i,k)}$$
where d(i,k) is the shortest path distance between residues i and k, and n is the total number of residues.
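A minimal sketch of this calculation follows, assuming the standard normalized form of closeness, (n − 1) divided by the sum of shortest path distances from residue k to all other residues, and an unweighted, connected contact graph. Shortest paths are found by breadth-first search; the function names and the toy graph are illustrative only.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: edge counts from source to each reachable vertex."""
    lengths = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in lengths:
                lengths[v] = lengths[u] + 1
                queue.append(v)
    return lengths


def closeness_centrality(graph):
    """Closeness of every vertex, assuming C_k = (n - 1) / sum_i d(i, k)."""
    n = len(graph)
    if n < 2:
        raise ValueError("closeness needs at least two vertices")
    result = {}
    for k in graph:
        lengths = shortest_path_lengths(graph, k)
        if len(lengths) != n:
            # d(i, k) is undefined for unreachable residues.
            raise ValueError("graph is not connected")
        result[k] = (n - 1) / sum(lengths.values())
    return result


# Toy path graph 1 - 2 - 3 - 4: the inner vertices are the most central.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
closeness = closeness_centrality(path)
print(closeness)
```

In the toy path graph the two inner vertices receive the higher closeness values, mirroring the idea that central residues are, on average, fewer steps away from every other residue.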
Statistical Analysis
The statistically significant central residues were evaluated using the z-score values of the residue closeness centrality, defined as
$$z_k = \frac{C_k - \bar{C}}{\sigma}$$
where $C_k$ is the closeness centrality of residue k, $\bar{C}$ is the average closeness centrality over all protein residues, and $\sigma$ is the corresponding standard deviation.
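The z-score step can be sketched as follows. The text does not say whether the population or the sample standard deviation was used; the population form is assumed here, and the function name and input values are made up for illustration.

```python
from statistics import mean, pstdev


def closeness_z_scores(closeness):
    """Convert per-residue closeness values into z-scores.

    closeness maps a residue id to its closeness centrality. The
    population standard deviation is used, which is an assumption.
    """
    values = list(closeness.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all closeness values are identical")
    return {k: (c - mu) / sigma for k, c in closeness.items()}


# Made-up closeness values for three residues.
example = {"a": 0.2, "b": 0.4, "c": 0.6}
z = closeness_z_scores(example)
print(z)
```

Switching `pstdev` to `stdev` gives the sample-based variant; for proteins with hundreds of residues the two differ only marginally.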
The sensitivity and specificity of our method were defined as
$$\text{Sensitivity} = \frac{TP}{N_{pred}}$$
$$\text{Specificity} = \frac{TN}{N_{res} - N_{pred}}$$
where TP and TN are the number of true positives and true negatives, respectively. Npred is the total number of predicted centrally conserved residues, and Nres is the total number of residues. These variables were calculated based on all the protein families.
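As a hedged illustration, the sketch below assumes sensitivity = TP / Npred and specificity = TN / (Nres − Npred), a reading built only from the variables named above. It further assumes that a true positive is a predicted residue annotated as functional and a true negative is a residue that is neither predicted nor annotated. The function name and the toy residue numbers are hypothetical.

```python
def sensitivity_specificity(predicted, functional, all_residues):
    """Return (sensitivity, specificity) under the assumed definitions.

    predicted:    residues predicted as centrally conserved
    functional:   residues annotated as active/binding site
    all_residues: every residue considered in the analysis
    """
    predicted = set(predicted)
    functional = set(functional)
    all_residues = set(all_residues)

    tp = len(predicted & functional)                 # predicted and functional
    tn = len(all_residues - predicted - functional)  # neither of the two
    n_pred = len(predicted)
    n_res = len(all_residues)

    sensitivity = tp / n_pred            # assumed: TP / Npred
    specificity = tn / (n_res - n_pred)  # assumed: TN / (Nres - Npred)
    return sensitivity, specificity


# Toy protein with ten residues (numbers are made up).
sens, spec = sensitivity_specificity(
    predicted={2, 3, 5},
    functional={2, 3, 7},
    all_residues=range(1, 11),
)
print(sens, spec)
```

Under these assumptions the first quantity is the fraction of predictions that fall on annotated residues, which is the kind of figure quoted in the Discussion for centrally conserved positions located in active sites.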
| Footnotes |
|---|
Reprint requests to: Antonio del Sol, Bioinformatics Research Unit, Research and Development Division, Fujirebio, Inc., 51 Komiya cho, Hachioji-shi, Tokyo 192-0031, Japan; e-mail: ao-mesa@fujirebio.co.jp; fax: 81-426-46-8325.
Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.062249106.
| Acknowledgments |
|---|
| References |
|---|
Amitai G., Shemesh A., Sitbon E., Shklar M., Netanely D., Venger I., Pietrokovski S. 2004. Network analysis of protein structures identifies functional residues. J. Mol. Biol. 344: 1135–1146.
Atilgan A.R., Akan P., Baysal C. 2004. Small-world communication of residues and significance for protein dynamics. Biophys. J. 86: 85–91.
Bader G.D., Donaldson I., Wolting C., Ouellette B.F., Pawson T., Hogue C.W. 2001. BIND: The biomolecular interaction network database. Nucleic Acids Res. 29: 242–245.
Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. 2000. The protein databank. Nucleic Acids Res. 28: 235–242.
del Sol A. and O'Meara P. 2004. Small-world network approach to identify key residues in protein-protein interaction. Proteins 58: 672–682.
del Sol A., Fujihashi H., O'Meara P. 2005. Topology of small-world networks of protein-protein complex structures. Bioinformatics 21: 1311–1315.
del Sol A., Fujihashi H., Amoros D., Nussinov R. 2006. Residues crucial for maintaining short paths in network communication mediate signaling in proteins. Mol. Syst. Biol. 2. http://www.nature.com/msb/journal/v2/n1/synopsis/msb4100063.html
del Sol Mesa A., Pazos F., Valencia A. 2003. Automatic methods for predicting functionally important residues. J. Mol. Biol. 326: 1289–1302.
Dokholyan N.V., Li L., Ding F., Shakhnovich E.I. 2002. Topological determinants of protein fold. Proc. Natl. Acad. Sci. 99: 8637–8641.
Glaser F., Pupko T., Paz I., Bell R.E., Bechor-Shental D., Martz E., Ben-Tal N. 2003. ConSurf: Identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 19: 163–164.
Glaser F., Morris R.J., Najmanovich R.J., Laskowski R.A., Thornton J.M. 2006. A method for localizing ligand binding pockets in protein structures. Proteins 62: 479–488.
Greene L. and Higman V. 2003. Uncovering network systems within protein structures. J. Mol. Biol. 334: 781–791.
Jeong H., Mason S.P., Barabasi A.L., Oltvai Z.N. 2001. Lethality and centrality in protein networks. Nature 411: 41–42.
Landgraf R., Xenarios I., Eisenberg D. 2001. Three-dimensional cluster analysis identifies interfaces and functional residue clusters in proteins. J. Mol. Biol. 307: 1487–1502.
Laskowski R.A., Luscombe N.M., Swindells M.B., Thornton J.M. 1996. Protein clefts in molecular recognition and function. Protein Sci. 5: 2438–2452.
Laskowski R.A., Hutchinson E.G., Michie A.D., Wallace A.C., Jones M.L., Thornton J.M. 1997. PDBsum: A Web-based database of summaries and analyses of all PDB structures. Trends Biochem. Sci. 22: 488–490.
Lichtarge O., Bourne H.R., Cohen F.E. 1996. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257: 342–358.
Lockless S.W. and Ranganathan R. 1999. Evolutionary conserved pathways of energetic connectivity in protein families. Science 286: 295–299.
Ma B., Elkayam T., Wolfson H., Nussinov R. 2003. Protein-protein interactions: Structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc. Natl. Acad. Sci. 100: 5772–5777.
Murzin A.G., Brenner S.E., Hubbard T., Chothia C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.
Olsen D.B., Stahlhut M.W., Rutkowski C.A., Schock H.B., vanOlden A.L., Kuo L.C. 1999. Non-active site changes elicit broad-based cross-resistance of the HIV-1 protease to inhibitors. J. Biol. Chem. 274: 23699–23701.
Ondrechen M.J., Clifton J.G., Ringe D. 2001. THEMATICS: A simple computational predictor of enzyme function from structure. Proc. Natl. Acad. Sci. 98: 12473–12478.
O'Sullivan O., Suhre K., Abergel C., Higgins D.G., Notredame C. 2004. 3DCoffee: Combining protein sequences and structures within multiple sequence alignments. J. Mol. Biol. 340: 385–395.
Ota N. and Agard D.A. 2005. Intramolecular signaling pathways revealed by modeling anisotropic thermal diffusion. J. Mol. Biol. 351: 345–354.
Perryman A.L., Lin J.-H., McCammon J.A. 2004. HIV-1 protease molecular dynamics of a wild-type and of the V82F/I84V mutant: Possible contribution to drug resistance and a potential new target site for drugs. Protein Sci. 13: 1108–1123.
Reddy B.V.B., Datta S., Tiguari S. 1998. Use of propensities of amino acids to the local structure environment to understand effect of substitution mutations on protein stability. Protein Eng. 11: 1137–1145.
Taverna D.M. and Goldstein R.A. 2002. Why are proteins so robust to site mutations? J. Mol. Biol. 315: 479–484.
Terwillinger T.C., Zabin H.B., Horvath M.P., Sandberg W.S., Schlunk P.M. 1994. In vivo characterization of mutants of the bacteriophage f1 gene V protein isolated by saturation mutagenesis. J. Mol. Biol. 236: 556–571.
Torrance J.M., Bartlett G.J., Porter C.T., Thornton J.M. 2005. Using a library of structural templates to recognize catalytic sites and explore their evolution in homologous families. J. Mol. Biol. 347: 565–581.
Valdar W.S.J. and Thornton J.M. 2001. Protein-protein interfaces: Analysis of amino acid conservation in homodimers. Proteins 44: 150–165.
Vendruscolo M., Dokholyan N.V., Paci E., Karplus M. 2002. Small-world view of the amino acids that play a key role in protein fold. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 65: 0619101–0619104.
Watts D.J. and Strogatz S.H. 1998. Collective dynamics of small-world networks. Nature 393: 440–442.