1 Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, 630-0101, Japan
2 Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan
3 Graduate School of Integrated Science, Yokohama City University, Yokohama 230-0045, Japan
4 Structure and Function of Biomolecules, PRESTO, Japan Science and Technology Corporation, Kawaguchi, Saitama 332-0012, Japan
5 Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Kizu, Souraku, Kyoto, 619-0215, Japan
Reprint requests to: Takaaki Nishioka, Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan; e-mail: nishioka{at}scl.kyoto-u.ac.jp; fax: 81-75-753-6408.
(RECEIVED May 7, 2003; FINAL REVISION July 8, 2003; ACCEPTED July 8, 2003)
Supplemental material: See www.proteinscience.org
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0383603.
| Abstract |
|---|
Keywords: ATP-binding domains; kinship relations of global folds; purine biosynthesis; structure/function relationship
| Introduction |
|---|
The validity of this picture can be attested if clear clusters of superfamilies of proteins with respective common core functions can be identified in a fold space of proteins with a certain measure of fold similarity. Each such cluster, if identified, deserves to be called a fold-based superfamily. The aim of this article is to show that this is actually possible with an objective procedure.
This aim was achieved by first constructing a network of kinship relations of global folds of protein domains. To limit the scope of the study, we focused on proteins with functions involving ATP or its analogs. This choice of proteins was made because of their functional importance and the abundance of their three-dimensional structures. We mapped all of the known ATP-binding domains on the constructed network. If the mapped domains form mutually disjoint, clear clusters, each with a similar core chemical function, then such clusters are examples of fold-based superfamilies. The fold-based superfamilies thus identified in this article lead us to some interesting new findings.
| Results and Discussion |
|---|
α-helix layers. However, we disregarded α-helices from our consideration in this article.
α-helix arrangements were collected into one group (Supplementary Table 1).
Then, we introduce the kinship relations among the β-topology groups, based on the assumption described above. A pair of β-topology groups, related by the deletion or addition of one β-strand at the edge of a β-sheet, is defined as having the first degree of kinship, kinship 1 (Fig. 1B). We note here that a kinship relation between a pair of β-topology groups can, in some cases, stand for two or more different evolutionary lineages.
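The kinship 1 test can be illustrated with a minimal sketch. The encoding used here is an assumption made only for illustration, not the authors' representation: a sheet is a tuple of `(sequence_order, direction)` pairs listed in spatial order, and descriptions that differ only by reversing the spatial order or by inverting all strand directions are treated as the same topology. The names `canonical`, `drop_edge`, and `is_kinship_1` are hypothetical.

```python
from typing import Tuple

Strand = Tuple[int, int]        # (order along the sequence, direction: +1 or -1)
Topology = Tuple[Strand, ...]   # strands listed in spatial order across the sheet


def canonical(topo: Topology) -> Topology:
    """Pick one representative among equivalent descriptions of a sheet.

    Assumption: reversing the spatial order or inverting every strand
    direction describes the same topology.
    """
    flipped = tuple((n, -d) for n, d in topo)
    return min(topo, topo[::-1], flipped, flipped[::-1])


def drop_edge(topo: Topology, left: bool) -> Topology:
    """Delete one edge strand and renumber the rest by sequence order."""
    rest = topo[1:] if left else topo[:-1]
    rank = {n: i + 1 for i, n in enumerate(sorted(n for n, _ in rest))}
    return canonical(tuple((rank[n], d) for n, d in rest))


def is_kinship_1(a: Topology, b: Topology) -> bool:
    """True if the sheets differ by exactly one strand at a sheet edge."""
    small, large = sorted((a, b), key=len)
    if len(large) - len(small) != 1:
        return False
    target = canonical(small)
    return any(drop_edge(large, left) == target for left in (True, False))


# A hypothetical three-stranded mixed sheet bridges a parallel and an
# antiparallel two-stranded sheet, each by a single edge deletion.
mixed = ((1, 1), (2, 1), (3, -1))
parallel = ((1, 1), (2, 1))
antiparallel = ((1, 1), (2, -1))
```

Under this encoding, the mixed sheet has kinship 1 with both two-stranded sheets, whereas the two two-stranded sheets are not directly related to each other.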
When the kinship 1 relationship was introduced into the 428 β-topology groups, the majority of them were connected into one large network. The remaining β-topology groups were found either as isolated orphans or as members of small networks with up to three members. However, a large fraction of these orphans and small networks can be connected to the one large network via a β-topology group that is not a member of the 428 β-topology groups. There are 101 such β-topology groups that can act as a bridge, called bridging β-topology groups. We added these 101 β-topology groups to the 428 β-topology groups in the following analyses. The resulting 529 β-topology groups now belonged to one major network, except for 36 orphans and seven groups in three isolated small networks (Table 1; Supplementary Table 2). Even though the majority of the β-topology groups were connected into one main network, this does not necessarily mean that they are all mutually evolutionarily related. This is because a kinship relation can stand for two or more different evolutionary lineages. In fact, later in this article, we will describe how the major network can be decomposed into ~20 disjoint clusters, by an analysis that dissects different evolutionary lineages.
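The split into one major network, small isolated networks, and orphans corresponds to the connected components of the kinship 1 graph. The following is a minimal sketch on toy data; the function names and the toy node labels are hypothetical and do not reproduce the authors' data set.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected kinship graph."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:                      # breadth-first traversal
            node = queue.popleft()
            component.add(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def summarize(components):
    """Size of the largest network, number of orphans, sizes of small networks."""
    sizes = sorted((len(c) for c in components), reverse=True)
    return {
        "main": sizes[0],
        "orphans": sizes.count(1),
        "small": [s for s in sizes[1:] if s > 1],
    }


# Toy example: one four-member network, one two-member network, one orphan.
toy_nodes = ["a", "b", "c", "d", "e", "f", "g"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
toy_summary = summarize(connected_components(toy_nodes, toy_edges))
```

Adding a bridging group amounts to adding a node with edges to both an orphan and the main network, after which the two components merge.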
The 36 orphans discussed above are those lacking any intermediates connecting them to the network. The occurrence of orphans is rare in β-topology groups of small β-sheets; they comprise only 3.5% (13 of 376) of the β-topology groups of two- to eight-stranded β-sheets (Table 1). This rarity supports the validity of the assumption that the addition or deletion of one β-strand at the edge of an existing β-sheet is the major elementary event that took place during evolution. The lack of intermediates is due either to a paucity of known structures or to minor events during evolution, such as domain swapping (Fukami-Kobayashi et al. 1999). In contrast, orphans account for almost half (44% = 23 of 52) of the β-topology groups of 9- to 18-stranded β-sheets. Some of these orphans may have evolved by the fusion of two smaller β-sheets rather than by the stepwise addition of a β-strand (Table 2).
Basic features of the network of global fold kinship relations
On the network, parallel and antiparallel β-topology groups are not separated from each other but are connected via mixed-type β-topology groups (Fig. 2); for example, the P12 β-topology group is connected to the A12 β-topology group by six mixed-type, three-stranded β-topology groups such as M123.
About 44% (188 of 428) of the β-topology groups are connected to the network by a single kinship relation, whereas the rest are multiply connected by more than one kinship relation.
Mapping ATP-binding domains on the network
The next step is to analyze the distribution of the β-topology groups of the ATP-binding domains on this network. For this purpose, we systematically collected as many three-dimensional structures of ATP-binding proteins as possible, by including those within the CATH nonhomologous domain set. According to the LIGAND chemical database for enzymatic reactions in KEGG (Goto et al. 2000), 406 enzymes use ATP as a substrate or an effector. For 75 of these enzymes, crystal structures have been registered in the Protein Data Bank (PDB, April 2000; Berman et al. 2000). Their ATP-binding domains belong to 32 different Structural Classification of Proteins (SCOP) superfamilies (Hubbard et al. 1997). To enrich the analysis of the distribution, we extended the data from the 75 enzymes to 182 proteins belonging to the 32 superfamilies (see Materials and Methods). In this process of enrichment, some GTP-binding domains were also included. Finally, we obtained 93 β-topology groups from the PDB structures of 285 ATP-binding (and exceptionally some GTP-binding) domains (Table 3). Some of the SCOP domains, for example, those of the superfamily "Phosphofructokinase", consist of two substructures, each of which is defined as a domain in our treatment. Some of the structures were determined after the collection of the CATH data set. For this reason, their β-topology groups might not be found in the network.
Chemical reactions catalyzed by ATP-binding domains
To study the relationship between the global folds and the molecular functions, we introduce five coarse-grained chemical categories of ATP-binding proteins (given in Table 4). The reactions are first classified into two types: phosphoryl-transfer reactions on the γ-phosphorus atom, and nucleotidyl-transfer reactions on the α-phosphorus atom. Both types of reaction are further classified into chemical energy-retaining and -releasing reactions. Chemical energy-retaining reactions produce high-energy compounds, such as acyl phosphates, as a product or reaction intermediate, whereas chemical energy-releasing ones proceed by ATP hydrolysis or produce low-energy phosphate compounds (Walsh 1979; Voet et al. 1999; Berg et al. 2001). We included a fifth category, "other," for the various reactions that occur on atoms other than the γ- and α-phosphorus atoms and for ATP binding as an effector.
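The five-way categorization can be written down as a small lookup. This is only a sketch of the scheme as described in the text; the enum member names and the `classify` helper are hypothetical labels chosen for illustration.

```python
from enum import Enum


class ChemicalCategory(Enum):
    PHOSPHORYL_RETAINING = "phosphoryl transfer, energy-retaining"
    PHOSPHORYL_RELEASING = "phosphoryl transfer, energy-releasing"
    NUCLEOTIDYL_RETAINING = "nucleotidyl transfer, energy-retaining"
    NUCLEOTIDYL_RELEASING = "nucleotidyl transfer, energy-releasing"
    OTHER = "other"


_TABLE = {
    ("phosphoryl", True): ChemicalCategory.PHOSPHORYL_RETAINING,
    ("phosphoryl", False): ChemicalCategory.PHOSPHORYL_RELEASING,
    ("nucleotidyl", True): ChemicalCategory.NUCLEOTIDYL_RETAINING,
    ("nucleotidyl", False): ChemicalCategory.NUCLEOTIDYL_RELEASING,
}


def classify(transfer, energy_retaining=None):
    """Map a reaction to one of the five coarse-grained categories.

    transfer: "phosphoryl", "nucleotidyl", or anything else (e.g. ATP as
    an effector), which falls into the "other" category.
    energy_retaining: True if a high-energy compound such as an acyl
    phosphate is produced as a product or intermediate.
    """
    if transfer in ("phosphoryl", "nucleotidyl"):
        if energy_retaining is None:
            raise ValueError("energy_retaining is required for transfer reactions")
        return _TABLE[(transfer, bool(energy_retaining))]
    return ChemicalCategory.OTHER
```

For example, a reaction that proceeds through an acyl phosphate intermediate would be passed as `classify("phosphoryl", True)`.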
This definition is based on the assumption that no pair of proteins of the same evolutionary origin catalyzes reactions belonging to different categories. Although we think that this is a reasonable assumption, it remains an assumption in this article and needs to be critically examined in the future. Todd et al. (2001) found that a small number of proteins changed their reaction chemistry during evolution. However, no ATP-binding domains collected in the present study changed the chemical category during evolution. A chemical category can be defined at different levels of granularity. Reaction chemistry in the study by Todd et al. (2001) was defined at the level of EC-number classification, but core chemical function in our present study is more roughly defined at the level of superfamily classification.
The fold-based superfamilies thus obtained depend on the cutoff kinship distance used. When we formed clusters with different cutoff kinship distances, from three to six, a few clusters differed depending on the cutoff, but almost the same picture for the evolution of ATP-binding domains emerged. In the following, we describe the 29 fold-based superfamilies (Table 4) obtained with a cutoff kinship distance of 4. Each fold-based superfamily often includes more than one SCOP superfamily. Two of the 29 fold-based superfamilies are orphans to the network.
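The dependence on the cutoff can be illustrated with a toy clustering. The exact clustering procedure is not spelled out in this section, so the sketch below assumes one plausible reading: two β-topology groups join the same cluster when they share a chemical category and their kinship distance (shortest path length on the network) does not exceed the cutoff, with clusters merged transitively. All names and the toy network are hypothetical.

```python
from collections import deque
from itertools import combinations


def kinship_distances(adjacency, source):
    """Shortest-path (kinship) distance from source to every reachable group."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def cluster(adjacency, category, cutoff):
    """Single-linkage clusters of same-category groups within the cutoff."""
    parent = {g: g for g in category}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]   # path halving
            g = parent[g]
        return g

    dist = {g: kinship_distances(adjacency, g) for g in category}
    for a, b in combinations(category, 2):
        close = dist[a].get(b, float("inf")) <= cutoff
        if category[a] == category[b] and close:
            parent[find(a)] = find(b)

    clusters = {}
    for g in category:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(sorted(c) for c in clusters.values())


# Toy network: a chain g1-g2-...-g7; only some groups hold ATP-binding domains.
chain = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
toy_adjacency = {g: set() for g in chain}
for a, b in zip(chain, chain[1:]):
    toy_adjacency[a].add(b)
    toy_adjacency[b].add(a)
toy_category = {"g1": "A", "g3": "A", "g4": "B", "g7": "A"}
```

On this toy chain, a cutoff of 3 leaves `g7` as its own cluster, whereas a cutoff of 4 merges it with `g1` and `g3`; the group of category `B` stays separate in both cases.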
Limitations of fold-based superfamilies
Some pairs of fold-based superfamilies have similar chemical functions, although they are separated by long kinship distances. For instance, the fold-based superfamilies "nucleotidyl polymerase" and "mononucleotidyl transferase" are functionally similar, but they cannot be merged by small kinship distances. Such a united cluster must involve the two-stranded β-topology group A12 (Fig. 2). However, this is impossible, because the ATP-binding domains invariably contain a β-sheet consisting of at least four β-strands. There are at least 10 such clear cases of separation, including the pair of class I aminoacyl-tRNA synthetase and class II aminoacyl-tRNA synthetase fold-based superfamilies (Fig. 3A,B).
We should reexamine the identification process for fold-based superfamilies. The categorization of reactions was motivated by the need to dissect the complicated network of global fold kinship relations. If we used a more refined fold space than the network used in this work (which would probably also involve information contained in conventional fold representations based on atom positions), then we might be able to identify the fold-based superfamilies as mutually disjoint, clear clusters, without resorting to a classification of functions. If separation into two clusters is really impossible in any fold space, then the two should in fact be regarded as defining one cluster and, therefore, as forming one fold-based superfamily with reactions of both categories. Our research has not yet reached this point. At the level of the coarse-grained fold space of the network of kinship relations, the categorization of reactions is a very powerful method to complement its coarse-grained nature.
New findings in the fold-based superfamilies
The glutathione synthetase fold-based superfamily (superfamily 2 in Table 4) contains two SCOP single-domain superfamilies (S14 and S15) and three domains, each of which is one CATH domain of a SCOP double-domain superfamily (S12Domain2, S25F1Domain2, and S31Domain2). Each of these SCOP superfamilies is characterized by a reaction that proceeds via a carboxyphosphate or phosphohistidine intermediate, by an energy-retaining phosphoryl-transfer mechanism. One isolated domain, S25F1Domain1, and the histidine kinase fold-based superfamily 7 are in the same chemical reaction category and within a cutoff distance of four from each other. However, these two are treated as independent fold-based superfamilies, because ATP binds to the β-sheet differently.
The finding that domains related to the five SCOP superfamilies are in fact members of one fold-based superfamily leads to new insight into the evolution of enzymes in de novo purine biosynthesis. Purine is synthesized in microorganisms by a pathway involving 10 successive enzymes (Buchanan 1973): PurF, PurD, PurT, PurL, PurM, PurK, PurE, PurC, PurB, and PurH, of which six (PurD, PurT, PurL, PurM, PurK, and PurC) use ATP as a substrate. The three-dimensional structures have been determined for all of these ATP-using enzymes, except PurL. A PSI-BLAST sequence analysis of PurL revealed PurM as its sequence homolog. The structures of PurD (Wang et al. 1998), PurK (Thoden et al. 1999), PurT (Thoden et al. 2000), PurC (Levdikov et al. 1998), and PurM (and by inference, also PurL; Li et al. 1999) are distributed in three different SCOP double-domain superfamilies (S12, S25F1, and S31 in Table 3). The ATP-binding cleft in each enzyme is formed by the two domains. One domain in each of these SCOP double-domain superfamilies was found in our analysis to belong to one common glutathione synthetase fold-based superfamily. The β-strands in these common domains are shown in red in Figure 4. The fact that these domains have a common evolutionary origin implies that an ancestral form of this domain alone was capable of catalyzing a chemical reaction of the same category. In fact, a single-domain enzyme, glutamine synthetase, which belongs to the SCOP single-domain superfamily S14 in Table 3 and catalyzes a chemical reaction of the same category (Gill and Eisenberg 2001), is a member of our glutathione synthetase fold-based superfamily. Our present analysis indicates that all six of the ATP-using enzymes in de novo purine biosynthesis have evolved from an ancestral glutamine synthetase as the common evolutionary origin, by combining with another domain to improve their catalytic specificities.
The nucleotidyl polymerase fold-based superfamily (superfamily 19 in Table 4, colored green) includes two SCOP single-domain superfamilies, S20 and S21 in Table 3, and one domain, S13Domain2 in Table 3, from one SCOP double-domain superfamily. This fold-based superfamily includes all of the enzymes with known structures involved in nucleotidyl polymerization, transcription and capping of polynucleotides, and cyclization of mononucleotides. DNA polymerase β (S19 in Table 3), which is named as a polymerase but is actually a repair enzyme (Sawaya et al. 1997; Arndt et al. 2001), is clustered into a separate fold-based superfamily.
Phosphorylation of sugars is catalyzed by kinases that have been classified into four SCOP superfamilies (S1F3, F5, F12, F13; S4F1; S6Domain1; S8F1 in Table 3). The domains in these superfamilies are now found to belong to two different fold-based superfamilies, superfamilies 8 and 9 in Table 4. These superfamilies are treated here as separate families, because of the different topological positions of the binding sites.
The SCOP superfamily "P-loop-containing nucleotide triphosphate hydrolases" (S1 in Table 3) now appears to be a mixture of six evolutionarily unrelated fold-based superfamilies: adenylate kinase (1), phosphosugar hydroxykinase I (8), GTPase I (10), GTPase II (12), myosin (11), and PAPS sulfotransferase (22), where the numbers refer to Table 4. On the other hand, the adenylate kinase fold-based superfamily 1 in Table 4 includes S1F1, S1F6, S1F9, and S30 in Table 3, among which S30 was not assigned to S1 in SCOP, even though they share the P-loop motif (Saraste et al. 1990; Bertrand et al. 1997).
Some of the interesting new findings are described above. Descriptions of the 29 fold-based superfamilies are given in Table 4, which contains numerous new findings not described above.
Implications for the evolution of proteins
An important aspect of understanding biological systems at the molecular level is not only to know the structure and function of enzymes but also to be able to relate enzymes to each other, from an evolutionary viewpoint. In this work, we found that there are ~20 independently invented ATP-binding protein families, each with its specific core chemical reaction. During the course of evolution, each fold-based superfamily diverged somewhat in both the fold space and the function space, as summarized in Figure 3 and Table 4. Even during the course of divergence, the specific core chemical reaction was maintained. Because the use of ATP should always be necessary for the core chemical reaction, we infer that the mechanisms of ATP binding and ATP usage in the core chemical reaction were both established at the inception of each fold-based superfamily.
By closely examining the process of divergence, we should be able to establish a phylogeny of proteins in each fold-based superfamily. Through such a process, we could reexamine the currently popular, but often very confusing, concepts of family and superfamily.
| Materials and methods |
|---|
α-β classes. We selected 1011 domains by removing those that were low in resolution or determined by NMR. The topology of each β-sheet assigned by TOPS (Flores et al. 1994) was examined by inspecting the structures on a graphic display. Topologies of barrels and sandwich layers were assigned by assuming that the corresponding flat β-sheet is first formed and then bent. When one β-sheet was composed of strands from two or more different chains, we separated it into β-sheets composed of strands from the same peptide and assigned the topology for each sheet. Among the 1011 domains, 77% consisted of two or more β-sheets. When the largest β-sheet in each domain differed from the remaining ones by only one β-strand, the same domain was analyzed multiple times from the viewpoints of the individual β-sheets. Because of this treatment, our set of 1011 domains now appears to consist of 1444 domains (Supplementary Table 1).
The 32 SCOP superfamilies contain 83 SCOP families and 182 proteins (Supplementary Table 3). In the cases in which a protein had two or more PDB structures derived from different biological species, we selected one PDB structure from each species. We selected 254 PDB structures from the 182 proteins. When the ATP-binding site was composed of two domains, two ATP-binding domains were selected from one PDB structure. Thus, we obtained a total of 263 ATP- and 22 GTP-binding domains from 254 PDB structures. The identification of the ATP-binding domains was confirmed by the references cited in the PDB database and by the distribution of the residues interacting with the bound ATP or its analog. Using LIGPLOT (Wallace et al. 1995), we further confirmed that such interacting residues are mainly located on the β-strands of the largest β-sheet in the ATP-binding domain or on the loops connecting the β-strands.
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| References |
|---|
Babbitt, P.C. and Gerlt, J.A. 1997. Understanding enzyme superfamilies: Chemistry as the fundamental determinant in the evolution of new catalytic activities. J. Biol. Chem. 272: 30591-30594.
Berg, J.M., Tymoczko, J.L., and Stryer, L. 2001. Biochemistry. W.H. Freeman and Co., New York.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235-242.
Bertrand, J.A., Auger, G., Fanchon, E., Martin, L., Blanot, D., van Heijenoort, J., and Dideberg, O. 1997. Crystal structure of UDP-N-acetylmuramoyl-L-alanine:D-glutamate ligase from Escherichia coli. EMBO J. 16: 3416-3425.
Buchanan, J.M. 1973. The amidotransferases. Adv. Enzymol. Relat. Areas Mol. Biol. 39: 91-183.
Fani, R., Lio, P., and Lazcano, A. 1995. Molecular evolution of the histidine biosynthetic pathway. J. Mol. Evol. 41: 760-774.
Farber, G.K. and Petsko, G.A. 1990. The evolution of α/β barrel enzymes. Trends Biochem. Sci. 15: 228-234.
Fetrow, J.S. and Skolnick, J. 1998. Method for prediction of protein function from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxins/thioredoxins and T1 ribonucleases. J. Mol. Biol. 281: 949-968.
Flores, T.P., Moss, D.S., and Thornton, J.M. 1994. An algorithm for automatically generating protein topology cartoons. Protein Eng. 7: 31-37.
Fukami-Kobayashi, K., Tateno, Y., and Nishikawa, K. 1999. Domain dislocation: A change of core structure in periplasmic binding proteins in their evolutionary history. J. Mol. Biol. 286: 279-290.
Gerlt, J.A. and Babbitt, P.C. 1998. Mechanistically diverse enzyme superfamilies: The importance of chemistry in the evolution of catalysis. Curr. Opin. Chem. Biol. 2: 607-612.
Gill, H.S. and Eisenberg, D. 2001. The crystal structure of phosphinothricin in the active site of glutamine synthetase illuminates the mechanism of enzymatic inhibition. Biochemistry 40: 1903-1912.
Goto, S., Nishioka, T., and Kanehisa, M. 2000. LIGAND: Chemical database of enzyme reactions. Nucleic Acids Res. 28: 380-382.
Hasson, M.S., Schlichting, I., Moulai, L., Taylor, K., Barrett, W., Kenyon, G.L., Babbitt, P.C., Gerlt, J.A., Petsko, G.A., and Ringe, D. 1998. Evolution of an enzyme active site: The structure of a new crystal form of muconate lactonizing enzyme compared with mandelate racemase and enolase. Proc. Natl. Acad. Sci. 95: 10396-10401.
Hubbard, T.P.J., Murzin, A.G., Brenner, S.E., and Chothia, C. 1997. SCOP: A structural classification of proteins database. Nucleic Acids Res. 25: 236-239.
Kraulis, P.J. 1991. MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. 24: 946-950.
Levdikov, V.M., Barynin, V.V., Grebenko, A.I., Melik-Adamyan, W.R., Lamzin, V.S., and Wilson, K.S. 1998. The structure of SAICAR synthase: An enzyme in the de novo pathway of purine nucleotide biosynthesis. Structure 6: 363-376.
Li, C., Kappock, T.J., Stubbe, J.A., Weaver, T.M., and Ealick, S.E. 1999. X-ray crystal structure of aminoimidazole ribonucleotide synthetase (PurM), from the Escherichia coli purine biosynthetic pathway at 2.5 Å resolution. Structure 7: 1155-1166.
Lo Conte, L., Ailey, B., Hubbard, T.J.P., Brenner, S.E., Murzin, A.G., and Chothia, C. 2000. SCOP: A structural classification of proteins database. Nucleic Acids Res. 28: 257-259.
Orengo, C.A., Jones, D.T., and Thornton, J.M. 1994. Protein superfamilies and domain superfolds. Nature 372: 631-634.
Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., and Thornton, J.M. 1997. CATH: A hierarchic classification of protein domain structures. Structure 5: 1093-1108.
Petsko, G.A., Kenyon, G.L., Gerlt, J.A., Ringe, D., and Kozarich, J.W. 1993. On the origin of enzymatic species. Trends Biochem. Sci. 18: 372-376.
Ptitsyn, O.B. and Finkelstein, A.V. 1980. Similarities of protein topologies: Evolutionary divergence, functional convergence or principles of folding? Q. Rev. Biophysics 13: 339-386.
Richardson, J.S. 1977. β-Sheet topology and the relatedness of proteins. Nature 268: 495-500.
Saraste, M., Sibbald, P.R., and Wittinghofer, A. 1990. The P-loop: A common motif in ATP- and GTP-binding proteins. Trends Biochem. Sci. 15: 430-434.
Sawaya, M.R., Prasad, R., Wilson, S.H., Kraut, J., and Pelletier, H. 1997. Crystal structures of human DNA polymerase β complexed with gapped and nicked DNA: Evidence for an induced fit mechanism. Biochemistry 36: 11205-11215.
Thoden, J.B., Kappock, T.J., Stubbe, J., and Holden, H.M. 1999. Three-dimensional structure of N5-carboxyaminoimidazole ribonucleotide synthetase: A member of the ATP grasp protein superfamily. Biochemistry 38: 15480-15492.
Thoden, J.B., Firestine, S., Nixon, A., Benkovic, S.J., and Holden, H.M. 2000. Molecular structure of Escherichia coli purt-encoded glycinamide ribonucleotide transformylase. Biochemistry 39: 8791-8802.
Todd, A.E., Orengo, C.A., and Thornton, J.M. 2001. Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. 307: 1113-1143.
Voet, D., Voet, J.G., and Pratt, C.W. 1999. Fundamentals of biochemistry. John Wiley & Sons, New York.
Wallace, A.C., Laskowski, R.A., and Thornton, J.M. 1995. LIGPLOT: A program to generate schematic diagrams of protein-ligand interactions. Protein Eng. 8: 127-134.
Wallace, A.C., Laskowski, R.A., and Thornton, J.M. 1996. Derivation of three-dimensional coordinate templates for searching structural databases. Protein Sci. 5: 1001-1013.
Walsh, C. 1979. Enzymatic reaction mechanisms. W.H. Freeman and Co., San Francisco.
Wang, W., Kappock, T.J., Stubbe, J., and Ealick, S.E. 1998. X-ray crystal structure of glycinamide ribonucleotide synthetase from Escherichia coli. Biochemistry 37: 15647-15652.