1 Howard Hughes Medical Institute and 2 Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
Reprint requests to: Nick V. Grishin, Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390-9050; e-mail: grishin{at}chop.swmed.edu; fax: (214) 648-9099.
(RECEIVED September 10, 2004; FINAL REVISION October 18, 2004; ACCEPTED October 19, 2004)
| Abstract |
|---|
α-helical hairpin at the C-terminus and excludes all other proteins with similar topology; similar domain fusions connect Dak and DegV, and genomic neighborhood organizations connect Dak and EIIA-man. Finally, both Dak and EIIA-man perform similar phosphotransfer reactions, suggesting a phosphotransferase activity for the DegV-like family of proteins, whose function, other than the lipid binding revealed in the crystal structure, remains unknown.
Keywords: EDD domain; Dak1; Dak2; dihydroxyacetone kinase; DegV; mannose transporter EIIA; SCOPmap; homology detection; structure similarity; protein classification
Abbreviations: Dak, dihydroxyacetone kinase; EIIA-man, mannose transporter IIA domain superfamily; PEP, phosphoenolpyruvate; PTS, PEP:sugar phosphotransferase system; HPR, histidine-containing phosphoryl carrier protein; ATP, adenosine triphosphate; DegV, DegV-like protein fold.
Article published online ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.041114805.
| Introduction |
|---|
SCOPmap assigns the N-terminal domain of dihydroxyacetone kinase (Dak, PDB ID 1oi2; Siebold et al. 2003a,b) to the mannose transporter IIA domain superfamily (EIIA-man, PDB ID 1pdo; Nunn et al. 1996). The structures of both Dak (Fig. 1A, N-terminal yellow/cyan domain) and EIIA-man (Fig. 1B) contain a common domain made up of a central parallel β-sheet of four strands (order 2134) with helices bounding either side. In contrast to this assignment, the SCOP database (Murzin et al. 1995) currently groups Dak (Fig. 1A) with the tubulin GTPase domain (Fig. 1D) at the superfamily level. These two folds share a similar topology across both Dak domains, assuming various inserted elements (Fig. 1A,D, white N-terminal domain insertions and gray C-terminal domain insertions). Despite these general similarities, we were not able to find sequence support for a homologous relationship between Dak and tubulin, and their overall structural similarity is low (highest DaliLite Z-score 4.6, between tubulin structure 1jff_B and Dak structure 1oi2_A).
To help resolve Dak classification, we decided to explore its evolutionary relationships using a combination of sequence- and structure-based methods. This report outlines the resulting support for an evolutionary link between the two SCOP folds represented by EIIA-man and Dak. Furthermore, our data establish a homologous relationship between these two families and members of a third SCOP fold, DegV-like proteins (PDB IDs 1pzx and 1mgp; Schulze-Gahmen et al. 2003). We propose to unite these three families (EIIA-man, Dak, and DegV) into a single superfamily (EDD fold superfamily). In accordance with this classification, structural similarities between the three families were previously noted (Schulze-Gahmen et al. 2003; Siebold et al. 2003a,b), although their homologous evolutionary relationships were not appreciated. Functional similarities between EIIA-man and Dak provide additional support for this classification and suggest a common activity for the DegV family of proteins, whose biological activity, other than the phospholipid binding revealed in crystal structures of two hypothetical proteins, remains unknown.
| Results and Discussion |
|---|
5) and retain similar active-site placement as defined by motifs (conservation score ≥ 0.1). The conservation score for Dak and EIIA-man (0.20) falls well within this accepted range.
To further explore the relationship of the Dak family to existing SCOP classification, SCOPmap was modified as described in Methods to identify multiple homologs of the query structure that fall within established cutoffs. SCOPmap then indicated homology between Dak and an additional family of proteins (DegV) through superfamily-level assignments. This link was established by a high degree of structural similarity between Dak queries (PDB IDs 1oi2 and 1un8) and a DegV-like protein, Thermotoga maritima hypothetical protein TM841 (PDB ID 1mgp; Fig. 1C) (DaliLite Z-score for 1oi2–1mgp = 15.8; DaliLite Z-score for 1un8–1mgp = 15.5). SCOPmap cutoffs for superfamily assignments using such structural comparisons by DaliLite require a Z-score ≥ 14, regardless of the degree of sequence similarity. Importantly, the structural similarity of these two protein families extends beyond the N-terminal domain shared with EIIA-man and also encompasses the two C-terminal domains.
Sequence-based support for Dak classification
SCOPmap assignments for at least one query (1oi2) suggest that Dak, EIIA-man, and DegV structures should belong to the same superfamily. To help confirm this implied evolutionary link, we used sequences from each proposed EDD domain as queries in transitive PSI-BLAST searches. A representative EIIA-man sequence (gi|22538040, range 2..133) identified the N-terminal domain of DegV (gi|26553729, detected in iteration 11 with E-value 0.004), while another representative EIIA-man sequence (gi|10957457, range 2..126) identified the N-terminal domain of Dak (gi|15082277, detected in iteration 2 with E-value 0.002). Additionally, sequence searches with the N-terminal domain of DegV (gi|28379075, range 1..152) find an EIIA-man-like transcription antiterminator (gi|48826184, range 579..663, with E-value 0.001 in iteration 2). The PSI-BLAST runs converge without identifying sequences from other families (false positives), and each of the PSI-BLAST hits encompasses a significant portion of the EDD domain (indicated with * in Fig. 2). Although representative sequences from Dak and DegV do not detect each other directly using these criteria, the two families display a transitive linkage through the EIIA-man family.
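The transitive argument above can be read as a connectivity question on a small graph of families. The following is a minimal sketch, not the authors' procedure: it records only the three family-level hits and E-values quoted in the paragraph, treats each significant hit as a symmetric link (an assumption), and uses a hypothetical helper name, `linked_families`.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 0.005  # inclusion threshold quoted in the text

# (query family, hit family, E-value) as reported in the paragraph above
HITS = [
    ("EIIA-man", "DegV", 0.004),
    ("EIIA-man", "Dak", 0.002),
    ("DegV", "EIIA-man", 0.001),
]


def linked_families(hits, start, cutoff=E_VALUE_CUTOFF):
    """Return every family reachable from `start` through significant hits."""
    graph = defaultdict(set)
    for query, hit, e_value in hits:
        if e_value <= cutoff:  # assumption: the cutoff is inclusive
            graph[query].add(hit)
            graph[hit].add(query)  # assumption: a hit links both families
    seen, stack = {start}, [start]
    while stack:
        family = stack.pop()
        for neighbor in graph[family] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return seen


def directly_linked(hits, a, b, cutoff=E_VALUE_CUTOFF):
    """True if a significant hit connects families a and b in either direction."""
    return any({q, h} == {a, b} and e <= cutoff for q, h, e in hits)
```

With these inputs, `directly_linked(HITS, "Dak", "DegV")` is false, while `linked_families(HITS, "Dak")` still contains DegV, which is the sense in which the two families are linked through EIIA-man.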
β-strand and α-helix and the loop connecting the third β-strand and α-helix (numbered consecutively along the core structural elements, as in Fig. 2).
To help visualize the evolutionary relationships between members of the EDD domain superfamily, we constructed a stereo plot of representative sequences mapped in Euclidian space according to distances (Fig. 3) (Grishin and Grishin 2002). In this three-dimensional plot, each symbol represents a sequence colored according to the three different families: Dak (black), EIIA (gray), and DegV (open circles); the space between symbols reflects evolutionary distances. Those sequences that established links between families with PSI-BLAST are connected by arrows stemming from the queries, and the E-values of the initial PSI-BLAST hits are displayed. This mapping procedure clusters the EIIA family into three distinct subgroups that correspond to their COG classification (reflected by different symbols: circle for EIIA-dak, triangle for EIIA-man, and square for transcription antiterminator). Notably, the sequences from the different subgroups readily detect each other using our PSI-BLAST cutoffs (E-value threshold 0.005, maximum 20 iterations). The Dak sequences form a tight cluster that reflects a high overall degree of sequence similarity between members. Accordingly, profiles built for these sequence queries with PSI-BLAST are not diverse enough to detect homologs from the other families. Sequences from the more diverse groups (EIIA-man and DegV) can establish evolutionary links to the other families.
The α/β sandwich fold of the EDD domain appears quite frequently in the PDB database, and was even predicted prior to the crystal structure determination based on another Rossmann-type fold, flavodoxin (Markovic-Housley et al. 1994). To determine the degree of structural similarity between these folds and to help resolve existing classification schemes, we chose to compare each structure to a nonredundant (90% sequence identity) library of existing structures (as described in Methods). This library of 8653 representative structures contained two DegV representatives (1mgp and 1pzx), two Dak representatives (1oi2 and 1un8), and a single EIIA-man representative (1pdo).
As shown in Table 1, the top five hits (ordered by Z-score) for each EDD domain-containing query correspond to representative structures of the three families under consideration. The next best hits are different for each query and belong to different SCOP folds: the EIIA-man query finds a structure belonging to the periplasmic binding protein-like I fold, the Dak query 1un8 finds a structure belonging to the isocitrate dehydrogenase fold, the Dak query 1oi2 finds a structure belonging to the chorismate mutase-like fold, and the two DegV queries find NifK structures belonging to the chelatase-like fold. Only one of these next best hits belongs to the tubulin family (chorismate mutase-like fold, 1jff). Thus, each query structure finds all EDD domain-containing structures before finding any other structure, despite the common occurrence of this type of fold in the database. EIIA-man, Dak, and DegV structures form a "closed structural group" and are closer to each other than to any other known protein structure. This property of structural similarity has previously been used to delineate SHS2 domains (Anantharaman and Aravind 2004) and is suggestive of EDD domain monophyly.
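The "closed structural group" property described above can be stated as a simple check on ranked hit lists. The sketch below is illustrative only: the function name `is_closed_group` is hypothetical, the PDB IDs are those named in the text, and the hit orderings and `other_fold_*` placeholders are made up for the example rather than taken from Table 1.

```python
def is_closed_group(ranked_hits, members):
    """Check that every query ranks all other group members above any outsider.

    ranked_hits maps a query ID to its hit IDs sorted by descending Z-score.
    members is the set of IDs proposed to form the closed group.
    """
    for query, hits in ranked_hits.items():
        others = members - {query}
        # Drop the self-hit, then keep as many top hits as there are other members.
        top = [hit for hit in hits if hit != query][: len(others)]
        if set(top) != others:
            return False
    return True


# The five EDD representatives named in the text.
EDD = {"1pdo", "1oi2", "1un8", "1mgp", "1pzx"}

# Made-up orderings for illustration; "other_fold_*" are placeholder IDs.
toy_rankings = {
    "1pdo": ["1pdo", "1oi2", "1un8", "1mgp", "1pzx", "other_fold_A"],
    "1oi2": ["1oi2", "1un8", "1mgp", "1pzx", "1pdo", "other_fold_B"],
    "1mgp": ["1mgp", "1pzx", "1oi2", "1un8", "1pdo", "other_fold_C"],
}
```

Calling `is_closed_group(toy_rankings, EDD)` returns true for these orderings; moving any placeholder ahead of a group member makes it return false.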
The evolutionary linking of EDD domain-containing families justified by both sequence and structure methods leads to a more precise definition of the core fold, which includes an α/β sandwich with a four-stranded, parallel β-sheet (order 2134) surrounded by two α-helices (2 and 3) on one side and three on the other (1, 4, and 5). Each EDD domain family possesses various insertions with respect to the core fold. As previously discussed, the EIIA-man core β-sheet (Fig. 1B) is extended by one β-strand positioned antiparallel to the rest (order 21345) that establishes the swapped dimer. An N-terminal extension of the Dak EDD domain (Fig. 1A) also extends the core β-sheet by two β-strands in the opposite direction of the EIIA-man extension. Alternatively, the DegV EDD domain (Fig. 1C) contains a small domain insertion (three-stranded antiparallel β-sheet with α-helix) that covers the lipid-bound active site.
Genomic neighborhood and domain organization of Dak homologs
The subset of EIIA-man sequences (EIIA-dak) that find the Dak N-terminal domain using PSI-BLAST function in the same PTS pathway as Dak. In bacteria these two proteins are often components of a single operon (Gutknecht et al. 2001) found in a tandem array and can be linked by genomic neighborhood analysis (STRING database score 0.95, highest confidence) (von Mering et al. 2003). Such close proximity in bacterial genomes supports their emergence from a duplication event that includes the loss (or gain) of the Dak C-terminal domain, especially considering the PSI-BLAST-detected link between these two families.
The similarities between Dak and DegV extend beyond their EDD domains and include similar C-terminal domains that contribute to the active sites of each family. In contrast to the N-terminal EDD domain, this C-terminal domain (Fig. 1, green and purple) forms a distinctive fold with a topology that is not often found in the PDB. Both the ubiquitous presence of this domain and its unusual topology provide additional support for homology between these two families. Additionally, some members of both the DegV and the Dak families are fused to yet another domain (Dak2). This domain is responsible for binding ATP in the C. freundii Dak structure (1un9; Siebold et al. 2003a,b), and its presence in DegV may provide clues to its function.
Functional implications of Dak classification
While EIIA-man belongs to the PTS system responsible for coupling the import and phosphorylation of sugars in bacterial cells, Dak functions to produce the glycolytic intermediate dihydroxyacetone phosphate. The phosphoryl donor to EIIA-man is HPR from EI of the PTS system (Robillard and Broos 1999), and the phosphoryl donor for Dak is either ATP bound to a fused Dak2 domain or the EIIA-dak PTS system (Gutknecht et al. 2001). Although these general biological functions differ, the molecular reactions EIIA-man and Dak use to carry out these functions share some common features. Each of the proteins uses a buried aspartic acid (Asp67 in EIIA-man and Asp119 in the Dak EDD domain) to form hydrogen bonds with its phosphoryl group acceptor: an invariant histidine residue (His10) in EIIA-man (Nunn et al. 1996) and a dihydroxyacetone molecule covalently bound to an invariant histidine (His270) in the Dak C-terminal domain (Siebold et al. 2003a,b). Each structure also contains a third spatially conserved residue, a serine near the phosphoryl acceptor (Ser72 in EIIA-man and Ser60 in the Dak EDD domain), that may participate in catalysis. Accordingly, His10 and Ser72 are essential for EIIA-man activity (Stolz et al. 1993; Nunn et al. 1996).
Despite a preserved spatial location of these residues in the two structures (Fig. 1, residues in bonds representation), the side chains are contributed by nonidentical positions in the multiple sequence alignment (Fig. 2, black highlights). Such migration of active-site residues has been noted in many homologous folds (Todd et al. 1999; Kinch and Grishin 2002). Although the positions of these active-site residues are not invariant in the EDD folds, their high degree of conservation within the respective families supports their functional importance. SCOPmap motif conservation scores detected the presence of a common active site in Dak and EIIA-man, despite the noted migration of active-site residues. Two major motifs (Fig. 2) contribute to the conservation score. The first motif includes residues in the first β-strand and in the loop surrounding the EIIA-man phosphoryl acceptor histidine residue (His10). In both families this loop forms the active site and contains several conserved small residues. The invariant EIIA-man histidine aligns with an invariant glycine in the Dak family, whose presence probably allows room for substrate binding. The second motif includes residues in the loop connecting the third strand and helix of the EDD fold. This loop includes the EIIA-man active-site residues (Asp67 and Ser72) and the Dak active-site residue (Asp119).
Although the function of DegV remains unknown, the active site of the protein can be inferred from the presence of several invariant residues near the bound palmitate (Fig. 1C). This active-site placement agrees with the established evolutionary link to other EDD domain-containing families. Similar to the active-site Asp residues found in EIIA-man and Dak, a conserved DegV serine residue (discussed in Sequence-based support for Dak classification) forms a hydrogen bond with the head of the lipid. The side chain of another conserved DegV residue (Thr60), which is located in the small inserted domain unique to this family, forms an additional hydrogen bond with the bound lipid. The role of this Thr residue is mimicked by a conserved histidine (His66) in Dak. Finally, the side chain of a conserved histidine residue from the C-terminal domain (His270) falls in an identical structural position to the dihydroxyacetone-bound C-terminal domain histidine of Dak, although contributed from a different loop.
Considering the active-site makeup of DegV and its evolutionary relationship to other EDD domain-containing proteins, a tempting speculation for the function of this unknown family of proteins is phosphoryl transfer. Both the Nε2 position of His270 and the carboxylate of the palmitate are surface-exposed and provide potential phosphoryl acceptor sites. The presence of Dak2 domains fused to some DegV family members provides additional support for this hypothesis, as Dak2 is required for the phosphotransfer reaction to the dihydroxyacetone of Dak (Gutknecht et al. 2001; Siebold et al. 2003a,b).
| Materials and methods |
|---|
In cases where hits from multiple superfamilies correspond to the same domain of a query protein, SCOPmap attempts to choose only one correct superfamily assignment. Consequently, distant yet correct evolutionary relationships may be disregarded in favor of closer homologs. In order to evaluate more remote evolutionary relationships with SCOPmap, the assignment strategy was modified to ignore any hits found to library domains belonging to the same SCOP superfamily as the query protein. This modified program was used to identify potential homologs of Dak (PDB IDs 1oi2 and 1un8). The library used for these SCOPmap jobs was based on SCOP v1.65.
Sequence similarity searches
To detect sequence homologs of each family, we searched the nonredundant database (nr, Jul 9, 2004; 1,918,886 sequences, filtered for low-complexity regions) with PSI-BLAST (Altschul and Koonin 1998) using query sequences (gi|1827887 range 1..135 for EIIA-man; gi|49176086 range 51..198 for Dak; and gi|42543340 range 1..161 for DegV) with defined parameters (E-value threshold 0.005, maximum 20 iterations). Detected homologs were grouped using linkage clustering (score threshold of 1 bit per site, about 50% identity), and representative sequences from each group were used as new queries for subsequent rounds of PSI-BLAST. The iterations were repeated until no new sequences were detected. We used the COG database (http://www.ncbi.nlm.nih.gov/COG/; Tatusov et al. 2003) to define orthologous groups of the detected sequences, the PFAM database (http://pfam.wustl.edu/index.html; Bateman et al. 2004) to define additional domains, and the STRING database (http://string.embl.de/; von Mering et al. 2003) to evaluate genomic neighborhood.
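The search-until-convergence loop described above can be sketched as follows. This is a schematic under stated assumptions, not the pipeline used in the study: `search` is a hypothetical stand-in for one PSI-BLAST run that returns hit identifiers, and `pick_representatives` is a hypothetical stand-in for the clustering step that selects new queries from newly found sequences.

```python
def iterate_searches(seed_queries, search, pick_representatives):
    """Repeat searches from representatives until no new sequences appear.

    search(query) -> iterable of hit IDs (hypothetical stand-in for PSI-BLAST).
    pick_representatives(new_hits) -> list of IDs to use as the next queries
    (hypothetical stand-in for clustering and choosing one sequence per group).
    """
    found = set(seed_queries)
    queries = list(seed_queries)
    while queries:
        new_hits = set()
        for query in queries:
            new_hits |= set(search(query)) - found
        found |= new_hits
        # An empty list of new hits ends the loop: the search has converged.
        queries = pick_representatives(new_hits)
    return found


# Toy data: each key "detects" the listed sequences (made up for illustration).
TOY_HITS = {"a": ["b"], "b": ["c"], "c": ["a"], "d": []}


def toy_search(query):
    return TOY_HITS.get(query, [])


def every_hit_is_a_representative(new_hits):
    # Simplest possible choice: use every new sequence as a query.
    return sorted(new_hits)
```

Starting from `["a"]`, the toy run collects `a`, `b`, and `c` and stops once a round adds nothing new; `d` is never reached because no query detects it.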
Multiple sequence alignments and Euclidian space mapping
We constructed multiple sequence alignments of EDD domains from detected groups (corresponding to COG2376 for Dak, COG2893 for EIIA-man, COG3412 for EIIA-dak, COG3933 for transcription antiterminators, and COG1307 for DegV) using PCMA (Pei et al. 2003) with manual adjustments. The multiple alignments of each group were merged into a global alignment using structure superpositions, secondary-structure predictions (JPRED server; Cuff et al. 1998), hydrophobicity patterns, and paired BLAST hit alignments as guides. The global multiple sequence alignment was used as input for Euclidian space mapping using the previously described formula ($D_{ij} = 1/U_{ij} - 1$) for distance calculation (Grishin and Grishin 2002). Groups are colored according to the most stable configurations in the mapping procedure.
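To make the distance transform concrete, the sketch below applies $D_{ij} = 1/U_{ij} - 1$ to a toy alignment. The text does not define $U_{ij}$ here, so the code assumes, purely for illustration, that it is the fraction of identical residues over ungapped aligned columns; the definition in Grishin and Grishin (2002) may differ. All function names and the toy sequences are hypothetical.

```python
def fraction_identical(seq_a, seq_b):
    """Fraction of identical residues over columns where neither row has a gap.

    Assumption: this stands in for U_ij, which the text does not define.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("sequences share no ungapped columns")
    return sum(a == b for a, b in pairs) / len(pairs)


def distance(u):
    """Apply D = 1/U - 1; identical sequences (U = 1) get distance 0."""
    if u <= 0:
        return float("inf")  # no detectable similarity
    return 1.0 / u - 1.0


def distance_matrix(alignment):
    """Symmetric matrix of pairwise distances for rows of an alignment."""
    n = len(alignment)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(fraction_identical(alignment[i], alignment[j]))
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Made-up aligned rows, for illustration only.
toy_alignment = ["ACDE", "ACDF", "AC-E"]
toy_distances = distance_matrix(toy_alignment)
```

Under this assumption the transform is 0 for identical sequences and grows without bound as similarity approaches 0, which spreads distant pairs far apart before any embedding into Euclidian space.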
Structural similarity searches
DaliLite (Holm and Park 2000) was used to determine the closest structural neighbors of Dak, EIIA-man, and DegV representative structures in a library of PDB structures. Clustering at 90% sequence identity of all protein chains (minimum length of 20 amino acids) included in the PDB as of July 20, 2004 was obtained at ftp://ftp.rcsb.org/pub/pdb/derived_data/NR. A library of representative chains (8653 representatives) was assembled from the best representative of each cluster, which is defined as the chain with rank 1 in the cluster. Comparison of each query chain (Dak, EIIA-man, and DegV) with each library chain was performed by DaliLite (Holm and Park 2000). For each query, all pairwise comparisons were then ranked in order of descending Z-score to determine the closest structural neighbors.
| References |
|---|
Anantharaman, V. and Aravind, L. 2004. The SHS2 module is a common structural theme in functionally diverse protein groups, like Rpb7p, FtsA, GyrI, and MTH1598/TM1083 superfamilies. Proteins 56: 795–807.
Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., et al. 2004. The Pfam protein families database. Nucleic Acids Res. 32: D138–D141.
Cheek, S., Qi, Y., Krishna, S.S., Kinch, L.N., and Grishin, N.V. 2004. SCOPmap: Automated assignment of protein structures to evolutionary superfamilies. BMC Bioinformatics 5: 197.
Cuff, J.A., Clamp, M.E., Siddiqui, A.S., Finlay, M., and Barton, G.J. 1998. JPred: A consensus secondary structure prediction server. Bioinformatics 14: 892–893.
Esnouf, R.M. 1999. Further additions to MolScript version 1.4, including reading and contouring of electron-density maps. Acta Crystallogr. D Biol. Crystallogr. 55: 938–940.
Grishin, V.N. and Grishin, N.V. 2002. Euclidian space and grouping of biological objects. Bioinformatics 18: 1523–1534.
Gutknecht, R., Beutler, R., Garcia-Alles, L.F., Baumann, U., and Erni, B. 2001. The dihydroxyacetone kinase of Escherichia coli utilizes a phosphoprotein instead of ATP as phosphoryl donor. EMBO J. 20: 2480–2486.
Holm, L. and Park, J. 2000. DaliLite workbench for protein structure comparison. Bioinformatics 16: 566–567.
Hu, K.Y. and Saier Jr., M.H. 2002. Phylogeny of phosphoryl transfer proteins of the phosphoenolpyruvate-dependent sugar-transporting phosphotransferase system. Res. Microbiol. 153: 405–415.
Kinch, L.N. and Grishin, N.V. 2002. Evolution of protein structures and functions. Curr. Opin. Struct. Biol. 12: 400–408.
Markovic-Housley, Z., Balbach, J., Stolz, B., and Genovesio-Taverne, J.C. 1994. Predicted topology of the N-terminal domain of the hydrophilic subunit of the mannose transporter of Escherichia coli. FEBS Lett. 340: 202–206.
Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.
Nunn, R.S., Markovic-Housley, Z., Genovesio-Taverne, J.C., Flukiger, K., Rizkallah, P.J., Jansonius, J.N., Schirmer, T., and Erni, B. 1996. Structure of the IIA domain of the mannose transporter from Escherichia coli at 1.7 Å resolution. J. Mol. Biol. 259: 502–511.
Pei, J., Sadreyev, R., and Grishin, N.V. 2003. PCMA: Fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics 19: 427–428.
Robillard, G.T. and Broos, J. 1999. Structure/function studies on the bacterial carbohydrate transporters, enzymes II, of the phosphoenolpyruvate-dependent phosphotransferase system. Biochim. Biophys. Acta 1422: 73–104.
Schulze-Gahmen, U., Pelaschier, J., Yokota, H., Kim, R., and Kim, S.H. 2003. Crystal structure of a hypothetical protein, TM841 of Thermotoga maritima, reveals its function as a fatty acid-binding protein. Proteins 50: 526–530.
Siebold, C., Arnold, I., Garcia-Alles, L.F., Baumann, U., and Erni, B. 2003a. Crystal structure of the Citrobacter freundii dihydroxyacetone kinase reveals an eight-stranded alpha-helical barrel ATP-binding domain. J. Biol. Chem. 278: 48236–48244.
Siebold, C., Garcia-Alles, L.F., Erni, B., and Baumann, U. 2003b. A mechanism of covalent substrate binding in the x-ray structure of subunit K of the Escherichia coli dihydroxyacetone kinase. Proc. Natl. Acad. Sci. 100: 8188–8192.
Stolz, B., Huber, M., Markovic-Housley, Z., and Erni, B. 1993. The mannose transporter of Escherichia coli. Structure and function of the IIABMan subunit. J. Biol. Chem. 268: 27094–27099.
Tatusov, R.L., Fedorova, N.D., Jackson, J.D., Jacobs, A.R., Kiryutin, B., Koonin, E.V., Krylov, D.M., Mazumder, R., Mekhedov, S.L., Nikolskaya, A.N., et al. 2003. The COG database: An updated version includes eukaryotes. BMC Bioinformatics 4: 41.
Todd, A.E., Orengo, C.A., and Thornton, J.M. 1999. Evolution of protein function, from a structural perspective. Curr. Opin. Chem. Biol. 3: 548–556.
von Mering, C., Huynen, M., Jaeggi, D., Schmidt, S., Bork, P., and Snel, B. 2003. STRING: A database of predicted functional associations between proteins. Nucleic Acids Res. 31: 258–261.