Stowers Institute for Medical Research, Kansas City, Missouri 64110, USA; Department of Microbiology, Molecular Genetics, and Immunology, University of Kansas Medical Center, Kansas City, Kansas 66160, USA
Reprint requests to: Arcady Mushegian, Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City, MO 64110, USA; e-mail: arm{at}stowers-institute.org; fax: (816) 926-2041.
(RECEIVED January 16, 2003; FINAL REVISION March 11, 2003; ACCEPTED April 10, 2003)
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.0302103.
| Abstract |
|---|
Keywords: Glycosyltransferases; exostosin; Fringe; Egghead; protein sequence evolution
| Introduction |
|---|
On the basis of analogies with the better-studied glycosidases, two main catalytic mechanisms have been proposed for the glycosyl transfer reaction (Sinnott 1991). In the inverting mechanism, the acceptor is thought to perform a nucleophilic attack at C1 of the nucleoside diphosphosugar donor, and the anomeric configuration of the added sugar is changed (for instance, UDP-α-glucose → β-glucoside). In the retaining mechanism of glycosidases, the process probably has two steps: formation of a glycosyl-enzyme intermediate with release of the nucleoside diphosphate, followed by attack on the glycosyl-enzyme by the acceptor. The configuration of the transferred sugar is retained (for instance, UDP-α-glucose → α-glucoside). However, a glycosyl-enzyme intermediate of a retaining reaction has never been demonstrated for any glycosyltransferase, and the validity of the mechanistic analogy with the glycosidase reaction remains to be established (Withers et al. 2002).
In both types of glycosyl hydrolysis, residues with acidic or polar side chains, often aspartates, are known to act as the general base and the nucleophile (Sinnott 1991; McCarter and Withers 1994). Mechanistic evidence and the analysis of three-dimensional structures suggest an important functional role for carboxylate residues in GT active centers, although possibly in ways that differ from those known for the glycosyl hydrolase reaction. The so-called DxD motif is found in many groups of both inverting and retaining glycosyltransferases (Breton et al. 1998; Wiggins and Munro 1998; Breton and Imberty 1999; Unligil and Rini 2000). It is thought to be involved in catalysis and in binding a divalent cation, most commonly Mn2+ or Mg2+. For example, in four representative structures of inverting glycosyltransferases (Protein Data Bank [PDB; http://www.rcsb.org] structures 1QGQ, 1J8X, 1FOA, and 1FGG), the last aspartate residue of the DxD motif binds the divalent metal ion (Tarbouriech et al. 2001). The retaining galactosyltransferase LgtC (PDB structure 1G9R) has been crystallized in complex with Mn2+ and UDP-2F-galactose. In that structure, a single Mn2+ is coordinated by two phosphate oxygens of UDP and by side-chain atoms of His244, Asp103, and Asp105 (the two Asp residues of the DxD motif). Asp103 provides one side-chain oxygen, and Asp105 provides both side-chain oxygen atoms in a bidentate interaction (Persson et al. 2001). The bound divalent cation could conceivably act in catalysis by polarizing a water molecule, which might then participate in the attack at C1. In these cases, however, neither the nucleophile nor the general base has been identified directly.
Another set of structures belongs to the GPGTF superfamily (Wrabl and Grishin 2001), also known as GT-B (e.g., Bourne and Henrissat 2001). In these structures, there is no evidence of a bound metal ion associated with catalysis. However, several partially conserved acidic residues interact with the substrate, and in some enzymes a catalytic role has been proposed for two glutamic acid residues in the carboxy-terminal E-X7-E motif (Cid et al. 2000).
Glycosyltransferases catalyze what has been called the most important transfer reaction on earth, considering the biomass involved in the turnover of such polysaccharides as chitin, cellulose, starch, glycogen, and microbial cell wall components (Law and Reid 1995). Protein glycosylation, moreover, mediates crucial regulatory events in metazoan development. For example, boundary formation in the developing fruit-fly embryo requires Fringe-dependent elongation of O-linked fucose on specific EGF-like repeats in the extracellular domains of Notch receptors. This modification modulates the activation of Notch, which in turn regulates the activation of Notch target genes (Bruckner et al. 2000; Munro and Freeman 2000). The mammalian tumor suppressor exostosin has glycosyltransferase activity, namely heparan sulfate copolymerase activity. Its fruit-fly homolog, Tout-velu, is required for regulating the movement of the patterning factor Hedgehog (Bellaiche et al. 1998). Mutations in glycosyltransferases cause various human diseases. Mutations in the UDP-glucuronosyltransferase gene UGT1A1 underlie Gilbert syndrome (Online Mendelian Inheritance in Man [OMIM] database entry 143500) and the Crigler-Najjar syndromes type I (OMIM 218800) and type II (OMIM 606785). Muscle-eye-brain disease (OMIM 253280) is caused by mutations in O-mannose β-1,2-N-acetylglucosaminyltransferase, POMGNT1. Future insights into developmental processes in normal and diseased states, as well as into cellular intermediate metabolism, will be greatly aided by a deep understanding of the sequence-structure-function relationships of glycosyltransferases.
A database of enzymes involved in carbohydrate metabolism (CAZy) is maintained by the Glycobiology unit at AFMB-CNRS in Marseille, France (http://afmb.cnrs-mrs.fr/CAZY/index.html). The glycosyltransferase section of the database contains >7000 sequences. These are organized into 65 families on the basis of high sequence similarity to one or more founding members with experimentally demonstrated GT activity. In addition, proteins with similar sequences but different catalytic mechanisms tend to be placed in separate families. For example, a subset of polypeptide GalNAc transferases was removed from family GT-2 after being recognized as retaining enzymes and grouped into a new family, GT-27. A few other distant similarities between CAZy families have been noted, suggesting that some families within CAZy share common ancestors (Campbell et al. 1997). However, there is still no natural classification of GTs that reflects the evolutionary events leading to the emergence of present-day GT sequences.
Probabilistic methods of database searching, such as PSI-BLAST (Altschul et al. 1997) and HMMer (Eddy 1998), use evolutionary models of a group of related sequences to detect homologs with lower sequence similarity. Statistical theory allows one to validate weak sequence matches detected in this way by showing that they are unlikely to have arisen by chance alone (Karlin and Altschul 1990). In the case of glycosyltransferases, PSI-BLAST searches have recently demonstrated distant evolutionary relationships among 15 families in the CAZy database, as well as their monophyletic origin (Wrabl and Grishin 2001). The newly established superfamily is an extension of what has also been called the GT-B family. The extended superfamily includes >2700 proteins from all three domains of life. They are found in almost every genome completely sequenced so far, with the exception of the small genomes of Mollicutes. For several members of the GT-B superfamily, the three-dimensional structures of the catalytic domains have been determined. These structures are virtually the same, consisting of a duplicated Rossmann-like α/β fold. In addition to bona fide glycosyltransferases, the GT-B family includes other enzymes involved in sugar metabolism, such as sugar epimerases (Wrabl and Grishin 2001). This adds to the growing list of examples in which the catalytic activity is thought to have changed during the evolution of a sequence family (Mushegian and Koonin 1994; Copley and Bork 2000; Smit and Mushegian 2000; Nagano et al. 2002).
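The statistical validation mentioned above rests on Karlin-Altschul theory. For a query of length m and a database of total length n, the expected number of chance local alignments with raw score at least S is E = K·m·n·e^(−λS). The sketch below assumes the λ and K values commonly quoted for gapped BLOSUM62 (gap costs 11/1) in NCBI BLAST; these are not parameters reported in this work. It also ignores the edge-effect (effective length) corrections that BLAST applies.

```python
import math

# Assumed Karlin-Altschul parameters for gapped BLOSUM62 (gap open 11, extend 1),
# as commonly reported by NCBI BLAST; substitute the values for your scoring scheme.
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score: float, lam: float = LAMBDA, k: float = K) -> float:
    """Normalized bit score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, m: int, n: int, lam: float = LAMBDA, k: float = K) -> float:
    """Expected number of chance hits scoring >= S: E = K * m * n * exp(-lambda * S).

    m is the query length and n the total database length; effective-length
    (edge-effect) corrections are deliberately omitted in this sketch.
    """
    return k * m * n * math.exp(-lam * raw_score)


def p_value(e: float) -> float:
    """Probability of at least one chance hit, P = 1 - exp(-E) (Poisson model)."""
    return -math.expm1(-e)


if __name__ == "__main__":
    # Hypothetical search: a 300-residue query against a 3e8-residue database.
    for s in (60, 100, 140):
        e = e_value(s, m=300, n=300_000_000)
        print(f"S={s:4d}  bits={bit_score(s):6.1f}  E={e:.2e}  P={p_value(e):.2e}")
```

E grows linearly with n, so the same raw score becomes less significant as the database grows. This is why a significance cutoff is only meaningful together with the search space it was computed against.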
In this work, we undertook a detailed sequence and structure comparison of the 65 families of glycosyltransferases contained in the CAZy database as of March 7, 2003. We further expand the GT-B superfamily to include at least 20 CAZy families. In addition, we delineate two other large monophyletic superfamilies of glycosyltransferases (Fig. 1). One of them is the largest of all known GT superfamilies. It includes 22 CAZy families as well as a large group of nucleotidyltransferases, and is an extension of the previously defined GT-A family (Bourne and Henrissat 2001). The known three-dimensional structures of several members of this superfamily are all classified as a Rossmann-like nucleoside diphosphosugar transferase (NDS) fold (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.hj.A.html). The other newly defined superfamily includes eight CAZy families of integral membrane proteins, with distant sequence similarity and partial conservation of transmembrane topology. The alignment of this superfamily displays several conserved charged residues in the extracytoplasmic loops, where they are likely to play a role in substrate binding or catalysis. We propose to call the latter superfamily GT-C. The ubiquitous distribution of GT-A and GT-B in all three domains of life suggests an ancient origin for both superfamilies.
| Results and Discussion |
|---|
Short but functionally important amino acid motifs, such as the DxD motif, may signal the common ancestry of the enzymes that share them. Alternatively, they could have evolved independently, that is, convergently, in different lineages of evolutionarily unrelated GTs. To understand the evolutionary relationships between GT families, one needs to distinguish between these two scenarios. The similarity of three-dimensional structures is sometimes taken as further evidence that two proteins sharing a short sequence motif have a common origin. The structures of several glycosyltransferases with a DxD motif are known and are basically the same. They form a Rossmann-like α/β/α three-layer sandwich with a seven-stranded β-sheet of 3214657 topology, in which strand 6 is antiparallel to the rest (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.d.hj.A.html). Some (but not all) nucleotidyltransferases, enzymes that are mechanistically close to GTs, also possess the same fold and variations of the functionally important DxD tripeptide (Blankenfeldt et al. 2000; Mosimann et al. 2001; Olsen and Roderick 2001). The problem with the structural argument, however, is that no rigorous statistical theory can distinguish convergent from divergent three-dimensional structures by direct comparison of atomic coordinates. In contrast, the statistics of random versus nonrandom (evolutionarily relevant) sequence matches are well understood (Karlin and Altschul 1990). Thus, the strongest available support for the common origin and divergent evolution of two protein structures comes from matching their sequences in the context of large sequence databases (Aravind and Koonin 1999; Copley and Bork 2000; Nagano et al. 2002). We therefore obtained the statistics of matching all GT sequences to each other, using the nonredundant (NR) protein sequence database at NCBI as the search space.
Defining evolutionarily ancient, diverse sequence superfamilies requires confident detection of remote homologs. The maximal number of such homologs has been reported to be recovered when several distant members of an already defined family are tried as starting queries in PSI-BLAST searches. This may be because multiple starting points help to overcome the effect of random fluctuations in sequences and in match scores (Aravind and Koonin 1999; Wrabl and Grishin 2001). This approach seeks a set in which every sequence matches at least one other sequence with a score above (or a probability below) a cutoff. Every pair of sequences is not required to pass this threshold. Such a group can be pictured as a network in which the edges correspond to matches that satisfy the threshold. The nodes represent either single sequences or groups of sequences with a very high degree of similarity (Fig. 2).
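The grouping criterion just described is single-linkage clustering: every member needs at least one significant match, but not every pair has to match. Equivalently, the groups are the connected components of the match graph. The Python sketch below finds them with a union-find (disjoint-set) structure. The input format of (query, hit, E-value) triples and the family names are hypothetical, not the data used in this study.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (node A, node B, E-value); hypothetical input format


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.size: Dict[str, int] = {}

    def add(self, x: str) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x: str) -> str:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def transitive_groups(nodes: Iterable[str], hits: Iterable[Hit], e_cutoff: float) -> List[List[str]]:
    """Connected components of the graph whose edges are hits with E <= e_cutoff."""
    uf = UnionFind()
    for node in nodes:
        uf.add(node)
    for a, b, e in hits:
        uf.add(a)
        uf.add(b)
        if e <= e_cutoff:
            uf.union(a, b)
    groups: Dict[str, List[str]] = {}
    for node in list(uf.parent):
        groups.setdefault(uf.find(node), []).append(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    hits = [("famA", "famB", 1e-5), ("famB", "famC", 1e-4), ("famA", "famC", 0.5), ("famD", "famA", 2.0)]
    print(transitive_groups(["famA", "famB", "famC", "famD"], hits, e_cutoff=1e-3))
    # → [['famA', 'famB', 'famC'], ['famD']]
```

famA and famC end up in the same group even though their direct match (E = 0.5) fails the cutoff, because both are linked through famB.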
The lowest level of organization in the network shown in Figure 2 consists of well-conserved, fully linked protein families. Their average similarity score is s = 240, and the probabilities of a random match in the context of the NR search range from 1e-15 to 1e-60. These families correspond almost precisely to the CAZy families, with two exceptions: CAZy families GT-2, GT-8, GT-25, and GT-45 are grouped together in our approach, and GT-31 is split into three groups. The CAZy family GT-2 includes the bacterial spore coat polysaccharide biosynthesis protein SpsA, whose structure is known (PDB entry 1QGQ). This family was important, although not crucial, for our definition of the extended GT-A superfamily. It is connected to the largest number of other CAZy families. Inspection of the network also reveals smaller connected groups that occupy an intermediate position between the CAZy family level and the GT-A superfamily level. One such group is represented by the GlmU, RmlA, sialic acid-activating synthetase, cytidylyltransferase, and MobA families at the bottom right corner of Figure 2. It is interesting because it contains almost all GT-A proteins with nucleotidyltransferase and other non-GT activities. In addition to this functional distinction, members of this cluster appear to share a distinctive structural feature (see below).
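Fully linked families at the bottom level differ from a superfamily that is held together only through chains of matches. This difference can be tested by checking whether every pair within a group passes the cutoff. The Python sketch below is illustrative only. The hit triples and names are hypothetical, and the "link density" measure is an assumption introduced here, not a statistic used in this work.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (node A, node B, E-value); hypothetical input format


def significant_pairs(hits: Iterable[Hit], e_cutoff: float) -> Set[FrozenSet[str]]:
    """Unordered pairs whose match passes the E-value cutoff."""
    return {frozenset((a, b)) for a, b, e in hits if a != b and e <= e_cutoff}


def link_density(members: Iterable[str], pairs: Set[FrozenSet[str]]) -> float:
    """Fraction of all member pairs that are directly linked (1.0 = fully linked)."""
    unique = sorted(set(members))
    all_pairs = list(combinations(unique, 2))
    if not all_pairs:
        return 1.0
    linked = sum(frozenset(p) in pairs for p in all_pairs)
    return linked / len(all_pairs)


def is_fully_linked(members: Iterable[str], pairs: Set[FrozenSet[str]]) -> bool:
    """True if every pair of members matches directly (a clique in the match graph)."""
    return link_density(members, pairs) == 1.0


if __name__ == "__main__":
    hits = [("famA", "famB", 1e-5), ("famB", "famC", 1e-4), ("famA", "famC", 0.5)]
    pairs = significant_pairs(hits, e_cutoff=1e-3)
    print(is_fully_linked(["famA", "famB"], pairs))          # True
    print(is_fully_linked(["famA", "famB", "famC"], pairs))  # False: A-C link is missing
    print(link_density(["famA", "famB", "famC"], pairs))
```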
In total, there are >5000 members of GT-A in the NR database. They are found in all completely sequenced genomes. Their numbers range from only four in Mycoplasma genitalium and in Chlamydia spp., to 33 in a large bacterium such as Escherichia coli strain O157 and 59 in Drosophila melanogaster. Chlamydia spp. are the only group of species that have GT-A nucleotidyltransferases but no GT-A glycosyltransferases. Thus, GT-A appears to be one of the most commonly used protein superfamilies and spatial folds in cellular life. It ranks among the 20 most common folds in both Drosophila and E. coli, and among the 10 most common folds (about 1% of all ORFs) in the bacterium Helicobacter pylori. Curiously, surveys of fold usage in complete genomes, such as Parts List at Yale University (http://bioinfo.mbb.yale.edu/partslist/), ignore it almost completely. They perhaps rely on earlier releases of the SCOP database, in which the GT-A proteins (the NDS fold, as it is currently known in SCOP) were not classified. Existing collections of conserved protein families include PFAM (Bateman et al. 2002), PRODOM (Servant et al. 2002), and PROTOMAP (Yona et al. 2000). In these collections, GT-A is split into many families similar to the CAZy families, and nucleotidyltransferase families show no connection with glycosyltransferases. The remote yet statistically significant sequence similarities reported here strongly indicate that GT-A is monophyletic. These similarities have apparently not been fully appreciated until now.
Conserved sequence motifs in GT-A, their structural basis and functional significance
There are three regions of sequence conservation shared by all members of the GT-A. Taken together, these sequence regions comprise a substantial part of the structural core of these enzymes. We therefore describe that core, taking into account all 12 families recognized within the NDS fold in the SCOP database, and then discuss the universally conserved sequence motifs in more detail.
The three-dimensional structure of GT-A members is based on the Rossmann-like fold, one of the most common arrangements of protein spatial structure, observed in dozens of diverse enzyme families (Lesk 1995). In the most basic arrangement, extended β-stranded and α-helical regions alternate along the length of the protein. All strands form a central, relatively planar β-sheet, and the helices fill two layers, one on each side of the sheet. As in many other Rossmann-like folds, the amino-terminal β-strand of the GT-A proteins is located in the middle of the sheet. The strand order is 321465, sometimes with a seventh strand added (3214657; Fig. 3). In some Rossmanoids, one of the strands (the sixth in the case of GT-A proteins) is antiparallel to all other strands. Another typical feature of Rossmanoid enzymes is that the functionally important, conserved residues are often located at the carboxy-termini of the β-strands or in the adjoining loops (Lesk 1995). A subset of the GT-A proteins, consisting of the structures from the bottom right cluster in Figure 2, conforms to this plan almost precisely. These proteins occasionally add an extra α-helix or a few β-strands, which seem to serve mostly auxiliary roles such as multimerization (Blankenfeldt et al. 2000; Jelakovic and Schulz 2001; Fig. 4E,F).
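The strand-order notation used here (for example, 321465 for the GT-A core) can be captured in a small data structure. It records the left-to-right order of strands across the sheet and which strands run antiparallel to strand 1. The Python sketch below is a hypothetical illustration, not a tool used in this work. It encodes the GT-A core, its seven-stranded variant, and the all-parallel 321456 order of a single GT-B domain. When comparing topologies, it treats a sheet read from the opposite edge as the same.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class SheetTopology:
    """Beta-sheet topology as a spatial strand order plus strand directions.

    order: strand numbers (numbered 1..n from the N-terminus) listed from one
           edge of the sheet to the other.
    antiparallel: strands running opposite to strand 1.
    """
    order: Tuple[int, ...]
    antiparallel: FrozenSet[int] = frozenset()

    def first_strand_is_internal(self) -> bool:
        """True if the N-terminal strand lies inside the sheet, not at an edge."""
        pos = self.order.index(1)
        return 0 < pos < len(self.order) - 1

    def is_all_parallel(self) -> bool:
        return not self.antiparallel

    def same_as(self, other: "SheetTopology") -> bool:
        """Equal topology, allowing the sheet to be read from either edge."""
        if self.antiparallel != other.antiparallel:
            return False
        return self.order in (other.order, tuple(reversed(other.order)))


GT_A_CORE = SheetTopology(order=(3, 2, 1, 4, 6, 5), antiparallel=frozenset({6}))
GT_A_SEVEN = SheetTopology(order=(3, 2, 1, 4, 6, 5, 7), antiparallel=frozenset({6}))
GT_B_DOMAIN = SheetTopology(order=(3, 2, 1, 4, 5, 6))

if __name__ == "__main__":
    print(GT_A_CORE.first_strand_is_internal())  # True
    print(GT_A_CORE.same_as(GT_B_DOMAIN))        # False
```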
α-helices and β-hairpins, which again appear to play auxiliary roles in the catalytic reaction. Curiously, both in sequence similarity and in structure, proteins lacking the β-lip form a more compact group than those with a β-lip. This is so even though the former group spans a diverse range of GT-A enzymes, including all of those that are not, strictly speaking, glycosyltransferases (Fig. 2).
The first conserved sequence region in the GT-A enzymes is closest to the amino terminus. It encompasses β-strands S1–S3, their carboxy-terminal loops, and the connecting α-helices (Fig. 5). The highly conserved residues are in the strands (mostly hydrophobic side chains) and in the loop after S1 (mostly charged side chains). Crystal structures that contain a bound nucleotide show that the charged residues in the loop after S1 (S1L) make direct contacts with the nucleotide base and/or sugar. In some cases, residues in the loops after S2 or S3 also make such contacts. Glycine residues in S1L have been noted in several cases, and a broad similarity of this loop to the glycine-rich loops of Walker-type NTP-binding sites (Saraste et al. 1990) has been discussed (Koonin 1995; Kinoshita et al. 1999; Lake et al. 2000). As seen in Figure 5, there are a few partially conserved glycine residues in S1L, but in most cases this loop is not glycine rich. More importantly, the P-loops in Walker-type NTPases bind the phosphate moiety of the bound nucleotide, not the base or sugar as in the present case. There is therefore probably no evolutionary or functional connection between the two types of loops.
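The comparison with Walker-type P-loops can be checked mechanically. The Walker A consensus is commonly written G-x(4)-G-K-[ST], and glycine content is a simple proxy for how glycine rich a loop is. The Python sketch below assumes that loop boundaries are already known, for example from a structure or an alignment. The example sequences and the 0.3 glycine-fraction threshold are hypothetical choices for illustration, not values from this study.

```python
import re
from typing import List, Tuple

# Walker A (P-loop) consensus G-x(4)-G-K-[ST]; the lookahead reports overlapping sites.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")


def walker_a_sites(seq: str) -> List[Tuple[int, str]]:
    """0-based start positions and matched text of Walker A consensus sites."""
    return [(m.start(), m.group(1)) for m in WALKER_A.finditer(seq.upper())]


def glycine_fraction(segment: str) -> float:
    """Fraction of residues in the segment that are glycine."""
    segment = segment.upper()
    return segment.count("G") / len(segment) if segment else 0.0


def is_glycine_rich(segment: str, threshold: float = 0.3) -> bool:
    """Hypothetical rule of thumb: call a loop glycine rich at or above `threshold`."""
    return glycine_fraction(segment) >= threshold


if __name__ == "__main__":
    p_loop_like = "MKIGVTGPSGSGKSTLL"  # made-up P-loop-containing fragment
    s1l_like = "LPVYNEEKY"            # made-up loop without the consensus
    print(walker_a_sites(p_loop_like))                          # [(6, 'GPSGSGKS')]
    print(walker_a_sites(s1l_like), is_glycine_rich(s1l_like))  # [] False
```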
In a subset of NDS-fold proteins that lack the β-lip, S4L is followed by an α-helix. When a β-lip is present, S4L is followed by strand A of the lip. From this point to the carboxy-termini of the proteins, there is considerable sequence diversity. Accordingly, the rest of the GT-A structure is less well conserved across the superfamily (Fig. 3), except for the two helices following S6. These two helices and the loop connecting them form the last conserved sequence region (Fig. 4), with charged residues partially preserved in each helix. Whenever a bound sugar, modeling the transferable moiety of the nucleoside diphosphosugar donor, is present in the crystals, it makes direct contacts with these charged residues in the helical region.
Thus, all three conserved sequence regions in GT-A appear to have a structural role, namely stabilization of the Rossmann-like core. They also appear to have functional significance, namely interaction with the bound sugar donor and participation in divalent cation-mediated glycosyl transfer. No macromolecular sugar acceptor has yet been cocrystallized with a GT-A enzyme. It is conceivable, however, that the carboxy-terminal halves of the GT-A proteins, so variable at both the sequence and structure levels, mediate interactions with the diverse substrates of these enzymes.
GT-A and GT-B have, respectively, stand-alone and duplicated Rossmann-like folds, but lack discernible sequence similarity
Probabilistic modeling and iterative database searches allowed us to discover a deep evolutionary relationship among the very diverse enzymes forming the GT-A superfamily. Recently, a conceptually similar study extended the GT-B superfamily (Wrabl and Grishin 2001). GT-B is another group of GTs with subtle sequence similarity, a structure best described as a duplicated Rossmann-like fold, and a broad distribution in all living species. We were intrigued that no GT-A sequences showed up in searches initiated with GT-B family members, or vice versa. We therefore explored more permissive cutoffs and different scoring schemes in our searches. We also attempted to force local alignments between GT-A and GT-B using several approaches. First, we created specialized databases consisting only of members of either superfamily. We used each database as the search space and interrogated it with members of the other superfamily. Second, we attempted to match profiles to families and vice versa using the HMMer package (Eddy 1998). Third, we used the prof_sim program (Yona and Levitt 2002; G. Yona, pers. comm.), which matches PSI-BLAST checkpoint profiles directly. Finally, we created conserved sequence blocks from the multiple alignments of both superfamilies. We then used the LAMA server (Pietrokovski 1996) to compare them with databases of other conserved sequence blocks. None of these approaches detected sequence similarity between GT-A and GT-B. Even when the two superfamilies were forced into comparison, the matches did not correspond to regions conserved in most members of either superfamily. Nor did they highlight any residues of known structural or functional significance.
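Profile-to-profile comparison, the idea behind the prof_sim step above, scores two position-specific profiles column against column rather than a sequence against a profile. The Python sketch below uses a Jensen-Shannon-based column similarity and an ungapped scan over offsets. It is an illustrative stand-in, not the scoring function or statistics of prof_sim or HMMer. It assumes each profile column is a list of 20 amino-acid frequencies summing to 1.

```python
import math
from typing import List, Sequence, Tuple

Column = Sequence[float]  # 20 amino-acid frequencies summing to 1 (assumed input format)


def _kl_bits(p: Column, q: Column) -> float:
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Column, q: Column) -> float:
    """Jensen-Shannon divergence in bits, bounded by [0, 1]."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl_bits(p, mid) + 0.5 * _kl_bits(q, mid)


def column_similarity(p: Column, q: Column) -> float:
    """1 for identical columns, 0 for columns with disjoint residue sets."""
    return 1.0 - js_divergence(p, q)


def best_ungapped_offset(prof_a: List[Column], prof_b: List[Column]) -> Tuple[float, int]:
    """Highest summed column similarity over all ungapped placements of B on A.

    Column i of B is paired with column i + offset of A. Raw sums favour long
    overlaps; a real method would subtract a background expectation and
    estimate significance.
    """
    best = (float("-inf"), 0)
    for offset in range(-(len(prof_b) - 1), len(prof_a)):
        score = sum(
            column_similarity(prof_a[i + offset], col)
            for i, col in enumerate(prof_b)
            if 0 <= i + offset < len(prof_a)
        )
        best = max(best, (score, offset))
    return best


def one_hot(k: int) -> List[float]:
    """Toy profile column in which a single amino acid (index k) has frequency 1."""
    col = [0.0] * 20
    col[k] = 1.0
    return col


if __name__ == "__main__":
    a = [one_hot(k) for k in (0, 1, 2, 3)]
    b = [one_hot(k) for k in (1, 2)]
    print(best_ungapped_offset(a, b))  # (2.0, 1)
```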
GT-A and GT-B thus appear to be unrelated. This is so despite the similarity of their spatial folds and the general ease with which duplication is believed to occur in the evolution of sequences (Andrade et al. 2001) and structural domains (Heringa and Taylor 1997). Notably, both β-sheets in GT-B have a very common strand topology: the parallel 321456 arrangement, found in 12 folds in the SCOP database, mostly but not exclusively Rossmanoids. In contrast, the GT-A topology (321465, with an antiparallel sixth strand) seems to be unique. Other properties set the two superfamilies apart:
(1) the inconsistent presence of metal ions in GT-B and the lack of conserved contacts between metal ions and the side chains of acidic residues;
(2) the pyridoxal phosphate-dependent mechanism of glycosyl transfer in at least one member of GT-B, glycogen phosphorylase (Klein et al. 1986; Withers et al. 2002); and
(3) the complex mode of GT-B interaction with sugar donors, which involves both Rossmann-like domains and requires their motions (Ha et al. 2000; Morera et al. 2001).
These structural and functional differences, together with the apparent lack of global or local sequence similarity, strongly indicate that the two superfamilies evolved their similar molecular function independently. At the least, their last common ancestor, if it existed, is extremely ancient and not amenable to sequence-based reconstruction.
Several proteins with well-documented roles in the development of multicellular organisms belong to GT-A and GT-B. The Drosophila Fringe/Brainiac family and the Egghead protein belong to GT-A, and Fukutin belongs to GT-B. The Exostosin/Tout-velu family, conserved in bacteria and multicellular eukaryotes, appears to contain both GT-A- and GT-B-related domains, with multiple GT activities in one protein. By the time this manuscript was revised, the two-domain structure of exostosin had already been reflected in the CAZy database by the addition of family GT-64. A three-dimensional model of Fringe and multiple-sequence alignments of the other proteins are available online as Supplemental information.
GT-C superfamily: Integral membrane glycosyltransferases with modified DxD motif in the first extracytoplasmic loop
The GT-C superfamily links 8 CAZy families (Fig. 6). All of these families consist of large hydrophobic proteins located in the ER or the plasma membrane, with 8 to 13 predicted transmembrane segments (Strahl-Bolsinger et al. 1993; Takahashi et al. 1996; Maeda et al. 2001). Figure 7 shows sequence-similarity information for GT-C and the alignment of its conserved amino-terminal extracytoplasmic loop, the best-preserved sequence element in all eight families. The conserved element in that long loop is a modified DxD signature, aligned to ExD, DxE, DDx, or DEx residues. The role of this motif has been studied in an important human enzyme, dolichol-phosphate-mannose-dependent mannosyltransferase (GPI-MT-I; Maeda et al. 2001). GPI-MT-I is essential for the formation of glycosylphosphatidylinositol (GPI). GPI is attached to the carboxyl terminus of many newly synthesized cell-surface proteins and serves as their membrane anchor (Orlean 1990; McConville and Menon 2000). Replacing either aspartic acid residue of the DxD motif with alanine abolishes the mannosyltransferase activity of GPI-MT-I (Maeda et al. 2001). The mechanistic basis of catalysis in GT-C enzymes is unclear. It is not known whether the conserved acidic residues in the putative DxD motif bind a divalent cation.
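Once the first extracytoplasmic loop has been delimited, scanning it for the modified DxD signature is straightforward. The Python sketch below reports every occurrence of the DxD, ExD, DxE, DDx, and DEx variants listed above within an optional window. The window coordinates (for example, taken from a transmembrane-topology prediction) and the test sequences are hypothetical. Such short patterns occur often by chance, so a hit is informative only in the context of an alignment supported by full-length sequence similarity.

```python
import re
from typing import List, Optional, Tuple

# Variants of the acidic signature in the GT-C first extracytoplasmic loop;
# "." stands for any residue (the "x" of the motif names).
MOTIF_PATTERNS = {
    "DxD": "D.D",
    "ExD": "E.D",
    "DxE": "D.E",
    "DDx": "DD.",
    "DEx": "DE.",
}


def find_acidic_motifs(seq: str, start: int = 0, end: Optional[int] = None) -> List[Tuple[int, str, str]]:
    """(0-based position, motif name, matched residues) for every variant in seq[start:end].

    Overlapping matches are reported, so one tripeptide can satisfy several variants.
    """
    seq = seq.upper()
    end = len(seq) if end is None else end
    window = seq[start:end]
    hits = []
    for name, pattern in MOTIF_PATTERNS.items():
        for m in re.finditer(f"(?=({pattern}))", window):
            hits.append((start + m.start(), name, m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_acidic_motifs("AADLDGG"))              # [(2, 'DxD', 'DLD')]
    print(find_acidic_motifs("DLDAAADLD", start=3))   # [(6, 'DxD', 'DLD')]
```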
There is no evidence that the putative DxD loop in GT-C and the DxD motif in GT-A share a common evolutionary origin. The DxD tripeptide in GT-C is located at the carboxy-terminal end of the first transmembrane helix. It is often followed by a small patch of hydrophobic amino acids predicted to be part of the same extracellular loop. This arrangement is reminiscent of the DxD followed by a β-strand in a subset of GT-A enzymes. However, no specific, statistically significant sequence similarity between these regions in GT-A and GT-C could be detected in the NR database search.
GT-C has a more limited phyletic distribution than GT-A and GT-B. Representatives of GT-C are found in all completely sequenced eukaryotic genomes but are missing from archaea. In prokaryotes, these enzymes are rare and limited to parasitic mycobacteria. The known biochemical specificities of GT-C enzymes are also far less varied than the remarkable variety of specificities in GT-A and GT-B. Most GT-C enzymes are only known to synthesize polysaccharide derivatives of dolichol phosphate. Thus, GT-C appears to be a specialized, evolutionarily recent group of GTs. It may have acquired the DxD motif either by convergent evolution or by a recombinational graft of a short sequence into the extracellular loop of a membrane protein.
Concluding remarks: Trends in glycosyltransferase sequence evolution
In this work, we have delineated three large, diverse, most likely monophyletic superfamilies of glycosyltransferases, which together account for >75% of the families recognized by the CAZy database. In addition to bona fide glycosyltransferases, the GT-A and GT-B sequence superfamilies contain enzymes that use activated sugars as substrates but have distinct enzymatic activities. This observation underscores the complex interplay of divergence and convergence in enzyme evolution. On the one hand, similar sequences may have different enzymatic specificities, even belonging to different EC classes. On the other hand, enzymes with very similar specificities may have dramatically different sequences (Galperin et al. 1998; Copley and Bork 2000; Nagano et al. 2002).
Among the remaining 15 CAZy families, a few smaller superfamilies can be recognized. In at least one case (CAZy family GT-36), the fold can be predicted and appears to differ from both GT-A and GT-B (Fig. 1; A. Mushegian, unpubl.). Glycosyltransferase activities have apparently evolved independently on several occasions, on the basis of different sequences and structures (Fig. 1). These include two distinct Rossmanoids, as in GT-A and GT-B; integral membrane proteins, as in GT-C; and possibly other conserved domains. GT-A and GT-B stand out as the most diverse and ubiquitous groups of glycosyltransferases. They are found in all life forms except a few small parasitic bacteria with reduced biosynthetic capacity. However, three factors hamper reconstruction of a detailed phylogeny of either GT-A or GT-B:
(1) the insufficient signal retained in the multiple alignments of each superfamily;
(2) the very small number of shared derived characters on which to base a cladistic classification (Aravind et al. 2002a); and
(3) the apparently frequent horizontal transfer of biosynthetic enzymes, including GTs, in the early evolution of bacteria and archaea (Koonin et al. 1997, 2001).
Within GT-A, the β-lip appears to be a good candidate for one such shared derived character. The phyletic distribution of both superfamilies and the phylogenetic distances between individual sequences seem to indicate an ancient origin of both GT-A and GT-B (A. Mushegian, unpubl.). More information on a clearly monophyletic, ubiquitous subset of GT-A is available at http://www.ncbi.nlm.nih.gov/cgi-bin/COG/palox?txt=cog0463. On the basis of phyletic distribution, we speculate that GT-A-like and GT-B-like Rossmanoids might have been present in the Last Universal Common Ancestor (LUCA) of all present-day life forms.
The odyssey of the β-DxD-β metal-binding motif in the active center of glycosyltransferases and nucleotidyltransferases is of special interest. The LUCA must have possessed a nucleotidyltransferase involved in copying its genomic nucleic acid. It has been proposed that such copying proceeded via an RNA intermediate (Leipe et al. 1999). This nucleotidyltransferase most likely had the palm-domain ferredoxin-like fold, consisting of a four- or five-stranded β-sheet with two α-helices packed against one side (Aravind et al. 2002b). All palm-related domains contain a motif with 1–3 acidic residues, flanked by two β-strands and involved in coordinating two nearby metal ions (Aravind et al. 2002b). These domains include RNA-dependent RNA polymerases (viral type), reverse transcriptases, β-family DNA polymerases, poly(A) polymerases, kanamycin nucleotidyltransferases, adenylyl cyclases, and GGDEF proteins. A few of the β-family polymerases and adenylyl cyclases (e.g., the evolutionarily more recent members of the palm-domain family) have this motif in the DxD form. Curiously, they even appear in PSI-BLAST searches with GT-A enzymes, although at a statistically insignificant level. The occurrence of DxD between two strands in the ferredoxin-like fold may indicate an independent, perhaps the most ancient, co-option of this motif for nucleotidyltransferase function. This may have been followed in evolution by the formation of the β-lip in some GT-A enzymes and by the emergence of the first extracellular loop in GT-C.
On a more practical note, we have identified the regions of subtle sequence conservation, and their structural and functional correlates, in a number of proteins with important biochemical and developmental roles. Investigation of diverse biochemical processes of consequence to human health, from bacterial cell wall biosynthesis to hereditary multiple exostoses, can now proceed via mechanistic studies of precisely mutated glycosyltransferases of GT-A, GT-B, and GT-C superfamilies.
The CAZy database is the authoritative source of information about glycosyltransferases. Its classification aims to capture both evolutionary divergence and functional variation. Divergence is captured by grouping sequences on the basis of high sequence similarity. Functional variation is captured by separating retaining and inverting enzymes, even when their sequences are similar. The classification presented here is based only on sequence relationships, and in many cases the distance between related sequences is too great to infer the exact mechanism of glycosyl transfer. As more mechanistic studies of glycosyltransferases become available, sequence-based computational prediction of inverting versus retaining mechanisms may also become feasible.
| Materials and methods |
|---|
Multiple sequence alignment
Related sequences with a high degree of similarity (>50% identity along their entire lengths) were aligned using the T-Coffee program (Notredame et al. 2000). For families with moderate similarity, representative sequences were aligned with the MACAW program (Schuler et al. 1991). Multiply aligned families were aligned with each other, if there was statistical evidence of their distant relationship (see text), using T-Coffee or the profile-to-profile alignment option of the CLUSTALX program (Thompson et al. 1997). Local similarities between the checkpoint profiles generated in the course of PSI-BLAST searches were also investigated using the prof_sim program (Yona and Levitt 2002). In some cases, the alignment of sequence motifs was verified by inspecting the pairwise matches in the PSI-BLAST outputs.
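The >50% identity cutoff above depends on how percent identity is normalized, and the text does not specify a convention. The sketch below assumes that identities are counted over columns where neither sequence has a gap and divided by the length of the shorter ungapped sequence, which is one common choice among several; the function names and the 50% default are illustrative and are not taken from T-Coffee or MACAW.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences.

    Assumed convention: identical residues are counted only in columns where
    neither sequence has a gap, and the count is divided by the length of the
    shorter ungapped sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap and x == y
    )
    shorter = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    return 100.0 * identical / shorter if shorter else 0.0


def is_highly_similar(aligned_a: str, aligned_b: str, cutoff: float = 50.0) -> bool:
    """Hypothetical helper: True if the aligned pair exceeds the identity cutoff."""
    return percent_identity(aligned_a, aligned_b) > cutoff


if __name__ == "__main__":
    print(round(percent_identity("AC-DE", "ACKD-"), 1))  # gapped toy pair
```

Dividing by the alignment length instead would penalize pairs with long terminal overhangs; whichever convention is chosen should be applied consistently when sorting families into the high- and moderate-similarity groups.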
Secondary structure
Secondary structure was predicted using the JPRED metaserver (Cuff et al. 1998; Bujnicki et al. 2001) and the PHD algorithm (Rost 1996). For PHD, only predictions with a reliability index of 7 or higher were considered. Transmembrane segments were predicted with PHD and with the HMMTOP program (Tusnady and Simon 2001).
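PHD reports, for each residue, a predicted state and a reliability index on a 0–9 scale. The sketch below illustrates the filtering step described above, assuming the prediction has already been parsed into two equal-length strings (states H, E, or L, and one reliability digit per residue); it keeps only helix and strand segments in which every residue reaches the cutoff. The input format and the function name are assumptions, and PHD's actual output files are not parsed here.

```python
from itertools import groupby


def confident_segments(states: str, reliability: str, min_ri: int = 7):
    """Return (state, start, end) for helix (H) and strand (E) runs in which
    every residue has a reliability index >= min_ri.

    states: one of 'H', 'E', 'L' per residue (assumed pre-parsed PHD output).
    reliability: one digit 0-9 per residue.
    Positions are 1-based and inclusive.
    """
    if len(states) != len(reliability):
        raise ValueError("states and reliability must have the same length")
    # Residues below the cutoff are masked with '.', which splits any segment they fall in.
    masked = [s if int(r) >= min_ri else "." for s, r in zip(states, reliability)]
    segments = []
    position = 1
    for state, run in groupby(masked):
        length = sum(1 for _ in run)
        if state in ("H", "E"):
            segments.append((state, position, position + length - 1))
        position += length
    return segments


if __name__ == "__main__":
    # Hypothetical prediction for a 14-residue peptide.
    print(confident_segments("LLHHHHHLLEEEEL", "99877765998899"))
```

With the default cutoff of 7, the predicted helix is trimmed to residues 3–6 because residue 7 has a reliability index of 6, while the strand at residues 10–13 is kept whole.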
Structure modeling
Information on sequence conservation and the predicted secondary structure of the fruit-fly Fringe gene product was used to model and evaluate its three-dimensional structure with the WHATIF package (Vriend 1990). The PDB structures 1FOA and 1G8O served as templates.
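Template-based modeling starts by transferring residue correspondences from a query-template alignment onto the template coordinates. The sketch below shows only that bookkeeping step; it is not the WHATIF procedure, and its function names are hypothetical. Start numbers are parameters because PDB residue numbering frequently does not begin at 1.

```python
def map_query_to_template(
    aligned_query: str,
    aligned_template: str,
    query_start: int = 1,
    template_start: int = 1,
    gap: str = "-",
) -> dict[int, int]:
    """Map query residue numbers to template residue numbers through aligned columns.

    Columns with a gap in either sequence are skipped; query residues opposite a
    template gap therefore have no structural counterpart.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    mapping: dict[int, int] = {}
    q, t = query_start, template_start
    for qa, ta in zip(aligned_query, aligned_template):
        if qa != gap and ta != gap:
            mapping[q] = t
        if qa != gap:
            q += 1
        if ta != gap:
            t += 1
    return mapping


def unmapped_query_residues(aligned_query: str, mapping: dict[int, int],
                            query_start: int = 1, gap: str = "-") -> list[int]:
    """Query residue numbers with no template counterpart (candidates for loop modeling)."""
    n = len(aligned_query.replace(gap, ""))
    return [i for i in range(query_start, query_start + n) if i not in mapping]


if __name__ == "__main__":
    # Hypothetical alignment; template numbering assumed to start at residue 10.
    m = map_query_to_template("MK-LVD", "MRALI-", template_start=10)
    print(m, unmapped_query_residues("MK-LVD", m))
```

Query residues left unmapped, such as those opposite gaps in the template, are the ones that would require loop building or would be omitted from the model.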
| Electronic supplemental material |
|---|
| Acknowledgments |
|---|
The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 USC section 1734 solely to indicate this fact.
| REFERENCES |
|---|
Andrade, M.A., Perez-Iratxeta, C., and Ponting, C.P. 2001. Protein repeats: Structures, functions, and evolution. J. Struct. Biol. 134: 117–131.
Aravind, L. and Koonin, E.V. 1999. Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J. Mol. Biol. 287: 1023–1040.
Aravind, L., Anantharaman, V., and Koonin, E.V. 2002a. Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: Implications for protein evolution in the RNA world. Proteins 48: 1–14.
Aravind, L., Mazumder, R., Vasudevan, S., and Koonin, E.V. 2002b. Trends in protein evolution inferred from sequence and structure analysis. Curr. Opin. Struct. Biol. 12: 392–399.
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The Pfam protein families database. Nucleic Acids Res. 30: 276–280.
Bellaiche, Y., The, I., and Perrimon, N. 1998. Tout-velu is a Drosophila homologue of the putative tumour suppressor EXT-1 and is needed for Hh diffusion. Nature 394: 85–88.
Blankenfeldt, W., Asuncion, M., Lam, J.S., and Naismith, J.H. 2000. The structural basis of the catalytic mechanism and regulation of glucose-1-phosphate thymidylyltransferase (RmlA). EMBO J. 19: 6652–6663.
Bourne, Y. and Henrissat, B. 2001. Glycoside hydrolases and glycosyltransferases: Families and functional modules. Curr. Opin. Struct. Biol. 11: 593–600.
Breton, C. and Imberty, A. 1999. Structure/function studies of glycosyltransferases. Curr. Opin. Struct. Biol. 9: 563–571.
Breton, C., Bettler, E., Joziasse, D.H., Geremia, R.A., and Imberty, A. 1998. Sequence-function relationships of prokaryotic and eukaryotic galactosyltransferases. J. Biochem. 123: 1000–1009.
Bruckner, K., Perez, L., Clausen, H., and Cohen, S. 2000. Glycosyltransferase activity of Fringe modulates Notch-Delta interactions. Nature 406: 411–415.
Bujnicki, J.M., Elofsson, A., Fischer, D., and Rychlewski, L. 2001. Structure prediction meta server. Bioinformatics 17: 750–751.
Campbell, J.A., Davies, G.J., Bulone, V., and Henrissat, B. 1997. A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem. J. 326: 929–939.
Cid, E., Gomis, R.R., Geremia, R.A., Guinovart, J.J., and Ferrer, J.C. 2000. Identification of two essential glutamic acid residues in glycogen synthase. J. Biol. Chem. 275: 33614–33621.
Copley, R.R. and Bork, P. 2000. Homology among (βα)8 barrels: Implications for the evolution of metabolic pathways. J. Mol. Biol. 303: 627–641.
Cuff, J.A., Clamp, M.E., Siddiqui, A.S., Finlay, M., and Barton, G.J. 1998. Jpred: A consensus secondary structure prediction server. Bioinformatics 14: 892–893.
Eddy, S.R. 1998. Profile hidden Markov models. Bioinformatics 14: 755–763.
Galperin, M.Y., Walker, D.R., and Koonin, E.V. 1998. Analogous enzymes: Independent inventions in enzyme evolution. Genome Res. 8: 779–790.
Ha, S., Walker, D., Shi, Y., and Walker, S. 2000. The 1.9 Å crystal structure of Escherichia coli MurG, a membrane-associated glycosyltransferase involved in peptidoglycan biosynthesis. Protein Sci. 9: 1045–1052.
Hagen, F.K., Hazes, B., Raffo, R., deSa, D., and Tabak, L.A. 1999. Structure-function analysis of the UDP-N-acetyl-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase. Essential residues lie in a predicted active site cleft resembling a lactose repressor fold. J. Biol. Chem. 274: 6797–6803.
Heringa, J. and Taylor, W.R. 1997. Three-dimensional domain duplication, swapping and stealing. Curr. Opin. Struct. Biol. 7: 416–421.
Jelakovic, S. and Schulz, G.E. 2001. The structure of CMP:2-keto-3-deoxy-manno-octonic acid synthetase and of its complexes with substrates and substrate analogs. J. Mol. Biol. 312: 143–155.
Ju, B.G., Jeong, S., Bae, E., Hyun, S., Carroll, S.B., Yim, J., and Kim, J. 2000. Fringe forms a complex with Notch. Nature 405: 191–195.
Karlin, S. and Altschul, S.F. 1990. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. 87: 2264–2268.
Kinoshita, K., Sadanami, K., Kidera, A., and Go, N. 1999. Structural motif of phosphate-binding site common to various protein superfamilies: All-against-all structural comparison of protein-mononucleotides. Protein Eng. 12: 11–14.
Klein, H.W., Im, M.J., and Palm, D. 1986. Mechanism of the phosphorylase reaction. Utilization of D-gluco-hept-1-enitol in the absence of primer. Eur. J. Biochem. 157: 107–114.
Koonin, E.V. 1995. Multidomain organization of eukaryotic guanine nucleotide exchange translation initiation factor eIF-2B subunits revealed by analysis of conserved sequence motifs. Protein Sci. 4: 1608–1617.
Koonin, E.V., Mushegian, A.R., Galperin, M.Y., and Walker, D.R. 1997. Comparison of archaeal and bacterial genomes: Computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol. Microbiol. 25: 619–637.
Koonin, E.V., Makarova, K.S., and Aravind, L. 2001. Horizontal gene transfer in prokaryotes: Quantification and classification. Annu. Rev. Microbiol. 55: 709–742.
Lake, M.W., Temple, C.A., Rajagopalan, K.V., and Schindelin, H. 2000. The crystal structure of the Escherichia coli MobA protein provides insight into molybdopterin guanine dinucleotide biosynthesis. J. Biol. Chem. 275: 40211–40217.
Law, S.K.A. and Reid, K.B.M. 1995. Complement, 2nd ed. IRL Press, Oxford, UK.
Leipe, D.D., Aravind, L., and Koonin, E.V. 1999. Did DNA replication evolve twice independently? Nucleic Acids Res. 27: 3389–3401.
Lesk, A.M. 1995. NAD-binding domains of dehydrogenases. Curr. Opin. Struct. Biol. 5: 775–783.
Maeda, Y., Watanabe, R., Harris, C.L., Hong, Y., Ohishi, K., Kinoshita, K., and Kinoshita, T. 2001. PIG-M transfers the first mannose to glycosylphosphatidylinositol on the lumenal side of the ER. EMBO J. 20: 250–261.
McCarter, J.D. and Withers, S.G. 1994. Mechanisms of enzymatic glycoside hydrolysis. Curr. Opin. Struct. Biol. 4: 885–892.
McConville, M.J. and Menon, A.K. 2000. Recent developments in the cell biology and biochemistry of glycosylphosphatidylinositol lipids. Mol. Membr. Biol. 17: 1–16.
Moloney, D.J., Panin, V.M., Johnston, S.H., Chen, J., Shao, L., Wilson, R., Wang, Y., Stanley, P., Irvine, K.D., Haltiwanger, R.S., et al. 2000. Fringe is a glycosyltransferase that modifies Notch. Nature 406: 369–375.
Morera, S., Lariviere, L., Kurzeck, J., Aschke-Sonnenborn, U., Freemont, P.S., Janin, J., and Ruger, W. 2001. High resolution crystal structures of T4 phage β-glucosyltransferase: Induced fit and effect of substrate and metal binding. J. Mol. Biol. 311: 569–577.
Mosimann, S.C., Gilbert, M., Dombroswki, D., To, R., Wakarchuk, W., and Strynadka, N.C. 2001. Structure of a sialic acid-activating synthetase, CMP-acylneuraminate synthetase in the presence and absence of CDP. J. Biol. Chem. 276: 8190–8196.
Munro, S. and Freeman, M. 2000. The notch signalling regulator fringe acts in the Golgi apparatus and requires the glycosyltransferase signature motif DXD. Curr. Biol. 10: 813–820.
Mushegian, A.R. and Koonin, E.V. 1994. Unexpected sequence similarity between nucleosidases and phosphoribosyltransferases of different specificity. Protein Sci. 3: 1081–1088.
Nagano, N., Orengo, C.A., and Thornton, J.M. 2002. One fold with many functions: The evolutionary relationships between TIM barrel families based on their sequences, structures and functions. J. Mol. Biol. 321: 741–765.
Notredame, C., Higgins, D.G., and Heringa, J. 2000. T-Coffee: A novel method for multiple sequence alignments. J. Mol. Biol. 302: 205–217.
Olsen, L.R. and Roderick, S.L. 2001. Structure of the Escherichia coli GlmU pyrophosphorylase and acetyltransferase active sites. Biochemistry 40: 1913–1921.
Orlean, P. 1990. Dolichol phosphate mannose synthase is required in vivo for glycosyl phosphatidylinositol membrane anchoring, O mannosylation, and N glycosylation of protein in Saccharomyces cerevisiae. Mol. Cell. Biol. 10: 5796–5805.
Persson, K., Ly, H.D., Dieckelmann, M., Wakarchuk, W.W., Withers, S.G., and Strynadka, N.C. 2001. Crystal structure of the retaining galactosyltransferase LgtC from Neisseria meningitidis in complex with donor and acceptor sugar analogs. Nat. Struct. Biol. 8: 166–175.
Pietrokovski, S. 1996. Searching databases of conserved sequence regions by aligning protein multiple-alignments. Nucleic Acids Res. 24: 3836–3845.
Rost, B. 1996. PHD: Predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol. 266: 525–539.
Saraste, M., Sibbald, P.R., and Wittinghofer, A. 1990. The P-loop: A common motif in ATP- and GTP-binding proteins. Trends Biochem. Sci. 15: 430–434.
Saxena, I.M. and Brown Jr., R.M. 1997. Identification of cellulose synthase(s) in higher plants: Sequence analysis of processive β-glycosyltransferases with the common motif D, D, D35Q(R,Q)XRW. Cellulose 4: 33–49.
Saxena, I.M., Brown Jr., R.M., Fevre, M., Geremia, R.A., and Henrissat, B. 1995. Multidomain architecture of β-glycosyl transferases: Implications for mechanism of action. J. Bacteriol. 177: 1419–1424.
Schaffer, A.A., Aravind, L., Madden, T.L., Shavirin, S., Spouge, J.L., Wolf, Y.I., Koonin, E.V., and Altschul, S.F. 2001. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res. 29: 2994–3005.
Schuler, G.D., Altschul, S.F., and Lipman, D.J. 1991. A workbench for multiple alignment construction and analysis. Proteins 9: 180–190.
Servant, F., Bru, C., Carrere, S., Courcelle, E., Gouzy, J., Peyruc, D., and Kahn, D. 2002. ProDom: Automated clustering of homologous domains. Brief Bioinform. 3: 246–251.
Shibayama, K., Ohsuka, S., Tanaka, T., Arakawa, Y., and Ohta, M. 1998. Conserved structural regions involved in the catalytic mechanism of Escherichia coli K-12 WaaO (RfaI). J. Bacteriol. 180: 5313–5318.
Sinnott, M.L. 1991. Catalytic mechanisms of enzymic glycosyl transfer. Chem. Rev. 90: 1170–1202.
Smit, A. and Mushegian, A. 2000. Biosynthesis of isoprenoids via mevalonate in Archaea: The lost path