Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
Reprint requests to: Andrew F. Neuwald, Cold Spring Harbor Laboratory, 1 Bungtown Road, P.O. Box 100, Cold Spring Harbor, NY 11724, USA; e-mail: neuwald{at}cshl.edu; fax: (516) 367-8461.
(RECEIVED January 19, 2004; FINAL REVISION May 7, 2004; ACCEPTED May 13, 2004)
Abstract
(CK2α), for example, the binding of one of the buried waters appears to be prohibited by the side chain of a leucine that is highly conserved within CK2α; this leucine, along with the substitution of lysine for the CMGC-arginine, may contribute to the broad substrate specificity of CK2α by relaxing characteristically conserved, precise interactions near the active site. This leucine is replaced by a conserved isoleucine or valine in other CMGC kinases, illustrating the potential functional significance of subtle amino acid substitutions. Analysis of other CMGC kinases similarly suggests candidate family-specific residues for experimental follow-up.
Keywords: CHAIN analysis; proline-directed kinases; contrast hierarchical alignment; sky1p
Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.04637904.
Introduction
phosphate from ATP to a serine, threonine or tyrosine hydroxyl group on a protein substrate, often thereby influencing the conformational state of the protein and, as a result, downstream signaling events (for review, see Johnson and Lewis 2001; Huse and Kuriyan 2002; Lu et al. 2002). Protein kinases themselves are commonly regulated by phosphorylation: either directly via phosphorylation of the kinase domain or indirectly via prephosphorylation of the substrate. Activation of cyclin-dependent kinase 2 (CDK2) and mitogen-activated protein (MAP) kinases (MAPKs), for example, occurs via phosphorylation of residues within a flexible activation loop (Johnson et al. 1996). Activation of glycogen synthase kinases (GSKs), on the other hand, occurs via prephosphorylation of a target substrate site that then stabilizes the activation loop for phosphorylation of a second nearby residue in the substrate (ter Haar et al. 2001). Interestingly, the activation loop of GSK constitutively resembles the active conformation of phosphorylation-regulated loops. Other modes of regulation, such as one involving the C-terminal extension of calcium/calmodulin-dependent protein kinase I (Goldberg et al. 1996), also occur.
Protein kinases are often very specific to the substrates they phosphorylate. One way that they achieve this is through direct interactions with substrate residues flanking the phosphorylation (P) site (Pinna and Ruzzene 1996; Songyang et al. 1996). For example, CDKs and MAPKs show a preference for proline at the P + 1 position and thus are termed proline-directed kinases (Pinna and Ruzzene 1996). This preference may be linked to downstream signaling mechanisms mediated by Pin1 proline isomerization (Zhou et al. 1999; Lu et al. 2002).
Downstream signal transmission typically involves coproteins that form a complex with and participate in the function of a particular protein kinase (Pawson and Nash 2003). CDK2, for example, binds both to a regulatory cyclin subunit and to Cks1, which appears to modulate substrate recognition and/or phosphorylation (Bourne et al. 1996; for review, see Harper 2001). Likewise, GSK binds to Axin (Dajani et al. 2003), a scaffold protein associated with the β-catenin signaling pathway (Li et al. 2002), and to the FRATtide peptide (Bax et al. 2001), which blocks GSK-3 from interacting with Axin. Notably, both the Cks1 binding site in CDK2 (Bourne et al. 1996) and the Axin/FRATtide binding site in GSK correspond to an insert (see Results and Discussion) present within the C-terminal domain of CMGC kinases, which include CDKs, MAPKs, GSKs, and Cdc-like kinases (CLKs; Hanks and Hunter 1995; Manning et al. 2002).
Protein families within the CMGC group often contain residues that are highly conserved across organisms that diverged over a billion years ago, implying that these residues are associated with important structural features or mechanisms. Furthermore, such residues fall into functional categories inasmuch as certain residues are conserved in nearly all protein kinases, whereas others are largely conserved only within certain kinase groups (such as the CMGC group) or within specific families or subfamilies. Although some of these residues have known functions, the mere existence of so many conserved residues of unknown function implies that our understanding is skewed toward those kinase structural features most amenable to current experimental approaches. Can we learn anything about the possible roles and relative importance of these uncharacterized yet important residues based on their patterns of conservation and on their structural locations and mutual interactions?
Here we address this question for CMGC kinases by using an approach called contrast hierarchical alignment and interaction network (CHAIN) analysis (Neuwald et al. 2003) that categorizes and measures the selective constraints imposed on protein sequences and maps these constraints to structural features. Other computational approaches, such as hierarchical analysis of residue conservation (Livingstone and Barton 1993), principal component analysis (Casari et al. 1995), evolutionary trace (Lichtarge et al. 1996), positional entropy (Hannenhalli and Russell 2000), site-specific rate shifts (for review, see Gaucher et al. 2002), and specificity-determining residues (Mirny and Gelfand 2002; Li et al. 2003), likewise seek to obtain evolutionary insights into protein function from multiply aligned sequences. CHAIN analysis differs from these in that it uses (1) routines that can accurately align thousands of related sequences (using a combination of structure-based and Gibbs sampling-based procedures), (2) an automated, rigorous statistical procedure to optimally detect aligned sequence subgroups (each of which is characterized by residues that are strikingly conserved within that subgroup and strikingly nonconserved outside of it), and (3) routines to identify corresponding specific structural interaction networks (including classical and weak hydrogen bonds, CH-π interactions, van der Waals contacts, and aromatic-aromatic interactions). This identifies residues within each category that are subject to the strongest selective constraints and thus are most distinctive of that category. As applied here, this reveals canonical CMGC structural features associated with kinase activation, substrate recognition, and the CMGC-insert region. Both deviations from and additions to these canonical features appear to contribute to functional specialization within individual CMGC families and subfamilies.
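To make the third component more concrete, the following Python sketch builds a crude residue interaction network from atomic coordinates using only element types and distance cutoffs. The cutoffs (3.5 Å for N/O pairs, 4.0 Å for carbon pairs), the `Atom` record, and the toy coordinates are all assumptions for illustration; CHAIN's own routines also detect weak hydrogen bonds, CH-π, and aromatic-aromatic interactions with geometric criteria that are not reproduced here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from itertools import combinations

# Illustrative distance cutoffs in angstroms (assumptions, not CHAIN's criteria).
POLAR_MAX = 3.5    # N/O ... N/O pairs treated as candidate hydrogen bonds
CONTACT_MAX = 4.0  # C ... C pairs treated as van der Waals contacts


@dataclass(frozen=True)
class Atom:
    residue: str                     # hypothetical residue label, e.g. "R1"
    name: str                        # PDB-style atom name, e.g. "NH1"
    element: str                     # "N", "O", "C", ...
    xyz: tuple[float, float, float]  # Cartesian coordinates in angstroms


def classify(a: Atom, b: Atom) -> str | None:
    """Label an inter-residue atom pair using element types and distance alone."""
    if a.residue == b.residue:
        return None
    d = math.dist(a.xyz, b.xyz)
    if {a.element, b.element} <= {"N", "O"} and d <= POLAR_MAX:
        return "candidate H-bond"
    if a.element == b.element == "C" and d <= CONTACT_MAX:
        return "vdW contact"
    return None


def interaction_network(atoms: list[Atom]) -> list[tuple[str, str, float, str]]:
    """Return (residue, residue, distance, label) edges for every labelled pair."""
    edges = []
    for a, b in combinations(atoms, 2):
        label = classify(a, b)
        if label is not None:
            edges.append((a.residue, b.residue, round(math.dist(a.xyz, b.xyz), 2), label))
    return edges


# Toy coordinates, invented for illustration only.
atoms = [
    Atom("R1", "NH1", "N", (0.0, 0.0, 0.0)),
    Atom("T2", "O", "O", (2.9, 0.0, 0.0)),
    Atom("V3", "CG1", "C", (0.0, 5.0, 0.0)),
    Atom("Y4", "CE1", "C", (0.0, 8.8, 0.0)),
]
for edge in interaction_network(atoms):
    print(edge)
```

Pure distance cutoffs overcount hydrogen bonds because they ignore donor/acceptor roles and angles; a fuller treatment would add angular tests and hydrogen positions.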
Results and Discussion
Here we examine kinase-shared, CMGC-specific, and family- or subfamily-specific constraints. For kinase-shared constraints (i.e., those acting on all or nearly all protein kinases), sequences corresponding to all protein kinases constitute the foreground set, whereas the overall frequency of amino acids generally observed in all proteins serves as an implicit background at each position (Fig. 1A). For CMGC-specific constraints, which distinguish the CMGC kinases from other protein kinases, CMGC kinases constitute the foreground, whereas all available protein kinases constitute the background (Fig. 1B). For family- or subfamily-specific constraints, which distinguish specific kinases from other CMGC kinases, the sequences corresponding to a specific family or subfamily constitute the foreground, whereas CMGC kinases constitute the background (Fig. 2). Our analysis also allows for another category of "intermediate" constraints, which here corresponds to residue positions that are highly conserved either within CMGC kinases as a whole or within a particular CMGC family or subfamily, but that are inconsistently conserved across the categories specifically examined here. These category-specific constraints are interpreted below in light of available structural data (summarized in Table 1). Note also that sequence fragments and other sequences that, for any reason, failed to align over the entire region of interest were eliminated from the alignments.
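The foreground/background contrast described above can be illustrated with a deliberately simplified Python score: for each aligned column, take the most common foreground residue and compute the log2 ratio of its (pseudocount-smoothed) frequency in the foreground versus the background. This is a stand-in for intuition only, not the statistical procedure used by CHAIN (Neuwald et al. 2003); the function name, pseudocount, and toy alignment are assumptions.

```python
import math
from collections import Counter

GAP = "-"
N_AMINO_ACIDS = 20


def contrast_scores(foreground, background, pseudocount=1.0):
    """Score each alignment column by how much more frequent its most common
    foreground residue is in the foreground than in the background (log2 ratio).

    A simplified illustration of foreground/background contrast, not CHAIN's statistic.
    """
    ncols = len(foreground[0])
    if any(len(s) != ncols for s in foreground + background):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(ncols):
        fg = Counter(s[i] for s in foreground if s[i] != GAP)
        bg = Counter(s[i] for s in background if s[i] != GAP)
        if not fg:
            scores.append((None, 0.0))
            continue
        residue, count = fg.most_common(1)[0]
        fg_freq = (count + pseudocount) / (sum(fg.values()) + N_AMINO_ACIDS * pseudocount)
        bg_freq = (bg[residue] + pseudocount) / (sum(bg.values()) + N_AMINO_ACIDS * pseudocount)
        scores.append((residue, math.log2(fg_freq / bg_freq)))
    return scores


# Toy alignment: column 0 is conserved as R only in the foreground subgroup,
# whereas the residues in columns 1 and 2 are similarly common in both sets.
foreground = ["RAW", "RAW", "RSW"]
background = ["KAW", "EAW", "QSW", "RAF"]
for col, (res, score) in enumerate(contrast_scores(foreground, background)):
    print(col, res, round(score, 2))
```

Here column 0 scores highest because R is conserved in the foreground but rare in the background, which mirrors the idea of a residue being "strikingly conserved within a subgroup and nonconserved outside it." For kinase-shared constraints, where the text uses overall amino acid frequencies as an implicit background, `bg_freq` would instead come from a fixed composition table.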
We focus here on regions of the alignment within which most CMGC-conserved residues occur (Fig. 1). Eukaryotic protein kinases contain an N-terminal ATP-binding domain and a C-terminal substrate-binding domain (Knighton et al. 1991; Engh and Bossemeyer 2002); nearly all of these CMGC-specific regions occur within the C-terminal domain. For conceptual and representational clarity, we sometimes define these regions based on the clusters of conserved interactions observed in our analysis (Fig. 3A), rather than on conventional kinase terminology. Listed by their order in the sequence, these are as follows: (1) the αC region, which essentially corresponds to the protein kinase C-helix; (2) the catalytic loop, which contains an HRD motif; (3) the activation Nt-segment, which corresponds to the N-terminal part of the activation segment (Johnson et al. 1996) up to a conformationally strained residue (see below); (4) the APE region, which corresponds to conserved residues on either side of an APE motif and which thus includes the C-terminal part of the activation segment; (5) the αF-to-αG region, which stretches from the F-helix to the G-helix and which structurally surrounds the APE region; and (6) the CMGC insert. The CMGC insert, which is discussed at length below, is absent from other protein kinases and thus is a distinctive characteristic of the CMGC group.
αF-to-αG region. (The side chains of such residues are shaded pale magenta in Figs. 3–6.)
Certain interactions involving kinase-shared residues play a major role in coupling activation loop phosphorylation to the catalytic transfer mechanism (Johnson et al. 1996; Johnson and Lewis 2001). For example, an interaction between the side chain of the HRD-arginine and one of the activation loop phosphorylation sites (Fig. 3A) helps reposition the activation loop for substrate binding (Johnson et al. 1996). Likewise, a kinase-shared threonine or serine in the APE region (T188 in Fig. 3B) helps reposition the activation loop for catalysis by hydrogen bonding with both a lysine and an aspartate in the catalytic loop (K155 and D153 in Fig. 3B; Madhusudan et al. 1994). These and other kinase-shared residues that serve as a structural foundation for group-specific features are considered in the context of CMGC-specific constraints in the following section.
CMGC-specific constraints
A contrast hierarchical alignment corresponding to CMGC-specific constraints is shown in Figure 1B. Because MAPKs best conserve the CMGC canonical features, we use two MAPK subfamilies, ERK2 and p38, as prototypes of this group. Most CMGC-specific residues (the side chains of which are shaded light yellow in Figs. 3–6) are located from just before the APE region to just beyond the αF-to-αG region (Fig. 1). A few other CMGC-specific residues occur outside of the aligned regions shown in Figure 1, but only two of these are discussed here: a canonical arginine (R68ERK in Fig. 5A, below) within the C-helix and a canonical phenylalanine/tyrosine (F294ERK in Fig. 6A, below) ~20 residues from the end of the kinase C-terminal domain.
interaction with the main chain of the catalytic loop (Y191p38 in Fig. 3B
interactions with a kinase-shared tryptophan (W210p38 in Fig. 3B
F helix. Furthermore, a kinase-shared serine residue (S211p38 in Fig. 3B
In nearly all CMGC kinases of known structure (apart from casein kinase 2; see below), this serine also hydrogen bonds to or contacts two buried waters, one of which, in turn, often hydrogen bonds to the main chain adjacent to the CMGC-arginine (Fig. 4). The second buried water extends this hydrogen-bonding network to the main-chain nitrogen of a catalytic aspartate and to another kinase-shared aspartate with a side chain that, in turn, hydrogen bonds to the catalytic loop (D147ERK and D208ERK, respectively, in Fig. 4A). These buried waters are also found in non-CMGC kinases of known structure and thus may play important structural roles in all protein kinases. Together, the buried waters and kinase-shared residues thus appear to stabilize and precisely position the main chain of the CMGC-arginine relative to the αF helix and the catalytic loop (Fig. 4).
The side chain of the CMGC-arginine also electrostatically interacts with and hydrogen bonds to a second activation loop phosphate moiety present in several CMGC kinase structures (Canagarajah et al. 1997; Bax et al. 2001), implying that it also plays a key role in the activation mechanism. We will call this site the "second phosphorylation site" to distinguish it from the site sensed by the HRD-arginine, which we term the "first phosphorylation site." Indeed, MAP kinases, which contain this second phosphorylation site, are fully activated only upon phosphorylation of both sites (Figs. 3A, 5A; Canagarajah et al. 1997), and each site has a distinct role: phosphorylation of the first site promotes ATP binding, whereas phosphorylation of both sites is required for substrate binding (Prowse et al. 2001). The second site corresponds to a CMGC-specific tyrosine (Y185ERK in Fig. 5A) that occurs in three of the four major CMGC subgroups (i.e., MAPKs, GSKs, and CLKs; Manning et al. 2002).
Cdc7 kinases, which CHAIN analysis clearly places outside of the CMGC group, also contain an arginine at the CMGC-arginine position. Notably, Cdc7 kinases typically possess a very large activation loop, which suggests an atypical activation mechanism, and, as do most non-CMGC kinases, conserve a glycine at the position corresponding to the CMGC conformationally strained position.
Other canonical features of CMGC kinases
The canonical features of CMGC kinases are clustered into several regions of the multiple sequence alignment, but by far the most striking of these extends from the activation segment to the αF-to-αG region (Fig. 1). As described in the following sections, these residues appear to link together the substrate, the activation loop phosphorylation sites, and the CMGC insert.
CMGC canonical residues within the activation loop and APE region
There are two CMGC canonical residues within the activation loop: the second phosphorylation site tyrosine (Y185ERK) and a valine or isoleucine (V186ERK) that directly follows this tyrosine in the sequence. Upon phosphorylation of the first activation loop site in ERK2, this valine/isoleucine comes into contact with the aliphatic region of the HRD-arginine side chain (see R146ERK in Fig. 5A). At the same time, it also packs against another CMGC canonical tyrosine (Y203ERK) that is within the APE region and that hydrogen bonds to the main chain of the first phosphorylation site (T183ERK; Figs. 4, 5). Furthermore, this valine/isoleucine directly precedes the conformationally strained position (A187ERK) and is thus sandwiched between two residue positions (i.e., Y185ERK and A187ERK) linked to activation loop conformational changes. Taken together, these observations suggest that this valine/isoleucine plays a role in CMGC kinase activation.
Within the APE region there are two other CMGC canonical residues: an arginine (R189ERK) that likely interacts with the P-2 substrate position (and thus is termed the P-2i-arginine), and an aromatic residue, most often a tryptophan (W190ERK), that likely interacts with the P-3 substrate position (and thus is termed the P-3i-aromatic; Fig. 4). We infer these substrate interactions based on homology to three distinct peptide-bound structures (Protein Data Bank [PDB] codes 1IR3, 1JBP, and 1QMZ). The P-2i-arginine, which is sequence adjacent to the kinase-shared serine or threonine (T188ERK) that interacts with the catalytic aspartate (D147ERK), typically hydrogen bonds to a loop between the αF and αG helices that is involved in substrate binding (Brown et al. 1999). Similar to the CMGC-arginine, it also hydrogen bonds with the second activation loop phosphate moiety, when present (Canagarajah et al. 1997; Dajani et al. 2001). Indeed, our analysis indicates that conservation of the P-2i-arginine is most highly correlated with conservation of tyrosine at the second phosphorylation site and vice versa (data not shown), suggesting a functional coupling between this arginine and phosphorylation of this tyrosine. Incidentally, for CDKs, which lack a second activation loop phosphorylation site, the P-2-interacting residue is a conserved leucine instead of the canonical arginine; this leucine thus may play an important CDK-specific role.
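The text does not specify which correlation measure underlies the reported coupling between the P-2i-arginine and the second-site tyrosine. As one plausible choice, assumed here for illustration, the phi coefficient between two binary indicators ("R at column i", "Y at column j") can be computed across aligned sequences; the function name and the two-column toy alignment are hypothetical.

```python
import math


def phi_coefficient(seqs, col_a, res_a, col_b, res_b):
    """Phi correlation between 'res_a at col_a' and 'res_b at col_b'
    across a set of aligned sequences (0-based column indices)."""
    n11 = n10 = n01 = n00 = 0
    for s in seqs:
        a = s[col_a] == res_a
        b = s[col_b] == res_b
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


# Toy two-column alignment: column 0 = second phosphorylation site,
# column 1 = P-2i position. Invented data for illustration only.
seqs = ["YR", "YR", "YR", "YL", "EL", "TL", "FK"]
print(phi_coefficient(seqs, 0, "Y", 1, "R"))  # 0.75
```

A value near 1 means the two residues tend to be present or absent together; with real data one would also weight sequences to offset redundancy among close homologs.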
CMGC structural features within the αF-to-αG region
The αF-to-αG region, which consists of a loop flanked by helices (Fig. 3), contains several CMGC canonical residues that appear to link this loop to the αG helix and to the substrate-binding pocket. For example, the side chain of a canonical glutamine (Q234ERK; termed the CMGC-glutamine) within the αG helix typically hydrogen bonds to main-chain atoms located on either side of a particular loop residue that is most often a proline (P227ERK). The Cα atom of this loop residue often forms a CH-π interaction with the P-3i-aromatic residue (W190ERK in Fig. 6A). Additional canonical interactions within this region involve both kinase-shared and CMGC-specific conserved residues (as shown in Fig. 6A–E), as well as residues that are nonconserved or inconsistently conserved at the sequence level. For example, a hydrophilic residue (H230ERK) two positions beyond a CMGC-conserved glycine (G228ERK) typically serves as an N-cap for the αG helix (Fig. 6). Similarly, the main-chain nitrogen of the residue preceding this hydrophilic position forms a hydrogen bond with the side chain of a residue that is usually an aspartate (D233ERK) and that immediately precedes the CMGC-glutamine. Together, these interactions form a structural link between the substrate-binding region and the CMGC insert via the αG helix (Fig. 6).
An LG[ST]P motif associated with the CMGC-insert
The CMGC-specific consensus pattern LG[ST]P (corresponding to residues 242–245 of mouse ERK2 in Fig. 1) occurs just beyond the αG helix at the start of the CMGC insert. Curiously, the threonine or serine position of this pattern (S244ERK) has a high predicted probability of being phosphorylated (see Materials and Methods), although for some CMGC families, such as DYRK and related kinases, this serine or threonine is nonconserved. The glycine of this motif (G243ERK) is at the end of, and perhaps also terminates, the αG helix. Both the leucine and proline of this motif typically pack up against a CMGC canonical phenylalanine or tyrosine (F294ERK in Fig. 6A; not shown in the Fig. 1 alignment) near the C-terminal end of the kinase domain. Conservation of the LG.P residues implies that they perform an important structural role, possibly linked to the adjacent CMGC insert. This insert may be involved in coprotein binding, considering that adaptor or scaffold-like proteins bind to this region in CDK2 and GSK3β (Bourne et al. 1996; Bax et al. 2001; Dajani et al. 2003) and that point mutations in the CMGC insert directly or indirectly affect binding of MEK1 to ERK2 (Robinson et al. 2002).
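The consensus patterns used in this article (LG[ST]P here, and EG.P.T for CDKs) are written in a regex-compatible notation: brackets list alternative residues and "." stands for any residue. A minimal Python sketch, assuming the sequence is a plain one-letter string, locates such motifs with 1-based residue numbering and reports the S/T position; the helper name and both sequence fragments are made up for illustration and are not real kinase sequences.

```python
import re


def find_motifs(seq, pattern, first_residue=1):
    """Return (start, end, match) for each non-overlapping match of a
    regex-style consensus pattern, using 1-based inclusive residue numbers."""
    return [
        (m.start() + first_residue, m.end() - 1 + first_residue, m.group())
        for m in re.finditer(pattern, seq)
    ]


# Made-up sequence fragments, for illustration only.
insert_fragment = "MKQNHILGILGSPSAEDK"
cdk_fragment = "AKEGVPSTLR"

for start, end, text in find_motifs(insert_fragment, r"LG[ST]P"):
    # The third motif position is the serine/threonine discussed above.
    print("LG[ST]P", start, end, text, "S/T at", start + 2)

print(find_motifs(cdk_fragment, r"EG.P.T"))
```

Setting `first_residue` to a protein's actual starting residue number lets the reported positions follow that protein's numbering.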
Links between substrate recognition, activation, and the CMGC insert
The CMGC-arginine and the nearby P-2i-arginine are predicted to interact with bound substrate and with the second activation loop phosphorylated site. Similarly, other CMGC canonical residues directly interact either with one of the activation loop phosphorylation sites or with residues that do. Still other CMGC-specific residues are located between the CMGC insert and the substrate or the activation loop (Figs. 4–6). What function or mechanism is responsible for the selective constraints acting on these residues? One possibility is that they couple coprotein binding to substrate recognition and kinase activation. Consistent with this notion, hydrogen exchange experiments (Lee et al. 2004) show that substrate docking results in conformational mobility within the P + 1 binding pocket, the αF-to-αG region, and the CMGC insert region of ERK2. Similarly, phosphorylation of the ERK2 activation loop induces conformational changes within the CMGC insert (Canagarajah et al. 1997). CMGC-specific features likewise may play a role in the gated behavior observed for CDKs and MAPKs (Adams 2003), in which phosphorylation induces an activation loop switch from an unfavorable to a favorable substrate-binding conformation. Specific aspects of such a mechanism would, of course, depend on the particular CMGC subgroup or family, and evolutionary analysis of these subgroups may likewise provide useful clues.
Structural features specific to CMGC subgroups
Phylogenetic analysis (Manning et al. 2002) classifies CMGC kinases into four major families: CDKs, MAPKs, GSKs, and CLKs. CHAIN analysis of these reveals family-specific features, which we explore in the light of CMGC canonical features.
MAPK-specific constraints
The MAPKs (Johnson and Lapadat 2002), which include p38 and ERK2, best conserve CMGC canonical features and thus serve as a prototype. The p38 kinases regulate cytokine signaling, whereas ERK2 regulates mitosis, meiosis, and postmitotic functions in differentiated cells. MAPKs are strongly P + 1 proline-directed Ser/Thr kinases, which is consistent with a role for the CMGC-arginine in proline recognition (see above). High ERK2 activity requires phosphorylation of two activation loop residues (T183 and Y185 in Fig. 5A; Robbins et al. 1993).
A phenylalanine (F181ERK) within the activation loop and a tyrosine (Y231ERK) at the N-terminal end of the αG helix most distinguish ERK2 and closely related kinases from other CMGC kinases (Fig. 2A). In the inactive form, the phenylalanine packs up against the CMGC insert, whereas in the active form, it swings away from the CMGC insert and becomes substantially more solvent exposed (data not shown), which suggests a function related to the CMGC insert. A phosphorylated form of the tyrosine (Y231ERK in Fig. 5A) could interact with the CMGC-arginine and with the P-2i-arginine more or less as does the tyrosine at the second activation loop phosphorylation site, perhaps thereby altering substrate specificity or rate of catalysis.
CDK-specific constraints
Distinct cyclin-dependent kinases are sequentially activated by cyclins and thereby regulate the ordering of events associated with DNA replication and cell division (for review, see Endicott et al. 1999). A prominent feature distinguishing CDKs from other CMGC kinases is the consensus pattern EG.P.T (residues 42–47 of CDK2 in Fig. 2B). This pattern directly precedes and partially overlaps with the cyclin-interacting αC helix that corresponds to the consensus pattern PSTAIRE. The role of the PSTAIRE residues in mediating interactions with cyclin is clear from the structures of cyclin-bound CDKs (Russo et al. 1996; Brown et al. 1999). Nevertheless, CHAIN analysis assigns the strongest CDK-specific constraint to the glycine of the EG.P.T pattern (Fig. 2B) and, likewise, assigns the strongest cyclin A constraint (data not shown) to a lysine residue (K266) that hydrogen bonds to main-chain oxygens on either side of this glycine. Thus, the interaction between these two residues appears to be important.
Another feature distinguishing CDKs from other CMGC kinases is the consensus pattern WP within the CMGC insert (residues 227–228 of CDK2 in Fig. 2B). The tryptophan of this pattern packs up against the kinase C-terminal domain proper and thus may serve as an important link to the CMGC insert. Another distinguishing feature of many CDKs is replacement of the CMGC canonical P-2i-arginine with another conserved residue, typically a leucine (L166CDK2 in Figs. 4C, 5C). Notably, CDKs typically lack the second phosphorylation site and thus may not require the P-2i-arginine for phosphate binding. The CDK7 subfamily, however, retains the canonical P-2i-arginine at this position, even though it also lacks a second phosphorylation site. A similar arrangement occurs within SR protein kinases (SRPKs), which may use an alternative activation mechanism involving a surrogate phosphorylation site donated by the substrate (see below).
The CDK2 subfamily manifests specific features absent in other CDKs. One such feature is a conserved arginine (R122CDK2) that is highly buried upon binding to cyclin and that, in the cyclin-bound form, forms a salt bridge with a glutamate (E57CDK2) that is also highly conserved within the CDK2 family (alignments and structures not shown). Also, in place of the second phosphorylation site, CDK2 has a highly conserved glutamate (E162CDK2 in Fig. 5B) that potentially could replace the phosphate electrostatic interaction with the CMGC-arginine; however, such an interaction is observed in only one of the known active-form structures of CDK, namely, phospho-CDK2/cyclin A bound to a recruitment peptide (PDB 1H24; Lowe et al. 2002).
GSK-specific constraints
GSK-3 is a key regulator of glycogen metabolism, the Wnt signaling pathway, protein synthesis, and cell proliferation and differentiation (for review, see Grimes and Jope 2001; Harwood 2001; Doble and Woodgett 2003). Efficient phosphorylation of many of its substrates requires prior phosphorylation at the substrate P + 4 position by another kinase (Dajani et al. 2001; Frame et al. 2001). GSK-3 lacks the first activation loop phosphorylation site, which is located near where this preprimed substrate site is likely to occur. Moreover, in the active structure of GSK3β, both the HRD-arginine and a CMGC-specific arginine interact with a sulfate ion (Figs. 4C, 5C), which can mimic a phosphate and is predicted to occupy the same site as the P + 4 phosphate in the preprimed substrate (Dajani et al. 2001; ter Haar et al. 2001). This preprimed substrate thus may serve as a surrogate for the first activation loop phosphorylated site. Likewise, a GSK-specific lysine (K205GSK3β) occurs within the activation loop and, in the active form, hydrogen bonds to a sulfate ion. Because this position corresponds to a conserved arginine that interacts with the first phosphorylated site (R170ERK2 and R150CDK2 in Fig. 5), this lysine seems likely to interact with the substrate preprimed phosphate as well (Fig. 5C). It also hydrogen bonds to a GSK-specific conserved asparagine (N213 in Fig. 5C) that corresponds to the first phosphorylation site threonine in other CMGC kinases. GSK3β also contains a second activation loop (tyrosine) phosphorylation site (Bax et al. 2001).
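Correspondences such as "R170ERK2 and R150CDK2" rest on transferring residue numbers between proteins through an alignment. A minimal Python sketch of that bookkeeping, assuming a pairwise alignment given as two gapped strings of equal length; the function name, starting numbers, and toy alignment are hypothetical.

```python
def map_residue_numbers(aligned_a, aligned_b, start_a=1, start_b=1):
    """Map residue numbers of sequence A onto sequence B through a pairwise
    alignment ('-' marks a gap). A residues opposite a gap map to None."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    num_a, num_b = start_a, start_b
    for ca, cb in zip(aligned_a, aligned_b):
        if ca != "-":
            # Record B's current number only if B has a residue in this column.
            mapping[num_a] = num_b if cb != "-" else None
            num_a += 1
        if cb != "-":
            num_b += 1
    return mapping


# Toy alignment with invented numbering, for illustration only.
mapping = map_residue_numbers("HRDL-KP", "HRD-NKP", start_a=10, start_b=50)
print(mapping)
```

Residues of A that face a gap (here residue 13) have no counterpart in B, which is why an inserted segment, such as the SRPK αF-to-αG insertion, can only be placed relative to flanking positions of the reference sequence.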
Some of the strongest GSK-specific constraints are imposed on a tight cluster of conserved residues, namely, Q89, R92, F93, K94, and N95 (Fig. 2C). These residues seem likely to perform a regulatory role, as this pattern occurs just before a CMGC-specific arginine (R96GSK3β) that typically hydrogen bonds to the phosphate of the first phosphorylation site. Two of the strongest GSK-specific constraints (Fig. 2C) correspond to two cysteines: one (C218) at the strained position that interacts with the CMGC-arginine (Fig. 4C, inset) and another (C178) directly before the HRD motif. The side-chain sulfur of this second cysteine (data not shown) packs up against two hydrogen bonds that are conserved in essentially all protein kinases and that are formed between the HRD catalytic loop and a conserved aspartate in the αF helix (D239GSK3β in Fig. 4C). Thus, this cysteine may influence the chemical nature of these critical interactions.
CLK kinases
The CDK-like kinases are functionally diverse and conserve fewer of the CMGC canonical features. In addition, they generally lack the HRD-arginine and instead often harbor another highly conserved residue at that position. Instead of the hydrophobic residues usually found in other CMGC kinases at the conformationally strained position (see above), CDK-like kinases typically contain glutamine or serine, the polar side chains of which can form hydrogen bonds. Within this group we specifically examine SRPKs and DYRKs.
SR protein kinases
SRPKs phosphorylate SR dipeptide repeats in RNA processing factors (Colwill et al. 1996). Unlike MAP and CDK kinases, SRPKs do not have a strong requirement for proline at the P + 1 position but rather typically accommodate arginine there. They exhibit a level of constitutive activity but may require a preprimed substrate phosphate for optimal in vivo activity, considering that the structures of SRPKs contain a sulfate ion (which can mimic a phosphate; Nolen et al. 2001, 2003) near the predicted site of interaction with the substrate P + 2 position. This situation thus is analogous to that of GSK3β, whose substrates must be prephosphorylated for recognition (see above).
To explore the structural feasibility of an interaction between the P + 2 phosphorylated substrate serine and the CMGC- and P-2i-arginines, both of which are conserved in SRPKs, we constructed a homology model of SRPK with bound substrate (see Materials and Methods). This model indeed suggests that the previously phosphorylated P + 2 substrate position, which is likely to be in roughly the same structural location as a second activation loop phosphate, may function as a surrogate activation site phosphate (Fig. 5E). This P + 2 phosphate may function to ensure processive phosphorylation of SR proteins (Aubol et al. 2003) rather than, or in addition to, activating the kinase. Another distinctive feature of SRPKs with a possible role in recognition of the substrate P + 2 phosphate is the insertion of six additional residues within the αF-to-αG region (these correspond to positions 231–233 of ERK2 in the hierarchical alignment of Fig. 1). Some of these inserted residues are located very near the CMGC-arginine and the P-2i-arginine (disordered region in Fig. 6E), and all of them are near the proposed surrogate phosphorylation site.
SRPKs diverge substantially from the CMGC canonical features. In particular, the P-3i-aromatic residue is typically replaced by a conserved glutamine (position 569 in Fig. 2D). The yeast Sky1p kinase is unusual, however, inasmuch as it contains a glutamate instead of a glutamine at this position (E569Sky1p in Fig. 4D). This may be because the Saccharomyces cerevisiae genome lacks SR protein-encoding genes, implying that Sky1p is a paralog rather than an ortholog of SRPKs. In any case, this glutamine (termed here the SRPK-specific glutamine) appears well situated to interact with a substrate P-2 serine, as shown in the homology model for substrate-bound SRPKs (Fig. 5E).
Another divergent, highly conserved SRPK feature is the replacement of the CMGC-glutamine by a histidine (H618Sky1p in Fig. 2D), a substitution also observed for the SRPK-related LAMMER and CDC-like kinases, many of which are also known to phosphorylate serine/arginine-rich substrates (Nikolakaki et al. 2002). Unlike the CMGC-glutamine, the Sky1p histidine fails to interact with the main chain of the substrate interaction loop and instead interacts with a buried water (Nolen et al. 2001, 2003). This water, in turn, hydrogen bonds to the side chain of the kinase-shared tryptophan (W588Sky1p in Fig. 6F) within the αF helix and to the main chain of the APE region and thus, together with H618Sky1p, forms a network of precise interactions positioning key residues within the APE loop (Fig. 6F).
A feature of SRPKs possibly related to substrate specificity is a highly conserved glutamine (Q566 in Sky1p; Fig. 2D) at the conformationally strained position, whose main chain typically hydrogen bonds to the side chain of the CMGC-arginine. A comparison of the Sky1p structure with those of other CMGC kinases reveals that the side chain of this glutamine displaces the buried water that, in those kinases, hydrogen bonds to main-chain atoms on either side of the CMGC-arginine (Fig. 4, insets), resulting in a different geometric arrangement. More specifically, both the side-chain oxygen and nitrogen of this glutamine hydrogen bond to the main-chain atoms on either side of the CMGC-arginine. The side-chain nitrogen also hydrogen bonds to the main-chain oxygen directly preceding a kinase-shared aspartate (D586 in Sky1p; Fig. 4D) that, in turn, hydrogen bonds to the main chain of the catalytic loop. Together, these interactions replace the hydrogen bonds typical of CMGC kinases that contain a water at this position and, as a result, appear to reposition the catalytic loop and APE region relative to each other, presumably in a manner more favorable to the specific function of SRPKs.
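Once two structures are superposed, whether a side chain displaces a conserved water reduces to asking whether any of its atoms sits at the water's position. A minimal sketch, assuming the structures are already in a common frame and treating any atom within an assumed 1.5 Å of the water oxygen as occupying the site; the coordinates and the `occupying_atoms` helper are hypothetical.

```python
import math

# Assumed radius (angstroms): any atom this close to the water oxygen is
# treated as occupying the water's site.
SITE_RADIUS = 1.5


def occupying_atoms(water_xyz, atoms, radius=SITE_RADIUS):
    """Return (label, distance) for atoms within `radius` of a water site, nearest first.

    Assumes both structures have already been superposed into a common frame.
    """
    hits = []
    for label, xyz in atoms.items():
        d = math.dist(water_xyz, xyz)
        if d <= radius:
            hits.append((label, round(d, 2)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical coordinates: a conserved water site taken from one kinase and
# two side-chain atoms from a second, superposed kinase.
water_site = (10.0, 5.0, 3.0)
side_chain = {
    "Gln OE1": (10.3, 5.4, 3.0),  # about 0.5 A from the site
    "Gln CB": (12.5, 6.0, 3.0),   # about 2.7 A away
}

print(occupying_atoms(water_site, side_chain))
```

The same check applies to a hydrophobic atom, such as a methyl carbon, that fills a water cavity.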
Yet another feature possibly related to SR specificity is an SRPK-specific asparagine (N553 in Fig. 2D) directly following the protein kinase DFG motif (which, in fact, is most often DLG in SRPKs). This asparagine is predicted to pack against the substrate P + 1 position (Fig. 5E), because in the structure of CDK2 bound to substrate, the corresponding residue, a leucine, packs extensively against the proline at the substrate P + 1 position. Indeed, the area of contact of this leucine with the P + 1 proline is greater than that of any other residue in CDK2. The SRPK asparagine thus may perform an analogous role in recognizing an arginine at substrate position P + 1.
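Comparing contact areas across residues can be approximated crudely by counting interatomic contacts. This sketch ranks kinase residues by the number of atom pairs within an assumed 4.0 Å cutoff of the substrate P + 1 atoms. It is a proxy, not a buried-surface-area calculation, and the residue labels, coordinates and `rank_contacts` helper are hypothetical.

```python
import math
from collections import Counter

# Assumed contact cutoff in angstroms (a common, but arbitrary, choice).
CONTACT_CUTOFF = 4.0


def rank_contacts(kinase_atoms, ligand_atoms, cutoff=CONTACT_CUTOFF):
    """Rank kinase residues by the number of atom pairs in contact with a ligand.

    kinase_atoms: list of (residue_label, (x, y, z))
    ligand_atoms: list of (x, y, z), e.g. atoms of the substrate P + 1 residue
    Returns [(residue_label, n_contacts), ...], most contacts first.
    """
    counts = Counter()
    for residue, xyz in kinase_atoms:
        for lig in ligand_atoms:
            if math.dist(xyz, lig) <= cutoff:
                counts[residue] += 1
    return counts.most_common()


# Hypothetical coordinates for illustration only.
p1_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
kinase_atoms = [
    ("Leu", (3.0, 0.0, 0.0)),
    ("Leu", (0.0, 3.5, 0.0)),
    ("Val", (0.0, 0.0, 3.9)),
    ("Glu", (10.0, 0.0, 0.0)),  # too far: never counted
]

print(rank_contacts(kinase_atoms, p1_atoms))
```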
There is anecdotal evidence, however, that SRPKs favor both arginine and proline at the substrate P + 1 position (Colwill et al. 1996). For example, Npl3p, a budding yeast shuttling protein that is the natural substrate of Sky1p, is phosphorylated by mammalian SRPK1 on a serine followed by a proline (Gilbert et al. 2001). Likewise, Sky1p can phosphorylate serine residues within the RS domains of mammalian proteins that are substrates of mammalian SRPKs (Nolen et al. 2001). Furthermore, a pattern-based analysis of SR repeat regions within SR proteins (see Materials and Methods) reveals a highly elevated propensity for both arginine and proline at the ambiguous position (x) of the pattern S-R-x, as follows:
[Table: residue propensities at position x of the pattern S-R-x]
This suggests strong selective pressure for proline following RS patterns within RS domains. One possible explanation is that SRPKs also favor proline at the substrate P + 1 position, a hypothesis that would also help explain why SRPKs conserve the CMGC-arginine.
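The S-R-x analysis itself was done with ASSET. As a much simpler illustration of the underlying idea, the sketch below compares each residue's frequency at position x with its overall frequency in the same sequences and reports a log2 ratio, so positive values indicate enrichment. The input is a made-up toy sequence, not real SR-protein data, and `srx_propensity` is a hypothetical helper, not the ASSET statistic.

```python
import math
import re
from collections import Counter


def srx_propensity(sequences):
    """Log2 ratio of each residue's frequency at x in S-R-x to its overall frequency.

    A simplified frequency-ratio sketch; it does not reproduce ASSET.
    """
    at_x = Counter()
    background = Counter()
    for seq in sequences:
        background.update(seq)
        # Zero-width lookahead so overlapping matches (e.g. in "SRSRP") all count.
        for match in re.finditer(r"(?=SR(.))", seq):
            at_x[match.group(1)] += 1
    n_x = sum(at_x.values())
    n_bg = sum(background.values())
    return {
        aa: math.log2((at_x[aa] / n_x) / (background[aa] / n_bg))
        for aa in at_x
    }


# Toy input (hypothetical): proline follows S-R twice, serine once.
print(srx_propensity(["SRSRPSRP"]))
```

In the toy sequence proline comes out enriched after S-R and serine slightly depleted; a real analysis would also need a significance test, which this sketch omits.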
DYRK and DYRK-like kinases
Dual-specificity tyrosine-phosphorylated and regulated kinases (DYRKs) phosphorylate serine, threonine, and tyrosine residues and, though possessing significant constitutive activity, are fully activated only after autophosphorylation on a Y-x-Y motif corresponding to their two activation loop phosphorylation sites (Becker and Joost 1999). Once fully activated, they are specific for substrates with either proline or arginine at the P + 1 position (Himpel et al. 2000; Campbell and Proud 2002).
As in SR kinases, a distinguishing feature of DYRKs is a glutamine (Q323 in DYRK; Fig. 2E) at the conformationally strained position within the P + 1 binding pocket, so the function of this residue may be similar to that in SR kinases. Notably, mutation of this glutamine to asparagine within one DYRK had as great an effect on catalytic activity as mutation of the second phosphorylation-site tyrosine to phenylalanine (Wiechmann et al. 2003). Another distinguishing feature is the replacement of the HRD-arginine with a cysteine (C286 in DYRK). A homology model of the active form of DYRK, based both on CMGC canonical features and on known active-conformation structures, suggests that this cysteine might stabilize the activation loop by forming a disulphide bond with another DYRK-specific cysteine located nearby in the hypothetical structure (C312 in DYRK; Fig. 5F).
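A disulphide proposed from a homology model can be screened geometrically before any experiment. The sketch below assumes rule-of-thumb windows for the Cα-Cα (4.4 to 6.8 Å) and Cβ-Cβ (3.45 to 4.5 Å) distances between the two cysteines; these thresholds, the `could_form_disulfide` helper and the coordinates are assumptions for illustration. Passing the screen means only that the modeled geometry is compatible with a disulphide.

```python
import math

# Assumed rule-of-thumb distance windows (angstroms) for a disulphide-bonded pair.
CA_CA_RANGE = (4.4, 6.8)
CB_CB_RANGE = (3.45, 4.5)


def could_form_disulfide(ca1, cb1, ca2, cb2):
    """True if both the CA-CA and CB-CB distances of two cysteines fall in the windows."""
    d_ca = math.dist(ca1, ca2)
    d_cb = math.dist(cb1, cb2)
    return (CA_CA_RANGE[0] <= d_ca <= CA_CA_RANGE[1]
            and CB_CB_RANGE[0] <= d_cb <= CB_CB_RANGE[1])


# Hypothetical model coordinates for two cysteines.
ca1, cb1 = (0.0, 0.0, 0.0), (0.8, 1.2, 0.0)
ca2, cb2 = (5.5, 0.0, 0.0), (4.7, 1.2, 0.0)  # CA-CA 5.5 A, CB-CB 3.9 A

print(could_form_disulfide(ca1, cb1, ca2, cb2))
```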
Casein kinase 2
CK2 is the only family within the CMGC group that replaces the CMGC-arginine with a lysine. This substitution may allow phosphorylation of substrates with either proline or another residue at the P + 1 position: the flexible lysine side chain can hydrogen bond to the main-chain oxygen at the strained position, as the CMGC-arginine does, yet can also accommodate alternative hydrogen bonds. A glycine (G199) that directly follows this lysine, and that likewise is subject to strong CK2-specific constraints (Fig. 2F), may contribute to the conformational flexibility of the CK2 lysine by allowing a greater range of main-chain conformations.
CK2α lacks the same buried water that is absent in SRPKs, namely, the water that hydrogen bonds to the main chain near the CMGC-arginine position. (Of nine available CK2α structures, only that of a C-terminal deletion mutant, which is possibly functionally deficient [Ermakova et al. 2003], contains a water molecule at this position.) In SRPKs this water cavity is occupied by a side-chain atom of a glutamine located at the "strained position" (Fig. 4E), but in CK2α the cavity typically overlaps with a side-chain methyl group of a CK2α-specific leucine (L213 in CK2α; Fig. 4D, inset). Notably, although leucine is invariant or nearly invariant at this position in CK2α, it apparently never occurs at this position in other CMGC kinases, which instead typically have a CMGC-specific isoleucine or valine, neither of which prohibits the buried water. It thus appears that even the conservative substitution of this leucine by isoleucine or valine, or vice versa, is strongly selected against in these families, implying that very subtle amino acid differences may have profound effects on protein function. Unlike isoleucine or valine, which apparently form a C-H hydrogen bond to the oxygen of the water at this position, leucine is incapable of such a bond. This leucine thus may contribute to the broad substrate specificity of CK2α by relaxing CMGC-canonical interactions near the active site.
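The contrast drawn here (leucine invariant in CK2α, isoleucine or valine elsewhere) can be expressed as residue fractions at one aligned column in a foreground and a background set of sequences. This is a frequency-only sketch, not the CHAIN statistic used in this work; the example columns and the `fraction` and `contrast` helpers are invented for illustration.

```python
def fraction(column, residues):
    """Fraction of non-gap positions in an alignment column whose residue is in `residues`.

    column: one character per sequence ('-' marks a gap).
    residues: string of one-letter codes, e.g. "L" or "IV".
    """
    ungapped = [aa for aa in column if aa != "-"]
    return sum(aa in residues for aa in ungapped) / len(ungapped)


def contrast(foreground, background, residues):
    """(foreground fraction, background fraction) for the given residues."""
    return fraction(foreground, residues), fraction(background, residues)


# Hypothetical alignment columns: CK2 alpha sequences vs. other CMGC kinases.
ck2_column = "LLLLLLLLL-L"
other_cmgc_column = "IIVIVIIIVIIVV"

print(contrast(ck2_column, other_cmgc_column, "L"))
print(contrast(ck2_column, other_cmgc_column, "IV"))
```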
A well-conserved CK2-specific glutamate (E180 in Fig. 2F) seems likely to play a role in N-terminal-mediated regulation of the activation loop. In many other CMGC kinases, this residue corresponds to an activation loop arginine that participates in binding the phosphate of the first activation loop phosphorylation site (Fig. 5). In CK2, this glutamate instead hydrogen bonds to an invariant tyrosine (Y23) within the N-terminal region (data not shown) and directly precedes three CK2-specific aromatic residues that likewise interact with the N-terminal region (data not shown).
Conclusion
CMGC canonical residues appear to couple kinase activation and substrate recognition to substrate and coprotein binding, whereas both variation of and additions to these features within individual subcategories presumably contribute to CMGC functional specialization. Conserved residues generally shared by all protein kinases, together with buried waters located below the APE region, appear to play important roles in precisely positioning key CMGC-specific residues. Our analysis suggests hypothetical roles for these residues and provides guidance for mutational studies to explore these roles, including, for example, conversion of the CMGC-glutamine to glutamate to probe the role of its side-chain nitrogen, or mutation of the CK2 leucine to isoleucine. Similar conservative mutations aimed at broadening our understanding of CMGC kinase function can readily be proposed.
Materials and methods
Other structural and sequence analysis procedures
Protein hydrogen atoms were added to structural coordinates by using the Reduce program (Word et al. 1999); to add hydrogens to water, we used the method of Hooft et al. (1996). Homology models based on our analysis were manually constructed and optimized by using the RAMP suite of programs (Samudrala et al. 2000) and the O program (Jones 1978). In particular, homology models for DYRK were based on CDK2 (PDB 1qmz), ERK2 (PDB 2erk), and Sky1p (PDB 1how). Structural images were created by using Rasmol (Sayle and Milner-White 1995). Ramachandran plots were examined by using the program PROCHECK (Morris et al. 1992). Secondary structure assignments were made by using the DSSP program (Kabsch and Sander 1983). Phosphorylation site predictions in kinase sequences were performed by using the NetPhos program (Blom et al. 1999). Statistically significant amino acid patterns associated with SR proteins were examined by using the ASSET program (Neuwald and Green 1994). Structural alignments were performed by using the CE program (Shindyalov and Bourne 1998), as previously described (Neuwald 2003).
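Ramachandran plots are built from the backbone φ and ψ dihedral angles. As a sketch of that underlying computation only (not PROCHECK's code), the following computes a signed dihedral from four points and applies it to consecutive N, CA and C atoms. The dictionary input format is an assumption, and chain breaks are not detected.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / n for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """(phi, psi) per residue; None where the neighbouring residue is missing.

    residues: list of dicts with 'N', 'CA', 'C' coordinates, in chain order.
    """
    angles = []
    for i, r in enumerate(residues):
        phi = (dihedral(residues[i - 1]["C"], r["N"], r["CA"], r["C"])
               if i > 0 else None)
        psi = (dihedral(r["N"], r["CA"], r["C"], residues[i + 1]["N"])
               if i + 1 < len(residues) else None)
        angles.append((phi, psi))
    return angles


# Sanity check on an idealized geometry: a +90 degree torsion.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```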
Sequences displayed in alignments
National Center for Biotechnology Information (NCBI) sequence identifiers for the CMGC alignments in Figure 1 are as follows: ERK2-mouse (2ERK), 6754632; ERK2-green algae, 11275338; ERK1-slime mold, 1169550; MAPK-fission yeast, 19113755; P38 (1CM8A)-human, 8569500; CDK2-human (1FINA), 16936528; CDC21-rice, 231706; CDC2-like-slime mold, 461704; CDK2-green/blue mold, 2499588; GSK3β-human (1IO9A), 20455502; GSK3-bread mold, 32405824; Shaggy PK4-petunia, 1076649; GSK3-slime mold, 1730041; CK2-human (1JWHA), 20150571; CK2α2-maize, 11527006; CK2-aerobic yeast, 1694914; CK2α-Leishmania, 10046857; DYRK1b-mouse, 12054926; DYRK-slime mold, 28829499; SRPK2-mouse, 18043214; SRPK1-slime mold, 28829647; Sky1p-budding yeast (1HOWA), 6323872; and SRPKL-thale cress, 11259819.
NCBI sequence identifiers for the ERK2 alignment in Figure 2A are as follows: Erk2-rat, 3318705; Erk2-sea hare, 1110512; ErkA-fruit fly, 17977692; MAPK-sea urchin, 24286498; ERK1-roundworm, 32564571; MAPK1-bread mold, 32421451; MAPK-smut fungus, 6457281; ERK1-slime mold, 1362214; MAPK-green algae, 11275338; MAPK(Nrk1)-tobacco, 12718824; and EST-blood fluke, 28325407.
NCBI sequence identifiers for the CDK2 alignment in Figure 2B are as follows: Cdk2-human, 16936528; Cdc2-rice, 231706; Cdk2-sea urchin, 2956719; Cdk2-slime mold, 461704; Cdk2-green/blue mold, 2499588; Cdk2-sporozoan, 1420882; Cdk2-fruit fly, 115918; Cdk2-paramecium, 4959457; Cdk1-roundworm, 17554940; Cdk1-sponge, 21304629; Cdc2-Giardia, 29409213; and Cdc2-trypanosome, 1705673.
NCBI sequence identifiers for the GSK alignment in Figure 2C are as follows: GSK3-human, 24987247; GSK3-sea urchin, 2959981; shaggy-fruit fly, 103318; GSK3-roundworm, 17509723; GSK3-hydra, 10178642; Shaggy4-petunia, 1076649; GSK3-bread mold, 32405824; Gsk3-slime mold, 1730041; GSK3-Plasmodium, 23957759; shaggy4-algae, 13811965; shaggy-Pyrocystis, 27450763; shaggy6-Giardia, 29249328; EST-green algae, 15697726; and EST-tapeworm, 22789432.
NCBI sequence identifiers for the SRPK alignment in Figure 2D are as follows: Sky1p-budding yeast, 6323872; SRPK2-mouse, 18043214; SRPK1-fruit fly, 10242347; SRPK-like-thale cress, 11259819; SRPK1-roundworm, 1353067; EST-red alga, AV432962_EST; SRPK1-like-slime mold, 28829647; EST-diatom, CD381433_EST; SRPK-trypanosome, 27447393; and SRPK-Plasmodium, 14578289.
NCBI sequence identifiers for the DYRK alignment in Figure 2E are as follows: Dyrk1b-human, 18765754; Dyrk1b-mosquito, 31226065; DYRK-slime mold, 28829499; DYRK2-human, 4503427; DYRK-roundworm, 7507072; DYRK-trypanosome, 19263269; DYRK-bread mold, 32416200; DYRK-like-fission yeast, 2130387; MBK-fruit fly, 24642876; MBK2-roundworm, 7503839; YAK1-amoeba, 17980211; YakA-slime mold, 7489897; and Yak1-thale cress, 15239248.
NCBI sequence identifiers for the CK2 alignment in Figure 2F are as follows: CK2-human, 20150571; CK2α-owlet moth, 13628721; CK2α-sea urchin, 7209841; CK2α-rice, 22831318; CK2α-roundworm, 17505290; CK2α-bread mold, 30580436; CK2α-slime mold, 28830167; CK2α-Theileria, 125272; CK2α-trypanosome, 14532298; CK2α-Paramecium, 13940371; CK2α-blood fluke, 28354096; CK2α-Giardia parasite, 29245163; CK2α-microsporidia, 19173691; and EST-red algae, 8588611.
Crystal structures used in our analysis
The crystal structure coordinate files used for the figures were obtained from PDB (Berman et al. 2000) and have the following identifiers: 1FIN (Jeffrey et al. 1995), 1JWH (Niefind et al. 2001), 1CM8 (Wang et al. 1997), 2ERK (Canagarajah et al. 1997), 1QMZ (Brown et al. 1999), 1GNG (Bax et al. 2001), 1DS5 (Battistutta et al. 2000), 1HOW (Nolen et al. 2001), and 1LP4