Protein Science
Protein Science (2005), 14:1091-1103. Published by Cold Spring Harbor Laboratory Press. Copyright © 2005 The Protein Society

Evolution of distinct EGF domains with specific functions

Merridee A. Wouters1,5, Isidore Rigoutsos3,4, Carmen K. Chu1, Lina L. Feng1, Duncan B. Sparrow2 and Sally L. Dunwoodie2,5,6

1 Computational Biology and Bioinformatics Program, and 2 Developmental Biology Program, Victor Chang Cardiac Research Institute, Sydney, NSW 2010, Australia; 3 Bioinformatics and Pattern Discovery Group, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA; 4 Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; 5 Schools of Biotechnology & Biomolecular Sciences, and Medical Sciences, and 6 St. Vincent’s Clinical School, University of New South Wales, NSW 2052, Australia

Reprint requests to: Merridee A. Wouters, Victor Chang Cardiac Research Institute, 384 Victoria St., Darlinghurst 2010, Sydney, NSW, Australia; e-mail: m.wouters{at}victorchang.unsw.edu.au; fax: +61-2-9295-8501.

(RECEIVED November 9, 2004; FINAL REVISION December 21, 2004; ACCEPTED December 22, 2004)


    Abstract
EGF domains are extracellular protein modules cross-linked by three intradomain disulfides. Past studies suggest the existence of two types of EGF domain with three-disulfides, human EGF-like (hEGF) domains and complement C1r-like (cEGF) domains, but to date no functional information has been related to the two different types, and they are not differentiated in sequence or structure databases. We have developed new sequence patterns based on the different C-termini to search specifically for the two types of EGF domains in sequence databases. The exhibited sensitivity and specificity of the new pattern-based method represents a significant advancement over the currently available sequence detection techniques. We re-annotated EGF sequences in the latest release of Swiss-Prot looking for functional relationships that might correlate with EGF type. We show that important post-translational modifications of three-disulfide EGFs, including unusual forms of glycosylation and post-translational proteolytic processing, are dependent on EGF subtype. For example, EGF domains that are shed from the cell surface and mediate intercellular signaling are all hEGFs, as are all human EGF receptor family ligands. Additional experimental data suggest that functional specialization has accompanied subtype divergence. Based on our structural analysis of EGF domains with three-disulfide bonds and comparison to laminin and integrin-like EGF domains with an additional inter-domain disulfide, we propose that these hEGF and cEGF domains may have arisen from a four-disulfide ancestor by selective loss of different cysteine residues.

Keywords: EGF; bidirectional signaling; fucosylation; RIP; shedding; HER tyrosine receptor kinase ligand; hydroxylation

Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.041207005.


    Introduction
EGF domains are modular protein subunits found singly or in tandem, mostly in the extracellular milieu, where they are involved in a diverse array of functions. Intercellular signaling mediated by EGF-containing ligands and their cognate receptors is an important regulator of growth and development. Many EGF-containing molecules, including human epidermal growth factor receptor family (HER tyrosine kinase receptor) ligands such as EGF and TGF-α, are shed from the surface by extracellular proteases, resulting in longer range extracellular signaling (Harris et al. 2003). Some transmembrane proteins, such as Spitz, undergo cleavage by intramembrane proteases, resulting in intracellular signaling cascades including release of transcription factors (Urban and Freeman 2002). In the case of the Notch receptor, ligand-stimulated intracellular cleavage results in release of a transcription factor (Lai 2004). The ligand can also be cleaved, releasing a domain that localizes to the nucleus and inhibits Notch signaling (Kiyota and Kinoshita 2004). EGF domains are also found in components of the blood coagulation system, including factors VII, IX, X, protein C, and thrombomodulin, where they may mediate interactions between the various components and thus have an adhesive function (Stenflo 1991). Because of their prevalence in long arrays, often combined with other domains in a mosaic fashion, it has been suggested that EGF domains also play an important structural role as a spacer at the cellular level (Campbell and Bork 1993).

Structurally, the EGF domain is typically described as a small domain of 30–40 amino acids primarily stabilized by three disulfides with disulfide connectivity ababcc (1–3, 2–4, 5–6) (Fig. 1A). The domain consists of two β-sheets, usually referred to as the major (N-terminal) and minor (C-terminal) sheets. The half-cystines of the abc motif are arranged in a triangle on the major sheet (Fig. 1B).
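The 1–3, 2–4, 5–6 connectivity can be made concrete with a short sketch. The Python below is illustrative only: the function name `disulfide_pairs` is hypothetical, and the example input is the 53-residue mature human EGF sequence as it is commonly quoted, which should be checked against a curated database entry before being relied on. The sketch numbers the six cysteines in sequence order and pairs them as disulfides a, b, and c.

```python
# Commonly quoted mature human EGF sequence (assumption: verify before reuse).
HUMAN_EGF = "NSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR"


def disulfide_pairs(domain_seq):
    """Pair the six cysteines of a three-disulfide EGF domain as ababcc.

    Returns 1-based residue positions for disulfides a (Cys 1-3),
    b (Cys 2-4) and c (Cys 5-6).
    """
    cys = [i + 1 for i, aa in enumerate(domain_seq) if aa == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    c1, c2, c3, c4, c5, c6 = cys
    return {"a": (c1, c3), "b": (c2, c4), "c": (c5, c6)}


print(disulfide_pairs(HUMAN_EGF))
# {'a': (6, 20), 'b': (14, 31), 'c': (33, 42)}
```

The sketch assumes the input is a single domain with exactly six cysteines; it infers connectivity from the canonical pattern and does not verify it against a structure.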



Figure 1. Disulfide signature of the EGF motif. (A) Arrangement of half-cystines in the primary structure and their connectivity. The colored half-cystines correspond to those shown in the secondary structure in B and C. (B) Arrangement of the triad of half-cystine residues on the major sheet. The residue at X is a discriminator of subtypes 1 and 2 (Fig. 2A). In subtype 1, X is typically Gly. In subtype 2, it is a large residue. f indicates the attachment site of O-fucose to serine and threonine residues. (C) The two alternate positions of half-cystine cC within the minor sheet which discriminate between the two major types of three-disulfide EGFs. In the hEGF subunits, half-cystine cC is located two residues N-terminal to the first residue participating in regular backbone hydrogen bonds on the second strand. The cC + 1 residue (X), which is often Glu (Fig. 3A), may form part of a β-bulge, and its cross-strand partner is typically Gly. In the cEGF subunits, half-cystine cC is found on the sheet in the alternate register, where it is not hydrogen bonded to its cross-strand partner. Its partner is a large residue, often Leu (L), which forms part of an S3 class bulge (Chan et al. 1993). The figures are based on HERA diagrams (Methods) of P-selectin (1fsb) for B and for the hEGF in C; factor VII (1dan) for the cEGF in C; and laminin (1klo, domain 2) for the four-disulfide (4 S-S) EGF in C. N = N-terminus; C = C-terminus; hydrogen bonds are indicated by arrows with the arrowhead representing the amide. Large gray arrows indicate the direction of the protein chain. (D) Superposition of the hEGF and cEGF subunits based on the triad of half-cystine residues (Cαs shown as balls) of the major sheet. The superposition shows a clear differentiation between the hEGF (green) and cEGF (magenta) groups, particularly in the minor sheet (lower half).
hEGF structures shown (green): 1ijq, 1dan EGF domain 2, 1fjs, 1aut EGF domain 2, 1dx5 EGF domain 3, 1hj7 EGF domain 1; cEGF structures shown (magenta): 1tpg, 1eqg, 1edm, 1dan EGF domain 1, 1g1t, 1fsb, 1jl9, 1xdt.

 
Two different types of three-disulfide EGF domains can be differentiated on the basis of the location of the C-terminal half of disulfide c (half-cystine cC) in the structure (Bersch et al. 1998). In human EGF-like (hEGF) domains, half-cystine cC is located in the turn of the β-hairpin that comprises the minor sheet (Fig. 1C). In C1r-like (cEGF) domains, half-cystine cC is located on the minor sheet itself. In addition to the different position of disulfide c within the secondary structure, the two types of subunits differ in other respects such as the shape and orientation of the minor sheet (Bersch et al. 1998).

In addition to these three-disulfide EGFs, crystal structures of two four-disulfide EGF domains, laminin and integrin, have been solved (Stetefeld et al. 1996; Xiong et al. 2001). Four-disulfide EGF domains have an additional interdomain disulfide (disulfide d) as well as the three intradomain disulfides. In both laminin and integrin, the N-terminal half of disulfide d (dN) is located two or three residues C-terminal to half-cystine cC. Laminin and integrin differ in the location of the C-terminal half-cystine of disulfide d (dC) within the adjacent C-terminal domain. Tandem arrays of laminin EGF modules form stiff rods with each subunit adding 30 Å to the rod (Yurchenco and Cheng 1993; Yurchenco 1994).

Here we describe a structural analysis of EGF domains which compares the two major types of three-disulfide EGF domains to more recently acquired structures of four-disulfide EGF domains. We have derived sequence descriptors for the two major types of three-disulfide EGFs which allow automated detection in sequence databases. By re-annotating Swiss-Prot and correlating the results with experimental data, we present evidence that suggests that the divergence of EGF subtypes has been accompanied by functional specialization.


    Results
Structural analysis of EGF domains identifies distinct subtypes
Structural comparison of three-disulfide EGFs clearly suggests the existence of two discrete types of EGF domains as previously noted by Bersch et al. (1998) (Fig. 1D). In addition to the clear differences in tertiary structure, the two major types of three-disulfide EGFs can be discriminated by the two alternate positions of half-cystine cC within the secondary structure of the minor sheet (Figs. 1C, 2A; Table 1). In hEGF subunits, half-cystine cC is located in the β-turn of the minor sheet, whereas in cEGFs half-cystine cC is located on the second strand of the sheet itself. Laminin and integrin EGFs resemble hEGFs both in their tertiary structure and the location of half-cystine cC in the β-turn of the minor sheet (Figs. 1C, 2). The two possible positions of half-cystine cC occupy alternate registers with respect to hydrogen bonding in the sheet. The half-cystine cC of cEGFs is one residue N-terminal to the first residue participating in regular backbone hydrogen bonds on the second strand (the non-H-bonded or "wide" site) (Wouters and Curmi 1995; Fig. 3A). Half-cystine cC of hEGFs is two residues N-terminal to the equivalent residue. In the four-disulfide EGFs from laminin another half-cystine, dN, is mostly found three residues C-terminal to half-cystine cC in the non-H-bonded register of the sheet (Figs. 2B, 3A). This may have interesting implications for the evolution of EGF domains as discussed below.



Figure 2. (A) Superposition of the hEGF/cEGF groups using the hydrogen bonds of the minor sheet. Half-cystine cC of the hEGF group (green) is one residue out of phase in the sequence with half-cystine cC of the cEGF group (magenta). The Cα of half-cystine cC in each protein is depicted as a ball of the same color as the backbone. The two groups of half-cystines occupy different registers with respect to the hydrogen bonds in the sheet. Four-disulfide EGF domains (cyan) adopt a hEGF-like conformation. (B) Same superposition showing the longer loop structure of the cEGFs (magenta) compared to the hEGFs (green) and four-disulfide EGF domains (cyan). The position of the Cα of half-cystine cC (depicted as a yellow ball) is constrained by the disulfide linkage to the major sheet. The nearby half-cystine dN (orange) of the four-disulfide EGFs integrin and laminin (cyan) occupies the same β-sheet register as the cEGF group. Could cEGFs have been generated from a four-disulfide ancestor by loss of half-cystine cC (yellow) leading to capture of nearby half-cystine dN (orange) and generation of the longer cEGF loop length? Imagine grasping the C-terminus of the green and cyan peptide backbones and moving the orange balls up to the position of the yellow balls. hEGF structures shown (green): 1ijq, 1dan EGF domain 2, 1fjs, 1aut EGF domain 2, 1dx5 EGF domain 3, 1hj7 EGF domain 1; cEGF structures shown (magenta): 1tpg, 1eqg, 1edm, 1dan EGF domain 1, 1g1t, 1fsb, 1jl9, 1xdt. Four-disulfide EGF domains shown (cyan): 1klo EGF domains 1, 2, and 3.

 

Table 1. Summary of properties of three-disulphide EGF types
 


Figure 3. (A) Structure-based alignment of EGF domains showing the three different three-disulfide EGF subtypes and the two different four-disulfide domains, laminin and integrin. The single structural example of a plant EGF, garlic alliinase, is shown below these. Loops that discriminate between the various subtypes are indicated above. The cN–cC loop discriminates between the hEGF and cEGF groups. In the hEGF group, there are typically eight residues between the two half-cystines of disulfide c, whereas the cEGF group typically has 10–13. The conserved H-bonded pair of residues in the minor sheet is indicated below the alignment with arrows. The bN–aC loop, which may be fucosylated in subtype 1, discriminates between the 1/2 subtypes. The loop is three residues long in subtype 1 and longer in subtype 2. Fucosylation and hydroxylation sites are indicated with f and h, respectively. Residue discriminators are indicated with an asterisk. For example, the aC – 2 residue is a good discriminator for the 1/2 subtypes, and the cN + 6 residue is Gly in the hEGF subunits. Residues which are conserved between hEGFs and cEGFs, indicated with an X, include the cN + 3 residue which is generally Gly, and the cN + 4 residue, which is often F/Y. Abbreviations: FVII, FIX, FX, factors VII, IX, X; pghs, prostaglandin H2 synthase; MSP, merozoite surface protein; thrombo, thrombomodulin. (B) Equivalent half-cystines of different types of EGF domains as proposed by the disulfide-capture model. In both laminin and integrin-like EGFs, half-cystine dN is found three residues C-terminal to half-cystine cC and half-cystine dC is located in a tandem C-terminal EGF domain. Laminin and integrin differ in the location of the half-cystine dC within the adjacent domain. In laminin EGFs, dC is located prior to half-cystine aN. In integrin EGFs, dC is found between half-cystine bN and half-cystine aC. (C) Hypothetical model of EGF module evolution based on disulfide connectivity. (D) Alignment of calcium-binding linker regions between tandem EGF domains. Consideration of the nature of the N-terminal EGF domain (hEGF/cEGF) allows gapless alignment of linker regions. (E) Distribution of disulfide c loop lengths in EGF modules detected in Swiss-Prot 43.

 
Our structural analysis further suggests that the cEGF type can be differentiated into two subtypes (1 and 2) based on structural features between the N-terminal half-cystine of disulfide b (bN) and the C-terminal half-cystine of disulfide a (aC). In cEGF-2 domains, there are usually three residues between half-cystines bN and aC (Fig. 3A), and the residue two positions before aC usually has a large side chain (X in Fig. 1B). In cEGF-1 domains, there are four or more residues between half-cystines bN and aC, and the residue two positions before aC is Gly or another small residue. Based on these criteria all hEGFs examined were exclusively of subtype 1. Further evolutionary diversification of these identified subtypes was also evident. For example, some of the cEGF-2 domains involved in blood clotting have a classic bulge (Chan et al. 1993) on the N-terminal strand of the major sheet (C1r, protein C, factor X).

An all-against-all structural comparison showed that ~60% of EGF domains of known structure cluster into these three groups (hEGF, cEGF-1, and cEGF-2) when superimposed semi-automatically. EGFs that fall outside these clusters generally have atypical loop lengths that unduly influence the superposition due to the lack of a true hydrophobic core in these tiny domains. An alignment of these EGFs is shown in Figure 2A. For the hEGF-1 cluster, the Cα atoms of 23 residues in eight structures can be aligned with a root-mean-square deviation of 1.6 Å (PDB codes 1tpg, 1eqg, 1edm, 1dan1, 1g1t, 1fsb, 1jl9, 1xdt).

Evolutionary and functional implications of subtypes
In sequence-based alignments of EGF domains all six of the highly conserved cysteines are aligned, implying homology between these cysteines (e.g., Bersch et al. 1998). On the basis of these alignments one might assume that the two groups of three-disulfide EGF subunits have diverged from insertion or deletion of residues in the β-turn of the minor sheet. If hydrogen-bonding residues in the β-sheet are maintained and the cysteines are homologous, this must have occurred as two separate events: insertion/deletion of residues N-terminal to half-cystine cC in the β-turn and insertion/deletion of a residue between half-cystine cC and the hydrogen-bonding residue of the second strand of the sheet. Given the presence of half-cystine cC in alternate registers in the two types of three-disulfide EGFs, it is also possible that these cysteines may not be homologous and the evolutionary situation is reflected more correctly by a structure-based sequence alignment based on hydrogen-bonding in the minor sheet (Fig. 3A). Furthermore, the presence of half-cystine cC in alternate registers in the two types of three-disulfide EGFs and the location of the nearby half-cystine dN in four-disulfide EGFs suggests an intriguing evolutionary scenario: that generation of the cEGF group from a four-disulfide ancestor with an hEGF-like minor sheet conformation involved a single event where half-cystine cC was lost and the nearby half-cystine dN was captured as the half-cystine cN pair partner. Such an event would simultaneously generate the longer loop length of the cEGF group and, if the β-sheet register was maintained, put the new half-cystine cC (cC') in the alternate register (Fig. 2B). This disulfide-capture model requires that a four-disulfide EGF module is an ancestral form of the EGF domain. The hEGF and cEGF types are derived from the ancestor by selective loss of two cysteines of the ancestral sequence. The hEGF type retains the ancestral connectivity of disulfide c and the conformation of the minor sheet, with disulfide d being lost. In the cEGF type, half-cystine dN subsumes the role of half-cystine cC, with the other halves of disulfide c and d being lost (Fig. 3B). As a result, the hairpin loop is lengthened and the novel half-cystine cC, while retaining its original registration in the β-sheet, appears to adopt a new registration. In addition to the structural data, the disulfide-capture model is further supported by the bimodal nature of the distribution of disulfide c loop lengths found in EGF modules (Fig. 3E).

A proposed evolutionary model originating with a four-disulfide model is presented in Figure 3C. As hEGFs seem to be exclusively of subtype 1, it is likely the divergence of the cEGF-1 and 2 subtypes occurred after the hEGF split (Fig. 3C). This hypothesis could be tested by examining the appearance of EGF subtypes in complete genomes. However, all extant three and four-disulfide subtypes are represented in Caenorhabditis elegans, the earliest diverging metazoan with a completely sequenced genome. No EGF domains have been annotated in plant genomes such as Arabidopsis, although there are reports in the literature of plant EGFs. Integrin β-chains containing integrin EGFs have been reported in Arabidopsis and sponges, an earlier diverging metazoan than C. elegans. It would appear that EGFs are very ancient, and that genomes of earlier diverging multicellular organisms are required to test the hypothesis and date the appearance of the various EGF modules.

The identified subtypes correlate with functional data. The bN–aC loop identified in the structural analysis as an important discriminator of the 1 and 2 subtypes is fucosylated in some EGF domains. Indeed, EGF domains undergo several unusual forms of post-translational modification including hydroxylation of aspartate and asparagine residues and two rare forms of O-glycosylation.

O-fucose modifications of EGF domains have been demonstrated to modulate Notch, TGFβ family (nodal) and urinary-type plasminogen activator (uPA) signal transduction (Haltiwanger 2002). We surveyed all of the O-glycosylation modifications of the bN–aC loop reported in the literature and found they have been reported in hEGF domains only. In addition to the requirement for a serine or threonine at the aC – 1 position, investigation of sequence determinants of modification suggests fucosylation is dependent on the presence of five residues in the bN–aC loop (Fig. 2A). Several fucosylated sequences were found to conform to a CXXGG(T/S)C motif (Harris and Spellman 1993). However, additional studies suggest that alanine is also permissible at position 5 and several other variations (D, Q) are allowable at position 4 (Panin et al. 2002).
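As a rough illustration, the CXXGG(T/S)C motif and its reported variations can be written as regular expressions. The sketch below is an assumption-laden reading of the text rather than a validated predictor: it assumes positions are counted from the first cysteine of the motif as position 1, so that position 4 may be G, D, or Q and position 5 may be G or A. The names `STRICT`, `RELAXED`, and `candidate_fucosylation_sites` are hypothetical, and the test strings are synthetic.

```python
import re

# Strict motif as printed in the text: C-X-X-G-G-(T/S)-C.
STRICT = re.compile(r"C..GG[TS]C")

# Relaxed reading (assumption): counting the first Cys as position 1,
# position 4 may be G, D or Q and position 5 may be G or A.
RELAXED = re.compile(r"C..[GDQ][GA][TS]C")


def candidate_fucosylation_sites(seq, pattern=RELAXED):
    """Return 1-based positions of the Ser/Thr (the aC - 1 residue) per match."""
    # m.end() is the 0-based index just past the final Cys, so the Ser/Thr
    # immediately before that Cys sits at 1-based position m.end() - 1.
    return [m.end() - 1 for m in pattern.finditer(seq)]


print(candidate_fucosylation_sites("ACAAGGTCA", STRICT))   # synthetic input
print(candidate_fucosylation_sites("CAADASC", RELAXED))    # synthetic input
```

A sequence match is at most a candidate site: the text notes that proper folding of the domain, and possibly the loop conformation, also determine whether fucosylation occurs.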

As it has already been demonstrated that proper folding of the EGF domain is a requirement for O-fucose modification (Wang et al. 1996; Wang and Spellman 1998), we investigated the conformations of five-residue loops in known structures. A superposition of hEGF structures with five-residue bN–aC loops shows that all structures solved to date adopt one of two conformers (data not shown). Conformer 1, which is a type I' β-turn, is shared by tPA (1tpg), Cox-1, the N-terminal EGF domains of factors VII, IX, and X, neuregulin, and heparin-binding growth factor, while conformer 2 is adopted by E-selectin. All structures that are able to be fucosylated adopt conformer 1, suggesting that the structure of the epitope may be an important determinant of fucosylation.

β-Hydroxylation of an aspartate or asparagine residue at the aC + 2 position of the aC–bC loop has been demonstrated in over 25 calcium-binding EGF modules (e.g., Przysiecki et al. 1987; Stenflo et al. 2000). The biological role for this post-translational modification, which appears to be restricted to EGF domains, is unclear. However, knockout mice which lack the genomic locus containing the enzyme catalyzing aspartyl β-hydroxylation have developmental defects and an increased incidence of intestinal neoplasia (Dinchuk et al. 2002). The consensus motif previously determined for β-hydroxylation is CX[DN]X(4)[FY]XCXC (PROSITE signature PS00010) (Stenflo et al. 1988), where the hydroxylated residue is the one at the [DN] position, X is any residue, X(4) denotes four consecutive residues of any type, and residues within square brackets represent possible options for that amino acid position. A review of the 25+ instances of β-hydroxylation which have been confirmed experimentally shows there is a one-to-one correspondence between hydroxylation of aspartic acid and the hEGF type, and between hydroxylation of asparagine and the cEGF type. This suggests the consensus motif for cEGF hydroxylation may be expressed as CXNX(4)[FY]XCXC, whereas the motif for hEGF hydroxylation is CXDX(4)[FY]XCXC.
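The consensus motif and its proposed type-specific variants translate directly into a regular expression. The following is a minimal sketch, not a reimplementation of the PROSITE scanner: the names `HYDROXYLATION` and `hydroxylation_sites` are hypothetical, the input strings are synthetic, and the Asp/Asn to hEGF/cEGF assignment simply encodes the correlation reported above.

```python
import re

# C-X-[DN]-X(4)-[FY]-X-C-X-C, capturing the candidate hydroxylated residue.
HYDROXYLATION = re.compile(r"C.([DN]).{4}[FY].C.C")


def hydroxylation_sites(seq):
    """Return (1-based position, residue, suggested EGF type) per motif match.

    The type label follows the reported correlation only:
    Asp -> hEGF-like, Asn -> cEGF-like. It is a heuristic, not a proof.
    """
    sites = []
    for m in HYDROXYLATION.finditer(seq):
        residue = m.group(1)
        egf_type = "hEGF-like" if residue == "D" else "cEGF-like"
        sites.append((m.start(1) + 1, residue, egf_type))
    return sites


print(hydroxylation_sites("CADAAAAFACAC"))      # synthetic Asp example
print(hydroxylation_sites("GGCANAAAAYACACGG"))  # synthetic Asn example
```

Note that `finditer` reports non-overlapping matches only, which is adequate for isolated motifs but could miss overlapping candidates in unusual sequences.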

Development of sequence descriptors for the two EGF types
Differential post-translational processing of these EGF domain subtypes suggested possible functional specialization to us, so we wished to search specifically for the different EGF types in Swiss-Prot and correlate the EGF type with functional information about the proteins. The sequence motif databases do not differentiate between the two types, so it was first necessary to construct our own sequence descriptors.

At the sequence level, the two major types of three-disulfide EGFs can be differentiated by a number of features. hEGF subunits almost always have eight residues between the two half-cystines of disulfide c (Fig. 2A). The integrin β-4 subunit and prostaglandin H-synthase are the only structurally characterized hEGF subunits that do not comply with this rule, having nine residues instead. cEGF subunits typically have 10–13 residues separating the half-cystines of disulfide c (Fig. 2A). These length differences, and the fact that different secondary structures have different sequence preferences, allow the construction of more specific sequence descriptors through pattern discovery methods (Materials and Methods).
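The loop-length rule alone already gives a crude classifier, sketched below. This is not the pattern-discovery method used in the study; it only encodes the eight-residue and 10–13-residue ranges quoted above. The function names and the synthetic helper `toy_domain` are hypothetical, and the input is assumed to be a single domain with exactly six cysteines.

```python
def c_loop_length(domain_seq):
    """Number of residues between the two half-cystines of disulfide c."""
    cys = [i for i, aa in enumerate(domain_seq) if aa == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    # Disulfide c joins the fifth and sixth cysteines (connectivity 5-6).
    return cys[5] - cys[4] - 1


def classify_by_c_loop(domain_seq):
    """Assign a tentative type from the disulfide c loop length only."""
    n = c_loop_length(domain_seq)
    if n == 8:
        return "hEGF-like"
    if 10 <= n <= 13:
        return "cEGF-like"
    return "unassigned"  # includes the nine-residue hEGF exceptions


def toy_domain(c_loop):
    """Build a synthetic six-cysteine sequence with a chosen c loop length."""
    return "CAAACAAAACAAAAACAC" + "A" * c_loop + "C"


for n in (8, 9, 11):
    print(n, classify_by_c_loop(toy_domain(n)))
```

Leaving nine-residue loops unassigned is a deliberate choice here: the text reports them only for two structurally characterized hEGF exceptions, so a length-only rule cannot place them with confidence.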

In order to evaluate the sensitivity and specificity of our pattern-based approach, we referred to InterPro (Apweiler et al. 2001), which amalgamates several sequence detection databases including PROSITE, PRINTS, SMART, Pfam, and ProDom. A search of Swiss-Prot 43 for the relevant InterPro signatures shows the relative performance of several EGF detection methods (Table 2). The amalgamated results from these databases detect matches in 640 proteins, which are grouped as InterPro IPR006209. The two PROSITE signatures EGF_1 and EGF_2 represent the best single method for EGF detection and, in addition to the Swiss-Prot annotations, we will use these as our benchmark.


Table 2. Number of true positive EGF domains detected by various methods as annotated in Swiss-Prot
 
The PROSITE signatures are of additional interest for benchmarking because a portion of the hEGF group, but no cEGFs, is specifically extracted by the PROSITE signature EGF_1, CXCX(5)GX(2)C (PS00022). At present there is no PROSITE motif that specifically extracts cEGF subunits. PROSITE uses two additional motifs to identify additional three-disulfide EGF domains in protein sequences. The EGF_2 PROSITE motif, CXCX(2)[GP][FYW]X(4,8)C (PS01186), is generically based on the C-terminus of three-disulfide EGF subunits. This motif nonspecifically extracts hEGF and cEGF subunits simultaneously. The other motif relies on the identification of a conserved N-terminal calcium-binding motif which may be found in both types of domains (PS01187). Calcium binding appears to be an allosteric mechanism for controlling the relative orientation of pairs of EGF domains (Downing et al. 1996). Although EGFs are often classified into two types based on the ability or inability to bind calcium, calcium binding does not appear to be monophyletic (Campbell and Bork 1993; Figs. 3A, 4). Some of both the hEGF and cEGF types bind calcium, suggesting that it is an ancestral feature that has been lost multiple times in the descendants. For example, the three domains of the LDL receptor appear to have arisen by subunit duplication but only domains 1 and 2 bind calcium (Fig. 4). Thus, this feature can only be relied upon to detect a subset of EGF subunits of both types and is not considered further.
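To make the two benchmark signatures concrete, the compact patterns quoted above can be converted into ordinary regular expressions. The converter below is a minimal sketch that handles only the compact notation used in this text (uppercase X as wildcard, counts in parentheses, alternatives in square brackets); it is not a full parser for PROSITE's own syntax, and `prosite_to_regex` is a hypothetical name. The test strings are synthetic.

```python
import re

EGF_1 = "CXCX(5)GX(2)C"            # PS00022, as quoted in the text
EGF_2 = "CXCX(2)[GP][FYW]X(4,8)C"  # PS01186, as quoted in the text


def prosite_to_regex(pattern):
    """Convert a compact pattern such as 'CXCX(5)GX(2)C' to a Python regex."""
    regex = pattern.replace(" ", "")
    # X(4,8) -> X{4,8}: variable-length repeats.
    regex = re.sub(r"\((\d+),(\d+)\)", r"{\1,\2}", regex)
    # X(5) -> X{5}: fixed-length repeats.
    regex = re.sub(r"\((\d+)\)", r"{\1}", regex)
    # X stands for any residue.
    return regex.replace("X", ".")


print(prosite_to_regex(EGF_1))  # C.C.{5}G.{2}C
print(prosite_to_regex(EGF_2))  # C.C.{2}[GP][FYW].{4,8}C

# Synthetic sequences constructed to satisfy each pattern.
print(bool(re.search(prosite_to_regex(EGF_1), "ACACAAAAAGAAC")))  # True
print(bool(re.search(prosite_to_regex(EGF_2), "CACAAGYAAAAC")))   # True
```

Written this way, the difference between the signatures is easy to see: EGF_1 fixes the spacing after the CXC pair, whereas EGF_2 tolerates a 4–8 residue stretch before the final cysteine, which is consistent with its matching both domain types.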



Figure 4. Domain arrangements of structurally characterized and other pertinent EGF-containing proteins. Molecules subject to metalloprotease shedding are indicated with an "s" next to the cleavage arrow. All proteins shown are human with the exception of Spitz, Keren, Gurken (Drosophila); and agrin (chick). Abbreviations follow. Proteins: tPA, tissue plasminogen activator; LTBP-1, latent transforming growth factor binding protein 1; hbEGF, heparin-binding EGF. Domains: FI, Fibronectin type I; K, Kringle; SP, serine protease; NTC, Notch; D, DSL; Clec, Clectin; s, sushi; HB, heparin-binding; M12B, metalloprotease M12B; d, disintegrin; CR, cysteine rich; IgC2, immunoglobulin; G1, globular domain 1; G2, globular domain 2; CUB, C1r/C1s module; TGFBP, transforming growth factor β (TGF-β) binding protein; A, LDL receptor class A module; C, cadherin; LG, laminin G; R, Reeler; B, BNR domain; GLA, γ-carboxyglutamate module; TSR, thrombin-sensitive region; SP, serine protease; k, kazal; SEA, sperm protein/enterokinase/agrin module; LamIV, laminin domain IV; G, Laminin G-like; PSI, plexin/semaphorin/integrin domain; bA, β-chain A domain; bTD, β-chain tail domain.

 
Our hEGF and cEGF pattern-descriptor collections were used to search Swiss-Prot 43. The hEGF collection successfully extracted 1580 hits from 418 sequences above a threshold that results in only five false positives in a total of 1580 instances (score > 3; see supplemental data file 1). This result corresponds to a 13% improvement over the 1397 matches in 319 proteins that the PROSITE motif EGF_1 extracts. Also, the cEGF collection proved significantly better than the PROSITE motif EGF_2 at extracting cEGFs. In particular, 1098 cEGFs were detected in 231 proteins above a threshold (score > 9; see supplemental data file 2) that results in only six false positives in a total of 1098 instances: a detection improvement of 26% over the 874 cEGFs detected by EGF_2. Other comparative figures are presented in Table 2.

Another measure of the quality of our cEGF and hEGF pattern-descriptors is their ability to find EGF domains that are already annotated in the Swiss-Prot database. Our pattern-based scheme was able to find 92% of the 2662 annotated three-disulfide EGF domains in Swiss-Prot compared to 85% for the combined PROSITE motifs EGF_1 and EGF_2. Our scheme failed to identify 222 sequences in Swiss-Prot, or 8% of the annotated EGFs: These false negatives seem to represent additional heterogeneity in EGF sequences. For example, 14 of these sequences carried additional annotations such as "atypical," "incomplete," and "truncated." Other detected anomalies included 20 EGF domains with an atypical number of cysteines (usually odd, suggesting one Cys exists as a free thiol), and 56 domains with either atypically short (fewer than eight residues) or long (> 15) disulfide c loops.
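The percentages quoted in the two preceding paragraphs follow from simple ratios of the reported counts. The short check below reproduces them; the helper name `percent_improvement` is hypothetical, and the false-positive fractions are derived here from the stated counts rather than quoted from the text.

```python
def percent_improvement(new_hits, baseline_hits):
    """Relative gain of new_hits over baseline_hits, in percent."""
    return 100.0 * (new_hits - baseline_hits) / baseline_hits


# Counts as reported in the text.
hegf_gain = percent_improvement(1580, 1397)  # hEGF patterns vs. EGF_1
cegf_gain = percent_improvement(1098, 874)   # cEGF patterns vs. EGF_2
missed = 100.0 * 222 / 2662                  # annotated EGFs not recovered

# Derived, not quoted: false positives as a share of reported hits.
hegf_fp = 100.0 * 5 / 1580
cegf_fp = 100.0 * 6 / 1098

print(f"hEGF gain {hegf_gain:.1f}%, cEGF gain {cegf_gain:.1f}%")
print(f"missed annotated EGFs {missed:.1f}%")
print(f"false positives: hEGF {hegf_fp:.2f}%, cEGF {cegf_fp:.2f}%")
```

Rounded to whole numbers, these give the 13% and 26% improvements and the 8% of annotated domains that were missed.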

More importantly, our method was able to detect a number of novel EGF domains. Supporting evidence for the EGF identification was available for 169 of these. Firstly, 14 cEGFs in plasmodial merozoite surface proteins (Chitarra et al. 1999; Morgan et al. 1999) have been confirmed by recent structures or homology to them. These plasmodial sequences are likely to have been transferred from a host organism in a single lateral transfer event. Secondly, the discovered potential EGFs are in proteins homologous to those with annotated EGFs. These include four novel hEGFs in adamalysins. Sixty-two additional EGF domains were detected in proteins that contain EGF domains or that contain domains which themselves are often associated with EGF domains. These include one additional hEGF domain in human and mouse netrin G2; mouse netrin G1; human and mouse tenascin N; the long form of human LTBP-1; C. elegans LIN-12; human, mouse, and rat Jagged-1, as well as zebrafish Jagged-3; and electric ray agrin. Two additional domains were found in the long form of mouse LTBP-1; mouse perlecan; the human and mouse scavenger receptor 2; and human slit 2. Three additional EGFs were found in human and mouse attractin; UNC-52, the C. elegans homolog of perlecan; and the human scavenger receptor. One additional cEGF domain was found in bovine, human, and rat thrombomodulin; Drosophila cadherin N 1 and 2; the cattle tick protective antigen BM86; the C. elegans proteins C14orf27 and ZK112.7; human and mouse LTBP-3; human, pig, and rabbit zonadhesin; bovine, human, pig, and mouse fibronectin 1; and human fibronectin 2. Two additional cEGF domains were found in mouse fibronectin 2. Some proteins had additional hEGFs and cEGFs, including human crumbs homolog (hEGF, cEGF) and Drosophila starry night protein (hEGF, 2 x cEGFs).

Additional novel domains suggested by the less specific hidden Markov models and confirmed by X-ray structures include 85 I-EGF domains in 28 integrin β-chains (Xiong et al. 2001); and four hEGF domains in variants of alliinase, a protein attributed with antibacterial properties in garlic (Kuettner et al. 2002). In both cases, these deviate from the cEGF and hEGF templates. The alliinase EGF domain is very atypical (Fig. 3A). It lacks disulfide a, and contains an additional disulfide joining the C-terminus of the EGF domain to the N-terminus of the minor sheet. On the basis of the cN–cC loop length and the location of half-cystine cC, garlic alliinase belongs to either the hEGF or four-disulfide group. In addition, it has a C-terminal β-turn, formed by residues Gln 55 and Gly 56, reminiscent of the β-turn in the four-disulfide EGF laminin. Several other hits in plants suggested by the hidden Markov models were not confirmed by the more specific patterns. However, they shared some common features such as association with BULB lectin domains (InterPro: IPR001480) (Apweiler et al. 2001). These plant EGF domains may be false positives, or alternatively, the failure of the more specific patterns to detect them may suggest EGF domains have diverged significantly in the plant kingdom, a view supported by the structure of garlic alliinase.

Analysis of EGFs in Swiss-Prot based on cEGF/hEGF grouping
Using the cEGF and hEGF sequence descriptors, we reclassified the three-disulfide EGF domains in Swiss-Prot into the two groups, observed the occurrence of the two groups in mosaic proteins, and correlated this information with functional data.

Many mosaic proteins are homogeneous with respect to EGF type. For example, many developmentally important proteins such as Notch and Delta as well as EGFs that are mitogenic contain only hEGFs (Fig. 4). Proteins that contain solely cEGFs, on the other hand, include thrombomodulin and the LDL receptor. However, there are a significant number of mosaic proteins that contain both types. For these mixed EGF proteins, a bipartite structure where the different EGF types are grouped together is the most common, but other interleaved arrangements are also found (Fig. 4). For example, most of the proteins involved in blood coagulation are a mixture of hEGFs and cEGFs with the hEGF always N-terminal to the cEGF (Fig. 4). Fibrillin and LTBP-1, components of the extracellular matrix (ECM), have a similar arrangement. They predominantly consist of the cEGF type but are predicted to have one to three hEGFs at the N-terminus. In contrast, the LDL receptor-related protein 1 (LRP1), nidogen, the transmembrane receptor adhesion protein MUA3, and proEGF also contain predominantly cEGFs but have the opposite arrangement, with the few hEGF subunits disposed toward the C-terminus near the membrane. In addition, mosaic proteins that contain laminin EGFs contain hEGFs only. Perlecan and agrin are examples (Fig. 4).
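The arrangements described above can be expressed as a simple rule over the N- to C-terminal order of EGF types in a protein. The sketch below is illustrative only and makes several assumptions: the function name and category labels are hypothetical, "bipartite" is taken to mean that the hEGF/cEGF type changes exactly once along the chain, and "interleaved" that it changes more than once. The inputs used with it here are toy domain orders, not the annotated architectures of real proteins.

```python
def classify_egf_architecture(domain_types):
    """Classify a protein by the N- to C-terminal order of its EGF types.

    domain_types: list of strings drawn from "hEGF", "cEGF", "laminin".
    Returns a hypothetical category label; anything outside the
    combinations discussed in the text is reported as "other".
    """
    if not domain_types:
        raise ValueError("at least one EGF domain is required")
    present = set(domain_types)
    if present == {"hEGF"}:
        return "hEGF only"
    if present == {"cEGF"}:
        return "cEGF only"
    if present == {"hEGF", "laminin"}:
        return "mixed hEGF/laminin"
    if present == {"hEGF", "cEGF"}:
        # Count the number of places where adjacent domains differ in type.
        switches = sum(1 for a, b in zip(domain_types, domain_types[1:]) if a != b)
        return "bipartite" if switches == 1 else "interleaved"
    return "other"
```

For a bipartite protein, the position of the single type switch also marks the hEGF/cEGF boundary, which is where the cleavage sites discussed later would be looked for.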

Functionally distinct domains of LTBP-1 contain distinct EGF types
Proteins with the cEGF and hEGF domains disposed in a bipartite fashion are particularly illuminating with regard to specific functions for hEGF and cEGF subunits. An example is provided by LTBP-1, in which the two EGF groups seem to have different roles in the function of the protein: the hEGFs in targeting the assembly to the ECM; and the cEGFs in some unspecified role in TGF-β activation after its separation from the hEGF subunits. LTBP-1 is a binding protein that anchors TGF-β to the ECM in its latent form until it is required. Association with the ECM is effected by the N-terminus. LTBP-1 is expressed tissue specifically in two alternatively spliced forms: a short form which has a single hEGF domain at the N-terminus, and a long form which has an additional hEGF domain. The long form associates more efficiently with the ECM (Olofsson et al. 1995), suggesting a role for hEGFs in targeting to the ECM. TGF-β is attached to LTBP-1 via a cysteine-rich domain interleaved between tandem cEGF subunits. The ability of most of these cEGF subunits to bind calcium suggests the region probably forms a stiff rod in its presence. All but one of the C-terminal cEGFs are separated from the N-terminus by a hinge region that can be cleaved by various proteases (Sato and Rifkin 1989; Yu and Stamenkovic 2000), resulting in release of the tandem cEGF substructure from the ECM and activation of TGF-β. Thus, the hEGF and cEGF subunits seem to function at different stages during TGF-β’s deployment.

A similar pattern of proteolytic separation of the hEGF and cEGF subunits is apparent for proEGF and LRP1 (Fig. 4). For proEGF, the soluble mature 6-kD EGFR ligand is derived from the most distal repeat, the only hEGF subunit (Dempsey et al. 1997). LRP1 is proteolysed in a series of cleavage events culminating in the release of its intracellular domain by the intramembrane protease γ-secretase. The first of these cleavages, which is catalyzed by furin in a late secretory compartment, separates the bulk of the cEGF subunits from the hEGF subunit (Willnow et al. 1996). The two halves of the protein subsequently remain noncovalently associated. A second cleavage is believed to occur close to the plasma membrane prior to γ-secretase cleavage.

Differential glycosylation and hydroxylation of EGF types
Other indicators of a structure/function relationship are provided by differential post-translational modification of EGF subtypes. There is a clear differentiation between the two types in terms of glycosylation and proteolytic processing. EGFs are glycosylated in unusual ways which, to date, have been detected on few other protein domains. Only hEGFs have been reported to undergo O-glycosylation of the aN–bN and bN–aC loops. Fucosylation seems to be restricted to a subset of hEGFs, having been identified on uPA (Kentzer et al. 1990), tPA (Harris et al. 1991), and coagulation factors VII (Bjoern et al. 1991), IX (Nishimura et al. 1992), and XII (Harris et al. 1992), as well as components of the Notch/Delta system (Shao et al. 2003). O-fucose modifications on EGF repeats have recently been shown to play significant roles in several signal transduction pathways. O-fucose on uPA was shown to be required for activation of the uPA receptor (Rabbani et al. 1992). O-fucose on the EGF repeat of Cripto was demonstrated to be essential for Cripto to mediate Nodal-dependent signaling (Schiffer et al. 2001; Yan et al. 2002). Furthermore, it is now clear that Fringe modulates Notch function by altering O-fucose structures on Notch (Bruckner et al. 2000; Hicks et al. 2000; Okajima and Irvine 2002). Similarly, Notch ligands also undergo O-fucosylation (Panin et al. 2002).

Hydroxylation is also dependent on EGF type. The details of the functional significance of β-hydroxylation remain to be determined, but β-hydroxylase knockout mice have multiple developmental defects including craniofacial abnormalities, mild palatal defects, and soft tissue syndactyly (Dinchuk et al. 2002). These developmental abnormalities resemble those seen in mutants of Jagged-2, an EGF domain-containing protein in the Notch signaling pathway. This suggests that hydroxylation may be important in signaling pathways. Given the proximity of the O-fucosylation and β-hydroxylation sites in some EGF domains (Fig. 3A), it has been suggested that these modifications may influence each other (Dinchuk et al. 2002). Differentiating between the cEGF and hEGF types suggests more specific motifs for hydroxylation. These new motifs should be useful for restricting searches for potential β-hydroxylated residues in large proteins such as Notch. For example, 22 of the 36 Drosophila Notch EGF domains contain the β-hydroxylation consensus sequence, but only 18 have the Asp hydroxylation consensus sequence which seems to be necessary for hEGF β-hydroxylation.

Role of hEGFs in shedding
Further intriguing evidence for specific functions for hEGFs and cEGFs is suggested by proteins that undergo shedding from the membrane and/or regulated intramembrane proteolysis (RIP). Most of these proteins contain at least one hEGF domain. Those that contain a single domain are always of the hEGF type. These include L-selectin; the EGFR ligands TGF-α, amphiregulin, epiregulin, betacellulin, and hbEGF; the EGFR-related HER4 ligand, neuregulin (Harris et al. 2003); E- and N-cadherin in Drosophila; as well as the Drosophila EGF ligands, Spitz, Gurken, and Keren. Shed proteins that contain multiple EGFs either consist of tandem hEGFs like Notch, Delta and Jagged; or contain a mixture of hEGF and cEGF subunits with the hEGFs arranged nearest to the membrane. Given the predominance of juxtamembrane hEGF subunits in proteins that undergo shedding or RIP, it is likely that they have some role in recognition or regulation of this penultimate cleavage step. In support of this, the hEGF domain in L-selectin has been demonstrated to be crucial in regulated shedding of L-selectin from the membrane (Zhao et al. 2001).

The role of hEGFs in shedding may extend to the sheddases. Regulated shedding has been extensively attributed to zinc metalloproteases of the ADAM family (Sahin et al. 2004). These proteases also contain a hEGF domain (Fig. 4), suggesting a like-recognizes-like recognition mode between the hEGF ADAM sheddases and hEGFs of the proteins being shed.

In summary, data from LTBP-1 suggest separate functions for cEGF and hEGF subunits in the protein. Data on EGFs that undergo functionally related post-translational modifications demonstrate the modifications are dependent on EGF type.


    Discussion
The structure and sequence data suggest the existence of several clearly distinguishable three-disulfide subtypes. While it was noted several years ago that there are two structurally different types of three-disulfide EGF domains (Bersch et al. 1998), this information does not have wide currency. Neither the structural database SCOP nor any of the sequence motif databases differentiate between the two types. The results of our study support the distinction of cEGFs and hEGFs as two major EGF subgroups and indicate these subgroups are associated with some distinct functions. In particular, we show that important post-translational modifications of three-disulfide EGFs, including unusual forms of glycosylation and post-translational proteolytic processing, are dependent on EGF subtype. We believe distinction of the two three-disulfide subunits should elucidate further structure/function relationships.

An alternative classification system of EGF domains based on the size of the linking regions between EGFs (Downing et al. 1996) is not inconsistent with the hEGF/cEGF classification system. The Downing classification system recognized two major classes of EGF tandem pairs: class I pairs where tandem EGF domains are separated by one linker residue, and class II pairs separated by two linker residues. Thus, for class I pairs the last cysteine (half-cystine [cC]1) of the N-terminal EGF is separated from the first cysteine (half-cystine [aN]2) of the C-terminal EGF by five residues, whereas for class II pairs the cysteines are separated by six residues. These numbers are consistent with separations for cEGF and hEGF pairs, respectively. Downing et al. (1996) also noted that class I and II pairs have striking differences in the length of the loops connecting the 5th and 6th half-cystines (disulfide c). Thus class I pairs correspond to a pair of cEGF subunits and class II pairs are two hEGF subunits. Given the difference in registration of the cysteines with respect to the β-sheet (Fig. 3A), it can be seen that the number of residues between domain pairs of the two different groups is effectively the same. Allowing for the difference in registration of the cysteines between the two groups also brings the calcium-binding residues in the linker into alignment without gaps (Fig. 3D). This suggests that a pair of hEGFs could be effectively modeled by a cEGF pair if the structural nonequivalence of the last cysteine in each module is taken into account. While there are structures of cEGF pairs in the structure database, to date no pair of hEGF domains has been solved. However, the previous observations suggest that the minor sheet of the N-terminal EGF and the major sheet of the C-terminal EGF should superimpose well. Differences in the pitch of the screw axis of multiple tandem hEGF or cEGF pairs will arise from the different orientation of the major sheet with respect to the minor sheet within hEGF and cEGF domains.
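The correspondence between the Downing classes and the inter-cysteine separation can be written as a short lookup. This is a minimal sketch under one explicit assumption: "separated by five (or six) residues" is read as the number of residues strictly between the last cysteine of the N-terminal EGF and the first cysteine of the C-terminal EGF. The function name, the return labels, and the synthetic test strings are hypothetical.

```python
def downing_pair_class(tandem_seq, last_cys_index, first_cys_index):
    """Assign a tandem EGF pair to a Downing class from cysteine spacing.

    last_cys_index:  0-based index of half-cystine cC of the N-terminal EGF.
    first_cys_index: 0-based index of half-cystine aN of the C-terminal EGF.
    Assumes the separation is the count of residues strictly between them.
    """
    if tandem_seq[last_cys_index] != "C" or tandem_seq[first_cys_index] != "C":
        raise ValueError("both indices must point at cysteine residues")
    separation = first_cys_index - last_cys_index - 1
    if separation == 5:
        return "class I (cEGF pair)"
    if separation == 6:
        return "class II (hEGF pair)"
    return "unclassified"
```

Spacings other than five or six are deliberately left unclassified rather than forced into either class.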

Downing et al. (1996) concluded that proteins with tandem EGF domains consist entirely of either hEGFs (class II) or cEGFs (class I) and cited a single exception to this rule: that of protein S. The more extensive data available to our study show there are many more exceptions. Although many proteins are homogeneous with respect to EGF type, there is a significant number that contain both types. We identified four distinct combinations of EGF types in mosaic proteins: those that contain hEGFs only; those that contain cEGFs only; mixed cEGF/hEGF proteins both of a bipartite and interleaved nature; and mixed hEGF/laminin proteins. An interesting feature of many of the bipartite cEGF/hEGF proteins is the existence of a cleavage site in the vicinity of the hEGF/cEGF boundary. Examples include proEGF, LTBP-1, and LRP1. In the case of LTBP-1, there is good evidence that the two halves of the protein encode distinct functions. We speculate this may be a general feature of cleaved bipartite EGFs.

This study raises some interesting questions about the evolution of EGF domains. Domains of both types have in the past been aligned such that all six cysteines are in register, implying all six cysteines are homologous and the difference between the two groups is merely a difference in the length of a variable loop. However, structural analysis of EGF domains suggests that the sixth cysteine of the two groups may not be homologous. Here we propose a disulfide-capture model based on observed differences between EGF types. This model postulates derivation of EGF types from a four-disulfide EGF progenitor. Alternatively, the observed conformational similarities of the minor sheet between hEGFs and four-disulfide EGFs may suggest these groups are the more closely related with the cEGF group being the outlier. Further evolutionary studies should elucidate this question. Finally, the existence of proteins containing both hEGFs and cEGFs also suggests these tandem EGF proteins contain higher order structures and have not evolved simply from multiple EGF duplication events.

In summary, for the first time, these structurally different types of three-disulfide EGF domains have been related to experimental data suggesting EGF subtypes may have distinct functional roles. Additionally, very sensitive and specific in silico detection of structural and functional domains in protein sequences is now feasible. The ability to detect instances of the two EGF types in an automated manner should enhance further investigation into structure/function relationships.


    Materials and methods
For the structural analysis, current EGF structures were extracted from the Protein Data Bank (PDB) (Berman et al. 2000) using SCOP as a guide (Murzin et al. 1995). EGF domains were aligned on the basis of structure as well as sequence. Subtypes were classified with the aid of HERA diagrams (Hutchinson and Thornton 1990). Additional information on secondary structures was obtained from DSSP (Kabsch and Sander 1983). An all-against-all structural comparison was performed with STAMP (Russell and Barton 1992) based on starting alignments generated with ClustalW (Thompson et al. 1994). Superpositions were viewed using InsightII (Accelrys).

Improved motifs to specifically extract EGF domains of the hEGF and cEGF types from sequence databases were generated using structure-based alignments of the different C-termini of the two types. The motifs were created and refined using an iterative technique which involved initial extraction of hEGF and cEGFs by regular expression, generation of hidden Markov models, and finally generation of the final motifs using pattern discovery. EGF domains were initially extracted from Swiss-Prot (Boeckmann et al. 2003) by an iterative heuristic method using the program FindPatterns (GCG, Wisconsin Group). The initial regular expressions were based on the sequences of types from the structural analysis. The output was compared to Swiss-Prot annotations and the regular expressions modified until all sequences fitting the structural templates were extracted. False positives were then removed from the lists based on several criteria. Firstly, as EGFs have to date only been confirmed in metazoans and plasmodia, all bacterial and fungal sequences were removed. Secondly, sequences in which the detected EGF was alternatively annotated as a different type of domain were removed. This group included a large number of zinc fingers. Thirdly, sequences were removed on the basis of function and cellular location. For example, proteins annotated as transcription factors were removed as well as proteins annotated as nuclear or cytosolic. Fourthly, the inability to detect a majority of homologs was also used as a criterion for exclusion, since it indicated that features detected were not conserved. The remaining true positives belonging to the hEGF and cEGF groups were aligned with ClustalW; hidden Markov models (HMMs) were then constructed for each of the two types using HMMER (Eddy 1996) and used to search Swiss-Prot for members of the two groups. As the results of the HMM search included a substantial fraction of false positives, they were filtered manually: in particular, only the known positives and potential EGFs, which were verified through BLAST (Altschul et al. 1990) searches, were included; additionally, all of the four-disulfide EGFs which were detected by the hEGF-HMM were removed from the hEGF training set.
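The four exclusion criteria applied to the regular-expression hits can be summarized as an ordered filter. The sketch below is an illustration only and does not reproduce the manual curation actually performed: the record fields, the reason labels, and the rule that a hit is dropped when no more than half of its homologs are also detected are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical annotation vocabularies used only for this sketch.
EXCLUDED_TAXA = {"bacteria", "fungi"}
EXCLUDED_ANNOTATIONS = {"transcription factor", "nuclear", "cytosolic"}


@dataclass
class CandidateHit:
    accession: str                     # hypothetical identifier
    taxon_group: str                   # e.g. "metazoa", "plasmodium", "bacteria"
    alternative_domain: Optional[str]  # e.g. "zinc finger" if annotated otherwise
    annotations: frozenset             # function/location tags for the protein
    homologs_total: int                # homologs examined
    homologs_detected: int             # homologs in which the feature was found


def exclusion_reason(hit):
    """Return the first criterion that excludes the hit, or None to keep it."""
    if hit.taxon_group in EXCLUDED_TAXA:
        return "taxonomy"
    if hit.alternative_domain is not None:
        return "alternative domain annotation"
    if EXCLUDED_ANNOTATIONS & set(hit.annotations):
        return "function or cellular location"
    # Assumed reading of "majority": more than half of the homologs detected.
    if hit.homologs_total > 0 and 2 * hit.homologs_detected <= hit.homologs_total:
        return "feature not conserved in homologs"
    return None


def filter_hits(hits):
    """Keep only the hits that pass all four criteria."""
    return [h for h in hits if exclusion_reason(h) is None]
```

Returning the reason rather than a bare boolean makes it easy to tabulate how many candidates each criterion removes.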

The resulting collection of hEGF and cEGF instances was subsequently used as a training set in conjunction with a pattern discovery algorithm (Rigoutsos and Floratos 1998a, b) to derive sequence descriptors that were both sensitive and specific for each of the two groups. These pattern-based descriptors were subsequently used to process Swiss-Prot anew and identify more instances of the two groups. The instances were finally correlated with functional data from the literature and the contents of the Swiss-Prot annotations.


    Electronic supplemental material
Files in this data supplement: (1) hEGF.pdf—list of matches to hEGF pattern descriptors in SwissProt 43 and (2) cEGF.pdf—list of matches to cEGF pattern descriptors in SwissProt 43. Likely false positives are indicated with an * in both files.


    Footnotes
 
Supplemental material: see www.proteinscience.org


    Acknowledgments
 
This work was supported by a Freedman Foundation Fellowship to M.A.W., a Westfield-Belconnen Fellowship to D.B.S, and Pharmacia Foundation Australia Fellowship to S.L.D. The authors thank Siiri Iismaa for critical comments on the manuscript.


    References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403–410.

Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2001. InterPro—an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29: 37–40.

Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28: 235–242.

Bersch, B., Hernandez, J.F., Marion, D., and Arlaud, G.J. 1998. Solution structure of the epidermal growth factor (EGF)-like module of human complement protease C1r, an atypical member of the EGF family. Biochemistry 37: 1204–1214.

Bjoern, S., Foster, D.C., Thim, L., Wiberg, F.C., Christensen, M., Komiyama, Y., Pedersen, A.H., and Kisiel, W. 1991. Human plasma and recombinant factor VII. Characterization of O-glycosylations at serine residues 52 and 60 and effects of site-directed mutagenesis of serine 52 to alanine. J. Biol. Chem. 266: 11051–11057.

Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O’Donovan, C., Phan, I., et al. 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31: 365–370.

Bruckner, K., Perez, L., Clausen, H., and Cohen, S. 2000. Glycosyltransferase activity of Fringe modulates Notch–Delta interactions. Nature 406: 411–415.

Campbell, I.D. and Bork, P. 1993. Epidermal growth factor-like modules. Curr. Opin. Struct. Biol. 3: 385–392.

Chan, A.W., Hutchinson, E.G., Harris, D., and Thornton, J.M. 1993. Identification, classification, and analysis of β-bulges in proteins. Protein Sci. 2: 1574–1590.

Chitarra, V., Holm, I., Bentley, G.A., Petres, S., and Longacre, S. 1999. The crystal structure of C-terminal merozoite surface protein 1 at 1.8 Å resolution, a highly protective malaria vaccine candidate. Mol. Cell 3: 457–464.

Dempsey, P.J., Meise, K.S., Yoshitake, Y., Nishikawa, K., and Coffey, R.J. 1997. Apical enrichment of human EGF precursor in Madin-Darby canine kidney cells involves preferential basolateral ectodomain cleavage sensitive to a metalloprotease inhibitor. J. Cell Biol. 138: 747–758.

Dinchuk, J.E., Focht, R.J., Kelley, J.A., Henderson, N.L., Zolotarjova, N.I., Wynn, R., Neff, N.T., Link, J., Huber, R.M., Burn, T.C., et al. 2002. Absence of post-translational aspartyl beta-hydroxylation of epidermal growth factor domains in mice leads to developmental defects and an increased incidence of intestinal neoplasia. J. Biol. Chem. 277: 12970–12977.

Downing, A.K., Knott, V., Werner, J.M., Cardy, C.M., Campbell, I.D., and Handford, P.A. 1996. Solution structure of a pair of calcium-binding epidermal growth factor-like domains: Implications for the Marfan syndrome and other genetic disorders. Cell 85: 597–605.

Eddy, S.R. 1996. Hidden Markov models. Curr. Opin. Struct. Biol. 6: 361–365.

Haltiwanger, R.S. 2002. Regulation of signal transduction pathways in development by glycosylation. Curr. Opin. Struct. Biol. 12: 593–598.

Harris, R.J. and Spellman, M.W. 1993. O-linked fucose and other post-translational modifications unique to EGF modules. Glycobiology 3: 219–224.

Harris, R.J., Leonard, C.K., Guzzetta, A.W., and Spellman, M.W. 1991. Tissue plasminogen activator has an O-linked fucose attached to threonine-61 in the epidermal growth factor domain. Biochemistry 30: 2311–2314.

Harris, R.J., Ling, V.T., and Spellman, M.W. 1992. O-linked fucose is present in the first epidermal growth factor domain of factor XII but not protein C. J. Biol. Chem. 267: 5102–5107.

Harris, R.C., Chung, E., and Coffey, R.J. 2003. EGF receptor ligands. Exp. Cell Res. 284: 2–13.

Hicks, C., Johnston, S.H., diSibio, G., Collazo, A., Vogt, T.F., and Weinmaster, G. 2000. Fringe differentially modulates Jagged1 and Delta1 signaling through Notch1 and Notch2. Nat. Cell Biol. 2: 515–520.

Hutchinson, E.G. and Thornton, J.M. 1990. HERA—A program to draw schematic diagrams of protein secondary structures. Proteins 8: 203–212.

Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.

Kentzer, E.J., Buko, A., Menon, G., and Sarin, V.K. 1990. Carbohydrate composition and presence of a fucose-protein linkage in recombinant human prourokinase. Biochem. Biophys. Res. Commun. 171: 401–406.

Kiyota, T. and Kinoshita, T. 2004. The intracellular domain of X-Serrate-1 is cleaved and suppresses primary neurogenesis in Xenopus laevis. Mech. Dev. 121: 573–585.

Kuettner, E.B., Hilgenfeld, R., and Weiss, M.S. 2002. The active principle of garlic at atomic resolution. J. Biol. Chem. 277: 46402–46407.

Lai, E.C. 2004. Notch signaling: Control of cell communication. Development 131: 965–973.

Morgan, W.D., Birdsall, B., Frenkiel, T.A., Gradwell, M.G., Burghaus, P.A., Syed, S.E., Uthaipibull, C., Holder, A.A., and Feeney, J. 1999. Solution structure of an EGF module pair from the Plasmodium falciparum merozoite surface protein 1. J. Mol. Biol. 289: 113–122.

Murzin, A.G., Brenner, S.E., Hubbard, T., and Chothia, C. 1995. SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247: 536–540.

Nishimura, H., Takao, T., Hase, S., Shimonishi, Y., and Iwanaga, S. 1992. Human factor IX has a tetrasaccharide O-glycosidically linked to serine 61 through the fucose residue. J. Biol. Chem. 267: 17520–17525.

Okajima, T. and Irvine, K.D. 2002. Regulation of Notch signaling by O-linked fucose. Cell 111: 893–904.

Olofsson, A., Ichijo, H., Moren, A., ten Dijke, P., Miyazono, K., and Heldin, C.H. 1995. Efficient association of an amino-terminally extended form of human latent transforming growth factor-β binding protein with the extracellular matrix. J. Biol. Chem. 270: 31294–31297.

Panin, V.M., Shao, L., Lei, L., Moloney, D.J., Irvine, K.D., and Haltiwanger, R.S. 2002. Notch ligands are substrates for protein O-fucosyltransferase-1 and Fringe. J. Biol. Chem. 277: 29945–29952.

Przysiecki, C.T., Staggers, J.E., Ramjit, H.G., Musson, D.G., Stern, A.M., Bennett, C.D., and Friedman, P.A. 1987. Occurrence of β-hydroxylated asparagine residues in non-vitamin K-dependent proteins containing epidermal growth factor-like domains. Proc. Natl. Acad. Sci. 84: 7856–7860.

Rabbani, S.A., Mazar, A.P., Bernier, S.M., Haq, M., Bolivar, I., Henkin, J., and Goltzman, D. 1992. Structural requirements for the growth factor activity of the amino-terminal domain of urokinase. J. Biol. Chem. 267: 14151–14156.

Rigoutsos, I. and Floratos, A. 1998a. Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm. Bioinformatics 14: 55–67.

———. 1998b. Motif Discovery without alignment or enumeration. Proceedings 2nd International Conference on Computational Molecular Biology (RECOMB ’98), pp. 221–227. ACM Press, New York, NY.

Russell, R.B. and Barton, G.J. 1992. Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels. Proteins 14: 309–323.

Sahin, U., Weskamp, G., Kelly, K., Zhou, H.M., Higashiyama, S., Peschon, J., Hartmann, D., Saftig, P., and Blobel, C.P. 2004. Distinct roles for ADAM10 and ADAM17 in ectodomain shedding of six EGFR ligands. J. Cell Biol. 164: 769–779.

Sato, Y. and Rifkin, D.B. 1989. Inhibition of endothelial cell movement by pericytes and smooth muscle cells: Activation of a latent transforming growth factor-β 1-like molecule by plasmin during co-culture. J. Cell Biol. 109: 309–315.

Schiffer, S.G., Foley, S., Kaffashan, A., Hronowski, X., Zichittella, A.E., Yeo, C.Y., Miatkowski, K., Adkins, H.B., Damon, B., Whitman, M., et al. 2001. Fucosylation of Cripto is required for its ability to facilitate nodal signaling. J. Biol. Chem. 276: 37769–37778.

Shao, L., Moloney, D.J., and Haltiwanger, R. 2003. Fringe modifies O-fucose on mouse Notch1 at epidermal growth factor-like repeats within the ligand-binding site and the Abruptex region. J. Biol. Chem. 278: 7775–7782.

Stenflo, J. 1991. Structure–function relationships of epidermal growth factor modules in vitamin K-dependent clotting factors. Blood 78: 1637–1651.

Stenflo, J., Ohlin, A.K., Owen, W.G., and Schneider, W.J. 1988. β-Hydroxy-aspartic acid or β-hydroxyasparagine in bovine low density lipoprotein receptor and in bovine thrombomodulin. J. Biol. Chem. 263: 21–24.

Stenflo, J., Stenberg, Y., and Muranyi, A. 2000. Calcium-binding EGF-like modules in coagulation proteinases: Function of the calcium ion in module interactions. Biochim. Biophys. Acta 1477: 51–63.

Stetefeld, J., Mayer, U., Timpl, R., and Huber, R. 1996. Crystal structure of three consecutive laminin-type epidermal growth factor-like (LE) modules of laminin γ1 chain harboring the nidogen binding site. J. Mol. Biol. 257: 644–657.

Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22: 4673–4680.

Urban, S. and Freeman, M. 2002. Intramembrane pro