PROSITE: PDOC00021 (documentation)

{PDOC00021}
{PS00022; EGF_1}
{PS01186; EGF_2}
{BEGIN}
******************************
* EGF-like domain signatures *
******************************

A sequence about thirty to forty amino-acid residues long, found in epidermal
growth factor (EGF), has been shown [1 to 6] to be present, in a more or less
conserved form, in a large number of other, mostly animal, proteins. The
proteins currently known to contain one or more copies of an EGF-like pattern
are listed below.

 - Adipocyte differentiation inhibitor (gene PREF-1) from mouse (6 copies).
 - Agrin, a basal lamina protein  that causes the aggregation of acetylcholine
   receptors on cultured muscle fibers (4 copies).
 - Amphiregulin, a growth factor (1 copy).
 - Betacellulin, a growth factor (1 copy).
 - Blastula proteins BP10 and Span from sea urchin, which are thought to be
   involved in pattern formation (1 copy).
 - BM86, a glycoprotein antigen of cattle tick (7 copies).
 - Bone morphogenic protein 1 (BMP-1), a  protein which induces cartilage  and
   bone formation  and  which  expresses  metalloendopeptidase  activity  (1-2
   copies). Homologous proteins are found in sea urchin - suBMP (1 copy) - and
   in Drosophila - the dorsal-ventral patterning protein tolloid (2 copies).
 - Caenorhabditis elegans developmental proteins lin-12 (13 copies)  and glp-1
   (10 copies).
 - Caenorhabditis elegans APX-1 protein, a patterning protein (4.5 copies).
 - Calcium-dependent serine proteinase (CASP) which degrades the extracellular
   matrix proteins type I and IV collagen and fibronectin (1 copy). 
 - Cartilage matrix protein CMP (1 copy).
 - Cartilage oligomeric matrix protein COMP (4 copies).
 - Cell surface antigen 114/A10 (3 copies).
 - Cell surface glycoprotein complex transmembrane subunit ASGP-2  from rat (2
   copies).
 - Coagulation associated proteins C, Z (2 copies) and S (4 copies).
 - Coagulation factors VII, IX, X and XII (2 copies).
 - Complement C1r components (1 copy). 
 - Complement C1s components (1 copy).
 - Complement-activating component of Ra-reactive factor (RARF) (1 copy).
 - Complement components C6, C7, C8 alpha and beta chains, and C9 (1 copy).
 - Crumbs, an epithelial development protein from Drosophila (29 copies).
 - Epidermal growth factor precursor (7-9 copies).
 - Exogastrula-inducing peptides A, C, D and X from sea urchin (1 copy).
 - Fat protein, a Drosophila cadherin-related tumor suppressor (5 copies).
 - Fetal  antigen  1, a probable neuroendocrine differentiation protein, which
   is derived from the delta-like protein (DLK) (6 copies).
 - Fibrillin 1 (47 copies) and fibrillin 2 (14 copies).
 - Fibropellins  IA  (21 copies), IB (13 copies), IC (8 copies), II (4 copies)
   and III   (8   copies)  from  the  apical  lamina  -  a  component  of  the
   extracellular matrix - of sea urchin.
 - Fibulin-1 and -2, two extracellular matrix proteins (9-11 copies).
 - Giant-lens  protein (protein Argos), which regulates cell determination and
   axon guidance in the Drosophila eye (1 copy). 
 - Growth factor-related proteins from various poxviruses (1 copy).
 - Gurken protein, a Drosophila developmental protein (1 copy).
 - Heparin-binding EGF-like growth factor (HB-EGF), transforming growth factor
   alpha (TGF-alpha), growth factors Lin-3 and Spitz (1 copy); the precursors
   are membrane proteins, while the mature forms are extracellular.
 - Hepatocyte growth factor (HGF) activator (EC 3.4.21.-) (2 copies).
 - LDL  and  VLDL receptors, which bind and transport low-density lipoproteins
   and very low-density lipoproteins (3 copies).
 - LDL  receptor-related  protein  (LRP), which  may  act  as  a  receptor for
   endocytosis of extracellular ligands (22 copies).
 - Leucocyte  antigen  CD97  (3  copies),  cell  surface  glycoprotein EMR1 (6
   copies) and cell surface glycoprotein F4/80 (7 copies).
 - Limulus clotting factor C, which is involved in hemostasis and host defense
   mechanisms in the Japanese horseshoe crab (1 copy).
 - Meprin A alpha subunit, a mammalian membrane-bound endopeptidase (1 copy).
 - Milk fat globule-EGF factor 8 (MFG-E8) from mouse (2 copies).
 - Neuregulin GGF-I and GGF-II, two human glial growth factors (1 copy).
 - Neurexins from mammals (3 copies).
 - Neurogenic  proteins  Notch, Xotch and the human homolog Tan-1 (36 copies),
   Delta (9  copies)  and  the  similar  differentiation  proteins  Lag-2 from
   Caenorhabditis elegans (2  copies), Serrate (14 copies) and Slit (7 copies)
   from Drosophila.
 - Nidogen  (also called entactin), a basement membrane protein from chordates
   (2-6 copies).
 - Ookinete surface proteins (24 Kd, 25 Kd, 28 Kd) from Plasmodium (4 copies).
 - Pancreatic secretory granule membrane major glycoprotein GP2 (1 copy).
 - Perforin, which non-specifically lyses a variety of target cells (1 copy).
 - Proteoglycans  aggrecan (1 copy), versican (2 copies), perlecan (at least 2
   copies), brevican (1 copy) and chondroitin sulfate proteoglycan (gene PG-M)
   (2 copies).
 - Prostaglandin G/H synthase 1 and 2 (EC 1.14.99.1) (1 copy), which are
   found in the endoplasmic reticulum.
 - S1-5,  a  human  extracellular  protein whose ultimate activity is probably
   modulated by the environment (5 copies).
 - Schwannoma-derived growth factor (SDGF), an autocrine growth factor as well
   as a mitogen for different target cells (1 copy).
 - Selectins. Cell  adhesion  proteins such  as  ELAM-1 (E-selectin),  GMP-140
   (P-selectin), or the lymph-node homing receptor (L-selectin) (1 copy).
 - Serine/threonine-protein  kinase  homolog  (gene  Pro25)  from  Arabidopsis
   thaliana, which  may   be   involved   in   assembly   or   regulation   of
   light-harvesting chlorophyll A/B protein (2 copies).
 - Sperm-egg fusion proteins PH-30 alpha and beta from guinea pig (1 copy).
 - Stromal cell derived protein-1 (SCP-1) from mouse (6 copies).
 - TDGF-1, human teratocarcinoma-derived growth factor 1 (1 copy).
 - Tenascin  (or  neuronectin),  an  extracellular matrix protein from mammals
   (14.5 copies), chicken (TEN-A) (13.5 copies) and the related proteins human
   tenascin-X (18  copies)  and  tenascin-like  proteins  TEN-A and TEN-M from
   Drosophila (8 copies).
 - Thrombomodulin   (fetomodulin),  which  together  with  thrombin  activates
   protein C (6 copies).
 - Thrombospondin  1, 2 (3 copies), 3 and 4 (4 copies), adhesive glycoproteins
   that mediate cell-to-cell and cell-to-matrix interactions.
 - Thyroid peroxidase 1 and 2 (EC 1.11.1.8) from human (1 copy).
 - Transforming  growth  factor  beta-1  binding protein (TGF-B1-BP) (16 or 18
   copies).
 - Tyrosine-protein kinase receptors Tek and Tie (EC 2.7.1.112) (3 copies).
 - Urokinase-type  plasminogen  activator  (EC  3.4.21.73)  (UPA)  and  tissue
   plasminogen activator (EC 3.4.21.68) (TPA) (1 copy).
 - Uromodulin (Tamm-Horsfall urinary glycoprotein) (THP) (3 copies).
 - Vitamin  K-dependent  anticoagulants  protein C (2 copies) and protein S (4
   copies) and  the  similar  protein Z, a single-chain plasma glycoprotein of
   unknown function (2 copies).
 - 63 Kd sperm flagellar membrane protein from sea urchin (3 copies).
 - 93 Kd protein (gene nel) from chicken (5 copies).
 - Hypothetical  337.6  Kd  protein  T20G5.3  from  Caenorhabditis elegans (44
   copies).

The functional significance of EGF domains in what appear to be unrelated
proteins is not yet clear. However, a common feature is that these repeats are
found in the extracellular domain of membrane-bound proteins or in proteins
known to be secreted (exception: prostaglandin G/H synthase). The EGF domain
includes six cysteine residues which have been shown (in EGF) to form three
disulfide bonds. The main structure is a two-stranded beta-sheet followed by a
loop to a short C-terminal two-stranded sheet. The subdomains between the
conserved cysteines vary strongly in length, as shown in the following
schematic representation of the EGF-like domain:

                 +-------------------+        +-------------------------+
                 |                   |        |                         |
  x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x
       |                   |         ************************************
       +-------------------+

'C': conserved cysteine involved in a disulfide bond.
'G': often conserved glycine
'a': often conserved aromatic amino acid
'*': position of both patterns.
'x': any residue
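The schematic can be read as a regular expression. The Python sketch below is
an illustration only: it assumes 'x(n,m)' maps to '.{n,m}', relaxes the spacer
between the 5th and 6th cysteines to 7 to 28 arbitrary residues (x(2)-G-a-
x(0,21)-G-x(2), with the glycines and the aromatic residue not enforced since
they are only often conserved), and reports the disulfide pairs C1-C3, C2-C4
and C5-C6 drawn by the brackets. The test string is assumed to be the
53-residue mature human EGF sequence; check it against a current database
entry before reusing it. The names EGF_SCHEMATIC and disulfide_bonds are
hypothetical.

```python
import re

# Hypothetical regex form of the schematic: 'x(n,m)' -> '.{n,m}'.
# The C5..C6 spacer x(2)-G-a-x(0,21)-G-x(2) is relaxed to '.{7,28}' because
# the glycines and the aromatic residue are only "often" conserved.
EGF_SCHEMATIC = re.compile(
    r"(?P<c1>C).{0,48}(?P<c2>C).{3,12}(?P<c3>C).{1,70}"
    r"(?P<c4>C).{1,6}(?P<c5>C).{7,28}(?P<c6>C)"
)

# Disulfide connectivity read off the brackets in the schematic.
DISULFIDE_PAIRS = [("c1", "c3"), ("c2", "c4"), ("c5", "c6")]


def disulfide_bonds(sequence):
    """Return 1-based (Cys, Cys) position pairs for the first schematic match,
    or None if the sequence does not fit the schematic."""
    match = EGF_SCHEMATIC.search(sequence)
    if match is None:
        return None
    # match.start(name) is 0-based; add 1 for conventional residue numbering.
    return [(match.start(a) + 1, match.start(b) + 1) for a, b in DISULFIDE_PAIRS]


# Test string assumed to be mature human EGF (53 residues); verify before reuse.
EGF = "NSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR"
print(disulfide_bonds(EGF))  # [(6, 20), (14, 31), (33, 42)]
```

With exactly six cysteines in the input the assignment is unambiguous; in
sequences with extra cysteines the regex may pick a different assignment than
the real fold, so this only illustrates the spacing rules.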

The region between the 5th and 6th cysteines contains two conserved glycines,
at least one of which is present in most EGF-like domains. We created two
patterns for this domain, each including one of these C-terminal conserved
glycine residues.

-Consensus pattern: C-x-C-x(5)-G-x(2)-C
                    [The 3 C's are involved in disulfide bonds]
-Sequences known  to belong to this class detected by the pattern: A majority,
 but not  those  that  have very long or very short regions between the last 3
 conserved cysteines of their EGF-like domain(s).
-Other sequence(s)  detected  in  SWISS-PROT:  87 proteins, of which 27 can be
 considered as possible candidates. 
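As a minimal sketch, assuming Python's re module and the pattern syntax used
in this entry (x, x(n), x(n,m), [..]) plus {..} exclusions and the < and >
terminal anchors, a PROSITE pattern can be translated into a regular
expression and scanned against a sequence. The helper names prosite_to_regex
and find_matches are hypothetical, finditer reports only non-overlapping
matches, and the test string is assumed to be mature human EGF.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""    # C-terminal anchor
        element = element.strip("<>")
        # Split an element such as "x(4,8)" into residue part and repeat counts.
        residue, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if residue == "x":
            residue = "."                          # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # excluded residues
        # Single letters and [..] classes mean the same thing in regex syntax.
        if high is not None:
            residue += "{%s,%s}" % (low, high)
        elif low is not None:
            residue += "{%s}" % low
        parts.append(prefix + residue + suffix)
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


EGF_1 = "C-x-C-x(5)-G-x(2)-C"
# Test string assumed to be mature human EGF; verify before reuse.
EGF = "NSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQYRDLKWWELR"

print(prosite_to_regex(EGF_1))   # C.C.{5}G.{2}C
print(find_matches(EGF_1, EGF))  # [(31, 'CNCVVGYIGERC')]
```

The match spans the 4th, 5th and 6th cysteines of the domain, which is the
region marked with asterisks in the schematic.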

-Consensus pattern: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C
                    [The 3 C's are involved in disulfide bonds]
-Sequences known  to belong to this class detected by the pattern: A majority,
 but not  those  that  have very long or very short regions between the last 3
 conserved cysteines of their EGF-like domain(s).
-Other sequence(s)  detected  in  SWISS-PROT:  83 proteins, of which 49 can be
 considered as possible candidates. 
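Why both patterns miss domains with very long or very short regions between
the last three cysteines can be checked with simple arithmetic. The
hypothetical helpers below, a sketch assuming the same pattern syntax as this
entry, sum the minimum and maximum lengths of the elements lying between
consecutive cysteines of each pattern.

```python
import re


def element_length(element):
    """(min, max) residues matched by one PROSITE element, e.g. 'x(4,8)' -> (4, 8)."""
    m = re.fullmatch(r".+?(?:\((\d+)(?:,(\d+))?\))?", element)
    low, high = m.groups()
    if low is None:
        return 1, 1  # a single residue, a [..] class or a {..} exclusion
    return int(low), int(high if high is not None else low)


def cysteine_gaps(pattern):
    """(min, max) residue counts between consecutive 'C' elements of a pattern."""
    elements = pattern.split("-")
    c_idx = [i for i, e in enumerate(elements) if e == "C"]
    gaps = []
    for start, end in zip(c_idx, c_idx[1:]):
        low = high = 0
        for element in elements[start + 1 : end]:
            e_low, e_high = element_length(element)
            low += e_low
            high += e_high
        gaps.append((low, high))
    return gaps


PATTERNS = {
    "EGF_1": "C-x-C-x(5)-G-x(2)-C",
    "EGF_2": "C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C",
}

for name, pattern in PATTERNS.items():
    print(name, cysteine_gaps(pattern))
# EGF_1 [(1, 1), (8, 8)]
# EGF_2 [(1, 1), (8, 12)]
```

Both patterns fix the 4th-to-5th cysteine spacing at one residue and allow 8
(EGF_1) or 8 to 12 (EGF_2) residues between the 5th and 6th cysteines, whereas
the schematic permits x(1,6) and 7 to 28 residues respectively. The labels
EGF_1 and EGF_2 assume the patterns appear in the same order as the {PS00022}
and {PS01186} header lines.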

-Note: The beta chain of the integrin family of proteins contains 2
 cysteine-rich repeats which were said to be dissimilar to the EGF pattern [7].
-Note: Laminin EGF-like repeats (see <PDOC00961>) are longer than the average
 EGF module and contain a further disulfide bond C-terminal to the EGF-like
 region. Perlecan and agrin contain both EGF-like domains and laminin-type
 EGF-like domains.
-Note: The patterns do not detect all of the repeats in proteins with
 multiple EGF-like repeats.
-Note: See <PDOC00913> for an entry describing specifically the subset of
 EGF-like domains that bind calcium.

-Last update: November 1997 / Patterns and text revised.

[ 1] Davis C.G.
     New Biol. 2:410-419(1990).
[ 2] Blomquist M.C., Hunt L.T., Barker W.C.
     Proc. Natl. Acad. Sci. U.S.A. 81:7363-7367(1984).
[ 3] Barker W.C., Johnson G.C., Hunt L.T., George D.G.
     Protein Nucl. Acid Enz. 29:54-68(1986).
[ 4] Doolittle R.F., Feng D.F., Johnson M.S.
     Nature 307:558-560(1984).
[ 5] Appella E., Weber I.T., Blasi F.
     FEBS Lett. 231:1-4(1988).
[ 6] Campbell I.D., Bork P.
     Curr. Opin. Struct. Biol. 3:385-392(1993).
[ 7] Tamkun J.W., DeSimone D.W., Fonda D., Patel R.S., Buck C., Horwitz A.F.,
     Hynes R.O.
     Cell 46:271-282(1986).
{END}
