DBGET Database Document: LIGAND



            /       _ _/    ___  |     _  /      _  _/  _ _/  _ ___ |
           /         /     /   _/     /  |        / |    /     /    |
          /         /     /          /   |       /  |   /     /     |
         /         /     /          ____ |      /   |  /     /     /
        /         /     /  _ _/    /     |     /    | /     /     /
       /         /     /    /     /      |    /      /     /     /
     ______/  ___/    _____/   ___/   ___/ ___/   ___/   _______/



         LIGAND  -  Ligand Chemical Database for Enzyme Reactions

                          Release 15.1, April 1997

                                User Manual



                             Takaaki Nishioka,
             Graduate School of Agriculture, Kyoto University,

                      Susumu Goto and Minoru Kanehisa 
             Institute for Chemical Research, Kyoto University

                         E-mail: www@genome.ad.jp



1. INTRODUCTION

The Ligand Chemical Database for Enzyme Reactions (LIGAND) is designed to 
provide the linkage between chemical and biological aspects of life in
the light of enzymatic reactions.  The database consists of two sections:
the ENZYME section and the COMPOUND section.  The ENZYME section is a
collection of all known enzymatic reactions classified according to the
nomenclature of the International Union of Biochemistry and Molecular
Biology (IUBMB):

    International Union of Biochemistry and Molecular Biology,
    "Enzyme Nomenclature: Recommendations (1992) of the Nomenclature
    Committee of the International Union of Biochemistry and Molecular
    Biology", Academic Press, New York (1992).

Each entry of ENZYME is identified by the EC (Enzyme Commission) number, and
contains information on naming, chemical reactions, metabolic compounds,
metabolic pathways, genes encoding the enzyme for several organisms,
genetic diseases, and links to other databases including protein sequences
and 3D structural data.

The COMPOUND section is a collection of metabolic compounds including
substrates, products, and inhibitors.  Each of the chemical substances
that appear in the ENZYME section is identified by an accession number
and stored in this section.  Each entry of COMPOUND contains information
on naming, chemical formula, structural formula (in a separate GIF file
and a MOL file), and CAS (Chemical Abstracts Service) registry number.

The LIGAND database is a major component of the DBGET integrated
database system (http://www.genome.ad.jp/dbget/dbget.links.html),
providing useful links among the existing databases.  LIGAND is now
tightly coupled to the metabolic pathway database of the KEGG system
(http://www.genome.ad.jp/kegg/kegg.html), providing links to the gene
catalogs of a number of organisms.

Please cite the following paper when making use of the LIGAND database:

    Suyama, M., Ogiwara, A., Nishioka, T., and Oda, J., "Searching for
    amino acid sequence motifs among enzymes: the Enzyme-Reaction
    Database", Comput. Appl. Biosci. 9, 9-15 (1993).



2. CONVENTIONS

2.1. General Data Format

LIGAND is constructed as a flat-file database.  As in the data formats
of the PIR and GenBank databases, fixed column positions are assigned
to the attributes of the data.  Each attribute is identified by the
keyword that appears in columns 1-12.  When these columns are blank,
the attribute of the preceding line continues.  Columns 13 onward hold
the data values.

For the ENZYME section the following data items appear on columns 1-12:

        ENTRY
        NAME
        CLASS
        SYSNAME
        REACTION
        SUBSTRATE
        PRODUCT
        INHIBITOR
        COFACTOR
        EFFECTOR
        COMMENT
        PATHWAY
        DISEASE
        MOTIF
        GENES
        DBLINKS
        ///

An entry begins with the ENTRY data item, which is followed by the other
data items in the order shown above, and ends with the end-of-entry
(///) data item.  All entries except for those deleted and transferred
contain the data items ENTRY, NAME, CLASS, and end-of-entry; the other
data items are optional.
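
The ordering rule above can be checked mechanically.  The following is a
minimal Python sketch, not part of the LIGAND distribution; the names
ENZYME_ORDER and in_documented_order are hypothetical and chosen only for
illustration.

```python
# Keywords of the ENZYME section in the order listed in this manual.
ENZYME_ORDER = [
    "ENTRY", "NAME", "CLASS", "SYSNAME", "REACTION", "SUBSTRATE",
    "PRODUCT", "INHIBITOR", "COFACTOR", "EFFECTOR", "COMMENT",
    "PATHWAY", "DISEASE", "MOTIF", "GENES", "DBLINKS",
]


def in_documented_order(keywords):
    """Return True if the keywords of one entry follow the listed order.

    Raises ValueError for a keyword that is not in ENZYME_ORDER.
    """
    positions = [ENZYME_ORDER.index(k) for k in keywords]
    return all(a < b for a, b in zip(positions, positions[1:]))


print(in_documented_order(["ENTRY", "NAME", "CLASS", "REACTION", "DBLINKS"]))
print(in_documented_order(["ENTRY", "CLASS", "NAME"]))
```

Optional items may be skipped, so the check only requires that the items
present appear in increasing position.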

Example:

  ENTRY       EC 2.7.1.30
  NAME        Glycerol kinase
  CLASS       Transferases
	      Transferring phosphorus-containing groups
	      Phosphotransferases with an alcohol group as acceptor
  SYSNAME     ATP:glycerol 3-phosphotransferase
  REACTION    ATP + Glycerol = ADP + sn-Glycerol 3-phosphate
  SUBSTRATE   ATP
	      UTP
	      Glycerol
	      L-Glyceraldehyde
	      Glycerone
  PRODUCT     ADP
	      sn-Glycerol 3-phosphate
  COMMENT     Glycerone and L-glyceraldehyde can act as acceptors; UTP (and,
	      in the case of the yeast enzyme, ITP and GTP) can act as donors.
  PATHWAY     PATH: MAP00561  Glycerolipid metabolism
  DISEASE     MIM: 307030  Glycerol kinase deficiency; Glycerol kinase deficiency
			   (2)
  MOTIF       PS: PS00445  [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-
			   x(2)-[AS]-[TAIVM]-[LIVMFY]-[DEQ]
	      PS: PS00933  [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-
			   [DENQTR]-[ENQH]
  GENES       ECO: ECOLI_3824(glpK)
	      HIN: HI0691(glpK)
	      BSU: glpK
	      MGE: MG038(glpK)
	      MPN: D09_orf508(glpK)
	      SYN: slr1672
	      SCE: YHL032c(GUT1)
	      HSA: 119271
  DBLINKS     University of Geneva ENZYME DATA BANK: 2.7.1.30
	      WIT (What Is There) Metabolic Reconstruction: 2.7.1.30
	      SCOP (Structural Classification of Proteins): 2.7.1.30
	      PDB: 1GLA    1GLB    1GLC    1GLD    1GLE    
	      PIR: B45868  B64204  KIECGL  PQ0058  S02067  S33907  S36175  
  ///
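
The column layout described above (keyword in columns 1-12, data from
column 13, blank keyword meaning "same item as the line before") can be
read with a few lines of Python.  This is a sketch under the assumption
that keywords start in column 1 of the actual flat file; parse_entry and
SAMPLE are hypothetical names, and the sample is abridged from the entry
shown above.

```python
def parse_entry(lines):
    """Group the lines of one entry by the keyword in columns 1-12."""
    fields = {}
    keyword = None
    for line in lines:
        line = line.rstrip("\n").expandtabs(8)
        if line.startswith("///"):      # end-of-entry data item
            break
        head = line[:12].strip()        # columns 1-12: keyword or blank
        value = line[12:].strip()       # columns 13 onward: data
        if head:
            keyword = head
            fields.setdefault(keyword, [])
        if keyword is not None and value:
            fields[keyword].append(value)
    return fields


SAMPLE = """\
ENTRY       EC 2.7.1.30
NAME        Glycerol kinase
CLASS       Transferases
            Transferring phosphorus-containing groups
SUBSTRATE   ATP
            Glycerol
///
""".splitlines()

fields = parse_entry(SAMPLE)
print(fields["CLASS"])
```

Each data item is kept as a list of lines so that multi-line items such
as CLASS and SUBSTRATE keep their line structure.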

For the COMPOUND section the following data items appear on columns 1-12:

        ENTRY
        NAME
        FORMULA
        DBLINKS
        ///

A COMPOUND entry also starts with the ENTRY data item and ends with the
end-of-entry data item.  The data items ENTRY, NAME, and end-of-entry are
mandatory, while the other data items are optional.

Example:

  ENTRY       C02426
  NAME        L-Glyceraldehyde
  FORMULA     C3H6O3
  DBLINKS     CAS: 497-09-6
              EC: 2.7.1.30    
  ///
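
As an illustration of the end-of-entry and mandatory-item rules, the
sketch below splits a flat file into entries and reports missing
mandatory items.  The function names are hypothetical, and the second
entry in the sample (C99999) is a made-up placeholder, not real LIGAND
data.

```python
MANDATORY = {"ENTRY", "NAME"}   # mandatory items of a COMPOUND entry


def split_entries(text):
    """Yield one list of lines per entry, using '///' as terminator.

    Lines after the last '///' are ignored as an incomplete entry.
    """
    current = []
    for line in text.splitlines():
        if line.startswith("///"):
            if current:
                yield current
            current = []
        elif line.strip():
            current.append(line)


def missing_items(entry_lines, mandatory=MANDATORY):
    """Return the mandatory keywords absent from columns 1-12."""
    present = {line[:12].strip() for line in entry_lines}
    return sorted(mandatory - present)


SAMPLE = """\
ENTRY       C02426
NAME        L-Glyceraldehyde
FORMULA     C3H6O3
DBLINKS     CAS: 497-09-6
            EC: 2.7.1.30
///
ENTRY       C99999
///
"""

for entry in split_entries(SAMPLE):
    print(entry[0][12:].strip(), missing_items(entry))
```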

In addition, the molecular structure is stored in a separate GIF file for
each compound, which is automatically displayed between the FORMULA and
DBLINKS data items in the WWW version of DBGET.


2.2. Continuation Lines

The name of an enzyme or a chemical compound is sometimes too long to fit
in one line, in which case continuation lines are used.  The continuation
line is indicated by the dollar sign ($) on column 13.  Note that a long
name is simply separated into two lines without any hyphenation.  This
rule applies to the following data items: NAME, SYSNAME, REACTION,
SUBSTRATE, PRODUCT, INHIBITOR, COFACTOR, and EFFECTOR.

Examples:

  NAME          5-Methyltetrahydropteroyltriglutamate--homocysteine
                $ S-methyltransferase


  REACTION      UDP-N-acetyl-D-galactosamine +
                (N-Acetylneuraminyl)-D-galactosyl-D-glucosylceramide =
                UDP + N-Acetyl-D-galactosaminyl-(N-acetylneuraminyl)-D-
                $galactosyl-D-glucosylceramide
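
The continuation rule can be undone when reading the data.  The sketch
below assumes that the text after the dollar sign is appended verbatim
to the preceding line, which is consistent with both examples above (a
space follows the dollar sign only where the joined name needs one).
The function name join_continuations is hypothetical.

```python
def join_continuations(values):
    """Merge values starting with '$' into the preceding value.

    `values` holds the text from column 13 onward of consecutive
    lines belonging to one data item.
    """
    merged = []
    for value in values:
        if value.startswith("$") and merged:
            merged[-1] += value[1:]     # drop the '$', keep the rest
        else:
            merged.append(value)
    return merged


name = join_continuations([
    "5-Methyltetrahydropteroyltriglutamate--homocysteine",
    "$ S-methyltransferase",
])
print(name[0])
```

Lines without a dollar sign are left as separate elements, so a NAME
item still yields one element per name.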


2.3. Database Links

The reference to other databases is made by the convention used in the
DBGET system; namely, the combination of a database name and an identifier
(entry name or accession number) separated by a colon (:) is used for
cross-reference.  The database name may be an abbreviation defined in
DBGET.  This rule applies to the following data items: PATHWAY, DISEASE,
MOTIF, and DBLINKS.
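
A link written in this convention can be separated at the first colon.
The following Python sketch is illustrative only; parse_link is a
hypothetical name, and treating the first token after the colon as the
identifier is an assumption based on the examples in section 2.1.

```python
def parse_link(value):
    """Split 'DB: identifier rest...' into (db, identifier, rest).

    `rest` is the description or any further identifiers, or ''.
    """
    db, sep, tail = value.partition(":")
    if not sep:
        raise ValueError("no colon in link: %r" % value)
    tokens = tail.split(None, 1)
    identifier = tokens[0] if tokens else ""
    rest = tokens[1].strip() if len(tokens) > 1 else ""
    return db.strip(), identifier, rest


print(parse_link("PATH: MAP00561  Glycerolipid metabolism"))
print(parse_link("CAS: 497-09-6"))
```

Splitting at the first colon only keeps database names that contain
spaces, such as those in the DBLINKS item, intact.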



3. ENZYME SECTION

3.1.  The ENTRY Data Item

The ENTRY data item contains the entry identifier, which is the EC number
assigned by NC-IUBMB (Nomenclature Committee of the International Union of
Biochemistry and Molecular Biology).  The EC number was originally devised
by the first Enzyme Commission in 1961, representing the hierarchical
classification scheme with the four figures separated by periods:

    (1) the first figure is one of the six main divisions (classes),
    (2) the second figure indicates the subclass,
    (3) the third figure gives the sub-subclass, and
    (4) the fourth figure is the serial number in the sub-subclass.

These numbers are prefixed by 'EC ' in this data item.  The entire table
of classification may be browsed both in the DBGET and KEGG systems
(http://www.genome.ad.jp/htbin/get_htext?ECtable). For each entry the
meaning of the first three elements is given in the CLASS data item.
This ENTRY data item is mandatory for all entries.
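
The four-figure structure of the ENTRY value can be made explicit in
code.  This is a minimal sketch assuming the value has exactly the form
'EC a.b.c.d' with numeric figures; EC_PATTERN and parse_ec are
hypothetical names.

```python
import re

# 'EC ' prefix followed by four figures separated by periods.
EC_PATTERN = re.compile(r"^EC (\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_ec(entry_value):
    """Return (class, subclass, sub-subclass, serial) as integers."""
    match = EC_PATTERN.match(entry_value.strip())
    if match is None:
        raise ValueError("not an EC number: %r" % entry_value)
    return tuple(int(figure) for figure in match.groups())


print(parse_ec("EC 2.7.1.30"))
```

Converting the figures to integers lets entries be sorted numerically,
so that EC 2.7.1.4 sorts before EC 2.7.1.30, which a plain string
comparison would not do.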


3.2.  The NAME Data Item

The NAME data item contains the recommended name and, if any, alternative
names.  One name is given per line, except for names too long to fit on
one line (see 2.2. Continuation Lines).  The recommended name is always
placed on the first line.  This item is mandatory for all entries.


3.3.  The CLASS Data Item

The CLASS data item gives the meaning of the EC number.  Its lines
correspond, in order, to the class, subclass, and sub-subclass of the
enzyme.  This item is mandatory for all entries.


3.4.  The SYSNAME Data Item

The SYSNAME data item contains the systematic name given by the Enzyme
Commission, representing the nature of the chemical reaction.


3.5.  The REACTION Data Item

The REACTION data item contains the chemical reaction in the form of an
equation or a text description; for example:

  REACTION      S-Adenosyl-L-methionine + Nicotinamide = 
                S-Adenosyl-L-homocysteine + 1-Methylnicotinamide

  REACTION      Hydrolysis of 1,4-beta-linkages between N-acetylmuramic
                acid and N-acetyl-D-glucosamine residues in a 
                peptidoglycan and between N-acetyl-D-glucosamine residues
                in chitodextrins

In the reaction equation, each compound name starts with an upper-case
letter, not counting prefixes.  If necessary, a name may continue on
the next line (see 2.2. Continuation Lines).  If there is more than one
reaction, each reaction except the last ends with a semicolon (;).
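
The equation form lends itself to simple splitting.  The sketch below
assumes that the sides are separated by ' = ' and the compounds by
' + ' (with surrounding spaces), and that a semicolon never occurs
inside a compound name or a text description; these assumptions and the
name parse_reactions are not part of the LIGAND specification.

```python
def parse_reactions(text):
    """Split a REACTION value into one dict per reaction.

    Equations give {'substrates': [...], 'products': [...]};
    text descriptions give {'text': ...}.
    """
    reactions = []
    for part in text.split(";"):
        part = " ".join(part.split())   # rejoin wrapped lines
        if not part:
            continue
        left, sep, right = part.partition(" = ")
        if not sep:
            reactions.append({"text": part})
            continue
        reactions.append({
            "substrates": left.split(" + "),
            "products": right.split(" + "),
        })
    return reactions


print(parse_reactions("ATP + Glycerol = ADP + sn-Glycerol 3-phosphate"))
```

Splitting on ' + ' with the spaces, rather than on a bare plus sign,
avoids cutting names that themselves contain '+'.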


3.6.  The SUBSTRATE and PRODUCT Data Items

The SUBSTRATE and PRODUCT data items contain the chemical compounds
that appear, respectively, on the left and right sides of the reaction
equation given in the REACTION data item.  When the reaction is described
in text, the substrates and products are taken from the description.
Compounds known to be recognized by the enzyme as substrates or products
but not given in the REACTION data item are also listed in these fields.
One compound name is given per line.


3.7.  The INHIBITOR, COFACTOR, and EFFECTOR Data Items

The INHIBITOR, COFACTOR, and EFFECTOR data items contain the chemical
compounds that act, respectively, as inhibitors, cofactors, and effectors.
These compounds do not appear in the reaction equation of the REACTION
data item, but most of them are described in the COMMENT data item.  One
compound name is given per line.


3.8.  The COMMENT Data Item

The COMMENT data item contains free-text comments on the enzyme.  Articles
reporting compounds not listed in Enzyme Nomenclature, such as newly
synthesized inhibitors, are also cited in this data item.


3.9. The PATHWAY Data Item

The PATHWAY data item contains the link information to the KEGG (Kyoto
Encyclopedia of Genes and Genomes) metabolic pathway database: the
pathway map accession number followed by the description.  Clicking on
this number in the WWW version of DBGET displays the metabolic pathway
diagram containing the enzyme.


3.10.  The DISEASE Data Item

The DISEASE data item contains the link information to the OMIM (On-line
Mendelian Inheritance in Man) database: the MIM number followed by the
description.


3.11.  The MOTIF Data Item

The MOTIF data item contains the link information to the PROSITE
database: the PROSITE accession number followed by the sequence pattern.


3.12.  The GENES Data Item

The GENES data item contains the link information to the KEGG gene catalogs:
the abbreviation of organisms followed by the list of genes that encode the
enzyme.  The organisms are limited to Bacillus subtilis, Homo sapiens and
those whose whole genomes have been sequenced and reported.  The organism
abbreviations are as follows:

	ECO	Escherichia coli
	HIN	Haemophilus influenzae
	BSU	Bacillus subtilis
	MGE	Mycoplasma genitalium
	MPN	Mycoplasma pneumoniae
	MJA	Methanococcus jannaschii
	SYN	Synechocystis sp.
	SCE	Saccharomyces cerevisiae
	HSA	Homo sapiens
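
A line of the GENES data item can be decomposed as sketched below.  The
Bacillus subtilis key is written BSU, as in the example entry of
section 2.1.  The names ORGANISMS, GENE_PATTERN, and parse_genes_line
are hypothetical, and the handling of several genes separated by white
space on one line is an assumption, since the example entry shows only
one gene per organism.

```python
import re

ORGANISMS = {
    "ECO": "Escherichia coli",
    "HIN": "Haemophilus influenzae",
    "BSU": "Bacillus subtilis",
    "MGE": "Mycoplasma genitalium",
    "MPN": "Mycoplasma pneumoniae",
    "MJA": "Methanococcus jannaschii",
    "SYN": "Synechocystis sp.",
    "SCE": "Saccharomyces cerevisiae",
    "HSA": "Homo sapiens",
}

# Gene identifier optionally followed by a gene name in parentheses.
GENE_PATTERN = re.compile(r"^(?P<id>[^()\s]+)(?:\((?P<name>[^()]*)\))?$")


def parse_genes_line(value):
    """Return (abbreviation, organism, [(gene_id, gene_name), ...])."""
    abbrev, sep, rest = value.partition(":")
    if not sep:
        raise ValueError("no organism abbreviation in %r" % value)
    abbrev = abbrev.strip()
    genes = []
    for token in rest.split():
        match = GENE_PATTERN.match(token)
        if match:
            genes.append((match.group("id"), match.group("name")))
        else:
            genes.append((token, None))   # keep unparsed tokens as-is
    return abbrev, ORGANISMS.get(abbrev), genes


print(parse_genes_line("ECO: ECOLI_3824(glpK)"))
```

Abbreviations missing from the table yield None for the organism name
instead of raising an error.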


3.13. The DBLINKS Data Item

The DBLINKS data item contains the link information to other databases,
including the University of Geneva ENZYME Data Bank, WIT (What Is There)
Interactive Metabolic Reconstruction on the Web at Argonne National
Laboratory, SCOP (Structural Classification of Proteins) at MRC Laboratory
of Molecular Biology and Centre for Protein Engineering, the Brookhaven
Protein Data Bank (PDB), and the PIR Protein Sequence Database.


3.14. The end-of-entry Data Item

The end-of-entry data item marks the end of the entry.  It is denoted by
the identifier consisting of three consecutive slashes, '///'.  This item
is mandatory for all entries.



4. COMPOUND SECTION

4.1.  The ENTRY Data Item

The ENTRY data item contains the compound accession number of the LIGAND
database.  This number also corresponds to the name of the GIF file
containing the molecular structure.  This data item is mandatory for all
entries.


4.2.  The NAME Data Item

The NAME data item contains the recommended name and, if any, alternative
names.  One name is given per line, except for names too long to fit on
one line (see 2.2. Continuation Lines).  The recommended name is always
placed on the first line.  This item is mandatory for all entries.


4.3.  The FORMULA Data Item

The FORMULA data item contains the chemical formula for the compound.


4.4.  The Molecular Structure File

The molecular structure (structural formula) of the compound may be
viewed and manipulated in the WWW version of DBGET.  The image of the
molecular structure stored in a GIF file can be seen between the FORMULA
and DBLINKS data items.  The two-dimensional atomic coordinates are
stored in an MDL-MOL file, which can be retrieved to launch a suitable
application, such as ISIS/Draw or ChemDraw, from your WWW browser.  See
the instructions (http://www.genome.ad.jp/dbget/isis_doc.html) for details.


4.5. The DBLINKS Data Item

The DBLINKS data item contains the link information to other databases,
including the CAS (Chemical Abstracts Service) registry number and the
EC number.


4.6. The end-of-entry Data Item

The end-of-entry data item marks the end of the entry.  It is denoted by
the identifier consisting of three consecutive slashes, '///'.  This item
is mandatory for all entries.



5. ACKNOWLEDGMENTS

During 1991-1995 this database was supported by the Grant-in-Aid for
Scientific Research on the Priority Areas "Genome Informatics" to T.N.
from the Ministry of Education, Science, Sports, and Culture of Japan.
We thank Dr. Mikita Suyama for his excellent work during this period.
We also thank Ms. Takako Nishikawa and Ms. Saeko Adachi for generating
molecular structure files and Dr. Yukiteru Sugiyama for checking the
molecular structures.