LIGAND - Ligand Chemical Database for Enzyme Reactions
Release 15.1, April 1997
User Manual
Takaaki Nishioka,
Graduate School of Agriculture, Kyoto University,
Susumu Goto and Minoru Kanehisa
Institute for Chemical Research, Kyoto University
E-mail: www@genome.ad.jp
1. INTRODUCTION
The Ligand Chemical Database for Enzyme Reactions (LIGAND) is designed to
provide the linkage between chemical and biological aspects of life in
the light of enzymatic reactions. The database consists of two sections:
the ENZYME section and the COMPOUND section. The ENZYME section is a
collection of all known enzymatic reactions classified according to the
nomenclature of the International Union of Biochemistry and Molecular
Biology (IUBMB):
International Union of Biochemistry and Molecular Biology,
"Enzyme Nomenclature: Recommendations (1992) of the Nomenclature
Committee of the International Union of Biochemistry and Molecular
Biology", Academic Press, New York (1992).
Each entry of ENZYME is identified by the EC (Enzyme Commission) number and
contains information on naming, chemical reactions, metabolic compounds,
metabolic pathways, genes encoding the enzyme for several organisms,
genetic diseases, and links to other databases including protein sequences
and 3D structural data.
The COMPOUND section is a collection of metabolic compounds including
substrates, products, and inhibitors. Each of the chemical substances
that appear in the ENZYME section is identified by an accession number
and stored in this section. Each entry of COMPOUND contains information
on naming, the chemical formula, the structural formula (in a separate GIF
file and a MOL file), and the CAS (Chemical Abstracts Service) registry
number.
The LIGAND database is a major component of the DBGET integrated
database system (http://www.genome.ad.jp/dbget/dbget.links.html),
providing useful links among the existing databases. LIGAND is now
tightly coupled to the metabolic pathway database of the KEGG system
(http://www.genome.ad.jp/kegg/kegg.html), providing links to the gene
catalogs of a number of organisms.
Please cite the following paper when making use of the LIGAND database:
Suyama, M., Ogiwara, A., Nishioka, T., and Oda, J., "Searching for
amino acid sequence motifs among enzymes: the Enzyme-Reaction
Database", Comput. Appl. Biosci. 9, 9-15 (1993).
2. CONVENTIONS
2.1. General Data Format
LIGAND is constructed as a flat-file database. As in the data formats of
the PIR and GenBank databases, a fixed number of columns is reserved for
the attribute of each line. Each attribute is identified by a keyword in
columns 1-12. When these columns are blank, the attribute of the
preceding line continues. The data themselves start at column 13.
For the ENZYME section the following data items appear on columns 1-12:
ENTRY
NAME
CLASS
SYSNAME
REACTION
SUBSTRATE
PRODUCT
INHIBITOR
COFACTOR
EFFECTOR
COMMENT
PATHWAY
DISEASE
MOTIF
GENES
DBLINKS
///
An entry begins with the ENTRY data item, which is followed by the other
data items in the order shown above, and ends with the end-of-entry
(///) data item. All entries except for those deleted and transferred
contain the data items ENTRY, NAME, CLASS, and end-of-entry; the other
data items are optional.
Example:
ENTRY       EC 2.7.1.30
NAME        Glycerol kinase
CLASS       Transferases
            Transferring phosphorus-containing groups
            Phosphotransferases with an alcohol group as acceptor
SYSNAME     ATP:glycerol 3-phosphotransferase
REACTION    ATP + Glycerol = ADP + sn-Glycerol 3-phosphate
SUBSTRATE   ATP
            UTP
            Glycerol
            L-Glyceraldehyde
            Glycerone
PRODUCT     ADP
            sn-Glycerol 3-phosphate
COMMENT     Glycerone and L-glyceraldehyde can act as acceptors; UTP (and,
            in the case of the yeast enzyme, ITP and GTP) can act as donors.
PATHWAY     PATH: MAP00561 Glycerolipid metabolism
DISEASE     MIM: 307030 Glycerol kinase deficiency; Glycerol kinase deficiency
            (2)
MOTIF       PS: PS00445 [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-
            x(2)-[AS]-[TAIVM]-[LIVMFY]-[DEQ]
            PS: PS00933 [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-
            [DENQTR]-[ENQH]
GENES       ECO: ECOLI_3824(glpK)
            HIN: HI0691(glpK)
            BSU: glpK
            MGE: MG038(glpK)
            MPN: D09_orf508(glpK)
            SYN: slr1672
            SCE: YHL032c(GUT1)
            HSA: 119271
DBLINKS     University of Geneva ENZYME DATA BANK: 2.7.1.30
            WIT (What Is There) Metabolic Reconstruction: 2.7.1.30
            SCOP (Structural Classification of Proteins): 2.7.1.30
            PDB: 1GLA 1GLB 1GLC 1GLD 1GLE
            PIR: B45868 B64204 KIECGL PQ0058 S02067 S33907 S36175
///
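The column rule described above can be illustrated with a short Python sketch. It assumes that the file keeps the exact fixed-column layout (keyword in columns 1-12, data from column 13). The function name parse_entry and the returned dictionary are illustrative choices, not part of LIGAND or DBGET.

```python
KEY_WIDTH = 12  # columns 1-12 hold the keyword; data start at column 13


def parse_entry(lines):
    """Group the lines of one entry by keyword.

    Returns a dict mapping each keyword to a list of data strings
    (columns 13 onward), one string per physical line.
    """
    fields = {}
    current = None
    for line in lines:
        if line.startswith("///"):
            break  # end-of-entry data item
        keyword = line[:KEY_WIDTH].strip()
        if keyword:
            current = keyword
            fields.setdefault(current, [])
        if current is None:
            continue  # skip stray lines before the first keyword
        fields[current].append(line[KEY_WIDTH:].rstrip())
    return fields


# Sample lines are built with explicit padding so the columns are exact.
sample = [
    f"{'ENTRY':<12}EC 2.7.1.30",
    f"{'NAME':<12}Glycerol kinase",
    f"{'SUBSTRATE':<12}ATP",
    f"{'':<12}UTP",
    "///",
]
entry = parse_entry(sample)
print(entry["SUBSTRATE"])  # ['ATP', 'UTP']
```

Blank keyword columns simply extend the list of the preceding data item, which mirrors the rule that the attribute of the preceding line continues.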
For the COMPOUND section the following data items appear on columns 1-12:
ENTRY
NAME
FORMULA
DBLINKS
///
A COMPOUND entry also starts with the ENTRY data item and ends with the
end-of-entry data item. The data items ENTRY, NAME, and end-of-entry are
mandatory, while the other data items are optional.
Example:
ENTRY       C02426
NAME        L-Glyceraldehyde
FORMULA     C3H6O3
DBLINKS     CAS: 497-09-6
            EC: 2.7.1.30
///
In addition, the molecular structure is stored in a separate GIF file for
each compound, which is automatically displayed between the FORMULA and
DBLINKS data items in the WWW version of DBGET.
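The ordering and mandatory-item rules of this section could be checked with a sketch like the following. The function name check_entry and the wording of the messages are hypothetical. Deleted and transferred ENZYME entries, which are exempt from the mandatory items, are not handled here.

```python
ENZYME_ORDER = [
    "ENTRY", "NAME", "CLASS", "SYSNAME", "REACTION", "SUBSTRATE",
    "PRODUCT", "INHIBITOR", "COFACTOR", "EFFECTOR", "COMMENT",
    "PATHWAY", "DISEASE", "MOTIF", "GENES", "DBLINKS",
]
COMPOUND_ORDER = ["ENTRY", "NAME", "FORMULA", "DBLINKS"]
MANDATORY = {
    "ENZYME": {"ENTRY", "NAME", "CLASS"},
    "COMPOUND": {"ENTRY", "NAME"},
}


def check_entry(keywords, section):
    """Return a list of problems found in the keyword sequence of one entry.

    `keywords` lists the data items in the order they appear, without the
    end-of-entry marker; `section` is "ENZYME" or "COMPOUND".
    """
    order = ENZYME_ORDER if section == "ENZYME" else COMPOUND_ORDER
    problems = [f"unknown data item: {k}" for k in keywords if k not in order]
    positions = [order.index(k) for k in keywords if k in order]
    if positions != sorted(positions):
        problems.append("data items out of order")
    for missing in sorted(MANDATORY[section] - set(keywords)):
        problems.append(f"missing mandatory data item: {missing}")
    return problems


print(check_entry(["ENTRY", "NAME", "FORMULA", "DBLINKS"], "COMPOUND"))  # []
```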
2.2. Continuation Lines
The name of an enzyme or a chemical compound is sometimes too long to fit
in one line, in which case continuation lines are used. The continuation
line is indicated by the dollar sign ($) on column 13. Note that a long
name is simply separated into two lines without any hyphenation. This
rule applies to the following data items: NAME, SYSNAME, REACTION,
SUBSTRATE, PRODUCT, INHIBITOR, COFACTOR, and EFFECTOR.
Examples:
NAME        5-Methyltetrahydropteroyltriglutamate--homocysteine
            $ S-methyltransferase
REACTION    UDP-N-acetyl-D-galactosamine +
            (N-Acetylneuraminyl)-D-galactosyl-D-glucosylceramide =
            UDP + N-Acetyl-D-galactosaminyl-(N-acetylneuraminyl)-D-
            $galactosyl-D-glucosylceramide
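A minimal Python sketch of how continuation lines might be rejoined is given below. It takes the data part of each line (column 13 onward) and assumes that the text after the dollar sign is appended verbatim to the preceding line, so that a space after the dollar sign is kept and no hyphen is added. The function name merge_continuations is illustrative only.

```python
def merge_continuations(data_lines):
    """Append every '$'-prefixed line to the line that precedes it."""
    merged = []
    for text in data_lines:
        if text.startswith("$") and merged:
            merged[-1] += text[1:]  # drop the '$', keep the rest verbatim
        else:
            merged.append(text)
    return merged


names = merge_continuations([
    "5-Methyltetrahydropteroyltriglutamate--homocysteine",
    "$ S-methyltransferase",
])
print(names)
# ['5-Methyltetrahydropteroyltriglutamate--homocysteine S-methyltransferase']
```

Lines without a dollar sign stay separate, so a NAME item with several alternative names still yields one list element per name.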
2.3. Database Links
The reference to other databases is made by the convention used in the
DBGET system; namely, the combination of a database name and an identifier
(entry name or accession number) separated by a colon (:) is used for
cross-reference. The database name may be an abbreviation defined in
DBGET. This rule applies to the following data items: PATHWAY, DISEASE,
MOTIF, and DBLINKS.
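The "database:identifier" convention can be split as in the following sketch. The helper names split_link and split_identifier are hypothetical. The sketch assumes that the first colon on a line separates the database name from the rest, and that for PATHWAY, DISEASE, and MOTIF the first token after the colon is the identifier.

```python
def split_link(text):
    """Split 'database: rest' at the first colon."""
    database, colon, rest = text.partition(":")
    if not colon:
        raise ValueError(f"no database name found in {text!r}")
    return database.strip(), rest.strip()


def split_identifier(rest):
    """Split 'identifier description' into its two parts."""
    parts = rest.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


database, rest = split_link("PATH: MAP00561 Glycerolipid metabolism")
print(database, split_identifier(rest))
# PATH ('MAP00561', 'Glycerolipid metabolism')
```

For DBLINKS lines that carry several identifiers, such as the PDB line of the example entry, the rest can instead be split on whitespace with rest.split().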
3. ENZYME SECTION
3.1. The ENTRY Data Item
The ENTRY data item contains the entry identifier, which is the EC number
assigned by NC-IUBMB (Nomenclature Committee of the International Union of
Biochemistry and Molecular Biology). The EC number was originally devised
by the first Enzyme Commission in 1961 and represents a hierarchical
classification scheme with four figures separated by periods:
(1) the first figure is one of the six main divisions (classes),
(2) the second figure indicates the subclass,
(3) the third figure gives the sub-subclass, and
(4) the fourth figure is the serial number in the sub-subclass.
These numbers are prefixed by 'EC ' in this data item. The entire
classification table may be browsed in both the DBGET and KEGG systems
(http://www.genome.ad.jp/htbin/get_htext?ECtable). For each entry the
meaning of the first three figures is given in the CLASS data item.
This ENTRY data item is mandatory for all entries.
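As an illustration, the four figures of an EC number can be extracted with a regular expression. This is a sketch only: the function name parse_ec is hypothetical, and the range check on the first figure follows the six main divisions stated in this manual.

```python
import re

EC_PATTERN = re.compile(r"^EC\s+(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_ec(entry_text):
    """Return (class, subclass, sub-subclass, serial number) as integers."""
    match = EC_PATTERN.match(entry_text.strip())
    if match is None:
        raise ValueError(f"not an EC number: {entry_text!r}")
    main, sub, subsub, serial = (int(figure) for figure in match.groups())
    if not 1 <= main <= 6:  # six main divisions, as stated in this manual
        raise ValueError(f"unknown main division: {main}")
    return main, sub, subsub, serial


print(parse_ec("EC 2.7.1.30"))  # (2, 7, 1, 30)
```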
3.2. The NAME Data Item
The NAME data item contains the recommended name and, if any, alternative
names. One name is given per line, except for names too long to fit on
one line (see Conventions). The recommended name is always placed on the
first line. This item is mandatory for all entries.
3.3. The CLASS Data Item
The CLASS data item contains the meaning of the EC number. Its lines give,
in order, the class, subclass, and sub-subclass of the enzyme. This
item is mandatory for all entries.
3.4. The SYSNAME Data Item
The SYSNAME data item contains the systematic name given by the Enzyme
Commission, representing the nature of the chemical reaction.
3.5. The REACTION Data Item
The REACTION data item contains the chemical reaction in the form of an
equation or a text description; for example:
REACTION    S-Adenosyl-L-methionine + Nicotinamide =
            S-Adenosyl-L-homocysteine + 1-Methylnicotinamide
REACTION    Hydrolysis of 1,4-beta-linkages between N-acetylmuramic
            acid and N-acetyl-D-glucosamine residues in a
            peptidoglycan and between N-acetyl-D-glucosamine residues
            in chitodextrins
In the reaction equation, each compound name starts with an upper-case
letter, not counting prefixes (as in sn-Glycerol 3-phosphate). If
necessary, one name may continue on the next line (see Conventions). If
there is more than one reaction, each reaction except the last one ends
with a semicolon (;).
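A reaction equation can be separated into its two sides roughly as follows. The sketch assumes that the REACTION lines have already been joined into one string, that " = " separates the two sides, and that " + " separates compounds. Any stoichiometric coefficient stays attached to the compound name, and a text description is returned as None. The function name split_reactions is hypothetical.

```python
def split_reactions(text):
    """Split REACTION text into one (left, right) pair per reaction.

    Text descriptions, which contain no ' = ', are returned as None.
    """
    results = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        if " = " not in part:
            results.append(None)  # reaction described in text
            continue
        left, right = part.split(" = ", 1)
        results.append((
            [name.strip() for name in left.split(" + ")],
            [name.strip() for name in right.split(" + ")],
        ))
    return results


print(split_reactions("ATP + Glycerol = ADP + sn-Glycerol 3-phosphate"))
# [(['ATP', 'Glycerol'], ['ADP', 'sn-Glycerol 3-phosphate'])]
```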
3.6. The SUBSTRATE and PRODUCT Data Items
The SUBSTRATE and PRODUCT data items contain the chemical compounds
that appear, respectively, on the left and right sides of the reaction
equation given in the REACTION data item. When the reaction is described
in text, the substrates and products are taken from the description.
Compounds known to be recognized by the enzyme as substrates or products
but not given in the REACTION data item are also listed in these data
items. One compound name is given per line.
3.7. The INHIBITOR, COFACTOR, and EFFECTOR Data Items
The INHIBITOR, COFACTOR, and EFFECTOR data items contain the chemical
compounds that act, respectively, as inhibitors, cofactors, and effectors.
These compounds do not appear in the reaction equation of the REACTION
data item, but most of them are described in the COMMENT data item. One
compound name is given per line.
3.8. The COMMENT Data Item
The COMMENT data item contains free-text comments on the enzyme. Articles
reporting compounds not listed in the Enzyme Nomenclature, such as newly
synthesized inhibitors, are also cited in this data item.
3.9. The PATHWAY Data Item
The PATHWAY data item contains the link information to the KEGG (Kyoto
Encyclopedia of Genes and Genomes) metabolic pathway database: the
pathway map accession number followed by the description. Clicking
on this number in the WWW version of DBGET displays the metabolic
pathway diagram containing the enzyme.
3.10. The DISEASE Data Item
The DISEASE data item contains the link information to the OMIM (On-line
Mendelian Inheritance in Man) database: the MIM number followed by the
description.
3.11. The MOTIF Data Item
The MOTIF data item contains the link information to the PROSITE
database: the PROSITE accession number followed by the sequence pattern.
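To search a protein sequence with a pattern taken from the MOTIF data item, the pattern can be translated into a regular expression. The sketch below covers only the common PROSITE elements (x, bracketed alternatives, braces for exclusions, repeat counts, and the anchors < and >) and assumes that a pattern split over two lines has already been joined. The function name prosite_to_regex is hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern string into Python regex syntax."""
    regex = pattern.strip().rstrip(".")
    regex = regex.replace("-", "")                        # element separators
    regex = regex.replace("{", "[^").replace("}", "]")    # {P} -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")     # x(2) -> x{2}
    regex = regex.replace("x", ".")                       # x = any residue
    regex = regex.replace("<", "^").replace(">", "$")     # terminal anchors
    return regex


ps00933 = "[MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQTR]-[ENQH]"
print(prosite_to_regex(ps00933))
# [MFYGS].[PST].{2}K[LIVMFYW].W[LIVMF].[DENQTR][ENQH]
```

The result can be passed to re.search together with an upper-case one-letter amino acid sequence.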
3.12. The GENES Data Item
The GENES data item contains the link information to the KEGG gene catalogs:
the organism abbreviation followed by the list of genes that encode the
enzyme. The organisms are limited to Bacillus subtilis, Homo sapiens, and
those whose complete genomes have been sequenced and reported. The organism
abbreviations are as follows:
ECO   Escherichia coli
HIN   Haemophilus influenzae
BSU   Bacillus subtilis
MGE   Mycoplasma genitalium
MPN   Mycoplasma pneumoniae
MJA   Methanococcus jannaschii
SYN   Synechocystis sp.
SCE   Saccharomyces cerevisiae
HSA   Homo sapiens
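A GENES line could be decoded as in the following sketch. It assumes the form seen in the example entry of the General Data Format section: an organism abbreviation, a colon, and whitespace-separated gene identifiers, each optionally followed by a gene name in parentheses. The function name parse_genes_line is hypothetical.

```python
import re

# Abbreviations as used in the GENES lines of the example entry.
ORGANISMS = {
    "ECO": "Escherichia coli",
    "HIN": "Haemophilus influenzae",
    "BSU": "Bacillus subtilis",
    "MGE": "Mycoplasma genitalium",
    "MPN": "Mycoplasma pneumoniae",
    "MJA": "Methanococcus jannaschii",
    "SYN": "Synechocystis sp.",
    "SCE": "Saccharomyces cerevisiae",
    "HSA": "Homo sapiens",
}
GENE = re.compile(r"^([^()\s]+)(?:\(([^()]*)\))?$")


def parse_genes_line(text):
    """Return (organism name, [(gene identifier, gene name or None), ...])."""
    abbreviation, colon, rest = text.partition(":")
    abbreviation = abbreviation.strip()
    if not colon or abbreviation not in ORGANISMS:
        raise ValueError(f"unknown organism in {text!r}")
    genes = []
    for token in rest.split():
        match = GENE.match(token)
        if match is None:
            raise ValueError(f"unexpected gene token: {token!r}")
        genes.append((match.group(1), match.group(2)))
    return ORGANISMS[abbreviation], genes


print(parse_genes_line("ECO: ECOLI_3824(glpK)"))
# ('Escherichia coli', [('ECOLI_3824', 'glpK')])
```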
3.13. The DBLINKS Data Item
The DBLINKS data item contains the link information to other databases,
including the University of Geneva ENZYME Data Bank, WIT (What Is There)
Interactive Metabolic Reconstruction on the Web at Argonne National
Laboratory, SCOP (Structural Classification of Proteins) at MRC Laboratory
of Molecular Biology and Centre for Protein Engineering, the Brookhaven
Protein Data Bank (PDB), and the PIR Protein Sequence Database.
3.14. The end-of-entry Data Item
The end-of-entry data item marks the end of the entry. It is denoted by
the identifier consisting of three consecutive slashes, '///'. This item
is mandatory for all entries.
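Because every entry ends with '///', a flat file can be cut into entries with a simple loop. The following is a minimal sketch. The function name iter_entries is hypothetical, and the second entry of the sample (C00000) is made up for illustration.

```python
import io


def iter_entries(stream):
    """Yield each entry as a list of lines, without the '///' line."""
    entry = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line.startswith("///"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    # Lines after the last '///' form an unterminated entry and are dropped.


flat_file = io.StringIO(
    "ENTRY       C02426\n"
    "NAME        L-Glyceraldehyde\n"
    "///\n"
    "ENTRY       C00000\n"          # made-up accession number
    "NAME        Example compound\n"
    "///\n"
)
print(len(list(iter_entries(flat_file))))  # 2
```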
4. COMPOUND SECTION
4.1. The ENTRY Data Item
The ENTRY data item contains the compound accession number of the LIGAND
database. This number also corresponds to the name of the GIF file
containing the molecular structure. This data item is mandatory for all
entries.
4.2. The NAME Data Item
The NAME data item contains the recommended name and, if any, alternative
names. One name is given per line, except for names too long to fit on
one line (see Continuation Lines). The recommended name is always placed
on the first line. This item is mandatory for all entries.
4.3. The FORMULA Data Item
The FORMULA data item contains the chemical formula for the compound.
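For simple formulas such as C3H6O3, the element counts can be read with a regular expression. The sketch below assumes a plain sequence of element symbols, each optionally followed by a count. Anything else, such as parentheses or generic groups, is rejected rather than guessed at. The function name parse_formula is hypothetical.

```python
import re

ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")
SIMPLE_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(formula):
    """Return a dict of element symbol -> atom count for a simple formula."""
    if not SIMPLE_FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, number in ELEMENT.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


print(parse_formula("C3H6O3"))  # {'C': 3, 'H': 6, 'O': 3}
```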
4.4. The Molecular Structure File
The molecular structure (structural formula) of the compound may be
viewed and manipulated in the WWW version of DBGET. The image of the
molecular structure stored in a GIF file can be seen between the FORMULA
and DBLINKS data items. The two-dimensional atomic coordinates are
stored in an MDL MOL file, which can be retrieved to launch a suitable
application, such as ISIS/Draw or ChemDraw, from your WWW browser. See
the instructions (http://www.genome.ad.jp/dbget/isis_doc.html) for details.
4.5. The DBLINKS Data Item
The DBLINKS data item contains the link information to other databases,
including the CAS (Chemical Abstracts Service) registry number and the
EC number.
4.6. The end-of-entry Data Item
The end-of-entry data item marks the end of the entry. It is denoted by
the identifier consisting of three consecutive slashes, '///'. This item
is mandatory for all entries.
5. ACKNOWLEDGMENTS
During 1991-1995 this database was supported by the Grant-in-Aid for
Scientific Research on the Priority Areas "Genome Informatics" to T.N.
from the Ministry of Education, Science, Sports, and Culture of Japan.
We thank Dr. Mikita Suyama for his excellent work during this period.
We also thank Ms. Takako Nishikawa and Ms. Saeko Adachi for generating
molecular structure files and Dr. Yukiteru Sugiyama for checking the
molecular structures.