The DOCK suite of programs is designed to find favorable orientations of a ligand in a "receptor." It can be subdivided into (i) those programs related directly to docking of ligands and (ii) accessory programs. We limit the discussion here to only those programs and methods related to docking a ligand in a receptor. A typical receptor might be an enzyme with a well-defined active site, though any macromolecule may be used (e.g. a structural protein, a nucleic acid strand, a "true" receptor). We'll use an enzyme as an example in the rest of this discussion.
The starting point of a docking calculation is generally the crystal structure of an enzyme, taken from an enzyme-ligand complex. The ligand structure may be taken from the crystal structure of the enzyme-ligand complex or from a database of compounds, such as the Cambridge Crystallographic Database [1] or the Concord-generated [2] set of coordinates from the Available Chemicals Directory (from Molecular Design, Ltd., San Leandro, CA). The primary consideration in the design of our docking programs has been to develop methods which are both rapid and reasonably accurate. These programs can be separated functionally into roughly two parts, each somewhat independent of the other: orienting the ligand in the site, and scoring the resulting orientations.
Orienting a ligand in a receptor site is broken down into a series of steps, carried out by different programs. First, a potential site of interest on the receptor is identified. (Often the active site is the site of interest and is known a priori.) Within this site, points are identified where ligand atoms may be located. A routine from the DOCK suite of programs identifies these points, called sphere centers, by generating a set of overlapping spheres which fill the site. Alternatively, important positions within the active site may be identified by some other mechanism and used by DOCK as sphere centers. For example, the positions of atoms from the bound ligand may be used as sphere centers, or a grid may be generated within the site and each grid point treated as a sphere center. Our sphere centers, however, attempt to capture the shape characteristics of the active site (or other site of interest) with a minimum number of points and without the bias of previously known ligand binding modes.
To orient a ligand within the active site, some of the sphere centers are "matched" with ligand atoms; that is, a sphere center is "paired" with a ligand atom. Many sets of these sphere-atom pairs are generated, each set containing only a small number of pairs. To limit the number of possible sets, a longest-distance heuristic is used: pairs are accepted only when the (long) inter-sphere distances are roughly equal to the corresponding (long) inter-atomic ligand distances. The set of sphere-atom pairs used to generate an orientation is often referred to as a match. For each match, the translation vector and rotation matrix that minimize the rmsd between the (transformed) matched ligand atoms and their sphere centers are calculated and then applied to the entire ligand, orienting it within the active site.
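The fitting step can be sketched with the Kabsch algorithm, one standard least-squares superposition method. This section does not say which fitting algorithm DOCK itself uses, so treat the choice of algorithm and the helper names `superpose`, `orient` and `rmsd` as illustrative assumptions. The determinant check restricts the fit to proper rotations (no reflections).

```python
import numpy as np


def superpose(ligand_match, sphere_match):
    """Rotation R and translation t minimizing the rmsd between R @ x + t and the spheres.

    Both arguments are (N, 3) arrays of matched coordinates (atom i <-> sphere i).
    Kabsch algorithm: SVD of the 3x3 covariance of the centered point sets.
    """
    P = np.asarray(ligand_match, dtype=float)
    Q = np.asarray(sphere_match, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1: a proper rotation, never a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - R @ p0
    return R, t


def orient(ligand_all, R, t):
    """Apply the match-derived transform to every ligand atom, not just the matched ones."""
    return np.asarray(ligand_all, dtype=float) @ R.T + t


def rmsd(A, B):
    """Root-mean-square deviation between two (N, 3) coordinate sets."""
    diff = np.asarray(A, dtype=float) - np.asarray(B, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

For example, if the sphere centers are produced by rotating and translating four non-coplanar ligand atoms, `superpose` recovers that rotation and translation, and the rmsd after `orient` is essentially zero.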
The orientation of the ligand is evaluated with a shape scoring function and/or a function approximating the ligand-enzyme binding energy. All evaluations are done on (scoring) grids in order to minimize the overall computational time. At each grid point, the enzyme contributions to the score are calculated and stored. That is, the receptor contributions to the score, which are potentially repetitive and time-consuming to compute, are calculated only once; the appropriate terms are then simply fetched from memory.
The shape scoring function is an empirical function resembling the van der Waals attractive energy. To generate the shape score, the receptor terms from the grid point nearest to each non-hydrogen ligand atom are summed. That is, the shape score is determined solely by the position of each ligand atom on the shape scoring grid.
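A minimal sketch of the precompute-then-look-up idea, assuming a rectangular grid described by an origin, a spacing, and the number of points along each axis. `receptor_term` is a hypothetical stand-in for the receptor contribution; the actual distmap shape function is described in [6] and is not reproduced here. Treating atoms that fall off the grid as contributing nothing is also an assumption of this sketch.

```python
import itertools


class ScoringGrid:
    """Receptor terms precomputed once on a regular 3-D grid."""

    def __init__(self, origin, spacing, shape, receptor_term):
        self.origin = tuple(float(o) for o in origin)
        self.spacing = float(spacing)
        # The (expensive) receptor_term is evaluated exactly once per grid point.
        self.values = {}
        for idx in itertools.product(*(range(n) for n in shape)):
            point = tuple(o + self.spacing * i for o, i in zip(self.origin, idx))
            self.values[idx] = receptor_term(point)

    def nearest(self, xyz):
        """Stored value at the grid point nearest xyz; 0.0 if that point is off the grid."""
        idx = tuple(round((c - o) / self.spacing) for c, o in zip(xyz, self.origin))
        return self.values.get(idx, 0.0)


def shape_score(grid, heavy_atoms):
    """Sum the nearest-grid-point receptor terms over the non-hydrogen ligand atoms."""
    return sum(grid.nearest(atom) for atom in heavy_atoms)
```

With a toy `receptor_term=lambda p: sum(p)`, origin (0, 0, 0) and spacing 0.5, an atom at (1.0, 1.0, 1.0) simply picks up the value stored at that grid point. Scoring a new orientation costs one lookup per atom; no receptor atoms are visited.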
The ligand-enzyme binding energy is taken to be approximately the sum of the van der Waals repulsive, van der Waals attractive (dispersion), and Coulombic electrostatic energies. The usual molecular mechanics repulsive and attractive terms are approximated for use on a grid. To generate the energy score, each ligand atom's terms are combined with the receptor terms from the nearest grid point, or with receptor terms interpolated at a "virtual" grid point located at the atom. The score is the sum of these combined terms over all ligand atoms. In this case, the energy score is determined by both the ligand atom types and the ligand atom positions on the energy grids.
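The interpolated ("virtual" grid point) variant can be sketched with trilinear interpolation. The sketch assumes three precomputed receptor grids named `rep`, `att` and `elec`, combined with hypothetical per-atom factors through a geometric-mean rule, $\sqrt{a_i}\,\mathrm{rep} - \sqrt{b_i}\,\mathrm{att} + q_i\,\mathrm{elec}$. The grid names, the combining rule, and skipping off-grid atoms are assumptions for illustration, not a description of chemgrid's output or DOCK's exact formula.

```python
import math


def trilinear(values, origin, spacing, xyz):
    """Interpolate values[i][j][k] at xyz from the 8 surrounding grid points.

    Returns None if the surrounding grid cell is not entirely inside the grid.
    """
    dims = (len(values), len(values[0]), len(values[0][0]))
    f = [(c - o) / spacing for c, o in zip(xyz, origin)]
    i0 = [math.floor(v) for v in f]
    if not all(0 <= i0[d] < dims[d] - 1 for d in range(3)):
        return None
    tx, ty, tz = (f[d] - i0[d] for d in range(3))
    i, j, k = i0
    total = 0.0
    for di, wx in ((0, 1.0 - tx), (1, tx)):
        for dj, wy in ((0, 1.0 - ty), (1, ty)):
            for dk, wz in ((0, 1.0 - tz), (1, tz)):
                total += wx * wy * wz * values[i + di][j + dj][k + dk]
    return total


def energy_score(grids, origin, spacing, atoms):
    """atoms: iterable of (xyz, a_i, b_i, q_i) with hypothetical vdW factors a_i, b_i."""
    total = 0.0
    for xyz, a, b, q in atoms:
        rep = trilinear(grids["rep"], origin, spacing, xyz)
        if rep is None:
            continue  # off-grid atom: skipped here (an assumption of this sketch)
        att = trilinear(grids["att"], origin, spacing, xyz)
        ele = trilinear(grids["elec"], origin, spacing, xyz)
        total += math.sqrt(a) * rep - math.sqrt(b) * att + q * ele
    return total
```

Trilinear interpolation reproduces any function that is linear in x, y and z exactly, which makes it a convenient check of the implementation.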
As a final step in the energy scoring scheme, the orientation of the ligand may be varied slightly to minimize the energy score. Because the sphere centers are only approximations to possible atom locations, the orientations generated by sphere-atom pairing, although reasonable, may not be minimal in energy. Therefore, after the initial orientation and evaluation (scoring) of the ligand, a grid-based rigid-body simplex minimization is used to locate the nearest local energy minimum.
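The minimization step can be illustrated with a small Nelder-Mead (downhill simplex) minimizer over six rigid-body degrees of freedom: three translations and a rotation vector applied about the ligand centroid. This parameterization, the simplex coefficients, and the toy score (squared distance to a hypothetical target pose, standing in for the grid energy) are assumptions; the procedure DOCK actually uses is described in [8].

```python
import math


def rotate(v, rvec):
    """Rotate v by rotation vector rvec (unit axis * angle in radians), Rodrigues' formula."""
    theta = math.sqrt(sum(c * c for c in rvec))
    if theta < 1e-12:
        return list(v)
    k = [c / theta for c in rvec]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_x_v = [k[1] * v[2] - k[2] * v[1], k[2] * v[0] - k[0] * v[2], k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    return [v[d] * cos_t + k_x_v[d] * sin_t + k[d] * k_dot_v * (1.0 - cos_t) for d in range(3)]


def transform(coords, params):
    """Rigid-body move: rotate about the centroid by params[3:], then translate by params[:3]."""
    n = len(coords)
    c = [sum(p[d] for p in coords) / n for d in range(3)]
    moved = []
    for p in coords:
        r = rotate([p[d] - c[d] for d in range(3)], params[3:])
        moved.append([r[d] + c[d] + params[d] for d in range(3)])
    return moved


def nelder_mead(f, x0, step=0.5, iters=1000, tol=1e-12):
    """Minimal downhill simplex (reflection, expansion, inside contraction, shrink)."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)]
    values = [f(x) for x in simplex]
    for _ in range(iters):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(x[j] for x in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        refl = [2.0 * centroid[j] - worst[j] for j in range(n)]
        f_refl = f(refl)
        if f_refl < values[0]:
            expd = [3.0 * centroid[j] - 2.0 * worst[j] for j in range(n)]
            f_expd = f(expd)
            simplex[-1], values[-1] = (expd, f_expd) if f_expd < f_refl else (refl, f_refl)
        elif f_refl < values[-2]:
            simplex[-1], values[-1] = refl, f_refl
        else:
            contr = [0.5 * (centroid[j] + worst[j]) for j in range(n)]
            f_contr = f(contr)
            if f_contr < values[-1]:
                simplex[-1], values[-1] = contr, f_contr
            else:  # shrink every vertex toward the best one
                best = simplex[0]
                simplex = [best] + [[0.5 * (best[j] + x[j]) for j in range(n)] for x in simplex[1:]]
                values = [values[0]] + [f(x) for x in simplex[1:]]
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


if __name__ == "__main__":
    # Hypothetical ligand and hypothetical "minimum" pose standing in for the grid energy.
    ligand = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.4, 0.0], [0.2, 1.1, 0.9]]
    target = transform(ligand, [0.3, -0.2, 0.1, 0.05, -0.1, 0.08])

    def toy_score(params):
        moved = transform(ligand, params)
        return sum((a - b) ** 2 for u, v in zip(moved, target) for a, b in zip(u, v))

    best, value = nelder_mead(toy_score, [0.0] * 6)
    print(toy_score([0.0] * 6), value)
```

A simplex method needs only score evaluations, no derivatives, which suits a score that is read off precomputed grids.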
Spheres are generated to fill the target site. The sphere centers are putative
ligand atom positions. Their use is an attempt to limit the enormous number of
possible orientations within the active site. Like ligand atoms, these spheres
touch the surface of the molecule at two points and do not intersect the
molecule. The spheres are allowed to intersect other spheres; i.e. they
have volumes which overlap. Each sphere is represented by the coordinates of
its center and its radius. Only the coordinates of the sphere centers are used
to orient ligands within the active site (see above).
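One way to build a sphere that touches the surface at two points: place its center along the outward unit normal $n_i$ of surface point $p_i$ and choose the radius so that the sphere also passes through a second surface point $p_j$. Solving $\lVert p_i + r\,n_i - p_j \rVert = r$ gives $r = \lVert d \rVert^2 / (2\, d \cdot n_i)$ with $d = p_j - p_i$. Taking the smallest such radius over all $j$ gives the largest sphere tangent at $p_i$ that contains no other surface point. This is a simplified illustration in the spirit of [5] with hypothetical function names, not a reproduction of sphgen.

```python
def sphere_through(p_i, n_i, p_j):
    """Sphere tangent at p_i (center along unit normal n_i) that also passes through p_j.

    Returns (center, radius), or None when p_j is not on the normal's side of
    p_i's tangent plane (no sphere with positive radius exists).
    """
    d = [b - a for a, b in zip(p_i, p_j)]  # vector from p_i to p_j
    dn = sum(x * y for x, y in zip(d, n_i))
    if dn <= 0.0:
        return None
    r = sum(x * x for x in d) / (2.0 * dn)  # from |p_i + r n_i - p_j| = r
    center = [a + r * n for a, n in zip(p_i, n_i)]
    return center, r


def largest_empty_sphere(i, points, normals):
    """Largest sphere tangent at points[i] that contains no other surface point."""
    best = None
    for j, p_j in enumerate(points):
        if j == i:
            continue
        s = sphere_through(points[i], normals[i], p_j)
        if s is not None and (best is None or s[1] < best[1]):
            best = s
    return best
```

A point $p_j$ lies strictly inside the sphere of radius $r$ centered at $p_i + r\,n_i$ only when $r$ exceeds that point's two-point radius, so the minimum over $j$ is exactly the largest sphere that leaves every surface point outside or on its boundary.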
The number of orientations of the ligand in free space is vast. The number of
orientations possible from all sets of sphere-atom pairings is smaller but
still large and cannot be generated and evaluated (scored) in a reasonable
length of time. Consequently, various filters can be (and are) used to
eliminate from consideration, before evaluation, sets of sphere-atom pairs
that would generate poorly scoring orientations. That is, only a small subset
of the possible ligand orientations is actually generated and scored. The
longest-distance heuristic is one filter. Sphere "coloring" and identification
of "critical" spheres are other filters.
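In the longest-distance heuristic, sphere-sphere distances are compared to
atom-atom distances. Sets of sphere-atom pairs are generated in the following
manner: sphere i is paired with atom I if and only if, for every sphere-atom
pair (k, K) already in the set,

$$|d_{ik} - d_{IK}| \le \epsilon$$

where $d_{ik}$ is the distance between spheres i and k, $d_{IK}$ is the
distance between ligand atoms I and K, and $\epsilon$ is a distance tolerance.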
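The distance-compatibility condition $|d_{ik} - d_{IK}| \le \epsilon$ can be sketched as a brute-force search that grows a match one sphere-atom pair at a time, keeping only pairs consistent with every pair already chosen. The tolerance `eps`, the match `size`, and the function names are hypothetical, and the sketch enumerates every compatible set instead of applying DOCK's longest-distances-first ordering, so it illustrates the distance test only, not DOCK's search strategy.

```python
import math


def compatible(match, s, a, spheres, atoms, eps):
    """True if pairing sphere s with atom a agrees with every pair already in match."""
    for k, K in match:
        if s == k or a == K:
            return False  # each sphere and each atom may appear only once
        if abs(math.dist(spheres[s], spheres[k]) - math.dist(atoms[a], atoms[K])) > eps:
            return False
    return True


def find_matches(spheres, atoms, eps, size):
    """All sets of `size` sphere-atom pairs with |d_ik - d_IK| <= eps for every two pairs."""
    results = []

    def extend(match, first_sphere):
        if len(match) == size:
            results.append(list(match))
            return
        # Spheres are added in increasing index order so each set is found only once.
        for s in range(first_sphere, len(spheres)):
            for a in range(len(atoms)):
                if compatible(match, s, a, spheres, atoms, eps):
                    match.append((s, a))
                    extend(match, s + 1)
                    match.pop()

    extend([], 0)
    return results
```

Because each new pair is checked against all earlier pairs, an incompatible branch is abandoned as soon as one distance disagrees, which is what keeps the number of generated matches far below the number of possible pairings.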
Longest inter-atomic distances are considered first. When the atom-sphere
pair set contains four or more pairs, and when primarily the longest distances
are considered, the ligand orientation which is generated will likely fit in
the active site, since the rmsd between the transformed coordinates of the
(matching) atoms and the coordinates of the spheres will likely be small. This
can be seen by examining the simple (extreme) case: if $d_{ik} = d_{IK}$ for
every pair of spheres i and k and every corresponding pair of atoms I and K,
and there are the same number of atoms as spheres, then the transformed
coordinates of the atoms and the coordinates of the spheres will be identical,
provided that the chirality of the spheres is the same as that of the matching
ligand atoms.*
*Note: since DOCK matches spheres with ligand atoms by comparing distances
between sphere pairs and ligand atom pairs, the mirror image of the ligand
atoms (used in the match) may be a better fit, in the rmsd sense, to the
spheres than the atoms of the real, non-mirror-reflected ligand. Consider,
for example, the distances between the four atoms bonded to a chiral carbon
center and the distances between the four atoms bonded to the mirror-image of
that chiral carbon center: the distances are the same, but the two sets of
four atoms cannot be superimposed upon one another, unless the chirality of one
is reversed. In a similar manner, the chirality of the ligand atoms used in
the match may be opposite to that of the matching spheres.
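The footnote's point can be checked numerically: reflecting four substituent positions through a plane leaves every interatomic distance unchanged but flips the sign of the signed volume (triple product) that encodes handedness. A distance-only comparison therefore cannot tell the two apart. The coordinates below are hypothetical.

```python
import math


def distances(points):
    """All pairwise distances, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


def signed_volume(a, b, c, d):
    """Triple product (b - a) . ((c - a) x (d - a)); its sign encodes handedness."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    cross = [v[1] * w[2] - v[2] * w[1], v[2] * w[0] - v[0] * w[2], v[0] * w[1] - v[1] * w[0]]
    return sum(u[i] * cross[i] for i in range(3))


# Four substituent positions around a hypothetical chiral center, and their mirror image.
subs = [(1.0, 1.0, 1.0), (-1.0, -1.0, 1.0), (-1.0, 1.0, -1.0), (1.2, -0.9, -1.1)]
mirror = [(-x, y, z) for x, y, z in subs]

print(distances(subs) == distances(mirror))
print(signed_volume(*subs), signed_volume(*mirror))
```

The distance lists agree while the two signed volumes have opposite signs; a least-squares fit restricted to proper rotations cannot superimpose the two sets exactly.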
Program sphgen identifies the active site, and other sites of interest,
and generates the sphere centers which fill the site. It has been described in
the original paper: Kuntz, et al. [5]. Program distmap
generates the grid-based shape scoring function; details can be found in
Shoichet, Bodian and Kuntz [6]. Program chemgrid generates the energy-based
scoring function; details can be found in Meng, Shoichet and Kuntz [7].
Within the DOCK suite of programs, the program DOCK matches spheres (generated
by sphgen) with ligand atoms and uses scoring grids (from distmap
and chemgrid) to evaluate ligand orientations; descriptions can be found
in Kuntz, et al. [5] and Shoichet, Bodian and Kuntz [6].
Program DOCK also minimizes energy-based scores; the minimization is described
in Meng, Gschwend, Blaney and Kuntz [8].
Several stand-alone docking-related programs exist. Program cluster
generates alternative clusters of sphere centers within the active site; it
uses files from program sphgen as input. Program scoreopt scores
a ligand orientation with the DOCK scoring functions; scoring grids are needed,
and for energy scoring, ligand atom van der Waals repulsive and attractive
(dispersive) factors and partial charges are also required. Program
dockmin_sim alters a ligand orientation to minimize its energy score; an
energy scoring grid is needed, as well as ligand van der Waals repulsive and
attractive (dispersive) factors (types) and partial charges.
1. Allen, F.H., Bellard, S., Brice, M.D., Cartwright, B.A., Doubleday, A.,
Higgs, H., Hummelink, T., Hummelink-Peters, B.G., Kennard, O., Motherwell,
W.D.S., Rodgers, J.R. and Watson, D.G. The Cambridge Crystallographic Data
Centre: computer-based search, retrieval, analysis and display of information.
Acta Cryst. B35: 2331-2339, 1979.
2. Rusinko, A., Sheridan, R.P., Nilakantan, R., Haraki, K.S., Bauman, N. and
Venkataraghavan, R. Using CONCORD to construct a large database of
3-dimensional coordinates from connection tables. J. Chem. Inf. Comput.
Sci. 29: 251-255, 1989.
3. Kuntz, I.D. Structure-based strategies for drug design and discovery.
Science 257: 1078-1082, 1992.
4. Kuntz, I.D., Meng, E.C. and Shoichet, B.K. Structure-based molecular design.
Acc. Chem. Res. 27(5): 117-123, 1994.
5. Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R. and Ferrin, T.E. A
geometric approach to macromolecule-ligand interactions. J. Mol. Biol.
161: 269-288, 1982.
6. Shoichet, B.K., Bodian, D.L. and Kuntz, I.D. Molecular docking using shape
descriptors. J. Comput. Chem. 13(3): 380-397, 1992.
7. Meng, E.C., Shoichet, B.K. and Kuntz, I.D. Automated docking with grid-based
energy evaluation. J. Comput. Chem. 13: 505-524, 1992.
8. Meng, E.C., Gschwend, D.A., Blaney, J.M. and Kuntz, I.D. Orientational
sampling and rigid-body minimization in molecular docking. Proteins
17(3): 266-278, 1993.
Specific Concepts: mechanisms to limit CPU time
Sphere Centers
From an unknown source: "...what's good about DOCK is that it uses spheres;
what's bad about DOCK is that it uses spheres..."
Chemical Matching
DOCK spheres are generated without regard to the chemical properties of the
nearby receptor atoms. Sphere "chemical matching" or "coloring" associates a
chemical property with spheres, and a sphere of one "color" can only be matched
with a ligand atom of complementary color. These chemical properties may be
things such as "hydrogen-bond donor," "hydrogen-bond acceptor," "hydrophobe,"
"electro-positive," "electro-negative," "neutral," etc. Neither the
colors themselves nor the complementarity of the colors are determined by the
DOCK suite of programs; DOCK simply uses these labels. With coloring, only
ligand atoms with the appropriate chemical properties are matched to the
complementarily colored spheres, so the orientations generated are more likely
to produce favorable scores. Conversely, by excluding colored spheres from
pairing with certain ligand atoms, the number of (probably) unfavorable
orientations which are generated and evaluated can be reduced. Note that
requiring complementarity in matching does not mean that all ligand atoms will
lie in chemically complementary regions of the enzyme. Rather, only those
ligand atoms that are paired with a colored sphere in the sphere-atom match are
guaranteed to lie in the chemically complementary region of the enzyme
(provided the chirality of the spheres is the same as that of the matching
ligand atoms).
Critical Spheres
The "critical sphere" filter requires that critical spheres be part of the set
of sphere-atom pairs used to orient the ligand. Designating spheres as critical
forces the ligand to have at least one atom in the area of the enzyme where
those spheres are located. This filter may be useful, for example, when it
is known that a ligand must occupy a particular area of an active site. The
filter removes from consideration any orientation that does not guarantee at
least one ligand atom in the critical areas of the enzyme (provided the
chirality of the spheres is the same as that of the matching ligand atoms).
Scoring Filters
After a ligand is oriented within the active site, the orientation is
evaluated. To reduce the total computational time, the ligand atoms are first
checked to determine whether they occupy space already occupied by the
receptor. This is often referred to as "bump checking." If too many such
"bumps" are found, the ligand probably intersects the receptor; consequently,
the orientation is discarded before it is scored.
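The three filters described above can each be written as a simple predicate applied to a match or an orientation before scoring. This is a minimal sketch with hypothetical names: color labels and the complementarity table are user-supplied (as the text notes, DOCK does not define them), representing critical areas as groups of sphere indices with one sphere required from each group is an assumption, and the bump check uses a brute-force distance test with hypothetical `min_dist` and `max_bumps` parameters rather than a precomputed grid.

```python
import math


def colors_compatible(match, sphere_color, atom_color, complementary):
    """Every colored sphere in the match must be paired with a complementary atom.

    sphere_color / atom_color map indices to labels (missing = uncolored);
    complementary is a set of allowed (sphere_label, atom_label) pairs.
    """
    for s, a in match:
        sc = sphere_color.get(s)
        if sc is not None and (sc, atom_color.get(a)) not in complementary:
            return False
    return True


def has_critical_spheres(match, critical_groups):
    """Each critical area (a set of sphere indices) must contribute a sphere to the match."""
    used = {s for s, _ in match}
    return all(used & set(group) for group in critical_groups)


def too_many_bumps(ligand_xyz, receptor_xyz, min_dist, max_bumps):
    """True if more than max_bumps ligand atoms lie closer than min_dist to a receptor atom."""
    bumps = sum(
        1 for p in ligand_xyz if any(math.dist(p, q) < min_dist for q in receptor_xyz)
    )
    return bumps > max_bumps
```

The first two predicates reject a match before any orientation is computed; the bump check runs after orienting but before the more expensive scoring step.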
Caveats
In the attempt to balance computational processing time and accuracy,
approximations and simplifications were made to the scoring functions. The
interaction energy function, for example, lacks explicit hydrogen-bonding
terms, solvation/desolvation terms, or hydrophobicity terms. More accurate
methods do exist for evaluating ligand docking, but at the expense of
additional computational time. DOCK can be no more accurate than its
scoring function. That is, its ability to predict a novel ligand binding
orientation or to reproduce a crystallographic orientation is limited by the
accuracy of its scoring function.
Programs and References
The routines which actually perform the steps described above are spread
across different programs, and details can be found in various papers. We list
a small subset of these papers. Review articles on the method can be found in
Kuntz [3] and Kuntz, Meng and Shoichet [4].
Bibliography
Curator: Daniel Gschwend, gschwend@cgl.ucsf.edu (rev. 1 September 1995)