Degeneracy Checking

Daniel Gschwend

Overview

Given the large number of spatially distributed spheres and atoms involved in docking, it is not surprising that there are many ways of pairing them which give rise to similar geometric orientations. This redundancy is the result of over-sampling in certain regions, yet without that over-sampling some binding modes would be under-sampled or even overlooked. In the absence of refinement, this over-sampling itself provides a sort of rigid-body minimization. A better way to optimize local interactions is to find only one orientation per "family" (i.e. mode of binding), energy-minimize that orientation, and pay no further attention to later orientations generated in that family. Code to this end has been implemented in the form of what is hereafter termed "degeneracy checking" (Gschwend, manuscript in preparation). Only the first orientation found in a family is actually minimized; subsequent orientations that fall into the same family are not scored. This reduces not only the number of orientations that are minimized, but also the number of orientations written out and viewed by the user on completion of the DOCK run (in the case of SINGLE mode).

Analogous to the reasoning behind saving one member of productive ("interesting") families to prevent excessive minimization within a binding mode, it is desirable to do the same for non-productive (bumping) families. Any orientation found to exceed the number of allowed bumps constitutes a family within which no other orientations will be permitted (subject to the degenerate_save_interval parameter, see below), thus saving precious minimization time in uninteresting binding modes.

Caution: Degeneracy checking is still an experimental feature. It is designed to be used as a tool in speeding up DOCK runs involving force-field score minimization, but it can often be equally effective to use minimization without degeneracy checking with greatly reduced bin sizes.

Theory

Usage

Virtual spheres are a reduced set of averaged spheres, derived from your sphgen sphere center output, that are used in determining orientation degeneracy. These spheres act as waypoints within the active site for defining the geometry of an orientation. The virtual_spheres program works by clustering all spheres which have any neighbor within a specified radius and computing an average position for each cluster, a so-called "virtual sphere." The fewer virtual spheres used, the more orientations will be considered degenerate and the shorter the run time; but beware, too few virtual spheres results in overlooked binding modes.
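The clustering idea described above can be sketched in a few lines of Python. This is only an illustration, not the code of the virtual_spheres program: the function name, the use of a union-find structure, and the choice of "less than or equal to the cutoff" as the neighbor test are all assumptions made for the example.

```python
import math

def virtual_spheres(centers, cutoff=2.0):
    """Group sphere centers linked by any neighbor within `cutoff`
    and return one averaged position per group.

    Hypothetical helper for illustration; not part of DOCK.
    Returns a list of (average_xyz, member_indices) tuples.
    """
    n = len(centers)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of spheres closer than the cutoff (assumed: <=).
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centers[i], centers[j]) <= cutoff:
                parent[find(i)] = find(j)

    # Collect the members of each cluster.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Average the coordinates of each cluster into one virtual sphere.
    merged = []
    for members in groups.values():
        avg = tuple(
            sum(centers[m][k] for m in members) / len(members)
            for k in range(3)
        )
        merged.append((avg, members))
    return merged
```

Because membership only requires one neighbor within the cutoff, spheres can be chained together: in a row of centers spaced 1 Å apart with a 1.5 Å cutoff, the two end spheres land in the same cluster even though they are 2 Å apart. The member lists play the role that merge.lst plays for the real program.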

Create virtual spheres as follows. Reduce the actual sphgen sphere cluster to virtual spheres by running the interactive program virtual_spheres on your sphgen sphere center file. For the intersphere cutoff distance (parameter vsph), use a value of 2.0 Å. Smaller cutoff distances reduce the original sphere set less than larger cutoff distances do. Having more virtual spheres (smaller cutoff distance) yields fewer degenerate orientations; consequently, run time will be longer but accuracy improved. Note that a file called merge.lst is also created; this file lists which true sphgen spheres were averaged (merged) into which virtual sphere. View the virtual sphere PDB file on a graphics terminal to verify that the virtual spheres are evenly spaced at a sufficient density for your application.

Removing degenerate orientations requires knowing which virtual sphere is closest to each atom of an orientation. This is easily accomplished by creating a grid holding this information, which can then simply be looked up rather than calculated on the fly. Run the interactive program sphgrid. You will need to supply the following information:

A grid file with a .sph extension will be written.
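The reason for precomputing a grid can be shown with a small Python sketch. It is not the sphgrid program and does not reproduce its prompts or the .sph file format; the function names, the dictionary-based grid, and the parameters origin, spacing, and shape are hypothetical and chosen only for illustration.

```python
import math

def build_sphere_grid(virtual_centers, origin, spacing, shape):
    """Precompute the index of the closest virtual sphere at every grid point.

    Hypothetical helper: `origin` is the xyz of grid point (0, 0, 0),
    `spacing` the grid step, `shape` the number of points along x, y, z.
    """
    nx, ny, nz = shape
    grid = {}
    for ix in range(nx):
        for iy in range(ny):
            for iz in range(nz):
                point = (
                    origin[0] + ix * spacing,
                    origin[1] + iy * spacing,
                    origin[2] + iz * spacing,
                )
                # Store the index of the nearest virtual sphere.
                grid[(ix, iy, iz)] = min(
                    range(len(virtual_centers)),
                    key=lambda s: math.dist(point, virtual_centers[s]),
                )
    return grid

def nearest_sphere(grid, origin, spacing, shape, atom):
    """Look up the closest virtual sphere for an atom by snapping the
    atom to its nearest grid point (clamped to the grid bounds)."""
    idx = tuple(
        min(max(round((atom[k] - origin[k]) / spacing), 0), shape[k] - 1)
        for k in range(3)
    )
    return grid[idx]
```

The distance calculations are paid once when the grid is built; during docking each atom costs only one index computation and one lookup, at the price of an error no larger than the grid spacing allows.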

Add the check_degeneracy keyword to your INDOCK parameter file. Other parameters should be customized by using the appropriate keyword, as described below:

bin sizes
should be increased relative to a minimization run without degeneracy checking, because degeneracy checking circumvents many minimizations.

bump_maximum
should be chosen with care. In contrast with minimization without degeneracy checking, increasing the number of bumps does not necessarily increase result quality. A bell-shaped curve is observed for "result quality" vs. number of bumps. With too few bumps, it is often the case that an orientation near the true binding mode can never be found because too few steric clashes are being allowed for minimization to resolve, so the minimizer is not used to its fullest potential. At the other end of the spectrum, allowing too many bumps permits storage of potentially terrible orientations to which other orientations will be considered degenerate, so the best orientations in this binding mode will probably be thrown out.

degeneracy_wobble
dictates how stringent degeneracy checking will be. This parameter is inversely related to run-time. Higher degeneracy_wobble values permit looser degeneracy checking, resulting in more degenerate orientations and fewer orientations minimized. degeneracy_wobble of 0 is usually too stringent and run-times can become inflated. degeneracy_wobble of 2 often provides fast results of good quality.

degenerate_save_interval
specifies the number of times an orientation must be found in any given family before minimizing and attempting to save another member of the same family. So, if a particular binding mode is found 100 times and degenerate_save_interval is 25, DOCK will minimize and evaluate four additional members of this family in hopes of finding a better scoring representative. This parameter provides a method for retaining additional members of popular families and smoothing the bias of sphere locations. Recommended values are in the range 10 to 25.

check_degenerate_children
indicates whether to check for degeneracy against orientations which were saved as multiples of a family due to the degenerate_save_interval parameter. Recommended value is off.
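The arithmetic behind degenerate_save_interval can be illustrated with a short Python sketch. It is not DOCK's code: the function name select_for_minimization is hypothetical, family membership is taken as already known (how DOCK assigns an orientation to a family is not reproduced here), and the rule "minimize the first member, then every save_interval-th occurrence" is an assumption chosen to agree with the 100-occurrence example given for that keyword.

```python
from collections import Counter

def select_for_minimization(family_ids, save_interval):
    """Return the positions in `family_ids` that would be minimized.

    Hypothetical illustration: `family_ids` is the sequence of family
    labels in the order orientations are generated. The first member of
    each family is always minimized; afterwards one more member is
    minimized each time the family's count reaches a multiple of
    `save_interval` (assumed rule).
    """
    seen = Counter()  # occurrences per family so far
    chosen = []
    for pos, fam in enumerate(family_ids):
        seen[fam] += 1
        count = seen[fam]
        if count == 1 or count % save_interval == 0:
            chosen.append(pos)
    return chosen

# A family found 100 times with an interval of 25:
picked = select_for_minimization(["A"] * 100, 25)
print(len(picked))  # first member plus four additional members
```

Under this assumed rule, an interval of 1 selects every orientation, and a very large interval selects only the first member of each family.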

Output

Check the OUTDOCK file to ensure that the run is proceeding or has completed successfully. Confirmation that minimization and degeneracy checking have taken effect should be reported. A number of statistics concerning degeneracy and minimization are given at the end of the OUTDOCK file.

SINGLE runs: For each center for which matching has been completed, statistics on the number of matches attempted, number of unique orientations found, and number and percent of degenerate orientations are given. A file called family.log is also written which contains an orientation degeneracy histogram, providing the energy, starting energy, and RMS deviation of each unique orientation, as well as how many times it was seen. Also included is the parent orientation for any children saved due to the degenerate_save_interval parameter.

SEARCH runs: For each compound, the number of matches attempted, number of unique matches, and number of degenerate orientations are reported.

An interesting feature: the family.log statistics (energy, starting energy, and RMS deviation) may actually be compiled while still minimizing every orientation generated. This can be accomplished by setting degenerate_save_interval to 1 (thus saving every multiple in a family) and turning check_degenerate_children off (degeneracy_wobble may assume any value). This method does, of course, require the creation of all files required for degeneracy checking.

Cautions



Curator: Daniel Gschwend, gschwend@cgl.ucsf.edu (rev. 1 September 1995)