
Degeneracy Checking

Daniel Gschwend


Overview

Given the large number of spatially distributed spheres and atoms involved in docking, it is not surprising that there are many ways of pairing them that give rise to similar geometric orientations. This redundancy is the result of over-sampling in certain regions, but without that over-sampling some binding modes would be under-sampled or even overlooked. In the absence of refinement, the over-sampling itself provides a crude form of rigid-body minimization. A better way to optimize local interactions is to find only one orientation per "family" (i.e., mode of binding), energy-minimize that orientation, and pay no further attention to later orientations generated in that family. Code to this end has been implemented in the form of what is hereafter termed "degeneracy checking" (Gschwend, manuscript in preparation). Hence, only the first orientation found in a family is actually minimized; subsequent orientations that fall into the same family are not scored. This judiciously reduces not only the number of orientations that are minimized, but also the number of orientations written out and viewed by the user on completion of the DOCK run (in SINGLE mode).

Analogous to the reasoning behind saving one member of productive ("interesting") families to prevent excessive minimization within a binding mode, it is desirable to do the same for non-productive (bumping) families. Any orientation found to exceed the number of allowed bumps constitutes a family within which no other orientations will be permitted (subject to the degenerate_save_interval parameter, see below), thus saving precious minimization time in uninteresting binding modes.
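The family bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not DOCK's code. It assumes each orientation has already been reduced to a hashable family key (for example, the nearest virtual sphere of each ligand atom, as set up in the Usage section). It also ignores degenerate_save_interval, so only the first member of a family, whether productive or bumping, is ever acted on. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FamilyTable:
    """Hypothetical sketch: one entry per family (binding mode).

    Family keys are assumed to be precomputed and hashable.
    """
    families: dict = field(default_factory=dict)  # key -> "good" or "bump"

    def classify(self, key, bumps, bump_maximum):
        """Return what to do with an orientation belonging to family `key`."""
        if key in self.families:
            return "skip"          # degenerate: family already represented
        if bumps > bump_maximum:
            self.families[key] = "bump"
            return "reject"        # record the non-productive family, no minimization
        self.families[key] = "good"
        return "minimize"          # first member of a productive family
```

Recording bumping families as well as productive ones is what lets later orientations in an uninteresting binding mode be skipped without further scoring.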

Caution: Degeneracy checking is still an experimental feature. It is designed to be used as a tool in speeding up DOCK runs involving force-field score minimization, but it can often be equally effective to use minimization without degeneracy checking with greatly reduced bin sizes.

Theory

The degenerate_save_interval parameter has the desirable effect of smoothing our sampling over all binding modes.

Usage

Virtual spheres are a reduced set of averaged spheres, derived from your sphgen sphere center output, that are used to determine orientation degeneracy. These spheres act as waypoints within the active site that define the geometry of an orientation. The virtual_spheres program used in the next step clusters all spheres that have any neighbor within a specified radius and computes an average position for each cluster, a so-called "virtual sphere." The fewer virtual spheres used, the more orientations will be considered degenerate and the shorter the run time; but beware, too few virtual spheres result in overlooked binding modes.
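The clustering step can be illustrated with a short single-linkage sketch. Any two spheres closer than the cutoff end up in the same cluster, and each cluster is replaced by the average of its centers. This is an assumption about how virtual_spheres behaves, based on the description above, and is not its source. The function name is hypothetical, and the returned member lists play the role of merge.lst.

```python
import math


def virtual_spheres(centers, cutoff=2.0):
    """Hypothetical sketch of single-linkage sphere merging.

    centers: list of (x, y, z) sphere centers (e.g. from sphgen).
    cutoff:  intersphere cutoff distance in Angstroms (cf. vsph).
    Returns (virtual_centers, members); members[i] lists the indices of the
    original spheres averaged into virtual sphere i.
    """
    n = len(centers)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair within the cutoff (O(n^2), fine for a few hundred spheres).
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centers[i], centers[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    virtual, members = [], []
    for idx in groups.values():
        k = len(idx)
        virtual.append(tuple(sum(centers[i][d] for i in idx) / k for d in range(3)))
        members.append(idx)
    return virtual, members
```

For example, spheres at x = 0, 1.5, 3.0, and 10 Å merge into two virtual spheres with a 2.0 Å cutoff, because single linkage chains the first three together. With a 1.0 Å cutoff, all four spheres remain separate.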

Create virtual spheres as follows. Reduce the actual sphgen sphere cluster to virtual spheres by running the interactive program virtual_spheres on your sphgen sphere center file. For the intersphere cutoff distance (parameter vsph), use a value of 2.0 Å. Smaller cutoff distances reduce the original sphere set less than larger cutoff distances do. Having more virtual spheres (a smaller cutoff distance) yields fewer degenerate orientations; consequently, run time will be longer but accuracy improved. Note that a file called merge.lst is also created; it lists which true sphgen spheres were averaged (merged) into which virtual sphere. View the virtual sphere PDB file on a graphics terminal to verify that the spheres are evenly spaced at a sufficient density for your application.


Removing degenerate orientations requires knowing which virtual sphere is closest to each atom of an orientation. This is easily accomplished by precomputing a grid that holds this information, so that it can simply be looked up rather than calculated on the fly. Run the interactive program sphgrid. You will need to supply the following information:

the name of the virtual sphere file created in the previous step
the PDB box file enclosing the active site, used with chemgrid
the resolution of the grid (use the same as used in chemgrid)
the prefix for chemgrid grid files

A grid file with a .sph extension will be written.
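The idea behind the sphgrid file can be sketched as a precomputed lookup table. The nearest virtual sphere is found once for every grid point, and each ligand atom is then snapped to its nearest grid point. Several elements here are illustrative assumptions: the function names, the in-memory dictionary (standing in for the .sph file), and the use of the per-atom nearest-sphere tuple as a family key.

```python
import math


def build_sphere_grid(vspheres, origin, spacing, dims):
    """Hypothetical sketch: nearest virtual sphere index for every grid point.

    origin:  (x, y, z) corner of the box; spacing: grid resolution (Angstroms);
    dims:    (nx, ny, nz) number of grid points along each axis.
    """
    nx, ny, nz = dims
    grid = {}
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                p = (origin[0] + i * spacing,
                     origin[1] + j * spacing,
                     origin[2] + k * spacing)
                # Brute-force search, done once per grid point rather than per atom.
                grid[(i, j, k)] = min(range(len(vspheres)),
                                      key=lambda s: math.dist(p, vspheres[s]))
    return grid


def nearest_sphere(grid, origin, spacing, dims, atom):
    """Look up (rather than compute) the nearest virtual sphere for one atom."""
    idx = tuple(
        min(max(round((atom[d] - origin[d]) / spacing), 0), dims[d] - 1)  # clamp to box
        for d in range(3)
    )
    return grid[idx]


def orientation_key(grid, origin, spacing, dims, atoms):
    """Hypothetical family key: nearest virtual sphere of each atom, in atom order."""
    return tuple(nearest_sphere(grid, origin, spacing, dims, a) for a in atoms)
```

The lookup costs one rounding step per coordinate, which is why reusing the same resolution as chemgrid keeps the grid a manageable size.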

Add the check_degeneracy keyword to your INDOCK parameter file. Other parameters should be customized by using the appropriate keyword, as described below:

bin sizes: should be increased relative to those used for a minimization run without degeneracy checking, because degeneracy checking circumvents many minimizations.
bump_maximum: should be chosen with care. In contrast with minimization without degeneracy checking, increasing the number of allowed bumps does not necessarily improve result quality. A bell-shaped curve is observed for "result quality" vs. number of bumps. With too few bumps, an orientation near the true binding mode often can never be found, because too few steric clashes are allowed for minimization to resolve; the minimizer is not used to its fullest potential. At the other end of the spectrum, too many bumps allow storage of potentially terrible orientations against which other orientations will be considered degenerate, so the best orientations in that binding mode will probably be thrown out.
degeneracy_wobble: dictates how stringent degeneracy checking will be. This parameter is inversely related to run time: higher degeneracy_wobble values permit looser degeneracy checking, resulting in more degenerate orientations and fewer orientations minimized. A degeneracy_wobble of 0 is usually too stringent, and run times can become inflated. A degeneracy_wobble of 2 often provides fast results of good quality.
degenerate_save_interval: specifies the number of times an orientation must be found in any given family before minimizing and attempting to save another member of the same family. So, if a particular binding mode is found 100 times and degenerate_save_interval is 25, DOCK will minimize and evaluate four additional members of this family in hopes of finding a better scoring representative. This parameter provides a method for retaining additional members of popular families and smoothing the bias of sphere locations. Recommended values are in the range 10 to 25.
check_degenerate_children: indicates whether to check for degeneracy against orientations that were saved as multiples (additional members) of a family because of the degenerate_save_interval parameter. Recommended value is off.
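One reading of degenerate_save_interval can be expressed as a counter per family. The first member is minimized as the parent, and another member is minimized each time the family's hit count reaches a multiple of the interval. This sketch assumes that counting convention, which reproduces the 100-hit, interval-25, four-extra-members example above. It does not model degeneracy_wobble or check_degenerate_children, and the function name is hypothetical.

```python
from collections import Counter


def save_interval_actions(family_keys, interval):
    """Hypothetical sketch of degenerate_save_interval bookkeeping.

    family_keys: family key of each generated orientation, in search order.
    interval:    degenerate_save_interval.
    Returns one action per orientation: "parent" (first member, minimized),
    "child" (hit count reached a multiple of the interval, minimized), or
    "skip" (degenerate, not scored).
    """
    seen = Counter()
    actions = []
    for key in family_keys:
        seen[key] += 1
        if seen[key] == 1:
            actions.append("parent")
        elif seen[key] % interval == 0:
            actions.append("child")
        else:
            actions.append("skip")
    return actions
```

With an interval of 1, every repeat becomes a child, so every orientation is minimized while the family statistics are still collected.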

Output

Check the OUTDOCK file to ensure that the run is proceeding or has completed successfully. Confirmation that minimization and degeneracy checking have taken effect should be reported. A number of statistics concerning degeneracy and minimization are given at the end of the OUTDOCK file.

SINGLE runs: For each center for which matching has been completed, statistics are given on the number of matches attempted, the number of unique orientations found, and the number and percentage of degenerate orientations. A file called family.log is also written; it contains an orientation degeneracy histogram giving the energy, starting energy, and RMS deviation of each unique orientation, as well as how many times it was seen. Also included is the parent orientation of any children saved due to the degenerate_save_interval parameter.

SEARCH runs: For each compound, the number of matches attempted, number of unique matches, and number of degenerate orientations are reported.
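The reported counts are simple tallies over family keys. The minimal sketch below assumes that every attempted match yields exactly one orientation whose family key is known, which is a simplification. The function name and dictionary keys are made up for illustration. The histogram corresponds to the "how many times it was seen" column of family.log, without the energies.

```python
from collections import Counter


def degeneracy_summary(family_keys):
    """Hypothetical tally of degeneracy statistics from a list of family keys."""
    histogram = Counter(family_keys)   # family -> times seen
    attempted = len(family_keys)
    unique = len(histogram)
    degenerate = attempted - unique    # every repeat of a known family
    percent = 100.0 * degenerate / attempted if attempted else 0.0
    return {
        "attempted": attempted,
        "unique": unique,
        "degenerate": degenerate,
        "percent_degenerate": percent,
        "histogram": dict(histogram),
    }
```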

An interesting feature: the family.log statistics (energy, starting energy, and RMS deviation) may actually be compiled while still minimizing every orientation generated. This can be accomplished by setting degenerate_save_interval to 1 (thus saving every multiple in a family) and turning check_degenerate_children off (degeneracy_wobble may assume any value). This method does, of course, require the creation of all files required for degeneracy checking.

Cautions

Memory requirements
Degeneracy checking can be very memory intensive because of all the information that must be stored. Should memory problems be encountered, the easiest work-around is to reduce the dimensions of the arrays in the header file degeneracy.h. Specifically, maxfamily may be reduced to the maximum number of expected unique matches - I'd recommend going no lower than 5,000. Also, several alternative prime number pairs are given in this header file for dimensioning the hash table with the maxhash and hash2 parameters.
Filling of hash table
The hash table used in degeneracy checking is an effective tool for circumventing the immense memory requirements of the task at hand. With standard DOCK runs, using a hash table can be just as rapid as storing all the required information in memory, were that much memory available. However, when the hash table becomes full, performance can degrade very rapidly. It is therefore imperative that the hash table be dimensioned sufficiently large. Degeneracy checking is not meant to be used for long SINGLE mode DOCK runs that generate hundreds of thousands of matches or more. The OUTDOCK file reports the percentage of the hash table used in SINGLE mode runs; this number should probably be less than 50% for efficient runs. Also, an "average number of search steps per match" (supplied at the end of the OUTDOCK file) greater than 5 can indicate performance degradation due to a full hash table.
Focusing
It has not yet been established if degeneracy checking is compatible with focusing. Bear in mind that focusing is a form of discrete optimization, while minimization is a continuous optimization. It may not be efficient, nor desirable, to use both concurrently.
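The sketch below shows why a nearly full hash table hurts. It implements open addressing with double hashing, using a prime table size (maxhash) and a second prime (hash2) for the probe step. The parameter names mirror degeneracy.h, but the probing scheme is an assumption for illustration, not a transcription of DOCK's code. As the load factor approaches 1, the average number of probe steps per lookup climbs. That is the effect the search-steps-per-match statistic is meant to reveal.

```python
class FamilyHash:
    """Hypothetical open-addressing table with double hashing.

    maxhash: table size (prime); hash2: smaller prime used for the probe step.
    """

    def __init__(self, maxhash=10007, hash2=9973):
        if not hash2 < maxhash:
            raise ValueError("hash2 must be smaller than maxhash")
        self.maxhash, self.hash2 = maxhash, hash2
        self.slots = [None] * maxhash
        self.used = 0
        self.lookups = 0
        self.total_steps = 0

    def insert_or_find(self, key):
        """Return True if key is already stored (degenerate); otherwise insert it."""
        h = hash(key)
        pos = h % self.maxhash
        # Step lies in 1..hash2 < maxhash, so it is coprime with the prime table
        # size and the probe sequence visits every slot before repeating.
        step = 1 + h % self.hash2
        self.lookups += 1
        for _ in range(self.maxhash):
            self.total_steps += 1
            slot = self.slots[pos]
            if slot is None:
                self.slots[pos] = key
                self.used += 1
                return False
            if slot == key:
                return True
            pos = (pos + step) % self.maxhash
        raise MemoryError("hash table full: increase maxhash (and hash2)")

    def load(self):
        """Fraction of slots in use (cf. the percentage reported in OUTDOCK)."""
        return self.used / self.maxhash

    def average_steps(self):
        """Average probe steps per lookup (cf. search steps per match)."""
        return self.total_steps / self.lookups if self.lookups else 0.0
```

For example, FamilyHash(maxhash=10007, hash2=9973) stays below 50% load for up to 5,003 unique families. The threshold of 5 search steps quoted above applies to DOCK's own implementation, not necessarily to this sketch.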

Curator: Daniel Gschwend, gschwend@cgl.ucsf.edu (rev. 1 September 1995)


Overview

Theory

Problem description