Orientation Search

dock version 4.0 has a new orientation search algorithm, or matching algorithm, which is more robust than its predecessor (see Ewing and Kuntz [6]). An orientation search is requested with the orient_ligand parameter. The published search technique has been further extended so that the amount of orientation sampling can be controlled in two ways: automated matching and manual matching, each described below.

There are a number of sophisticated options available to tailor the orientation search. These options include random search, chemical matching, and critical points.

Multiple orientations may be written out for each molecule using the write_orientations parameter; otherwise, only the best orientation is recorded. A ranked list of the orientations may be written using the rank_orientations parameter; otherwise, all orientations that pass a score cutoff are written out. The score cutoff is set by a separate parameter for each type of scoring (for example, contact_maximum for contact scoring). If write_orientations is requested without scoring, then all orientations are written.
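The output rules above can be summarized as a small decision function. This is a minimal sketch, not dock's own code: the Orientation class, the select_orientations function, and its arguments are hypothetical. The argument rank_total stands in for however many ranked orientations are requested through rank_orientations, score_maximum stands in for the per-scoring cutoff (such as contact_maximum, with None meaning "no scoring"), and lower scores are assumed to be better, which is consistent with a cutoff named as a maximum.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Orientation:
    name: str
    score: float  # assumption: lower is better, as implied by a *_maximum cutoff


def select_orientations(orientations: List[Orientation],
                        write_orientations: bool,
                        rank_total: Optional[int],
                        score_maximum: Optional[float]) -> List[Orientation]:
    """Hypothetical sketch of which orientations get written out."""
    if not orientations:
        return []
    if not write_orientations:
        # Only the single best orientation is recorded.
        return [min(orientations, key=lambda o: o.score)]
    if score_maximum is None:
        # write_orientations without scoring: every orientation is written.
        return list(orientations)
    if rank_total is not None:
        # Ranked list: best-scoring orientations first.
        return sorted(orientations, key=lambda o: o.score)[:rank_total]
    # Otherwise: all orientations passing the score cutoff, in generation order.
    return [o for o in orientations if o.score <= score_maximum]
```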

Automated Matching

With automated_matching, dock performs the same amount of orientation searching on each molecule. If the match_receptor_sites parameter is set, then Manual Matching is used as a black-box engine for the orientation search; otherwise, a Random Search is performed. The only sampling parameter needed is maximum_orientations, the number of desired orientations that must survive the bump filter. Matches are formed in order of increasing distance error, so the highest-quality orientations are generated first. This method of control is very easy to use and is most appropriate when docking a single molecule. It should not be used for database docking: manual matching performs better there because it biases the amount of sampling according to the size and shape of each ligand. In addition, advanced matching features such as Chemical Matching and Critical Points require manual matching.
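The stopping rule for automated matching can be sketched as follows. This is an illustration under a simplifying assumption: candidate matches are taken to be already enumerated together with their distance errors, whereas dock generates them incrementally. The function name automated_matching and the passes_bump_filter callback are hypothetical; only the maximum_orientations parameter comes from the text.

```python
import heapq
from typing import Callable, Iterable, List, Tuple, TypeVar

M = TypeVar("M")


def automated_matching(candidates: Iterable[Tuple[float, M]],
                       passes_bump_filter: Callable[[M], bool],
                       maximum_orientations: int) -> List[M]:
    """Examine matches by increasing distance error and stop once
    maximum_orientations of them have survived the bump filter."""
    # The enumeration index breaks ties so match objects are never compared.
    heap = [(err, i, m) for i, (err, m) in enumerate(candidates)]
    heapq.heapify(heap)
    kept: List[M] = []
    while heap and len(kept) < maximum_orientations:
        _err, _i, match = heapq.heappop(heap)
        if passes_bump_filter(match):
            kept.append(match)
    return kept
```

Because the heap always yields the smallest remaining error, the best-fitting matches are tried first and the search ends as soon as the quota is met.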

Manual Matching

If the match_receptor_sites parameter is set but automated_matching is not, then manual matching is performed. It is controlled by the matching parameters listed in Table 4, which provide an intuitive way to control sampling. When multiple molecules are docked, matching biases sampling towards molecules whose internal distances more closely resemble the distances between receptor site points. The additional chemical and critical-point matching constraints provide a way to prune matching and further bias sampling towards more interesting molecules.

Table 4. Description of Matching Parameters

Parameter	Description
distance_tolerance	The distance tolerance can be viewed as the uncertainty in the distance comparisons or in the site point positions. The larger the tolerance, the more sampling is performed. This should be the first parameter to adjust if you need to change the amount of sampling.
distance_minimum	The distance minimum allows matching to focus on the longer distances, which convey more information about molecule or site shape. It can conveniently be set large enough to discard pairs of atoms that are directly bonded to each other. When docking large molecules, it can be set higher.
nodes_minimum	The minimum number of nodes must be at least three to specify a unique rigid transformation. A value of four or more allows every match to include information about chirality, and match chirality can be used to explore the mirror image of a molecule for docking. The higher this parameter, the better the ligand atoms in a match represent the entire molecule.
nodes_maximum	This value may be set arbitrarily high to prevent it from influencing matching. It may be set equal to nodes_minimum when performing pharmacophore-style matching, if only a few specific site interactions are of interest.
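To show how the four parameters in Table 4 interact, here is a brute-force sketch of distance-based matching. It is not the algorithm of Ewing and Kuntz, which is far more efficient and includes degeneracy checking; it simply enumerates every set of (ligand atom, site point) pairs whose internal distances agree. The function find_matches is hypothetical, and applying distance_minimum to both the ligand and the site distance is an assumption.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Node = Tuple[int, int]  # (ligand atom index, site point index)


def find_matches(ligand: Sequence[Point], sites: Sequence[Point],
                 distance_tolerance: float, distance_minimum: float,
                 nodes_minimum: int, nodes_maximum: int) -> List[List[Node]]:
    """Enumerate sets of mutually distance-compatible (atom, site) pairs.

    Brute-force illustration only: every qualifying subset is reported,
    with no degeneracy checking.
    """
    nodes = [(i, j) for i in range(len(ligand)) for j in range(len(sites))]

    def compatible(a: Node, b: Node) -> bool:
        (i, j), (k, l) = a, b
        if i == k or j == l:  # each atom and each site point used at most once
            return False
        d_lig = math.dist(ligand[i], ligand[k])
        d_site = math.dist(sites[j], sites[l])
        # distance_minimum: skip short, low-information distances
        # (applied to both distances here, an assumption).
        if d_lig < distance_minimum or d_site < distance_minimum:
            return False
        # distance_tolerance: allowed uncertainty in the comparison.
        return abs(d_lig - d_site) <= distance_tolerance

    matches: List[List[Node]] = []

    def extend(match: List[Node], start: int) -> None:
        if len(match) >= nodes_minimum:
            matches.append(list(match))
        if len(match) == nodes_maximum:
            return
        for n in range(start, len(nodes)):
            if all(compatible(nodes[n], m) for m in match):
                match.append(nodes[n])
                extend(match, n + 1)
                match.pop()

    extend([], 0)
    return matches
```

For example, with a 3-4-5 right triangle of ligand atoms and an identical but translated triangle of site points, a tolerance of 0.1 and nodes_minimum = nodes_maximum = 3 yield only the identity pairing, because the three distances are distinct. Raising distance_minimum above 3 removes that match entirely.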

Random Search

The random_search option is intended for advanced users. If match_receptor_sites is also set, then random matching is performed, in which ligand centers and receptor site points are paired at random regardless of internal distances. Otherwise, a random transformation search is performed, in which ligands are randomly rotated and translated within the rectangular box enclosing all the site points. Either method may be useful when the user is concerned about the quality of the site point positions, or simply wants to try a richer set of generated orientations.
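A random transformation of the kind described above can be sketched in a few lines. This is an illustration, not dock's implementation: the function names are hypothetical, the rotation is drawn with Shoemake's uniform random quaternion method, and "within the box" is interpreted (as an assumption) as placing the ligand's centroid at a uniformly random point in the axis-aligned box enclosing the site points.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def random_rotation_matrix(rng: random.Random) -> List[List[float]]:
    """Uniformly random rotation via a random unit quaternion (Shoemake)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    w, x = a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2)
    y, z = b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def random_transformation(ligand: Sequence[Point], sites: Sequence[Point],
                          rng: random.Random) -> List[Point]:
    """Rotate the ligand about its centroid, then move the centroid to a
    random point inside the bounding box of the site points (assumption)."""
    n = len(ligand)
    centroid = [sum(p[k] for p in ligand) / n for k in range(3)]
    lo = [min(s[k] for s in sites) for k in range(3)]
    hi = [max(s[k] for s in sites) for k in range(3)]
    target = [rng.uniform(lo[k], hi[k]) for k in range(3)]
    rot = random_rotation_matrix(rng)
    placed: List[Point] = []
    for p in ligand:
        v = [p[k] - centroid[k] for k in range(3)]
        r = [sum(rot[row][c] * v[c] for c in range(3)) for row in range(3)]
        placed.append((r[0] + target[0], r[1] + target[1], r[2] + target[2]))
    return placed
```

Because the transformation is rigid, every internal ligand distance is preserved; only the pose changes.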

Site Point Construction

The random_search option is also useful for exploring issues of site point construction. As discussed in Ewing and Kuntz [6], random matching and random transformation both served as useful control algorithms for testing the effectiveness of distance-based matching. The relative performance of random matching with respect to random transformation indicates how well the site points map out the relevant volume of the active site. The relative performance of distance-based matching with respect to random matching indicates how well the individual site point positions correspond to good ligand atom positions. By using both of these search methods, an advanced user may quantify the quality of site points constructed by methods other than sphgen.

The random transformation search may in fact be used to construct site points that supplement those from sphgen. Using this search, the user may probe a site with different molecular probes, much like the atomic probes used in Goodford's GRID program. The best-scoring probe positions may then be used to place site points.

Chemical Matching

The chemical_match feature is used to incorporate information about the chemical complementarity of a ligand orientation into the matching process. As in Kuhl et al. [15], chemical labels are assigned to site points and ligand atoms. The site point labels are based on the local receptor environment. The ligand atom labels are based on user-adjustable chemical functionality rules. These labeling rules are identified with the chemical_definition_file parameter and reside in an editable file (see chem.defn on page 106). A node in a match (a paired ligand atom and site point) produces an unfavorable interaction if its atom and site point have labels that violate a chemical matching rule. The chemical matching rules are identified with the chemical_match_file parameter and reside in an editable file (see chem_match.tbl on page 107). If a match would produce unfavorable interactions, then the match is discarded. The speed-up from this technique depends on how extensively the site points have been labeled and on the stringency of the match rules, but an improvement of two-fold or more can be expected.
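The discard rule above amounts to checking every node of a match against a rule table. The sketch below is illustrative only: match_is_chemically_allowed is a hypothetical name, the rule table is modeled as a set of permitted (ligand label, site label) pairs rather than the actual chem_match.tbl format, and treating an unlabeled atom or site point as unconstrained is an assumption.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Node = Tuple[int, int]  # (ligand atom index, site point index)


def match_is_chemically_allowed(match: Iterable[Node],
                                ligand_labels: Dict[int, str],
                                site_labels: Dict[int, str],
                                allowed: Set[Tuple[str, str]]) -> bool:
    """Return False if any node pairs labels that violate the rule table."""
    for atom, site in match:
        atom_label: Optional[str] = ligand_labels.get(atom)
        site_label: Optional[str] = site_labels.get(site)
        if atom_label is None or site_label is None:
            # Unlabeled atom or site point: no chemical constraint (assumption).
            continue
        if (atom_label, site_label) not in allowed:
            return False  # unfavorable interaction: the whole match is discarded
    return True
```

This also suggests why the speed-up depends on labeling: the more site points carry labels, the more matches the check can reject.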

The process of labeling site points must currently be done by hand. The user should load the site points and the receptor coordinates into a graphics program and study the local environment of each point. Developing an automated method to perform this task is still an active area of research. Labeled site points may be input as either an SPH-format or a SYBYL MOL2-format coordinate file. See sphgen on page 84 for the file format specifications. An example is shown in Table 5. To store labeled site points in a MOL2 file, select an atom type for each label of interest, then edit the chem.defn file to include the selected atom types. Site point definitions can be distinguished from ligand atom definitions by explicitly requiring that no atoms be bonded to them (i.e., by following the definition with [*]). The example chem.defn on page 106 includes a site point definition as the last definition for each label. Using the convention in that example file, site points should be labeled as follows: hydrophobic, "C.3"; donor, "N.4"; acceptor, "O.2"; polar, "F".

Table 5. Example of chemical labels in SPH format

DOCK 3.5 receptor_spheres
color hydrophobic 1
color acceptor 2
color donor 3
cluster 1 number of spheres in cluster 49
7 2.34500 36.49000 16.93500 1.500 0 0 1
8 -0.05200 42.29900 14.18800 1.500 0 0 1
9 -0.67000 41.20600 11.59800 1.500 0 0 1
17 -6.00000 34.00000 17.00000 1.500 0 0 3
18 -5.00000 29.00000 22.00000 1.500 0 1 3
...
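The labeled example in Table 5 can be read with a short parser. This is a sketch under stated assumptions: parse_labeled_sph and SitePoint are hypothetical names; fields are split on whitespace, whereas the SPH format is fixed-column (see the sphgen format specification), so a production reader should use column positions; and the three trailing integers on each sphere line are assumed to be, in order, an atom number, the critical-cluster number, and the color index defined by the color lines.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SitePoint:
    number: int
    xyz: Tuple[float, float, float]
    radius: float
    critical_cluster: int  # assumption: 0 means "not a critical point"
    color: int             # index into the color (label) table


def parse_labeled_sph(text: str) -> Tuple[Dict[int, str], List[SitePoint]]:
    """Read color labels and sphere records from SPH-style text."""
    colors: Dict[int, str] = {}
    spheres: List[SitePoint] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "color" and len(fields) == 3:
            colors[int(fields[2])] = fields[1]  # e.g. "color donor 3"
        elif len(fields) == 8 and fields[0].isdigit():
            spheres.append(SitePoint(
                number=int(fields[0]),
                xyz=(float(fields[1]), float(fields[2]), float(fields[3])),
                radius=float(fields[4]),
                critical_cluster=int(fields[6]),
                color=int(fields[7]),
            ))
        # Header, "cluster ..." and "..." lines are skipped.
    return colors, spheres
```

Under these assumptions, spheres 17 and 18 in Table 5 both carry the donor label, and sphere 18 additionally belongs to critical cluster 1.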

Caveats on Chemical Matching

It can take a significant amount of effort to chemically label a large site and to verify that the docking results are as expected. If you use chemical matching, plan to spend time on preparation and validation BEFORE running an entire database of molecules.

In concert with Degeneracy Checking, chemical matching can discard not only matches that contain bad interactions but also matches that could be expanded to include other bad interactions. Although this helps reduce the bad interactions in an orientation, it can only do so within the constraints of the distance_tolerance, which can be rather tight. In addition, the number of interactions monitored in a match is usually small (3-5) compared to the total number of ligand atoms, so most atoms may still be in less than favorable environments. Therefore, chemical matching does not guarantee that all resulting orientations are chemically complementary, only that the resulting orientations are enriched in complementarity.

The ultimate arbiter of which orientations of a ligand are saved is the scoring function. If the scoring function cannot penalize what the user considers bad chemical interactions, then any improvement from chemical matching will probably be obscured. In addition, if score optimization is used, the orientation will be perturbed from its original chemically matched position to a new, score-preferred position.

Critical Points

The critical_points feature is used to focus the orientation search on a subsite of the receptor active site [4, 23]. For example, the chief interest might be identifying molecules that interact with the catalytic residues. Any number of points may be identified as critical, and these points may be grouped into any number of clusters, so several receptor subsites may be targeted simultaneously. If a particular cluster of critical points is big enough to interact with more than one ligand atom, then use the multiple_points parameter. An alternative to critical points is to discard all site points farther than some distance from the subsite of interest, while retaining enough site points to define unique ligand orientations.
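A critical-point constraint acts as a filter on candidate matches. The sketch below assumes that a match must include at least one site point from every critical cluster, with cluster number 0 meaning "not critical" as in the SPH example of Table 5; the function name and the dictionary layout are hypothetical, and the effect of multiple_points is not modeled.

```python
from typing import Dict, Iterable, Set, Tuple

Node = Tuple[int, int]  # (ligand atom index, site point index)


def satisfies_critical_points(match: Iterable[Node],
                              site_cluster: Dict[int, int]) -> bool:
    """site_cluster maps site point number -> critical cluster (0 = none).

    Assumption: every critical cluster must be hit by at least one node.
    """
    required: Set[int] = {c for c in site_cluster.values() if c != 0}
    hit: Set[int] = {site_cluster.get(site, 0) for _atom, site in match}
    return required <= hit
```

Filtering candidate matches this way discards any orientation that never touches the subsite of interest, which is how the feature reduces the amount of matching.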

This feature can be highly effective, reducing the amount of matching five-fold or more. It is particularly useful to also assign chemical labels to the critical points to further focus sampling.