Version 4.0 of dock has a new orientation search algorithm, also called a matching algorithm, which is more robust than before (see Ewing and Kuntz [6]). An orientation search is requested with the orient_ligand parameter. The published search technique has been further extended so that the amount of orientation sampling can be controlled in two ways: automated matching and manual matching, both described below.
There are a number of sophisticated options available to tailor the orientation search. These options include random search, degeneracy checking, ligand reflection, chemical matching, and critical points.
Multiple orientations may be written out for each molecule using the write_orientations parameter; otherwise, only the best orientation is recorded. A ranked list of the orientations may be written using the rank_orientations parameter. If ranking is not requested, all orientations passing a score cutoff are written out. The score cutoff is specified with the contact_maximum parameter for contact scoring, and with the corresponding parameter for each other type of scoring. If write_orientations is requested without scoring, then all orientations are written.
With automated_matching, dock performs the same amount of orientation searching on each molecule. If the match_receptor_sites parameter is set, then Manual Matching is used as a black-box engine for the orientation search (otherwise a Random Search is performed). The only sampling parameter needed is maximum_orientations, the desired number of orientations that survive the bump filter. Matches are formed in order of increasing distance error, so the highest-quality orientations are generated first. This method of control is the simplest to use and is most appropriate when docking a single molecule. It should not be used for database docking, where manual matching performs better because it biases the amount of sampling according to the size and shape of each ligand. In addition, if the user wishes to use advanced matching features, such as Chemical Matching and Critical Points, then manual matching must be used.
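The control flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not dock's implementation: the function name, the `(distance_error, match)` tuples, and the `passes_bump_filter` callback are all hypothetical, and the candidate matches are assumed to be available up front.

```python
def automated_orientations(candidates, passes_bump_filter, maximum_orientations):
    """Collect matches in order of increasing distance error until
    maximum_orientations of them have survived the bump filter.

    candidates: iterable of (distance_error, match) tuples (hypothetical layout).
    passes_bump_filter: callable taking a match and returning True or False.
    """
    if maximum_orientations <= 0:
        return []
    accepted = []
    # Smallest distance error first, as described in the text.
    for _error, match in sorted(candidates, key=lambda c: c[0]):
        if passes_bump_filter(match):
            accepted.append(match)
            if len(accepted) == maximum_orientations:
                break
    return accepted
```

Because the stopping rule counts only orientations that pass the bump filter, every molecule ends with the same number of surviving orientations whenever enough candidates exist.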
If the match_receptor_sites parameter is set but not the automated_matching parameter, then manual matching is performed. It is controlled by the match parameters listed in Table 4. The matching parameters provide an intuitive way to control sampling. When multiple molecules are docked, matching will bias sampling towards molecules with more internal distance similarity with the receptor site points. The additional chemical and critical matching constraints provide a way to prune matching and further bias sampling towards more interesting molecules.
The random_search option is intended for advanced users. If match_receptor_sites is also set then random matching is performed, in which ligand centers and receptor sites are randomly matched regardless of internal distances. Otherwise, a random transformation search is performed, in which ligands are randomly rotated and translated within the rectangular box enclosing all the site points. Both methods could be employed when the user is concerned about the quality of the site point positions, or would simply like to try a richer set of generated orientations.
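A minimal Python sketch of the random transformation idea follows. It assumes ligand atoms and site points are plain (x, y, z) tuples, rotates the ligand about its centroid, and places the centroid at a random position inside the box enclosing the site points. The function names are hypothetical, and the choice to position the ligand by its centroid is an assumption made for illustration rather than a description of dock's code.

```python
import math
import random

def bounding_box(points):
    """Return (min_corner, max_corner) of the axis-aligned box enclosing points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def random_rotation(rng):
    """Return a 3x3 rotation matrix built from a uniformly random unit quaternion."""
    # A normalised 4D Gaussian sample is uniformly distributed on the unit 3-sphere.
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def random_orientation(ligand, site_points, rng):
    """Rotate the ligand about its centroid, then move the centroid into the box."""
    n = len(ligand)
    centroid = [sum(atom[i] for atom in ligand) / n for i in range(3)]
    lo, hi = bounding_box(site_points)
    target = [rng.uniform(lo[i], hi[i]) for i in range(3)]
    rot = random_rotation(rng)
    placed = []
    for atom in ligand:
        d = [atom[i] - centroid[i] for i in range(3)]
        placed.append(tuple(
            sum(rot[r][c] * d[c] for c in range(3)) + target[r]
            for r in range(3)
        ))
    return placed
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing this search against distance-based matching.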
The random_search option is useful for exploring issues relating to site point construction. As discussed in Ewing and Kuntz [6], both random matching and random transformation were useful control algorithms to test the effectiveness of distance-based matching. The relative performance of random matching with respect to random transformation indicates how well the site points map out the relevant volume of the active site. The relative performance of distance-based matching with respect to random matching indicates how well individual positions of each site point correspond to good ligand atom positions. By using both of these search methods, an advanced user may quantify the quality of site points constructed by alternative methods to sphgen.
The random transformation search may in fact be used to construct site points to supplement those from sphgen. Using this search, the user may probe a site with different molecular probes much like the atomic probes used in Goodford's grid program. The best-scoring positions may then be used to position site points.
Degeneracy checking is a method implemented during matching to increase the diversity of the resulting orientations. It is selected with the check_degeneracy parameter. It is not an available feature if automated_matching has been selected. The method of Gschwend and Kuntz [11] implemented in dock version 3.5 has been updated to be easier to use and more robust. Degenerate matches are now defined as matches which are a subset of a larger match. In the nomenclature of graph theory, the surviving matches are maximally connected and are true cliques.
For degeneracy checking to work, nodes_maximum must be greater than nodes_minimum so that subsets can occur. In general, just set nodes_maximum arbitrarily high (15 or so). At most a two-fold reduction in matches is achieved using this feature.
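The subset definition of degeneracy can be illustrated with a short Python sketch. It assumes each match is a collection of (ligand center, site point) pairs; that representation and the function name are hypothetical, and the sketch shows only the subset test, not how dock applies it during matching.

```python
def remove_degenerate(matches):
    """Keep only matches that are not a proper subset of another match.

    Each match is an iterable of (ligand_center, site_point) pairs.
    Exact duplicates are collapsed to a single match.
    """
    unique = {frozenset(m) for m in matches}
    # Visit larger matches first, so any surviving superset is already in `kept`.
    ordered = sorted(unique, key=len, reverse=True)
    kept = []
    for m in ordered:
        if not any(m < bigger for bigger in kept):  # `<` is proper-subset
            kept.append(m)
    return kept


matches = [
    [(1, "A"), (2, "B"), (3, "C")],
    [(1, "A"), (2, "B")],            # subset of the first match: degenerate
    [(4, "D"), (5, "E"), (6, "F")],
]
survivors = remove_degenerate(matches)
```

In this example the two-node match is dropped because it is contained in the first three-node match, which is why subsets can only be detected when matches of different sizes are allowed.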
When a match contains four or more nodes, the chirality of the ligand and receptor points involved in the match is checked. Half of the time, the ligand and receptor points have opposite chirality. See Ewing and Kuntz [6] for more discussion. Normally these improper matches are discarded, but they can be rescued with the reflect_ligand option, which allows the chirality of the ligand to be reversed by using its mirror image. This is useful for molecules which are either achiral or are available as a racemate.
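One common way to compare the handedness of two sets of four points is the sign of the tetrahedron's signed volume. The Python sketch below uses that test as an assumption; the source does not state which test dock applies, and the function names are hypothetical. Reflection is shown as negating one coordinate, which produces the mirror image.

```python
def signed_volume(a, b, c, d):
    """Six times the signed volume of tetrahedron abcd (scalar triple product)."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def same_chirality(ligand_pts, site_pts):
    """Compare handedness of the first four matched points, taken in match order.

    Coplanar point sets (zero volume) have no handedness and are treated
    as compatible here; that choice is an assumption of this sketch.
    """
    return signed_volume(*ligand_pts[:4]) * signed_volume(*site_pts[:4]) >= 0

def reflect(points):
    """Return the mirror image of the points through the yz-plane."""
    return [(-x, y, z) for x, y, z in points]


site = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
ligand = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, -1)]
```

Here `ligand` and `site` have opposite handedness, so the match would be improper, while the reflected ligand has the same handedness as the site points.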
The chemical_match feature is used to incorporate information about the chemical complementarity of a ligand orientation into the matching process. As in Kuhl et al. [15], chemical labels are assigned to site points and ligand atoms. The site point labels are based on the local receptor environment. The ligand atom labels are based on user-adjustable chemical functionality rules. These labeling rules are identified with the chemical_definition_file parameter and reside in an editable file (see chem.defn on page 106). A node in a match will produce an unfavorable interaction if the atom and site point components have labels which violate a chemical match rule. The chemical matching rules are identified with the chemical_match_file parameter and reside in an editable file (see chem_match.tbl on page 107). If a match will produce unfavorable interactions, then the match is discarded. The speed-up from this technique depends on how extensively site points have been labeled and on the stringency of the match rules, but an improvement of two-fold or more can be expected.
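The node-by-node rule check can be sketched in Python as follows. The rule table here is purely illustrative and is not the content of chem_match.tbl; the label names, the dictionary layout, and the function names are assumptions made for this example.

```python
# Hypothetical rule table: unordered label pairs treated as unfavorable.
# These pairs are illustrative only and are NOT taken from chem_match.tbl.
FORBIDDEN = {
    frozenset(["donor"]),                    # donor facing donor
    frozenset(["acceptor"]),                 # acceptor facing acceptor
    frozenset(["hydrophobic", "donor"]),
    frozenset(["hydrophobic", "acceptor"]),
}

def node_is_unfavorable(atom_label, site_label, forbidden=FORBIDDEN):
    """Return True if pairing these two labels violates a match rule."""
    # Unlabeled atoms or site points (None) never trigger a rule.
    if atom_label is None or site_label is None:
        return False
    return frozenset([atom_label, site_label]) in forbidden

def match_passes(match, atom_labels, site_labels, forbidden=FORBIDDEN):
    """Discard a match as soon as any node is an unfavorable pairing."""
    return not any(
        node_is_unfavorable(atom_labels.get(a), site_labels.get(s), forbidden)
        for a, s in match
    )


atom_labels = {1: "donor", 2: "hydrophobic", 3: "acceptor"}
site_labels = {"A": "acceptor", "B": "hydrophobic", "C": "acceptor"}
```

Unlabeled site points never cause a rejection in this sketch, which mirrors the observation that the pruning achieved depends on how extensively the site points have been labeled.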
The process of labeling site points must currently be done by hand. The user should load the site points and the receptor coordinates into a graphics program and study the local environment of each point. Developing an automated method to perform this task is still an active area of research. Labeled site points may be input as either an SPH format or a SYBYL MOL2 format coordinate file. See sphgen on page 84 for file format specifications. An example is shown in Table 5. To store labeled site points in a MOL2 file, select an atom type for each label of interest. Then edit the chem.defn file to include the selected atom types. Site point definitions can be distinguished from ligand atom definitions by explicitly requiring that no bonded atoms can be attached (i.e., followed by [*]). The example chem.defn on page 106 includes a site point definition as the last definition for each label. Using the convention in that example file, site points should be labeled as follows: hydrophobic, "C.3"; donor, "N.4"; acceptor, "O.2"; polar, "F".
It can take a significant amount of effort to chemically label a large site and to verify that the docking results are what were expected. If you use chemical matching, plan to spend some time in preparation and validation BEFORE running an entire database of molecules.
In concert with Degeneracy Checking, chemical matching is able to discard matches that not only contain bad interactions but that can be expanded to include other bad interactions. Although this helps reduce the bad interactions in an orientation, it can only do so within the constraints of the distance_tolerance, which can be rather tight. In addition, the number of interactions monitored in a match is usually small (3-5) compared to the total number of ligand atoms, so the preponderance of atoms may be in less than favorable environments. Therefore, chemical matching does not guarantee that all resulting orientations are chemically complementary, but instead that the resulting orientations are enriched in complementarity.
It must be pointed out that the ultimate arbiter of which orientations of a ligand are saved is actually the scoring function. If the scoring function is unable to discriminate what the user feels are bad chemical interactions, then any improvement with chemical matching will probably be obscured. In addition, if score optimization is used, then the orientation will be perturbed from the original chemically-matched position to a new score-preferred position.
The critical_points feature is used to focus the orientation search into a subsite of the receptor active site [4, 23]. For example, identifying molecules that interact with the catalytic residues might be of chief interest. Any number of points may be identified as critical, and any number of groupings of these points may be identified. Consequently, several receptor subsites may be targeted simultaneously. If a particular cluster of critical points is big enough to interact with more than one ligand atom, then use the multiple_points parameter. An alternative to using critical points is to discard all site points that are some distance away from the subsite of interest, while retaining enough site points to define unique ligand orientations.
This feature can be highly effective, reducing the amount of matching by five-fold or more. It is particularly useful to also assign chemical labels to the critical points to further focus sampling.
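A critical-point filter might be sketched in Python as below. The sketch assumes that a match is retained only if it includes at least one site point from every critical cluster; that rule, the data layout, and the function name are assumptions for illustration and are not confirmed by the text above.

```python
def satisfies_critical(match, clusters):
    """Return True if the match touches every critical cluster.

    match: iterable of (ligand_center, site_point) pairs.
    clusters: list of sets of site points, one set per critical cluster.
    Assumes one matched point per cluster is sufficient.
    """
    matched_sites = {site for _center, site in match}
    return all(matched_sites & cluster for cluster in clusters)


# Two hypothetical subsites, each defined by a small cluster of site points.
clusters = [{"S1", "S2"}, {"S7"}]
inside = [(1, "S2"), (2, "S5"), (3, "S7")]
outside = [(1, "S2"), (2, "S5"), (3, "S6")]
```

With these clusters, `inside` is retained because it reaches both subsites, while `outside` is rejected because it misses the second cluster.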