DOCK FAQ: Current Dockers

A DOCKumentation supplement

rev. 31 August 1995

Current DOCK Clients

Do I really need QCPE's version of MS? Where can I get it?

You need some version of MS if you plan on using SPHGEN to generate site points for docking (the standard approach; however, see below).

References: Science 221; J. Appl. Cryst.; MidasPlus display software from the UCSF Computer Graphics Lab.

Tel. (812) 855-5539
Fax. (812) 855-4784
Eml. qcpe@ucs.indiana.edu

Do I need to use SPHGEN? What are some alternative methods of generating site points?

It is important to realize that DOCK uses only the centers of the spheres in the docking process; no other information about the spheres (e.g. radius) is used. These sphere centers are more generally referred to as site points, as they are in fact only 3D coordinates of points within the target site. Site points need not be generated by SPHGEN, although we recommend this method as it does an excellent job of capturing shape information about the site of interest. Any method which results in points throughout the site could be used. For example, you could use the coordinates of a known ligand; you could use points along a solvent-accessible surface of the receptor; you could use a grid of points in the site if you had years of CPU time... Alternatives that take into account the chemistry of the site are also possible, for example, Goodford's GRID program. Interaction "hotspots" generated by this program can be used as surrogate sphere centers (we in fact provide a few tools to help interface with the GRID program). Be imaginative - you needn't use SPHGEN.
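As one illustration of the "known ligand" route, here is a minimal Python sketch that pulls ligand atom coordinates out of PDB-format text so they can serve as candidate site points. The function names are hypothetical, and selecting HETATM records by residue name is an assumption about how the ligand is stored in your file. The sketch only returns coordinates; it does not write any SPHGEN or DOCK file format.

```python
def format_hetatm(serial, atom_name, res_name, x, y, z):
    """Build a fixed-column PDB HETATM line (demo helper, hypothetical)."""
    return "HETATM%5d %-4s %3s A%4d    %8.3f%8.3f%8.3f" % (
        serial, atom_name, res_name, 1, x, y, z)


def site_points_from_pdb(pdb_text, residue_name=None):
    """Return (x, y, z) tuples for HETATM records.

    If residue_name is given, only records with that residue name are kept.
    """
    points = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue  # skip receptor ATOM records and everything else
        if residue_name is not None and line[17:20].strip() != residue_name:
            continue
        # PDB fixed columns (1-indexed): x = 31-38, y = 39-46, z = 47-54
        points.append((float(line[30:38]),
                       float(line[38:46]),
                       float(line[46:54])))
    return points


# Small made-up ligand, used only to exercise the function.
demo_pdb = "\n".join([
    format_hetatm(1, "C1", "LIG", 1.0, 2.5, -3.25),
    format_hetatm(2, "O1", "LIG", 4.0, 5.0, 6.0),
    format_hetatm(3, "O", "HOH", 9.0, 9.0, 9.0),
])
demo_points = site_points_from_pdb(demo_pdb, residue_name="LIG")
```

Whatever the source of the points, they still need to be written out in the sphere-file format your version of DOCK expects.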

What's all this stuff in my CHEMGRID OUTPARM file?

It is normal for many warnings to appear in OUTPARM. Non-polar hydrogens are NOT recognized by CHEMGRID, as a united-atom parameter set is used in which non-polar hydrogens are combined with their heavy-atom parent while polar hydrogens are retained. Thus, most of the warnings you see in OUTPARM will likely be for parameters not found for non-polar hydrogens - totally normal. For example:
```
	WARNING--parameters not found for
	ATOM     11  HA  THR     1 
```
A non-integral total charge at the bottom of the OUTPARM file is another matter: this is a real concern and should be looked into. To determine the expected total charge on your receptor, run the "charge" script (distributed with DOCK 3.5). To isolate which residues are in error, locate all entries in the OUTPARM file containing the string "CHARGED RESIDUE" (e.g. use grep). Only typically charged residues (i.e. Lys, His, Arg, Asp, Glu) and termini should appear, with charges of either +1 or -1. There are many possible causes for a non-integral total charge:
- Missing atoms. Often, experimentally observed structures contain flexible residues which cannot be modelled completely. This results in either missing atoms or entire missing sidechains. Check for such occurrences and fix them if possible. Commonly, missing sidechains are on the surface, so modelling in a conformation of the sidechain is not difficult, nor will it generally affect the electrostatics of active site interactions significantly, unless the missing sidechains are near the active site!
- Dual sidechain conformations. Often, experimentally observed structures contain residues which appear to take on more than one distinct conformation. These are frequently both entered into the PDB file, but naturally, only one should be present for docking studies. It is up to you to remove all but one of the sidechains for each such occurrence.
- Atom name problems. Check to see if the atom names for residues in error match the atom names expected in the parameter file you are using with chemgrid (e.g. prot.table.amcrg.pdbH). Frequently a minor change in atom names (often changing the justification of the atom name) can resolve the problem. For PDB files written out from SYBYL, it is advised that the program reformH (distributed with DOCK 3.5) be run to modify the atom names so they are suitable for chemgrid operation.
- Histidine residues. The protonation state of histidines is sometimes reflected by the PDB residue name (e.g. HIP, HIE, HIS), but sometimes NOT. Chemgrid will do its best to discern which state is present, but often the actual state dictated by the presence of hydrogens is not reflected in the residue name. The default behavior for chemgrid (I think) is to assume protonated histidines and the charge set accompanying these, so an uncharged histidine may come out with a non-integral charge (for example, histidines as output from SYBYL seem to come out at +0.514). It may be that you do not want some or all histidines protonated - this is your choice.
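
If grep is not to hand, the search for "CHARGED RESIDUE" entries can be sketched in a few lines of Python. This is only a sketch: the function names are hypothetical, the exact layout of OUTPARM lines is not assumed (lines are returned unparsed), and the tolerance used to decide whether a charge counts as integral is an arbitrary choice.

```python
def charged_residue_lines(outparm_text, marker="CHARGED RESIDUE"):
    """Return every line of the OUTPARM text that contains the marker string."""
    return [line for line in outparm_text.splitlines() if marker in line]


def is_integral(charge, tolerance=1e-3):
    """True if the charge is within `tolerance` of a whole number.

    The tolerance is an assumption; pick one that suits your parameter set.
    """
    return abs(charge - round(charge)) <= tolerance
```

A typical use would be `charged_residue_lines(open("OUTPARM").read())`, reading the reported charges off the returned lines by eye. A value such as the +0.514 histidine mentioned above fails `is_integral`, while +1 and -1 pass.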

What about using the Cambridge Structural Database with DOCK?

The following are a few thoughts on the philosophy behind using the Cambridge Structural Database (CSD) for docking. First, it can be misleading to believe that just because these structures have been crystallographically determined they are more likely to represent bioactive conformations. Often, the biologically observed bound conformation is not the global minimum energy conformation. It remains to be proven that using the CSD, as opposed to a database of rule-based conformations (e.g. those generated by CONCORD, such as the ACD, CMC or MDDR), is any more effective. Second, we have found that getting hold of CSD compounds can be tremendously difficult, with a very high attrition rate at the stage of obtaining compounds for assay. This should be a concern if you are using a compound database for the sake of "easy" access to biological characterization. We don't mean to deter you from using the CSD, only to highlight two common issues.

Bugs in sortDOCKout??

Yeah, yeah, I know. Because of the many and varied output types and formats of DOCK, detecting exactly how the output was generated is difficult. So...this is a complex way of saying that this program is likely not to be infallible... If you find problems, please check with the Kuntz group for the latest version.

How in the world do I use chemical matching (a.k.a. coloring)?!

There are three fundamental operations in using this feature:

- coming up with labels and deriving matching rules,
- assigning labels to ligand atoms, and
- assigning labels to receptor site points.

These points are addressed in some detail here:

Coming up with labels and matching rules. Labels can be anything you want them to be. The standard procedure is to have them embody ideas of chemical complementarity. For example, a set of labels might be hydrophobic, acceptor, donor, polar, plus, and minus (representing respectively, hydrophobic points, hydrogen bond acceptors, hydrogen bond donors, points which can both accept and donate hydrogen bonds, positively charged points, and negatively charged points). Simple matching rules might then be hydrophobic-hydrophobic, acceptor-donor, acceptor-polar, donor-polar, plus-minus. These would be "allowed" combinations amongst ligand atoms and receptor site points.

Some general points: labels can be anything and mean anything you want them to (you might have a set of labels "purple, green, red, yellow, blue", or a set "hot, warm, cold", whatever these might mean to you). The rules you set up define which labels can match with which other labels. The matching rules can use the same label more than once (for example, cold-hot and cold-warm). However, each ligand atom or receptor site point can only have one label assigned to it. By "matching labels", we mean that come docking time when one atom is supposed to be placed upon a particular site point, the program assesses whether the corresponding labels can match together depending upon the rules you have set forth. Not all atoms or site points need be assigned a label: unlabeled points match everything.
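
To make the idea of matching rules concrete, here is a minimal Python sketch of the example rule set given above. It is not how DOCK stores or evaluates rules internally; the names `ALLOWED_PAIRS` and `labels_match` are hypothetical, rules are assumed to be symmetric (acceptor-donor also permits donor-acceptor), and an unlabeled atom or site point is represented by `None`.

```python
# Each rule is an unordered pair of labels; a one-element frozenset
# stands for a label that may match itself (hydrophobic-hydrophobic).
ALLOWED_PAIRS = {
    frozenset(("hydrophobic",)),
    frozenset(("acceptor", "donor")),
    frozenset(("acceptor", "polar")),
    frozenset(("donor", "polar")),
    frozenset(("plus", "minus")),
}


def labels_match(ligand_label, site_label, allowed=ALLOWED_PAIRS):
    """Decide whether a ligand atom may be placed on a site point."""
    # Unlabeled atoms or site points match everything.
    if ligand_label is None or site_label is None:
        return True
    return frozenset((ligand_label, site_label)) in allowed
```

With these rules, `labels_match("donor", "acceptor")` is allowed, `labels_match("plus", "plus")` is rejected, and any pairing involving `None` is allowed.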

Assigning labels to ligand atoms. This is done with the mol2db program, which takes a MOL2 ligand database and a set of label definitions (see the ./examples/3dfr/colcrit/keymtx file for a fairly complex example). The output is a DOCK 3.5 database file containing ligands which have their atoms labeled according to the definitions you provided. If our programs do not suit your set-up, feel free to write your own labeling algorithms and programs, as long as the output format is DOCK 3.5 database format. If you are only working with one ligand, it may be most flexible to do this by hand!

Assigning labels to receptor site points. This is somewhat more complex, and consequently there are several alternatives.

- You can do this by hand. As there are typically only a few tens of site points, this is not a particularly tedious task. Manual labeling requires that you know some chemical information about the vicinity of the site point. The program showprobe may be useful for determining what types of atoms fare well in which regions of the target site. Many other features in commercial packages should be of use here too.
- You can use the colsph program distributed with DOCK 3.5. This program requires that you are familiar with the electrostatic potential (ESP) throughout the site. The program showesp will help in this regard. Running showesp will generate a PDB file from the chemgrid files, with the electrostatic potential at each grid point entered into the temperature factor column of the PDB file. Some graphics packages (e.g. MidasPlus) are capable of coloring atoms by their temperature factor, so this method provides a way of visualizing the potential throughout the site. Just scanning the values in the temperature factor column will give you an idea of the range the ESP takes on in your system. For running colsph, you will need a range file, which contains the names of the colors and the corresponding range of values they embody. An example (I'm just making this up) might be:

	negative -100 -3
	hydrophobic -3 3
	positive 3 100

  where the numbers represent ranges for the electrostatic potential that would correspond to the listed color, i.e. hydrophobic regions might be in areas of low absolute ESP. Assigning labels according to electrostatic potential can be difficult and may require some experimentation to get the desired results.
- You can use Goodford's GRID program. This will allow you to locate hotspots for a particular type of probe and assign labels to nearby site points accordingly. We distribute a few utilities to help interface to GRID, but these programs may require some customization.
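
The range-file idea used with colsph can be illustrated with a short Python sketch that maps an ESP value to a color name. This is not colsph itself: the function names are hypothetical, the parser simply reads whitespace-separated name/low/high triples, and the treatment of boundary values (low end included, high end excluded) is an assumption made for the example.

```python
def parse_ranges(text):
    """Parse 'name low high' triples separated by any whitespace."""
    tokens = text.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected name/low/high triples")
    ranges = []
    for i in range(0, len(tokens), 3):
        ranges.append((tokens[i], float(tokens[i + 1]), float(tokens[i + 2])))
    return ranges


def label_for_esp(value, ranges):
    """Return the first color whose range contains value, else None."""
    for name, low, high in ranges:
        # Half-open interval [low, high): an assumption, so that a value
        # sitting exactly on a shared boundary receives only one label.
        if low <= value < high:
            return name
    return None


# The made-up range example from the text above.
example_ranges = parse_ranges("negative -100 -3 hydrophobic -3 3 positive 3 100")
```

Running a handful of temperature-factor values from the showesp output through `label_for_esp` is a quick way to see how a proposed set of ranges would divide up the site before committing to it.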

Final comments: chemical matching is a very flexible tool and can address a lot of needs, but may be daunting to implement. Keep with it and you will probably be pleased with its operation. Also, do not think that chemical matching must be related to chemical complementarity. One advanced implementation might be aimed at finding mechanism-based inhibitors. For example, one could label certain reactive functional groups in ligands appropriately, and similarly label the centers in the target site with which they could react. By setting up matching of these labels, one could attempt to locate ligands which place reactive groups adjacent to, say, nucleophiles on the receptor. Be creative!

Are there "standard" bin sizes?

No! Every system is different, and you should experiment to see what works best. In general, larger bin sizes and/or larger overlaps increase sampling, effectively having DOCK "try harder". There are no standardized parameters which work for everything, sorry. Be careful when increasing the parameters, as small changes can have exponential effects on run-times. For cases when you have an experimentally determined binding mode for a ligand (e.g. substrate or inhibitor), make sure you can reproduce this with your chosen parameters before attempting a database screen. As a general rule of thumb, generate around 5,000-10,000 matches per ligand when using force-field score minimization, and at least 20,000 matches without optimization.

Curator: Malin Young, mmyoung@polonius.ucsf.edu