Preparing for a DOCK Run

Working with Macromolecular Models and Generating the Molecular Surface

Removing Ligands and Crystallographic Waters

The macromolecular structure you are working with may include a ligand, and crystal structures usually contain water molecules, and sometimes ions, that were found on the surface of the protein. These molecules are usually not included in the structure used to generate the molecular surface. To prepare for molecular surface generation, make a copy of the protein coordinate file. If there is a ligand present, remove it by deleting all of its records (they often start with HETATM in Brookhaven Protein Data Bank format files) from your copy of the file. (Note: sometimes, as in the case of a cofactor or catalytic metal ion, it may make chemical sense to keep a ligand in the PDB file.)

Whether or not crystallographic waters and ions should be preserved when generating surfaces for use by sphgen is a matter of some debate. In structures of complexes, water molecules and ions are often found in the protein binding pocket along with the ligand(s). However, ligands can displace waters and ions, and the volume of a receptor site will be explored more completely if the waters and ions are removed. If you have no particular reason for preserving any of the water molecules or ions in the crystal, it is probably best to remove all of them. Waters are usually located near the end of the PDB file and are often HETATM records with HOH or WAT residue types. Ions are often near the waters in the PDB file.

Please note that the PDB file used for generating the molecular surface should not include hydrogen atoms. NMR structures will include hydrogens; delete the hydrogens from a copy of each structure and use that copy in MS.
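The cleanup described above can be sketched in Python. This is a minimal sketch and not part of DOCK: the function names and the keep_resnames argument are illustrative, it assumes fixed-column PDB records, and the hydrogen test is a heuristic (the element column when it looks valid, otherwise an atom name that begins with H after any leading digits).

```python
WATER_NAMES = {"HOH", "WAT"}


def is_hydrogen(line):
    """Heuristic hydrogen test for an ATOM/HETATM record."""
    element = line[76:78].strip().upper()
    if element.isalpha():
        # Columns 77-78 hold the element symbol in newer PDB files.
        return element in {"H", "D"}
    # Older files lack an element column; fall back to the atom name.
    name = line[12:16].strip().upper()
    return name.lstrip("0123456789").startswith("H")


def strip_for_surface(lines, keep_resnames=()):
    """Drop waters, other HETATM groups, and hydrogens from PDB records.

    keep_resnames lists HETATM residue names (for example a cofactor or
    a catalytic metal ion) that should stay in the structure.
    """
    kept = []
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            resname = line[17:20].strip()
            if resname in WATER_NAMES:
                continue
            if record == "HETATM" and resname not in keep_resnames:
                continue
            if is_hydrogen(line):
                continue
        kept.append(line)
    return kept
```

Read the copied PDB file into a list of lines, pass it through strip_for_surface, and write the result to a new file. Inspect the output before using it with MS, since the hydrogen heuristic can misjudge unusual atom names.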

Creating the Molecular Surface

The dot surface which will be used to produce spheres is generated by the program MS (available from QCPE). When setting up for docking, it is acceptable just to generate a surface for the site of interest and adjacent regions (see documentation for get_near_res and autoMS); this will also reduce the computer time used by sphgen. The surface points must have associated normals.

If you use the QCPE version of MS, you must run reformatms to convert the surface to the format used by sphgen (both formats are described in the reference manual section on reformatms). reformatms is interactive and requires the surface and the PDB file used to generate the surface.

Users of the UCSF MidasPlus package may use the output from the dms program directly as input for sphgen.

Representing the Site with Spheres

sphgen

sphgen uses the points of the molecular surface and their associated normals to determine spheres to fill the site. It then reduces the number of spheres to one per atom and groups them into clusters. You can inspect these clusters and regroup the spheres if necessary.

Creating `INSPH`

The parameters which tell sphgen exactly how to create the spheres are placed in a file called INSPH, which must be present when sphgen is run. The contents of this file are described in the reference manual. To create it, make a file with each variable on a separate line. Most of the parameter values given in the reference manual should work fine. You will need to replace msfil with the name of your surface file and outfil with the desired name of your output file.

Running sphgen

sphgen must use the directory containing INSPH as its working directory; this means that it should be started while you are in that directory. The sphgen output file contains clusters of spheres which have been selected and grouped by sphgen; the clusters are listed in order of decreasing size. The last cluster, numbered 0, contains all the spheres produced. It may be used with the program cluster to make new sphere clusters if the original clustered output doesn't describe the site well.
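The working-directory requirement can be enforced from a script. The sketch below is illustrative only: run_sphgen is a hypothetical helper, and it assumes that the sphgen executable is on your PATH and needs no input other than INSPH.

```python
import os
import subprocess


def run_sphgen(workdir, executable="sphgen"):
    """Run sphgen with workdir as its working directory."""
    insph = os.path.join(workdir, "INSPH")
    if not os.path.isfile(insph):
        raise FileNotFoundError("INSPH not found in " + workdir)
    # cwd makes workdir the working directory of the child process.
    return subprocess.run([executable], cwd=workdir, check=True)
```

Checking for INSPH first gives a clear error before the program is started in the wrong place.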

Looking at the Output

Once you've generated spheres, you should look at the sphere clusters using a molecule display program. showsphere may be used to generate a PDB-like file of sphere centers for display. It can also generate a surface for the sphere cluster (in the MS format used by sphgen). showsphere is interactive. You will be prompted for the name of the cluster file (that is, the sphgen output), the number of the cluster, and names for the desired output files. In the PDB-like file of sphere center coordinates, each sphere is a separate residue and the spheres are separated by TER cards.

Getting a Good Sphere Cluster

Displaying the protein and sphere centers together should tell you how each sphere cluster is related to the site you are trying to represent. Examine sphere clusters until you find one that occupies the region (or regions) into which you want to dock ligands. Clusters of 50 or fewer spheres are best; larger numbers of spheres will cause DOCK to use more computer time. It is generally unwise to try docking with more than 100 spheres, although you may be able to use more if your database is small or you are using chemical matching. Initial sphere clusters are sometimes spider-like structures which include the area of interest but also branch into other regions. If your cluster has too many spheres, branches out, or is unsatisfactory for some other reason, you can correct the problem.

The easiest way to fix a sphere cluster is to use graphics to identify spheres that you don't really need, then remove them. When you've found the unnecessary ones, go back to the original sphere cluster file (i.e. the one from sphgen) and delete the corresponding lines - the residue number in the PDB-like file of centers is the first number in the line in the sphere file. Remember to change the number of spheres listed on the line with the cluster number to reflect the deletions.
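The hand edit described above (delete the sphere lines, then correct the count) can be sketched as follows. This is a sketch under stated assumptions, not a DOCK utility: remove_spheres is a hypothetical helper, and it assumes that each cluster header line begins with the word cluster followed by the cluster number and ends with the sphere count, and that every sphere line begins with its integer sphere number. Check these assumptions against your own sphgen output.

```python
import re


def remove_spheres(lines, cluster_id, doomed):
    """Delete spheres from one cluster and fix the count on its header.

    lines      -- lines of the sphgen output file
    cluster_id -- number of the cluster to edit
    doomed     -- set of sphere numbers (first field on each sphere line)
    """
    out = []
    i = 0
    while i < len(lines):
        fields = lines[i].split()
        is_target = (len(fields) > 1 and fields[0] == "cluster"
                     and int(fields[1]) == cluster_id)
        if not is_target:
            out.append(lines[i])
            i += 1
            continue
        header = lines[i]
        i += 1
        body = []
        # Collect sphere lines up to the next cluster header.
        while i < len(lines) and lines[i].split()[:1] != ["cluster"]:
            body.append(lines[i])
            i += 1
        kept = [b for b in body
                if not (b.split() and int(b.split()[0]) in doomed)]
        count = sum(1 for b in kept if b.split())
        # Overwrite the trailing integer (the sphere count) in place.
        match = re.search(r"(\d+)(\s*)$", header)
        width = len(match.group(1))
        header = (header[:match.start(1)] + str(count).rjust(width)
                  + match.group(2))
        out.append(header)
        out.extend(kept)
    return out
```

Other clusters, including cluster 0, are passed through unchanged, so the full set of spheres remains available for later use with cluster.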

If your cluster is large - more than about 100 spheres - and deleting spheres by hand looks too tedious, you can use cluster to break it into smaller clusters. cluster is described in the reference manual; read the documentation completely before you try it. Start with the parameters given and experiment with the values; small changes can make a big difference in the result. Be aware that if the best cluster found is the same as the original input cluster, the program will appear not to have done anything.

The two methods just described may be combined if the best cluster output is not quite right. More spheres can be deleted from the new cluster, or, if the new cluster is too small, additional spheres may be chosen graphically. A cluster containing all the desired spheres may then be created by editing the sphgen output.

If nothing else works, it is possible to run cluster on all possible spheres rather than a preselected group. Use the analytical clustering algorithm in cluster on cluster 0, and experiment until you get what you want. Flagging spheres in important regions of the site may help.

Creating Scoring Grids

Before running DOCK, you must choose which scoring option you will use and generate the scoring grid or grids required. Current scoring options are contact only, contact and DelPhi, contact and force field, and force field only. All the options involving contact scoring require the grid generated by distmap. DelPhi scoring requires a potential map from DelPhi, and force field scoring uses the result from chemgrid.
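The dependencies between scoring options and grids can be written down as a small lookup table. This is only a restatement of the paragraph above in Python; the dictionary and the strings in it are illustrative names, not DOCK input keywords.

```python
# Which inputs each scoring option needs, as described in the text.
REQUIRED_INPUTS = {
    "contact only": ["distmap grid"],
    "contact and DelPhi": ["distmap grid", "DelPhi potential map"],
    "contact and force field": ["distmap grid", "chemgrid grids"],
    "force field only": ["chemgrid grids"],
}


def inputs_for(option):
    """Return the list of inputs to prepare for a scoring option."""
    return REQUIRED_INPUTS[option]
```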

Contact Scoring

distmap

Positioning the Grid

The interactive program showbox is recommended for defining the size and shape of the grid. One way to do this is to make a box which encloses your sphere cluster and add an extra margin which encloses all the receptor atoms which might contact the ligands. The box generated is in PDB format; it should be viewed along with the receptor and possibly regenerated until it seems appropriate.
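To get starting values for the box, the extent of the sphere cluster plus a margin can be computed directly. The sketch below is illustrative: box_around is a hypothetical helper, it does not reproduce the showbox prompts, and the margin is whatever distance you judge large enough to include receptor atoms that might contact the ligands.

```python
def box_around(centers, margin):
    """Return (center, dimensions) of an axis-aligned box.

    centers -- iterable of (x, y, z) sphere-center coordinates
    margin  -- extra distance added on every side, in the same units
    """
    xs, ys, zs = zip(*centers)
    lows = [min(v) - margin for v in (xs, ys, zs)]
    highs = [max(v) + margin for v in (xs, ys, zs)]
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(lows, highs))
    dims = tuple(hi - lo for lo, hi in zip(lows, highs))
    return center, dims
```

For example, sphere centers at (0, 0, 0) and (2, 4, 6) with a margin of 1 give a box centered at (1, 2, 3) with dimensions 4 x 6 x 8.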

Creating `INDIST` and Running distmap

See the reference manual entries for distmap and INDIST (pdbnam, scoren), showbox, and sphgen.

The Electrostatic Potential Map (DelPhi)

DelPhi is not supplied with DOCK; it is available from Barry Honig, Columbia University, or from Biosym. If you have the program and wish to use it for electrostatic scoring, follow the directions supplied with it to generate a potential map for your protein. DOCK currently only reads the potential map (phi file) format used by version 3 of DelPhi.

Force-Field Scoring

chemgrid

Positioning the Grid

See also: showbox, chemgrid, distmap.

Preparing the Receptor File

Parameter file	Contents
na.table.ambcrg	nucleic acid parameters
prot.table.ambcrg.ambH	protein parameters; Amber hydrogen names
prot.table.ambcrg.pdbH	protein parameters; PDB hydrogen names

Creating `INCHEM` and Running chemgrid

See the reference manual entries for chemgrid and INCHEM (recfil, table, vdwfil, Inbox) and for showbox.

Grddiv values between 0.2 and 0.5 are recommended; fine grids are preferred. Any combination of grid point spacing and x, y, and z dimensions can be used as long as the number of points does not exceed the maximum array size specified in the program. If the limit is exceeded, you will get a message when you run chemgrid. To resolve the problem, increase the grid spacing or decrease the box dimensions. DOCK may also be recompiled with larger array sizes if your computer's memory allows; edit the file chemgrid.h appropriately.
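A quick estimate of the number of grid points shows how strongly the spacing affects the grid size. The sketch below assumes a regular grid with a point at each end of every axis; the exact count used by chemgrid may differ, and max_points stands in for the limit compiled into your copy of the program, which you must look up yourself.

```python
import math


def grid_point_count(dims, spacing):
    """Points in a regular grid spanning dims = (x, y, z) at a spacing."""
    # The small epsilon guards against floating-point round-off.
    per_axis = [int(math.floor(d / spacing + 1e-9)) + 1 for d in dims]
    return per_axis[0] * per_axis[1] * per_axis[2]


def fits(dims, spacing, max_points):
    """True if the grid stays within a limit of max_points."""
    return grid_point_count(dims, spacing) <= max_points
```

Halving the spacing multiplies the number of points by roughly eight, which is why a small increase in spacing or a modest reduction in box size is usually enough to get back under the limit.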

A dielectric function of 4.0r or 4.5r and a cutoff of 10.0 Angstroms or more are appropriate in most cases. (This dielectric corresponds to an estype of 1 and esfact of 4.0 or 4.5.) If a constant dielectric is selected, an "infinite" cutoff (one large enough to include the whole receptor) should be used. We tend to use close contact limits of 2.0-2.5 and 2.5-3.0 Angstroms for receptor polar and nonpolar atoms, respectively. The close contact limits do not affect the force field scores that orientations receive, but they determine which orientations are thrown out when the force field scoring only option is used in DOCK. The resolution of the protein structure should be kept in mind when setting these limits; if the receptor atom positions are not very well-defined, it is best not to constrain the results too strongly based on these positions.
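The effect of a distance-dependent dielectric can be seen in a simple pairwise Coulomb term. This is a textbook illustration under stated assumptions, not the code used by chemgrid: the function and argument names are hypothetical, and the conversion constant of about 332 kcal Angstrom/(mol e^2) is the usual approximate value.

```python
COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2), approximate


def coulomb_energy(q1, q2, r, esfact=4.0, distance_dependent=True,
                   cutoff=10.0):
    """Pairwise electrostatic energy in kcal/mol for charges in units of e.

    With distance_dependent=True the dielectric is esfact * r (for example
    4.0r), so the energy falls off as 1/r**2; otherwise the dielectric is
    the constant esfact. Pairs farther apart than cutoff contribute zero.
    """
    if r > cutoff:
        return 0.0
    dielectric = esfact * r if distance_dependent else esfact
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)
```

Because a constant dielectric gives a slower 1/r decay, distant atoms still contribute noticeably, which is why a cutoff large enough to include the whole receptor is advised in that case.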

Grdfil will be the beginning of the output file names.

Although we have recommended certain values, we would like you to try whatever you feel is appropriate, based on your knowledge of the parameters and their significance.

Parameterization takes place in the first few seconds of a chemgrid run, and produces two files: PDBPARM, which lists the coordinates of each receptor atom together with the associated parameters, and OUTPARM, which reports each atom not found in the parameter file and the apparent net charge of the receptor. A long list of hydrogen atoms is normal, since there are no parameters for hydrogens not attached to polar atoms. It is important to check the net charge since a strange value can alert you to parameterization problems. You may use the script called charge, supplied with DOCK, to estimate the expected charge. A value which differs from that expected may mean that something is wrong with the hydrogens or the residue names. Another common cause of a non-integral net charge is incompletely modeled residues.
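A rough cross-check of the reported net charge can be made by counting charged residues. The sketch below is not the charge script supplied with DOCK. It assumes the usual formal charges for ASP, GLU, LYS, and ARG, and it ignores histidine protonation, chain termini, cofactors, and ions, so treat its result only as a first estimate.

```python
# Formal side-chain charges assumed for this estimate.
FORMAL_CHARGE = {"ASP": -1, "GLU": -1, "LYS": 1, "ARG": 1}


def expected_charge(residue_names):
    """Sum formal charges over a list of residue names, one per residue."""
    return sum(FORMAL_CHARGE.get(name.upper(), 0) for name in residue_names)


def charge_looks_suspicious(reported, expected, tol=0.05):
    """Flag a net charge that is non-integral or far from the estimate."""
    non_integral = abs(reported - round(reported)) > tol
    off_target = abs(reported - expected) > tol
    return non_integral or off_target
```

A flagged value is a prompt to look at OUTPARM for missing hydrogens, unrecognized residue names, or incompletely modeled residues.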

Three output files, named grdfil.bmp, grdfil.esp, and grdfil.vdw, make up the actual grid.

Preparing Ligand Molecules

Before you can run DOCK, you must make sure that the ligands you intend to use are in a format which DOCK can read. Consult the tables below for a list of acceptable formats. SINGLE mode reads files in all the listed formats directly; SEARCH mode requires DOCK 3.x database formats. Version numbers for DOCK databases identify the database format only; all are acceptable input for DOCK 3.5 SEARCH mode, but SINGLE mode can use only the DOCK 3.0 and DOCK 3.5 formats. Please note that ligands with charges must also have hydrogens.

SINGLE Mode File Formats

Format	Charges?	Compatible Scoring Options
standard pdb	no	contact
extended pdb	yes	all
SYBYL ASCII (mol2)	yes	all
SYBYL ASCII (mol2)	no	contact
DOCK database (v3.0)	yes	all
DOCK database (v3.5)	yes	all

SEARCH Mode File Formats

Format	Charges?	Conversion Program	Compatible Scoring Options
SYBYL MOL2	no	mol2db	contact
SYBYL MOL2	yes	mol2db	all
DOCK database (v1.1 or v2.0)	no	-	contact
DOCK database (v3.0)	yes	-	all
DOCK database (v3.5)	yes	-	all
CSD	no	mkdb	contact

mkdb is an interactive program which reads molecules in CSD format and writes them in DOCK 2.0 format. DOCK does not consider the hydrogens during its calculations, but they should be included in the mkdb output if you want them on the oriented ligands written out by DOCK.

mol2db is also interactive; it reads SYBYL ASCII (MOL2) format. If you wish to use contact scoring only and your ligands have no charges, choose version 2.0 output. Otherwise, your ligands should have hydrogens and charges and you should create a version 3.5 database. mol2db may be used to label ligand atoms for chemical matching, described below.

DOCK databases may include any number of molecules. They consist of lists of molecule records; you may edit them after they are created to delete molecules or to separate some molecules into smaller databases. If you create smaller files from a DOCK 3.5 database, be sure to include the header information in each file. (The first line should read DOCK 3.5 ligand_atoms, and the chemical matching information should follow it starting on the second line. You can copy the header from the beginning of the original database.)

Labeling Atoms and Spheres for Chemical Matching

In some cases, you may wish to label receptor spheres and ligand atoms for chemical matching. When chemical matching is used, labeled (sometimes called colored) spheres are permitted to match labeled atoms only if their labels correspond as specified in the INDOCK file (for example, positively charged spheres might be allowed to match only negatively charged ligand atoms). Labeled spheres may still match unlabeled atoms and unlabeled spheres may match labeled atoms. Labeling somewhat reduces the number of orientations that must be scored, but the overall DOCK run will not be significantly shorter unless many of the spheres and atoms are labeled. Chemical matching using labels assigned by the programs colsph and mol2db is most useful in conjunction with contact scoring or for highly charged or polar sites.

colsph may be used to label spheres according to the electrostatic potential at their location in space. You will need a sphere cluster file and either a potential map from DelPhi or the force field grid files created by chemgrid. Create a file specifying the labels and the range of receptor potential or of the electrostatic potential component of the force field grid which corresponds to each label. On each line, list one label and the upper and lower bounds of its electrostatic potential range. The potential will be evaluated at each sphere and the sphere will be assigned a label if the potential is in the appropriate range. Run colsph interactively. The program will prompt you for the map type and name, the name of the range file you just created, and the input and output sphere file names. (colsph will color spheres in all clusters in the sphere file.)
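The range-based labeling can be illustrated with a short sketch. It is not the colsph implementation: the function names are hypothetical, the range file is assumed to hold whitespace-separated fields (a label followed by two bounds), and a sphere whose potential falls in no range is left unlabeled.

```python
def parse_ranges(lines):
    """Parse lines of the form 'label bound bound' into (label, low, high)."""
    ranges = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        label = fields[0]
        a, b = float(fields[1]), float(fields[2])
        # Sort the bounds so their order in the file does not matter here.
        ranges.append((label, min(a, b), max(a, b)))
    return ranges


def label_for(potential, ranges):
    """Return the first label whose range contains the potential, or None."""
    for label, low, high in ranges:
        if low <= potential <= high:
            return label
    return None
```

For instance, with the hypothetical ranges "positive 10.0 1.0" and "negative -1.0 -10.0", a sphere at a potential of 5.0 would be labeled positive and one at 0.0 would stay unlabeled.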

To color ligand atoms, begin with a list of molecules in SYBYL ASCII (MOL2) format. Run mol2db interactively (see the reference manual). At the coloring prompt, enter one label at a time with its corresponding SYBYL atom type. To indicate that a given atom type should be (or not be) in a particular functional group, you may include on the same line a second atom type and the number of bonds which separate it from the first atom. Atoms of the first type will then be labeled only if they are the specified number of bonds from an atom of the second type. If the number of bonds is negative, atoms of the first type which are the specified number of bonds from the second type will not be labeled (this is useful for excluding some atoms which meet a previous labeling criterion). Enter a blank line to end label entry.
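The bond-count rule can be sketched with a breadth-first search over the bond graph. This is one reading of the rule, not mol2db's code: the function names, the rule tuples, and the label names are hypothetical, "the specified number of bonds" is taken to mean the shortest path between the two atoms, and a negative count removes a label assigned by an earlier rule with the same label.

```python
from collections import deque


def bond_distances(start, adjacency):
    """Shortest bond count from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for other in adjacency.get(atom, ()):
            if other not in dist:
                dist[other] = dist[atom] + 1
                queue.append(other)
    return dist


def apply_rules(atom_types, bonds, rules):
    """Apply (label, first_type, second_type, n_bonds) rules in order.

    atom_types -- {atom_id: SYBYL atom type}
    bonds      -- iterable of (atom_id, atom_id) pairs
    second_type may be None for an unconditional rule.
    """
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    labels = {}
    for label, first_type, second_type, n_bonds in rules:
        for atom, atype in atom_types.items():
            if atype != first_type:
                continue
            if second_type is None:
                labels[atom] = label
                continue
            dist = bond_distances(atom, adjacency)
            near = any(atom_types[o] == second_type and d == abs(n_bonds)
                       for o, d in dist.items() if o != atom)
            if n_bonds > 0 and near:
                labels[atom] = label
            elif n_bonds < 0 and near and labels.get(atom) == label:
                del labels[atom]  # exclusion rule
    return labels
```

In a carboxylic acid fragment, for example, a rule that labels O.3 atoms one bond from a C.2 atom picks up the hydroxyl oxygen, and a later rule with a count of -2 against O.2 removes that label again.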

In your INDOCK file, you will need to specify which of the labels you have assigned to spheres should match which of the ligand atom labels.

Curator: Daniel Gschwend, gschwend@cgl.ucsf.edu (rev. 1 September 1995)
