
Preparing for a DOCK Run


Working with Macromolecular Models and Generating the Molecular Surface

Removing Ligands and Crystallographic Waters

The macromolecular structure you are working with may include a ligand. Crystal structures also usually contain water molecules, and sometimes ions, found on the surface of the protein. These molecules are usually left out of the structure used to generate the molecular surface. To prepare for molecular surface generation, make a copy of the protein coordinate file. If a ligand is present, remove it from your copy by deleting all of its records (in Brookhaven Protein Data Bank format files these often start with HETATM). (Note: sometimes, as in the case of a cofactor or catalytic metal ion, it may make chemical sense to keep a ligand in the PDB file.)

Whether crystallographic waters and ions should be preserved when generating surfaces for use by sphgen is a matter of some debate. In structures of complexes, water molecules and ions are often found in the protein binding pocket along with the ligand(s). However, ligands can displace waters and ions, and the volume of a receptor site will be explored more completely if the waters and ions are removed. Unless you have particular reasons for preserving some of the crystallographic waters or ions, it is probably best to remove all of them. Waters are usually located near the end of the PDB file, often as HETATM records with residue type HOH or WAT; ions are often listed near the waters.
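The record filtering described above can be sketched in Python. This is a minimal illustration, not part of the DOCK distribution: the function name `strip_hetero` is hypothetical, and the sketch assumes fixed-column PDB records (record name in columns 1-6, residue name in columns 18-20). Residues you want to keep, such as a cofactor or a metal, are passed in by residue name.

```python
# Sketch only: filter PDB records before surface generation.
# The water residue names are the common ones; adjust for your file.
WATER_NAMES = {"HOH", "WAT"}


def strip_hetero(pdb_lines, keep_residues=frozenset()):
    """Return the PDB lines to keep for the surface-generation copy.

    HETATM records (ligands, ions, waters) are dropped unless their
    residue name is in keep_residues (e.g. a cofactor or catalytic metal).
    Water residues are dropped even if they are written as ATOM records.
    """
    kept = []
    for line in pdb_lines:
        record = line[0:6].strip()      # columns 1-6: record name
        resname = line[17:20].strip()   # columns 18-20: residue name
        if record in ("ATOM", "HETATM") and resname in WATER_NAMES:
            continue                    # crystallographic water
        if record == "HETATM" and resname not in keep_residues:
            continue                    # ligand or ion not explicitly kept
        kept.append(line)
    return kept
```

For example, `strip_hetero(open("complex.pdb"), keep_residues={"HEM"})` (file and residue names illustrative) keeps a heme group while dropping the other HETATM records. CONECT records pass through unchanged, so any that refer to deleted atoms may need to be removed by hand.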

Please note that the PDB file used for generating the molecular surface should not include hydrogen atoms. NMR structures usually include hydrogens; delete the hydrogens from a copy of each structure and use that copy as input to MS.
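Hydrogens can be removed in the same way. The sketch below is a heuristic under stated assumptions: it trusts the element symbol in columns 77-78 when one is present and otherwise looks at the atom name in columns 13-16. The function names are hypothetical, and the name-based fallback can misclassify atoms such as mercury (atom name HG), so check the result against your structure.

```python
def is_hydrogen(line):
    """Heuristic test for a hydrogen in an ATOM/HETATM record."""
    element = line[76:78].strip()             # columns 77-78: element symbol
    if element:
        return element.upper() in ("H", "D")  # D: deuterium
    # No element column (common in older files): use the atom name in
    # columns 13-16, ignoring leading digits as in "1HB " or "2HG1".
    # Caution: this also matches mercury ("HG") and similar names.
    name = line[12:16].strip().lstrip("0123456789")
    return name.startswith("H")


def strip_hydrogens(pdb_lines):
    """Return the lines with hydrogen ATOM/HETATM records removed."""
    return [line for line in pdb_lines
            if not (line.startswith(("ATOM", "HETATM")) and is_hydrogen(line))]
```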

Creating the Molecular Surface

The dot surface which will be used to produce spheres is generated by the program MS (available from QCPE). When setting up for docking, it is acceptable to generate a surface only for the site of interest and adjacent regions (see documentation for get_near_res and autoMS); this also reduces the computer time used by sphgen. The surface points must have associated normals.
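Restricting the surface to the site and its surroundings amounts to selecting the residues near the site. The sketch below is not the get_near_res algorithm. It simply keeps any residue with an atom within a distance cutoff of a set of site points (for example, the atoms of the ligand you removed). The function name and the 8 Å default cutoff are illustrative assumptions.

```python
import math


def residues_near_site(atoms, site_points, cutoff=8.0):
    """Return the IDs of residues with any atom within `cutoff` angstroms
    of any site point.

    atoms: iterable of (residue_id, (x, y, z)) pairs
    site_points: list of (x, y, z) coordinates describing the site
    """
    near = set()
    for residue_id, xyz in atoms:
        if residue_id in near:
            continue                    # residue already selected
        if any(math.dist(xyz, p) <= cutoff for p in site_points):
            near.add(residue_id)
    return near
```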

What about hydrogens in molecular surfaces?

If you use the QCPE version of MS, you must run reformatms to convert the surface to the format used by sphgen (both formats are described in the reference manual section on reformatms). reformatms is interactive and requires the surface and the PDB file used to generate the surface.

How do I get QCPE MS to recognize standard PDB files?

Users of the UCSF MidasPlus package may use the output from the dms program directly as input for sphgen.


Representing the Site with Spheres

sphgen

sphgen uses the points of the molecular surface and their associated normals to determine spheres to fill the site. It then reduces the number of spheres to one per atom and groups them into clusters. You can inspect these clusters and regroup the spheres if necessary.
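The basic geometry behind building a sphere from a surface point and its normal can be written down directly. If a sphere touches the surface at point p_i, its center lies along the unit normal n_i at c = p_i + r n_i; requiring the sphere to pass through a second surface point p_j gives r = |d|^2 / (2 (n_i · d)), where d = p_j - p_i. The sketch below grows a sphere from one point until it first touches another surface point. It is a simplified illustration that assumes unit normals pointing away from the receptor and uses a hypothetical radius cap. It is not sphgen's actual code and omits the per-atom reduction and clustering steps.

```python
def tangent_sphere_radius(p_i, n_i, p_j):
    """Radius of the sphere that touches the surface at p_i (center on the
    unit normal n_i) and passes through p_j, or None if none exists.

    From |p_i + r*n_i - p_j|^2 = r^2 with |n_i| = 1:
        r = |d|^2 / (2 * (n_i . d)),  where d = p_j - p_i
    """
    d = [b - a for a, b in zip(p_i, p_j)]
    n_dot_d = sum(n * c for n, c in zip(n_i, d))
    if n_dot_d <= 0.0:
        return None          # p_j is on or behind the tangent plane at p_i
    return sum(c * c for c in d) / (2.0 * n_dot_d)


def largest_empty_sphere(p_i, n_i, other_points, max_radius=4.0):
    """Grow a sphere from p_i along n_i until it first touches another
    surface point; max_radius is an illustrative cap."""
    radius = max_radius
    for p_j in other_points:
        r = tangent_sphere_radius(p_i, n_i, p_j)
        if r is not None and r < radius:
            radius = r
    center = tuple(a + radius * n for a, n in zip(p_i, n_i))
    return center, radius
```

Because every sphere tangent at p_i on the same side contains the smaller ones, the smallest radius over all other points is the largest sphere with no surface point inside it.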

Do I need to use sphgen?

Creating INSPH

The parameters which tell sphgen exactly how to create the spheres are placed in a file called INSPH, which must be present when sphgen is run. The contents of this file are described in the reference manual. To create it, make a file with each variable on a separate line. Most of the parameter values given in the reference manual should work fine. You will need to replace msfil with the name of your surface file and outfil with the desired name of your output file.

Running sphgen

sphgen must use the directory containing INSPH as its working directory; this means that it should be started while you are in that directory. The sphgen output file contains clusters of spheres which have been selected and grouped by sphgen; the clusters are listed in order of decreasing size. The last cluster, numbered 0, contains all the spheres produced. It may be used with the program cluster to make new sphere clusters if the original clustered output doesn't describe the site well.

Looking at the Output

Once you've generated spheres, you should look at the sphere clusters using a molecule display program. showsphere may be used to generate a PDB-like file of sphere centers for display. It can also generate a surface for the sphere cluster (in the MS format used by sphgen). showsphere is interactive. You will be prompted for the name of the cluster file (that is, the sphgen output), the number of the cluster, and names for the desired output files. In the PDB-like file of sphere center coordinates, each sphere is a separate residue and the spheres are separated by TER cards.
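The layout of the PDB-like sphere file (one residue per sphere, TER between spheres) can be illustrated with a short sketch. It is not showsphere itself: the function name, the atom and residue names (C, SPH), and storing the radius in the temperature-factor column are assumptions chosen for illustration. The records follow the standard fixed PDB columns.

```python
def spheres_to_pdb(spheres):
    """Format sphere centers as PDB-like records.

    spheres: iterable of (sphere_number, x, y, z, radius)
    Each sphere becomes its own residue (residue number = sphere number),
    followed by a TER card. The radius goes in the temperature-factor column.
    """
    lines = []
    for serial, (num, x, y, z, radius) in enumerate(spheres, start=1):
        lines.append(
            f"ATOM  {serial:5d}  C   SPH  {num:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{radius:6.2f}"
        )
        lines.append("TER")
    return lines
```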

Getting a Good Sphere Cluster

Displaying the protein and sphere centers together should tell you how each sphere cluster is related to the site you are trying to represent. Examine sphere clusters until you find one that occupies the region (or regions) into which you want to dock ligands. Clusters of 50 or fewer spheres are best; larger numbers of spheres will cause DOCK to use more computer time. It is generally unwise to try docking with more than 100 spheres, although you may be able to use more if your database is small or you are using chemical matching. Initial sphere clusters are sometimes spider-like structures which include the area of interest but also branch into other regions. If your cluster has too many spheres, branches out, or is unsatisfactory for some other reason, you can correct the problem.

The easiest way to fix a sphere cluster is to use graphics to identify spheres that you don't really need, then remove them. When you've found the unnecessary ones, go back to the original sphere cluster file (i.e. the one from sphgen) and delete the corresponding lines - the residue number in the PDB-like file of centers is the first number in the line in the sphere file. Remember to change the number of spheres listed on the line with the cluster number to reflect the deletions.
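If you prefer to script this edit, the sketch below deletes chosen spheres from one cluster and rewrites that cluster's sphere count. The exact layout of the sphgen output is given in the reference manual, not here, so the layout assumed by this code (header lines that begin with `cluster` and the cluster number and end with the sphere count; sphere lines that begin with the sphere number) must be verified against your own file. The function name is hypothetical.

```python
import re


def delete_spheres(lines, cluster_id, doomed):
    """Drop spheres whose numbers are in `doomed` from cluster `cluster_id`
    and rewrite that cluster's sphere count.

    ASSUMED layout (verify against your sphgen output): a cluster header
    line starts with the word 'cluster', then the cluster number, and ends
    with the number of spheres; each sphere line starts with its number.
    """
    out = []
    header_pos = None   # index in `out` of the target cluster's header
    in_target = False
    kept = 0
    for line in lines:
        fields = line.split()
        if len(fields) > 1 and fields[0] == "cluster":
            in_target = int(fields[1]) == cluster_id
            if in_target:
                header_pos, kept = len(out), 0
            out.append(line)
            continue
        if in_target and fields and fields[0].isdigit():
            if int(fields[0]) in doomed:
                continue            # delete this sphere
            kept += 1
        out.append(line)
    if header_pos is not None:
        # Replace the trailing count, right-justified to its old width.
        out[header_pos] = re.sub(
            r"(\d+)(\s*)$",
            lambda m: f"{kept:>{len(m.group(1))}}{m.group(2)}",
            out[header_pos],
            count=1,
        )
    return out
```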

If your cluster is large - more than about 100 spheres - and deleting spheres by hand looks too tedious, you can use cluster to break it into smaller clusters. cluster is described in the reference manual; read the documentation completely before you try it. Start with the parameters given and experiment with the values; small changes can make a big difference in the result. Be aware that if the best cluster found is the same as the original input cluster, the program will appear not to have done anything.
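To see why small parameter changes can matter so much, consider a simple single-linkage grouping in which two spheres belong to the same cluster whenever their centers lie within a threshold distance. This is a generic stand-in, not the algorithm implemented by cluster; the function name and the distance-threshold criterion are assumptions. A single sphere bridging two regions is enough to merge them, so nudging the threshold can split or join large groups.

```python
import math


def link_clusters(centers, threshold):
    """Single-linkage clustering of sphere centers.

    Spheres whose centers are within `threshold` angstroms are joined,
    directly or through a chain of neighbors. Returns lists of indices,
    largest cluster first.
    """
    parent = list(range(len(centers)))    # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(centers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

With centers `[(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0)]`, a threshold of 1.5 gives `[[0, 1, 2], [3, 4]]`, while a threshold of 10.0 merges all five spheres into one cluster.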

The two methods just described may be combined if the best cluster output is not quite right. More spheres can be deleted from the new cluster, or, if the new cluster is too small, additional spheres may be chosen graphically. A cluster containing all the desired spheres may then be created by editing the sphgen output.

If nothing else works, it is possible to run cluster on all possible spheres rather than a preselected group. Use the analytical clustering algorithm in cluster on cluster 0, and experiment until you get what you want. Flagging spheres in important regions of the site may help.


Creating Scoring Grids

Before running DOCK, you must choose which scoring option you will use and generate the scoring grid or grids required. Current scoring options are contact only, contact and DelPhi, contact and force field, and force field only. All the options involving contact scoring require the grid generated by distmap. DelPhi scoring requires a potential map from DelPhi, and force field scoring uses the result from chemgrid.
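Grid-based scoring works because values such as the electrostatic potential are precomputed at regular grid points and only looked up during docking. A common way to evaluate such a grid at an arbitrary atom position is trilinear interpolation, sketched below. This document does not say how DOCK interpolates its grids, so treat the function (its name and its `grid[i][j][k]` layout are hypothetical) as an illustration of the technique rather than DOCK's code.

```python
import math


def trilinear(grid, origin, spacing, point):
    """Interpolate a regular 3-D grid at an arbitrary point.

    grid[i][j][k] holds the value at origin + spacing * (i, j, k);
    every axis needs at least two points.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    frac = [(p - o) / spacing for p, o in zip(point, origin)]
    for f, n in zip(frac, dims):
        if not 0.0 <= f <= n - 1:
            raise ValueError("point lies outside the grid")
    # Lower corner of the enclosing cell (clamped so the upper edge works).
    idx = [min(math.floor(f), n - 2) for f, n in zip(frac, dims)]
    t = [f - i for f, i in zip(frac, idx)]     # position inside the cell, 0..1
    i, j, k = idx
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((t[0] if di else 1.0 - t[0]) *
                          (t[1] if dj else 1.0 - t[1]) *
                          (t[2] if dk else 1.0 - t[2]))
                value += weight * grid[i + di][j + dj][k + dk]
    return value
```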

Contact Scoring

The Electrostatic Potential Map (DelPhi)

Force-field scoring


Preparing Ligand Molecules

Before you can run DOCK, you must make sure that the ligands you intend to use are in a format which DOCK can read. Consult the tables below for a list of acceptable formats. SINGLE mode reads all the formats in the first table directly; SEARCH mode reads only DOCK database formats, so molecules in other formats must first be converted with the program listed in the second table. Version numbers for DOCK databases identify the database format only; all are acceptable input for DOCK 3.5 SEARCH mode, but SINGLE mode can use only the DOCK 3.0 and DOCK 3.5 formats. Please note that ligands with charges must also have hydrogens.

SINGLE Mode File Formats

Format                   Charges?   Compatible Scoring Options
standard PDB             no         contact
extended PDB             yes        all
SYBYL ASCII (mol2)       yes        all
SYBYL ASCII (mol2)       no         contact
DOCK database (v3.0)     yes        all
DOCK database (v3.5)     yes        all

SEARCH Mode File Formats

Format                          Charges?   Conversion Program   Compatible Scoring Options
SYBYL MOL2                      no         mol2db               contact
SYBYL MOL2                      yes        mol2db               all
DOCK database (v1.1 or v2.0)    no         -                    contact
DOCK database (v3.0)            yes        -                    all
DOCK database (v3.5)            yes        -                    all
CSD                             no         mkdb                 contact
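The SEARCH-mode table, together with the notes on mkdb and mol2db in this section, can be summarized as a lookup from input format to the preparation step needed. The dictionary keys and the function name below are hypothetical labels invented for this sketch; the values restate the table and the text (mkdb writes DOCK 2.0 databases; mol2db should write version 2.0 for uncharged contact-only ligands and version 3.5 otherwise).

```python
# Hypothetical encoding of the SEARCH-mode table; the key strings are made up.
SEARCH_FORMATS = {
    # (input format, has charges): (conversion program, DB version written, scoring)
    ("SYBYL MOL2", False): ("mol2db", "2.0", "contact"),
    ("SYBYL MOL2", True): ("mol2db", "3.5", "all"),
    ("DOCK database v1.1", False): (None, None, "contact"),
    ("DOCK database v2.0", False): (None, None, "contact"),
    ("DOCK database v3.0", True): (None, None, "all"),
    ("DOCK database v3.5", True): (None, None, "all"),
    ("CSD", False): ("mkdb", "2.0", "contact"),
}


def search_mode_plan(fmt, has_charges):
    """Return (conversion program or None, database version written, scoring options)."""
    try:
        return SEARCH_FORMATS[(fmt, has_charges)]
    except KeyError:
        raise ValueError(
            f"{fmt!r} with has_charges={has_charges} is not listed for SEARCH mode"
        ) from None
```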

mkdb is an interactive program which reads molecules in CSD format and writes them in DOCK 2.0 format. DOCK does not consider the hydrogens during its calculations, but they should be included in the mkdb output if you want them on the oriented ligands written out by DOCK.

mol2db is also interactive; it reads SYBYL ASCII (MOL2) format. If you wish to use contact scoring only and your ligands have no charges, choose version 2.0 output. Otherwise, your ligands should have hydrogens and charges and you should create a version 3.5 database. mol2db may be used to label ligand atoms for chemical matching, described below.

DOCK databases may include any number of molecules. They consist of lists of molecule records; you may edit them after they are created to delete molecules or to separate some molecules into smaller databases. If you create smaller files from a DOCK 3.5 database, be sure to include the header information in each file. (The first line should read DOCK 3.5 ligand_atoms, and the chemical matching information should follow it starting on the second line. You can copy the header from the beginning of the original database.)


Labeling Atoms and Spheres for Chemical Matching

In some cases, you may wish to label receptor spheres and ligand atoms for chemical matching. When chemical matching is used, labeled (sometimes called colored) spheres are permitted to match labeled atoms only if their labels correspond as specified in the INDOCK file (for example, positively charged spheres might be allowed to match only negatively charged ligand atoms). Labeled spheres may still match unlabeled atoms, and unlabeled spheres may match labeled atoms. Labeling somewhat reduces the number of orientations that must be scored, but the overall DOCK run will not be significantly shorter unless many of the spheres and atoms are labeled. Chemical matching using labels assigned by the programs colsph and mol2db is most useful in conjunction with contact scoring or for highly charged or polar sites.
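The matching rule can be stated as a small predicate. In this sketch, `None` stands for an unlabeled sphere or atom, and `allowed` stands for the sphere/atom label pairs you would list in INDOCK. Both the function name and this representation are assumptions, not DOCK's internals.

```python
def can_match(sphere_label, atom_label, allowed):
    """Return True if a sphere may be paired with a ligand atom.

    sphere_label, atom_label: a label string, or None if unlabeled.
    allowed: set of (sphere_label, atom_label) pairs permitted to match.
    """
    if sphere_label is None or atom_label is None:
        return True          # unlabeled spheres or atoms match anything
    return (sphere_label, atom_label) in allowed
```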

colsph may be used to label spheres according to the electrostatic potential at their location in space. You will need a sphere cluster file and either a potential map from DelPhi or the force field grid files created by chemgrid. Create a file specifying the labels and the range of receptor potential or of the electrostatic potential component of the force field grid which corresponds to each label. On each line, list one label and the upper and lower bounds of its electrostatic potential range. The potential will be evaluated at each sphere and the sphere will be assigned a label if the potential is in the appropriate range. Run colsph interactively. The program will prompt you for the map type and name, the name of the range file you just created, and the input and output sphere file names. (colsph will color spheres in all clusters in the sphere file.)
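The range-file logic can be sketched as follows. The sketch assumes the potential at each sphere center has already been read from the map, treats the bounds as inclusive, and lets the first matching range win when ranges overlap; colsph's handling of boundaries and overlaps may differ. The function name and label strings are illustrative.

```python
def label_spheres(potentials, ranges):
    """Assign labels to spheres from potential ranges.

    potentials: {sphere_number: potential at the sphere center}
    ranges: list of (label, upper, lower) in range-file order
    Returns {sphere_number: label}; spheres outside every range stay unlabeled.
    """
    labels = {}
    for num, phi in potentials.items():
        for label, upper, lower in ranges:
            if lower <= phi <= upper:   # ASSUMPTION: inclusive bounds
                labels[num] = label
                break                   # ASSUMPTION: first matching range wins
    return labels
```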

To color ligand atoms, begin with a list of molecules in SYBYL ASCII (MOL2) format. Run mol2db interactively (see the reference manual). At the coloring prompt, enter one label at a time with its corresponding SYBYL atom type. To indicate that a given atom type should be (or not be) in a particular functional group, you may include on the same line a second atom type and the number of bonds which separate it from the first atom. Atoms of the first type will then be labeled only if they are the specified number of bonds from an atom of the second type. If the number of bonds is negative, atoms of the first type which are the specified number of bonds from the second type will not be labeled (this is useful for excluding some atoms which meet a previous labeling criterion). Enter a blank line to end label entry.
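One reading of these coloring rules is sketched below, using SYBYL atom types and counting bond separation as the shortest path through the bond graph. The rule tuple format, the function names, and the choice that a negative rule removes only the same label assigned by an earlier rule are assumptions for illustration, not mol2db's documented behavior.

```python
from collections import deque


def bond_distances(neighbors, start):
    """Shortest path length, in bonds, from `start` to every reachable atom (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in neighbors[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def apply_rules(atom_types, bonds, rules):
    """Label atoms with rules given as (label, type1, type2, nbonds), in entry order.

    type2=None labels every atom of type1. Otherwise a type1 atom matches
    when some type2 atom is exactly abs(nbonds) bonds away; nbonds > 0
    assigns the label, nbonds < 0 removes that label again.
    """
    neighbors = [[] for _ in atom_types]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    labels = {}
    for label, type1, type2, nbonds in rules:
        for i, t in enumerate(atom_types):
            if t != type1:
                continue
            if type2 is None:
                labels[i] = label
                continue
            dist = bond_distances(neighbors, i)
            hit = any(atom_types[j] == type2 and d == abs(nbonds)
                      for j, d in dist.items())
            if hit and nbonds > 0:
                labels[i] = label
            elif hit and nbonds < 0 and labels.get(i) == label:
                del labels[i]           # exclusion rule
    return labels
```

For a fragment with atom types C.3, C.2, O.co2, O.co2, N.4 (N.4 bonded to the C.3), the rules `("neg", "O.co2", None, 0)` and `("pos", "N.4", "C.2", 2)` label both carboxylate oxygens and the nitrogen; adding `("pos", "N.4", "C.2", -2)` removes the nitrogen's label again.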

In your INDOCK file, you will need to specify which of the labels you have assigned to spheres should match which of the ligand atom labels.

Tell me more about chemical matching...



Curator: Daniel Gschwend, gschwend@cgl.ucsf.edu (rev. 1 September 1995)