HETATM in Brookhaven Protein Data Bank format files) from your
copy of the file. (Note: sometimes, as in the case of a cofactor or catalytic
metal ion, it may make chemical sense to keep a ligand in the PDB file.)
Whether or not crystallographic waters and ions should be preserved when
generating surfaces for use by sphgen is a matter of some debate. In
structures of complexes, water molecules and ions are often found in the
protein binding pocket along with the ligand(s). However, ligands can displace
waters and ions, and the volume of a receptor site will be explored more
completely if the waters and ions are removed, so if you don't have
particular reasons for preserving any of the water molecules or ions in the
crystal, it is probably best to remove all of them. Waters are usually located
near the end of the PDB file and are often HETATM records with HOH or WAT
residue types. Ions are often near the waters in the PDB file.

Please note that the PDB file used for generating the molecular surface should not include hydrogen atoms. NMR structures usually include hydrogens; delete the hydrogens from a copy of each structure and use that copy in MS.
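The clean-up described above can be scripted. The Python sketch below is a hypothetical helper, not part of DOCK. It relies only on the standard fixed-column PDB layout. It keeps ATOM records and any HETATM residues you name explicitly (for example, a catalytic metal), and it drops waters, other HETATM records, and every hydrogen. The function names and the keep-by-residue-name choice are illustrative assumptions, so inspect the result with graphics before running MS.

```python
# Hypothetical helper (not part of DOCK): prepare a PDB copy for MS.
# Assumes standard fixed-column PDB records: record name in columns 1-6,
# atom name in 13-16, residue name in 18-20, element symbol in 77-78.
from typing import Iterable, List

WATER_NAMES = {"HOH", "WAT"}


def is_hydrogen(line: str) -> bool:
    """Guess whether an ATOM/HETATM record describes a hydrogen atom."""
    element = line[76:78].strip()
    if element:
        return element.upper() == "H"
    # No element column: fall back to the atom name, ignoring leading
    # digits used in names such as "1HB".
    name = line[12:16].strip().lstrip("0123456789")
    return name.upper().startswith("H")


def clean_pdb_lines(lines: Iterable[str], keep_het: Iterable[str] = ()) -> List[str]:
    """Keep ATOM records, plus HETATM records whose residue name is in
    keep_het; drop waters, other HETATM records, all non-atom records,
    and every hydrogen."""
    keep = set(keep_het) - WATER_NAMES  # waters are never kept
    kept = []
    for line in lines:
        record = line[0:6].strip()
        resname = line[17:20].strip()
        if record == "ATOM" or (record == "HETATM" and resname in keep):
            if not is_hydrogen(line):
                kept.append(line)
    return kept
```

For example, passing the lines of receptor.pdb with keep_het={"ZN"} would retain a zinc ion while removing the waters, the ligand, and the hydrogens.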
If you use the QCPE version of MS, you must run reformatms to convert the surface to the format used by sphgen (both formats are described in the reference manual section on reformatms). reformatms is interactive and requires the surface and the PDB file used to generate the surface.
Users of the UCSF MidasPlus package may use the output from the dms program directly as input for sphgen.
Representing the Site with Spheres
sphgen
sphgen uses the points of the molecular surface and their associated
normals to determine spheres to fill the site. It then reduces the number of
spheres to one per atom and groups them into clusters. You can inspect these
clusters and regroup the spheres if necessary.
INSPH
sphgen reads its parameters from an input file named INSPH, which must be present when sphgen is run.
The contents of this file are described in the reference manual. To create it,
make a file with each variable on a separate line. Most of the parameter
values given in the reference manual should work fine. You will need to
replace msfil with the name of your surface file and outfil with
the desired name of your output file.
sphgen uses the directory containing INSPH as its working
directory; this means that it should be started while you are in that
directory. The sphgen output file contains clusters of spheres which
have been selected and grouped by sphgen; the clusters are listed in
order of decreasing size. The last cluster, numbered 0, contains all the
spheres produced. It may be used with the program cluster to make new
sphere clusters if the original clustered output doesn't describe the site
well.
The easiest way to fix a sphere cluster is to use graphics to identify spheres that you don't really need, then remove them. When you have found the unnecessary ones, go back to the original sphere cluster file (i.e., the one from sphgen) and delete the corresponding lines. The residue number of each center in the PDB-like file of centers is the first number on that sphere's line in the sphere file. Remember to change the number of spheres listed on the line with the cluster number to reflect the deletions.
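The hand edit described above (delete sphere lines, then correct the count on the cluster line) can be sketched in Python. This is a hypothetical helper, not a DOCK program. It assumes the layout the text implies: a cluster line begins with the word "cluster", its second field is the cluster number, and its last field is the sphere count, while each sphere line begins with the sphere number. Check this against your own sphgen output, and keep a copy of the original file.

```python
# Hypothetical helper (not part of DOCK): delete spheres by number from
# one cluster of a sphgen output file and fix that cluster's count.
# Assumed layout: a cluster header line begins with "cluster", its second
# field is the cluster number and its last field is the sphere count; each
# sphere line begins with the sphere number. Lines have no trailing newlines.
from typing import Iterable, List, Set


def _with_count(header: str, count: int) -> str:
    """Replace the last field of a header line, keeping its field width."""
    stripped = header.rstrip()
    last = stripped.split()[-1]
    label = stripped[: len(stripped) - len(last)].rstrip()
    width = len(stripped) - len(label)
    return label + str(count).rjust(width)


def delete_spheres(lines: Iterable[str], cluster_id: int, unwanted: Set[int]) -> List[str]:
    out: List[str] = []
    header_at = -1   # index in `out` of the target cluster's header line
    in_target = False
    kept = 0
    for line in lines:
        fields = line.split()
        if fields and fields[0].lower() == "cluster":
            if in_target:  # leaving the target cluster: update its count
                out[header_at] = _with_count(out[header_at], kept)
            in_target = int(fields[1]) == cluster_id
            if in_target:
                header_at, kept = len(out), 0
            out.append(line)
            continue
        if in_target and fields and fields[0].isdigit():
            if int(fields[0]) in unwanted:
                continue  # drop this sphere
            kept += 1
        out.append(line)
    if in_target:  # target cluster was the last one in the file
        out[header_at] = _with_count(out[header_at], kept)
    return out
```

The count field is rewritten at its original width, in case the program reading the file expects fixed columns.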
If your cluster is large (more than about 100 spheres) and deleting spheres by hand looks too tedious, you can use cluster to break it into smaller clusters. cluster is described in the reference manual; read the documentation completely before you try it. Start with the parameters given and experiment with the values; small changes can make a big difference in the result. Be aware that if the best cluster found is the same as the original input cluster, the program will appear to have done nothing.
The two methods just described may be combined if the best cluster output is not quite right. More spheres can be deleted from the new cluster, or, if the new cluster is too small, additional spheres may be chosen graphically. A cluster containing all the desired spheres may then be created by editing the sphgen output.
If nothing else works, it is possible to run cluster on all possible spheres rather than a preselected group. Use the analytical clustering algorithm in cluster on cluster 0, and experiment until you get what you want. Flagging spheres in important regions of the site may help.
Creating Scoring Grids
Before running DOCK, you must choose which scoring option you will use and
generate the scoring grid or grids required. Current scoring options are
contact only, contact and DelPhi, contact and force field, and force field
only. All the options involving contact scoring require the grid generated by
distmap. DelPhi scoring requires a potential map from DelPhi, and force
field scoring uses the result from chemgrid.
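As a quick reference, the dependencies described above can be written as a lookup table. In this Python sketch the option strings are illustrative labels only, not DOCK input keywords. It shows which grid-generating programs each scoring option needs.

```python
# Illustrative summary of the text above: which grid-generating program
# each scoring option depends on. The option keys are made-up labels,
# not DOCK input keywords.
from typing import Dict, Set

REQUIRED_GRIDS: Dict[str, Set[str]] = {
    "contact only": {"distmap"},
    "contact and DelPhi": {"distmap", "DelPhi"},
    "contact and force field": {"distmap", "chemgrid"},
    "force field only": {"chemgrid"},
}


def missing_grids(option: str, generated: Set[str]) -> Set[str]:
    """Return the grid sources still needed before DOCK can run `option`."""
    return REQUIRED_GRIDS[option] - generated
```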
The interactive program showbox is recommended for defining the size and shape of the grid. One way to do this is to make a box which encloses your sphere cluster and add an extra margin which encloses all the receptor atoms which might contact the ligands. The box generated is in PDB format; it should be viewed along with the receptor and possibly regenerated until it seems appropriate.
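The box described above (the sphere cluster plus a margin) can be estimated before you start showbox. The Python sketch below is a hypothetical helper. It does not reproduce showbox or its PDB output. It computes the center and edge lengths of an axis-aligned box around spheres given as (x, y, z, radius). The margin is your choice and should be large enough to include receptor atoms that might contact the ligands.

```python
# Hypothetical helper (not part of DOCK): numbers you might use when
# sizing a box in showbox. Spheres are (x, y, z, radius) tuples in the
# receptor's coordinate frame; `margin` is your choice.
from typing import Iterable, Tuple

Sphere = Tuple[float, float, float, float]
Vec3 = Tuple[float, float, float]


def box_around(spheres: Iterable[Sphere], margin: float) -> Tuple[Vec3, Vec3]:
    """Return (center, edge lengths) of the axis-aligned box that encloses
    every sphere plus `margin` on each side."""
    pts = list(spheres)
    if not pts:
        raise ValueError("need at least one sphere")
    low = [min(s[i] - s[3] for s in pts) - margin for i in range(3)]
    high = [max(s[i] + s[3] for s in pts) + margin for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(low, high))
    edges = tuple(hi - lo for lo, hi in zip(low, high))
    return center, edges
```

For example, two unit spheres centered at (0, 0, 0) and (4, 2, 0) with a 5.0 margin give a box centered at (2, 1, 0) with edges of 16, 14, and 12.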
INDIST and Running distmap
distmap reads its parameters from an input file named INDIST, which is described
in the reference manual. You will need to change pdbnam and
scoren, the input and output file names, but the rest of the parameters
given in the example are reasonable starting values. Be sure to include the
name of the box file from showbox on the last line. distmap, like
sphgen and all the programs that use similar input files, must be
started while in the directory where INDIST
is located.
DelPhi is not supplied with DOCK; it is available from Barry Honig, Columbia University, or from Biosym. If you have the program and wish to use it for electrostatic scoring, follow the directions supplied with it to generate a potential map for your protein. DOCK currently reads only the potential map (phi file) format used by version 3 of DelPhi.
Three parameter tables are available:

File                      Contents
na.table.ambcrg           nucleic acid parameters
prot.table.ambcrg.ambH    protein parameters; AMBER hydrogen names
prot.table.ambcrg.pdbH    protein parameters; PDB hydrogen names

All three contain AMBER united-atom parameters. The only explicit hydrogens included are those bonded to polar atoms (anything except carbon). The protein files differ only in the hydrogen-naming conventions used; the heavy atom names are PDB standard in both. In your receptor file, the atom names should match the names in the parameter file you will be using, and hydrogens bonded to polar atoms should be present. It is all right to have all the hydrogens, and even lone pairs, present, since any atom not found in the parameter file receives zero charge and volume.

Special attention should be given to the names of atoms at the termini and to the residue names for histidine and cysteine. The different protonation states of histidine correspond to different residue names: HIP for positively charged (hydrogens on both ring nitrogens), HID for neutral with the delta nitrogen protonated, and HIE for neutral with the epsilon nitrogen protonated. CYS refers to a cysteine with a free sulfhydryl group; CYX refers to a cysteine involved in a disulfide bond (a half-cystine). Note that some structures in the PDB use CYS for cysteines in disulfides; these residues should be renamed CYX.
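The naming rules above are mechanical enough to script. The Python sketch below uses hypothetical helpers that are not part of DOCK. One helper picks the histidine residue name from the protonation state. The other renames CYS to CYX for residues you identify as half-cystines. It does not decide which cysteines are bonded; you supply that set, for example from the structure's SSBOND records or from SG-SG distances.

```python
# Hypothetical helpers (not part of DOCK) for the residue naming rules above.
from typing import Iterable, List, Set, Tuple


def histidine_name(delta_protonated: bool, epsilon_protonated: bool) -> str:
    """Residue name for a histidine given which ring nitrogens carry H."""
    if delta_protonated and epsilon_protonated:
        return "HIP"  # positively charged
    if delta_protonated:
        return "HID"  # neutral, delta nitrogen protonated
    if epsilon_protonated:
        return "HIE"  # neutral, epsilon nitrogen protonated
    raise ValueError("at least one ring nitrogen must be protonated")


def rename_disulfide_cys(lines: Iterable[str], bonded: Set[Tuple[str, int]]) -> List[str]:
    """Rename CYS to CYX for residues whose (chain ID, residue number) is in
    `bonded`. Uses PDB columns 18-20 (residue name), 22 (chain ID) and
    23-26 (residue number)."""
    out = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20] == "CYS":
            if (line[21], int(line[22:26])) in bonded:
                line = line[:17] + "CYX" + line[20:]
        out.append(line)
    return out
```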
INCHEM and Running chemgrid
chemgrid reads its parameters from an input file named INCHEM, which is described
in the reference manual. The input values should be placed in the file as
shown in the example. You will need to replace recfil with the name of
your receptor file. Replace table and vdwfil with the locations
of your chosen parameter table and vdw.parms.amb on your system. Inbox
should be replaced with the name of the showbox output file.

Grddiv values between 0.2 and 0.5 are recommended; finer grids are preferred. Any combination of grid point spacing and x, y, and z dimensions can be used as long as the number of points does not exceed the maximum array size specified in the program. If it does, chemgrid will print a message when you run it. To resolve the problem, increase the grid spacing or decrease the box dimensions. DOCK may also be recompiled with larger array sizes if your computer's memory allows; edit the file chemgrid.h appropriately.
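The trade-off above between spacing and box size can be checked before running chemgrid. This Python sketch is hypothetical. It assumes one grid point at each end of every box edge, which may not match chemgrid's exact counting. The max_points argument stands in for the real limit in chemgrid.h, which is not reproduced here. The sketch returns the finest spacing in the recommended range whose grid still fits.

```python
# Hypothetical check (not part of DOCK): estimate the number of grid points
# for a box and spacing, assuming a point at both ends of every edge.
# `max_points` stands in for the real limit set in chemgrid.h.
import math
from typing import Optional, Sequence

SPACINGS = (0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5)  # within the recommended range


def grid_points(edges: Sequence[float], spacing: float) -> int:
    # The small tolerance keeps exact multiples (e.g. 12 / 0.3) from
    # rounding up an extra step because of floating-point error.
    return math.prod(math.ceil(e / spacing - 1e-9) + 1 for e in edges)


def finest_spacing(edges: Sequence[float], max_points: int,
                   candidates: Sequence[float] = SPACINGS) -> Optional[float]:
    """Smallest candidate spacing whose grid fits, or None if none fits."""
    for s in sorted(candidates):
        if grid_points(edges, s) <= max_points:
            return s
    return None
```

For a 16 x 14 x 12 box at a spacing of 0.5, this estimate gives 33 x 29 x 25 = 23925 points.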
A distance-dependent dielectric function of 4.0r or 4.5r (where r is the interatomic distance) and a cutoff of 10.0 Ångströms or more are appropriate in most cases. (This dielectric corresponds to an estype of 1 and esfact of 4.0 or 4.5.) If a constant dielectric is selected, an "infinite" cutoff (one large enough to include the whole receptor) should be used. We tend to use close contact limits of 2.0-2.5 and 2.5-3.0 Ångströms for receptor polar and nonpolar atoms, respectively. The close contact limits do not affect the force field scores that orientations receive, but they determine which orientations are thrown out when the force field scoring only option is used in DOCK. The resolution of the protein structure should be kept in mind when setting these limits; if the receptor atom positions are not very well-defined, it is best not to constrain the results too strongly based on these positions.
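To see what the dielectric choice above means for a single atom pair, here is an illustrative Python sketch of the Coulomb term with a distance-dependent dielectric (epsilon = esfact * r) or a constant one (epsilon = esfact). It is not chemgrid's code. The 332.0 conversion factor is the usual AMBER-style value, and whether chemgrid uses exactly this constant is an assumption.

```python
# Illustrative only: the Coulomb term for one receptor-ligand atom pair
# with either a distance-dependent dielectric (eps = esfact * r, the
# "4.0r" case in the text) or a constant dielectric (eps = esfact).
# The 332.0 conversion factor (kcal*Angstrom/(mol*e^2)) is the usual
# AMBER-style value; that chemgrid uses exactly this constant is assumed.
from typing import Optional

COULOMB = 332.0


def pair_electrostatic(qi: float, qj: float, r: float, esfact: float = 4.0,
                       distance_dependent: bool = True,
                       cutoff: Optional[float] = 10.0) -> float:
    """Energy in kcal/mol for charges qi, qj (e) separated by r (Angstroms).
    Pairs beyond `cutoff` contribute nothing; cutoff=None means no cutoff."""
    if r <= 0.0:
        raise ValueError("r must be positive")
    if cutoff is not None and r > cutoff:
        return 0.0
    eps = esfact * r if distance_dependent else esfact
    return COULOMB * qi * qj / (eps * r)
```

With epsilon = 4r the interaction falls off as 1/r², so doubling the distance reduces the pair energy to a quarter. This damping is why a finite cutoff is acceptable in that case but not with a constant dielectric.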
Grdfil will be the beginning of the output file names.
Although we have recommended certain values, we would like you to try whatever you feel is appropriate, based on your knowledge of the parameters and their significance.
Parameterization takes place in the first few seconds of a chemgrid run, and produces two files: PDBPARM, which lists the coordinates of each receptor atom together with the associated parameters, and OUTPARM, which reports each atom not found in the parameter file and the apparent net charge of the receptor. A long list of hydrogen atoms is normal, since there are no parameters for hydrogens not attached to polar atoms. It is important to check the net charge since a strange value can alert you to parameterization problems. You may use the script called charge, supplied with DOCK, to estimate the expected charge. A value which differs from that expected may mean that something is wrong with the hydrogens or the residue names. Another common cause of a non-integral net charge is incompletely modeled residues.
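If you want an independent estimate to compare with the net charge in OUTPARM, the Python sketch below counts formally charged residues in a receptor PDB file. It is hypothetical and is not the charge script supplied with DOCK, whose method is not described here. It assumes that ARG, LYS, and HIP are +1, that ASP and GLU are -1, and that free termini are charged in +1/-1 pairs that cancel. It does not account for ions, cofactors, or nonstandard residues.

```python
# Hypothetical cross-check (not the DOCK "charge" script): count formally
# charged residues in a PDB file and compare with the net charge reported
# in OUTPARM.
from typing import Dict, Iterable, List, Tuple

# Formal charges for the AMBER residue names used above; HID, HIE, CYS,
# CYX and all other residues count as neutral. Free termini are assumed
# to be charged in pairs (+1 and -1 per chain), so they cancel.
FORMAL_CHARGE = {"ARG": 1, "LYS": 1, "HIP": 1, "ASP": -1, "GLU": -1}


def residue_names(lines: Iterable[str]) -> List[str]:
    """One name per residue, keyed by chain, number and insertion code."""
    seen: Dict[Tuple[str, str], str] = {}
    for line in lines:
        if line.startswith("ATOM"):
            seen.setdefault((line[21], line[22:27]), line[17:20].strip())
    return list(seen.values())


def expected_charge(residues: Iterable[str]) -> int:
    return sum(FORMAL_CHARGE.get(name, 0) for name in residues)


def charge_problems(reported: float, expected: int, tol: float = 0.01) -> List[str]:
    """Flag a non-integral or unexpected net charge reported by chemgrid."""
    problems = []
    if abs(reported - round(reported)) > tol:
        problems.append("non-integral net charge")
    if abs(reported - expected) > tol:
        problems.append("differs from expected charge")
    return problems
```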
Three output files, named grdfil.bmp, grdfil.esp, and grdfil.vdw, make up the actual grid.
Preparing Ligand Molecules
Before you can run DOCK, you must make sure that the ligands you intend to use
are in a format which DOCK can read. Consult the tables below for a list of
acceptable formats. SINGLE mode reads files in all the listed formats
directly; SEARCH mode requires DOCK 3.x database formats. Version numbers for
DOCK databases identify the database format only; all are acceptable input for
DOCK 3.5 SEARCH mode, but SINGLE mode can use only the DOCK 3.0 and DOCK 3.5
formats. Please note that ligands with charges must also have hydrogens.
SINGLE mode input:

Format                   Charges?   Compatible Scoring Options
standard pdb             no         contact
extended pdb             yes        all
SYBYL ASCII (mol2)       yes        all
SYBYL ASCII (mol2)       no         contact
DOCK database (v3.0)     yes        all
DOCK database (v3.5)     yes        all
SEARCH mode input (DOCK databases and the programs that create them):

Format                         Charges?   Conversion Program   Compatible Scoring Options
SYBYL MOL2                     no         mol2db               contact
SYBYL MOL2                     yes        mol2db               all
DOCK database (v1.1 or v2.0)   no         -                    contact
DOCK database (v3.0)           yes        -                    all
DOCK database (v3.5)           yes        -                    all
CSD                            no         mkdb                 contact
mkdb is an interactive program which reads molecules in CSD format and
writes them in DOCK 2.0 format. DOCK does not consider the hydrogens during
its calculations, but they should be included in the mkdb output if you
want them on the oriented ligands written out by DOCK.
mol2db is also interactive; it reads SYBYL ASCII (MOL2) format. If you wish to use contact scoring only and your ligands have no charges, choose version 2.0 output. Otherwise, your ligands should have hydrogens and charges and you should create a version 3.5 database. mol2db may be used to label ligand atoms for chemical matching, described below.
DOCK databases may include any number of molecules. They consist of lists of
molecule records; you may edit them after they are created to delete molecules
or to separate some molecules into smaller databases. If you create smaller
files from a DOCK 3.5 database, be sure to include the header information in
each file. (The first line should read DOCK 3.5 ligand_atoms, and the chemical
matching information should follow it starting on the second line. You can copy
the header from the beginning of the original database.)
Labeling Atoms and Spheres for Chemical Matching
In some cases, you may wish to label receptor spheres and ligand atoms for
chemical matching. When chemical matching is used, labeled (sometimes called
colored) spheres are permitted to match labeled atoms only if their labels
correspond as specified in the INDOCK
file (for example, positively charged
spheres might be allowed to match only negatively charged ligand atoms).
Labeled spheres may still match unlabeled atoms and unlabeled spheres may match
labeled atoms. Labeling reduces the number of orientations which must be scored
somewhat, but the overall DOCK run will not be significantly shorter unless
many of the spheres and atoms are labeled. Chemical matching using labels
assigned by the programs colsph and mol2db is most useful in
conjunction with contact scoring or for highly charged or polar sites.
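The matching rule described above fits in a few lines. In this illustrative Python model, which is not DOCK code, None stands for an unlabeled sphere or atom. `allowed` holds the (sphere label, ligand atom label) pairs you would specify in INDOCK, and the label names are made up.

```python
# Illustrative model of the matching rule described above (not DOCK code).
# A label of None means "unlabeled"; `allowed` holds the (sphere label,
# atom label) pairs you would list in INDOCK. The label names are made up.
from typing import Optional, Set, Tuple


def can_match(sphere_label: Optional[str], atom_label: Optional[str],
              allowed: Set[Tuple[str, str]]) -> bool:
    if sphere_label is None or atom_label is None:
        return True  # unlabeled spheres or atoms can match anything
    return (sphere_label, atom_label) in allowed
```

With allowed = {("positive", "negative")}, a positively labeled sphere can match a negatively labeled atom or any unlabeled atom, but not a positively labeled one.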
colsph may be used to label spheres according to the electrostatic potential at their location in space. You will need a sphere cluster file and either a potential map from DelPhi or the force field grid files created by chemgrid. Create a file specifying the labels and the range of receptor potential or of the electrostatic potential component of the force field grid which corresponds to each label. On each line, list one label and the upper and lower bounds of its electrostatic potential range. The potential will be evaluated at each sphere and the sphere will be assigned a label if the potential is in the appropriate range. Run colsph interactively. The program will prompt you for the map type and name, the name of the range file you just created, and the input and output sphere file names. (colsph will color spheres in all clusters in the sphere file.)
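The range-based labeling above can be sketched as follows. This is a hypothetical Python illustration of the idea, not colsph itself. It assumes a range file with one whitespace-separated "label upper lower" entry per line, following the order given in the text, and it uses the first matching line when ranges overlap. The real file syntax and overlap behavior may differ.

```python
# Hypothetical sketch of colsph-style labeling (not the program itself).
# Assumed range-file syntax, following the text: "label upper lower",
# whitespace-separated, one per line. When ranges overlap, the first
# matching line wins in this sketch.
from typing import List, Optional, Tuple

Range = Tuple[str, float, float]  # (label, lower, upper)


def parse_ranges(text: str) -> List[Range]:
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label, upper, lower = fields[0], float(fields[1]), float(fields[2])
        ranges.append((label, lower, upper))
    return ranges


def label_sphere(potential: float, ranges: List[Range]) -> Optional[str]:
    """Label for the potential at a sphere center, or None if no range fits."""
    for label, lower, upper in ranges:
        if lower <= potential <= upper:
            return label
    return None
```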
To color ligand atoms, begin with a list of molecules in SYBYL ASCII (MOL2) format. Run mol2db interactively (see the reference manual). At the coloring prompt, enter one label at a time with its corresponding SYBYL atom type. To indicate that a given atom type should be (or not be) in a particular functional group, you may include on the same line a second atom type and the number of bonds which separate it from the first atom. Atoms of the first type will then be labeled only if they are the specified number of bonds from an atom of the second type. If the number of bonds is negative, atoms of the first type which are the specified number of bonds from the second type will not be labeled (this is useful for excluding some atoms which meet a previous labeling criterion). Enter a blank line to end label entry.
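The bond-count criterion above is essentially a shortest-path test on the molecular graph. This hypothetical Python sketch, which is not mol2db, applies one rule. With a positive count, an atom of the first type is labeled only if an atom of the second type lies exactly that many bonds away. With a negative count, atoms at that distance are skipped and the other atoms of the first type are labeled. "Separated by n bonds" is taken to mean the shortest bond path. How mol2db combines several rules is not modeled.

```python
# Hypothetical sketch of the bond-distance rules described above (not
# mol2db itself). Atoms are indexed 0..n-1, `types` holds their SYBYL atom
# types and `bonds` lists bonded index pairs. "Separated by n bonds" is
# taken to mean the shortest bond path; combining several rules is not
# modeled, and atoms that fail a rule are simply left unchanged.
from collections import deque
from typing import List, Optional, Sequence, Tuple


def bond_distances(n_atoms: int, bonds: Sequence[Tuple[int, int]],
                   start: int) -> List[Optional[int]]:
    """Breadth-first search: fewest bonds from `start` to each atom."""
    neighbors: List[List[int]] = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist: List[Optional[int]] = [None] * n_atoms
    dist[start] = 0
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def apply_rule(types: Sequence[str], bonds: Sequence[Tuple[int, int]],
               labels: List[Optional[str]], label: str, first: str,
               second: Optional[str] = None, nbonds: int = 0) -> None:
    """Label atoms of type `first` in place. With `second` and nbonds > 0,
    only atoms exactly nbonds from a `second` atom are labeled; with
    nbonds < 0, atoms |nbonds| from a `second` atom are skipped instead."""
    if second is not None and nbonds == 0:
        raise ValueError("nbonds must be nonzero when a second type is given")
    partners = [j for j, t in enumerate(types) if t == second]
    for i, t in enumerate(types):
        if t != first:
            continue
        if second is None:
            labels[i] = label
            continue
        dist = bond_distances(len(types), bonds, i)
        near = any(dist[j] == abs(nbonds) for j in partners)
        if near == (nbonds > 0):
            labels[i] = label
```

For example, the rule ("donor", O.3, H, 1) labels a hydroxyl oxygen but not an ether oxygen. The rule ("acceptor", O.3, H, -1) does the reverse.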
In your INDOCK
file, you will need to specify which of the labels you have
assigned to spheres should match which of the ligand atom labels.