
sphgen

Irwin D. Kuntz
mods. by Renee DesJarlais, Brian Shoichet

Overview

sphgen generates sets of overlapping spheres to describe the shape of a molecule or molecular surface (Kuntz et al., 1982; DesJarlais et al., 1988). For receptors, a negative image of the surface invaginations is created; for a ligand, the program creates a positive image of the entire molecule. Spheres are constructed using the molecular surface described by Richards (1977) calculated with the program MS (Connolly, 1983a, 1983b). Each sphere touches the molecular surface at two points and has its radius along the surface normal of one of the points. For the receptor, each sphere center is "outside" the surface, and lies in the direction of a surface normal vector. For a ligand, each sphere center is "inside" the surface, and lies in the direction of a reversed surface normal vector.

Spheres are calculated over the entire surface, producing approximately one sphere per surface point. This very dense representation is then filtered to keep only the largest sphere associated with each receptor surface atom. The filtered set is then clustered on the basis of radial overlap between the spheres using a single linkage algorithm. This creates a negative image of the receptor surface, where each invagination is characterized by a set of overlapping spheres. These sets, or "clusters," are sorted according to numbers of constituent spheres, and written out in order of descending size. The largest cluster is typically the ligand binding site of the receptor molecule. The program showsphere writes out sphere center coordinates in PDB format and may be helpful for visualizing the clusters.
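
The sphere construction described in this overview can be sketched in Python. This is a minimal illustration under stated assumptions, not the sphgen source: the helper name sphere_from_points and its arguments are hypothetical. For a sphere that touches the surface at points i and j and has its center c = p_i + r n along the unit normal n at i, requiring |c - p_j| = r gives r = |d|^2 / (2 n.d), where d = p_j - p_i.

```python
def sphere_from_points(p_i, n_i, p_j, ligand=False):
    """Return (center, radius) of the sphere through p_i and p_j whose
    center lies along the unit surface normal n_i at p_i.

    Returns None when no such sphere exists on the requested side.
    Hypothetical helper for illustration; not taken from sphgen itself.
    """
    # For a ligand the center lies along the reversed surface normal.
    n = [-c for c in n_i] if ligand else list(n_i)
    d = [b - a for a, b in zip(p_i, p_j)]          # vector from i to j
    d2 = sum(c * c for c in d)                     # |d|^2
    nd = sum(a * b for a, b in zip(n, d))          # n . d
    if nd <= 0.0:
        return None                                # j is behind the normal
    r = d2 / (2.0 * nd)
    center = [a + r * b for a, b in zip(p_i, n)]
    return center, r
```

For example, with p_i at the origin, normal (0, 0, 1) and p_j at (1, 0, 1), the sketch yields a sphere of radius 1 centered at (0, 0, 1).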

Do I need to use sphgen to generate site points?

Input

The input file names and parameters are read from a file called INSPH, which should not contain any blank lines:

msfil
is the name of the file containing the molecular surface calculated with the program MS; the file must include surface normals. sphgen expects the Fortran format

        (A3, I5, X, A4, X, 2F8.3, F9.3, X, A3, 7X, 3F7.3).
This format is quite different from the QCPE molecular surface file format. For more details, see the documentation for reformatms and autoMS.
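
As a rough illustration of how the fixed-width record above maps onto columns, the following Python sketch slices one line according to the stated Fortran format. Only the column widths come from the format; the field names (resname, resnum, atom, xyz, tag, normal) and the helper parse_ms_line are assumptions made for this example.

```python
from typing import NamedTuple, Tuple


class SurfacePoint(NamedTuple):
    # Field names are illustrative guesses, not taken from the MS documentation.
    resname: str
    resnum: int
    atom: str
    xyz: Tuple[float, float, float]
    tag: str
    normal: Tuple[float, float, float]


def parse_ms_line(line: str) -> SurfacePoint:
    """Slice one record laid out as
    (A3, I5, X, A4, X, 2F8.3, F9.3, X, A3, 7X, 3F7.3)."""
    return SurfacePoint(
        resname=line[0:3].strip(),                 # A3
        resnum=int(line[3:8]),                     # I5, then 1 skipped column
        atom=line[9:13].strip(),                   # A4, then 1 skipped column
        xyz=(float(line[14:22]),                   # F8.3
             float(line[22:30]),                   # F8.3
             float(line[30:39])),                  # F9.3, then 1 skipped column
        tag=line[40:43],                           # A3 (columns 41-43)
        normal=(float(line[50:57]),                # 7 skipped columns, then 3F7.3
                float(line[57:64]),
                float(line[64:71])),
    )
```

Under this layout, column 42 of the file is the middle character of the A3 field, i.e. tag[1] in the sketch.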

surftp
indicates whether the spheres should lie "outside" the surface, as for a receptor (R or r), or "inside" the surface, as for a ligand (L or l).

dentag
allows the user to specify that a subset of the surface points are to be used in calculating the spheres. This "density tag" may be set to the values 1, 4, 9, or 0, indicating that only points having 1, 4, 9, or 0, respectively, in column 42 of the molecular surface file will be used; alternatively, the value X or x indicates that all points will be used. It is recommended that X be used unless the system is particularly large (>75,000 surface points). It is most efficient to use a partial molecular surface (if it is known in advance which region is of interest) as the calculation time scales approximately with the square of the number of points.

dotlim
is used to prevent the generation of large spheres whose points of surface contact are quite close together. Each pair of points i and j is examined as a potential pair of sphere-defining points. dotlim is the lower limit on the dot product of the vector from i to j and the vector from the sphere center to j for points defining a sphere. dotlim is typically set to 0.0; possible values range from -1.0 to 1.0, and negative values may be useful for flat sites such as the major groove of B-form DNA.
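
The dotlim test can be written out as a short Python sketch. Two assumptions are made here that the text does not state explicitly: both vectors are normalized before the dot product is taken (inferred from the quoted range of -1.0 to 1.0), and a pair is accepted when the value is greater than or equal to dotlim. The function name passes_dotlim is hypothetical.

```python
import math


def passes_dotlim(p_i, p_j, center, dotlim=0.0):
    """Return True if the normalized dot product of (i -> j) and
    (center -> j) is at least dotlim.

    Assumes unit-normalized vectors and an inclusive lower limit;
    both are guesses for illustration, not confirmed sphgen behavior.
    """
    v_ij = [b - a for a, b in zip(p_i, p_j)]       # vector from i to j
    v_cj = [b - a for a, b in zip(center, p_j)]    # vector from center to j
    norm = math.hypot(*v_ij) * math.hypot(*v_cj)
    if norm == 0.0:
        return False                               # degenerate pair
    cos_angle = sum(a * b for a, b in zip(v_ij, v_cj)) / norm
    return cos_angle >= dotlim
```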

radmax
is the maximum sphere radius in Ångstroms. Spheres with radii larger than radmax are discarded. This is important for the clustering done within sphgen, where clusters are defined as sets of overlapping spheres. Decreasing radmax decreases the cluster sizes by eliminating large "connector" spheres. In general, values from 4.0 to 5.0 Ångstroms are used; values of 0.0 or less default to 5.0 Ångstroms.
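
The effect of radmax on clustering can be illustrated with a small single-linkage sketch in Python. It assumes that two spheres "overlap" when the distance between their centers is less than the sum of their radii; the function name cluster_spheres and the input layout (x, y, z, r) are hypothetical and not taken from sphgen.

```python
import math


def cluster_spheres(spheres, radmax=5.0):
    """Single-linkage clustering of spheres given as (x, y, z, r) tuples.

    Returns clusters as lists of indices into `spheres`, largest first.
    Overlap criterion (center distance < r1 + r2) is an assumption.
    """
    if radmax <= 0.0:
        radmax = 5.0                       # documented default for values <= 0
    kept = [i for i, s in enumerate(spheres) if s[3] <= radmax]
    parent = {i: i for i in kept}

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, a in enumerate(kept):
        for b in kept[pos + 1:]:
            xa, ya, za, ra = spheres[a]
            xb, yb, zb, rb = spheres[b]
            if math.dist((xa, ya, za), (xb, yb, zb)) < ra + rb:
                parent[find(a)] = find(b)  # link the two clusters

    groups = {}
    for i in kept:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)
```

In a row of small spheres joined only by one large sphere, lowering radmax below the radius of that "connector" removes it and splits the single cluster in two.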

radmin
is the minimum sphere radius in Ångstroms. Spheres with radii smaller than radmin are discarded. This should be unnecessary because the molecular surface should not produce spheres of radius less than the probe radius. However, some versions of MS occasionally place surface points very close together. This can result in sphgen generating very small spheres which are not useful in characterizing the shape of the active site. It is generally advisable to keep spheres with radii equal to the probe radius (typically 1.4 or 1.5 Ångstroms). Note that radmin can be set to 0.0 to allow the use of the "extra radius" surface developed by David Barry (unpublished results).

outfil
is the name of the file to which the clustered spheres will be written.
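
A small Python sketch of assembling an INSPH file from the seven parameters above follows. It assumes the parameters appear one per line in the order in which they are documented here and that numeric values are read in free format; the helper name build_insph and the file names rec.ms and rec.sph are hypothetical.

```python
def build_insph(msfil, surftp, dentag, dotlim, radmax, radmin, outfil):
    """Return the text of an INSPH file, one parameter per line.

    Line order and free-format numbers are assumptions for illustration.
    """
    if not msfil or not outfil:
        raise ValueError("file names must be non-empty (no blank lines allowed)")
    if surftp not in ("R", "r", "L", "l"):
        raise ValueError("surftp must be R/r (receptor) or L/l (ligand)")
    if str(dentag) not in ("1", "4", "9", "0", "X", "x"):
        raise ValueError("dentag must be 1, 4, 9, 0, or X/x")
    if not -1.0 <= dotlim <= 1.0:
        raise ValueError("dotlim must lie between -1.0 and 1.0")
    lines = [msfil, surftp, str(dentag),
             str(float(dotlim)), str(float(radmax)), str(float(radmin)),
             outfil]
    return "\n".join(lines) + "\n"


# Example: receptor surface, all points, typical dotlim/radmax/radmin values.
example = build_insph("rec.ms", "R", "X", 0.0, 4.0, 1.4, "rec.sph")
```

The resulting text could then be written to a file named INSPH in the directory where sphgen is run.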

Output

Some informative messages are written to a file called OUTSPH. This includes the parameters and files used in the calculation. The spheres themselves are written to outfil. They are arranged in clusters with the cluster having the largest number of spheres appearing first. The sphere cluster file consists of a header followed by a series of sphere clusters. The header is the line

DOCK 3.5 receptor_spheres
followed by a color table. The color table contains color names (format A30) each on a separate line. As sphgen produces no colors, the color table is simply absent. The sphere clusters themselves follow, each of which starts with the line

cluster     n   number of spheres in cluster     i
where n is the cluster number for that sphere cluster, and i is the number of spheres in that cluster. Next, all spheres in that cluster are listed in the format

(I5, 3F10.5, F8.3, I5, I2, I3)
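where the values correspond to, respectively,

the number of the atom with which surface point i (used to generate the sphere) is associated;
the x, y, and z coordinates of the sphere center;
the sphere radius;
the number of the atom with which surface point j (the second point used to generate the sphere) is associated;
the critical cluster to which this sphere belongs;
the sphere color, given as an index into the color table in the header.
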
The clusters are listed in numerical order from the largest cluster found to the smallest. At the end of the clusters is cluster number 0. This is not an actual sphere cluster, but a list of all of the spheres generated whose radii were larger than the minimum radius, before the filtering heuristics (i.e. allowing only one sphere per atom and using a maximum radius cutoff) and clustering were performed. Cluster 0 may be useful as a starting point for users who want to explore a wider range of possible clusters than is provided by the standard sphgen clustering routine. The program cluster takes the full sphere description as input, and allows the user to explore different sphere descriptions of the site. This is particularly useful for macromolecule-macromolecule docking; it is often inefficient to use spheres that fill the entire volume of the "ligand" macromolecule. In addition, only a portion of a cavity in the "receptor" macromolecule may be of interest for docking purposes. If the standard clustered output from sphgen provides a satisfactory description of the ligand molecule or receptor site, running cluster is not necessary.
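
For users who want to inspect the sphere cluster file directly, the layout described above can be read with a short Python sketch. The column widths follow the format (I5, 3F10.5, F8.3, I5, I2, I3); the field names are descriptive labels chosen for this sketch, and the functions parse_sphere_line and read_clusters are hypothetical helpers, not part of DOCK.

```python
def parse_sphere_line(line):
    """Slice one sphere record laid out as (I5, 3F10.5, F8.3, I5, I2, I3)."""
    return {
        "atom_i": int(line[0:5]),              # I5
        "x": float(line[5:15]),                # F10.5
        "y": float(line[15:25]),               # F10.5
        "z": float(line[25:35]),               # F10.5
        "radius": float(line[35:43]),          # F8.3
        "atom_j": int(line[43:48]),            # I5
        "critical_cluster": int(line[48:50]),  # I2
        "color": int(line[50:53]),             # I3
    }


def read_clusters(text):
    """Return {cluster_number: [sphere, ...]} from sphere-file text.

    Lines before the first "cluster" line (the header and any color
    table) are skipped. Cluster 0, if present, is returned like any other.
    """
    clusters = {}
    lines = iter(text.splitlines())
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] != "cluster":
            continue
        number, count = int(tokens[1]), int(tokens[-1])
        # The next `count` lines are the spheres of this cluster.
        clusters[number] = [parse_sphere_line(next(lines)) for _ in range(count)]
    return clusters
```

Reading cluster 0 this way gives access to the full unfiltered sphere list, which can then be filtered or clustered with different criteria.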

The program creates three temporary files: temp1.ms, temp2.sph, and temp3.atc. These are used internally by sphgen. In DOCK 3.0 these files were not removed after the program finished, so that they could be processed by tosph to produce input for cluster. Cluster 0 now satisfies the need for cluster input, and the temporary files are removed when the program finishes (unless it terminates abnormally). The tosph program is no longer part of DOCK.



Curator: Daniel Gschwend, gschwend@cgl.ucsf.edu (rev. 1 September 1995)