The macromolecular structure you are working with may include a ligand, and crystal structures usually contain water molecules and sometimes ions found on the surface of the protein. These molecules are usually not included in the input to ms. To prepare for molecular surface generation, make a copy of the protein coordinate file. If a ligand is present, remove it by deleting all of its records (they often start with HETATM in Brookhaven Protein Data Bank format files) from your copy of the file. (Note: sometimes, as in the case of a cofactor or catalytic metal ion, it may make chemical sense to keep a ligand in the pdb file.)

Whether crystallographic waters and ions should be preserved when generating surfaces for use by sphgen is a matter of some debate. In structures of complexes, water molecules and ions are often found in the protein binding pocket along with the ligand(s). However, ligands can displace waters and ions, and the volume of a receptor site will be explored more completely if they are removed. Unless you have a particular reason to preserve some of the water molecules or ions in the crystal, it is probably best to remove all of them.

Waters are usually located near the end of the pdb file and are often HETATM records with HOH or WAT residue names. Ions are often near the waters in the pdb file.
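The cleanup described above can also be scripted instead of done in a text editor. The following is only a minimal sketch, assuming standard fixed-column PDB records (residue name in columns 18-20); the function names and the keep_resnames argument are hypothetical helpers, not part of the DOCK distribution.

```python
WATER_NAMES = {"HOH", "WAT"}


def strip_hetero_records(pdb_lines, keep_resnames=()):
    """Drop ligand, water and ion records from a list of PDB lines.

    HETATM records are removed unless their residue name is listed in
    keep_resnames (for example a cofactor or catalytic metal ion).
    Waters written as ATOM records (HOH or WAT) are removed as well.
    """
    keep = {name.upper() for name in keep_resnames}
    kept_lines = []
    for line in pdb_lines:
        record = line[:6].strip().upper()
        if record in ("ATOM", "HETATM"):
            resname = line[17:20].strip().upper()  # columns 18-20
            is_water = resname in WATER_NAMES
            if resname not in keep and (record == "HETATM" or is_water):
                continue
        kept_lines.append(line)
    return kept_lines


def write_stripped_copy(source_path, copy_path, keep_resnames=()):
    """Write a cleaned copy; the original coordinate file is left untouched."""
    with open(source_path) as handle:
        lines = handle.readlines()
    with open(copy_path, "w") as handle:
        handle.writelines(strip_hetero_records(lines, keep_resnames))
```

Check the result before using it. Some files store modified amino acids (selenomethionine, for example) as HETATM records, and these would be removed along with the ligand unless their residue names are passed in keep_resnames.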
Note that the pdb file used to generate the molecular surface should not include hydrogen atoms. NMR structures include hydrogens; delete them from a copy of each structure and use that copy as input to ms.
The dot surface used to produce spheres is generated by the program ms, available from the Quantum Chemistry Program Exchange (QCPE). When setting up for docking, it is acceptable to generate a surface only for the site of interest and adjacent regions (see the documentation for get_near_res and autoMS); this also reduces the computer time used by sphgen. Note: sphgen requires that the surface points have associated normals.
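Hydrogen removal can be scripted in the same way. This is a sketch under stated assumptions, not a tool from the DOCK distribution: it trusts the element symbol in columns 77-78 when that field is present and otherwise falls back on the atom name, which can misclassify a few heavy atoms (mercury, named HG, is one example). The function names are hypothetical.

```python
def is_hydrogen(line):
    """Guess whether an ATOM/HETATM record describes a hydrogen atom."""
    element = line[76:78].strip().upper()  # columns 77-78, if present
    if element:
        return element in ("H", "D")  # D is deuterium
    # Fallback assumption: hydrogen names start with H once any leading
    # digits are removed, e.g. "H", "HA", "1HB", "HD11".
    name = line[12:16].strip().lstrip("0123456789").upper()
    return name.startswith("H")


def strip_hydrogens(pdb_lines):
    """Return the PDB lines with hydrogen atom records removed."""
    return [
        line
        for line in pdb_lines
        if not (line.startswith(("ATOM", "HETATM")) and is_hydrogen(line))
    ]
```

As with ligand removal, run this on a copy of the structure and inspect the output before passing it to ms.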
If you use the QCPE version of ms, you must run reformatms to convert the surface to the format used by sphgen (both formats are described in the reference manual section on reformatms). reformatms is interactive and requires the surface and the pdb file used to generate the surface.
Users of the UCSF MidasPlus package may use the output from the dms program directly as input for sphgen.
We typically use sphgen to construct shape-based site points, but you may use any other program to construct them. Other programs may let you include considerations of chemical complementarity in your site points. A common alternative to sphgen is Peter Goodford's grid program.
sphgen uses the points of the molecular surface and their associated normals to determine spheres to fill the site. It then reduces the number of spheres to one per atom and groups them into clusters. You can inspect these clusters and regroup the spheres if necessary.
The parameters that tell sphgen exactly how to generate the spheres are placed in a file called INSPH, which must be present when sphgen is run. The contents of this file are described in the reference manual. To create it, make a file with each variable on a separate line. Most of the parameter values given in the reference manual should work fine. You will need to replace msfil with the name of your surface file and outfil with the desired name of your output file.
sphgen must use the directory containing INSPH as its working directory; this means that it should be started while you are in that directory. The sphgen output file contains clusters of spheres which have been selected and grouped by sphgen; the clusters are listed in order of decreasing size. The last cluster, numbered 0, contains all the spheres produced. It may be used with the program cluster to make new sphere clusters if the original clustered output doesn't describe the site well.
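The working-directory requirement is easy to get wrong when sphgen is launched from a script. The sketch below writes INSPH with one value per line and then starts sphgen from the directory that holds it. It assumes the executable is named sphgen, is on your PATH, and takes no command-line arguments; the function names are hypothetical, and the parameter values and their order must come from the reference manual.

```python
import subprocess
from pathlib import Path


def write_insph(directory, values):
    """Write INSPH in `directory`, one parameter value per line.

    `values` must already be in the order given in the reference manual.
    """
    path = Path(directory) / "INSPH"
    path.write_text("".join(f"{value}\n" for value in values))
    return path


def run_sphgen(directory, executable="sphgen"):
    """Run sphgen with `directory` (the one holding INSPH) as its cwd."""
    # Assumption: sphgen reads INSPH from its working directory and
    # needs no arguments.
    subprocess.run([executable], cwd=directory, check=True)
```

Passing cwd to subprocess.run changes the working directory only for the child process, so the calling script stays where it was.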
Once you've generated spheres, you should look at the sphere clusters using a molecule display program. showsphere may be used to generate a pdb-like file of sphere centers for display. It can also generate a surface for the sphere cluster (in the ms format used by sphgen). showsphere is interactive. You will be prompted for the name of the cluster file (that is, the sphgen output), the number of the cluster, and names for the desired output files. In the pdb-like file of sphere center coordinates, each sphere is a separate residue and the spheres are separated by TER cards.
Displaying the protein and sphere centers together should tell you how each sphere cluster is related to the site you are trying to represent. Examine sphere clusters until you find one that occupies the region into which you want to dock ligands. Clusters of 50 or fewer spheres are best; larger numbers of spheres will cause dock to use more computer time. It is generally unwise to try docking with more than 100 spheres, although you may be able to use more if your database is small or you are using chemical matching. Initial sphere clusters are sometimes spider-like structures which include the area of interest but also branch into other regions. If your cluster has too many spheres, branches out, or is unsatisfactory for some other reason, you can correct the problem.
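Before looking at clusters graphically, it can help to list their sizes against the guidelines above. The sketch below assumes that each cluster in the sphgen output begins with a header line whose first word is "cluster", whose second field is the cluster number, and whose last field is the sphere count; verify this against your own file. The function names and labels are hypothetical.

```python
def cluster_sizes(sph_lines):
    """Map cluster number to sphere count, read from the header lines."""
    sizes = {}
    for line in sph_lines:
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0].lower() == "cluster":
            try:
                sizes[int(tokens[1])] = int(tokens[-1])
            except ValueError:
                continue  # not a header in the assumed layout
    return sizes


def rate_cluster(n_spheres, preferred=50, limit=100):
    """Label a cluster size using the guidelines in the text."""
    if n_spheres <= preferred:
        return "preferred"
    if n_spheres <= limit:
        return "usable, but slower"
    return "too large for most runs"


def summarize(sph_lines):
    """Return (cluster number, size, label) tuples, skipping cluster 0."""
    return [
        (number, size, rate_cluster(size))
        for number, size in sorted(cluster_sizes(sph_lines).items())
        if number != 0  # cluster 0 holds every sphere produced
    ]
```

Size is only a first filter. Whether a cluster actually occupies the region of interest still has to be judged by displaying it together with the protein.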
The easiest way to fix a sphere cluster is to use graphics to identify spheres that you don't really need, then remove them. When you've found the unnecessary ones, go back to the original sphere cluster file (i.e. the one from sphgen) and delete the corresponding lines - the residue number in the pdb-like file of centers is the first number in the line in the sphere file. Remember to change the number of spheres listed on the line with the cluster number to reflect the deletions.
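The manual edit described above (delete the sphere lines, then correct the count) can be sketched as a small function. It assumes that each cluster starts with a header line beginning with "cluster", followed by the cluster number and ending with the sphere count, and that the first whitespace-separated field of each sphere line is the sphere number. The function name is hypothetical; work on a copy of the sphgen output.

```python
def delete_spheres(sph_lines, cluster_number, remove_ids):
    """Remove the listed spheres from one cluster and fix its header count."""
    remove_ids = set(remove_ids)
    out = []
    header_index = None  # position of the edited cluster's header in `out`
    in_target = False
    kept = 0
    for line in sph_lines:
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0].lower() == "cluster":
            in_target = int(tokens[1]) == cluster_number
            if in_target:
                header_index = len(out)
                kept = 0
            out.append(line)
            continue
        if in_target and tokens:
            try:
                sphere_id = int(tokens[0])
            except ValueError:
                sphere_id = None  # not a sphere line; keep it unchanged
            if sphere_id in remove_ids:
                continue
            if sphere_id is not None:
                kept += 1
        out.append(line)
    if header_index is not None:
        header = out[header_index]
        body = header.rstrip()
        old_count = body.split()[-1]
        # Overwrite the count in place, preserving the field width.
        new_count = str(kept).rjust(len(old_count))
        out[header_index] = (
            body[: len(body) - len(old_count)] + new_count + header[len(body):]
        )
    return out
```

Only the named cluster is changed. A sphere with the same number in another cluster, including cluster 0, is left alone.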
If your cluster is large (more than about 100 spheres) and deleting spheres by hand looks too tedious, you can use cluster to break it into smaller clusters. cluster is described in the reference manual; read the documentation completely before you try it. Start with the parameters given and experiment with the values; small changes can make a big difference in the result. Be aware that if the best cluster found is the same as the original input cluster, the program will appear not to have done anything.
The two methods just described may be combined if the best cluster output is not quite right. More spheres can be deleted from the new cluster, or, if the new cluster is too small, additional spheres may be added graphically. A cluster containing all the desired spheres may then be created by editing the sphgen output.
If nothing else works, it is possible to run cluster on all possible spheres rather than a preselected group. Use the analytical clustering algorithm in cluster on cluster 0, and experiment until you get what you want. Flagging spheres in important regions of the site may help.