Appendix 1: Ligand File Formats

extended PDB / Sybyl MOL2 / DOCK database 2.1 / DOCK database 3.0 / DOCK database 3.5

Extended PDB format

REMARK  AZASERINE                                          REFCODE#1
ATOM      1  O1  UNK     1       0.551  -0.151   0.382  -0.303   0.000 12
ATOM      2  C2  UNK     1      -0.815   0.040   0.003   0.106   0.000  5
ATOM      3  C3  UNK     1       1.470   0.512  -0.274   0.351   0.000  1
ATOM      4  C4  UNK     1      -1.714  -0.821   0.889   0.044   0.000  5
ATOM      5  C5  UNK     1       2.878   0.350   0.077   0.389   0.000  5
ATOM      6  O6  UNK     1       1.150   1.266  -1.178  -0.256   0.000 11
ATOM      7  C7  UNK     1      -3.156  -0.619   0.489   0.104   0.000  1
ATOM      8  N8  UNK     1      -1.354  -2.236   0.725   0.227   0.000  9
ATOM      9  N9  UNK     1       3.860   1.059  -0.625   0.132   0.000  8
ATOM     10  O10 UNK     1      -4.019  -0.158   1.360  -0.796   0.000 12
ATOM     11  O11 UNK     1      -3.517  -0.885  -0.646  -0.309   0.000 11
ATOM     12  N12 UNK     1       4.667   1.642  -1.201   0.215   0.000  8
ATOM     13  H13 UNK     1      -1.583  -0.531   1.929   0.094   0.000  7
ATOM     14  H14 UNK     1      -0.947  -0.252  -1.049   0.063   0.000  7
ATOM     15  H15 UNK     1      -1.084   1.099   0.126   0.063   0.000  7
ATOM     16  H16 UNK     1       3.056  -0.730  -0.033   0.139   0.000  7
ATOM     17  H17 UNK     1       2.920   0.622   1.142   0.139   0.000  7
ATOM     18  H18 UNK     1      -1.947  -2.804   1.309   0.200   0.000  6
ATOM     19  H19 UNK     1      -0.391  -2.371   0.992   0.200   0.000  6
ATOM     20  H20 UNK     1      -1.476  -2.504  -0.239   0.200   0.000  6
TER
REMARK  MECRYLATE                                          REFCODE#2
ATOM      1  C1  UNK     2       0.881  -0.199  -0.189   0.116   0.000  1
ATOM      2  C2  UNK     2      -0.138   0.559   0.532   0.303   0.000  1
ATOM      3  C3  UNK     2       0.487  -1.210  -1.151   0.105   0.000  1
ATOM      4  C4  UNK     2       2.180   0.040   0.038  -0.075   0.000  1
ATOM      5  O5  UNK     2      -1.407   0.326   0.310  -0.314   0.000 12
ATOM      6  O6  UNK     2       0.194   1.410   1.341  -0.259   0.000 11
ATOM      7  N7  UNK     2       0.171  -2.019  -1.920  -0.191   0.000  8
ATOM      8  C8  UNK     2      -2.368   1.093   1.040   0.046   0.000  5
ATOM      9  H9  UNK     2       2.939  -0.526  -0.500   0.055   0.000  7
ATOM     10  H10 UNK     2       2.476   0.799   0.760   0.055   0.000  7
ATOM     11  H11 UNK     2      -3.383   0.785   0.747   0.053   0.000  7
ATOM     12  H12 UNK     2      -2.231   2.161   0.817   0.053   0.000  7
ATOM     13  H13 UNK     2      -2.231   0.923   2.118   0.053   0.000  7
TER

DOCK version 3.5 writes its output in this format, although fully standard PDB format can be selected in the DOCKOPT file. The extra columns allow rapid rescoring of the orientations using different contact, Delphi, and/or force field grid files. For ATOM records, the Fortran format is
    ('ATOM', 2X, I5, X, A4, X, A3, 2X, I4, 4X, 5F8.3, I3)
where the fields contain the atom number (I5), atom name (A4), residue name (A3), residue number (I4), coordinates (3F8.3), point charge (F8.3), electrostatic potential or atomic contact score, if known (F8.3), and van der Waals type (I3). The format is identical to standard PDB format up through the coordinate fields. The ATOM records for a molecule can be preceded by any number of REMARK records and are followed by a TER record. The x2pdb utility converts extended PDB format into standard PDB format, which almost any graphics package can display.
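
The column layout implied by the Fortran format can be illustrated with a short Python sketch. This is not part of DOCK; the class and function names are hypothetical, and the column positions are derived only from the format string above.

```python
from typing import NamedTuple


class ExtendedAtom(NamedTuple):
    serial: int
    name: str
    resname: str
    resnum: int
    x: float
    y: float
    z: float
    charge: float
    score: float  # electrostatic potential or atomic contact score
    vdw_type: int


def parse_extended_atom(line: str) -> ExtendedAtom:
    """Slice one ATOM record at the column boundaries of the format."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    # 5F8.3 occupies columns 31-70: x, y, z, charge, score.
    x, y, z, charge, score = (
        float(line[30 + 8 * i : 38 + 8 * i]) for i in range(5)
    )
    return ExtendedAtom(
        serial=int(line[6:11]),       # I5, columns 7-11
        name=line[12:16].strip(),     # A4, columns 13-16
        resname=line[17:20].strip(),  # A3, columns 18-20
        resnum=int(line[22:26]),      # I4, columns 23-26
        x=x,
        y=y,
        z=z,
        charge=charge,
        score=score,
        vdw_type=int(line[70:73]),    # I3, columns 71-73
    )


# First ATOM record of the AZASERINE example above.
record = "ATOM      1  O1  UNK     1       0.551  -0.151   0.382  -0.303   0.000 12"
atom = parse_extended_atom(record)
print(atom.name, atom.charge, atom.vdw_type)
```

Fixed-column slicing is used instead of whitespace splitting because adjacent Fortran fields can run together when a value fills its full width.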

Sybyl MOL2 format

@<TRIPOS>MOLECULE
AZASERINE                                          REFCODE#1
   20    19     1     0     0
SMALL
GASTEIGER

@<TRIPOS>ATOM
      1 O1        0.5509 -0.1511  0.3824 O.3      1 <1>      -0.3033
      2 C2       -0.8146  0.0401  0.0033 C.3      1 <1>      0.1056
      3 C3        1.4701  0.5125 -0.2741 C.2      1 <1>      0.3512
      4 C4       -1.7143 -0.8207  0.8889 C.3      1 <1>      0.0442
      5 C5        2.8780  0.3502  0.0766 C.3      1 <1>      0.3895
      6 O6        1.1496  1.2661 -1.1784 O.2      1 <1>      -0.2563
      7 C7       -3.1563 -0.6188  0.4887 C.2      1 <1>      0.1043
      8 N8       -1.3538 -2.2363  0.7246 N.4      1 <1>      0.2269
      9 N9        3.8604  1.0594 -0.6249 N.1      1 <1>      0.1316
     10 O10      -4.0193 -0.1576  1.3596 O.3      1 <1>      -0.7964
     11 O11      -3.5175 -0.8852 -0.6458 O.2      1 <1>      -0.3094
     12 N12       4.6673  1.6419 -1.2012 N.1      1 <1>      0.2147
     13 H13      -1.5829 -0.5314  1.9295 H        1 <1>      0.0942
     14 H14      -0.9475 -0.2525 -1.0487 H        1 <1>      0.0627
     15 H15      -1.0844  1.0994  0.1262 H        1 <1>      0.0627
     16 H16       3.0565 -0.7296 -0.0334 H        1 <1>      0.1388
     17 H17       2.9197  0.6222  1.1416 H        1 <1>      0.1388
     18 H18      -1.9473 -2.8041  1.3088 H        1 <1>      0.2001
     19 H19      -0.3913 -2.3711  0.9918 H        1 <1>      0.2001
     20 H20      -1.4757 -2.5044 -0.2394 H        1 <1>      0.2001
@<TRIPOS>BOND
     1    1    2 1
     2    1    3 1
     3    2    4 1
     4    3    5 1
     5    3    6 2
     6    4    7 1
     7    4    8 1
     8    5    9 1
     9    7   10 1
    10    7   11 2
    11    9   12 3
    12    4   13 1
    13    2   14 1
    14    2   15 1
    15    5   16 1
    16    5   17 1
    17    8   18 1
    18    8   19 1
    19    8   20 1
@<TRIPOS>SUBSTRUCTURE
     1 ****        1 TEMP              0 ****  ****    0 ROOT


@<TRIPOS>MOLECULE
MECRYLATE                                          REFCODE#2
   13    12     1     0     0
SMALL
GASTEIGER
 
@<TRIPOS>ATOM
      1 C1        0.8809 -0.1989 -0.1892 C.2      1 <1>      0.1160
      2 C2       -0.1375  0.5591  0.5319 C.2      1 <1>      0.3033
      3 C3        0.4867 -1.2099 -1.1510 C.1      1 <1>      0.1046
      4 C4        2.1798  0.0397  0.0378 C.2      1 <1>      -0.0754
      5 O5       -1.4073  0.3258  0.3100 O.3      1 <1>      -0.3143
      6 O6        0.1941  1.4097  1.3412 O.2      1 <1>      -0.2585
      7 N7        0.1714 -2.0187 -1.9204 N.1      1 <1>      -0.1907
      8 C8       -2.3684  1.0930  1.0399 C.3      1 <1>      0.0462
      9 H9        2.9394 -0.5257 -0.5001 H        1 <1>      0.0549
     10 H10       2.4758  0.7990  0.7601 H        1 <1>      0.0549
     11 H11      -3.3828  0.7847  0.7466 H        1 <1>      0.0530
     12 H12      -2.2309  2.1613  0.8165 H        1 <1>      0.0530
     13 H13      -2.2309  0.9231  2.1180 H        1 <1>      0.0530
@<TRIPOS>BOND
     1    1    2 1
     2    1    3 1
     3    1    4 2
     4    2    5 1
     5    2    6 2
     6    3    7 3
     7    5    8 1
     8    4    9 1
     9    4   10 1
    10    8   11 1
    11    8   12 1
    12    8   13 1
@<TRIPOS>SUBSTRUCTURE
     1 ****        1 TEMP              0 ****  ****    0 ROOT

mol2db and, for a single molecule, SINGLE mode DOCK take this format as input. For these programs, the essential features are: the @<TRIPOS>MOLECULE line; the following line, with a name for the molecule in positions 1 to 51 (this can be all spaces) and a 9-character refcode in positions 52 to 60; the first two integers on the next line, which specify the number of atoms and the number of bonds, respectively; the @<TRIPOS>ATOM line, followed by the correct number of atom lines; and the @<TRIPOS>BOND line, followed by the correct number of bond lines. Lines between these three sections, blank or not, are ignored, but blank lines should not occur within a section. Output written directly by SYBYL is also acceptable. Each @<TRIPOS> tag must start in position 1 of a line, and each atom line must contain the same number of alphanumeric fields as shown above, although not every field needs meaningful content. Requiring a fixed number of fields rather than fixed columns allows the variable spacing that can result from an unformatted write or from manual file creation.
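
The essential features listed above can be sketched as a small Python reader. This is an illustration only, not the code used by mol2db or DOCK; the function name, the returned dictionary, and the two-atom toy molecule are hypothetical.

```python
def read_mol2_molecule(lines):
    """Collect the name, refcode, and atom and bond lines of the first molecule."""
    lines = [ln.rstrip("\n") for ln in lines]

    def find(tag, start):
        # Each @<TRIPOS> tag must begin in position 1 of its line.
        for i in range(start, len(lines)):
            if lines[i].startswith(tag):
                return i
        raise ValueError(f"missing {tag} record")

    m = find("@<TRIPOS>MOLECULE", 0)
    title = lines[m + 1].ljust(60)
    name = title[:51].strip()        # positions 1 to 51
    refcode = title[51:60].strip()   # positions 52 to 60
    n_atoms, n_bonds = (int(v) for v in lines[m + 2].split()[:2])

    # Anything between the sections is skipped; lines inside are counted.
    a = find("@<TRIPOS>ATOM", m + 3)
    atom_lines = lines[a + 1 : a + 1 + n_atoms]
    b = find("@<TRIPOS>BOND", a + 1 + n_atoms)
    bond_lines = lines[b + 1 : b + 1 + n_bonds]
    if len(atom_lines) != n_atoms or len(bond_lines) != n_bonds:
        raise ValueError("fewer atom or bond lines than the counts specify")
    return {
        "name": name,
        "refcode": refcode,
        "n_atoms": n_atoms,
        "n_bonds": n_bonds,
        "atom_lines": atom_lines,
        "bond_lines": bond_lines,
    }


# Hypothetical two-atom molecule, for illustration only.
toy = [
    "@<TRIPOS>MOLECULE",
    "TOY".ljust(51) + "REFCODE#9",
    "    2     1     1     0     0",
    "SMALL",
    "",
    "@<TRIPOS>ATOM",
    "      1 C1        0.0000  0.0000  0.0000 C.3      1 <1>      0.0000",
    "      2 C2        1.5000  0.0000  0.0000 C.3      1 <1>      0.0000",
    "@<TRIPOS>BOND",
    "     1    1    2 1",
]
mol = read_mol2_molecule(toy)
print(mol["name"], mol["refcode"], mol["n_atoms"], mol["n_bonds"])
```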

From the point of view of SINGLE mode DOCK, an atom line contains: an uninteresting alphanumeric field, the atom name (character), the x, y, and z coordinates (real), the atom type (character), another uninteresting alphanumeric field, the substructure name (character), and the partial charge (real). The substructure name is relatively unimportant. If it is three or more characters long and does not start with "<", it is included in the output as the residue name; otherwise, the residue name UNK (for unknown) is assigned. Bond lines are interpreted as: an uninteresting integer, the number of the first atom in the bond, and the number of the second atom in the bond. Any remaining fields are ignored.
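
A minimal Python sketch of how a reader might apply the SINGLE mode interpretation of an atom line, assuming whitespace-separated fields as described above. The function names are hypothetical. How DOCK treats substructure names longer than three characters is not stated here, so the sketch passes them through unchanged.

```python
def residue_name(substructure: str) -> str:
    """Use the substructure name as residue name only if it qualifies."""
    if len(substructure) >= 3 and not substructure.startswith("<"):
        return substructure
    return "UNK"  # unknown


def parse_single_atom_line(line: str) -> dict:
    """Interpret one MOL2 atom line the way SINGLE mode is described to."""
    fields = line.split()
    if len(fields) < 9:
        raise ValueError("atom line must contain nine fields")
    return {
        # fields[0] and fields[6] are the ignored fields
        "name": fields[1],
        "xyz": (float(fields[2]), float(fields[3]), float(fields[4])),
        "type": fields[5],
        "resname": residue_name(fields[7]),
        "charge": float(fields[8]),
    }


# First atom line of the AZASERINE example above.
first = parse_single_atom_line(
    "      1 O1        0.5509 -0.1511  0.3824 O.3      1 <1>      -0.3033"
)
print(first["name"], first["type"], first["resname"], first["charge"])
```

Because the substructure name in the example is "<1>", the residue name falls back to UNK, which matches the extended PDB output shown earlier in this appendix.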

From the point of view of mol2db, an atom line contains: an uninteresting alphanumeric field, another uninteresting alphanumeric field, the coordinates (real), the atom type (character), the substructure identifier (integer), another uninteresting alphanumeric field, and the partial charge (real). The atom and substructure names are not read, because DOCK generates the atom and residue names for SEARCH mode output later. Again, bond lines are interpreted as: an uninteresting integer, the number of the first atom in the bond, and the number of the second atom in the bond. Any remaining fields are ignored.
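
For comparison, the mol2db view of the same lines might be sketched as follows. The function names are hypothetical and the code only mirrors the field order described above; it is not taken from mol2db.

```python
def parse_mol2db_atom_line(line: str) -> dict:
    """Keep only the fields mol2db is described as reading."""
    fields = line.split()
    if len(fields) < 9:
        raise ValueError("atom line must contain nine fields")
    return {
        # fields[0], fields[1] (atom name), and fields[7] are skipped
        "xyz": (float(fields[2]), float(fields[3]), float(fields[4])),
        "type": fields[5],
        "subst_id": int(fields[6]),
        "charge": float(fields[8]),
    }


def parse_bond_line(line: str) -> tuple:
    """Return the two atom numbers; the leading integer and any
    trailing fields (such as the bond order) are ignored."""
    fields = line.split()
    return int(fields[1]), int(fields[2])


# Atom 7 and bond 6 of the MECRYLATE example above.
nitrile_n = parse_mol2db_atom_line(
    "      7 N7        0.1714 -2.0187 -1.9204 N.1      1 <1>      -0.1907"
)
nitrile_bond = parse_bond_line("     6    3    7 3")
print(nitrile_n["type"], nitrile_n["subst_id"], nitrile_bond)
```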


DOCK database format, version 2.1: from mkdb

REFCODE1 12  8
 8 6 6 6 6 8 6 7 7 8 8 7
 4570 2653 1583 3204 2844 1204 5489 3317  927 2305 1983 2090 6897 3154 1278 5169
 4070   23  863 2185 1690 2665  568 1926 7879 3863  576    0 2646 2561  501 1919
  555 8686 4446    0 2436 2273 3131 3071 2551  152 2935 3903 1327 7076 2074 1168
 6939 3426 2343 2072    0 2510 3628  433 2193 2543  300  962
REFCODE2  8  5
 6 6 6 6 8 8 7 6
 4264 1820 1731 3245 2578 2452 3870  809  769 5563 2059 1958 1976 2345 2230 3577
 3429 3261 3554    0    0 1015 3112 2960 6322 1493 1420 5859 2818 2680    0 2804
 2667 1152 4180 2737 1152 2942 4038
This format is described further in the documentation for mkdb.


DOCK database format, version 3.0: from convsyb

N AZASERINE                                          
REFCODE#1 12  8 20  0
12 5 1 5 511 1 9 81211 8 7 7 7 7 7 6 6 6
 -303  106  351   44  390 -256  104  227  132 -552 -552  215   94   63   63  139
  139  200  200  200
 4570 2653 1583 3204 2844 1204 5489 3317  927 2305 1983 2090 6897 3154 1278 5169
 4070   23  863 2185 1690 2665  568 1926 7879 3863  576    0 2646 2561  501 1919
  555 8686 4446    0 2436 2273 3131 3071 2551  152 2935 3903 1327 7076 2074 1168
 6939 3426 2343 2072    0 2510 3628  433 2193 2543  300  962
N MECRYLATE                                          
REFCODE#2  8  5 13  0
 1 1 1 11211 8 5 7 7 7 7 7
  116  303  105  -75 -314 -259 -191   46   55   55   53   53   53
 4264 1820 1731 3245 2578 2452 3870  809  769 5563 2059 1958 1976 2345 2230 3577
 3429 3261 3554    0    0 1015 3112 2960 6322 1493 1420 5859 2818 2680    0 2804
 2667 1152 4180 2737 1152 2942 4038
This format is described further here.


DOCK database format, version 3.5: from mol2db

DOCK 3.5 ligand_atoms
nc                            
cn                            
AZASERINE                                          CMC   614
 20 12  8      0.0000     0
 4570 2653 158312 -3030  0
 3204 2844 1204 5  1060  0
 5489 3317  927 1  3510  0
 2305 1983 2090 5   440  0
 6897 3154 1278 5  3900  2
 5169 4070   2311 -2560  0
  863 2185 1690 1  1040  0
 2665  568 1926 9  2270  0
 7879 3863  576 8  1320  1
    0 2646 256112 -5520  0
  501 1919  55511 -5520  0
 8686 4446    0 8  2150  0
 2436 2273 3131 7   940  0
 3071 2551  152 7   630  0
 2935 3903 1327 7   630  0
 7076 2074 1168 7  1390  0
 6939 3426 2343 7  1390  0
 2072    0 2510 6  2000  0
 3628  433 2193 6  2000  0
 2543  300  962 6  2000  0
MECRYLATE                                          CMC   734
 13  8  5      0.0000     0
 4264 1820 1731 1  1160  0
 3245 2578 2452 1  3030  0
 3870  809  769 1  1050  0
 5563 2059 1958 1  -750  0
 1976 2345 223012 -3140  0
 3577 3429 326111 -2590  0
 3554    0    0 8 -1910  0
 1015 3112 2960 5   460  0
 6322 1493 1420 7   550  0
 5859 2818 2680 7   550  0
    0 2804 2667 7   530  0
 1152 4180 2737 7   530  0
 1152 2942 4038 7   530  0
This format is described further here.


Curator: Daniel Gschwend, gschwend@cgl.ucsf.edu (rev. 1 September 1995)