De Novo Protein Design: Fully Automated Sequence Selection


Bassil I. Dahiyat,† Stephen L. Mayo*
 

Design Target

- the ββα motif typified by the zinc finger DNA binding module.

- the second zinc finger module of the DNA binding protein Zif268 was selected as the design template.

- the amino acids considered at the core positions: Ala, Val, Leu, Ile, Phe, Tyr, and Trp.

- the amino acids considered at the surface positions: Ala, Ser, Thr, His, Asp, Asn, Glu, Gln, Lys, and Arg.

- the combined core and surface amino acid sets (16 amino acids) were considered at the boundary positions.

- Two of the residue positions (9 and 27) have φ angles greater than 0° and are set to Gly by the sequence selection algorithm to minimize backbone strain.
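As a rough illustration of how large the search space becomes under these rules, the sketch below multiplies the number of allowed amino acids over all positions. The position counts (1 core, 7 boundary, 20 surface) are taken from the Fig. 1 caption. Treating both Gly-fixed positions (9 and 27) as members of the surface class is an assumption made here for illustration; the notes above do not say which class they belong to.

```python
# Amino acid sets listed above for each position class.
CORE = ["Ala", "Val", "Leu", "Ile", "Phe", "Tyr", "Trp"]
SURFACE = ["Ala", "Ser", "Thr", "His", "Asp", "Asn", "Glu", "Gln", "Lys", "Arg"]
# Boundary positions use the union of both sets (Ala appears in both).
BOUNDARY = sorted(set(CORE) | set(SURFACE))

# Position counts from the Fig. 1 caption.
N_CORE, N_BOUNDARY, N_SURFACE = 1, 7, 20
# ASSUMPTION: both Gly-fixed positions (9 and 27) are counted among the surface positions.
N_FIXED_GLY = 2


def sequence_space_size():
    """Number of distinct sequences when each free position varies independently."""
    free_surface = N_SURFACE - N_FIXED_GLY
    return (
        len(CORE) ** N_CORE
        * len(BOUNDARY) ** N_BOUNDARY
        * len(SURFACE) ** free_surface
    )


if __name__ == "__main__":
    print("boundary set size:", len(BOUNDARY))
    print("sequences:", f"{sequence_space_size():.2e}")
```

Under that assumption the count is on the order of 10^27, which suggests why an automated selection algorithm is needed rather than exhaustive enumeration.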




 



Fig. 1. Sequence of FSD-1 aligned with the second zinc finger of Zif268. The bar at the top of the figure shows the residue position classifications: the solid bar indicates the single core position, the hatched bars indicate the seven boundary positions, and the open bars indicate the 20 surface positions. The alignment matches positions of FSD-1 to the corresponding backbone template positions of Zif268. Of the six identical positions (21 percent) between FSD-1 and Zif268, four are buried (Ile7, Phe12, Leu18, and Ile22). The zinc binding residues of Zif268 are boxed. Representative nonoptimal sequence solutions determined by means of a Monte Carlo simulated annealing protocol are shown with their rank. Vertical lines indicate identity with FSD-1. The symbols at the bottom of the figure show the degree of sequence conservation for each residue position computed across the top 1000 sequences: filled circles indicate more than 99 percent conservation, half-filled circles indicate conservation between 90 and 99 percent, open circles indicate conservation between 50 and 90 percent, and the absence of a symbol indicates less than 50 percent conservation. The consensus sequence determined by choosing the amino acid with the highest occurrence at each position is identical to the sequence of FSD-1.
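The conservation symbols and the consensus sequence described in the caption can be computed along the following lines. This is a minimal sketch, not the authors' code: the five four-residue sequences are hypothetical toy data, and the handling of values that fall exactly on a threshold (90 or 50 percent) is an assumption, since the caption does not specify it.

```python
from collections import Counter


def classify(fraction):
    """Map a conservation fraction to the symbol classes named in the caption.

    ASSUMPTION: exact threshold values (0.90, 0.50) go to the higher class.
    """
    if fraction > 0.99:
        return "filled"
    if fraction >= 0.90:
        return "half-filled"
    if fraction >= 0.50:
        return "open"
    return "none"


def conservation_profile(sequences):
    """Return (consensus string, list of symbol classes) for equal-length sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    consensus = []
    classes = []
    for column in zip(*sequences):
        # Most frequent residue at this position and how often it occurs.
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue)
        classes.append(classify(count / len(sequences)))
    return "".join(consensus), classes


# HYPOTHETICAL toy sequences (one-letter codes), not taken from the paper.
TOY_SEQUENCES = ["AKIF", "AKIW", "ARIF", "AELY", "AQVF"]

if __name__ == "__main__":
    consensus, classes = conservation_profile(TOY_SEQUENCES)
    print(consensus, classes)
```

With real data the input would be the top-ranked sequences from the search (1000 in the caption), one string per sequence.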


Fig. 2. Comparison of Zif268 and computed FSD-1 structures. (A) Stereoview of the second zinc finger module of Zif268 showing its buried residues and zinc binding site. (B) Stereoview of the computed orientations of buried side chains in FSD-1. For clarity, only side chains from residues 3, 5, 8, 12, 18, 21, 22, and 25 are shown.


 
 
 

Alignment of the sequences for FSD-1 and Zif268 indicates that only 6 of the 28 residues (21%) are identical and only 11 (39%) are similar. Four of the identities are in the buried cluster, which is consistent with the expectation that buried residues are more conserved than solvent-exposed residues for a given motif. A BLAST search of the FSD-1 sequence did not reveal any zinc finger protein sequences.
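The identity percentage quoted above is a simple ratio over aligned positions. A small sketch of that calculation follows; the two short sequences are hypothetical placeholders, and similarity is not computed because the grouping of similar amino acids used for the 39% figure is not given here.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions that carry the same residue (no gaps)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # HYPOTHETICAL aligned fragments, for illustration only.
    print(percent_identity("AKIF", "AKLF"))
    # Counts quoted in the text: 6 identical and 11 similar out of 28 residues.
    print(round(100 * 6 / 28), round(100 * 11 / 28))
```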



 

Experimental validation.


Fig. 3. Circular dichroism (CD) measurements of FSD-1. (A) Far-UV CD spectrum of FSD-1 at 1°C. The minima at 220 and 207 nm indicate a folded structure. (B) Thermal unfolding of FSD-1 monitored by CD. The melting curve has an inflection point at 39°C. To illustrate the cooperativity of the thermal transition, the melting curve was fit to a two-state model [and the derivative of the fit is shown (inset)]. The melting temperature determined from this fit is 42°C.
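For readers unfamiliar with two-state fits, the sketch below shows one common form of the model: a van't Hoff expression for the unfolding equilibrium constant, converted to the fraction of unfolded molecules. It is not the fit used for Fig. 3. The melting temperature of 42°C comes from the caption, while the enthalpy value, the zero heat capacity change, and the omission of sloping baselines are all assumptions chosen only to make the example run.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

TM_C = 42.0        # melting temperature from the Fig. 3 caption, in deg C
DELTA_H_KJ = 50.0  # HYPOTHETICAL unfolding enthalpy in kJ/mol (not from the source)


def fraction_unfolded(temp_c, tm_c=TM_C, delta_h_kj=DELTA_H_KJ):
    """Two-state fraction unfolded at temp_c, assuming a zero heat capacity change."""
    temp_k = temp_c + 273.15
    tm_k = tm_c + 273.15
    # van't Hoff form: K = 1 at Tm, and K increases with temperature when delta_h > 0.
    k_unfold = math.exp((delta_h_kj / R_KJ) * (1.0 / tm_k - 1.0 / temp_k))
    return k_unfold / (1.0 + k_unfold)


def derivative(temp_c, step=0.01):
    """Central-difference slope of the unfolding curve, analogous to the figure inset."""
    upper = fraction_unfolded(temp_c + step)
    lower = fraction_unfolded(temp_c - step)
    return (upper - lower) / (2.0 * step)


if __name__ == "__main__":
    for temp in range(0, 91, 10):
        print(temp, round(fraction_unfolded(temp), 3), round(derivative(temp), 4))
```

A larger enthalpy gives a sharper transition, which is why the shape of the derivative is used to judge cooperativity.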



 

The solution structure of FSD-1 was solved by means of homonuclear 2D ¹H NMR spectroscopy. NMR spectra were well dispersed, indicating an ordered protein structure and easing resonance assignments.


Fig. 4. NOE contacts for FSD-1. (A) Sequential and short-range NOE connectivities. (B) Representative NOE contacts from aromatic to methyl protons. Several long-range NOEs from Ile7 and Phe12 to the helix help define the fold of the protein. The starred peak has an ambiguous F1 assignment, Ile22 Hδ1 or Leu18 Hδ2.



Fig. 5. Solution structure of FSD-1. Stereoview showing the best-fit superposition of the 41 converged simulated annealing structures from X-PLOR. The amino terminus is at the lower left of the figure and the carboxyl terminus is at the upper right of the figure. The structure consists of two antiparallel strands from positions 3 to 6 (back strand) and 9 to 12 (front strand), with a hairpin turn at residues 7 and 8, followed by a helix from positions 15 to 26. The termini, residues 1, 2, 27, and 28, have very few NOE restraints and are disordered.



 
 
Table 1. NMR structure determination: distance restraints, structural statistics, and atomic root-mean-square (rms) deviations. <SA> are the 41 simulated annealing structures, SA is the average structure before energy minimization, (SA)r is the restrained energy minimized average structure, and SD is the standard deviation.

Distance restraints

  Intraresidue                                  97
  Sequential                                    83
  Short range (|i - j| = 2 to 5 residues)       59
  Long range (|i - j| > 5 residues)             35
  Hydrogen bond                                 10
  Total                                        284

Structural statistics

  rms deviations                <SA> ± SD           (SA)r
  Distance restraints (Å)       0.043 ± 0.003       0.038
  Idealized geometry
    Bonds (Å)                   0.0041 ± 0.0002     0.0037
    Angles (degrees)            0.67 ± 0.02         0.65
    Impropers (degrees)         0.53 ± 0.05         0.51

Atomic rms deviations (Å)*

                                       <SA> versus SA ± SD    <SA> versus (SA)r ± SD
  Backbone                             0.54 ± 0.15            0.69 ± 0.16
  Backbone + nonpolar side chains†     0.99 ± 0.17            1.16 ± 0.18
  Heavy atoms                          1.43 ± 0.20            1.90 ± 0.29

* Atomic rms deviations are for residues 3 to 26, inclusive. Residues 1, 2, 27, and 28 were disordered [φ, ψ angular order parameters (34) < 0.78] and had only sequential and |i - j| = 2 NOEs.
† Nonpolar side chains are from residues Tyr3, Ala5, Ile7, Phe12, Leu18, Phe21, Ile22, and Phe25, which constitute the core of the protein.
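The restraint counts in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below sums the categories and derives an average number of restraints per residue; the per-residue figure is computed here for illustration and is not a value reported in the table.

```python
# Distance restraint counts copied from Table 1.
RESTRAINTS = {
    "intraresidue": 97,
    "sequential": 83,
    "short range": 59,
    "long range": 35,
    "hydrogen bond": 10,
}
N_RESIDUES = 28  # length of FSD-1 as stated in the sequence comparison


def total_restraints():
    """Sum of all restraint categories."""
    return sum(RESTRAINTS.values())


def restraints_per_residue():
    """Average restraints per residue (derived here, not reported in the table)."""
    return total_restraints() / N_RESIDUES


if __name__ == "__main__":
    print(total_restraints(), round(restraints_per_residue(), 1))
```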

Comparison of the average structure of FSD-1 with the design target:

  • The overall backbone rmsd is 1.98 Å for residues 3 to 26 and only 0.98 Å for residues 8 to 26.
  • The largest difference between FSD-1 and the target structure occurs from residues 4 to 7, with a displacement of 3.0 to 3.5 Å of the backbone atom positions of strand 1.


  Fig. 6. Comparison of the FSD-1 structure (blue) and the design target (red). Stereoview of the best-fit superposition of the restrained energy minimized average NMR structure of FSD-1 and the backbone of Zif268. Residues 3 to 26 are shown.
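The backbone rmsd values quoted above measure the average distance between matched atoms after the two structures have been superimposed. The sketch below computes that quantity for coordinates that are assumed to be superimposed already; the best-fit superposition step is deliberately left out, and the coordinates are hypothetical. It also illustrates why a local displacement, such as the one in strand 1, can dominate the overall value.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched (x, y, z) points.

    ASSUMPTION: the two coordinate sets are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# HYPOTHETICAL coordinates in Å: four atoms, only the first one is displaced (by 2 Å).
TARGET = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
MODEL = [(0.0, 2.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

if __name__ == "__main__":
    print(rmsd(TARGET, MODEL))          # all four atoms
    print(rmsd(TARGET[1:], MODEL[1:]))  # excluding the displaced atom
```

Restricting the calculation to the atoms that agree lowers the value sharply, in the same way that the reported rmsd drops from 1.98 Å to 0.98 Å when residues 3 to 7 are excluded.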


    Table 2. Comparison of the FSD-1 experimentally determined structure and the design target structure. The FSD-1 structure is the restrained energy minimized average from the NMR structure determination. The design target structure is the second DNA binding module of the zinc finger Zif268 (9

    Atomic rms deviations (Å)

      Backbone, residues 3 to 26      1.98
      Backbone, residues 8 to 26      0.98

    Super-secondary structure parameters*

                          FSD-1     Design target
      h (Å)                9.9       8.9
      θ (degrees)         14.2      16.5
      Ω (degrees)         13.1      13.5

    * h, θ, and Ω are calculated as described (36, 37). h is the distance between the centroid of the helix Cα coordinates (residues 15 to 26) and the least-squares plane fit to the Cα coordinates of the sheet (residues 3 to 12); θ is the angle of inclination of the principal moment of the helix Cα atoms with the plane of the sheet; Ω is the angle between the projection of the principal moment of the helix onto the sheet and the projection of the average least-squares fit line to the strand Cα coordinates (residues 3 to 6 and 9 to 12) onto the sheet.
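The definitions of h and θ in the footnote translate into a short geometric calculation. The sketch below is one possible reading of those definitions, not the procedure of references 36 and 37: the sheet plane is taken as the orthogonal least-squares plane through the sheet points, the helix principal moment as the direction of largest spread of the helix points, and both are found by power iteration on a 3 x 3 matrix. The coordinates are hypothetical, and Ω is omitted for brevity.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


def scatter_matrix(points):
    """3 x 3 matrix of summed outer products of deviations from the centroid."""
    c = centroid(points)
    m = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - c[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                m[i][j] += d[i] * d[j]
    return m


def dominant_eigenvector(matrix, iterations=500):
    """Unit eigenvector of the largest eigenvalue, found by power iteration."""
    v = [1.0, 0.7, 0.3]  # arbitrary start vector, not aligned with any axis
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("degenerate point set")
        v = [x / norm for x in w]
    return v


def plane_normal(points):
    """Normal of the orthogonal least-squares plane (direction of smallest spread)."""
    m = scatter_matrix(points)
    trace = m[0][0] + m[1][1] + m[2][2]
    # Shifting by the trace turns the smallest eigenvalue into the largest one.
    shifted = [[(trace if i == j else 0.0) - m[i][j] for j in range(3)] for i in range(3)]
    return dominant_eigenvector(shifted)


def helix_sheet_parameters(helix_points, sheet_points):
    """Return (h, theta in degrees) following the footnote definitions."""
    normal = plane_normal(sheet_points)
    hc, sc = centroid(helix_points), centroid(sheet_points)
    h = abs(sum((hc[i] - sc[i]) * normal[i] for i in range(3)))
    axis = dominant_eigenvector(scatter_matrix(helix_points))
    sin_theta = min(1.0, abs(sum(axis[i] * normal[i] for i in range(3))))
    return h, math.degrees(math.asin(sin_theta))


# HYPOTHETICAL coordinates in Å: a flat "sheet" in the z = 0 plane and a tilted "helix" axis.
SHEET = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
HELIX = [(0.0, 2.0, 9.0), (2.0, 2.0, 10.0), (4.0, 2.0, 11.0)]

if __name__ == "__main__":
    h, theta = helix_sheet_parameters(HELIX, SHEET)
    print(round(h, 2), round(theta, 2))
```

With real structures the inputs would be the Cα coordinates of residues 15 to 26 for the helix and residues 3 to 12 for the sheet, as stated in the footnote.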