Not long ago, it seemed inconceivable that proteins could be designed from scratch. Because each protein sequence has an astronomical number of potential conformations, it appeared that only an experimentalist with the evolutionary life span of Mother Nature could design a sequence capable of folding into a single, well-defined three-dimensional structure. But now, on page 82 of this issue, Dahiyat and Mayo (1) describe a new approach that makes de novo protein design as easy as running a computer program. Well, almost.
The intellectual roots of this new work go back to the early 1980s, when protein engineers first thought about designing proteins (2). At that point, the prediction of a protein's three-dimensional structure from its sequence alone seemed a difficult proposition. However, they opined that the inverse problem (designing an amino acid sequence capable of assuming a desired three-dimensional structure) would be more tractable, because one could "over-engineer" the system to favor the desired folding pattern. Thus, the problem of de novo protein design reduced to two steps: selecting a desired tertiary structure and finding a sequence that would stabilize this fold. Dahiyat and Mayo have now mastered the second step with spectacular success. They have distilled the rules, insights, and paradigms gleaned from two decades of experiments (3) into a single computational algorithm that predicts an optimal sequence for a given fold. Further, when put to the test, the algorithm actually predicted a sequence that folded into the desired three-dimensional structure. Thus, the rules of protein folding and computational methods for de novo design may now be sufficiently defined to allow the engineering of a variety of proteins.
Dahiyat and Mayo's program divides the interactions that stabilize protein structures into three categories: interactions of side chains that are exposed to solvent, of side chains buried in the protein interior, and of parts of the protein that occupy more interfacial positions. Exposed residues contribute to stability primarily through conformational preferences and weakly attractive, solvent-exposed polar interactions (4). The burial of hydrophobic residues in the well-packed interior of a protein provides an even more powerful driving force for folding. The side chains in the interior of a protein adopt unique conformations, the prediction of which is a large combinatorial problem.
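The three-way division can be made concrete with a hypothetical sketch: each backbone position is assigned to a core, boundary, or surface class from a precomputed fractional solvent accessibility, and the amino acids considered at that position are restricted accordingly. The cutoffs and residue sets below are illustrative assumptions, not the published criteria of Dahiyat and Mayo.

```python
from enum import Enum


class PositionClass(Enum):
    CORE = "core"
    BOUNDARY = "boundary"
    SURFACE = "surface"


# Hypothetical residue sets: apolar residues for the buried interior, polar
# residues for solvent-exposed sites, and the union of both at interfacial sites.
CORE_RESIDUES = frozenset("AVLIFYW")
SURFACE_RESIDUES = frozenset("ASTHDNEQKR")
BOUNDARY_RESIDUES = CORE_RESIDUES | SURFACE_RESIDUES


def classify(relative_accessibility: float,
             core_max: float = 0.2,
             surface_min: float = 0.5) -> PositionClass:
    """Assign a position to a class from its fractional solvent exposure (0 to 1).

    The default cutoffs are illustrative assumptions, not published values.
    """
    if relative_accessibility < core_max:
        return PositionClass.CORE
    if relative_accessibility >= surface_min:
        return PositionClass.SURFACE
    return PositionClass.BOUNDARY


def allowed_residues(position_class: PositionClass) -> frozenset:
    """Return the one-letter amino acid codes considered at a position of this class."""
    return {
        PositionClass.CORE: CORE_RESIDUES,
        PositionClass.BOUNDARY: BOUNDARY_RESIDUES,
        PositionClass.SURFACE: SURFACE_RESIDUES,
    }[position_class]


if __name__ == "__main__":
    # Made-up accessibilities for a short stretch of backbone positions.
    for rsa in (0.05, 0.3, 0.8, 0.15, 0.55):
        cls = classify(rsa)
        print(f"{rsa:.2f} -> {cls.value:8s} {''.join(sorted(allowed_residues(cls)))}")
```

Giving boundary positions the union of both sets is one simple way to admit amphiphilic residues such as Lys, Arg, and Tyr at interfacial sites.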
One important simplifying assumption arose from the early work of Janin et al. (5), who showed that each individual side chain can adopt a limited number of low-energy conformations (named rotamers), reducing the number of probable conformers available to a protein. This work was subsequently extended to the design of proteins containing only the most favorable rotamers (6). Although the side chains in natural proteins deviate from ideality in a few cases (complicating the prediction of the structures of natural proteins), these deviations need not be considered in the design of idealized proteins. Thus, various algorithms have been developed to examine all possible hydrophobic residues in all possible rotameric states to find combinations that efficiently fill the interior of a protein. A complementary approach uses genetic methods to exhaustively search for sequences capable of filling a protein core (7), and this work has been adapted for the de novo design of proteins (8).
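The payoff of a rotamer library is easiest to see by counting. This sketch assumes that every side-chain dihedral (chi) angle is either sampled on a 10-degree grid or restricted to three staggered rotamer values, then counts the joint conformations of a small, hypothetical five-residue core, and finally the (residue, rotamer) choices when residue identity is also allowed to vary. The chi counts are standard for these residues, but treating every chi angle as three-state is a simplification (aromatic chi2 angles, for example, behave differently).

```python
import math

# Number of side-chain chi angles per residue type (Ala has none).
CHI_COUNT = {"A": 0, "V": 1, "L": 2, "I": 2, "F": 2}


def conformations(residue: str, states_per_chi: int) -> int:
    """Side-chain conformations if every chi angle takes `states_per_chi` discrete values."""
    return states_per_chi ** CHI_COUNT[residue]


def fixed_sequence_space(core: str, states_per_chi: int) -> int:
    """Joint conformations for a core whose residue identities are already fixed."""
    return math.prod(conformations(r, states_per_chi) for r in core)


def design_space(n_positions: int, allowed: str, states_per_chi: int) -> int:
    """Joint (residue, rotamer) choices when every position may hold any allowed residue."""
    per_position = sum(conformations(r, states_per_chi) for r in allowed)
    return per_position ** n_positions


if __name__ == "__main__":
    core = "LIVFL"  # hypothetical core sequence
    grid = fixed_sequence_space(core, 36)      # chi sampled every 10 degrees
    rotamers = fixed_sequence_space(core, 3)   # three staggered rotamers per chi
    print(f"10-degree grid: 10^{math.log10(grid):.1f}; rotamers: 10^{math.log10(rotamers):.1f}")
    print(f"8 positions, residues AVLIF: {design_space(8, 'AVLIF', 3):,} combinations")
```

Even with a handful of rotamers per residue, the space still grows exponentially with the number of positions once sequence is varied, which is why efficient search methods matter.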
Interfacial residues are also quite important for protein stability (9, 10). They are often amphiphilic (for example, Lys, Arg, and Tyr) and their apolar atoms can cap the hydrophobic core, while their polar groups engage in electrostatic and hydrogen-bonded interactions.
Until recently, protein designers had frequently concentrated on quantifying the energetics associated with just one of these three types of interactions (3). However, de novo design is best approached by simultaneously considering all of the side chains in the protein, which is unfortunately a very high-order combinatorial problem. For instance, the volume available to the interior side chains depends on the nature and conformation of the residues at the interfacial positions and vice versa. Dahiyat and Mayo assumed that each of these three types of interactions had been quantified well enough to provide a useful empirical energy function for protein design. Their program combines a number of features taken from earlier potential functions and includes a penalty for exposing hydrophobic groups to solvent. Another essential innovation in their program is an implementation of the Dead-End Elimination theorem, which efficiently searches sequence and side-chain rotamer space.
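Dead-end elimination can be illustrated on a toy problem. The sketch below applies the original single-rotamer elimination criterion to random, hypothetical self and pairwise energies (not Dahiyat and Mayo's energy function). In a design calculation each "rotamer" would stand for a (residue type, side-chain rotamer) pair, so pruning narrows sequence and conformation at the same time. Only the survivors are enumerated at the end.

```python
import itertools
import random


def make_toy_problem(n_pos=4, n_rot=4, seed=0):
    """Random self and pair energies standing in for a real force field (hypothetical)."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-2.0, 2.0) for _ in range(n_rot)] for _ in range(n_pos)]
    pair_e = {
        (i, j): [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)] for _ in range(n_rot)]
        for i, j in itertools.combinations(range(n_pos), 2)
    }
    return self_e, pair_e


def pair(pair_e, i, r, j, s):
    """Interaction energy of rotamer r at position i with rotamer s at position j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def total_energy(self_e, pair_e, assignment):
    """Energy of one complete assignment (one rotamer index per position)."""
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    for i, j in itertools.combinations(range(len(assignment)), 2):
        energy += pair(pair_e, i, assignment[i], j, assignment[j])
    return energy


def energy_bounds(self_e, pair_e, alive, i, r):
    """Lowest and highest energy rotamer r at position i can have, given survivors elsewhere."""
    low = high = self_e[i][r]
    for j in range(len(self_e)):
        if j != i:
            energies = [pair(pair_e, i, r, j, s) for s in alive[j]]
            low += min(energies)
            high += max(energies)
    return low, high


def dead_end_elimination(self_e, pair_e):
    """Iterate the single-rotamer DEE criterion until no further rotamer can be removed.

    Rotamer r at position i is discarded when another surviving rotamer t at i has a
    worst-case energy below r's best-case energy; swapping r for t would then always
    lower the total energy, so r cannot be in the global minimum-energy conformation.
    """
    alive = [set(range(len(rotamers))) for rotamers in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(len(self_e)):
            for r in sorted(alive[i]):
                best_r, _ = energy_bounds(self_e, pair_e, alive, i, r)
                if any(best_r > energy_bounds(self_e, pair_e, alive, i, t)[1]
                       for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


def brute_force_minimum(self_e, pair_e, choices):
    """Exhaustively enumerate the allowed rotamers and return the lowest-energy assignment."""
    return min(itertools.product(*(sorted(c) for c in choices)),
               key=lambda a: total_energy(self_e, pair_e, a))


if __name__ == "__main__":
    self_e, pair_e = make_toy_problem()
    alive = dead_end_elimination(self_e, pair_e)
    print("surviving rotamers per position:", [sorted(a) for a in alive])
    print("lowest-energy assignment:", brute_force_minimum(self_e, pair_e, alive))
```

Because an eliminated rotamer cannot belong to the global minimum, enumerating only the survivors gives the same answer as enumerating everything; in realistic problems the pruning is what makes the final search feasible.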
Dahiyat and Mayo's target fold is a zinc finger, a motif with a well-established history in protein structure prediction and design. In an early, prescient paper, Berg correctly inferred that this His2Cys2 Zn-binding motif must feature a ββα fold that would position the ligating groups in a tetrahedral array around the bound Zn(II) (11). Favorable metal ion-ligand interactions together with a small apolar core help stabilize the three-dimensional structure of this compact fold. More recently, Imperiali and co-workers have designed a peptide that folded into this motif, even in the absence of metal ions (12). The design included a D-amino acid to stabilize a type II′ turn, and a large, rigid tricyclic side chain that may help consolidate the hydrophobic core. This work was particularly exciting because, before their studies, it was not expected that sequences as short as 25 residues in length could fold into stable tertiary structures.
Now, Dahiyat and Mayo take these studies one step further through the design of a sequence composed of only natural amino acids that adopts the zinc finger motif. As input to their program, they introduced the coordinates of the backbone atoms from the crystal structure of the second domain of the zinc finger protein Zif268. The program then searched a total of 10⁶² possible side chain-rotamer combinations to find a sequence capable of stabilizing this fold without a bound metal ion. The resulting protein sequence shares a small hydrophobic core with its predecessor from Zif268. However, in the newly designed protein FSD-1 the core is enlarged through the addition of hydrophobic residues that fill the space vacated by the removal of the metal-binding site (see the figure). This increase in the size of the hydrophobic core, together with an enhanced propensity for forming the appropriate secondary structure, provides an adequate driving force for folding. The designed miniprotein actually folds into the desired structure as assessed by nuclear magnetic resonance spectroscopy, and the observed structure closely resembles the three-dimensional structure of Zif268.
Because of its small size, the protein is marginally stable. A van't Hoff analysis of the thermal unfolding curve gives a van't Hoff enthalpy change (ΔHvH) of approximately 10 kcal/mol and indicates that the protein is about 90 to 95% folded at low temperatures (13). The small value of ΔHvH and the lack of strong cooperativity in the unfolding transition are expected for a native-like protein of this very small size (14). Thus, FSD-1 is the smallest protein known to be capable of folding into a unique structure without the thermodynamic assistance of disulfides, metal ions, or other subunits. This important accomplishment illustrates the impressive ability of Dahiyat and Mayo's program to design highly optimized sequences.
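How a small ΔHvH broadens the transition can be seen from the two-state model that a van't Hoff analysis fits. This forward model assumes a two-state transition with a temperature-independent enthalpy (heat-capacity change neglected) and a hypothetical midpoint of 40 °C, which is not a value given in the text; only the 10 kcal/mol enthalpy comes from the article, and the 50 kcal/mol curve is an invented contrast.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def fraction_folded(temp_k: float, dh_vh: float, tm_k: float) -> float:
    """Two-state fraction folded at temp_k, neglecting any heat-capacity change.

    The unfolding equilibrium constant is K = exp(-(dh_vh / R) * (1/T - 1/Tm)),
    so K < 1 (mostly folded) below Tm and K = 1 exactly at Tm.
    """
    k_unfold = math.exp(-(dh_vh / R_KCAL) * (1.0 / temp_k - 1.0 / tm_k))
    return 1.0 / (1.0 + k_unfold)


if __name__ == "__main__":
    TM_K = 313.15  # hypothetical midpoint (40 C), for illustration only
    for dh in (10.0, 50.0):  # 10 kcal/mol from the text; 50 kcal/mol for contrast
        curve = [fraction_folded(t + 273.15, dh, TM_K) for t in (0, 20, 40, 60, 80)]
        print(f"dH_vH = {dh:4.0f} kcal/mol:", " ".join(f"{f:.2f}" for f in curve))
```

With the 10 kcal/mol enthalpy the transition is spread over tens of degrees and the protein remains noticeably short of fully folded even near 0 °C, the weak cooperativity described above; the larger enthalpy gives a much sharper transition.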
This new achievement caps a banner year for de novo protein design. Earlier, Regan (15) answered the challenge of changing a protein's tertiary structure by altering no more than 50% of its sequence. And although Dahiyat and Mayo have demonstrated that the stabilizing metal-binding site is not necessary in their system, Caradonna, Hellinga, and co-workers (16) have made impressive progress in automating the introduction of functional metal-binding sites into the three-dimensional structures of natural proteins. Further, other workers (17) have used less automated approaches to successfully introduce functionally and spectroscopically interesting metal-binding sites into de novo designed proteins.
To date, the most computationally intensive protein design problems have been the redesign of natural proteins of known three-dimensional structure. But the new automated approaches open the door to the de novo design of structures with entirely novel backbone conformations. It will be interesting to see whether Dahiyat and Mayo's approach of designing an optimal sequence for a given fold is sufficient, or whether it will also be necessary to destabilize alternate possible folds. Indeed, when using an earlier version of their algorithm to repack the interior of the coiled coil from GCN4, they had to retain the identity of a buried Asn residue from the wild-type protein. Although the inclusion of this Asn actually destabilized the desired fold, it was nevertheless essential to avoid the formation of alternate, unwanted conformers (18). The ability to ask such focused questions will reveal much about how natural proteins adopt their folded conformations while simultaneously allowing the design of entirely new polymers for applications ranging from catalysis to pharmaceuticals.
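Whether alternate folds must be destabilized can be phrased as a choice between two objectives. In this toy sketch, with invented sequence labels and energies (arbitrary units, lower is more stable), "positive design" picks the sequence with the lowest energy in the target fold, while "negative design" picks the sequence whose target fold beats its most stable competitor by the widest margin. The numbers are hypothetical and chosen only to show that the two criteria can disagree, much as the buried Asn in GCN4 weakened the desired fold but disfavored the alternatives.

```python
from typing import Dict

# Hypothetical energies of each candidate sequence in the target fold and in
# two competing folds (arbitrary units; lower means more stable).
ENERGIES: Dict[str, Dict[str, float]] = {
    "design_Val": {"target": -30.0, "alt_1": -29.0, "alt_2": -27.0},
    "design_Asn": {"target": -27.0, "alt_1": -20.0, "alt_2": -21.0},
}


def positive_design(energies: Dict[str, Dict[str, float]]) -> str:
    """Pick the sequence with the lowest energy in the target fold only."""
    return min(energies, key=lambda seq: energies[seq]["target"])


def specificity_gap(folds: Dict[str, float]) -> float:
    """Energy margin between the target fold and its most stable competitor."""
    best_alternate = min(e for fold, e in folds.items() if fold != "target")
    return best_alternate - folds["target"]


def negative_design(energies: Dict[str, Dict[str, float]]) -> str:
    """Pick the sequence whose target fold beats every alternate by the widest margin."""
    return max(energies, key=lambda seq: specificity_gap(energies[seq]))


if __name__ == "__main__":
    print("positive design picks:", positive_design(ENERGIES))
    print("negative design picks:", negative_design(ENERGIES))
```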
The Internet Course on The Principles of Protein Structure, organized by Birkbeck College in collaboration with the Virtual School of Natural Sciences (VSNS) of the Globewide Network Academy (GNA), presents a tutorial on protein structure. The tutorial covers primary structure, protein geometry, protein synthesis, secondary and tertiary structure, molecular forces in relation to protein structure, and protein interactions. A glossary of terms related to protein structure is included in the discussion of protein tertiary structure.
The American Peptide Society, Inc. provides a list of links to Protein and Peptide Related Sites. The list includes a link to A Compendium of Information on Individual Amino Acids, which provides molecular formulas, structures, and physical data for the common amino acids.
Primary Structure provides links to information about amino acids including proline, histidine, cysteine, and other amino acids discussed in this article. Properties of Amino Acids presents diagrams of the amino acids and information about their properties.
The World Wide Web Virtual Library: Biochemistry and Molecular Biology (Biosciences) presents a list of Web resources related to biochemistry, including the biochemistry of amino acids and proteins.
The ExPASy World Wide Web (WWW) molecular biology server is maintained by the Geneva University Hospital and the University of Geneva. This server is dedicated to the analysis of protein and nucleic acid sequences. ExPASy provides access to SWISS-PROT, an annotated protein sequence database, and to SWISS-3DIMAGE, a database of 3D images of proteins and other biological macromolecules.
Pedro's BioMolecular Research Tools is a collection of WWW links to information and services useful to molecular biologists. It provides links to molecular biology search and analysis tools; bibliographic, text, and Web search services; guides and tutorials; and biological and biochemical journals and newsletters.
SCOP: Structural Classification of Proteins aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in Brookhaven National Laboratory's Protein Data Bank (PDB).
Volume 278, Number 5335, Issue of 3 Oct 1997, pp. 80-81
©1997 by The American Association for the Advancement of Science.