Structural genomics projects aim to provide an experimental structure or a good model for every protein in all completed genomes.

Most of the experimental work for these projects will be directed toward proteins whose fold cannot be readily recognized by simple sequence comparison with proteins of known structure.


Based on the history of proteins classified in the SCOP structure database:

- Only about a quarter of the early structural genomics targets will have a new fold.

- Among the remaining ones, about half are likely to be evolutionarily related to proteins of known structure, even though the homology could not be readily detected by sequence analysis.
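The rough proportions in the two points above can be worked through as a small calculation. This is only an illustrative sketch: the fractions are the approximate values quoted in the text ("about a quarter", "about half"), and the variable names and the label for the last category are assumptions introduced here, not terms from the source.

```python
from fractions import Fraction

# Approximate shares quoted in the text; treated as exact only for illustration.
new_fold = Fraction(1, 4)            # "about a quarter" of early targets
known_fold = 1 - new_fold            # the remaining targets
distant_homolog = known_fold * Fraction(1, 2)   # "about half" of the remainder
# Hypothetical label for what is left: a known fold without detectable homology.
known_fold_other = known_fold - distant_homolog

breakdown = {
    "new fold": new_fold,
    "known fold, distant homolog": distant_homolog,
    "known fold, other": known_fold_other,
}

for label, share in breakdown.items():
    print(f"{label}: {float(share):.1%}")
```

Under these assumptions, roughly three in eight early targets would turn out to be distant homologs of proteins of known structure.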


 


The SCOP database organizes proteins according to their structural and evolutionary relationships.

Figure 1 shows how SCOP 1.40s classifies protein domain structures submitted to the Protein Data Bank (PDB) (Bernstein et al., 1977) between 1987 and 1997.

Slightly more than half of the protein domains submitted to the PDB in 1997 were identical or nearly identical to one already in the database.

A further 20% of the domains were from a protein for which a structure had already been solved from a different species, and 14% were new proteins for which there was a known structure of a homolog in the same family.

In sum, more than 85% of the new protein domain structures experimentally determined were in the same SCOP family as a protein already in the PDB.
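As a quick consistency check on the figures above, the following sketch computes how large the "identical or nearly identical" share must be for the three categories to exceed 85%. Only the 20%, 14%, and 85% values come from the text; the variable names are illustrative assumptions.

```python
# Percentages stated in the text for 1997 PDB submissions.
different_species = 20      # same protein, structure already solved in another species
same_family_homolog = 14    # new protein with a known homolog in the same family
stated_total = 85           # "more than 85%" in an already-represented SCOP family

# Minimum share of identical or nearly identical domains implied by the total.
min_identical_share = stated_total - different_species - same_family_homolog
print(min_identical_share)
```

The result agrees with the description of that category as "slightly more than half" of the submitted domains.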

 

 
 
 

Figure 2 shows what was discovered from the proteins lacking significant pairwise sequence similarity to those already in the protein database.

For these proteins, classification in SCOP requires knowledge of the structure; sequence alone would fail to predict these categories. In 1997, fewer than a quarter of such protein domains had a new fold, compared with about half in 1990.

Even when more sensitive sequence comparison methods such as PSI-BLAST are used (Figure 3), only 26% of unrecognizable sequences represent new folds.


 This suggests that the 459 protein folds in the most recent SCOP incorporate a majority of the frequently occurring globular structures.

From this trend, it might seem that all of the most common folds may soon be known.

 

 
 
 
 
 
 
 
 
 

We still know little about proteins that are difficult to characterize structurally, such as membrane proteins.

 

 
 

