Structural genomics projects aim to provide an experimental
structure or a good model for every
protein in all completed genomes.
Most of the experimental work for these projects will be directed toward
proteins whose fold cannot be readily recognized by simple sequence comparison
with proteins of known structure.
Based on the history of proteins classified in the SCOP
structure database, only about a quarter of the early structural
genomics targets will have a new fold.
Among the remaining ones, about half are likely to be evolutionarily
related to proteins of known structure, even though the homology
could not be readily detected by sequence analysis.
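
The proportions in this estimate can be turned into rough counts. The sketch below is illustrative only: the function name, the category labels, and the example of 1000 targets are assumptions made here, and "about a quarter" and "about half" are taken as exactly 0.25 and 0.5.

```python
def expected_target_outcomes(n_targets, new_fold_fraction=0.25,
                             homolog_fraction_of_rest=0.5):
    """Split targets into the outcome categories described in the text.

    The default fractions are the approximate figures from the text
    ("about a quarter", "about half"), treated here as exact values.
    """
    new_fold = n_targets * new_fold_fraction
    known_fold = n_targets - new_fold
    # Of the targets with a known fold, about half are expected to be
    # remote homologs of proteins of known structure.
    remote_homolog = known_fold * homolog_fraction_of_rest
    no_evident_homology = known_fold - remote_homolog
    return {
        "new_fold": new_fold,
        "remote_homolog": remote_homolog,
        "known_fold_no_evident_homology": no_evident_homology,
    }


# Hypothetical example: 1000 targets.
outcomes = expected_target_outcomes(1000)
for category, count in outcomes.items():
    print(f"{category}: {count:.0f}")
```

Under these assumptions, roughly three quarters of the targets would adopt a fold that is already known, split evenly between remote homologs and proteins with no evident evolutionary relationship.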
The SCOP database organizes proteins according to their structural and
evolutionary relationships.
Figure 1 shows how SCOP 1.40s classifies protein domain structures submitted to
the Protein Data Bank (PDB) (Bernstein et al., 1977) between 1987 and 1997.
Slightly more than half of the protein domains submitted to the PDB in
1997 were identical or nearly identical to one already in the database.
A further 20% of the domains were from a protein
for which a structure had already been solved from a different species,
and 14% were new proteins for which there
was a known structure of a homolog in the same family.
In sum, more than 85% of the new protein domain
structures experimentally determined were in the same SCOP family as a
protein already in the PDB.
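
The 85% total can be checked against its parts. The text does not give an exact figure for "slightly more than half", so the sketch below only back-computes the smallest share of identical or nearly identical domains that is consistent with the other two percentages; the variable names are illustrative.

```python
# Percentages quoted in the text for domains submitted in 1997.
same_family_total_pct = 85     # "more than 85%" in the same SCOP family
other_species_pct = 20         # same protein, different species
new_protein_homolog_pct = 14   # new protein, known homolog in the family

# Share of identical or nearly identical domains needed to reach the total.
implied_identical_pct = (
    same_family_total_pct - other_species_pct - new_protein_homolog_pct
)
print(f"identical or nearly identical: at least {implied_identical_pct}%")
```

The result, 51%, agrees with the description "slightly more than half".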
Figure 2 shows what was discovered from the proteins lacking significant pairwise
sequence similarity to those already in the protein database.
For these proteins, classification in SCOP requires knowledge of
the structure; sequence alone would fail to predict these categories. In 1997,
fewer than a quarter of such protein domains had a new fold, compared
with about a half in 1990.
Even when a more sensitive sequence comparison method such as PSI-BLAST
is used (Figure 3), only 26% of unrecognizable sequences represent new folds.
This suggests that the 459 protein folds
in the most recent SCOP incorporate a majority of the frequently occurring
globular structures.
From this trend, it might seem that all of the most
common folds may soon be known.
However, we still know little about proteins that are difficult to characterize
structurally, such as membrane proteins.