Reconstructions of Metabolism

Bacillus subtilis: The Pathways [The Pathways with Evidence]

    Evgeni Selkov
    Vladmir Nazarenko
    Valeri Nenashev
    Ross Overbeek 
    Elena Panyushkina
    Lyudmila Pronevitch

Escherichia coli: The Pathways [The Pathways with Evidence]

    Evgeni Selkov
    Vladmir Nazarenko
    Valeri Nenashev
    Ross Overbeek 
    Elena Panyushkina
    Lyudmila Pronevitch

Haemophilus influenzae: The Pathways [The Pathways with Evidence]

    Evgeni Selkov
    Terry Gaasterland 
    Natalia Maltsev 
    Ross Overbeek

Homo sapiens: The Pathways [The Pathways with Evidence]

    Evgeni Selkov
    Valeri Nenashev
    Ross Overbeek 
    Elena Panyushkina
    Lyudmila Pronevitch

Mycoplasma capricolum: The Pathways [The Pathways with Evidence] October 19, 1995

    Evgeni Selkov
    Terry Gaasterland 
    Patrick Gillevet 
    Natalia Maltsev 
    Ross Overbeek

Mycoplasma genitalium: The Pathways [The Pathways with Evidence]

    Evgeni Selkov
    Terry Gaasterland 
    Natalia Maltsev 
    Ross Overbeek

Saccharomyces cerevisiae: The Pathways [The Pathways with Evidence]

    Evgeni Selkov
    Vladmir Nazarenko
    Valeri Nenashev
    Ross Overbeek 
    Elena Panyushkina
    Lyudmila Pronevitch

Salmonella typhimurium: The Pathways [The Pathways with Evidence]

    Evgeni Selkov
    Vladmir Nazarenko
    Valeri Nenashev
    Ross Overbeek 
    Elena Panyushkina
    Lyudmila Pronevitch

Sulfolobus solfataricus: The Pathways [The Pathways with Evidence] November 22, 1995

    Evgeni Selkov
    Valeri Nenashev
    Ross Overbeek 
    Lyudmila Pronevitch

What Do We Mean by Reconstructions of Metabolism?

Evgeni Selkov has worked for many years collecting and encoding data relating to enzymology and metabolism. His collection of "working notes" on metabolism includes over 1,800 metabolic charts, most of which include EC numbers. In 1995, he agreed to make this collection available, knowing that it still contained some errors. He felt that the diagrams would be useful to other researchers, and that the quickest way to detect and correct errors would be to give other experts access to the documents. A group of researchers at Argonne National Laboratory has been working with Dr. Selkov to help him prepare the collection for distribution.

The initial collection has been available from within PUMA since mid-1995. Now that increasing amounts of sequence data are becoming available, Dr. Selkov has begun organizing what is known of the metabolism of several organisms for which substantial sequence has been released to the research community. The first organism selected was Mycoplasma capricolum.

While developing a more complete summary of what is known, or could be deduced, about the metabolism of Mycoplasma capricolum, Dr. Selkov found it necessary to extend his collection substantially. He believes that his original collection contained many omissions and should be viewed only as a personal attempt at a growing integration of metabolic data. His "reconstructions" for specific organisms represent an effort to systematically gather what is known about these organisms and to present the result in a coherent, structured form. These efforts are based on available sequence data. They also depend critically on phylogenetic analysis and on numerous contributions from the literature, which have been accumulated in computer-readable form in EMP (the Enzymes and Metabolic Pathways Database). For each such reconstruction, a journal article will be prepared to explain the emerging synthesis.

We are attempting to construct and organize "metabolic reconstructions" for a growing list of organisms. These reconstructions are built from available sequence data, a substantial body of biochemical literature, and judgement. We think of them as models that capture our current understanding of the metabolism of each organism. It must be emphasized that these reconstructions are not reviews of what is known of the metabolism, because they include conjectures as well as clearly established biochemistry. Some of these conjectures will ultimately turn out to be incorrect.

We view the process of generating such models as analogous to fitting a curve to a set of data points. One attempts to generate a model that correctly captures the underlying reality. In the process, some data points will be revealed as either errors or aspects of reality that the model has captured inaccurately. Finding the source of such inconsistencies can often lead to major new insights.

Hence, we urge the reader to consider our assessments critically. Many of the pathways included in a reconstruction will have overwhelming support from the sequence data, but others will include "missing enzymes" or "missing non-catalytic proteins". These "missing" items represent inconsistencies that must eventually be resolved. Ultimately, what everyone wants is a representation of metabolism that accurately connects to the known sequence information and offers a consistent interpretation. We are releasing these reconstructions in the spirit of a "current hypothesis" that will be continuously refined until a coherent picture emerges. In our view, a completely coherent picture cannot yet be produced, for a number of reasons. However, it is becoming realistic to believe that one could be constructed in many cases over the next few years.
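The notion of a "missing enzyme" can be made concrete. A pathway in a reconstruction requires a set of enzymatic activities identified by EC numbers. Any required EC number that has no corresponding gene assignment from the sequence data is a gap that still has to be resolved.

The Python sketch below rests on two assumptions. First, each pathway is represented as a plain list of EC numbers. Second, the sequence annotation has been reduced to a set of assigned EC numbers. The function name `find_missing_enzymes` and the example data are hypothetical, and they do not reflect how PUMA or EMP actually store pathways.

```python
from typing import Iterable, List, Set


def find_missing_enzymes(pathway_ecs: Iterable[str], assigned_ecs: Set[str]) -> List[str]:
    """Return the EC numbers a pathway requires that no sequence-derived
    assignment covers, in pathway order and without duplicates."""
    missing: List[str] = []
    for ec in pathway_ecs:
        if ec not in assigned_ecs and ec not in missing:
            missing.append(ec)
    return missing


# Hypothetical pathway fragment: phosphofructokinase, fructose-bisphosphate
# aldolase, triose-phosphate isomerase (illustrative data, not from a reconstruction)
pathway = ["2.7.1.11", "4.1.2.13", "5.3.1.1"]

# Hypothetical EC numbers assigned to genes found in a partial genome
assigned = {"2.7.1.11", "5.3.1.1"}

print(find_missing_enzymes(pathway, assigned))  # ['4.1.2.13']
```

A real analysis would also need to account for how uncertain each sequence-based assignment is, for example by distinguishing strong from weak homology. This sketch treats every assignment as certain.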

Let us briefly consider the sources of uncertainty. We are attempting to merge information from three broad sources:

- the sequence data that is now being rapidly generated,
- a rich body of literature on the biochemistry of specific organisms, and
- our understanding of the interdependencies of biochemical mechanisms.

Substantial errors inevitably occur in all three sources.

In the case of sequence data, the reported sequence itself can contain errors. The interpretation of older sequences that have been used to annotate the new sequence can also be wrong. In a number of cases, genes were historically assigned an incorrect enzymatic activity. Even when such an error was later detected, it had often already propagated to a number of newer sequences, which makes the problem grow over time. Such errors are inevitable and will gradually be detected and corrected. We believe that the type of analysis used in the metabolic reconstructions offers a valuable tool for detecting them.

However, even if no such errors existed, assigning a function to coding sequences would currently be an extremely difficult task, although it will rapidly become easier as more data becomes available. In some cases, there is clear homology to an existing coding sequence whose function is well understood. In many other cases, the homology is weak and would at best support conjectures about the newer coding sequence. This is well known to those attempting to assign functions to genes. We mention it only to emphasize that many of the underlying judgements resulting from sequence analysis must be viewed as tentative.

Biochemical evidence also carries a substantial amount of uncertainty. Sometimes the failure to detect a function simply reflects the conditions under which the experiment was run. In other cases, invalid generalizations have been made about an entire phylogenetic grouping. Again, such errors are inevitable and reflect the underlying experimental reality.

Finally, our understanding of the interdependencies of metabolic processes may itself be flawed. When it seems obvious to us that one metabolic capability implies the existence of another, we may be overlooking novel biochemistry that remains to be explained.

The reconstruction presented here contains a skeleton of the metabolism. It includes:

metabolic intermediates and their synonyms,
enzymes presented by their EC-numbers,
coenzymes, prosthetic groups, and enzyme cofactors,
membrane transporters of different kinds,
noncatalytic binding proteins, etc.

All metabolic maps show the predicted subcellular locations of enzymes and intermediates. More detailed information will be available with the second release of EMP, including predicted regulatory mechanisms such as allosteric interactions, inducers, and repressors. That release is expected in January 1996.
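The components listed above can be modeled as a small data structure. The Python sketch below uses a hypothetical schema, not the format used by EMP or PUMA. It records:

- each metabolite with its synonyms and predicted compartment,
- each enzyme by its EC number and predicted compartment, and
- cofactors, transporters, and binding proteins as plain name lists.

The class and field names, the default compartment, and the example entries are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical schema for a reconstruction "skeleton"; not the EMP or PUMA format.


@dataclass
class Metabolite:
    name: str
    synonyms: List[str] = field(default_factory=list)
    compartment: str = "cytoplasm"  # predicted subcellular location (assumed default)


@dataclass
class Enzyme:
    ec_number: str
    compartment: str = "cytoplasm"  # predicted subcellular location (assumed default)


@dataclass
class Reconstruction:
    organism: str
    metabolites: List[Metabolite] = field(default_factory=list)
    enzymes: List[Enzyme] = field(default_factory=list)
    cofactors: List[str] = field(default_factory=list)         # coenzymes, prosthetic groups, cofactors
    transporters: List[str] = field(default_factory=list)      # membrane transporters
    binding_proteins: List[str] = field(default_factory=list)  # noncatalytic binding proteins

    def find_metabolite(self, name: str) -> Optional[Metabolite]:
        """Look up a metabolite by its primary name or any synonym, ignoring case."""
        key = name.lower()
        for met in self.metabolites:
            if met.name.lower() == key or any(s.lower() == key for s in met.synonyms):
                return met
        return None


# Illustrative entries only
recon = Reconstruction(organism="example organism")
recon.metabolites.append(Metabolite("D-glucose 6-phosphate", synonyms=["glucose-6-P", "G6P"]))
recon.enzymes.append(Enzyme("5.3.1.9"))  # glucose-6-phosphate isomerase
recon.cofactors.append("NAD+")

found = recon.find_metabolite("g6p")
print(found.name if found else None)  # D-glucose 6-phosphate
```

Resolving synonyms to a single primary name matters because the same intermediate often appears under different names in the literature and on different charts.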

Why These Organisms?

Selkov met Pat Gillevet during a visit to America in 1993. In 1994, it was proposed that Selkov examine what could be learned about the metabolism of an organism from its sequence data. A significant percentage of the Mycoplasma capricolum genome had been sequenced. The exact size of the genome is not yet clear, but we estimate that approximately a third of the sequence had become available via The Mycoplasma capricolum Project. Selkov took on this project to explore how much could be learned from the available sequence data, and the reconstruction now available via PUMA is the product of that effort. The effort builds on the sequencing done at Harvard. It also builds on the analysis and help of Pat Gillevet and of the researchers who helped determine the enzymes represented in the sequence [Exploring the Mycoplasma capricolum Genome].
EMP contains a fairly large and growing amount of data on Escherichia coli, so we also produced a reconstruction from the partial genome sequence and the substantial body of literature.
The Haemophilus influenzae genome was, of course, the first complete bacterial genome available to the research community. We believe that it will be the first of many, and the obvious goal is to produce reconstructions as quickly as the genomes become available. It might well be argued that such haste will produce errors that a slower approach would avoid. In our view, there is a real trade-off. We believe that the most accurate reconstructions will ultimately emerge from attempts to synthesize what is known as rapidly as possible, with the understanding that experts in the biochemistry of each organism will need to verify, and sometimes correct, our efforts.

Is There More to Come?

Sequencing efforts have now started producing substantial amounts of valuable data. We believe that metabolic reconstructions of these organisms will play a useful role in helping researchers interpret that data. Selkov is now working on detailed reconstructions of a number of other genomes, and we will make them available as they are completed.
