Evgeni Selkov Vladmir Nazarenko Valeri Nenashev Ross Overbeek Elena Panyushkina Lyudmila Pronevitch
Evgeni Selkov Terry Gaasterland Natalia Maltsev Ross Overbeek
Evgeni Selkov Valeri Nenashev Ross Overbeek Elena Panyushkina Lyudmila Pronevitch
Evgeni Selkov Terry Gaasterland Patrick Gillevet Natalia Maltsev Ross Overbeek
Evgeni Selkov Valeri Nenashev Ross Overbeek Lyudmila Pronevitch
Evgeni Selkov has worked for many years collecting and encoding data relating to enzymology and metabolism. His "working notes" on metabolism include over 1800 metabolic charts, most of which include EC numbers. In 1995, he agreed to make this collection available, knowing that it still contained some errors; he felt that the collection of diagrams would be useful to other researchers, and that the quickest way to detect and correct errors would be to give other experts access to the documents. A group of researchers at Argonne National Laboratory has been working with Dr. Selkov, helping him to prepare the collection for distribution.
The initial collection has been available from within PUMA since mid-1995. Now, as increasing amounts of sequence data have become available, Dr. Selkov has initiated an effort to organize what is known of the metabolism for a number of the organisms for which substantial sequence has been released to the research community. The first organism that was selected was Mycoplasma capricolum.
In the process of developing a more complete summary of what is known or could be deduced about the metabolism of Mycoplasma capricolum, Dr. Selkov found it necessary to extend his collection substantially. Indeed, he believes that his original collection contained many omissions and should be viewed only as a personal attempt to formulate a growing integration of metabolic data. His "reconstructions" for specific organisms represent an effort to systematically gather what is known about these organisms and present the result in a coherent, structured form. These efforts are based on available sequence data, but are also critically dependent on numerous contributions from the available literature, accumulated in computer-readable form in EMP (the Enzymes and Metabolic Pathways Database), and on phylogenetic analysis. For each such reconstruction, a journal article will be prepared to explain the emerging synthesis.
We are attempting to construct and organize "metabolic reconstructions" for a growing list of organisms. These reconstructions are built from available sequence data, a substantial body of biochemical literature, and judgement. We think of them as models that capture our current understanding of the metabolism of each organism. It must be emphasized that these reconstructions are not reviews of what is known of the metabolism, in the sense that they include conjectures as well as clearly established biochemistry. It should be completely understood that some of the conjectures will ultimately turn out to be incorrect.
We view the process of generating such models as analogous to fitting a curve to a set of data points. One attempts to generate a model that correctly captures the underlying reality; in the process, some data points will clearly be revealed to represent either errors or aspects of reality that have been inaccurately captured by the model. Determining the source of such inconsistencies can often lead to major new insights.
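The curve-fitting analogy can be made concrete with a minimal sketch. It fits a straight line by ordinary least squares and reports the data point that the fitted model explains worst. The function names and the sample data are hypothetical and serve only as illustration; they are not part of any reconstruction.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def worst_fit_index(xs, ys):
    """Index of the point with the largest absolute residual."""
    slope, intercept = fit_line(xs, ys)
    residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    return residuals.index(max(residuals))


# Hypothetical data: points near y = 2x + 1, with one inconsistent value.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 19, 11]

suspect = worst_fit_index(xs, ys)
print("point most in need of re-examination:", (xs[suspect], ys[suspect]))
```

As in the text, a large residual does not say whether the data point or the model is at fault; it only marks where the two disagree.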
Hence, we urge the reader to consider our assessments critically. While many of the pathways included in a reconstruction will have overwhelming support from the sequence data, others will include "missing enzymes" or "missing non-catalytic proteins". These "missing" items represent inconsistencies which must eventually be resolved. Ultimately, what everyone desires is a representation of metabolism that accurately connects to the known sequence information, offering a consistent interpretation. We are releasing these reconstructions in the spirit of a "current hypothesis" that will be continuously refined until a coherent picture emerges. In our view, such a completely coherent picture cannot yet be produced, for a number of reasons, but it is becoming realistic to believe that such a picture could be constructed in many cases over the next few years.
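One way to picture a "missing enzyme" is as a pathway step for which no coding sequence has been assigned. The sketch below is an illustration under stated assumptions, not the procedure actually used for the reconstructions: the pathway is the first four steps of glycolysis given as EC numbers, while the gene identifiers and the choice of which step lacks a gene are hypothetical.

```python
def missing_enzymes(pathway_steps, assigned_genes):
    """Return the EC numbers in the pathway that have no gene assigned,
    preserving pathway order."""
    return [ec for ec in pathway_steps if ec not in assigned_genes]


# First four steps of glycolysis, by EC number.
pathway = [
    "2.7.1.1",   # hexokinase
    "5.3.1.9",   # glucose-6-phosphate isomerase
    "2.7.1.11",  # 6-phosphofructokinase
    "4.1.2.13",  # fructose-bisphosphate aldolase
]

# Hypothetical sequence-based assignments (EC number -> gene identifier).
assigned = {
    "2.7.1.1": "gene_0012",
    "5.3.1.9": "gene_0147",
    "4.1.2.13": "gene_0398",
}

for ec in missing_enzymes(pathway, assigned):
    print("asserted in pathway but no gene found:", ec)
```

Each item reported this way is an inconsistency to resolve: the gene may be undetected in the sequence, the pathway may be wrongly asserted, or the organism may use an alternative route.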
Let us consider briefly the sources of uncertainty. We are attempting to merge information from three broad sources: the sequence data that is now being rapidly generated, a rich body of literature on the biochemistry of specific organisms, and our understanding of the interdependencies of biochemical mechanisms. Substantial errors inevitably occur in all three sources.
In the case of sequence data, there can be errors in the reported sequence, but there are also errors in the interpretation of older sequences that have played a role in annotating the new sequence. In a number of cases, genes have been historically assigned an incorrect enzymatic activity; even when this has later been detected, the original error has often been propagated to a number of newer sequences, leading to a growing problem. Such errors are inevitable, and will gradually be detected and corrected; indeed, we believe that the type of analysis present in the metabolic reconstructions offers a valuable tool for detecting such errors. However, even if no such errors existed, the problem of assigning a function to coding sequences is now an extremely difficult task, although it will rapidly become easier as more data becomes available. In some cases there is a clear homology to an existing coding sequence for which the function is clearly understood. There are many other cases in which there is a weak homology, which would at best allow one to make conjectures about the newer coding sequence. This is well known by those attempting to assign functions to genes, and we mention it only to emphasize that many of the underlying judgements resulting from sequence analysis must be viewed as tentative.
In the case of biochemical evidence, there is also a substantial amount of uncertainty. Occasionally the failure to detect a function simply reflects the conditions under which the experiment was run. In other cases, invalid generalizations have been formed about a complete phylogenetic grouping. Again, such errors are completely inevitable and reflect the underlying experimental reality.
Finally, it is also completely possible that our understanding of the interdependencies of metabolic processes is flawed. When it seems obvious to us that the existence of one metabolic capability implies the existence of another, we might well be overlooking novel biochemistry that remains to be explicated.
The reconstruction presented here contains a skeleton of the metabolism. It includes: