THE BOOK OF LIFE:
READING THE SEQUENCE OF HUMAN DNA
September 1998
A major goal of the Human Genome Project is to sequence the entire length of human DNA.
The sequence of DNA bases contains the instructions for everything a cell does, from conception until death. If the letters representing the 3 billion bases that make up the human genome were printed out in books, and the books were stacked one on top of the other, they would reach as high as the Washington Monument. The ultimate goal of the Human Genome Project is to read the order, letter by letter, of those 3 billion bases. Changes in the spelling of the DNA letters can increase your chances of developing an illness, protect you from getting sick, or predict the way your body will handle medicines. Once scientists can read the DNA instruction book, they will be able to understand and treat diseases better.
The federally funded Human Genome Project began sequencing the DNA of laboratory organisms in 1990, while fine-tuning the strategy that would eventually be used to sequence the larger, more complex human genome. The genomes of many organisms have now been completely sequenced, including those of the bacterium E. coli and baker's yeast. The end of 1998 will mark the completion of the first genome sequence from a multi-celled animal, the roundworm Caenorhabditis elegans.
The first methods for sequencing DNA were developed in the mid-1970s. At that time, scientists used a series of chemical reactions to sequence only a few base pairs per year, not enough to take on a single gene of several thousand bases. When the Human Genome Project began in 1990, few laboratories had sequenced even 100,000 bases, and the cost of doing so was more than $10 per base pair.
Now, machines read the sequence quickly, but they still can only read short DNA fragments at a time. So, using a strategy referred to as "shotgun" sequencing, the text of each page of those books stacked as tall as the Washington Monument is randomly cut into small fragments. These fragments are small enough for sequencing machines to read. But to get long stretches of DNA, you must then re-assemble these sequenced fragments back into sentences, paragraphs, chapters, and books. For the most part, sophisticated computer programs perform the re-assembly of the millions of pieces of this giant puzzle.
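The cut-and-reassemble idea behind shotgun sequencing can be illustrated with a toy sketch. It assumes error-free fragments and a sequence without long repeats, and it uses a simple greedy rule (repeatedly merge the two pieces with the longest overlap) chosen here only for illustration; it is not the method used by the Human Genome Project's assembly programs. The function names (shotgun_fragments, overlap, drop_contained, greedy_assemble) and the example sequence are hypothetical.

```python
import random


def shotgun_fragments(text, n_fragments, min_len, max_len, seed=0):
    """Cut n_fragments random pieces out of text (hypothetical helper).

    Pieces start at random positions, so they overlap one another by chance.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(text) - length)
        fragments.append(text[start:start + length])
    return fragments


def overlap(a, b, min_overlap):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(fragments):
    """Remove duplicates and pieces that lie wholly inside another piece."""
    unique = list(dict.fromkeys(fragments))
    return [f for f in unique if not any(f != g and f in g for g in unique)]


def greedy_assemble(fragments, min_overlap=3):
    """Merge the best-overlapping pair until no overlaps remain.

    Returns a list of assembled stretches; a single item means every
    fragment was joined into one sequence.
    """
    pieces = drop_contained(fragments)
    while len(pieces) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join; keep the pieces separate
        merged = pieces[best_i] + pieces[best_j][best_k:]
        rest = [p for idx, p in enumerate(pieces) if idx not in (best_i, best_j)]
        pieces = drop_contained(rest + [merged])
    return pieces


# Three overlapping reads from a made-up 13-letter sequence.
reads = ["TACGTTA", "ATGGCGT", "GCGTACG"]
assembled = greedy_assemble(reads)
print(assembled)
```

Real genomes contain long repeated stretches and sequencing machines make occasional errors, so production assembly programs need far more elaborate strategies than this greedy rule.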
Sequencing of the human genome began in earnest in 1996. Researchers intend to produce the first fully completed, highly accurate reference sequence of the human genome by the end of 2003, the year that marks the 50th anniversary of the discovery of the structure of DNA by James Watson and Francis Crick. They also expect to pass another important milestone by 2001, when they will have a useful "working draft" of the sequence.
In the United States, the National Institutes of Health and the Department of Energy will sequence 60-70% of the human genome.
The rest of the human genome will be sequenced by the Sanger Centre in England, funded by the Wellcome Trust, and by other sequencing centers around the world.
The ultimate Human Genome Project task of sequencing all 3 billion base pairs in the human genome will provide scientists with a virtual blueprint of a human being. With the sequence in hand, researchers can begin to "read" the information in the genes and understand how genes function. From there, researchers can start to unravel biology's most complicated processes:
how a baby develops from a single cell, how genes coordinate the functions of tissues and organs, how disease predisposition occurs, and how the human brain works.
NHGRI Office of Communications, 31 Center Drive, Building 31, Room 4B09, MSC 2152, Tel: 301.402.0911, Fax: 301.402.2218, http://www.nhgri.nih.gov