Celera has resolutely taken the lead in unfolding the human genome

Rockville, 10 January 2000. Celera Genomics, a PE Corporation business, now holds DNA sequence in its database that covers 90 percent of the human genome. As a result of the extensive sequence coverage of the 23 pairs of human chromosomes and based on statistical analysis, Celera believes that more than 97 percent of all human genes are now represented in this database. The DNA sequence comes from more than 10 million high-quality sequences, generated at Celera in the world's largest DNA data factory. The sequence, composed of randomly selected fragments of all human chromosomes, contains over 5.3 billion base pairs, which constitute the letters of the human genetic code, at greater than 99 percent accuracy.

The stored 5.3 billion base pairs represent 2.58 billion base pairs of unique sequence which have been calculated to cover about 81 percent of an estimated genome size of 3.18 billion base pairs. These data, combined with all of the "finished" and "draft" human genome sequence data from the public databases, give Celera coverage of 90 percent of the human genome. The company's sequencing was performed on 300 PE Biosystems ABI Prism 3700 DNA Analysers.
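The coverage figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the article does not say which statistical model Celera used, so the Poisson estimate (in the spirit of the Lander-Waterman model, where the expected covered fraction is 1 - e^(-c) for c-fold redundancy) is an assumption. It also ignores read length, repeats, and cloning bias. The function names are hypothetical.

```python
import math

# Figures quoted in the article.
RAW_BP = 5.3e9       # total base pairs sequenced
UNIQUE_BP = 2.58e9   # base pairs of unique sequence
GENOME_BP = 3.18e9   # estimated genome size


def fold_redundancy(raw_bp, genome_bp):
    """Average number of times each base of the genome was sequenced."""
    return raw_bp / genome_bp


def expected_coverage(redundancy):
    """Assumed Poisson model: fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-redundancy)


reported = UNIQUE_BP / GENOME_BP             # coverage implied by the article
c = fold_redundancy(RAW_BP, GENOME_BP)       # fold redundancy of the raw data
modelled = expected_coverage(c)              # coverage under the assumed model

print(f"reported coverage: {reported:.1%}")
print(f"fold redundancy:   {c:.2f}x")
print(f"Poisson estimate:  {modelled:.1%}")
```

Under these assumptions both the reported ratio and the model estimate come out at roughly 81 percent, which is consistent with the figure quoted above.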

The whole genome shotgun technique concentrates on sequencing the entire genome at once, allowing for real time discovery of human genes across the entire genome, according to J. Craig Venter, Ph.D., who is Celera's president and chief scientific officer. "The early phase of sequencing the human genome using the whole genome shotgun process is especially important for gene discovery. Today, we are rapidly coming to an end of that phase. Our statistical analysis and comparison to known genes suggest that more than 97 percent of all human genes are represented in our database."

Celera began to sequence the human genome on September 8, 1999, using the whole genome shotgun technique, which its scientists pioneered when sequencing the first complete genome in 1995 at The Institute for Genomic Research (TIGR). This technique involves randomly shearing human chromosomes into millions of pieces of 2,000 and 10,000 base pairs in length. The chromosome fragments are inserted into a plasmid vector and propagated in E. coli to produce millions of copies of each fragment.

Next, Celera scientists sequence both ends of each fragment. The millions of sequences representing billions of letters of genetic code are then assembled into the proper order using proprietary genome assembly algorithms and the Celera supercomputer facility, which results in a reconstruction of the linear sequence of the 23 pairs of human chromosomes. As an alternative, the human genome sequencing effort funded by governments and some public charities around the world is based on the sequencing of large clones of human DNA in bacterial artificial chromosomes (BAC), using a variation of the shotgun sequencing method. With that approach, approximately 25,000 BAC clones have to be sequenced and their order mapped to reassemble the 23 human chromosomes. A BAC contains on average 150,000 letters of human DNA.

The "draft" sequence of a BAC represents most of these base pairs; however, the various fragments of DNA sequence are largely unordered. By combining the Celera whole genome data with the individual clone "draft" BAC data, the company plans to order the sequence within each BAC clone and subsequently place the different clones in the proper order to construct the genome's sequence. Celera expects to simultaneously and independently assemble the genome by means of its whole genome assembly algorithms. It is precisely the combination of these two complementary genome sequencing and assembly approaches that greatly reduces the time Celera needs to finish the sequence of the human genome.
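The BAC figures quoted above can be put side by side with the estimated genome size given earlier. The short sketch below is a back-of-the-envelope check, not part of either project's method. Reading the excess over the genome size as overlap between neighbouring clones is an inference, not something the article states.

```python
# Figures quoted in the article.
BAC_CLONES = 25_000          # approximate number of BAC clones to sequence
LETTERS_PER_BAC = 150_000    # average letters of human DNA per BAC
GENOME_BP = 3.18e9           # estimated genome size

# Total DNA carried by the clone set.
total_bac_bp = BAC_CLONES * LETTERS_PER_BAC

# How many times the clone set spans the genome.
span_ratio = total_bac_bp / GENOME_BP

print(f"total DNA in BAC clones: {total_bac_bp / 1e9:.2f} billion letters")
print(f"span relative to genome: {span_ratio:.2f}x")
```

A ratio slightly above one suggests that the clones overlap modestly, which is what makes it possible to map their order along each chromosome.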

For the next several months, Celera plans to continue its full-scale effort on human genome sequencing, with the anticipated addition of approximately two billion base pairs per month. The additional sequences should provide redundant coverage of the chromosome sequences, improve accuracy, and aid in the final assembly of the chromosome sequences. When sequencing and scientific analysis of the human genome are completed, the consensus sequence data will be submitted for publication in a scientific journal. These published data will be made freely available to researchers around the world under a non-redistribution agreement.
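To give a sense of what two billion additional base pairs per month means for redundancy, the sketch below projects the fold redundancy forward from the figures in this article. It assumes a constant sequencing rate and a starting point of 5.3 billion base pairs. The six-month horizon is chosen purely for illustration and is not a schedule given by Celera.

```python
# Figures quoted in the article.
GENOME_BP = 3.18e9    # estimated genome size
START_BP = 5.3e9      # base pairs sequenced at the time of the announcement
MONTHLY_BP = 2.0e9    # anticipated additional base pairs per month


def projected_redundancy(months):
    """Fold redundancy after a number of months at a constant rate (assumed)."""
    return (START_BP + MONTHLY_BP * months) / GENOME_BP


# Illustrative horizon of six months; not taken from the article.
for months in range(0, 7):
    print(f"month {months}: {projected_redundancy(months):.2f}x")
```

Under these assumptions the redundancy would rise from under twofold to more than fivefold within six months, which illustrates why the additional sequences are expected to improve accuracy and help the final assembly.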

The Celera gene discovery team has identified several thousand new genes that potentially play key roles in cellular communication and the regulation of physiological systems in the human body, including blood pressure, cell growth, and neurotransmission. These are primarily rarely expressed genes that are not represented in the public database, GenBank. They are of significant interest to the pharmaceutical industry, since they can be utilised as the basis of new therapeutic development. The company has previously reported filing provisional patent applications on newly discovered genes and continues to file these applications on medically relevant gene discoveries. Celera intends to file full applications on all medically important discoveries, and has implemented a non-exclusive licensing programme to make the intellectual property available to Celera database subscribers.

Celera is currently on target to complete the sequencing phase of the human genome by summer 2000, incorporating data from GenBank. After the sequencing phase, the company plans to begin shotgun sequencing of the mouse genome. On December 30, 1999, Celera completed the release of a partially assembled Drosophila genome sequence to the public data bank, and the company now continues to perform scientific analysis in conjunction with the Berkeley Drosophila Genome Project (BDGP) as well as other collaborators. Celera intends to submit a manuscript to a scientific journal for publication in spring 2000. To date, the company and its collaborators have discovered a total of 14,000 genes in the Drosophila genome, many of them in commercially important protein families, which should prove valuable in developing new therapeutics and insecticides.

Celera's mission is to become the definitive source of genomic and related agricultural and medical information. Celera's information will be available on a subscription basis to academic and commercial institutions, which will have access to tools for viewing, browsing, analysing, and integrating data in a way that will assist scientists in accelerating their understanding of the human genetic code. PE Corporation currently comprises two operating groups. Celera Genomics Group, headquartered in Maryland, intends to become the definitive source of genomic and related medical information. PE Biosystems Group, based in California, with sales of $1.2 billion during the fiscal year 1999, develops and markets instrument-based systems, reagents, software, and contract-related services for the life science industry and research community.

You can find more news on Celera's human genome research in VMW's January 1999 issue, in the article "Alpha system architecture will help to reveal the omega of the human genome".


Leslie Versweyveld
