Number of human genes estimated between 28,000 and 34,000

Evry, 30 June 2000. Researchers from the laboratory of Jean Weissenbach, General Director of Genoscope, have published a study of 30 percent of the so-called compact genome sequence of a fish, Tetraodon nigroviridis, compared with 42 percent of the human genome. The study appears in the June 2000 issue of Nature Genetics. A search for similarities between the two genomes led Genoscope to conclude that the human genome contains between 28,000 and 34,000 genes, and it validates the development of a new-generation tool for analysing genomes and identifying genes.

A new estimate puts the number of human genes between 28,000 and 34,000. The computation was carried out by the Genoscope Sequencing Center in Evry, France, on a multiprocessor Sun Starfire Enterprise 10000 Server, using LASSAP bioinformatics software from Gene-IT.

Genoscope's new EXOFISH gene prediction method compares 30 percent of the genome sequence of the pufferfish Tetraodon nigroviridis against all known human DNA. Using it, the researchers showed that the human genome contains between 28,000 and 34,000 genes. A single Sun Enterprise 10000 Server configured with sixty-four 400 MHz UltraSPARC-II processors with 8 MB of cache and 64 GB of central memory provided the computational power to perform the required 72 billion BLAST protein/protein comparisons in less than 2 days.
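To put the figures above in perspective, the short Python sketch below works out the throughput they imply. It is only back-of-the-envelope arithmetic: it assumes the run took the full 2 days and that the work was spread evenly over the 64 processors, neither of which the announcement states, so the results are lower bounds. The function name `min_throughput` is hypothetical.

```python
# Back-of-the-envelope throughput implied by the quoted figures.
# Assumptions (not from the announcement): the run used the full 2 days
# and the work was split evenly across all processors.

SECONDS_PER_DAY = 24 * 60 * 60


def min_throughput(comparisons, max_days, processors):
    """Return (overall, per-processor) comparisons per second."""
    overall = comparisons / (max_days * SECONDS_PER_DAY)
    return overall, overall / processors


overall, per_cpu = min_throughput(72e9, 2, 64)
print(f"overall: at least {overall:,.0f} comparisons/s")
print(f"per processor: at least {per_cpu:,.0f} comparisons/s")
```

Under these assumptions the machine sustained roughly 417,000 comparisons per second, or about 6,500 per processor.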

LASSAP's Smith-Waterman algorithm has been specifically optimised for Sun platforms using the UltraSPARC Visual Instruction Set (VIS), yielding impressive performance and scalability. For instance, a scan of a 5,017-amino-acid protein against the database of all proteins known to date (TrEMBL/SWISS-PROT r.11) achieves 3.2 billion matrix cells per second and a speed-up of 62 on a 64-way Sun Enterprise 10000 Server. Similarly, building protein families of E. coli and Synechocystis takes less than 8 minutes.
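For readers unfamiliar with the terms, the "matrix cells" counted above are the entries of the dynamic-programming table that Smith-Waterman fills in for each pair of sequences. The Python sketch below is a minimal textbook version, not LASSAP's VIS-optimised implementation: it assumes a simple match/mismatch score and a linear gap penalty, whereas protein searches normally use a substitution matrix, and all function names and default parameters are illustrative. It also shows how a speed-up of 62 on 64 processors translates into parallel efficiency.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Textbook recurrence with a linear gap penalty (an assumption here).
    Fills len(a) * len(b) matrix cells, keeping only two rows in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / processors


print(smith_waterman_score("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
print(f"efficiency: {parallel_efficiency(62, 64):.1%}")
```

A speed-up of 62 on 64 processors corresponds to an efficiency of about 97 percent, which is why the result is described as scaling well.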

The French discovery is supported by the publication, in the same issue of Nature Genetics, of a study by the American bioinformatician Philip Green, who, using a different approach, also arrived at 34,000 genes. Bioinformatics, a major component of which is genetic sequencing, is key to the nascent field of pharmacogenomics. With pharmacogenomics, it may become possible to tailor medical treatments to specific sub-populations and, eventually, to specific individuals.

The LASSAP software package brings an innovative approach to sequence comparison. It gives biologists tools and methods to speed up discovery by combining different comparison methods with different kinds of databases, which allows them to answer global questions.

The announcement by Genoscope comes as the sequencing of the human genome nears completion and the "annotation" era begins, in which the sequencing data will be analysed to determine the role and function of the genes. The number of genes in the human genome is currently a matter of major scientific controversy that has divided the genetics community.

Ad Emmen

