French Genoscope ran the first analysis of the complete draft of the human genome

Evry, 19 July 2000. With Genoscope (Centre National de Sequencage), officially created on 1 January 1997, France rejoined the group of countries that had initiated large-scale sequencing. Genoscope is a non-profit organization located at Evry, France, and owns the second largest sequencing facility in Europe. Last year Compaq started to invest in bioinformatics and founded a Bioinformatics Expertise Center in Marlboro, Massachusetts, to better support customers and business partners in the industry. Its Cambridge Research Laboratory also focuses on bioinformatics; there, application performance is optimised and data mining algorithms for genetic data are developed. Compaq now offers a cluster of AlphaServer ES40 systems with 100 CPUs and a terabyte of storage, located at the Enterprise Systems Lab in Littleton, for the Human Genome Project. Research institutions can use this cluster to complete the annotation of the human genome. One of the early users was Genoscope, which ran the first analysis of the complete draft of the human genome in only 38 hours.

Genoscope was one of the early users of the Alpha cluster in Littleton and ran the first analysis of the complete draft of the human genome, something that had not been done anywhere else in the world. The collaboration with Compaq made this project possible. The results are now being analysed scientifically at Genoscope and will provide a highly accurate prediction of the total number of genes in the human genome. The analysis used the LASSAP (Large Scale Sequence compArison Package) code and ran on the AlphaServer cluster at Compaq's facility in Littleton. The cluster consists of 25 nodes, each an AlphaServer ES40 with 4 EV67 CPUs. The complete analysis run of the whole draft dataset took only 38 hours on the 100 CPUs. The AlphaServer cluster needed 25% less time to complete a run 2.5 times larger than all previous runs made on any system available from any vendor.
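The cluster figures quoted above can be cross-checked with a little arithmetic. The short Python sketch below uses only numbers from the article; the function names are illustrative, and reading "25% less time for a run 2.5 times larger" as a ratio of work done per unit of time is an assumption about how the comparison was meant.

```python
def total_cpus(nodes: int, cpus_per_node: int) -> int:
    """Number of processors in a cluster of identical nodes."""
    return nodes * cpus_per_node


def relative_throughput(work_ratio: float, time_ratio: float) -> float:
    """Work per unit time relative to a baseline run.

    work_ratio: size of the new run divided by the size of the baseline run.
    time_ratio: wall-clock time of the new run divided by the baseline time.
    """
    return work_ratio / time_ratio


# 25 nodes with 4 CPUs each, as stated in the article.
print(total_cpus(25, 4))  # 100

# A run 2.5 times larger finished in 25% less time (time ratio 0.75).
# Assumption: "performance" here means work completed per unit time.
print(round(relative_throughput(2.5, 0.75), 2))  # 3.33
```

Under that reading the throughput gain is about 3.3; other definitions of performance would give a different figure.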

Overall system performance was 2.5 times greater than that of all competing systems. Jean-Jacques Codani, CEO of Gene-IT, the company that authored LASSAP, said that he was very impressed by the scalability, which was absolutely linear, with a speed-up equal to the number of processors.
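To make "linear scalability" concrete: speed-up on p processors is the single-processor run time divided by the run time on p processors, and scaling is linear when that ratio equals p. The Python sketch below illustrates the definition. The single-processor time is a hypothetical figure extrapolated from the 38-hour, 100-CPU run under the assumption of perfectly linear scaling; it is not a measurement reported in the article.

```python
def speedup(single_cpu_hours: float, parallel_hours: float) -> float:
    """Speed-up S(p) = T(1) / T(p)."""
    return single_cpu_hours / parallel_hours


def efficiency(single_cpu_hours: float, parallel_hours: float, cpus: int) -> float:
    """Parallel efficiency S(p) / p; 1.0 means perfectly linear scaling."""
    return speedup(single_cpu_hours, parallel_hours) / cpus


def ideal_runtime(single_cpu_hours: float, cpus: int) -> float:
    """Run time on `cpus` processors if scaling were perfectly linear."""
    return single_cpu_hours / cpus


# Hypothetical: back-extrapolate T(1) from 38 hours on 100 CPUs,
# assuming perfectly linear scaling.
t1 = 38 * 100  # 3800 CPU-hours

for p in (1, 4, 25, 100):
    print(p, ideal_runtime(t1, p))

print(efficiency(t1, 38, 100))  # 1.0 by construction
```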

Gene-IT continues to improve the LASSAP code on the Alpha processor. The code will be optimised with the support of specialists from the Compaq HPTC Solution Centre in Annecy, France, and further improvements will be made to maintain Compaq's leadership position in delivering solutions in computational biology.

Genoscope Hardware

Genoscope's IT infrastructure is based on a high-performance, high-availability UNIX computing environment. Genoscope chose Compaq AlphaServers and Compaq StorageWorks. The main decision criteria were resiliency, compute power and storage scalability. Genoscope currently owns a cluster of four Digital quad-processor 525 MHz computers, Compaq GS60 systems with 4 GB of memory each, with a peak performance of 17 GFlop/s. The main storage system consists of disk racks with a total capacity of about 1 TByte. Backups are written by a 330-cartridge robot, with a capacity of 35 to 70 GB per cartridge.
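The hardware figures in this paragraph are consistent with a simple back-of-the-envelope calculation, sketched below in Python. Two assumptions are not in the article: that each Alpha CPU can complete 2 floating-point operations per clock cycle, and that capacities use decimal units (1 TB = 1000 GB). The 17 GFlop/s figure is also read here as the aggregate for the whole cluster.

```python
def peak_gflops(machines: int, cpus_per_machine: int,
                clock_mhz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFlop/s: CPUs x clock rate x flops per cycle."""
    mflops = machines * cpus_per_machine * clock_mhz * flops_per_cycle
    return mflops / 1000


def tape_capacity_tb(cartridges: int, gb_per_cartridge: float) -> float:
    """Total capacity of a tape library in TB (decimal units assumed)."""
    return cartridges * gb_per_cartridge / 1000


# 4 machines x 4 CPUs x 525 MHz; 2 flops per cycle is an assumption.
print(peak_gflops(4, 4, 525, 2))  # 16.8, i.e. roughly the quoted 17 GFlop/s

# 330 cartridges at 35 to 70 GB each.
print(tape_capacity_tb(330, 35), tape_capacity_tb(330, 70))  # 11.55 23.1
```

On these assumptions the backup robot holds roughly 11 to 23 times the capacity of the 1 TByte disk racks.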

Moreover, there are additional dedicated servers: a general-purpose Digital DS20 server (2 EV6 CPUs at 500 MHz) for X-term support, mail, and PC and Macintosh integration; a cluster of 2 Digital AS 1000 servers interfacing the sequencers with the main calculation structure; a Digital AS 255 public server (FTP and HTTP); a dedicated Digital server for logistic support; and two machines, from Sun and Compaq, as part of the firewall.

All this equipment is connected through several virtual 10 Mbit/s and 100 Mbit/s ethernet-switched networks. Genoscope actively participates in the metropolitan network of the Evry area (RMRE) project, associating Genoscope with the Evry University, GENOPOLE, Genthon, INT, IEE-CNAM and ENSMP-Centre des Matériaux.

Genoscope's choice of server underlines the strong acceptance of Compaq Alpha systems in bioinformatics: the other major gene sequencing centres, the Sanger Centre in the UK and the MIT Whitehead Institute in the US, as well as the two largest private sequencing companies, Celera Genomics and Incyte Genomics Inc, also chose AlphaServers and StorageWorks.

Backup and data recovery are an important issue. The disk rack has a RAID5 organisation, and the memory is backed up. Two disks are reserved for immediate automatic replacement in case of hardware failure. In addition, there is a daily incremental backup and a double weekly complete backup (one copy of which is stored outside Genoscope). Backups are kept on-line (or nearly on-line, i.e. on site) for six months so that they are quickly accessible, and definitive backups are made each month.

Genoscope Software

Together with the GCG and Staden software packages, more than 240 programs are installed and used at Genoscope. Their main application areas are, of course, sequence analysis and genetic and physical mapping. The first category includes phred for base calling, phrap and consed for assembly, and fasta, blast and Smith-Waterman algorithm-based software for sequence comparison, along with LASSAP. LASSAP is sold by Gene-IT, a privately held spin-off of INRIA, the French National Institute for Research in Computer Science and Control. It was developed by the Gene-IT founders from 1994 to 1998 in a research project at INRIA, with technology transfer in mind. LASSAP is designed to fit various needs, from daily watches to complex workflows. It lifts the limitations of conventional sequence comparison software by allowing large-scale analysis, and it speeds up discovery by combining different methods with different kinds of databases and by making it possible to answer global questions. LASSAP is a technological breakthrough that allows scientists to "blast different". It runs routinely on the Genoscope machines in multi-threaded mode (4 CPUs) and message-passing mode (16 CPUs). This software environment is complemented by locally developed software and databases.
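For readers unfamiliar with the Smith-Waterman algorithm mentioned above, the following Python sketch shows its textbook form: a dynamic-programming search for the best-scoring local alignment between two sequences. It illustrates the general technique only and is not the implementation used in LASSAP or in any of the packages named here. The scoring values (match, mismatch, gap) and the simple linear gap penalty are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Textbook Smith-Waterman with a linear gap penalty. Only the previous
    row of the scoring matrix is kept, so memory use is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment restart anywhere,
            # which is what makes the alignment local.
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Illustrative sequences: the shared segment "ACGT" scores 4 matches x 2.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8
```

Because every pair of positions in the two sequences is scored, run time grows with the product of the sequence lengths, which is why large-scale comparisons of this kind benefit from multi-threaded and message-passing execution.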


Uwe Harms
