French Genoscope ran the first analysis of the complete draft of the
human genome
Evry, 19 July 2000. With Genoscope (Centre National de Séquençage),
officially created on 1 January 1997, France rejoined the group of
countries which had initiated large-scale sequencing. Genoscope is a
non-profit organization located in Evry, France, and owns the
second largest sequencing facility in Europe. Last year Compaq
started to invest in bioinformatics and founded a Bioinformatics
Expertise Center in Marlboro, Massachusetts to better support
customers and business partners in the industry. Its Cambridge
Research Laboratory also focuses on bioinformatics. There,
application performance is optimised and data mining
algorithms for genetic data are developed. Now Compaq offers a
cluster of AlphaServer ES40 systems with 100 CPUs and a terabyte
of storage at the Enterprise Systems Lab in Littleton for the
Human Genome Project. Research institutions can use this cluster
to complete the annotation of the human genome. One of the early
users was Genoscope. They ran the first analysis of the complete
draft of the human genome in only 38 hours.
Genoscope was one of the early users of the Alpha cluster
in Littleton and ran the first analysis of the complete draft of
the human genome, a first anywhere in the world. The
collaboration with Compaq made this project possible. The results
are now being analysed scientifically at Genoscope and will provide
a highly accurate prediction of the total number of genes in the human
genome. The analysis used the LASSAP (Large Scale Sequence
compArison Package) code and ran on the AlphaServer cluster at
Compaq's facility in Littleton. The cluster consists of 25 nodes,
each an AlphaServer ES40 with 4 EV67 CPUs. The
complete analysis run of the whole draft dataset took only 38
hours on the 100 CPUs. The AlphaServer cluster needed 25% less
time to complete a run 2.5 times larger than all previous runs
made on any system available from any vendor.
Overall system performance was 2.5 times greater than that of any
competing system. Monsieur Jean-Jacques Codani, CEO of
Gene-IT, the authors of LASSAP, said that he was very impressed
by the scalability, which was perfectly linear: the speed-up was
equal to the number of processors.
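The figures quoted above can be combined in a back-of-the-envelope
calculation. The sketch below is illustrative only: the node count,
CPUs per node and the 38-hour run time come from the article, while
the function names are hypothetical and the single-CPU estimate
assumes the perfectly linear scaling that Codani describes.

```python
# Figures quoted in the article.
NODES = 25
CPUS_PER_NODE = 4
WALL_CLOCK_HOURS = 38


def total_cpus(nodes: int, cpus_per_node: int) -> int:
    """Number of CPUs in a cluster of identical nodes."""
    return nodes * cpus_per_node


def cpu_hours(cpus: int, wall_clock_hours: float) -> float:
    """Aggregate CPU time consumed by a run that keeps all CPUs busy."""
    return cpus * wall_clock_hours


def single_cpu_days(cpus: int, wall_clock_hours: float) -> float:
    """Estimated run time on one CPU, in days.

    ASSUMPTION: scaling is perfectly linear, so the work done by
    `cpus` processors in `wall_clock_hours` would take `cpus` times
    longer on a single processor.
    """
    return cpu_hours(cpus, wall_clock_hours) / 24


if __name__ == "__main__":
    cpus = total_cpus(NODES, CPUS_PER_NODE)
    print("CPUs in the cluster:", cpus)
    print("CPU-hours for the run:", cpu_hours(cpus, WALL_CLOCK_HOURS))
    print("Single-CPU estimate (days):",
          round(single_cpu_days(cpus, WALL_CLOCK_HOURS), 1))
```

Under that assumption, the 38-hour run corresponds to 3800 CPU-hours,
or roughly five months of computing on a single processor.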
Gene-IT continues to improve the LASSAP code on the Alpha
processor. The code will be optimised with the support of
specialists from the Compaq HPTC Solution Centre in Annecy,
France. Further improvements will be made to maintain Compaq's
leadership position in delivering solutions in Computational
Biology.
Genoscope Hardware
Genoscope's IT infrastructure is based on a high-performance and
high-availability UNIX computing environment. Genoscope chose
Compaq AlphaServers and Compaq StorageWorks. The main decision
criteria were resiliency, compute power and storage
scalability. It currently owns a cluster of four Digital
quad-processor 525 MHz computers, Compaq GS60 systems with 4 GB
of memory each, with a peak performance of 17 GFlop/s. The main
storage system consists of disk racks with a total capacity of
about 1 TByte. Backup is performed by a 330-cartridge robot, with
a capacity of 35 to 70 GB per cartridge.
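The hardware figures above can be cross-checked with simple
arithmetic. The sketch below is only a plausibility check: the
machine count, CPU count, clock rate and cartridge figures are taken
from the article, whereas the value of two floating-point operations
per cycle per CPU is an assumption introduced here (the article does
not state it), and the function names are hypothetical.

```python
# Figures quoted in the article.
MACHINES = 4
CPUS_PER_MACHINE = 4
CLOCK_MHZ = 525
CARTRIDGES = 330
CARTRIDGE_GB_MIN = 35
CARTRIDGE_GB_MAX = 70

# ASSUMPTION (not stated in the article): each CPU can complete
# two floating-point operations per clock cycle.
FLOPS_PER_CYCLE = 2


def peak_gflops(machines: int, cpus: int, clock_mhz: int,
                flops_per_cycle: int) -> float:
    """Theoretical peak of the whole cluster in GFlop/s."""
    mflops = machines * cpus * clock_mhz * flops_per_cycle
    return mflops / 1000


def backup_capacity_tb(cartridges: int, gb_per_cartridge: int) -> float:
    """Total robot capacity in TByte (1 TByte taken as 1000 GB)."""
    return cartridges * gb_per_cartridge / 1000


if __name__ == "__main__":
    print("Peak GFlop/s:",
          peak_gflops(MACHINES, CPUS_PER_MACHINE, CLOCK_MHZ,
                      FLOPS_PER_CYCLE))
    print("Backup capacity (TByte):",
          backup_capacity_tb(CARTRIDGES, CARTRIDGE_GB_MIN), "to",
          backup_capacity_tb(CARTRIDGES, CARTRIDGE_GB_MAX))
```

With the assumed two operations per cycle, the sixteen processors
give about 16.8 GFlop/s, which is consistent with the quoted 17
GFlop/s, and the robot holds roughly 11.5 to 23 TByte, well above the
1 TByte of disk it protects.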
There are also additional dedicated servers: a general-purpose
Digital DS20 server (two 500 MHz EV6 CPUs) for X-term support, mail,
and PC and Macintosh integration; a cluster of two Digital AS 1000
servers interfacing the sequencers with the main computing
infrastructure; a Digital AS 255 public server (FTP and HTTP); a
dedicated Digital server for logistic support; and two machines, one
Sun and one Compaq, that form part of the firewall.
All this equipment is connected through several virtual 10
Mbit/s and 100 Mbit/s switched Ethernet networks. Genoscope
actively participates in the metropolitan network of the Evry
area (RMRE) project, which links Genoscope with Evry
University, GENOPOLE, Genthon, INT, IEE-CNAM and ENSMP-Centre
des Matériaux.
Genoscope's choice of server reflects the strong
acceptance of Compaq Alpha systems in bioinformatics: other major
gene sequencing centres, the Sanger Centre in the UK and the MIT
Whitehead Institute in the US, as well as the two largest
private sequencing companies, Celera Genomics and Incyte
Genomics Inc, also chose AlphaServers and StorageWorks.
Backup and data recovery are an important issue. The disk
rack has a RAID 5 organisation, and the memory is backed up. Two
disks are reserved for immediate automatic replacement in case
of hardware failure. In addition, there is a daily incremental
backup and a double weekly complete backup (one of which is stored
outside Genoscope). Backups are kept on-line (or nearly on-line,
i.e. on site) and quickly accessible for six months, and
definitive backups are made each month.
Genoscope Software
Together with the GCG and Staden software packages, more than
240 programs are installed and used at Genoscope. Their main
application areas are, of course, sequence analysis, and genetic
and physical mapping. The first category includes phred for base
calling, phrap and consed for assembly, fasta, blast and Smith
and Waterman algorithm-based sequence comparison software, along
with LASSAP. LASSAP is sold by Gene-IT, a privately held
spin-off of INRIA, the French National Institute for Research
in Computer Science and Control. It was developed by the Gene-IT
founders from 1994 to 1998 in a research project at INRIA,
with technology transfer in mind. LASSAP is designed to fit
various needs, from daily watches to complex workflows. It lifts
the limitations of conventional sequence comparison software by
allowing large-scale analysis, and speeds up discovery by combining
different methods with different kinds of databases and by making
it possible to answer global questions. LASSAP is a technological
breakthrough that allows scientists to "blast different". It runs
routinely on the Genoscope machines in multi-threaded mode (4 CPUs)
and message-passing mode (16 CPUs). This software environment is
complemented with locally developed software and databases.
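For readers unfamiliar with the Smith and Waterman algorithm
mentioned above among the sequence comparison tools, a minimal sketch
of its local alignment scoring follows. This is a textbook version
with a linear gap penalty, not the implementation used in LASSAP or
any other package named in the article; the function name and the
scoring values (match, mismatch, gap) are illustrative assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between sequences a and b.

    ASSUMPTION: simple linear gap penalty and illustrative scoring
    values; production tools use substitution matrices and affine gaps.
    """
    # Only the previous row of the dynamic-programming matrix is kept.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the
            # alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Two sequences sharing the fragment "ACG".
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The cost of this calculation grows with the product of the two
sequence lengths, which is why large-scale comparisons of the kind
described in this article are spread over many processors.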
Uwe Harms