Impact of IT on the Sanger Centre

Munich 30 November 2000 One of the most important aids in studying genomes is the use of high-end computers. Phil Butcher, Head of Information Technology at the Sanger Centre, presented the storage and computing challenges as well as the current use of clusters of Alpha processors.

One of the most critical issues is data storage. It grew from 3 TByte in 1998 to 4 TByte in October 1999, 8 TByte in April 2000 and 22 TByte in total in November 2000. The capacity of RAID disk storage has increased over the past years in line with the ramp-up of the sequencing projects. To assemble the human genome, all archived and nearline data are kept online.
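
The growth between the quoted measurements can be made explicit with a little arithmetic. The following is a minimal sketch that uses only the four figures above; the function name `growth_factors` is an illustrative choice, and the 1998 figure carries no month in the talk, so only ratios between successive measurements are computed rather than annual rates.

```python
# Storage totals quoted in the talk, in TByte. The 1998 figure has no
# month attached, so it is labelled by year only.
storage_tbyte = [
    ("1998", 3),
    ("October 1999", 4),
    ("April 2000", 8),
    ("November 2000", 22),
]


def growth_factors(points):
    """Return (from_label, to_label, factor) for each successive pair."""
    return [
        (a_label, b_label, b / a)
        for (a_label, a), (b_label, b) in zip(points, points[1:])
    ]


for start, end, factor in growth_factors(storage_tbyte):
    print(f"{start} -> {end}: x{factor:.2f}")
```

The last two steps show the store doubling in about six months and then nearly tripling in the following seven.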

The other factor is the computer systems architecture. Sanger has built compute farms from the beginning, for many years now, and used network storage before SAN/NAS (Storage Area Network/Network Attached Storage) became popular acronyms. They implemented loosely coupled clusters to maximise the use of all the systems and distribute the workload efficiently. From the desktop workgroup systems, users distribute their batch jobs via LSF (Load Sharing Facility) onto the front-end compute servers or the compute server farms. Phil Butcher mentioned some limitations: NFS access is slow, data sizes are increasing, the store has to grow, larger memory configurations are needed and the management overhead increases with the number of nodes. Sanger is therefore turning to Fibre Channel/Memory Channel Tru64 clusters. The aims are better disk I/O (Fibre Channel), scalability (multi-CPU, multi-terabyte) and improved manageability (single system image). Whole clusters are thus managed as single entities.
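
The idea of load sharing across a farm can be illustrated with a small sketch. This is not LSF's actual scheduling algorithm, which the talk did not describe; it is a simple greedy policy that sends each job to the host with the least accumulated work. The function `dispatch`, the host names and the job costs are all hypothetical.

```python
import heapq


def dispatch(jobs, hosts):
    """Assign each job to the host with the least accumulated work.

    jobs  -- list of (job_name, estimated_cost) tuples (hypothetical data)
    hosts -- list of host names
    Returns a dict mapping host name to the list of job names it received.
    """
    # Heap entries are (accumulated_cost, host_name); ties break on name.
    heap = [(0, host) for host in hosts]
    heapq.heapify(heap)
    placement = {host: [] for host in hosts}
    for name, cost in jobs:
        load, host = heapq.heappop(heap)
        placement[host].append(name)
        heapq.heappush(heap, (load + cost, host))
    return placement


# Illustrative batch jobs with made-up cost estimates.
jobs = [("blast-1", 4), ("blast-2", 2), ("assembly", 7), ("blast-3", 1)]
print(dispatch(jobs, ["farm-a", "farm-b"]))
```

A real batch system also has to weigh queue priorities, memory requirements and host availability, which this sketch ignores.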

Computers at Sanger Centre

All the projects have their own machines, which others can use when they are idle. The Ensembl project recently installed a DS10L Alpha farm; these are small, rack-mountable "pizza box" workstations. Each of the 8 racks holds 40 DS10L systems, 320 in total: 1 U high, EV6 (466 MHz), 320 GB memory and 19.2 GB internal disk. This is equivalent to 10 GS320 systems with a peak performance of 326 GFlop/s. It delivers capacity for more than 500 000 blast searches per day. The blast farm has 440 nodes, and there are a large-scale assembly and sequencing server as well as servers for SNP, Mapping, Informatics, Sequence data processing and Pathogen. With LSF, Sanger can use many of the 700+ compute nodes as a "single" Sanger Compute Engine - modular supercomputing. The aggregate peak performance can be estimated at 700 to 800 GFlop/s. Assuming 60% of peak is sustained, Sanger would rank between 30th and 40th in the current Top500 list.
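
The figures in this paragraph can be checked with a short calculation. The sketch below uses only numbers quoted above; the variable names are illustrative. The per-node blast rate rests on two assumptions not stated in the talk: that the 500 000 searches per day refer to the 320-node farm and that they are spread evenly over its nodes.

```python
# Node count of the Ensembl farm: 8 racks of 40 DS10L systems each.
racks = 8
nodes_per_rack = 40
nodes = racks * nodes_per_rack

# Sustained performance range at the 60% of peak assumed in the talk.
peak_gflops = (700, 800)
sustained_fraction = 0.60
sustained_gflops = tuple(p * sustained_fraction for p in peak_gflops)

# ASSUMPTION: the blast capacity belongs to the 320-node farm and is
# spread evenly; the talk gave only the daily total.
blast_per_day = 500_000
blast_per_node_per_day = blast_per_day / nodes
seconds_per_search = 24 * 60 * 60 / blast_per_node_per_day

print(f"nodes: {nodes}")
print(f"sustained: {sustained_gflops[0]:.0f} to {sustained_gflops[1]:.0f} GFlop/s")
print(f"about {seconds_per_search:.0f} s per search per node")
```

Under these assumptions the estimate corresponds to 420 to 480 GFlop/s sustained and a little under a minute per search on each node.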

Phil Butcher listed the current projects, all of which are compute-intensive. Sanger therefore has to scale up by a factor of five and deal with the physical limitations. This will involve thousands of CPUs, large numbers of PC farm nodes and high-end, large-memory SMP configurations. In addition, they need 50 to 100 TByte of storage.

Future plans at Sanger Centre

For the medium term, Butcher will replace the Memory Channel interconnect with 200 MByte/s Quadrics switches (Quadrics Supercomputer World Ltd.). The network is built from 8-way crossbar chips and has a latency of 3 microseconds end-to-end from the user application. It can support 256 SMP nodes. He will also connect individual clusters into one switching network - Sierra Clusters.
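
What the two quoted interconnect figures mean for message transfer can be estimated with the common first-order model: time = latency + size / bandwidth. The sketch below is only that simple model, not a measurement of the Quadrics network; it assumes 1 MByte = 10^6 bytes, and the function names are illustrative.

```python
LATENCY_S = 3e-6            # 3 microseconds end-to-end, as quoted
BANDWIDTH_BYTES_S = 200e6   # 200 MByte/s, ASSUMING 1 MByte = 10**6 bytes


def transfer_time(size_bytes):
    """First-order estimate of the time to deliver one message, in seconds."""
    return LATENCY_S + size_bytes / BANDWIDTH_BYTES_S


def half_performance_size():
    """Message size at which half of the peak bandwidth is reached, in bytes.

    In this model that is the size whose pure transfer time equals the latency.
    """
    return LATENCY_S * BANDWIDTH_BYTES_S


for size in (64, 1_000, 1_000_000):
    print(f"{size:>9} bytes: {transfer_time(size) * 1e6:.2f} microseconds")
print(f"half-performance message size: {half_performance_size():.0f} bytes")
```

In this model small messages are dominated by the latency term, while transfers of a megabyte or more are limited almost entirely by bandwidth.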

The immediate future will see a storage area network of 7.5 TByte installed to enable disk mirroring and controller-to-controller snapshots. It will be connected to the Sanger facilities. In addition, Sanger will set up institute-to-institute clustering and thus a closer collaboration with EBI, which brings the need for site-wide shared clusters.

For the long term, wide area clusters are needed for large-scale collaboration. GRID technology, that is global distributed computing, will arrive and bring international cluster collaborations with other scientific institutes. Phil Butcher said that Sanger is keen to keep abreast of this emerging technology - global compute engines.


Uwe Harms
