The current situation at CEA
In December 2001, Compaq/Hewlett-Packard installed the TERA-1 machine: 2560 64-bit Alpha processors, 2.5 TB of memory and 50 TB of disk, delivering a peak performance of 5.1 TFlop/s. The cluster interconnect was developed by Quadrics Supercomputers World in collaboration with Compaq. This supercomputer uses Compaq's Tru64 UNIX operating system. Several useful functions such as Single System Image, Cluster File System and Parallel File System allow simple, centralised administration. With nearly 4 TeraFlop/s, the supercomputer was ranked number 4 in the June 2002 Top500 list. The TERA benchmark showed 1.32 TeraFlop/s on 2474 processors, hence the name of the computer, TERA-1.
The decision of CEA
Designed by Bull, Tera-10 (10 TeraFlop/s sustained performance) will integrate 544 NovaScale 6160 computing nodes, each including eight next-generation Itanium processors codenamed Montecito. Montecito will be released commercially, in high volume, in 2006 and will feature dual-core technology. The peak performance will be higher than 60 TeraFlop/s. Quadrics, the leader in supercomputing networks, is to provide its QsNet high-performance network to interconnect the NovaScale servers. The global configuration will feature 8,704 processor cores with 27 terabytes of core memory and 1 petabyte of disk.
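The configuration figures quoted above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes two cores per Montecito processor (the text says "dual-core") and decimal units (1 TB = 1000 GB), and the variable names are invented for the example.

```python
# Figures taken from the text above.
NODES = 544
PROCESSORS_PER_NODE = 8
MEMORY_TB = 27

# Assumption: "dual-core" means two cores per processor.
CORES_PER_PROCESSOR = 2

total_cores = NODES * PROCESSORS_PER_NODE * CORES_PER_PROCESSOR

# Assumption: decimal units, 1 TB = 1000 GB.
memory_per_core_gb = MEMORY_TB * 1000 / total_cores

print(total_cores)                   # 8704, matching the quoted core count
print(round(memory_per_core_gb, 1))  # roughly 3.1 GB of memory per core
```

Under these assumptions the node, processor and core counts in the announcement are mutually consistent.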
Tera-10 will operate the Bull HPC software platform that includes the Linux operating system and Lustre, the global and parallel file system. This platform is based on open source software integrated and optimized by Bull's HPC competence centre in Echirolles, France.
Uwe Harms (UH): How did you run the call for tender that led to the purchase of the system?
Jean Gonnord (JG): The purchase is the result of a call for tender, following the French public procurement procedure called "request for procurement on performances". We published a call for interest in the official EEC journal in January 2004. Eight vendors answered at the beginning of March. A request for proposal (RFP) was issued in mid-March, to be answered by the beginning of May. The CEA specifications were sent to these eight vendors. They had to respond with their best technical proposal and its cost, without knowing the budget target.
UH: What did you collect in your technical specifications?
JG: The complete specification file can be summarized in a table of 258 criteria, consisting of functionalities and benchmarks. The 205 functionalities we wanted had to be answered with yes/no or a figure. They cover, for example, the power supply, floor space, etc. In addition, 53 measurements are made on CEA benchmarks, run on machines as close as possible to the final one and extrapolated by the vendor, who has to explain how the extrapolation is done and commit to the result.
UH: Which vendors answered the call for procurement?
JG: At the end of this stage, four vendors delivered a final answer at the beginning of July. This answer included a risk study and a commitment to demonstrate, before the CEA decision (due to take place in September), some technologies that CEA considered critical to the proposal. This means, for example, the existence, at least as a prototype, of the proposed processor, the existence, at least as a prototype, of the proposed board and chipset, and the development status of critical software components such as the parallel file system.
UH: What is the total investment, as published in the call for tender?
JG: The cost considered for the decision is the total cost over 4 years, including the price of the machine and the maintenance, but also the power consumption and the new infrastructure investment needed. The budget itself won't be disclosed.
UH: What was the basis of the decision and why did you choose Bull? Because it was a French company?
JG: The basis of the decision is the commitment of the vendors in their final proposal on the 258 criteria, confirmed (or not) by the "technology" demonstration (risk), and the total cost.
Among the four final answers, using these given criteria, Bull was the best. You have to remember that we are buying a complex machine, not a single processor or a network or a specific piece of software. Bull was the best overall on all these aspects, especially on some that are very important to us:
- the quality and performance of the network (latency, barrier), which are of primary importance for our very large parallel applications. On these points Quadrics is far ahead. We noted that some proposals were not even better than what we got from Quadrics in 2001 on the TERA-1 machine;
- the quality and performance of the I/O subsystem, which, based on our experience with TERA-1, we consider of primary importance and for which Bull made a special effort;
- last but not least, Bull's commitment to Open Source.
All these technical reasons led us to choose Bull. Naturally we are happy and proud that such a challenging request for proposal has been won by a set of European companies, Bull and Quadrics. But can you imagine that CEA/DAM would have risked its mission (France's deterrent) for some economic reason? Our choice just shows that the world has changed in the last five years and that European industry is back in the field of computing, and especially in supercomputers.
UH: What about the Operating System, the cluster management software and Open Source Software?
JG: We made an important choice: we decided on an open source system, which also worked in Bull's favour. This is a very new commitment from CEA/DAM. Based on our experience, we consider these very powerful machines of primary importance for our simulation programme, and we want to be a partner in this challenge. This is why, from the start of the programme, we have pushed for the use of COTS (commercial off-the-shelf) products, that is, widely distributed components, for the hardware. Bull's use of Intel Itanium processors fulfils this. Today we are reaching the final step with the use of open source software. As part of the open source community we can influence its development or, at the very least, do the development ourselves.
In the case of our TERA-10, Bull is developing an open source HPC version of Linux based on a standard kernel.
The Cluster Management System is still RMS from Quadrics, which is the exception to our open-source commitment.
UH: What about the availability of the Itanium Montecito processor this year?
JG: As you know, the official date for high-volume commercial availability of Montecito is January 2006. We are quite confident that Intel will deliver Montecito to us in time for our 10 Tflops sustained demonstration at the end of 2005. They have already demonstrated the TERA benchmark running on the prototype, and we will have access to Montecito very soon.
UH: I read that CEA/DAM has been responsible since September 2003 for all CEA computing. What different types of machines do you offer?
JG: Since September 2003, DAM has also been responsible for the open CEA computing centre (CCRT) that CEA shares with EDF, SNECMA and ONERA.
The CCRT today offers a total of 3.6 Teraflops:
- 2.4 through an HP machine very similar to TERA-1 (Alpha, ES45, Elan-3),
- 0.8 through an HP Opteron cluster,
- 0.4 through a NEC SX6 vector machine.
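As a quick check of the breakdown above, the three contributions can be summed and expressed as shares of the total. This is a minimal sketch; the dictionary keys are shorthand labels chosen for the example.

```python
# CCRT capacity in Teraflops, as quoted in the list above.
ccrt_teraflops = {
    "HP Alpha (TERA-1-like)": 2.4,
    "HP Opteron cluster": 0.8,
    "NEC SX6 vector": 0.4,
}

total = sum(ccrt_teraflops.values())
print(f"total: {total:.1f} Teraflops")

# Share of each machine in the total capacity, in percent.
for name, tflops in ccrt_teraflops.items():
    print(f"{name}: {100 * tflops / total:.1f}%")
```

The sum agrees with the quoted 3.6 Teraflops, with the Alpha machine providing about two thirds of the capacity.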
The CCRT will be completely renewed at the beginning of 2007, and in 2006 there will be an RFP for one or two machines with a total power of several tens of Teraflops.
UH: What type of parallel jobs will run on the new machine?
JG: We run very large parallel applications, some of which could use the whole machine. But we also have to support standard calculations, which today mainly use 32 to 128 processors.
UH: With such a fast and huge machine you surely have major I/O requirements, can you comment on this?
JG: First, we are not only using gigantic computing power, we are also a gigantic data producer. On this point we have a problem very similar to the one that occurs with the LHC in Geneva. Each experiment, in our case a virtual experiment, produces enormous amounts of data. As with any experiment, we follow its behaviour as a function of time. In our case we will use, for example, several thousand time steps, for which we will follow several tens of parameters defined on billions of mesh cells.
Today, with the TERA-1 machine, we are producing more than 3 Terabytes of data per day, which is more than a petabyte per year. This will be multiplied by a factor of 10 with TERA-10.
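The yearly figure follows directly from the daily one. The sketch below assumes decimal units (1 PB = 1000 TB) and a 365-day year of continuous production; since the text says "more than 3 Terabytes", the results are lower bounds.

```python
TB_PER_DAY_TERA1 = 3   # lower bound quoted for TERA-1
DAYS_PER_YEAR = 365    # assumption: data is produced every day of the year
TERA10_FACTOR = 10     # growth factor quoted for TERA-10

# Assumption: decimal units, 1 PB = 1000 TB.
tera1_pb_per_year = TB_PER_DAY_TERA1 * DAYS_PER_YEAR / 1000
tera10_pb_per_year = tera1_pb_per_year * TERA10_FACTOR

print(f"TERA-1:  at least {tera1_pb_per_year:.3f} PB per year")
print(f"TERA-10: at least {tera10_pb_per_year:.2f} PB per year")
```

Under these assumptions TERA-1 produces at least about 1.1 PB per year, and TERA-10 would produce on the order of 11 PB per year.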
Second, a large simulation could run on the machine for several weeks. We must have a strong capacity for saving the computation every few time steps and restarting it. This has an impact on the I/O system, as we don't want more than 10% of the computation time to be used for I/O. This specification leads to the gigantic bandwidth (100 gigabits/s) we demand of the file system.
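The link between the 10% I/O limit and the bandwidth can be illustrated with a small calculation. This is only a sketch under stated assumptions: the 1 TB checkpoint size is a hypothetical value chosen for the example (the interview gives no checkpoint size), the function names are invented, and the 10% limit is read as I/O time divided by total run time.

```python
def checkpoint_seconds(checkpoint_tb, bandwidth_gbit_s=100.0):
    """Time to write one checkpoint at the quoted file system bandwidth."""
    bytes_per_second = bandwidth_gbit_s * 1e9 / 8  # gigabits/s -> bytes/s
    return checkpoint_tb * 1e12 / bytes_per_second

def min_compute_seconds(io_seconds, io_fraction=0.10):
    """Minimum compute time between checkpoints so that
    io / (io + compute) stays at or below io_fraction."""
    return io_seconds * (1 - io_fraction) / io_fraction

# Hypothetical checkpoint size, not a figure from the interview.
io_s = checkpoint_seconds(checkpoint_tb=1.0)
compute_s = min_compute_seconds(io_s)

print(f"write time per checkpoint: {io_s:.0f} s")
print(f"minimum compute time between checkpoints: {compute_s:.0f} s")
```

With these illustrative numbers, each checkpoint takes 80 seconds to write, so checkpoints would have to be at least about 12 minutes of computation apart to stay within the 10% limit. A lower bandwidth would stretch that interval proportionally.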
UH: Thank you for this information, Mr. Jean Gonnord, Programme Director for Numerical Simulation & Computer Sciences, CEA/DAM.