In June 2002 Sandia Labs chose Cray as its partner in a multi-year collaboration to develop a massively parallel, capability-class supercomputer. The new system, Red Storm, is intended to replace ASCI Red, which is based on Intel processors. Red Storm's Opteron processors are to be connected via a high-speed 3D torus network developed by Cray. The partners have a budget of about 93 million US$. Since September 2004 Cray has been installing the cabinets: 140 in total, with 11,646 processors and 10 TByte of distributed memory. In addition, there are 240 TByte of disk space. As discussed at this year's ISC2004 in Heidelberg, Sandia expects a highly scalable system with high application performance.
A further customer is Oak Ridge National Laboratory (ORNL), which is getting an XT3 with 20 TeraFlop/s and a Cray X1E vector machine, the higher-performance successor of the X1, with the same peak performance. Installation is planned for 2005.
In 2006 ORNL expects to move to a 100 TeraFlop/s machine and to expand it to 250 TeraFlop/s in 2007. The application performance of the latter system is expected to be in the range of 100 TeraFlop/s.
The third customer is the Pittsburgh Supercomputing Center, with a 10 TeraFlop/s XT3 for research within the TeraGrid.
On October 25 Cray adds the XT3 to its portfolio. It complements the entry-level XD1 with a capability supercomputer in the upper segment. The list price starts at about one million US$. The third product line will be the Cray X1E, an improved vector supercomputer, which will be announced later.
The XT3 can be seen as the successor of the massively parallel T3D and T3E computers, which are based on off-the-shelf processors, the Digital/Compaq Alphas. These systems have been used efficiently at many supercomputer centres in academia and research in Germany.
All XT3 processors are current AMD Opterons. Cray is prepared to replace the single-core processors with the upcoming dual-core Opterons. To save space, four Opterons are mounted on one blade. Each Opteron has a peak performance of 4.8 GigaFlop/s, and its memory can be extended up to 8 GByte. In addition to the compute nodes, the XT3 has service processing elements, which can be configured as I/O, login, network or system nodes.
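To put the per-processor figure in relation to the system sizes quoted above, the following minimal Python sketch estimates how many Opterons and blades a given peak performance implies. It assumes that every processor contributes exactly the 4.8 GigaFlop/s peak, that 1 TeraFlop/s equals 1000 GigaFlop/s, and that service processing elements are ignored; the function names are illustrative only, and the real configurations may differ.

```python
import math

PEAK_GFLOPS_PER_OPTERON = 4.8  # per-processor peak quoted in the text
OPTERONS_PER_BLADE = 4         # blade density quoted in the text


def processors_for_peak(target_tflops: float) -> int:
    """Smallest processor count whose summed peak reaches target_tflops.

    Assumption: all processors run at the quoted 4.8 GigaFlop/s peak.
    """
    return math.ceil(target_tflops * 1000 / PEAK_GFLOPS_PER_OPTERON)


def blades_for_processors(processors: int) -> int:
    """Number of four-processor blades needed to hold the processors."""
    return math.ceil(processors / OPTERONS_PER_BLADE)


if __name__ == "__main__":
    for tflops in (10, 20):
        procs = processors_for_peak(tflops)
        print(f"{tflops} TeraFlop/s: about {procs} processors "
              f"on {blades_for_processors(procs)} blades")
```

Under these assumptions, a 20 TeraFlop/s XT3 needs roughly 4,200 compute processors and a 10 TeraFlop/s system roughly 2,100.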
The operating system is Cray's own Unicos/lc, which was developed for complex applications and scales up to 30,000 processors. It consists of two components: a microkernel for the compute nodes and a full operating system for the service nodes. The service nodes run a complete Linux kernel from SuSE. The global file system is based on the open-source Lustre file system. The I/O system scales up to a bandwidth of 100 GB/s. Red Storm dumps its 10 TB of memory to disk in less than 2 minutes.
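The two I/O figures can be checked against each other with simple arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes decimal units (1 TB = 1000 GB) and a perfectly sustained transfer rate, and its function names are illustrative.

```python
def dump_time_seconds(memory_tb: float, bandwidth_gb_per_s: float) -> float:
    """Time to write memory_tb of memory at a sustained bandwidth.

    Assumption: decimal units, 1 TB = 1000 GB, no protocol overhead.
    """
    return memory_tb * 1000 / bandwidth_gb_per_s


def required_bandwidth_gb_per_s(memory_tb: float, seconds: float) -> float:
    """Sustained bandwidth needed to write memory_tb within seconds."""
    return memory_tb * 1000 / seconds


if __name__ == "__main__":
    # 10 TB at the maximum quoted I/O bandwidth of 100 GB/s
    print(dump_time_seconds(10, 100), "seconds")
    # Bandwidth needed to finish in exactly 2 minutes
    print(round(required_bandwidth_gb_per_s(10, 120), 1), "GB/s")
```

At the full 100 GB/s the dump takes 100 seconds, and finishing within 2 minutes requires a sustained rate of at least about 83 GB/s, so the two statements are consistent.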
The main differentiator from the competition is the Cray-specific network with its high bandwidth and low latency. Each Opteron is directly connected to the XT3 interconnect via the SeaStar routing and communication chip. The SeaStar router connects each node to its six neighbouring nodes in a 3D torus topology. The bidirectional peak bandwidth of a link is 7.6 GB/s; the sustained bandwidth is up to 4 GB/s.
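The six-neighbour structure of a 3D torus can be illustrated with a short Python sketch. It is a generic model of the topology, not Cray's routing logic: the coordinate scheme, the torus dimensions and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbours(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six neighbours of node in a 3D torus of size dims.

    Each axis wraps around, so a node on the edge of the grid is linked
    to the node on the opposite edge. For an axis of size 1 or 2 some
    of the six entries coincide.
    """
    neighbours = []
    for axis in range(3):
        for step in (1, -1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % dims[axis]
            neighbours.append(tuple(coords))
    return neighbours


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum number of link hops between a and b, using wrap-around."""
    hops = 0
    for x, y, size in zip(a, b, dims):
        distance = abs(x - y)
        hops += min(distance, size - distance)
    return hops


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical torus size
    print(torus_neighbours((0, 0, 0), dims))
    print(torus_hops((0, 0, 0), (3, 2, 1), dims))
```

The wrap-around links are what distinguish a torus from a plain mesh: they roughly halve the worst-case hop count along each axis.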
The Cray RAS (Reliability, Availability, Serviceability) and Management System (CRMS) is independent of the XT3 and has its own processors, software and network for monitoring and managing the main hardware and software components. Among other things, it allows recovery from hardware and software failures. Cray will guarantee a mean time between failures of 400 hours for 1000 processors.
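What such a figure could mean for larger installations can be estimated with a simple model. The sketch below assumes that failures are independent and occur at a constant rate per processor, so that the system-wide mean time between failures shrinks in inverse proportion to the processor count. This scaling rule is an assumption for illustration, not a statement by Cray, and the function name is hypothetical.

```python
REFERENCE_MTBF_HOURS = 400.0  # figure quoted in the text
REFERENCE_PROCESSORS = 1000   # system size the figure refers to


def scaled_mtbf_hours(processors: int) -> float:
    """Estimate the MTBF of a system with the given processor count.

    Assumption: independent failures at a constant per-processor rate,
    so MTBF is inversely proportional to the number of processors.
    """
    if processors <= 0:
        raise ValueError("processors must be positive")
    return REFERENCE_MTBF_HOURS * REFERENCE_PROCESSORS / processors


if __name__ == "__main__":
    for count in (1000, 11646, 30000):
        print(f"{count} processors: about {scaled_mtbf_hours(count):.1f} hours")
```

Under this simplified model, a system of Red Storm's size (11,646 processors) would see a failure roughly every 34 hours, which is why independent monitoring and automated recovery matter at this scale.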