Hardware SR8000 F1
Compared to the current SR8000 with 8 GFlop/s peak performance per node, the F1 delivers 12 GFlop/s per node, a 50% increase, which corresponds to 1.5 GFlop/s per processor. A node physically contains 9 processors, although only 8 are used by the compiler for automatic parallelisation. With its 112 nodes the system has a peak performance of 1.344 TFlop/s, which makes it the first TFlop computer in Europe. For 2002, an upgrade to 2 TFlop/s using 168 nodes is scheduled.
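The peak figures follow from simple multiplication, which the short sketch below reproduces. The function and constant names are chosen here for illustration only; the numbers are the ones quoted in the text.

```python
# Figures quoted in the text; the names are illustrative, not from any Hitachi tool.
GFLOPS_PER_PROCESSOR = 1.5
COMPUTE_PROCESSORS_PER_NODE = 8  # the ninth processor is not used for automatic parallelisation


def node_peak_gflops():
    """Peak performance of one node in GFlop/s."""
    return GFLOPS_PER_PROCESSOR * COMPUTE_PROCESSORS_PER_NODE


def system_peak_tflops(nodes):
    """Peak performance of a system with `nodes` nodes in TFlop/s."""
    return nodes * node_peak_gflops() / 1000.0


if __name__ == "__main__":
    print("node peak [GFlop/s]:", node_peak_gflops())
    print("112 nodes [TFlop/s]:", round(system_peak_tflops(112), 3))
    print("168 nodes [TFlop/s]:", round(system_peak_tflops(168), 3))
```

With 168 nodes the same calculation gives slightly more than 2 TFlop/s, in line with the figure announced for 2002.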
A node has 8 GByte of memory, of which about 6.5 GByte can be used by a user. Four nodes of the whole system have 16 GByte each. This sums up to 0.928 TByte. In the final stage 1.344 TByte will be available with the same node constraints. The nodes contain Hitachi's proprietary RISC processors, which have been developed by the company itself.
The disk space sums up to 7.4 TByte (5.3 TByte for users) in the first phase and then to 10 TByte (7.1 TByte for users) in the second.
As the SR8000 is a homogeneous system and has the same architecture in both phases, no program modifications are necessary in phase 2. Reliability is another important issue, as Hitachi officials mentioned during the contract ceremony. The target is a mean time between failures of 180 000 hours per node, which means about 20 years between failures. For the total system this means about 2 months. The targeted mean time to repair is one hour for a node replacement.
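The step from the per-node target to the system figure can be sketched as follows. The sketch assumes that node failures are independent and occur at a constant rate, so that the system MTBF is the node MTBF divided by the number of nodes; this model is an assumption made here for illustration and is not stated by Hitachi. Only node failures are considered.

```python
# Reliability target quoted in the text.
NODE_MTBF_HOURS = 180_000

HOURS_PER_YEAR = 365 * 24
HOURS_PER_DAY = 24


def node_mtbf_years():
    """Per-node mean time between failures in years."""
    return NODE_MTBF_HOURS / HOURS_PER_YEAR


def system_mtbf_days(nodes):
    """System MTBF in days, assuming independent nodes with constant failure rates."""
    return NODE_MTBF_HOURS / nodes / HOURS_PER_DAY


if __name__ == "__main__":
    print("node MTBF [years]:", round(node_mtbf_years(), 1))
    print("112-node system MTBF [days]:", round(system_mtbf_days(112), 1))
```

Under these assumptions the 112-node system comes out at roughly 67 days, which matches the "about 2 months" given above.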
The nodes are connected via a three-dimensional crossbar with a bidirectional bandwidth of 1 GByte/s between nodes and a latency of 19 microseconds.
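To get a feeling for what these two numbers mean for message passing, a common first-order estimate is transfer time = latency + message size / bandwidth. The sketch below uses this linear model as an assumption. It also assumes that the full 1 GByte/s (taken as 10^9 bytes per second) is available to a single one-way message, which the text does not state explicitly. Real transfer times depend on the MPI implementation and on network load.

```python
# Interconnect figures quoted in the text.
LATENCY_SECONDS = 19e-6
BANDWIDTH_BYTES_PER_SECOND = 1e9  # assumption: 1 GByte/s usable in one direction


def transfer_time(message_bytes):
    """Estimated time in seconds to send one message (linear cost model)."""
    return LATENCY_SECONDS + message_bytes / BANDWIDTH_BYTES_PER_SECOND


def effective_bandwidth(message_bytes):
    """Achieved bytes per second for one message under the same model."""
    return message_bytes / transfer_time(message_bytes)


def half_performance_bytes():
    """Message size at which half of the nominal bandwidth is reached."""
    return LATENCY_SECONDS * BANDWIDTH_BYTES_PER_SECOND


if __name__ == "__main__":
    for size in (1_000, 19_000, 1_000_000):
        print(size, "bytes:", transfer_time(size), "s,",
              effective_bandwidth(size) / 1e6, "MByte/s")
```

In this model, messages of about 19 KByte reach half of the nominal bandwidth; much shorter messages are dominated by the latency.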
Programming models
The innovative architecture of the SR8000 allows it to be used both like a vector processor and with the scalar SMP-cluster programming model within one machine. A vectorisable operation is distributed over the floating-point units of the 8 processors within a node. Hitachi calls this COMPAS, Cooperative Micro Processors in single Address Space. The compiler distributes the data; the synchronisation is realised in hardware. Another feature is PVP, Pseudo Vector Processing: the data needed in a loop is prefetched and stored in the cache, which reduces the access time.
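The idea behind COMPAS can be illustrated with a small sketch: the iterations of a vectorisable loop are divided into 8 parts, one per processor of a node. The sketch below only simulates this sequentially. On the SR8000 the compiler performs the distribution automatically and the synchronisation is done in hardware. The contiguous block split and all function names are assumptions made for illustration; the text does not specify how the compiler distributes the data.

```python
PROCESSORS_PER_NODE = 8  # processors used for automatic parallelisation


def split_range(n, parts=PROCESSORS_PER_NODE):
    """Split range(n) into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    chunks = []
    start = 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def axpy_partitioned(a, x, y):
    """Compute a*x[i] + y[i] for all i, one chunk per simulated processor."""
    result = [0.0] * len(x)
    for chunk in split_range(len(x)):
        # On the real machine each chunk would run on its own processor.
        for i in chunk:
            result[i] = a * x[i] + y[i]
    # Here the hardware would synchronise the 8 processors before continuing.
    return result


if __name__ == "__main__":
    print([len(c) for c in split_range(20)])
    print(axpy_partitioned(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

Because all processors of a node share one address space, every chunk can write directly into the same result array without any explicit data exchange.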
Software and tools
The operating system is HI-UX/MPP, a Unix based on POSIX 1003.2. Compilers are available for C, Fortran 77, Fortran 90 and C++ (as a precompiler to C; the native compiler will be available in Q1 2002). TotalView is the debugger.
Tools and libraries for parallel programming are OpenMP 1.0, supported by Fortran and C++; MPI-2 in a full implementation with parallel I/O and dynamic process generation, between nodes as well as intra-node (MPP); PVM 3.3.10; High Performance Fortran version 2.0; and Linda (mid 2000).
A very important tool for the profiling of MPI programs, VAMPIR by Pallas GmbH in Bruehl, is scheduled for Q3 2000. Furthermore, Hitachi will deliver versions of BLAS, LAPACK, ScaLAPACK and NAG optimised for the SR8000 F1, as well as the Hitachi proprietary library MATRIX/MPP with subroutines for linear algebra, fast Fourier transforms and random numbers. Huge sparse matrices will be supported.
A two-node system for training purposes will be delivered this year.