SX-5 exceeds 6 Gflop/s for Sparse Eigensolutions using 1 CPU

The Woodlands, 25 Feb 00. NEC's Advanced Technology Computing Center has released benchmark results for its tuned version of BCSLIB-EXT 4 for the NEC SX Series supercomputers, showing an overall performance of 6 GFLOPS on a single processor for problems of 1 million degrees of freedom (DOF) or larger.

The BCSLIB-EXT (Boeing EXTreme) Mathematical Library is a library of routines for solving large problems whose data volume exceeds the memory limitations of computers. One of the package's strengths is its ability to solve large sparse systems of linear equations and large sparse eigenvalue problems. The linear equation solver is based on the multifrontal algorithm, and the real symmetric generalized eigensolver is a block shift-and-invert Lanczos algorithm. Both use the NEC-supplied BLAS library for fundamental matrix operations.
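The shift-and-invert idea can be illustrated on a toy problem. The sketch below is not BCSLIB-EXT code: it uses a single-vector Lanczos iteration rather than the block variant, a 1-D Laplacian as a stand-in for a stiffness matrix, and a tridiagonal solve in place of the multifrontal factorization. All function names are hypothetical. Lanczos is applied to the operator (A - sigma*I)^-1, whose largest eigenvalues theta correspond to the eigenvalues of A nearest the shift through lambda = sigma + 1/theta.

```python
import math
import random

def solve_shifted(n, sigma, b):
    """Solve (A - sigma*I) y = b for the 1-D Laplacian A (2 on the diagonal, -1 beside it).

    Tridiagonal elimination without pivoting; assumes sigma lies below the
    lowest eigenvalue of A, so the shifted matrix stays positive definite.
    """
    diag = 2.0 - sigma
    c = [0.0] * n  # modified superdiagonal
    y = [0.0] * n
    c[0] = -1.0 / diag
    y[0] = b[0] / diag
    for i in range(1, n):
        denom = diag + c[i - 1]
        c[i] = -1.0 / denom
        y[i] = (b[i] + y[i - 1]) / denom
    for i in range(n - 2, -1, -1):  # back substitution
        y[i] -= c[i] * y[i + 1]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos_tridiagonal(n, sigma, steps, seed=0):
    """Run Lanczos on (A - sigma*I)^-1 and return the tridiagonal T as (alphas, betas)."""
    rng = random.Random(seed)
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(q, q))
    basis = [[x / norm for x in q]]
    alphas, betas = [], []
    for step in range(steps):
        w = solve_shifted(n, sigma, basis[-1])  # one "solve" per Lanczos step
        alphas.append(dot(w, basis[-1]))
        for v in basis:  # full reorthogonalization, affordable at this size
            h = dot(w, v)
            w = [wi - h * vi for wi, vi in zip(w, v)]
        beta = math.sqrt(dot(w, w))
        if step == steps - 1 or beta < 1e-12:
            break
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    return alphas, betas

def count_below(alphas, betas, x):
    """Sturm count: number of eigenvalues of T that are less than x."""
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        b2 = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - b2 / d
        if d == 0.0:
            d = 1e-300
        if d < 0.0:
            count += 1
    return count

def kth_largest_eig(alphas, betas, k):
    """k-th largest eigenvalue of T (k = 0 is the largest), found by bisection."""
    reach = max(abs(a) for a in alphas) + 2.0 * max([abs(b) for b in betas] + [0.0])
    lo, hi = -reach, reach  # Gershgorin bounds on the spectrum of T
    target = len(alphas) - 1 - k
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def lowest_modes(n, sigma=0.0, modes=3, steps=20):
    """Eigenvalues of A nearest the shift, recovered as sigma + 1/theta."""
    alphas, betas = lanczos_tridiagonal(n, sigma, steps)
    return [sigma + 1.0 / kth_largest_eig(alphas, betas, k) for k in range(modes)]

n = 50
approx = lowest_modes(n)
exact = [2.0 - 2.0 * math.cos(k * math.pi / (n + 1)) for k in (1, 2, 3)]
```

For this model matrix the exact eigenvalues are known in closed form, so `approx` can be compared with `exact` directly. The inverted operator separates the lowest modes so strongly that a few dozen Lanczos steps suffice here, which is the reason the factorization and repeated solves dominate the cost of this class of eigensolver.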

BCSLIB-EXT is a useful indicator of application performance for Finite Element and Optimization codes because it is used in several commercial Finite Element packages. On the SX-5 Series, CSA/NASTRAN and OPTISTRUCT use BCSLIB-EXT to solve static and eigenvalue problems, so its results show what can be expected of a Finite Element solver on the SX-5 Series.

BCSLIB-EXT has been ported and tuned by NEC Systems, Inc. (NECSYS) for the SX-4 and SX-5 Series of parallel-vector computers. It is based on the highly optimized NEC BLAS library. The NEC-optimized version of BCSLIB-EXT completed QA with no changes to the performance modifications made by NECSYS.

To address the large volume of I/O in out-of-core solutions, this implementation includes the NEC High Performance I/O (HPIO) Library. HPIO is a threaded, intelligent, word-addressable cache for the NEC SX Series. I/O functions are handled by separate threads, allowing computation to proceed at a much faster rate. This is especially important for the Lanczos eigensolution, where I/O demands can exceed 5 TB for a system of 1 million degrees of freedom. HPIO allows near in-core elapsed-time performance with less additional memory than an in-core solution. In tests with some ISV code, a small HPIO cache reduced elapsed time by 50% compared with the out-of-core solution. The out-of-core run with HPIO took only half the memory of the same job run in-core, giving it an advantage on machines with high load or less memory.
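The general principle of handing I/O to a separate thread can be sketched in a few lines. This is a generic read-ahead pattern, not the HPIO interface, which is not described here. The names `read_ahead`, `read_block` and `cache_blocks` are hypothetical, and an in-memory list stands in for data on disk.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the block stream

def read_ahead(read_block, num_blocks, cache_blocks=4):
    """Yield blocks in order while a background thread reads ahead.

    read_block(i) is any callable returning block i. The bounded queue plays
    the role of a small cache: the I/O thread pauses when it is full.
    Simplification: an exception inside read_block is not propagated.
    """
    cache = queue.Queue(maxsize=cache_blocks)

    def io_worker():
        for index in range(num_blocks):
            cache.put(read_block(index))  # blocks while the cache is full
        cache.put(_DONE)

    threading.Thread(target=io_worker, daemon=True).start()
    while True:
        block = cache.get()  # waits only if the I/O thread has fallen behind
        if block is _DONE:
            return
        yield block

# Usage: the "computation" consumes blocks while later ones are being fetched.
blocks = [[float(i)] * 4 for i in range(8)]  # stand-in for out-of-core data
total = sum(sum(block) for block in read_ahead(lambda i: blocks[i], len(blocks)))
```

The cache size bounds the extra memory, which mirrors the trade-off described above: a small buffer is enough to hide much of the I/O latency without holding the whole problem in core.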

To date, all of the tuning work for the SX Series has been conducted on the SX-4, and no effort has been made to tune specifically for the SX-5. The performance figures below are for the SX-4 and SX-5. SX-5 single processor performance ranges from 400 MFLOPS for very small problems (6000 DOF) to 6000 MFLOPS for larger problems (5,000,000 DOF). These are average performance numbers as reported by the hardware performance monitor. Performance increases significantly as the vector length of the problem increases.

The best performance is obtained during the factorization phase of the problem, where most of the work is done. The tables below give peak performance numbers in MFLOPS for 2D, Low Aspect Ratio 3D, and 3D problems in the BCSLIB-EXT test suite. Although solve performance is much lower than factorization performance, the solve phase accounts for only a small fraction of the total solution time, and the number of floating point operations it performs is limited. The figure of 7.4 GFLOPS for the factor operation represents 94% of peak performance on the SX-5, a computational rate that is only possible with the uniquely high memory bandwidth of the SX Series systems.

Table 1. Single Processor Sparse Direct Solver Performance
Problem          DOF          Factor (MF)   Solve (MF)   Overall (MF)
2D               5,000,000    6615          1711         2714
Low Aspect 3D    1,000,000    7183          2502         5511
3D               1,000,000    7351          2846         5681
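
How a slow phase affects an overall rate depends on how much work it contains, because the overall figure is total operations divided by total time. The sketch below illustrates that relationship only. The function name and the 95/5 split of operations are hypothetical and are not taken from the benchmark; the two rates are the 3D factor and solve figures from Table 1.

```python
def overall_mflops(phases):
    """Aggregate rate for phases given as (work_in_mflop, rate_in_mflops) pairs.

    Total work divided by total time, i.e. a work-weighted harmonic mean.
    """
    phases = list(phases)
    total_work = sum(work for work, _ in phases)
    total_time = sum(work / rate for work, rate in phases)
    return total_work / total_time

# Hypothetical split: 95% of the operations in factorization, 5% in the solve.
example = overall_mflops([(95.0, 7351.0), (5.0, 2846.0)])
```

With this assumed split the aggregate stays much closer to the factorization rate than to the solve rate. The more work the slower phases contain, the further the overall figure falls.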

Table 2. Single Processor Lanczos Eigensolver Performance
(100 Modes)
Problem          DOF          Factor (MF)   Solve (MF)   Multiply (MF)   Overall (MF)
2D               5,000,000    6665          1650         2184            3155
Low Aspect 3D    1,000,000    7371          2496         2496            5736
3D               1,000,000    7431          2844         2428            6074

 


Ad Emmen
