SX-5 exceeds 6 Gflop/s for Sparse Eigensolutions using 1 CPU
The Woodlands, 25 Feb 00 NEC's Advanced Technology Computing Center released benchmark results for its tuned version of BCSLIB-EXT 4 for the NEC SX Series supercomputers with an overall performance of 6 GFLOPS for problems of 1 million degrees of freedom (DOF) or larger, executed on a single processor.
The BCSLIB-EXT (Boeing EXTreme) Mathematical Library is a library of routines for solving large problems whose data volume exceeds the memory of the computer. One of the package's strengths is its ability to solve large sparse systems of linear equations and large sparse eigenvalue problems. The linear equation solver is based on the multifrontal algorithm, and the real symmetric generalized eigensolver is a block shift-and-invert Lanczos algorithm. Both use the NEC-supplied BLAS library for fundamental matrix operations.
BCSLIB-EXT is a useful indicator of application performance for Finite Element and Optimization codes, because it is used in several commercial Finite Element packages. On the SX-5 Series, CSA/NASTRAN and OPTISTRUCT use BCSLIB-EXT to solve static and eigenvalue problems. BCSLIB-EXT therefore shows what can be expected of a Finite Element solver executed on the SX-5 Series.
BCSLIB-EXT has been ported and tuned by NEC Systems, Inc. (NECSYS) for the SX-4 and SX-5 Series of parallel-vector computers. It is based on the highly optimized NEC BLAS library. The NEC optimized version of BCSLIB-EXT completed QA with no changes to the performance modifications made by NECSYS.
To address the large amounts of I/O in out-of-core solutions, this implementation includes the NEC High Performance I/O (HPIO) Library. HPIO is a threaded, intelligent, word-addressable cache for the NEC SX Series. I/O functions are handled by separate threads, allowing computation to proceed at a much faster rate. This is especially important for the Lanczos eigensolution, where I/O demands can exceed 5 TB for a system with 1 million degrees of freedom. HPIO gives elapsed times close to those of an in-core run, with less additional memory than an in-core solution. In tests with some ISV code, a small HPIO cache reduced elapsed time by 50% compared with the out-of-core solution.
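The idea of handing I/O to separate threads so that computation can proceed in the meantime can be illustrated with a minimal sketch. This is not the HPIO interface, which the article does not describe: `read_block`, `prefetching_reader`, and `cache_blocks` are hypothetical names, and an in-memory list stands in for an out-of-core file. A background thread reads ahead into a small bounded cache while the main thread consumes blocks.

```python
import queue
import threading


def read_block(store, index):
    """Stand-in for a slow out-of-core read (hypothetical helper).

    A real solver would read the block from disk here.
    """
    return store[index]


def prefetching_reader(store, cache_blocks=2):
    """Yield blocks in order while a background thread reads ahead.

    cache_blocks bounds how many blocks are held in memory at once, so the
    footprint stays small compared with loading every block in-core.
    """
    cache = queue.Queue(maxsize=cache_blocks)
    sentinel = object()  # marks the end of the block stream

    def worker():
        for i in range(len(store)):
            cache.put(read_block(store, i))  # waits while the cache is full
        cache.put(sentinel)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        block = cache.get()  # waits only if the reader has fallen behind
        if block is sentinel:
            return
        yield block


def sum_of_squares(store, cache_blocks=2):
    """Example computation that consumes blocks as they become available."""
    total = 0.0
    for block in prefetching_reader(store, cache_blocks):
        total += sum(x * x for x in block)
    return total


# Five blocks of four values each, holding the numbers 0.0 .. 19.0.
blocks = [[float(i + j) for j in range(4)] for i in range(0, 20, 4)]
result = sum_of_squares(blocks)
print(result)
```

The bounded queue is the design point: it caps the extra memory at a few blocks, while the reader thread keeps the next block ready so the compute loop rarely waits.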
The out-of-core run with HPIO took only half the memory of the same job run in-core, giving it an advantage on machines with high load or less memory.
To date, all of the tuning work for the SX Series has been conducted on the SX-4. The performance figures below are for the SX-4 and SX-5. No effort has been made to tune specifically for the SX-5. SX-5 single-processor performance ranges from 400 MFLOPS for very small problems (6,000 DOF) to 6,000 MFLOPS for larger problems (5,000,000 DOF). These are the average performance numbers as reported by the hardware performance monitor. Performance increases significantly as the vector length of the problem increases. The best performance is obtained during the factorization phase of the problem, where most of the work is done.
The tables below give peak performance numbers in MFLOPS for 2D, Low Aspect Ratio 3D, and 3D problems in the BCSLIB-EXT test suite. It is important to note that while the solve performance is much lower than the factorization performance, the solve phase represents only a small fraction of the total solution time, and the number of floating-point operations in it is limited. The figure of 7.4 GFLOPS for the factor operation represents 94% of peak performance on the SX-5, a computational rate that is only possible with the uniquely high memory bandwidth of the SX Series systems.

Table 1. Single Processor Sparse Direct Solver Performance
| Problem | DOF | Factor (MFLOPS) | Solve (MFLOPS) | Overall (MFLOPS) |
|---|---|---|---|---|
| 2D | 5,000,000 | 6615 | 1711 | 2714 |
| Low Aspect 3D | 1,000,000 | 7183 | 2502 | 5511 |
| 3D | 1,000,000 | 7351 | 2846 | 5681 |

Table 2. Single Processor Lanczos Eigensolver Performance (100 Modes)
| Problem | DOF | Factor (MFLOPS) | Solve (MFLOPS) | Multiply (MFLOPS) | Overall (MFLOPS) |
|---|---|---|---|---|---|
| 2D | 5,000,000 | 6665 | 1650 | 2184 | 3155 |
| Low Aspect 3D | 1,000,000 | 7371 | 2496 | 2496 | 5736 |
| 3D | 1,000,000 | 7431 | 2844 | 2428 | 6074 |
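The remark that a slow solve phase matters little when it carries few floating-point operations can be made concrete with a small calculation. The sketch below assumes that an overall rate is total operations divided by total time, which the article does not state explicitly. The per-phase rates are taken from the 3D row of Table 1, but the 95/5 split of operations between factor and solve is purely hypothetical and is not meant to reproduce the published Overall column, which may include other phases.

```python
def overall_rate(phases):
    """Aggregate rate for a list of (mflop, mflops_rate) phases.

    Assumes overall rate = total operations / total time, so phases
    combine as an operation-weighted harmonic mean of their rates.
    """
    total_ops = sum(ops for ops, _ in phases)
    total_time = sum(ops / rate for ops, rate in phases)  # seconds
    return total_ops / total_time


def time_share(phases):
    """Fraction of total time spent in each phase, in input order."""
    times = [ops / rate for ops, rate in phases]
    total = sum(times)
    return [t / total for t in times]


# Rates from Table 1 (3D row); the operation counts are hypothetical.
phases = [
    (95.0, 7351.0),  # factor: assumed 95% of the operations
    (5.0, 2846.0),   # solve: assumed 5% of the operations
]

rate = overall_rate(phases)
shares = time_share(phases)
print(f"overall: {rate:.0f} MFLOPS")
print(f"time in factor: {shares[0]:.1%}, time in solve: {shares[1]:.1%}")
```

Under this model the overall rate always lies between the slowest and fastest phase rates, and it moves toward the factor rate as the factorization takes a larger share of the operations.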
Ad Emmen