NEC on top of STREAM memory bandwidth benchmark

The Woodlands, 07 Jan 00. With the SX-5, NEC has taken first position in the STREAM benchmark, which measures sustainable memory bandwidth for computational operations on shared memory systems.

There are a number of important classes of applications whose data sets regularly exceed the ability of memory acceleration techniques, such as caches, to provide the memory response needed to maintain processing speed. Examples of memory-intensive applications include those dominated by the processing of sparse matrix systems as well as many applications used in industrial environments.

Sparse systems typically do not fit in cache memories and are therefore difficult to solve at high speed: the cache becomes ineffective, and processing speed approaches the performance of main memory rather than that of the CPU. Solution speed is simply limited by the ability of memory to deliver operands. The STREAM benchmark measures this performance for a number of operations.
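The operations in question are the four STREAM kernels named in the result tables: Copy, Scale, Add and Triad. The following is a minimal sketch of how they are defined and how a bandwidth figure is derived from them, not the official benchmark code. It assumes 8-byte words and counts two arrays touched per iteration for Copy and Scale and three for Add and Triad, with 1 MB = 10^6 bytes. The function names (`bytes_moved`, `mb_per_second`, `run_kernels`) are illustrative only, and because the loops run in interpreted Python the rates printed reflect interpreter overhead rather than hardware memory bandwidth.

```python
import time

BYTES_PER_WORD = 8  # assumption: 64-bit floating-point words

# Arrays read plus arrays written per loop iteration, for each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def bytes_moved(kernel, n):
    """Bytes counted for one pass of `kernel` over arrays of length n."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n


def mb_per_second(kernel, n, seconds):
    """Bandwidth in MB/s, with 1 MB = 10**6 bytes."""
    return bytes_moved(kernel, n) / seconds / 1e6


def run_kernels(n=100_000, q=3.0):
    """Run each kernel once; return ({kernel: MB/s}, (a, b, c))."""
    a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
    rates = {}

    def record(kernel, start):
        # Guard against a zero timer reading on very small arrays.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates[kernel] = mb_per_second(kernel, n, elapsed)

    start = time.perf_counter()
    for i in range(n):              # Copy:  c = a
        c[i] = a[i]
    record("Copy", start)

    start = time.perf_counter()
    for i in range(n):              # Scale: b = q * c
        b[i] = q * c[i]
    record("Scale", start)

    start = time.perf_counter()
    for i in range(n):              # Add:   c = a + b
        c[i] = a[i] + b[i]
    record("Add", start)

    start = time.perf_counter()
    for i in range(n):              # Triad: a = b + q * c
        a[i] = b[i] + q * c[i]
    record("Triad", start)

    return rates, (a, b, c)


if __name__ == "__main__":
    rates, _ = run_kernels()
    for kernel, rate in rates.items():
        print(f"{kernel:6s} {rate:10.1f} MB/s")
```

None of the kernels reuses an operand within a pass, so a cache gives no help once the arrays outgrow it. That is why the measured rate tracks main memory rather than the CPU.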

High memory bandwidth has traditionally been the province of shared memory parallel vector systems (PVP). PVP systems essentially replace cache with a high performance main memory. The result is that extremely large numeric systems, whether dense or sparse, regular or irregular, can be solved at very high computational rates. Further, PVP CPUs are matched to the performance of main memory and provide significantly higher sustained performance on these difficult classes of applications as compared to other architectures that are fully dependent on memory acceleration techniques.

The STREAM results for the NEC SX-5/16A reported by NEC are shown below. In keeping with the metrics used by STREAM, the numbers reported are in MB (10^6 bytes) per second of sustainable memory bandwidth while processing specific kernels.

                                  
             Computational Operation  
                           
CPUs   Copy     Scale          Add       Triad      
SX-5/16A      
16      607492    590390    607412     583069                         
8       332551    332551    371160     366690                         
4       168486    168509    189555     189517                        
2       84853      84853     95352      95328                         
1       42545      42546     47780      47779     
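How well the bandwidth scales with CPU count can be read off the SX-5/16A figures with a little arithmetic. The sketch below takes the Triad column as reported and computes the speedup over one CPU and the resulting parallel efficiency. The helper name `scaling` is illustrative and not part of STREAM.

```python
# Triad results in MB/s, copied from the SX-5/16A table.
TRIAD_MB_S = {1: 47779, 2: 95328, 4: 189517, 8: 366690, 16: 583069}


def scaling(results):
    """Map CPU count to (speedup over 1 CPU, parallel efficiency)."""
    base = results[1]
    return {
        cpus: (rate / base, rate / (base * cpus))
        for cpus, rate in sorted(results.items())
    }


for cpus, (speedup, efficiency) in scaling(TRIAD_MB_S).items():
    print(f"{cpus:2d} CPUs: speedup {speedup:5.2f}, efficiency {efficiency:.0%}")
```

On these figures Triad bandwidth scales almost linearly up to 4 CPUs, stays at about 96% efficiency on 8, and falls to about 76% on 16.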

The other "top 10" machines in this benchmark category perform as follows:

Machine                CPUs      Copy     Scale       Add     Triad
Cray_T932_321024-3E      32  310721.0  302182.0  359841.0  359270.0
Cray_C90                 16  105497.0  104656.0  101736.0  103812.0
Cray_Y-MP                 8   19291.6   19294.2   26588.9   26802.2
SGI_Origin_2000-300     128   23846.0   23437.0   26365.0   26729.0
SGI_Origin_2000_195     128   21857.6   23351.7   24459.5   22913.6
NEC_SX3-44                1   16941.0   15640.7   22436.5   21972.2
Cray_J932                32   19007.0   18944.1   19993.9   18870.4
Cray_T94                  1   11341.0   10717.0   14783.0   13920.0

Check in at the STREAM web site for more information: www.cs.virginia.edu/stream

 


Ad Emmen
