Hitachi's pioneering hybrid approach to open new perspectives in high performance processing

Mannheim 09 Jun 2000 Dr. Matthias Brehm from the Leibniz Computing Center in Munich offered the Supercomputer 2000 participants an overview of the features and qualities of the centre's recently installed Hitachi SR8000-F1, a 112-node system with a peak performance of 1.3 Teraflops. This machine is probably the fastest computer in Europe at the moment. The innovative architectural concepts and the advanced configuration allow the Leibniz researchers to automatically pseudo-vectorise or parallelise typical applications, in order to produce well-performing code. Dr. Brehm assessed the trade-offs for the use and combination of the different levels of parallelism, such as pseudo-vectorisation, shared memory with threading, and distributed memory with message passing (MPI).

In 1995, plans were made to transform the Leibniz Rechenzentrum into a Supercomputing Centre. Private funding and a positive evaluation report made this dream come true when in 1998 the Bayerische offer was accepted officially. This year, the High Performance Computing Group at Leibniz Rechenzentrum in Munich became operational, equipped with the Hitachi SR8000-F1, the "Speedy Gonzalez" among the supercomputers based in Europe. By 2002, the Leibniz Supercomputing Division will be ready to reach its highest capacity.

Each Hitachi SR8000-F1 node is an 8-way RISC-based SMP with a peak performance of 12 Gflops. To utilise the full memory bandwidth and obtain a significant fraction of peak performance for the most memory-intensive applications, the compilers support specific preload and prefetch optimisation strategies that pipeline the load and store operations, known as pseudo-vectorisation. Also perfectly operational, according to Dr. Brehm, is the automatic parallelisation across the 8 processors contained in every node, which is called COMPAS or COoperative MicroProcessors in single Address Space. The nodes of the Hitachi SR8000-F1 are connected by a conflict-free crossbar, which enables efficient communication via standard message-passing interfaces.
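The quoted figures can be cross-checked with a little arithmetic. The following is only a minimal sketch in C; the function names are hypothetical and serve purely as illustration, while the numbers fed into them (112 nodes, 12 Gflops per node, 8 processors per node) come from the talk as reported here.

```c
#include <assert.h>

/* Aggregate peak of a machine built from identical nodes, in Gflops.
   Hypothetical helper, not part of any Hitachi tool. */
double system_peak_gflops(int nodes, double node_peak_gflops) {
    assert(nodes > 0 && node_peak_gflops > 0.0);
    return nodes * node_peak_gflops;
}

/* Peak of a single processor inside an SMP node, in Gflops. */
double cpu_peak_gflops(double node_peak_gflops, int cpus_per_node) {
    assert(cpus_per_node > 0);
    return node_peak_gflops / cpus_per_node;
}
```

With the figures above, system_peak_gflops(112, 12.0) gives 1344 Gflops, which rounds to the quoted 1.3 Teraflops, and cpu_peak_gflops(12.0, 8) gives 1.5 Gflops per processor.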

The Leibniz system represents the best of both worlds, combining the many advantages of two different architectures, vectorisation and parallelisation. At the processor level, the machine is able to execute high performance processing via Pseudo Vector processing to minimise performance degradation for out-of-cache data. At the intra-node level, COMPAS allows loop iterations to be distributed among the processors of a node, the so-called DO loop vectorisation, to offer performance similar to a vector machine for easy migration. Message Passing Interface enables intra-node parallelisation, thus providing compatibility with MPP machines. At the inter-node level, large-scale parallel processing is realised by applying MPI, PVM, and HPF.
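The intra-node level, where loop iterations are shared among the processors of one node, can be pictured with a short C sketch. It is an illustration under stated assumptions and not Hitachi's COMPAS implementation: an OpenMP directive stands in for the compiler's automatic parallelisation, and the helper block_range, a hypothetical name, shows one plausible contiguous split of the iterations. How COMPAS actually divides a loop is not described in the talk.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i], with the iterations shared among the threads of one node.
   Assumption: OpenMP stands in for COMPAS. Built without OpenMP support,
   the pragma is ignored and the loop simply runs serially. */
void daxpy_node(size_t n, double a, const double *x, double *y) {
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)
        y[i] += a * x[i];
}

/* Half-open range [begin, end) of iterations that worker `rank` would own
   under a contiguous block split among `workers` processors. The first
   n % workers workers receive one extra iteration. */
void block_range(size_t n, unsigned workers, unsigned rank,
                 size_t *begin, size_t *end) {
    assert(workers > 0 && rank < workers);
    size_t base = n / workers;
    size_t extra = n % workers;
    *begin = rank * base + (rank < extra ? rank : extra);
    *end = *begin + base + (rank < extra ? 1 : 0);
}
```

For a loop of 10 iterations on an 8-way node, block_range assigns iterations 0 and 1 to worker 0 and iteration 9 to worker 7. Between nodes the same loop would additionally need message passing, which this sketch leaves out.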

Dr. Brehm also explained the behaviour of the two different embodiments of pseudo-vectorisation: prefetching, which fetches data from consecutive areas or lines of the main memory in order to place them into the cache, and preloading, which loads data directly from the main memory into floating-point registers without stalling subsequent instructions. The hybrid approach offered by the SR8000-F1 unfortunately still has a few disadvantages. Dr. Brehm admitted that it is no longer so easy to reuse code and to optimise the parameters for two or more programming paradigms. In addition, MPI is not yet threaded and also enforces data locality. Equally required will be the introduction of data alignment for OpenMP, in order to avoid false sharing.
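The prefetch variant can be imitated by hand on an ordinary processor, which may help to picture what the Hitachi compilers are said to do automatically. The sketch below is an assumption-laden illustration in C: it relies on the GCC/Clang extension __builtin_prefetch, and both the function name dot_prefetch and the distance of 64 elements are hypothetical choices, not values from the talk. The preload variant, which targets floating-point registers directly, is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* How many elements ahead to request; a hypothetical tuning value. */
#define PREFETCH_DISTANCE 64

/* Dot product that asks for data a fixed distance ahead of its use, so the
   memory access overlaps with the arithmetic on the current elements. */
double dot_prefetch(size_t n, const double *x, const double *y) {
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        /* Only a hint: read access (0), low temporal locality (0).
           The bounds check keeps the address expression inside the arrays. */
        if (i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&x[i + PREFETCH_DISTANCE], 0, 0);
            __builtin_prefetch(&y[i + PREFETCH_DISTANCE], 0, 0);
        }
#endif
        sum += x[i] * y[i];
    }
    return sum;
}
```

The result is identical with or without the hints; only the timing on large, out-of-cache arrays may differ, and a suitable distance depends on the memory latency of the machine at hand.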

The SR8000-F1 system's numerous advantages, though, serve as an important counterweight. In this regard, Dr. Brehm mentioned the easy gain of performance by auto-parallelisation, COMPAS, and OpenMP directives; the great flexibility to write libraries; the possibility of eliminating redundant data and minimising MPI traffic to save memory and enhance cache bandwidth; the higher scaling potential; and the ease of debugging and analysing.


Leslie Versweyveld
