

Developments in Japan

Munich, 18-11-1996. The supercomputer situation in Japan is characterized by the installation of many supercomputers of the new CMOS-based generation. The market distribution in Japan has become somewhat more balanced than in 1995, when Fujitsu led by a wide margin. In terms of performance, Hitachi (774.5 GFlop/s) is now second behind Fujitsu (910.5 GFlop/s), while in terms of number of sites NEC (15) is second behind Fujitsu (21). In summary, Japan has strengthened its position as the world's second largest user of supercomputers. The three most powerful systems in the world are installed in Japan!

The supercomputer situation in Japan is characterized by the installation of many supercomputers of the new CMOS-based generation. Fujitsu entered the list with new VPP300 and VPP700 installations, while NEC continued to install more SX-4 systems. Finally, Hitachi succeeded in delivering the currently most powerful system in the world, the CP-PACS computer, to the University of Tsukuba. As with the former leader NWT, the system now leading the list was developed in a collaboration between the computer industry and a research institute. With these new systems, Fujitsu, Hitachi and NEC have increased their competitiveness by combining advanced CMOS technology with an attractive price/performance ratio. This has also helped them win several procurements outside Japan.

The market distribution in Japan has become somewhat more balanced than in 1995, when Fujitsu led by a wide margin. In terms of performance, Hitachi (774.5 GFlop/s) is now second behind Fujitsu (910.5 GFlop/s), while in terms of number of sites NEC (15) is second behind Fujitsu (21). IBM also progressed well, increasing its number of sites from 8 to 13. Cray kept its 10 sites but lost part of its performance share relative to the other vendors. As before, SGI's success in Japan is not visible in the TOP500: only 3 Japanese SGI sites entered the list, because most SGI systems are smaller in size.

Japan increased its share of the TOP500 sites from 73 to 80 entries, which corresponds to 16% of the list. Traditionally, Japanese supercomputer sites are on average more powerful than sites in other countries; consequently, 20 Japanese sites are listed in the world-wide TOP50. Japan's share of the world-wide installed Rmax capacity increased even more: the aggregate performance of the Japanese TOP500 sites more than doubled, from 1,234 GFlop/s to 2,508 GFlop/s, which corresponds to 31.4% of the TOP500 total.

In summary, Japan has strengthened its position as the world's second largest user of supercomputers. The three most powerful systems in the world are installed in Japan!

Background

In past years the Japanese supercomputer market [1, 2] was dominated by mono- and multiprocessor vector computers manufactured by the big Japanese vendors Fujitsu, Hitachi and NEC. Fujitsu and Hitachi started delivering the first vector computers in Japan in the early 80's, and NEC joined them a few years later. The companies steadily improved the performance of their monoprocessors before delivering multiprocessor systems at the end of the 80's. In the early 90's all three vendors again improved the performance and scalability of their systems while investigating new architectures in collaboration with research institutes or laboratories. Fujitsu, together with the National Aerospace Lab, developed the Numerical Wind Tunnel (NWT) as the prototype of the VPP series. Hitachi, together with the University of Tsukuba, developed the CP-PACS system, which can be seen as a prototype of the SR2201.

The acceptance of MPP systems started slowly in Japan. Some customers bought various MPP systems from American vendors, which could be considered an evaluation phase. Acceptance of distributed-memory systems began to grow after the NWT demonstrated unprecedented performance while preserving the vector "culture". The success of the Hitachi MPP system will also contribute to the broader use of this architecture.

All three vendors are also marketing their systems outside Japan with remarkable success. Fujitsu's biggest success was the contract with the ECMWF in Reading to deliver a VPP700 system. Hitachi won a first contract for its SR2201 system from Cambridge University, UK. NEC won several major SX-4 contracts in Europe, but public attention was drawn to the NCAR project in the USA.

The CP-PACS Project

The CP-PACS project [3, 4] formally started in 1992 with funding of 1.5 billion Yen spread over a five-year period. The name CP-PACS stands for Computational Physics - Parallel Array Computer System. It was chosen to reflect the two phases of the project: development of a massively parallel computer optimized for physics problems describable in terms of space-time fields, and subsequent research with it in several key areas of computational physics, with primary emphasis on lattice QCD. At the start of the project, the Center for Computational Physics was founded at the University of Tsukuba to serve as a base for a collaborative effort between physicists and computer scientists to develop the CP-PACS computer and use it for research in computational physics.

Through a formal bidding process in summer 1992, Hitachi Ltd. was selected to manufacture the CP-PACS computer. Since then, the Center for Computational Physics and Hitachi Ltd. have been working in close collaboration on both the hardware and software development of the CP-PACS computer. The fundamental design of the computer was laid down in 1992 and its details worked out in 1993; the logical design and the physical packaging design were completed in 1994. Chip fabrication and assembly of parts started in early 1995, and the CP-PACS with 1024 processors and a peak speed of more than 300 GFlop/s was completed in March 1996. In fall 1996 the configuration was doubled to 2048 processors, 128 GB of memory and a peak of more than 600 GFlop/s.

The CP-PACS computer is an MIMD system with distributed memory. Each processor has a peak performance of 300 MFlop/s. The processor design is based on the HP PA-RISC 1.1 architecture. To achieve better efficiency for applications that perform vector operations intensively, the PVP-SW feature has been added to the processor design. PVP-SW stands for "pseudo vector processor based on slide-windowed registers". Each processor is equipped with 128 physical floating-point registers, while the 32 logical registers are split into g global registers and 32-g local registers. The local registers can slide, by means of a window, along the physical registers. While carrying out computations using the registers of a specific window position, the processor can issue preload instructions, which fetch data from memory into registers in any forward window, and poststore instructions, which store data from any previous window to memory. With a proper choice of windows for calculation and for memory load/store, one can ensure that data already reside in registers when the window is shifted to the position used for calculation, thereby effectively hiding the memory latency.
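The preload idea can be made concrete with a small timing model. The Python sketch below is only an illustration under stated assumptions. The logical-to-physical mapping in `physical_index` is hypothetical: global registers are fixed, and local registers are offset by a wrapping window base. The cycle model in `count_stalls` (one preload per iteration, fixed latency, fixed compute time per window) is also an assumption and is not Hitachi's documented hardware behaviour. The sketch shows the effect described above: if data are preloaded far enough ahead, only the very first memory access has to wait.

```python
# Hypothetical model of PVP-SW slide-windowed registers (illustration only).
NUM_PHYSICAL = 128  # physical floating-point registers per processor (from the text)
NUM_LOGICAL = 32    # logical registers addressed by instructions (from the text)


def physical_index(logical: int, window_base: int, g: int) -> int:
    """Map a logical register to a physical one under an ASSUMED scheme.

    Logical registers 0..g-1 are global and always map to the same physical
    register. The other 32-g local registers are offset by the window base
    and wrap around the physical registers left over after the globals.
    """
    if not 0 <= logical < NUM_LOGICAL:
        raise ValueError("logical register out of range")
    if logical < g:
        return logical
    pool = NUM_PHYSICAL - g
    return g + (window_base + logical - g) % pool


def count_stalls(n_iters: int, latency: int, distance: int, compute_cycles: int) -> int:
    """Cycles a loop waits for memory when each iteration computes in one
    window and preloads the data for the window `distance` steps ahead.

    Simplified timing model (an assumption): one preload per iteration,
    fixed memory latency, fixed compute time per window.
    """
    if distance < 1:
        raise ValueError("preload distance must be at least one window")
    ready = {}
    # Prologue: preloads for the first `distance` windows are issued at t = 0.
    for i in range(min(distance, n_iters)):
        ready[i] = latency
    t = 0
    stalls = 0
    for i in range(n_iters):
        if ready[i] > t:              # data for this window not yet in registers
            stalls += ready[i] - t
            t = ready[i]
        ahead = i + distance
        if ahead < n_iters:           # preload into a forward window
            ready[ahead] = t + latency
        t += compute_cycles           # compute on the current window
    return stalls


print(physical_index(31, window_base=100, g=8))                     # 11
print(count_stalls(100, latency=20, distance=1, compute_cycles=5))  # 1505
print(count_stalls(100, latency=20, distance=4, compute_cycles=5))  # 20
```

With a preload distance of one window, every iteration after the first waits 15 of the 20 latency cycles. Once the preload distance times the compute cycles per window reaches the latency (here 4 x 5 = 20), only the initial 20-cycle wait remains.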

Other important characteristics of the processor are a clock frequency of 150 MHz, a first-level cache with 16 KB for instructions and 16 KB for data, and a second-level cache with a capacity of 2x512 KB. Each processor is connected to a local DRAM memory of 64 MB, organized as multiple interleaved banks with pipelined access.
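The per-processor figures can be cross-checked against the system totals quoted earlier for 1024 and 2048 processors. This small sketch rests on a few assumptions. It takes aggregate peak to be processors x 300 MFlop/s, in decimal GFlop/s. It takes memory totals to use 1 GB = 1024 MB. The figure of 2 floating-point operations per clock is only implied by 300 MFlop/s at 150 MHz and is not stated in the text.

```python
CLOCK_MHZ = 150            # processor clock (from the text)
PEAK_PER_PE_MFLOPS = 300   # peak per processor (from the text)
MEMORY_PER_PE_MB = 64      # local DRAM per processor (from the text)

# Implied floating-point operations per clock cycle (not stated in the text).
FLOPS_PER_CYCLE = PEAK_PER_PE_MFLOPS / CLOCK_MHZ


def peak_gflops(n_pe: int) -> float:
    """Aggregate peak in GFlop/s, assuming peak scales linearly with processors."""
    return n_pe * PEAK_PER_PE_MFLOPS / 1000


def memory_gb(n_pe: int) -> float:
    """Aggregate memory in GB, assuming 1 GB = 1024 MB."""
    return n_pe * MEMORY_PER_PE_MB / 1024


for n_pe in (1024, 2048):
    print(f"{n_pe} processors: {peak_gflops(n_pe):.1f} GFlop/s peak, "
          f"{memory_gb(n_pe):.0f} GB memory")
```

This gives 307.2 GFlop/s for 1024 processors and 614.4 GFlop/s with 128 GB for 2048 processors. These values agree with the "more than 300 GFlop/s" and "more than 600 GFlop/s, 128 GB" figures given for the two configurations.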

The processors are connected via a three-dimensional crossbar network. Crossbar switches are placed in the x, y and z directions, and the crossbars of different directions are connected at each crossing point by a router, which is itself a 4x4 crossbar. The maximum configuration of 2048 processors is arranged as a three-dimensional 8x16x16 array. Including the connections to the IOUs (Input Output Units), the crossbar network has a size of 8x17x16. The bandwidth of the crossbar network is 300 MB/s, with a latency of 3 µs.
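In a crossbar network of this kind, a message can fix one coordinate per crossbar pass, so any two processors are at most three crossbar passes apart. The sketch below assumes dimension-order routing (x, then y, then z). It also assumes a simple latency-plus-bandwidth cost model built from the quoted 3 µs and 300 MB/s, with MB taken as 10^6 bytes. Both are simplifications for illustration and do not describe the actual CP-PACS router.

```python
import math
from typing import List, Tuple

Coord = Tuple[int, int, int]

PROCESSOR_ARRAY = (8, 16, 16)  # x, y, z extent of the 2048 processors (from the text)
NETWORK_ARRAY = (8, 17, 16)    # including the IOU plane (from the text)
LATENCY_US = 3.0               # quoted network latency in microseconds
BANDWIDTH_MB_S = 300.0         # quoted bandwidth; 1 MB/s = 1 byte per microsecond


def route(src: Coord, dst: Coord) -> List[Coord]:
    """Nodes visited under ASSUMED dimension-order routing (x, then y, then z).

    Each step crosses the crossbar of one dimension and fixes that coordinate,
    so a route needs at most three crossbar passes regardless of distance.
    """
    path: List[Coord] = []
    cur = list(src)
    for dim in range(3):
        if cur[dim] != dst[dim]:
            cur[dim] = dst[dim]
            path.append((cur[0], cur[1], cur[2]))
    return path


def transfer_time_us(nbytes: int) -> float:
    """Simple latency-plus-bandwidth estimate for one message (an assumption)."""
    return LATENCY_US + nbytes / BANDWIDTH_MB_S


print(math.prod(PROCESSOR_ARRAY), math.prod(NETWORK_ARRAY))  # 2048 2176
print(route((0, 0, 0), (7, 15, 3)))  # [(7, 0, 0), (7, 15, 0), (7, 15, 3)]
print(transfer_time_us(3000))        # 13.0
```

In this model, short messages are dominated by the 3 µs latency. A 3000-byte message spends 10 µs on the wire and 3 µs in latency.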

Each processor runs a UNIX microkernel. The CP-PACS computer is controlled by a front-end computer that also schedules jobs and acts as a file server. The CP-PACS can be programmed in Fortran, C and assembly language.

The highest LINPACK performance reported so far was measured on the CP-PACS/2048 system, with Rmax = 368.2 GFlop/s. This performance was achieved by solving a system of 103,680 linear equations; half of this performance was already reached for a system of 30,720 equations.
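The LINPACK figures can be put into perspective with a few estimates. This sketch uses the operation count 2n^3/3 + 2n^2 that is conventionally credited to a LINPACK solve; codes differ slightly in the lower-order term. It assumes the whole run proceeds at Rmax, and at half of Rmax for n = 30,720. It takes Rpeak to be 2048 x 300 MFlop/s. The resulting run times are rough estimates, not measured values.

```python
N_MAX = 103_680        # order of the largest system solved (from the text)
N_HALF = 30_720        # order at which half of Rmax was reached (from the text)
RMAX_GFLOPS = 368.2    # measured LINPACK performance (from the text)
RPEAK_GFLOPS = 2048 * 300 / 1000   # 2048 processors x 300 MFlop/s


def linpack_flops(n: int) -> float:
    """Operation count 2n^3/3 + 2n^2 conventionally credited to a LINPACK solve."""
    return 2 * n**3 / 3 + 2 * n**2


def matrix_bytes(n: int) -> int:
    """Storage for a dense n x n matrix of 64-bit floating-point numbers."""
    return 8 * n * n


runtime_s = linpack_flops(N_MAX) / (RMAX_GFLOPS * 1e9)
efficiency = RMAX_GFLOPS / RPEAK_GFLOPS
# Run time at N_HALF, assuming it ran at exactly half of Rmax.
runtime_half_s = linpack_flops(N_HALF) / (RMAX_GFLOPS / 2 * 1e9)

print(f"n = {N_MAX}: about {runtime_s / 60:.0f} minutes")
print(f"matrix storage: {matrix_bytes(N_MAX) / 2**30:.1f} GiB")
print(f"Rmax / Rpeak = {efficiency:.1%}")
print(f"n = {N_HALF}: about {runtime_half_s:.0f} seconds")
```

Under these assumptions, the record run takes roughly half an hour and reaches about 60% of peak. The 103,680 x 103,680 matrix needs about 80 GiB, which fits in the 128 GB of the 2048-processor system. The n = 30,720 case finishes in under two minutes.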

Current commercial offerings

The three Japanese supercomputer vendors have chosen different architectures to increase scalability.

Fujitsu is continuing with the VPP architecture that was first implemented in the NWT. The current offering ranges from the departmental VX system with up to 4 processors, through the VPP300 with up to 16 processors, to the high-end VPP700 with up to 256 processors. All systems are based on CMOS technology and use the same processing element (PE), with a peak performance of 2.2 GFlop/s. The PEs are connected via a crossbar network, and each has its own SDRAM memory.

NEC continues to build traditional PVP systems with its SX-4, which is also based on CMOS technology. One node can have up to 32 vector processors with a peak performance of 2 GFlop/s each, connected to a fast shared SSRAM memory. Larger configurations are planned by coupling several nodes together. NEC also offers compact models with a limited number of processors for departmental use.
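The quoted per-processor peaks translate directly into the theoretical peak of each maximum configuration. The sketch below uses only the processor counts and per-processor peaks given above. It assumes peak scales linearly with processor count. For the SX-4 it covers a single node, since multi-node configurations are only planned.

```python
# (processors, peak GFlop/s per processor) for the largest configurations quoted above
CONFIGS = {
    "Fujitsu VX": (4, 2.2),
    "Fujitsu VPP300": (16, 2.2),
    "Fujitsu VPP700": (256, 2.2),
    "NEC SX-4 (single node)": (32, 2.0),
}


def config_peak(name: str) -> float:
    """Theoretical peak in GFlop/s: processors x per-processor peak."""
    procs, per_proc = CONFIGS[name]
    return procs * per_proc


for name in CONFIGS:
    print(f"{name:24s} {config_peak(name):7.1f} GFlop/s")
```

These are upper bounds. Sustained application or LINPACK performance is lower.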

Hitachi opted for a typical MPP design. Microprocessors based on the PA-RISC design, enhanced with "pseudo vector" processing capabilities, are connected via a three-dimensional crossbar network. This architecture was tested in a joint project with Tsukuba University: in the CP-PACS system, which is used for QCD calculations, 2048 processors are coupled together and set a new performance record. The biggest commercial system with this architecture is the SR2201 with 1024 processors at Tokyo University. Hitachi has also started marketing the SR2201 series outside Japan, with a first sale in the UK. It is not known whether Hitachi will also continue its PVP line, the S-3800. If market acceptance of the SR2201 is large enough, in particular among traditional vector computer users, Hitachi may concentrate on the SR architecture alone.

Procurements

In fiscal year '95 (ending March '96), several systems of the new CMOS generation were ordered in the government market. The contract for a Hitachi SR2201 system at Tokyo University drew a lot of attention, since this was the first time that one of the computing centers of the 7 major universities in Japan decided to replace a classical mainframe computer with an MPP system. The greatest variety of systems can be seen at the Japan Atomic Energy Research Institute (JAERI), which procured the following systems: Cray T90, Fujitsu VPP300, Hitachi SR2201, IBM SP2, Intel XP and NEC SX-4. Fujitsu is acting as the system integrator for these systems. Further SX-4 systems were procured by the National Research Institute for Metals, the Japan Marine Science and Technology Center, the National Cardiovascular Center and the Geographical Survey Institute. The National Astronomical Observatory of Japan ordered one of the first VPP300 systems together with several departmental VX systems. Another VPP300 system was ordered by the Power Reactor and Nuclear Fuel Development Corporation.

Several procurement decisions for fiscal year '96 have already been made. Kyushu University ordered and installed a Fujitsu VPP700 system. The National Astronomical Observatory of Japan ordered a VPP700 together with a Fujitsu AP3000, an MPP system based on UltraSPARC processors; these systems will be installed in 1997. A VPP300 system has been ordered by the Japan Science and Technology Corporation. Osaka University installed an NEC SX-4 complex. NEC also won contracts from the National Aerospace Laboratory and the National Institute for Environmental Study. Cray Research obtained contracts from Kyoto University, the National Research Institute for Earth Science and Disaster Prevention, and the Real World Computing Partnership; details have not yet been disclosed. This overview shows that, in a competitive supercomputer market, systems are procured from a variety of vendors. The fierce competition is weakening customers' traditional loyalty to their established computer supplier.

Current market situation

80 supercomputers in Japan entered the TOP500 list. This represents a 16% share of the 500 entries, up from 73 systems one year ago. The accumulated Rmax performance of these 80 systems reaches 2.5 TFlop/s, which represents 31.4% of the accumulated Rmax performance of the TOP500. These figures show that the big Japanese supercomputer installations in particular have, on average, a significantly higher performance than sites in other countries. Table 1 lists the number of systems and the accumulated Rmax performance for each vendor.

Vendor    Sites  Rmax (GFlop/s)
Convex        1            4.80
Cray         10          124.20
Fujitsu      21          910.46
Hitachi      12          774.50
IBM          13          115.08
Intel         3          121.10
NEC          15          426.75
Parsytec      1            5.25
SGI           3           17.96
TMC           1            7.70
Total        80        2,507.79

Table 1:  Distribution of systems to different vendors.

The market leader in Japan is still Fujitsu, with 26.3% of the sites and 36.3% of the accumulated Rmax performance of the TOP500 sites in Japan. However, its lead over Hitachi and NEC has narrowed significantly. Hitachi made the biggest step forward in accumulated Rmax performance, growing from 157.7 to 774.5 GFlop/s, which gives it 30.9% of the Japanese total and a solid number 2 position. This is essentially due to the two big sites at Tsukuba and Tokyo, which alone account for 588.6 GFlop/s, i.e. about 23.5%.
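The shares quoted in this paragraph can be recomputed from Table 1. The sketch below takes the row values and the printed total of 2,507.79 GFlop/s from the table. The combined 588.6 GFlop/s of the Tsukuba and Tokyo systems is taken from the text. The variable names are illustrative only.

```python
# Table 1 figures: vendor -> (sites, Rmax in GFlop/s)
TABLE_1 = {
    "Convex": (1, 4.80), "Cray": (10, 124.20), "Fujitsu": (21, 910.46),
    "Hitachi": (12, 774.50), "IBM": (13, 115.08), "Intel": (3, 121.10),
    "NEC": (15, 426.75), "Parsytec": (1, 5.25), "SGI": (3, 17.96),
    "TMC": (1, 7.70),
}
TOTAL_SITES = 80
TOTAL_RMAX = 2507.79   # total as printed in Table 1


def shares(vendor: str) -> tuple:
    """Percentage of Japanese TOP500 sites and of accumulated Rmax for one vendor."""
    sites, rmax = TABLE_1[vendor]
    return 100 * sites / TOTAL_SITES, 100 * rmax / TOTAL_RMAX


# The rows sum to the printed totals (up to rounding of the individual rows).
print(sum(s for s, _ in TABLE_1.values()), round(sum(r for _, r in TABLE_1.values()), 2))

for vendor in ("Fujitsu", "Hitachi", "NEC"):
    site_pct, rmax_pct = shares(vendor)
    print(f"{vendor:8s} {site_pct:6.2f}% of sites {rmax_pct:6.2f}% of Rmax")

# Tsukuba (CP-PACS) and Tokyo (SR2201) together: 588.6 GFlop/s as quoted in the text
print(f"two largest Hitachi sites: {100 * 588.6 / TOTAL_RMAX:.1f}% of Rmax")
```

The same `shares` logic applies to the architecture and application-area breakdowns once their rows are entered in the same form.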

Now we want to look at the distribution into MPP, PVP and SMP systems (see table 2). Last year we discussed whether the VPP500 should be considered an MPP, since this system dominated last year's Japanese list. This year MPP systems from several vendors dominate the list: Fujitsu and Hitachi systems in terms of performance, and IBM systems in terms of number of sites. 47 of the 80 systems, i.e. 58.8%, can be counted as MPP. These systems account for 1,813.4 GFlop/s, i.e. 72.3% of the performance of the Japanese TOP500 sites. The traditional PVP systems reduced their share: 30 systems, i.e. 37.5%, account for 676.4 GFlop/s. New PVP systems came only from CRAY and NEC. Fujitsu left the PVP camp to concentrate on its VPP series, and Hitachi also moved to the MPP camp with the SR series. The only SMP vendor in the Japanese list is SGI, contributing 3 systems and 18 GFlop/s.

Type      Sites  Rmax (GFlop/s)
MPP          47         1,813.4
PVP          30           676.4
SMP           3            18.0
Total        80         2,507.8

Table 2:  Distribution of systems to different architectures.

Why has MPP been so successful in Japan, given that Japanese customers favoured traditional vector (PVP) systems for so many years? Fujitsu and Hitachi included vector features in their distributed-memory systems. The Fujitsu VPP series consists of powerful classical vector processors, while the Hitachi SR series and the CP-PACS system contain processors based on the PA-RISC design enhanced with "pseudo vector processing". Evidently, the ability to keep the benefits of vector features convinced many customers and end users to move from PVP to "vector" MPP. These users can incrementally parallelize applications that had been vectorized in the past. On the VPP series we can clearly observe that more parallelized applications are run than one year ago, when many VPP systems were mainly used in throughput mode. The use of message passing for parallelization has increased, although many Japanese VPP users still prefer the compiler-directive-based VPP-Fortran parallelization style.

Another interesting aspect is the distribution of the Japanese TOP500 systems by application area (see table 3). 39 systems are installed at research laboratories and account for 1,105.8 GFlop/s. 28 systems are installed in the academic sector at universities and account for 1,171.1 GFlop/s. There is no classified system on the list. The vendors have reduced the number of their internal systems to 4, contributing 111.2 GFlop/s to the list.

Application area  Sites  Rmax (GFlop/s)
Academic             28         1,171.1
Industry              9           119.7
Research             39         1,105.8
Vendor                4           111.2
Total                80         2,507.8

Table 3:   Distribution of systems to different application areas.

The number of industry sites decreased from 11 to 9, and only 3 of these systems are new on the list. Toyota, a traditional industrial user of supercomputers, installed a new NEC SX-4/20 system in addition to its older systems: NEC SX-3/14, CRAY T94 and Fujitsu VPP500/4. The second new industry entry comes from Nippon Telegraph and Telephone (NTT), which installed one of the first full-blown CRAY T932 systems at the end of last year (shortly after last year's TOP500 deadline). The third new industry entry is a real breakthrough: Kirin Beer installed an IBM SP2/38. To the author's knowledge, this is the first supercomputer used in the food industry. What will be the purpose of that system? Will they try to improve the taste of beer by "molecular modelling" methods? Or do they want to simulate the impact of drinking beer on Japanese businessmen? In reality, the system will be used for data warehouse applications with parallel DB2.

The other industry systems in Japan were installed in former years. Nuclear Power Engineering continues to use an IBM SP2/72. Suzuki Motor still uses its four-year-old Hitachi S-3800 system. Mitsubishi Electric Corporation continues to use one of the few CRAY T3D systems in Japan. Does that mean that Japanese industry is investing less money in supercomputing? This is most likely a misinterpretation. We know of many SGI SMP systems, smaller IBM SP systems and departmental CMOS-based vector systems from Fujitsu and NEC. Supercomputing technology is now affordable for industrial departments, which use it at a performance level below the entry level of the TOP500.

The Japanese TOP20

This year Hitachi took over the number one position. After several years in which Fujitsu's NWT led the list, a prototype-like system is again number one. The CP-PACS with 2048 processors at the Center for Computational Physics at Tsukuba University has set a new record with Rmax = 368.2 GFlop/s. This system is very similar to the Hitachi SR2201 series; Tsukuba University and Hitachi developed it in a joint collaboration between 1992 and 1996, specifically for QCD applications. A commercial version of this system, the SR2201 with 1024 processors, is installed at Tokyo University and ranks third. Second on the list is now the NWT, which has been upgraded from 140 to 167 processors. On rank 4 we find the first system in the TOP20 that was manufactured in the US: an Intel XP/S-MP 125 with 2502 processors installed at the Japan Atomic Energy Research Institute (JAERI). In position 6 is today's biggest VPP700 installation, a system with 56 processing elements that Fujitsu installed at Kyushu University. Positions 5, 10 to 15 and 19 are occupied by Fujitsu VPP500 systems installed in former years. A Fujitsu VPP300/16 at JAERI is listed on rank 20. Positions 7 to 9 are held by three NEC SX-4/32 systems: besides the benchmarking system, two new systems have been installed at Osaka University. These two systems will probably be combined later into a bigger complex, once the necessary hardware and software support is available. On ranks 16 to 18 we find three NEC SX-4/20 systems.

In total we find 11 Fujitsu VPP systems, 6 NEC SX-4 systems, 2 Hitachi SR systems and 1 Intel XP system in the TOP20. The continuing effort to improve supercomputer capacity in Japan can also be seen in the fact that 10 of the TOP20 systems in Japan have been installed or upgraded within the last 12 months.

Conclusions

The Japanese supercomputer manufacturers succeeded in bringing their new CMOS-based supercomputer generation to the market. However, Fujitsu, Hitachi and NEC chose different paths to lead their customers to highly scalable systems. Fujitsu chose powerful vector processors combined with distributed memory and a crossbar network, while NEC continued in the PVP style with shared memory. Hitachi, on the other hand, opted for an MPP system based on a RISC processor enhanced with "pseudo vector" capabilities. Thus all three vendors continue to offer vector processing in some form. A Japanese customer can therefore choose among three different architectures, whichever best fits the application. This combination of continuity and innovation is clearly attractive to Japanese customers and has already proved successful in the world market.

References

[1] Jarp, S. and Bez, W., Supercomputing in Japan, Supercomputing 60/61, volume XI, number 2/3, June 1995, pp. 31-44.

[2] Hoffmann, G. and Schnepf, E., Developments in Japan, Supercomputer 63, volume XII, number 1, January 1996, pp. 23-29.

[3] Iwasaki, Y., Status of the CP-PACS Project, presented at Lattice 96, St. Louis, USA.

[4] Ukawa, A., Status of the CP-PACS Project, presented at Lattice 94.


Eric Schnepf
Siemens Nixdorf, Scientific Computing
Otto-Hahn-Ring 6
D-81739 Munich
Germany
Email: Eric.Schnepf@mch.sni.de


© The HOISe-NM Consortium 1996