Paderborn-Siemens-Scali: the right mix to develop a new European supercomputer

Paderborn 04 Feb 99 The new 192-processor parallel machine, which was installed in December 1998 at PC2 in Paderborn, is the result of three parties, each with its own expertise, working closely together: the Paderborn PC2 centre with its expertise in parallel architectures and programming, Scali with its hardware integration expertise, and Siemens with its marketing and support knowledge. Alexander Reinefeld and Jens Simon explained in an exclusive interview with Primeur why this works so well.

Alexander Reinefeld was, as director of the Paderborn Center for Parallel Computing (PC2), responsible for the R&D programme that eventually resulted in this machine, the first supercomputer built in Northern Europe in several years. (He has now moved to Berlin to take up the position of managing director of ZIB, one of the largest German supercomputer centres.)

Jens Simon was responsible for the technical activities concerning the new 192-processor machine and its predecessor. As is to be expected when developing a new MPP supercomputer, there were indeed problems. Fortunately, they all turned out to be solvable.

When developing plans for a new parallel machine, PC2 wanted to collaborate with a computer company. This way they would ensure that their R&D would be put to good use afterwards. Being in Paderborn, it was natural to turn to Siemens. This company has large offices and factories in Paderborn, as a result of the merger with Nixdorf a few years ago.

Siemens is strong in marketing and in product support. This helped with outlining the potential usage and markets for a new machine, and also put some constraints on the future support. Siemens has a computer line of its own, the Celsius workstations and the Primergy servers, so it made sense for a new development to fit into that line. Siemens also has a large customer base.

Having settled that, PC2 and Siemens looked for a company that could actually build the machine. Hardware system integration is not that difficult in principle: you take some processor chips, some interconnect chips and some network cards, put them in one box and off you go! In practice, the story is rather different. Because the machine is so big and has to be very fast, everything is stressed to the limits. Processor instructions are used that you do not normally need. The network cards and processor boards have to work at exactly the same speed. The basic system software should work flawlessly, because if something is wrong, a large machine does not produce one error, but millions and millions of them each second!

At the heart of a parallel machine is the network interconnect. Jens Simon explained that for Paderborn, SCI was the right solution: it is fast and has enough support for parallelism to give a large machine based on it a chance. With SCI you can directly access the memory of a remote node, which helps a great deal in building a parallel programming model for the machine. There is also an IEEE standard for SCI. According to Simon, all this sets SCI apart from competing technologies such as Myrinet or ATM.

So they started looking for a company that could do the job, based on Siemens processor hardware and the SCI network interconnect. In 1997 they learned about Scali, a Norwegian start-up company that built small parallel systems with, at that time, 8 nodes based on Sun Hyper Sparc multiprocessor machines. This was just the right company for the project: they knew the problems of building parallel hardware and, as it happened, were also looking for possibilities to integrate Intel processors, like the ones Siemens is using, into their architecture.

Scali was also practically a neighbour of Dolphin, a company of Norwegian origin that produces SCI interconnect cards. For large parallel systems, a scalable topology of the communication network is needed. Therefore, Scali developed, in cooperation with Dolphin, an extension for the PCI-based SCI card which is used to build up a two-dimensional torus. Here, the torus can be seen as a distributed switch.
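To make the idea of a two-dimensional torus more concrete, here is a minimal sketch in Python. It is an illustration only: the function names and the 4 x 4 grid in the usage note are assumptions, not details from Scali or Dolphin, and the sketch ignores link direction and the routing rules of the real SCI hardware. It shows that every node has four neighbours thanks to the wrap-around links, so no central switch is required.

```python
def torus_neighbours(x, y, width, height):
    """Return the four wrap-around neighbours of node (x, y).

    Hypothetical helper: the modulo arithmetic models the links that
    close each row and each column into a ring.
    """
    return [
        ((x - 1) % width, y),   # left neighbour in the row ring
        ((x + 1) % width, y),   # right neighbour in the row ring
        (x, (y - 1) % height),  # upper neighbour in the column ring
        (x, (y + 1) % height),  # lower neighbour in the column ring
    ]


def torus_hops(a, b, width, height):
    """Shortest hop count between nodes a and b, assuming links can be
    used in both directions (a simplification made for this sketch)."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # In each dimension, going "around the back" may be shorter.
    return min(dx, width - dx) + min(dy, height - dy)
```

In an assumed 4 x 4 grid, for example, `torus_hops((0, 0), (3, 3), 4, 4)` gives 2, because the wrap-around links connect opposite edges directly.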

With the team complete, they set off to build a first 64-processor machine, Intel-based. This gave Scali in particular a lot of problems in the beginning. There were hardware errors in the communication part of the Intel chip. But this was solved after some time.

Then PC2 tested the new 64-processor machine with all they had learned from parallel applications during their years of experience. They threw in all the benchmarks and real programmes they had, with good results. MPI-based programmes ran with 15 microseconds (µsec) latency, and the communication bandwidth was 55 Mbyte/s per node. This compares well with some of the large MPPs that are already on the market. MPI is a standard for writing parallel programmes.
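What the two figures mean for a message of a given size can be estimated with a simple linear model: transfer time equals latency plus message size divided by bandwidth. The sketch below assumes that model (it is a common first approximation, not something PC2 stated) and takes 1 Mbyte as 10^6 bytes; the function names are hypothetical.

```python
def transfer_time(size_bytes, latency_s=15e-6, bandwidth_bytes_per_s=55e6):
    """Estimated time in seconds to send one message.

    Defaults are the figures quoted for the 64-processor machine;
    the linear latency + size/bandwidth model is an assumption.
    """
    return latency_s + size_bytes / bandwidth_bytes_per_s


def effective_bandwidth(size_bytes, latency_s=15e-6, bandwidth_bytes_per_s=55e6):
    """Bytes per second actually achieved for a message of this size."""
    return size_bytes / transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)


if __name__ == "__main__":
    for size in (64, 825, 8192, 1_000_000):
        rate = effective_bandwidth(size) / 1e6
        print(f"{size:>9} bytes: {rate:6.1f} Mbyte/s effective")
```

Under these assumptions, a message of 825 bytes (latency times bandwidth) reaches half of the peak rate; smaller messages are dominated by latency, larger ones approach the full 55 Mbyte/s.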

Time to move to the next stage, a 192-processor supercomputer: take almost 100 powerful multiprocessor "computers" and let them work as one. This mighty beast was built by Siemens and installed at PC2 in Paderborn in December 1998. Siemens factories in Augsburg and Nuremberg delivered components, and the company worked together with Scali on the interconnect. Some of the work was done by subcontractor ICT in Aachen.

The new generation of Intel processors and system bus increases the computation performance by 50% and raises the available communication bandwidth to more than 80 Mbyte/s per node.

The new machine will run the Solaris Unix operating system - still a good vehicle when it comes to number crunching. Work is underway between the three partners and Dolphin to develop the SCI software layers for the Linux OS as well. A lot of Linux knowledge is located at the universities.

The small 64-processor machine, which is jointly operated by C-LAB and PC2 in Paderborn.


Ad Emmen