Computer Centre of Karlsruhe University receives IBM's 425 000 euro SUR-Award

Munich, 13 November 2000. Professor Wilfried Juling and the Computer Centre of Karlsruhe University received IBM's SUR award (Shared University Research) in Karlsruhe on 4 October for their extraordinary research in the field of high-performance computing. Primeur interviewed Prof. Juling about the reasons for the award and what he has done with it. He also described the new machines and what will be done with the new computing power.

Primeur: Dear Professor Juling, congratulations to you and your team at the Karlsruhe Supercomputer Centre on this award. I have heard that this was the first time an institution outside America received it. What were the reasons?

Prof. Juling: Yes, that is right, we are the first institute outside the USA to receive this award. To explain why, I have to go back to the days when we installed our good "old" IBM SP. The Computer Centre was one of the big centres, with a 256-processor SP, and we decided to run the machine under DCE (Distributed Computing Environment). This was a hard and risky job, as such a big machine with a corresponding production workload had not been integrated into DCE before. DCE, a software package from the Open Group, became the basis for developing distributed applications that use resources on distributed computing systems. A very important component is DFS (Distributed File System), a global file system with great advantages in reliability, security, data protection and performance compared to other, more traditional file systems.

Thus, in close co-operation with the local IBM staff, we implemented DCE/DFS support for the batch system LoadLeveler, the back-up/archiving system ADSM and the parallel execution environment. It has been reliable for more than two years of SP operation. We did a good job, and this built up our good reputation within IBM. Other problems in the beginning, when the switch or nodes failed, were solved with specific software levels. This led to extremely close contacts with IBM's Thomas Watson Research Lab. One effect of this good co-operation was that some computer centre employees were involved in a so-called "Red Book", in which IBM describes new computer systems. This award is combined with a research co-operation with the IBM T. J. Watson Research Center in New York - for us it is especially the Advanced Computing Technology Center (ACTC). Participants are the Karlsruhe Computer Centre, several institutes of Karlsruhe University and some well-known independent software vendors.

Primeur: You have now received about 850 000 DM (425 000 euro). What are you doing with this money, Prof. Juling?

Prof. Juling: This money is not for me or my employees, but for the extension of our existing supercomputer. Additionally we got money from the Federal Republic and the State of Baden-Wuerttemberg under the "Hochschulbaufoerderungsgesetz", a government grant for the further development of universities, approved by the DFG (German Research Foundation). In total we now have three times the performance. We installed IBM's WinterHawk thin and high nodes: 48 SMP (symmetric multiprocessor) thin nodes with Power 3-II processors (375 MHz), 2 processors per node, 8 MB L2 cache, 2 GB memory and a peak performance of 3 GFlop/s per node. This sums up to a total of 96 GB memory and 1.7 TB local disk storage. We also got 4 high nodes with the same processor type but 8 processors per SMP node, 8 GB memory and 12 GFlop/s peak performance per node. The Power 3-II processor can perform 4 floating-point operations per clock cycle. Thus the peak performance of the new SP sums up to 192 GFlop/s, with a total of 128 GB memory. The "old" IBM SP, installed in 1997, with its 256 processors, adds up to 105 GFlop/s peak and 120 GB memory.
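The peak figures quoted above follow from simple arithmetic: processors per node times clock rate times floating-point operations per clock. The short sketch below is an editorial illustration only, using the numbers given in the interview; the variable and function names are hypothetical.

```python
# Figures quoted in the interview (Power 3-II nodes of the new SP).
CLOCK_GHZ = 0.375        # 375 MHz
FLOPS_PER_CLOCK = 4      # floating-point operations per clock, per processor


def node_peak_gflops(processors_per_node: int) -> float:
    """Peak GFlop/s of one SMP node: processors x clock (GHz) x flops per clock."""
    return processors_per_node * CLOCK_GHZ * FLOPS_PER_CLOCK


# (number of nodes, processors per node, memory per node in GB)
THIN = (48, 2, 2)
HIGH = (4, 8, 8)

thin_peak = node_peak_gflops(THIN[1])
high_peak = node_peak_gflops(HIGH[1])

# Totals for the new machine: sum over both node types.
new_sp_peak = THIN[0] * thin_peak + HIGH[0] * high_peak
new_sp_memory_gb = THIN[0] * THIN[2] + HIGH[0] * HIGH[2]

# Combined peak of old and new SP relative to the old SP alone.
OLD_SP_PEAK = 105.0      # GFlop/s, as quoted for the machine installed in 1997
growth_factor = (OLD_SP_PEAK + new_sp_peak) / OLD_SP_PEAK

print(f"thin node: {thin_peak} GFlop/s, high node: {high_peak} GFlop/s")
print(f"new SP: {new_sp_peak} GFlop/s, {new_sp_memory_gb} GB memory")
print(f"old + new relative to old: {growth_factor:.2f}x")
```

The results match the quoted 3 and 12 GFlop/s per node, 192 GFlop/s and 128 GB in total. The combined peak of both machines is roughly 2.8 times that of the old SP, which is consistent with the "three times the performance" mentioned in the answer.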

Primeur: How do you integrate these three systems?

Prof. Juling: The additional 52 nodes are installed as a separate machine. This is due to the switch of the existing computer: integrating further nodes would have required an additional switch level at considerable extra cost. Both computers are integrated in a uniform operating environment, so changing from one system to the other is possible. As we use the batch environment LoadLeveler, jobs can be sent to either system without any problem.

Primeur: What are your new research topics in general, and specifically in the co-operation with IBM?

Prof. Juling: We have clearly raised our performance by a factor of three. We needed a platform for SMP-parallelised programs and for programs with high memory requirements. SMP parallelisation has a new quality and is easier to program than massive parallelism. It can be done with directives, which are standardised in OpenMP. Additionally we are looking at the combination of OpenMP and MPI in real applications. Last but not least, we want to gain experience in using SMP systems with many nodes in a parallel production environment, covering both applications and administration.

We are very proud that we co-operate with ACTC at IBM's T. J. Watson Research Center in a project named "Multilevel Parallelisation for the Power4". As Erwin Staudt, General Manager IBM Germany, mentioned, we are now involved in the development of the coming Power4 processor, and perhaps we have more knowledge than the IBM employees in Germany. Together we will develop programming techniques for coming computer architectures. Additionally, some institutes of the university, such as fluid machines and machine construction, as well as some software vendors participate, as they want to use new technologies at a very early stage to port their software efficiently.

IBM will deliver simulation software to study the behaviour of programs on the coming Power4 systems. These processors have a higher clock frequency - they are gigahertz processors - and a higher inherent parallelism. A chip contains two processor cores, and four chips are integrated into a module. Thus a module delivers 32 GFlop/s on an area of 11 cm x 11 cm.
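The module figure can be broken down per core in the same way. The sketch below is an editorial illustration: the number of floating-point operations per clock for the Power4 is not stated in the interview, so it is assumed here to be 4, the value given earlier for the Power 3-II. Under that assumption the quoted 32 GFlop/s corresponds to a clock rate of 1 GHz, which fits the "gigahertz" description.

```python
# Figures quoted in the interview for a Power4 module.
CORES_PER_CHIP = 2
CHIPS_PER_MODULE = 4
MODULE_PEAK_GFLOPS = 32.0
MODULE_EDGE_CM = 11.0

# ASSUMPTION (not stated in the interview): a Power4 core performs
# 4 floating-point operations per clock, like the Power 3-II.
ASSUMED_FLOPS_PER_CLOCK = 4

cores_per_module = CORES_PER_CHIP * CHIPS_PER_MODULE
peak_per_core_gflops = MODULE_PEAK_GFLOPS / cores_per_module

# Clock rate implied by the quoted peak under the assumption above.
implied_clock_ghz = peak_per_core_gflops / ASSUMED_FLOPS_PER_CLOCK

# Peak performance per square centimetre of module area.
gflops_per_cm2 = MODULE_PEAK_GFLOPS / (MODULE_EDGE_CM ** 2)

print(f"{cores_per_module} cores per module")
print(f"{peak_per_core_gflops} GFlop/s per core")
print(f"implied clock: {implied_clock_ghz} GHz (under the stated assumption)")
print(f"about {gflops_per_cm2:.2f} GFlop/s per square centimetre")
```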

We will study programming techniques and the adaptation of programs for these new computer structures. Instead of starting the adaptation only when the processors become available, we and our partners can start today to learn how to use the modern IBM Power4 technology efficiently.

For us it was a big surprise to be integrated into the Power4 development group. As a result, we have direct contact with the developers in America.

Primeur: You are a partner in the hww GmbH (High-performance computers for research and industry operation company), the other partners are Stuttgart University and the State of Baden-Wuerttemberg - in total 50%, debis Systemhaus (40%) and Porsche AG (10%). Perhaps you can comment on your activities and the usage of your IBM SP.

Prof. Juling: I have some figures from April to September of this year. The University of Karlsruhe uses 40% of the SP, researchers from Baden-Wuerttemberg about 30%, and researchers from the rest of Germany, outside Baden-Wuerttemberg, the remaining 30%. These figures demonstrate that we are an active partner in hww. We have an interesting mixture of users on our SP:
Computational Fluid Dynamics: 26%
Informatics/Mathematics: 24%
Physics: 20%
Chemistry: 17%
Bioinformatics: 7%
Mechanics: 4%
Other: 2%

The parallel computer IBM SP is really used as a parallel machine. We have counted the distribution of parallel jobs:
More than 100 nodes: 24%
50 to 99 nodes: 8%
25 to 49 nodes: 36%
10 to 24 nodes: 19%
Up to 10 nodes: 4%
Serial: 9%
Our machine has been very stable for several years, with close to 100% availability. Over the whole period we have had a CPU usage of about 90%.

Primeur: What is going on in Baden-Wuerttemberg in high-performance computing?

Prof. Juling: This is a first step, as the Minister-President of the State of Baden-Wuerttemberg, Erwin Teufel, wants to install Europe's biggest and fastest supercomputer within the next two years. This will be the basis for top research in very important areas.

Primeur: Thank you, Professor Juling, for this interview.


Uwe Harms
