News from ParaStation on hpcLine - the Myrinet interconnect

Karlsruhe, 02 May 2001. Fujitsu Siemens Computers' new interconnect partner for the hpcLine is ParTec, a spin-off of the University of Karlsruhe, with its ParaStation, which is based on the Myrinet interconnect. Professor Walter Tichy, the father of ParaStation and a member of the Department of Informatics at the University of Karlsruhe, discussed the basic ideas and the resulting middleware.

Professor Tichy reported from a workshop on cluster computing at Fermilab and concluded that the Americans will invest heavily in clusters, as they offer good price/performance and are built from off-the-shelf parts.

He then gave a short history of cluster computing and explained the reasons for choosing clusters and Linux (a free operating system). He mentioned that a Pentium 4 (1.5 GHz) is available for 1700 DM (850 euro). The Top500 list contains big SMP clusters, and a 20 000 processor cluster is planned.

Professor Tichy listed important problems in the cluster arena:

  • getting nominal power into real applications
  • communication performance within cluster
  • storage performance (parallel file system, fast I/O)
  • single system image
  • load balancing
  • robustness
  • ease of programming and management
  • making all of the above scale

The target of ParaStation was a communication subsystem, and Professor Tichy presented the communication performance issue in detail. Kernel-based communication (TCP/IP) is a bottleneck in high-speed networks (> 1 Gbit/s). A GHz processor executes several instructions per nanosecond, whereas the communication latency is in the range of milliseconds on Ethernet and 40 microseconds on Myrinet, which is several orders of magnitude longer. He therefore proposed and realised an optimal communication subsystem - unprivileged communication without intervention of the operating system. The application can directly access the control registers of the Myrinet card. As the Myrinet link has about 1 bit error per day, it is not necessary to pay recovery overhead on every message; exception handling is started only when an error occurs. Normally, a message is copied 5 to 7 times within the operating system; here there is only one copy of the message, on the card.
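To put the latency figures in perspective, the rough sketch below converts them into the number of instructions a processor could have executed while waiting for a message. Only the 40 microsecond Myrinet figure is taken from the talk; the rate of 2 instructions per nanosecond and the 1 ms Ethernet latency are illustrative assumptions standing in for "several" and "in the range of milliseconds".

```python
# Rough arithmetic only; values marked "assumed" are not from the talk.
INSTR_PER_NS = 2                  # assumed: "several instructions per nanosecond"
ETHERNET_LATENCY_NS = 1_000_000   # assumed: 1 ms, "in the range of milliseconds"
MYRINET_LATENCY_NS = 40_000       # 40 microseconds, as quoted in the talk


def instructions_lost(latency_ns, instr_per_ns=INSTR_PER_NS):
    """Instructions the CPU could have executed during one message latency."""
    return latency_ns * instr_per_ns


if __name__ == "__main__":
    for name, latency_ns in [("Ethernet", ETHERNET_LATENCY_NS),
                             ("Myrinet", MYRINET_LATENCY_NS)]:
        print(f"{name}: about {instructions_lost(latency_ns):,} instructions per message latency")
```

Under these assumptions, even the Myrinet latency costs tens of thousands of instruction slots per message, which is why removing the operating system from the communication path pays off.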

The user space communication has the following advantages:

  • no operating system involvement
  • fewer copies
  • no protocol stack
  • optimised protocols

The drawbacks are reduced protection and the need for coordination between the scheduler and the communication subsystem - co-scheduling.

Tichy presented some results. Communication via the kernel over Gbit Ethernet delivers 20 MByte/s, while ParaStation with a 32-bit PCI bus delivers 100 MByte/s, comparable to the IBM SP2. Myrinet with 64-bit PCI reaches 140 MByte/s, and Myrinet 2000 with ParaStation 3 about 250 MByte/s.
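As a back-of-the-envelope comparison of the quoted bandwidths, the sketch below computes each configuration's speedup over kernel-based Gbit Ethernet and the time needed to move a payload. The 1000 MByte payload and the dictionary labels are illustrative assumptions; the calculation also ignores latency and protocol overhead, so it shows only the bandwidth ratio.

```python
# Bandwidth figures as quoted in the talk (MByte/s); labels are shorthand.
BANDWIDTH_MBYTE_S = {
    "Gbit Ethernet via kernel": 20,
    "ParaStation, 32-bit PCI": 100,
    "Myrinet, 64-bit PCI": 140,
    "Myrinet 2000 with ParaStation 3": 250,
}
BASELINE = "Gbit Ethernet via kernel"
PAYLOAD_MBYTE = 1000  # assumed payload size, for illustration only


def transfer_seconds(payload_mbyte, bandwidth_mbyte_s):
    """Idealised transfer time: payload divided by sustained bandwidth."""
    return payload_mbyte / bandwidth_mbyte_s


def speedup(name, baseline=BASELINE):
    """Bandwidth of a configuration relative to the baseline configuration."""
    return BANDWIDTH_MBYTE_S[name] / BANDWIDTH_MBYTE_S[baseline]


if __name__ == "__main__":
    for name, bandwidth in BANDWIDTH_MBYTE_S.items():
        seconds = transfer_seconds(PAYLOAD_MBYTE, bandwidth)
        print(f"{name}: {seconds:.1f} s for {PAYLOAD_MBYTE} MByte, "
              f"{speedup(name):.1f}x the baseline")
```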

Professor Tichy also outlined future research topics in clustering. Clusters cannot be handled as easily as traditional supercomputers, but important problems are being addressed:

  • fast and reliable communication
  • single system view
  • load balancing
  • parallel file system
  • checkpoint/restart
  • parallelism tuning
  • object-oriented parallelism

http://www.par-tec.com


Uwe Harms
