Cluster computing at SC '98

Rennes, 25 November 98 Interest in cluster computing is still growing. At the SC98 exhibition, several clusters were on display in both the research and industry exhibits. One of the main trends this year is the adoption of standards in cluster programming. PCs and workstations can be clustered using several interconnection technologies (ATM, Gigabit Ethernet, SCI, Myrinet, Fibre Channel, HIPPI, ...). Want to know more about the projects and products in cluster computing? Read the full article, which also provides many links to the sources.

One of the most interesting booths on cluster computing was certainly the one from the Japanese Real World Computing Partnership (RWCP). The Parallel and Distributed System Software Laboratory of RWCP has been contracted to develop the system software for several clusters based on SUN workstations, PCs and Compaq Alpha workstations. An impressive effort was made to develop software for these clusters. RWCP is now distributing freely what they call the "SCore Cluster System Software", which is a set of communication libraries, operating system layers, compilers and programming environments for SUN, PC, and Alpha based machines ( http://www.rwcp.or.jp/lab/pdslab/dist/ ).

Another impressive project was the SPADE cluster from the University of Sao Paulo in Brazil ( http://www.lsi.usp.br/spade ). Researchers from this university have designed a custom interconnection system with hardware support for remote memory access. Three prototypes of this interconnection system are being developed (PCI network adapter, ATM and SCI cards). Several software developments are in progress, such as the design of a lightweight communication library, cluster management system and distributed shared memory.

The Scalable Computing Laboratory at Iowa State University demonstrated a switched Gigabit Ethernet Alpha cluster using a 55 Gbit/s shared memory communications fabric ( http://www.scl.ameslab.gov/scl/sclHome.html ). Last but not least, UC Berkeley showed its cluster built within the Millennium project ( http://millennium.millennium.berkeley.edu/ ). A 500-node cluster is currently installed at the university's CS department. Intel is supporting this project financially with $6 million. Seventeen UC Berkeley campus departments are affiliated with the project and are carrying out several research projects. One interesting project focuses on the analysis of the Virtual Interface (VI) architecture (see below).

At the industry exhibition, Compaq presented its Windows NT cluster ( http://www.windows.digital.com/Clusters/index.asp ) based on the ServerNet interconnection technology (see below) and Scali presented its SCI-based cluster ( http://www.scali.com ).

What is really new in cluster computing this year?

One of the main trends this year is the adoption of standards in cluster programming. PCs and workstations can be clustered using several interconnection technologies (ATM, Gigabit Ethernet, SCI, Myrinet, Fibre Channel, HIPPI, ...). Each of these technologies has its own communication API, which hampers the development of portable software layers. Several research projects have dealt with this problem and proposed implementations of portable communication layers (Fast Messages, Active Messages). Microsoft, Compaq and Intel are now pushing the Virtual Interface (VI) architecture ( http://www.viarch.org ) as a standard for communication within a cluster.

The second standard is devoted to the programming of SMP (Symmetric Multi-Processing) computers. OpenMP ( http://www.openmp.org ) is mainly a set of directives for parallelising code written in Fortran 77 and C/C++. It relies on the shared memory available on SMP machines. Executing OpenMP-compliant code on a cluster is therefore challenging in many ways: clusters are loosely coupled and do not provide a shared address space. Research is needed to design the software layers required to execute OpenMP-compliant codes efficiently on such machines.
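To make the notion of a directive concrete, here is a minimal sketch in C of a loop parallelised with an OpenMP directive. The function name sum_of_squares is hypothetical and chosen only for illustration; it does not come from any project mentioned in this article. The sketch assumes a compiler with OpenMP support. Without it, the directive is simply ignored and the loop runs serially.

```c
#include <assert.h>

/* Hypothetical example: sum of the squares 1^2 + 2^2 + ... + n^2.
   The loop iterations are independent, so a single directive is enough
   to ask the compiler to spread them over the threads of an SMP. */
double sum_of_squares(long n)
{
    double sum = 0.0;
    long i;

    assert(n >= 0);

    /* Each thread accumulates a private partial sum; the reduction
       clause combines the partial sums when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= n; i++) {
        sum += (double)i * (double)i;
    }
    return sum;
}
```

The directive relies on all threads seeing the same memory, which an SMP provides in hardware and a cluster does not.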

VI is an emerging standard in the world of user-level networking (Active Messages, Fast Messages, U-Net). It is a new network API that gives each user process access to the network hardware. Bypassing the operating system improves performance in terms of both bandwidth and latency. The availability of a standard API will certainly push cluster computing out of the research laboratories.

A first evaluation of this standard was presented at the conference by researchers from the Millennium project at UC Berkeley ( http://www.cs.berkeley.edu/~philipb/via/ ). These researchers have developed a VI-compatible software layer for the Myrinet hardware. They reported a one-way latency of 24 µs and a bandwidth of 425 Mbit/s. This level of performance is competitive with the fast communication architectures available today.
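To see what these two figures mean for messages of different sizes, one can use the usual first-order model in which the transfer time is the latency plus the message size divided by the bandwidth. This model is an assumption made here for illustration, not something presented by the Berkeley researchers, and it treats 425 Mbit/s as the peak bandwidth. The function names are hypothetical.

```c
#include <assert.h>

/* Estimated one-way transfer time, in microseconds, of a message of
   `bytes` bytes under the simple model: time = latency + size / bandwidth. */
double transfer_time_us(double latency_us, double bandwidth_mbit_s, double bytes)
{
    assert(latency_us >= 0.0 && bandwidth_mbit_s > 0.0 && bytes >= 0.0);
    /* 1 Mbit/s is 1 bit per microsecond, so bits / (Mbit/s) is in microseconds. */
    return latency_us + (bytes * 8.0) / bandwidth_mbit_s;
}

/* Message size, in bytes, at which the model delivers half of the peak
   bandwidth: the point where the latency equals the pure transmission time. */
double half_bandwidth_bytes(double latency_us, double bandwidth_mbit_s)
{
    assert(latency_us >= 0.0 && bandwidth_mbit_s > 0.0);
    return latency_us * bandwidth_mbit_s / 8.0;
}
```

With the reported 24 µs and 425 Mbit/s, half_bandwidth_bytes(24.0, 425.0) gives 1275 bytes. Under this model, messages much shorter than that are dominated by latency, which is why user-level networking concentrates on reducing it.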

The Parallel and Distributed System Fujitsu Laboratory of RWCP is also investigating this technology by developing the COMET network adapter ( http://www/pds-flab.rwcp.or.jp ). One key feature of this adapter is its ability to run a communication protocol on a protocol engine (a 166 MHz Alpha processor).

Researchers have developed several communication layers on top of VI, such as IP and TCP/IP. Several VI-compatible network adapters were shown at different booths. ServerNet-II, from Compaq, is VI compatible ( http://www.servernet.com ). Just before SC'98, Compaq announced a world record for sorting one terabyte of data using a 72-node VI-based cluster located at the Sandia National Lab. Another VI-compatible product, from Finisar, is based on Fibre Channel ( http://www.finisar.com ).

Despite an increasing number of VI network adapters, this technology is not yet widely available outside the Microsoft world: most of these VI products are only available for NT. However, one can imagine that this will change in the near future, since wider availability of VI would allow the development of portable operating system layers and middleware for high-performance cluster computing.

Although OpenMP does not target clusters, several researchers are now trying to design a specific runtime environment for executing OpenMP-compliant code on a cluster. During the conference, researchers from Rice University showed that a Distributed Shared Memory system, like TreadMarks, can be used efficiently to execute OpenMP codes. Speedups between 3 and 6 were obtained on an 8-node cluster for several benchmarks. They also showed that only minor modifications to the standard are required to get better performance and scalability, and that these could easily be incorporated into later versions of the standard.
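To put these speedups in perspective, they can be converted to parallel efficiency, that is, the speedup divided by the number of nodes. The small C helper below is a hypothetical illustration of that arithmetic and is not taken from the Rice presentation.

```c
#include <assert.h>

/* Parallel efficiency: the fraction of the ideal (linear) speedup
   that is actually obtained on `nodes` nodes. */
double parallel_efficiency(double speedup, int nodes)
{
    assert(speedup > 0.0 && nodes > 0);
    return speedup / (double)nodes;
}
```

For the reported range, parallel_efficiency(3.0, 8) is 0.375 and parallel_efficiency(6.0, 8) is 0.75, so the benchmarks obtained between 37.5% and 75% of the ideal speedup on 8 nodes.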

Note that the OdinMP project at the Department of Information Technology of the Lund Institute of Technology is adopting a similar approach ( http://www.it.lth.se/~d92jh/odin.html ).

At the end of SC'98, a panel was devoted to operating system issues for cluster computing. The discussion centred on basic questions: whether clusters will ever achieve, or need to achieve, the robustness of MPPs, which processor (x86 or Alpha) and which networking technology are best, and whether Linux is only a transitional technology on the way to Windows NT 5 for clusters. Several arguments were given by the panelists without really giving


Thierry Priol