The DutchGrid project was launched last year as part of the European Datagrid initiative to create a testbed for applied grid computing in The Netherlands. Six partners, including the Dutch National Institute for Nuclear and High Energy Physics (NIKHEF), the Royal Dutch Meteorological Institute (KNMI), the Academic Computer Centre Amsterdam (SARA), and the Universities of Amsterdam, Nijmegen and Utrecht, are contributing to various experiments in processing very large distributed data sets. For example, Monte Carlo simulations of event data are being set up in particle physics, and satellite data are being processed for earth observation purposes. To this end, a Dutch mini-grid has been created, consisting of multiple clusters, so-called PC farms, with the Globus Toolkit facilitating access to distributed databases.
The Virtual Laboratories project was set up in 1999 by the University of Amsterdam, the Institute for Atomic and Molecular Physics (AMOLF), and NIKHEF to build a science portal based on grid technology, providing seamless access to a large collection of software and hardware resources. The portal is used by scientists in chemistry and physics to handle data from remote devices; in traffic-control and telematics simulations to solve complex design problems; and in bio-informatics and medicine to integrate large databases and to perform 3D modelling and interactive simulation. The Gigabit Ethernet local area network of the Amsterdam Science & Technology Centre serves as a testbed running the GigaPort wide-area network, the Globus Toolkit, and physical devices, which are used in the design of generic software tools for virtual reality visualisation, remote collaboration, and the processing and integration of data from heterogeneous databases.
The Distributed ASCI Supercomputer (DAS), a project running since 1997, is a shared testbed for parallel and distributed computing built by the Advanced School for Computing and Imaging. Universities in the four Dutch cities of Amsterdam, Delft, Leiden, and Utrecht are involved in programmes ranging from image processing, weather forecasting, quantum chemistry, and N-body and time-warp simulations to the steering of scientific applications from virtual reality, searching through large image databases, and parallel processing on the wide-area DAS. The DAS architecture is a geographically distributed, cluster-based system with a homogeneous grid environment for controlled experiments, offering fast access to a local cluster for interactive use.
In fact, there are two testbed versions. DAS1 provides four PentiumPro/Myrinet clusters, totalling 200 CPUs, connected by a wide-area Asynchronous Transfer Mode (ATM) network. In the summer of 2001, DAS2 will start operating on the GigaPort wide-area network with five clusters, totalling 400 CPUs, based on symmetric multiprocessing (SMP) nodes. According to Dr. Bal, the problem with distributed computing on a system such as DAS is that slow wide-area communication makes it suitable only for coarse-grained applications. The DAS team has been searching for a way to exploit the hierarchy so as to allow more fine-grained parallelism. Dr. Bal used the example of parallel heuristic search as applied in bio-informatics. Here, the scientist needs a shared transposition table, which remembers solutions already found in order to reduce the search space.
In practice, each processor must execute many table lookups to check whether a state has already been searched. This requires 10,000 remote table lookups per second, which causes mediocre performance even on a single Myrinet cluster, let alone on a grid. Dr. Bal therefore proposed a new algorithm that asynchronously moves the search to the processor that may contain the solution. This transposition-driven scheduling (TDS) does not reduce the amount of communication, but it allows a processor to perform other work in the meantime and to combine messages, thus speeding up the process. TDS is latency-insensitive but bandwidth-bound, and needs optimised load balancing to function in hierarchical wide-area systems such as DAS. This is realised by ignoring the transpositions between clusters.
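The idea can be illustrated with a toy simulation. In the sketch below, which is illustrative only (the worker count, the hash partitioning, and the trivial successor function are hypothetical, not the actual search code), the transposition table is partitioned across processors; instead of performing a synchronous remote lookup, a processor ships each new state to the owner of its table entry and moves on, which is the essence of transposition-driven scheduling:

```python
from collections import deque

NUM_PROCS = 4

def owner(state):
    # Hash-partition the transposition table: each state has a home processor.
    return hash(state) % NUM_PROCS

# Per-processor state: a local transposition-table shard and an inbox of work.
tables = [dict() for _ in range(NUM_PROCS)]
inboxes = [deque() for _ in range(NUM_PROCS)]

def expand(state):
    # Hypothetical successor function: count the state down towards 0.
    return [state - 1] if state > 0 else []

def push(state):
    # Instead of a synchronous remote lookup, ship the state to its owner
    # as a fire-and-forget message (transposition-driven scheduling).
    inboxes[owner(state)].append(state)

def run(roots):
    visited = 0
    for r in roots:
        push(r)
    # Round-robin simulation of the processors draining their inboxes.
    while any(inboxes):
        for p in range(NUM_PROCS):
            while inboxes[p]:
                s = inboxes[p].popleft()
                if s in tables[p]:       # transposition: already searched, prune
                    continue
                tables[p][s] = True      # record the state in the local shard
                visited += 1
                for child in expand(s):
                    push(child)
    return visited

print(run([10, 7, 10]))  # duplicate work is pruned -> 11 distinct states
```

Because the messages are one-way, a real implementation can batch several states into one network message and keep searching while they are in flight, which is where the speed-up comes from.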
Since even the simplest optimisations already have a high performance impact, this algorithmic alternative provides tremendous opportunities for distributed supercomputing, as Dr. Bal explained. In hierarchical systems there is a need for programming support, offered by tools like Satin and MagPIe. Satin is based on the principle of divide-and-conquer parallelism: it steals work randomly from clusters, whereby steals from remote clusters are asynchronous and overlapped with local steals. MagPIe, part of the Albatros project, instead uses hierarchical communication graphs and segments large messages to keep multiple wide-area links busy. The goal now is to adapt MagPIe to changing network conditions so that it can run over the Internet, making applications and libraries aware of changes in network performance.
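The divide-and-conquer model with random work stealing can be sketched in miniature. The sequential simulation below is illustrative only (the worker count, the range-sum task, and the threshold are hypothetical, and the asynchronous overlap of remote and local steals is not modelled): each worker pops tasks from its own deque and, when idle, steals from the opposite end of a randomly chosen victim's deque:

```python
import random
from collections import deque

random.seed(1)

NUM_WORKERS = 4   # hypothetical workers, e.g. one per cluster node
THRESHOLD = 8     # below this size a task is executed sequentially

queues = [deque() for _ in range(NUM_WORKERS)]

def work(me):
    """Process one task, or try to steal one; return the partial sum produced."""
    if queues[me]:
        lo, hi = queues[me].pop()               # local pop from the LIFO end
    else:
        victim = random.randrange(NUM_WORKERS)  # pick a random victim to rob
        if victim == me or not queues[victim]:
            return 0                            # steal failed, stay idle
        lo, hi = queues[victim].popleft()       # steal from the FIFO end
    if hi - lo <= THRESHOLD:
        return sum(range(lo, hi))               # leaf task: compute directly
    mid = (lo + hi) // 2                        # divide-and-conquer split
    queues[me].append((lo, mid))
    queues[me].append((mid, hi))
    return 0

def parallel_sum(n):
    queues[0].append((0, n))                    # all work starts on one worker
    total = 0
    while any(queues):
        for me in range(NUM_WORKERS):
            total += work(me)
    return total

print(parallel_sum(100))  # sum of 0..99 -> 4950
```

Stealing from the opposite end of the deque tends to take the largest unsplit subtree, so a single steal transfers a lot of work, which is what makes occasional, asynchronous wide-area steals affordable.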
Dr. Bal presented the future work in the exciting field of advanced computer science as a five-fold challenge, consisting of the ongoing study of distributed processing on large data sets; the integration of heterogeneous databases; the implementation of medium-grained distributed supercomputing; the creation of real-time interactive and collaborative visualisation, as is being done in CAVEstudy, for instance; and the building of programming environments for all of these applications.