Computers have become increasingly costly, large, hot and power-hungry. The current generation of CMOS chips has a size of about 400 mm². The next generation will have 1.8 times as many transistors as the current one, whereas the generation after that will have 3.5 times as many as today's. Chips with 1B circuits are just a few years away, the speaker predicted.
The disparity between processor, memory and I/O performance will keep increasing. There will also be a growing number of levels of parallelism, which will affect the realisable performance. At the same time, Mr. Mirza warned the audience that we will face increasing communication costs as well as a cabling mess. The management of systems will become more complex. At IBM, the BlueGene initiative has been set up to investigate solutions to these and other challenges.
In fact, BlueGene constitutes a multi-year project to build a PetaFLOPS-class system with dramatically higher performance, an unconventional approach and a targeted application. The speaker explained that BlueGene aims to achieve two primary goals. IBM wants to advance the state of the art of biomolecular simulation and of computer design and software for extremely large-scale systems. For that purpose, IBM has signed partnerships with various customers.
The researchers involved in the BlueGene initiative approach the big challenge by making unconventional system trade-offs, according to the speaker. As such, they opt for nodes of modest performance in order to dramatically reduce power consumption and build a system with increased density and reliability. The amount of memory will be kept modest so as to achieve lower latency and higher bandwidth. The engineers aim to implement directly connected switching, which means that cabling will be reduced and that multiple interconnects will be integrated. The system will include many compute nodes, but the hierarchical structure will reduce the number of managed nodes.
Mr. Mirza also described the compute nodes of the system to be built. They will be dedicated to executing a single application process and will be interconnected via multiple networks. They will run a lightweight kernel with a simple single-user OS. Further features of the kernel include a single dual-threaded application process with each thread bound to one of the processors, a user-level run-time library for communication, a single static virtual address space, and minimal overhead and interference.
The I/O nodes interface with the outside world, and each manages a subset of compute nodes, called a pSet. They run a standard Linux OS with additional functions to control the pSet, but they run no application code, as the speaker explained. They also provide the services that the compute node software does not: file system access; socket connections to other systems; process management, authentication, authorisation and accounting; and debugging for user applications.
Mr. Mirza noted that the management node takes care of a cluster of 1024 "nodes", or pSets, as well as the file server nodes. The BlueGene PetaFLOPS system is expected to run a modified version of Cluster Systems Management (CSM). The I/O nodes will perform file I/O through a scalable file server.
The BlueGene/L networks will be multiple and integrated. A three-dimensional torus will provide virtual cut-through hardware routing to maximise efficiency, with 1.4 Gb/s on each of the 12 node links, amounting to a total of 2.1 GB/s per node. The total torus interconnect bandwidth will be 134 TB/s, with a bisection bandwidth of 0.7/1.4 TB/s, as the speaker stated.
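The torus figures quoted above can be cross-checked with some simple arithmetic. The following is a minimal sketch, not an official calculation: it assumes that the "65K node" system means 65,536 nodes, that the 12 links are 6 torus neighbours times 2 directions, and that the total is expressed with 1024 GB per TB, which is the convention under which the result lands near the quoted 134 TB/s. All names are hypothetical.

```python
# Back-of-the-envelope check of the torus bandwidth figures from the talk.
LINK_GBIT_PER_S = 1.4   # per-link rate quoted in the text (gigabits per second)
LINKS_PER_NODE = 12     # quoted in the text; assumed to be 6 neighbours x 2 directions
NODES = 64 * 1024       # assumption: "65K nodes" means 65,536 nodes


def node_bandwidth_gbyte_per_s(link_gbit=LINK_GBIT_PER_S, links=LINKS_PER_NODE):
    """Aggregate bandwidth of one node in gigabytes per second (8 bits per byte)."""
    return link_gbit * links / 8


def total_bandwidth_tbyte_per_s(nodes=NODES, gbyte_per_tbyte=1024):
    """Aggregate bandwidth of all nodes in terabytes per second.

    The 1024 divisor is an assumption; with 1000 the result would be about 137.6.
    """
    return nodes * node_bandwidth_gbyte_per_s() / gbyte_per_tbyte


if __name__ == "__main__":
    print(f"per node: {node_bandwidth_gbyte_per_s():.1f} GB/s")
    print(f"total:    {total_bandwidth_tbyte_per_s():.1f} TB/s")
```

Under these assumptions the per-node figure reproduces the 2.1 GB/s from the talk, and the total comes out at roughly 134 TB/s.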
The global combining/broadcast tree for collective operations has the following features: logic and arithmetic for combining and reduction operations implemented in the tree itself; a link bandwidth of 350 MB/s in each direction; a separate tree for global interrupts and barriers; and a target hardware latency of 1.5 µs to traverse the tree of a 64K-node partition.
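To illustrate what combining inside a tree means, here is a small software sketch of a tree reduction. It is only an analogy for the hardware described above: the binary fan-out is an assumption made for illustration, the talk does not state the actual tree topology, and `tree_combine` is a hypothetical name.

```python
from functools import reduce
import operator


def tree_combine(values, op=operator.add, fanout=2):
    """Combine values level by level, as a combining tree would.

    Returns a tuple (result, levels), where levels is the number of
    combining stages needed to reach a single value.
    """
    level = list(values)
    if not level:
        raise ValueError("need at least one value")
    levels = 0
    while len(level) > 1:
        # Each parent combines the contributions of up to `fanout` children.
        level = [
            reduce(op, level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
        levels += 1
    return level[0], levels


if __name__ == "__main__":
    total, depth = tree_combine(range(64 * 1024))
    print(total, depth)
```

With a binary fan-out, 65,536 contributions are combined in 16 stages instead of 65,535 sequential steps, which is the reason a tree suits global reductions and barriers.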
Ethernet is incorporated into every node ASIC but is externalised only in the I/O nodes; it hosts control, booting and diagnostics. Furthermore, all networks are reconfigurable at the level of the 512-node midplane. The networks are partitioned by programming the link cards, which provide connectivity between nodes in different midplane units. In this way, smaller system partitions can be created to run multiple smaller jobs. The networks are also used for servicing.
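Partitioning at midplane granularity can be sketched as follows. This is a hypothetical illustration, not IBM's allocation logic: it assumes that a job is always given whole 512-node midplanes, and the function names are invented for the example.

```python
import math

MIDPLANE_NODES = 512  # midplane size quoted in the text


def midplanes_for_job(nodes_requested, midplane_nodes=MIDPLANE_NODES):
    """Number of whole midplanes needed to hold a job (assumed allocation rule)."""
    if nodes_requested <= 0:
        raise ValueError("a job needs at least one node")
    return math.ceil(nodes_requested / midplane_nodes)


def partition_size(nodes_requested, midplane_nodes=MIDPLANE_NODES):
    """Size in nodes of the partition that such a job would actually occupy."""
    return midplanes_for_job(nodes_requested, midplane_nodes) * midplane_nodes


if __name__ == "__main__":
    for job in (100, 513, 64 * 1024):
        print(job, midplanes_for_job(job), partition_size(job))
```

Under this assumption, a system of 65,536 nodes would consist of 128 midplanes, and a job asking for 513 nodes would occupy two of them.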
Mr. Mirza stated that, from a system perspective, BlueGene/L will look like a 1024-way cluster of independent machines, or pSets. The system will be able to leverage existing cluster infrastructure such as CSM for management and administration. One of the benefits is that each pSet functions as one independent logical machine. Each I/O node runs a full Linux image, and the corresponding pSet is under the control of that Linux image. There is only one hostname and IP address, reached through Gigabit Ethernet, as well as one process space for all processes in the pSet. Finally, the user processes live in the I/O nodes, which also host process management and debugging, authentication and authorisation, and system management daemons.
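The relationship between compute nodes and pSets can be pictured with a small mapping. The sketch below rests on assumptions that the talk does not state: a total of 65,536 compute nodes spread evenly over the 1024 pSets, which gives 64 compute nodes per I/O node, and a contiguous numbering of compute nodes. The names are hypothetical.

```python
TOTAL_COMPUTE_NODES = 64 * 1024  # assumption: the "65K node" system
PSETS = 1024                     # number of pSets quoted in the text
NODES_PER_PSET = TOTAL_COMPUTE_NODES // PSETS  # even split is an assumption


def pset_of(compute_node):
    """Index of the pSet (and hence I/O node) that manages a compute node."""
    if not 0 <= compute_node < TOTAL_COMPUTE_NODES:
        raise IndexError("compute node index out of range")
    return compute_node // NODES_PER_PSET


def compute_nodes_in(pset):
    """Range of compute node indices managed by the given pSet."""
    if not 0 <= pset < PSETS:
        raise IndexError("pSet index out of range")
    start = pset * NODES_PER_PSET
    return range(start, start + NODES_PER_PSET)


if __name__ == "__main__":
    print(NODES_PER_PSET, pset_of(130), list(compute_nodes_in(2))[:3])
```

Seen this way, the management layer only has to address 1024 logical machines, while each I/O node fans out to its own group of compute nodes.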
According to the speaker, the user processes are actually executed on the compute nodes, and the processes on the compute nodes are controlled from the I/O node using BlueGene/L-specific calls. Mr. Mirza expects that, over time, the system will evolve to a model where the compute nodes are more transparent and the processing set truly behaves as a conventional Linux multi-processor. BlueGene/L uses a message-passing programming model.
IBM wants to apply the system to protein folding and classical molecular dynamics. In this regard, a TeraFLOPS machine is needed, since good science requires numerous simulations, as Mr. Mirza stated.
The IBM Research Computational Biology Center (CBC) is part of the Deep Computing Institute within IBM. This virtual organisation includes a group of about 35 full-time researchers who operate in six locations. They are involved in basic science and exploratory work at the interface between information technology and biology. Mr. Mirza cited some examples of CBC projects that deal with bioinformatics algorithms, functional genomics and modelling, structural biology, protein dynamics in relation to BlueGene, and data management and integration.
IBM has entered into a partnership with the Tri-Lab ASCI Community, including Lawrence Livermore, Los Alamos, and Sandia Laboratories, to investigate whether the BlueGene/L approach could be applied in other areas of high performance computing. In that regard, IBM will deliver a 65K-node system to Lawrence Livermore in the fourth quarter of 2004. Furthermore, IBM has set up a number of external collaborations with American and European universities and research institutes.
Mr. Mirza concluded his talk by confirming that there has been significant interest in the HPC user community in applying the BlueGene/L approach to other ultra-intensive computations. Currently, there is a strong user community collaborating with IBM on various aspects of the system. Mr. Mirza believes that this will guide IBM towards the development of more general-purpose PetaFLOPS-class machines in the future.