The main sponsor of the ISC conference is also the sponsor of the awards. This year's sponsor is HP. The procedure adopted for deciding the awards is as follows: Authors are invited to submit papers to the annual ISC conference. The topics of the call for papers change with the focus of the conference every year. The best paper from each topic is chosen and presented at the conference. One award is given for each individual topic.
An international committee of experts from academia and industry chooses the winners. Chaired by Professor Michael Resch, the current committee includes academic luminaries from France, Germany, Japan, the UK and the USA, as well as expert representatives from HP and Intel in Germany.
In general, the ISC awards honour scientists working on projects that address the applicability of High Performance Computing to real-life problems in an innovative way. For 2006 there is a specific focus on two research segments.
The first segment chosen was life sciences: of interest were projects in bioscience and computational medicine with examples where high performance computing systems helped to gain a completely new insight and understanding, rather than showing a well-known phenomenon at higher resolution.
The second segment chosen was application scalability on very large systems: of interest were projects addressing the typical hybrid nature of large systems, where a shared-memory architecture at node level becomes the building block for very large clusters. Innovative solutions should scale well across both paradigms and demonstrate high efficiency for very large node counts.
Congratulations go to all those who submitted papers, as all of them contained interesting developments offering new insights. This year's winners for the first segment were Shantenu Jha, Peter Coveney and Matt Harvey of the Centre for Computational Sciences, UCL, for their paper "SPICE: Simulated Pore Interactive Computing Environment - Using Federated Grids for Grand Challenge Bio-molecular Simulations". Below are extracts from their paper to give you a flavour.
SPICE is no stranger to awards, having already won the HPC Analytics Challenge Award last November at SC|05. According to their paper, SPICE aims to understand the vital process of translocation of bio-molecules such as DNA, RNA and polypeptides across protein pores by computing the free energy profile of the translocating bio-molecule along the vertical axis of the pore. The transport of bio-molecules across protein membrane channels is of primary significance in a variety of areas. For example, gene expression in eukaryotic cells relies on the passage of mRNA through protein complexes connecting the cytoplasm with the cell nucleus.
According to the authors, classical Molecular Dynamics (MD) is well suited to this type of simulation, but without significant advances at the algorithmic, computing and analysis levels, understanding problems of this size and complexity was likely to remain beyond the scope of computational science for the foreseeable future.
Necessity is the mother of invention. Thus a novel algorithmic advance was developed by combining Steered Molecular Dynamics with Jarzynski's Equation (SMD-JE). Grid computing provides the required new computing paradigm as well as facilitating the adoption of new analytical approaches. SPICE uses sophisticated Grid infrastructure to couple distributed high performance simulations, visualization and the instruments used in the analysis within the same framework.
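The SMD-JE combination rests on Jarzynski's equality, which recovers an equilibrium free-energy difference from the work done in many non-equilibrium steered pulls. A minimal sketch of the estimator, using synthetic work values rather than anything from the paper, might look like this:

```python
import math
import random

# Jarzynski's equality: exp(-dF/kT) = < exp(-W/kT) >, averaged over the
# work W done in many independent steered-MD pulls. This is an
# illustrative sketch with synthetic work values, not the paper's code.

def jarzynski_free_energy(work_samples, kT):
    """Estimate the free-energy difference dF from a list of work values."""
    avg = sum(math.exp(-w / kT) for w in work_samples) / len(work_samples)
    return -kT * math.log(avg)

# Synthetic Gaussian work values: mean 2.5 kT, spread 1 kT. For a Gaussian
# work distribution the equality gives dF = mean - variance/(2 kT) = 2.0 kT.
random.seed(0)
kT = 1.0
samples = [random.gauss(2.5, 1.0) for _ in range(10000)]
dF = jarzynski_free_energy(samples, kT)
```

With enough pulls the exponential average recovers dF = 2.0 kT; in practice the slow convergence of this average is one reason so many parallel simulations are needed.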
The first fully atomistic simulations of the hemolysin pore capable of capturing the interaction in full have appeared very recently. They address, however, only static properties - structural and electrostatic - and have not attempted to address the dynamic properties of the translocating DNA. The lack of further attempts at atomistic simulations of the translocation process is due in part to the fact that the computational requirements for simulations of systems of this size, over the required timescales, have hitherto been prohibitive. A back-of-the-envelope estimate of the required computational resources helps to explain why.
The physical time scale for translocation of large bio-molecules through a trans-membrane pore is typically of the order of tens of microseconds. It currently takes approximately 24 hours on 128 processors to simulate one nanosecond of physical time for a system of approximately 300,000 atoms. Thus, it takes about 3,000 CPU-hours on a tightly coupled machine to simulate 1 ns. A straightforward vanilla MD simulation would therefore take around 30 million CPU-hours to simulate 10 microseconds - a prohibitively expensive amount. Consequently, approaches that are "smarter" than vanilla classical equilibrium MD simulations are required.
Relying only on Moore's law (simple speed doubling every 18 months) we are still a couple of decades away from a time when such simulations may become routine. Thus, advances in both the algorithms and the computational approach are imperative to overcome such barriers.
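The estimate above is simple enough to reproduce in a few lines; the figures below are the ones quoted in the text:

```python
import math

# Reproduce the back-of-the-envelope estimate quoted above.
hours_per_ns = 24                               # wall-clock hours per ns
processors = 128
cpu_hours_per_ns = hours_per_ns * processors    # ~3,000 CPU-hours per ns
target_ns = 10_000                              # 10 microseconds
total_cpu_hours = cpu_hours_per_ns * target_ns  # ~30 million CPU-hours

# Moore's-law-only projection: doublings needed before a 10-microsecond run
# costs what a 1-ns run costs today, at one doubling every 18 months.
speedup_needed = target_ns                      # a 10,000x speedup
years = math.log2(speedup_needed) * 1.5         # ~20 years
```

A 10,000-fold speedup needs about 13 doublings, or roughly 20 years - the "couple of decades" mentioned above.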
The SPICE project used the computational resources of a federated trans-Atlantic Grid, combining those of the US and the UK to demonstrate a capability that would not have been possible using either the US or the UK Grid infrastructure alone.
First, they used interactive programming with visualization tools to obtain the best fit for the size of their problem. They then submitted batch jobs through the Grid infrastructure to perform 72 parallel MD simulations in under a week. Each individual simulation ran on 128 or 256 processors, depending on the machine used. These simulations consumed approximately 75,000 CPU-hours; an amount probably out of reach today for all but a very few machines (Blue Gene/L, perhaps).
Their conclusion is that SPICE provides an example of an important, large-scale problem that benefits tremendously from using federated Grids. In particular it is a good example of the advantages - both quantitative and qualitative - that steering simulations of large bio-molecular systems provide.
This year's winners for the second segment were: B. Bergen, F. Huelsemann, U. Ruede (LANL, EDF, Uni Erlangen), for their paper "Hierarchical Hybrid Grids: Achieving TERAFLOP Performance on Large Scale Finite Element Simulations". A few extracts from their paper follow to give a flavour of their achievement.
Their paper explains that the design of the Hierarchical Hybrid Grids framework is motivated by the desire to achieve high performance in large-scale, parallel, finite element simulations on supercomputers. In order to realize this goal, careful analysis of the computationally intensive low-level algorithms used in implementing the library is necessary.
This analysis is primarily concerned with identifying and removing bottlenecks that limit the serial performance of multigrid component algorithms, such as smoothing and residual error calculation. To aid in this investigation, two metrics have been developed: the Balance metric and the Loads Per Miss metric. Each of these metrics makes assumptions about how various data structures and algorithms interact with the underlying memory subsystems and processors of the architectures on which they are implemented. Applying these metrics generates performance predictions that can be compared with measured results to determine the actual characteristics of an algorithm and data structure on a given platform. This information can then be used to increase performance.
They state that, in practice, the codes used to perform simulations in scientific computing applications often achieve only a small percentage of the theoretical peak performance of the CPU on a given architecture. In many cases, this poor performance can be attributed to memory bandwidth limitations imposed on the underlying algorithm by an imbalance between the CPU throughput and the burst rate of the memory subsystem.
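The extracts do not give the Balance metric's exact formulation, but the imbalance they describe can be sketched as a simple bandwidth bound; all hardware numbers below are hypothetical, chosen only to illustrate how such a bound yields a small fraction of peak:

```python
# Hypothetical sketch of a machine-balance vs. code-balance comparison in
# the spirit of the paper's Balance metric (its exact formulation is not
# given in the extracts). All hardware numbers are illustrative.

peak_flops = 6.0e9       # CPU peak throughput, flop/s (illustrative)
mem_bandwidth = 6.4e9    # sustained memory bandwidth, bytes/s (illustrative)

# Code balance: bytes moved per flop performed. A stencil smoother that
# streams one 8-byte load and one 8-byte store for every ~2 flops has
code_balance = 16 / 2    # 8 bytes per flop

# If the code is bandwidth-bound, achievable performance is capped at
# bandwidth / code_balance, however fast the CPU is:
predicted_flops = min(peak_flops, mem_bandwidth / code_balance)
fraction_of_peak = predicted_flops / peak_flops   # ~13% of peak here
```

With these (made-up) numbers the memory subsystem, not the CPU, sets the ceiling - exactly the kind of gap between peak and achieved performance the authors set out to explain rather than merely accept.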
Although these performance problems cannot generally be avoided, it is unacceptable to point to their ubiquitous nature and simply accept results of 5-10% of the CPU's theoretical peak performance, without understanding why these limitations exist.
A possible approach to elucidating the real interactions that occur during computation, in an attempt to understand what level of performance is actually possible, is to hypothesize various models of memory access that may be compared with measured results. Through the use of these models, referred to here as metrics, they show that it is possible not only to adequately explain, but also to accurately predict, what percentage of the theoretical CPU throughput can be achieved by an algorithm on a particular architecture.
In the paper, they introduce two performance metrics: the balance metric and the loads per miss metric. Each is formulated by making different assumptions about the memory access characteristics of an algorithm implemented using a specific set of data structures. These metrics are then used to predict and interpret measured results from various implementations of the standard lexicographic Gauss-Seidel algorithm, a popular smoother for geometric multigrid methods. As part of this analysis, they give a brief discussion of the performance of a code implemented using the Hierarchical Hybrid Grids framework.
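As an illustration of the smoother being measured (a model problem, not the authors' HHG implementation), a lexicographic Gauss-Seidel sweep for the 1D Poisson equation can be sketched as:

```python
# Minimal lexicographic Gauss-Seidel sweep for the 1D model problem
# -u'' = f on a uniform grid (an illustrative sketch, not the authors'
# HHG code). "Lexicographic" means the unknowns are updated in plain
# index order, each update immediately reusing the freshest neighbour
# values - which is what makes it an effective multigrid smoother.
import random

def gauss_seidel_sweep(u, f, h):
    """One in-place sweep over the interior unknowns."""
    for i in range(1, len(u) - 1):
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

# Usage: smooth a random initial error for f = 0 (exact solution u = 0).
random.seed(1)
n = 64
h = 1.0 / n
u = [0.0] + [random.random() for _ in range(n - 1)] + [0.0]
f = [0.0] * (n + 1)
initial_error = max(abs(x) for x in u)
for _ in range(100):
    gauss_seidel_sweep(u, f, h)
error = max(abs(x) for x in u)   # reduced, but smooth modes linger
```

Rough, high-frequency error components are damped within a few sweeps, while smooth components persist; multigrid exploits exactly this by handing the smooth remainder to coarser grids. Note also that the inner loop streams through memory with unit stride - the access pattern whose cost the balance and loads-per-miss metrics set out to model.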
Their experiments were run on several systems: SGI Altix, Hitachi SR8000 and Xeon. Using the Hierarchical Hybrid Grids (HHG) library, they found that good serial performance is necessary to achieve reasonable scalability when performing large-scale simulations. It is important to consider not only which algorithm is being analyzed, but also how that algorithm is implemented, since this affects the way in which data are accessed.
Several experiments were conducted and the results analyzed and interpreted. Finally, they present results from a 1024-CPU SGI Altix 3700 with 1 GB of RAM per CPU, on which they achieve about 1.4 Teraflops, and state that the combination of performance in Gigaflop/s and time to solution shows the implementation to be very efficient. Given a similar machine with 4 GB of RAM per CPU, they predict that they would be able to solve a problem with 6.8 × 10^10 unknowns at approximately 3.5 Teraflops.
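As a rough plausibility check on that prediction (the per-unknown storage budget below is an assumption for illustration, not a figure from the paper):

```python
# Rough consistency check on the quoted prediction. The resulting
# bytes-per-unknown figure is derived from the quoted numbers; whether it
# matches the HHG data structures is an assumption, not stated in the text.
cpus = 1024
ram_per_cpu = 4 * 10**9                    # 4 GB in bytes
total_ram = cpus * ram_per_cpu             # ~4.1e12 bytes in aggregate
unknowns = 6.8e10
bytes_per_unknown = total_ram / unknowns   # ~60 bytes per unknown
```

About 60 bytes per unknown - enough for the solution, right-hand side and a handful of auxiliary values in double precision - so the quoted problem size is at least consistent with the machine's aggregate memory.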
In conclusion they said: "In any scientific endeavour it is important not only to produce great results, but also to be able to explain them... The use of metrics, to try and model the underlying interactions that take place during the execution of complex scientific codes has proved to be quite useful..."
Like its predecessors, ISC2006 promises to be a great event; everyone who is anyone in HPC would like to be there, so if you can, join us in Dresden. Further details on both winning papers can be found at http://www.isc2006.org/