EnterTheGrid - Primeur Live!

EnterTheGrid - Primeur is the premier Grid and Supercomputing information source in the world.

News digest 23 June 2005
The bumpy path to PetaFlop/s scale computing
Heidelberg 23 June 2005 Steve Louis from Lawrence Livermore National Laboratory (LLNL) in California, and Alan Gara from the IBM Research Center in Yorktown Heights, described the possibilities of a cost-effective path towards PetaFlop/s scale computing by analysing the architectural details of BlueGene/L, the number one system on the TOP500. BlueGene/L scales to 360 Tflop/s with modified COTS and custom parts and addresses five critical issues on the road to PetaFlop/s computing: power, floor space, cost, single processor performance, and network scalability. Resolving these five issues should provide an affordable path to PetaFlop/s scale computing, according to Steve Louis.

The time line for the BG/L deployment at Lawrence Livermore National Laboratory shows that 16 racks were delivered in November 2004. The system was tested in December 2004 and runs started in January 2005. Sixteen additional racks were delivered in early February 2005. The 32-rack runs started in April 2005. Thirty-two additional racks will be delivered in mid 2005 to complete the integration into a 64-rack system. This will be followed by a machine shake-out and test, as well as 64-rack scaling runs for science. In late 2005 there will be expanded early user access, with more science runs to follow.

Louis described the BG/L project as a focused effort to enable important science and to lead the way to cost-effective PetaFlop/s computing. The architectural features promote efficiency and scaling for ASC science applications. The multiple complementary interconnects support diverse application scaling requirements. High reliability is expected from the high level of integration made possible by system-on-a-chip technology. BlueGene/L's architectural enhancements improve single node performance, while the software is architected with the very powerful "divide and conquer" technique for software scale-up.

He emphasized that BlueGene is not a goal in itself. It is there to do new science. Louis concluded that in a number of cases, this has already happened. Scientists are enthusiastic.

Steve Louis explained that there are two different ways to use the BG/L nodes. In the first scenario, CPU0 does all the computation and CPU1 handles the communication, so that communication overlaps with computation. The peak compute performance then amounts to 5.6/2 or 2.8 GFlops. In the second scenario, CPU0 and CPU1 perform independent "virtual tasks". Each does its own computation and communication. The two CPUs talk via memory buffers, meaning that computation and communication cannot overlap. In this case, the peak compute performance is 5.6 GFlops.
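The arithmetic behind the two node modes can be sketched as follows. This is only an illustrative calculation: the mode labels `coprocessor` and `virtual_node` are names chosen for this sketch, and the figure of 1,024 nodes per rack is an assumption that does not appear in the talk summary above.

```python
# Peak rate of one node when both CPUs compute, as quoted in the talk.
NODE_PEAK_GFLOPS = 5.6
CPUS_PER_NODE = 2

# Assumption (not stated in the article): 1,024 nodes per rack.
NODES_PER_RACK = 1024


def peak_gflops(mode: str) -> float:
    """Return the peak compute rate of a single node for the given mode."""
    per_cpu = NODE_PEAK_GFLOPS / CPUS_PER_NODE
    if mode == "coprocessor":
        # CPU0 computes while CPU1 only handles communication.
        return per_cpu
    if mode == "virtual_node":
        # Both CPUs compute and communicate independently.
        return per_cpu * CPUS_PER_NODE
    raise ValueError(f"unknown mode: {mode}")


def machine_peak_tflops(racks: int, mode: str) -> float:
    """Return the peak rate of a whole machine in TFlop/s."""
    return racks * NODES_PER_RACK * peak_gflops(mode) / 1000.0


if __name__ == "__main__":
    for mode in ("coprocessor", "virtual_node"):
        print(mode, peak_gflops(mode), "GFlop/s per node")
    print("64 racks:", machine_peak_tflops(64, "virtual_node"), "TFlop/s")
```

Under the rack-size assumption, the 64-rack system comes out at about 367 TFlop/s with both CPUs computing, which is in line with the roughly 360 Tflop/s quoted by the speakers.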

For the first time, BlueGene/L will allow an overlap in the evaluation of the models. As such, the BG/L simulations will bring a qualitative change to ASC material and physics modelling and engineering. The scientists at LLNL have planned first-wave and second-wave applications on BG/L. The first-wave target ASC applications are efforts identified as ready for programmatic science runs with very early machine availability, subject to an ongoing assessment of code suitability, the speakers explained. The assessment criteria for early applications include relevance to the ASC programme, enthusiasm within the code group, and a potential for good code scaling with simpler architectural needs. The second-wave target ASC applications, intended for runs on the full machine, will be identified using criteria similar to those of the first wave.

BG/L is applied to perform classical molecular dynamics (MD) simulations with highly accurate MGPT potentials.

The MGPT semi-empirical potentials are based on a rigorous expansion of many-body terms in the total energy. They are needed to quantitatively investigate the dynamic behaviour of transitions in metals and actinides under extreme conditions. The 64K and 256K atom simulations on 2K nodes are an order of magnitude larger than previously attempted. Judging from a 16M atom simulation on 32K nodes, close to perfect scaling is expected for the full machine, explained the speakers.

Another application is Miranda. This is a high-order hydrodynamics code for computing fluid instabilities and turbulent mix. The application employs FFTs and band-diagonal matrix solvers to compute spectrally accurate derivatives, combined with high-order integration methods for time advancement. Miranda contains solvers for both compressible and incompressible flows. It has been used primarily for studying Rayleigh-Taylor (R-T) and Richtmyer-Meshkov (R-M) instabilities, which occur in supernovae and Inertial Confinement Fusion (ICF). Miranda has successfully run on 32,768 nodes on BG/L and also on 32,768 processors in the "virtual node" mode. With Miranda, BG/L has proven able to handle the wide range of scales in space and time necessary to represent turbulent flows of interest. This is a good time-to-solution improvement from MCR to BG/L, according to Steve Louis.
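To illustrate what a spectrally accurate derivative means, here is a minimal one-dimensional sketch for a periodic function. It is not Miranda's code: it uses a naive O(N²) discrete Fourier transform written in plain Python instead of a real FFT library, and the function names `dft` and `spectral_derivative` are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (a stand-in for a real FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out


def spectral_derivative(samples, length):
    """Differentiate periodic samples on [0, length) in Fourier space."""
    n = len(samples)
    coeffs = dft(samples)
    for k in range(n):
        # Signed mode number: 0, 1, ..., then the negative frequencies.
        m = k if k < n / 2 else k - n
        if n % 2 == 0 and k == n // 2:
            m = 0  # drop the Nyquist mode for an odd-order derivative
        # Multiplying by i * wavenumber differentiates each Fourier mode.
        coeffs[k] *= 1j * 2 * math.pi * m / length
    return [c.real for c in dft(coeffs, inverse=True)]


if __name__ == "__main__":
    n = 16
    length = 2 * math.pi
    xs = [length * j / n for j in range(n)]
    approx = spectral_derivative([math.sin(x) for x in xs], length)
    worst = max(abs(a - math.cos(x)) for a, x in zip(approx, xs))
    print("largest error against cos(x):", worst)
```

For a smooth periodic input such as sin(x), the result matches the exact derivative cos(x) to near machine precision even on a coarse grid, which is the property that makes spectral methods attractive for turbulence studies.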

On classical MD programmes, BG/L shows perfect scalability. On Miranda, for instance, they saw scaling up to 32,768 nodes without problems. On first-principles MD applications the scaling is not perfect yet. For instance, they measured speed-ups of 3 in going from 4,000 to 16,000 processors.

A second application in the domain of instability and turbulence is Raptor. Here, a multi-physics Eulerian Adaptive Mesh Refinement (AMR) code is used for applications at LLNL which include astrophysics, Inertial Confinement Fusion (ICF) and shock-driven instabilities and turbulence. Raptor can be used to simulate purely fluid dynamics systems and more complex physical systems where the fluids are coupled to the radiation field, such as in ICF or astrophysics. Simulations at full scale on BG/L will offer the computational power to gain an order of magnitude more resolution in simulations of three-dimensional shock-driven systems, as the speakers told the audience.

SPaSM is a high performance code for Scalable Parallel Short-range Molecular dynamics simulations. This is yet another application in which BG/L has demonstrated its potential. A variety of finite-range empirical potentials are implemented, including EAM and MEAM for metals, Stillinger-Weber Si/Ge, and a reactive empirical bond-order (REBO) potential for detonation studies. SPaSM has exhibited excellent scaling for up to 100 billion atoms on 16,384 nodes, and an initial production run on 8k nodes simulated the shock loading of a 2.1 billion atom EAM copper crystal with 0.41% voids. The speakers are convinced that BG/L will enable the exploration of an entirely new class of problems such as this.

In addition, the Large-scale Atomic and Molecular Massively Parallel Simulator (LAMMPS) has run on BG/L. It consists of a classical molecular dynamics code that models particles in a liquid, solid, or gaseous state. It can model atomic, polymeric, biological, metallic, or granular systems using a variety of force fields and boundary conditions. The speakers stated that on parallel machines, the code uses spatial-decomposition techniques to partition the simulation domain into small 3D sub-domains. LAMMPS has been tested on up to 512 BG/L processors so far, and shown good scaling on a fixed-size problem of 32,000 atoms.
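The idea of spatial decomposition can be sketched in a few lines. This is not taken from LAMMPS itself: the function names, the periodic wrap-around, and the row-major numbering of the sub-domains are all assumptions made for illustration.

```python
def owner_rank(position, box, grid):
    """Return the index of the 3D sub-domain that owns a particle.

    position: (x, y, z) coordinates of the particle
    box:      (Lx, Ly, Lz) edge lengths of the simulation domain
    grid:     (Px, Py, Pz) number of sub-domains along each axis
    """
    cell = []
    for x, length, parts in zip(position, box, grid):
        # Wrap the coordinate into [0, length) (periodic boundary assumed),
        # then find which slab along this axis contains it.
        index = int((x % length) / length * parts)
        cell.append(min(index, parts - 1))  # guard against rounding up
    ix, iy, iz = cell
    # Hypothetical row-major numbering of the sub-domains.
    return (ix * grid[1] + iy) * grid[2] + iz


def partition(positions, box, grid):
    """Group particle indices by the sub-domain that owns them."""
    domains = {}
    for idx, pos in enumerate(positions):
        domains.setdefault(owner_rank(pos, box, grid), []).append(idx)
    return domains


if __name__ == "__main__":
    atoms = [(1.0, 1.0, 1.0), (6.0, 1.0, 1.0), (9.0, 9.0, 9.0), (2.0, 2.0, 2.0)]
    print(partition(atoms, box=(10.0, 10.0, 10.0), grid=(2, 2, 2)))
```

Each processor then only needs to compute forces for the particles in its own sub-domain, exchanging information with neighbouring sub-domains for particles near the boundaries.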

First-principles molecular dynamics is demonstrated with Qbox. At LLNL, scientists have developed a C++/MPI implementation of the plane-wave, pseudopotential, ab-initio molecular dynamics method within Density Functional Theory (DFT). Steve Louis told the audience that this massively parallel implementation with specialized 3D FFTs is routinely used at LLNL for simulations of condensed matter subjected to extremes such as high pressure and high temperature, as well as in nanotechnology and biochemistry applications. The 686-atom Mo solid and other heavy metal simulations are now under way with BlueGene/L. The scalability tests on BG/L show that Qbox can achieve a threefold speed-up when solving a given problem on 16,384 nodes instead of 4,096 nodes. The speakers promised that further optimizations will provide even greater efficiency.
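The quoted speed-up can be turned into a parallel efficiency with a short calculation. The helper below is a hypothetical illustration, using the usual definition of efficiency as measured speed-up divided by the ideal speed-up for a fixed-size problem.

```python
def parallel_efficiency(speedup, nodes_before, nodes_after):
    """Return measured speed-up as a fraction of the ideal speed-up."""
    ideal = nodes_after / nodes_before  # perfect scaling on a fixed problem
    return speedup / ideal


if __name__ == "__main__":
    # Figures quoted for Qbox: threefold speed-up from 4,096 to 16,384 nodes.
    eff = parallel_efficiency(3.0, 4096, 16384)
    print(f"parallel efficiency: {eff:.0%}")
```

With four times as many nodes the ideal speed-up is 4, so a measured speed-up of 3 corresponds to 75% efficiency, which is the gap the promised optimizations would have to close.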

An IBM Zurich-LLNL collaboration centres on CPMD, a computation- and communication-intensive application developed by Alessandro Curioni and his colleagues at IBM Zurich. CPMD uses plane wave basis functions, 3D FFTs and parallel linear algebra routines to study the electronic and structural properties of complex materials from first principles. CPMD is used worldwide, and has been identified as an important tool for several critical material research areas and efforts at LLNL, Steve Louis explained.

The Parallel Dislocation Simulator (ParaDiS) is a new LLNL code for direct computation of the plastic strength of materials. ParaDiS tracks the simultaneous motion of millions of dislocation lines and promises to close the computational performance gap that prevents scientists from understanding the fundamental nature of material strengthening or hardening. Steve Louis stated that ParaDiS has run on 16,384 nodes of BG/L, and engineers at LLNL are currently investigating the scaling and dynamic load balancing issues to achieve higher efficiencies.

After showing these examples, the speakers summarized that application results to date on BG/L point to a qualitative change in the way science can be done. The scientists can do a new run every day and can afford to make investigations and then explore the alternatives. An entire scientific study can now be done in the same time as just one run only a year ago.

In short, BlueGene/L promises to revolutionize the DOE mission as well as high-end computing. BlueGene/L is the fastest computer in the world, at only half the size of its eventual configuration at LLNL this summer, the speakers proudly stated. Most important, however, are the application results, which enable more effective science and will have an impact on key national missions. In this way BG/L can really mean a cost-effective path to PetaFlop/s scale computing through thorough validation of the BlueGene/L hardware and software design.
Leslie Versweyveld

EnterTheGrid - Primeur

James Stewartstraat 248

1325 JN Almere

The Netherlands

http://EnterTheGrid.com

mailto:primeur@hoise.com

© EnterTheGrid - Primeur Live!