Cornell Theory Center researchers achieve more than 3-fold speedup with Intel Itanium architecture

Ithaca, 07 September 2001. Researchers with the Computational Materials Institute (CMI) at the Cornell Theory Center (CTC) saw a 3.4x speedup when they moved their code to the new Intel Itanium (IA-64) architecture. CMI scientists perform large-scale finite element simulations of critically important materials science problems such as failures in engine gears, tears in airplane fuselage surfaces, and cracks in dams. Working with specialists at Intel's Applications Solutions Center (ASC), the CMI team delved into the details of this new architecture and optimised the performance of critical parts of their simulation system for the Itanium.

CMI scientists Gerd Heber and Andrew Dolgert ported their Crack Propagation on Teraflop Computers (CPTC) code to 64-bit Windows (XP) and then visited Intel's ASC in Chandler, Arizona, to work with Intel engineers Max Alt, Karen Mazurkiewicz and Lynd Stringer on optimising the solver for Itanium. This was the first time researchers visited Intel's ASC with an IA-64 (Itanium) application. After two weeks spent learning how to use the tools for Itanium and how to interpret the compiler's assembly output, the CMI team achieved a 3.4x speed improvement for single-processor execution on Itanium. The optimisations they carried out ranged from memory disambiguation and software pipelining to prefetching and the use of optimised BLAS/LAPACK routines.

Although the optimised code runs in parallel for production, the focus of the visit was single-processor optimisation, and no changes were made to the message passing (MPI) code. When the team tried the code on a 4-way Itanium SMP, the parallel performance (limited by Amdahl's law) improved almost as much as the single-processor performance. Cornell has given Intel a Software License Agreement so that Intel can use CMI's code as a real-world example to train its engineers.
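Why the parallel gain is "almost as much" rather than the full 3.4x can be illustrated with Amdahl's law: if only a fraction of the run time is accelerated, the untouched remainder caps the overall speedup. The following is a minimal sketch, not CMI's own analysis. The function name and the 95%/5% split between computation and unchanged MPI communication are assumptions chosen purely for illustration; the article does not report the actual split.

```python
def amdahl_speedup(improved_fraction: float, factor: float) -> float:
    """Overall speedup when `improved_fraction` of the run time becomes
    `factor` times faster and the rest is left unchanged (Amdahl's law)."""
    if not 0.0 <= improved_fraction <= 1.0:
        raise ValueError("improved_fraction must lie in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / factor)


# Hypothetical split (not from the article): 95% of the parallel run time is
# computation that benefits from the 3.4x single-processor gain, and 5% is
# MPI communication that was not changed.
overall = amdahl_speedup(0.95, 3.4)
print(f"overall speedup: {overall:.2f}x")
```

Under these assumed numbers the overall gain comes out at roughly 3.0x, below the 3.4x single-processor figure. The smaller the unoptimised share, the closer the two values become.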

The hallmark of CMI's simulation systems, based on their CPTC software, is that they are designed for problems involving complex geometries with arbitrary crack shape, multi-physics, and non-linear system response. These problems require that the size of the region being simulated increases as the crack grows, that the focus of the simulation shifts with the progress of the crack within the material, and that the scales at which the simulation is being conducted shift along moving boundaries. In addition, snapshots must be taken and data extracted as the simulation proceeds.

"This class of problems promises to benefit tremendously from the new IA-64 architecture", stated CMI director Anthony R. Ingraffea, "taking advantage of the dramatic increase in floating point calculations and memory bandwidth." CMI researchers are developing iterative solution methods with finite element solvers to enable simulations of very large models, involving up to about 1 million degrees of freedom. Iterative solvers require less memory than their direct counterparts for the same problem size. Up to this point, CMI scientists needed to perform time-consuming, error-prone, and non-portable fine-tuning in order to improve the ratio between floating point and load/store operations.

"The IA-64 is among the first architectures to provide an instruction set architecture (ISA) with explicit support for the instruction-level and thread-level parallelism required to optimise CMI's code", Mr. Ingraffea continued, "so that we can overcome critical performance bottlenecks."

Since moving to the Pentium-based Velocity+ cluster at the beginning of this year, CMI has dramatically increased the size and complexity of its typical problem. Prior to this move, a fracture growth simulation using 3D boundary elements might consist of tens of crack growth increments. The problem size was usually limited to 5 or 6 thousand linear elements. The boundary element solution generated a dense non-symmetric system of equations. Integrations were performed at collocation points to generate the equilibrium displacements and tractions in the structure. A 5000-element problem might generate 20,000+ degrees of freedom, which corresponds to a dense matrix of 20,000 × 20,000 × 8 bytes (3.2 GB). This required 4.5 hours of runtime using 48 processors on CTC's previous supercomputer. Simulations recently run on Velocity+ achieved an almost 100 times greater resolution, and were also performed in a fraction (less than one tenth) of the time.
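The memory figure quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the dense system is stored in full with 8-byte (double-precision) entries, which is what the "8 bytes" factor in the text suggests; the function name is hypothetical.

```python
def dense_matrix_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Storage for a full n-by-n matrix, assuming every entry is kept."""
    return n * n * bytes_per_entry


# 20,000 degrees of freedom, as in the 5000-element example.
size = dense_matrix_bytes(20_000)
print(f"{size:,} bytes")                 # raw byte count
print(f"{size / 1e9:.1f} GB (decimal)")  # 10^9 bytes per GB
print(f"{size / 2**30:.2f} GiB")         # 2^30 bytes per GiB
```

Because storage grows with the square of the number of degrees of freedom, doubling the problem size quadruples the memory needed for the dense matrix.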

"Scientists and engineers can now do their development on an Itanium desktop, like Heber's Dell Precision 730, and move their work directly to a multi-processor system in the Windows environment using tools like MPI Software Technology Inc.'s MPIPro", stated CTC chief technical officer, David Lifka. "The power of IA-64 is at their fingertips from start to finish."

The mission of the Computational Materials Institute at CTC is to develop and transfer to industry computational simulation systems for crack growth and fracture processes. The group is supported by the aerospace industry (Northrop Grumman, Boeing Commercial Airplanes, Pratt and Whitney, GE), imaging industry (Eastman Kodak Co.), energy development industry (Schlumberger), and government R&D agencies (NASA Langley and NASA Glenn Research Centers, FAA, AFOSR, NSF). Their simulation systems are used for improving manufacturing process control, simulating the stimulation of oil/gas wells, understanding failure mechanisms, and predicting remaining life and residual strength of components and structures.


Ad Emmen
