
EnterTheGrid - Primeur Live!

EnterTheGrid - Primeur is the premier Grid and Supercomputing information source in the world.

News digest 24 June 2004
Red Storm: what is it and what about the AMD technology
Heidelberg, 24 June 2004. Two speakers discussed the Red Storm project. Bill Camp explained what it is and how it came about, outlining the road from ASCI Red to its successor, Red Storm. Fred Weber then presented AMD's technology for an evolving HPC world.

The history of Red Storm

From 1996 until 2001, the world's fastest supercomputer was the ASCI Tflops (ASCI Red) at Sandia National Laboratories. ASCI Red was not only fast, it was also inexpensive and reliable.

Unfortunately, ASCI Red was also a one-of-a-kind machine with no follow-on. Sandia first held discussions with Compaq about the Alpha EV7 processor, but it proved too expensive. AMD provided a solution with the SledgeHammer processor. The huge, immediate advantage of SledgeHammer was that it provided an open-specification, low-latency, high-bandwidth interface that could be used to connect to a custom network. A second advantage was that Sandia's codes performed extremely well when tested on it.

The guiding principle, according to Bill Camp, was that the architecture and every hardware and software component were chosen based on the SURE methodology: the system had to be scalable, usable, reliable and economic.

The system is divided into 1,280 service, log-in, I/O and visualisation nodes and 10,368 compute nodes. It will have a peak performance of 41.47 TFlop/s. Camp presented benchmark results for Sandia's own codes and compared them with ASCI Red; the comparison showed an improvement by a factor of 8 to 12.
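As a rough check on the quoted peak figure, the sketch below multiplies the compute node count by a clock rate and a number of floating-point operations per cycle. Only the node count and the 41.47 TFlop/s result come from the talk; the 2.0 GHz clock, the two operations per cycle, and the one-processor-per-node layout are assumptions made for illustration, as is the function name `peak_tflops`.

```python
def peak_tflops(processors, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFlop/s: processors x clock x flops per cycle."""
    gflops = processors * clock_ghz * flops_per_cycle
    return gflops / 1000.0


# 10,368 compute nodes is the figure from the talk.
# ASSUMPTIONS (not stated in the article): one processor per node,
# a 2.0 GHz clock, and 2 floating-point operations per cycle.
red_storm_peak = peak_tflops(processors=10368, clock_ghz=2.0, flops_per_cycle=2)
print(f"{red_storm_peak:.2f} TFlop/s")
```

Under these assumptions the result agrees with the quoted peak to two decimal places; a different clock rate or issue width would of course give a different number.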

Sandia also analysed clusters and vector processors, but its programmes do not vectorise well.

Bill Camp concluded that commodity components are nearly everywhere, but that customisation drives cost. The Earth Simulator and the Cray X1 are fully custom vector systems with a good balance, which explains both their high cost and, of course, their high performance. Clusters are built almost entirely from high-volume parts with no truly custom components, which explains both their low cost and their low scalability. Red Storm uses custom parts only where they are critical to performance and reliability.

The result is high scalability at a minimal cost/performance ratio.

Fred Weber from AMD then took over to talk about the anatomy of a supercomputer.

Many of AMD's design goals were not the same as Sandia's priorities. For AMD, Weber named the instruction set, legacy support, development platforms, OS support and ISV support. The factors that were important to Sandia included balanced processing, memory bandwidth, I/O bandwidth, reliability, power efficiency, density, COTS components and long-term support.

Fred Weber presented some of the design goals of Hammer and indicated where the specifications could be modified. He also showed architectural features and the evolution of the interconnect, HyperTransport, and mentioned that this processor was not designed for Red Storm.

He also addressed the topic of x86 in High Performance Computing under the heading of the Six System Challenges. He said that x86 is the most widely installed instruction set in the world. The instruction set is not relevant to CPU performance, at least "to first order". What is important is the system's backward compatibility with 32-bit x86. There is an enormous investment in IA32 across all market segments, and in many applications porting code is not an option. It is therefore necessary to provide a solution that is not only 100% backwards compatible, but also designed to run IA32 code faster than any existing 32-bit architecture. There has to be a gradual and controlled migration path for porting to AMD64, and the total cost of ownership has to be kept minimal.

The cost per processing node is determined by cost/performance and I/O constraints. IA32 clusters are limited to two processors per node, which puts additional stress on the SMP cluster interconnect. Systems with 4 and 8 processors have to be brought closer in cost/performance to 2-processor systems. Fred Weber also addressed the need to improve performance and decrease the price premium without breaking IA32 "commodity" economics. This is only possible if the same processor architecture is used on the desktop.

The third challenge is memory bandwidth. With increased system memory come data-intensive applications whose strides and block sizes cause cache thrashing. Making the cache larger is not cost-effective. Hence, performance is limited by the size of the on-chip cache and/or by memory bandwidth. One therefore has to improve memory bandwidth and latency and limit the cache size.
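To illustrate how an unlucky stride can thrash a cache even when the working set is tiny, here is a minimal sketch of a set-associative cache with least-recently-used replacement. The cache geometry (64 KiB, 64-byte lines, 2 ways) and the names `count_misses` and `strided_pass` are hypothetical choices for the example and do not describe any AMD processor.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes=64 * 1024, line_bytes=64, ways=2):
    """Simulate a set-associative LRU cache and return the number of misses."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_sets   # which set the line maps to
        tag = line // num_sets    # identifies the line within that set
        entries = sets[index]
        if tag in entries:
            entries.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            entries[tag] = True
            if len(entries) > ways:
                entries.popitem(last=False)  # evict the least recently used line
    return misses


def strided_pass(stride_bytes, count, repeats):
    """Byte addresses for `repeats` sweeps over `count` elements at a fixed stride."""
    return [i * stride_bytes for _ in range(repeats) for i in range(count)]


# Eight cache lines touched ten times each, with two different strides.
friendly = count_misses(strided_pass(64, 8, 10))          # lines land in 8 different sets
thrashing = count_misses(strided_pass(32 * 1024, 8, 10))  # all 8 lines land in one set
print(friendly, thrashing)
```

In this toy model both access patterns touch the same amount of data, yet the 32 KiB stride maps every line to the same set and misses on every access, while the 64-byte stride misses only on the first sweep. A larger cache with the same associativity would not help the second pattern, which is one reading of why the talk emphasised bandwidth and latency over cache size.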

Then there is addressable memory. Large RAM-resident databases and memory-intensive applications exceed the 4-gigabyte limit of 32-bit systems. Paging is not a solution; in Weber's view, AMD64 processing is the only real solution.
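The 4-gigabyte figure follows directly from the width of a 32-bit address. The short sketch below shows the arithmetic for a flat byte-addressed space; the helper name `addressable_gib` is made up for the example, and the 64-bit value is the architectural width of the pointer, not the amount of memory any particular processor can actually attach.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits):
    """Size in GiB of a flat, byte-addressed space with the given pointer width."""
    return (2 ** address_bits) // GIB


print(addressable_gib(32))  # 32-bit pointers
print(addressable_gib(64))  # 64-bit pointers (architectural upper bound)
```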

Concerning the I/O infrastructure, the bandwidth of a front-side bus causes an I/O bottleneck, which continues to exclude IA32 from running challenging parallel applications. One has to provide a dedicated I/O bus that is separate from the memory bus and keeps pace with next-generation I/O protocols and CPU clock rates.

The last issue is watt density. With clusters exceeding 10,000 processors, watt density is an important issue. As cluster size expands, cooling capacity and costs can become significant. The challenge is to design the solution with the lowest watts per gigacycle, leveraging the state-of-the-art AMD64 architecture and a silicon-on-insulator process, concluded Fred Weber.


EnterTheGrid - Primeur

James Stewartstraat 248

1325 JN Almere

The Netherlands

http://EnterTheGrid.com

mailto:primeur@hoise.com

© EnterTheGrid - Primeur Live!