
EnterTheGrid - PrimeurMonthly

EnterTheGrid - PrimeurMagazine is the premier Grid Computing and Supercomputing information source in the world. With PrimeurMonthly we provide you with a free update on grid computing and supercomputer news and in-depth analysis.

Contents August 2005
ISC2005 - A 20th Anniversary & HPC - back to the future - Celebration
Heidelberg 24 June 2005

"It is encouraging that the designers of the next generation of supercomputers, expected in 4-7 years, are once again picking up the gauntlet and accepting the challenge of solving problems thrown up by the use of new semiconductor devices. In doing so they push back the frontiers of semiconductor technologies": 'Supercomputers & Their Use', Lazou, 1985. Around 625 participants from 29 countries attended the 20th International Supercomputer Conference, and 47 exhibitors displayed their wares in the associated exhibition in Heidelberg. For me, this 20th anniversary is a double celebration, as I finished writing my book "Supercomputers and Their Use" in June 1985. (Chris Lazou)


This annual ISC event enables many Europeans to assess new technology from Japanese and U.S. vendors, and to hear from U.S. colleagues where they stand on the issue of leadership in large-scale scientific and technical computing. The presentations at the conference were broad-based, and most were at the cutting edge of developments.

This article gives a flavour of the trends in computer architecture design and of the challenges facing industry in delivering on the productivity promise of 1 Petaflop/s sustained performance by 2010.

As usual, Professor Dr. Hans Meuer and his team from the University of Mannheim put together a fine vendor exhibition and a collection of stimulating presentations, and delivered a seamless conference in the beautiful historic city of Heidelberg. The main sponsor this year was IBM, a formidable competitor with promising new low-power, compact technology in the guise of Blue Gene/L, which has strong potential to influence future HPC, for better or worse.

ISC2005 kicked off with welcome speeches and the presentation of the TOP500 list. The 25th TOP500 list shows some stark results. Blue Gene/L topped the list and also took 5 of the top 10 positions. The other trend is that clusters are relentlessly taking over. This highlights the shortcomings of Linpack as a metric for complex systems, since it is dominated by dense floating-point arithmetic and says little about memory bandwidth or interconnect performance. One would love to see a comparison of results using all eight tests in HPC Challenge. The top spot in Europe went to Barcelona's MareNostrum system, based on clusters of IBM eServer blades and Myrinet. IBM also took the number one spot in the market, with around a 60% slice. This dominance has not been seen since the days of Cray Research in the late 1980s. One other statistic: the sum of the peak performance of all systems on the TOP500 has reached 1 Petaflop/s.

At this point the Roman adage "Caveat Emptor" (Buyer Beware) comes to mind. The recent move to two cores per chip increases the processing power of each chip, but it makes the memory-CPU gap worse. The challenge is to develop a balanced system.

As Professor Resch said recently: "Special systems like the NEC SX-8 are specifically tuned for memory bandwidth and interconnect performance. Standard commodity chip systems tend to have poor memory tuning and interconnects, and these often do not work properly. The Blue Gene/L (BG/L) has a clock frequency of 700 MHz and is expected to have 131,072 processors, using 1.2 MW of total power and a footprint of 2,500 sq. ft. The Blue Gene/P is expected to have 1 million processors. The memory-to-processor-power ratio is very low (0.044 bytes per flop/s on BG/L compared with 0.5 bytes per flop/s on the NEC SX-8), and this makes BG/L unsuitable for memory-bandwidth-demanding applications and general-purpose computing. It is, however, suited to very special 'embarrassingly' parallel applications, and there are several Grand Challenge ones in this category. Both reliability (MTBF) and programming complexity are also major issues on Blue Gene/L-type systems. There is a definite trend towards heterogeneous hybrid systems for the future (Vector/Scalar/MD/VPM, as described by Ryutaro Himeno from RIKEN); this means that vectors will not disappear."
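Treating the quoted bytes-per-flop/s figures as memory bandwidth per unit of peak performance (an assumption: the quote does not spell out exactly what the ratio measures), a simple roofline-style bound shows why a low ratio hurts bandwidth-bound codes. The sketch below uses DAXPY (y = a*x + y) as a hypothetical example kernel and assumes no cache reuse; it illustrates the balance argument and is not a performance model of either machine.

```python
# Roofline-style bound on the fraction of peak a memory-bound kernel can reach.
# ASSUMPTION: the quoted bytes-per-flop/s figures are read as memory bandwidth
# per unit of peak floating-point performance.

MACHINE_BYTES_PER_FLOP = {
    "Blue Gene/L": 0.044,
    "NEC SX-8": 0.5,
}

# Hypothetical example kernel: DAXPY (y = a*x + y) in double precision.
# Per element: 2 flops (multiply + add) and 24 bytes (load x, load y, store y),
# assuming no data is reused from cache.
DAXPY_BYTES_PER_FLOP = 24 / 2


def attainable_fraction(machine_bf: float, kernel_bf: float) -> float:
    """Upper bound on the fraction of peak flop/s for a kernel.

    Memory can supply machine_bf bytes per peak flop; the kernel needs
    kernel_bf bytes per flop, so it runs at most at machine_bf / kernel_bf
    of peak, and never faster than peak itself.
    """
    return min(1.0, machine_bf / kernel_bf)


if __name__ == "__main__":
    for name, bf in MACHINE_BYTES_PER_FLOP.items():
        frac = attainable_fraction(bf, DAXPY_BYTES_PER_FLOP)
        print(f"{name:12s} DAXPY bound: {frac:.2%} of peak")
```

Under these assumptions DAXPY is capped at roughly 0.4% of peak on BG/L and about 4% on the SX-8, roughly the elevenfold gap between the two quoted ratios. Kernels with heavy cache reuse need far fewer bytes per flop and are not limited in the same way, which is why such machines can still excel on suitable applications.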

There have been many achievements in the last twenty years: scaled speedup, scientific visualisation, the move from custom ECL to CMOS, the HPCC initiative, MPI, and innovative technology such as the Cray XT3 and Blue Gene/L, delivering unprecedented growth in line with Moore's Law, but also bringing new challenges, now and in the future.

Let us take a journey "back to the future". Twenty years ago, I finished the draft of my book "Supercomputers and Their Use". [My book is out of print, so I am not promoting sales.] In the introduction I said: "The term 'supercomputer' is attributed to the most powerful scientific computer available at a given time. ... In practice, a small number of computers from different manufacturers vie for this honour, and are grouped into this class of machines. It is not possible to say that one computer model is the most powerful because the power of the computer is not linear. Often the different architectures of each type of machine are a significant factor in determining which supercomputer is most suitable for a particular application [pp1]."

In 1985 I said: "The supercomputer market share is less than half a percentage point of computer sales, so why all the fuss? The simple answer is that supercomputers are of the highest and most pervasive strategic importance. They enable scientists to solve today's problems and develop technology for tomorrow's industry, affecting national employment patterns and national wealth... [pp4]." To put it in contemporary (2005) language: "The President's Information Technology Advisory Committee (PITAC) released a new report that finds that computational science is one of the most important technological fields of the 21st century."

Chapter seven starts with a simple statement: "Computation is data manipulation" [pp98]. This has stark implications for computer architecture: it puts data in a central position. Users collect data, manipulate it, view it and store it. Apart from fast computing engines, HPC depends on fast data transport to and from the processors and storage subsystems, and this inevitably requires low latency and very high bandwidth.

And again: "Electronic components not only affect the final speed of supercomputers but also to a great extent influence the architecture adopted. As signal transit through logic circuits is much slower than the speed of light, physically large computers whose circuits are far apart cannot be very fast. In addition to the physical size of electronic components, the electrical characteristics of the material used to manufacture them substantially determine their switching speed. Many other factors are also involved. To simplify, small fast components are required for a fast machine. Small components can be packed at higher density, and fast components require higher energies at normal temperatures. The higher energies imply higher heat dissipation, and thus the packing density of components is dependent on the ability of a design to extract the heat dissipated when electronic components in large numbers are placed in close proximity. For each generation of supercomputers the solution of the above problems and the success of the architecture adopted, determines their viability in the market place [pp37]."

This line of thought was reflected in presentations by David Turek and Alan Gara from IBM, Steve Scott from Cray and Thomas Sterling from Caltech, among others. Power consumption is a key constraint, and physical space is another major one. The dramatic rise in power density, with its huge cost, has a big impact on computing. The same debate took place when the industry moved from bipolar to CMOS technology. Power is again "THE" problem...

For marketing reasons, the term supercomputer has been debased over the years by the inclusion of PC clusters: collections of identical standard processors connected by a network. These systems are not "true" supercomputers, although they are peddled as such on price/peak performance grounds. With these commodity cluster systems, bandwidth and latency constraints are accentuated, and the systems fail to deliver supercomputer price/sustained performance. As Aad van der Steen said recently: "Typical intra-node bandwidth is 10 Gbit/s and inter-node bandwidth is 1 Gbit/s. Latencies vary by a factor of 20 to 100, from about 100 nsec intra-node to 2-10 microseconds inter-node. As for a Grid, one is talking about millisecond latencies."
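A common first-order way to read such figures is the latency-bandwidth (Hockney) model, in which sending an n-byte message costs t(n) = latency + n / bandwidth. The sketch below plugs in the numbers from the quote (100 ns at 10 Gbit/s intra-node, 2-10 microseconds at 1 Gbit/s inter-node); the model itself and the chosen message sizes are assumptions for illustration. It also reports n_1/2, the message size at which half of the peak bandwidth is reached, which in this model equals latency times bandwidth.

```python
# Simple latency-bandwidth ("Hockney") model: t(n) = latency + n / bandwidth.
# Latency and bandwidth figures come from the quote; the model is an assumption.

GBIT = 1e9 / 8  # 1 Gbit/s expressed in bytes per second

LINKS = {
    # name: (latency in seconds, bandwidth in bytes per second)
    "intra-node": (100e-9, 10 * GBIT),
    "inter-node (best)": (2e-6, 1 * GBIT),
    "inter-node (worst)": (10e-6, 1 * GBIT),
}


def transfer_time(n_bytes: int, latency: float, bandwidth: float) -> float:
    """Estimated time in seconds to send one message of n_bytes."""
    return latency + n_bytes / bandwidth


def half_bandwidth_size(latency: float, bandwidth: float) -> float:
    """Message size n_1/2 (bytes) where effective bandwidth n / t(n) is half of peak.

    Solving n / (latency + n / bandwidth) = bandwidth / 2 gives n = latency * bandwidth.
    """
    return latency * bandwidth


if __name__ == "__main__":
    for name, (lat, bw) in LINKS.items():
        small = transfer_time(8, lat, bw)          # one double-precision word
        large = transfer_time(1_000_000, lat, bw)  # a 1 MB block
        print(f"{name:20s} 8 B: {small * 1e6:8.3f} us   "
              f"1 MB: {large * 1e3:7.3f} ms   "
              f"n_1/2: {half_bandwidth_size(lat, bw):6.0f} B")
```

Under these assumptions an 8-byte message takes about 0.1 microseconds within a node but 2-10 microseconds between nodes, whereas a 1 MB transfer is dominated by bandwidth (about 0.8 ms versus 8 ms). Latency therefore matters most for the many small messages typical of tightly coupled codes.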

"The architectures of some of the earlier supercomputer designs were constrained by the technology at that time. Designers of later machines have had fewer limitations and hence have been able to introduce more complexity and parallelism into their systems. This trend is continuing [pp44]." However, several imbalances were created. Increases in clock frequency accentuated the memory-CPU gap and the complexity of caches, and both conspire to reduce sustained performance.

"It is encouraging that the designers of the next generation of supercomputers, expected in 4-7 years, are once again picking up the gauntlet and accepting the challenge of solving problems thrown up by the use of new semiconductor devices. In doing so they push back the frontiers of semiconductor technologies [pp44]." This statement was true in 1985, and it also resonates now, in 2005.

The new systems outlined by vendors focus on absolute sustained supercomputer performance. To achieve this, they concentrate on increasing the speed and bandwidth of memory, decreasing memory latency and diversifying the computing engines. This is reflected in the hybrid systems under development: the Cray Rainier system, and, one suspects, both Fujitsu and NEC are travelling down a similar path. These systems are likely to have some or all of several discrete computing components (fast scalar, vector, FPGAs, visualisation engines; one expects the IBM Blue Gene/L technology to be one such component engine), serviced by high-bandwidth, low-latency memory (e.g. FC-RAM), perhaps also using PIM (Processing in Memory) to further reduce memory latency, and interconnected by over a thousand optical channels on a chip: essentially a new computer architecture. The network tightly integrating specialised, individually addressable compute engines and various levels of high-bandwidth, low-latency memory becomes the infrastructure for a data-centric supercomputer. The software then directs the most suitable portion of the application code to the relevant engine, manipulating the appropriate data and delivering the highest performance.

In a recent talk, Mr. Takeshi Nishikawa of NEC Corporation said: "NEC aims to contribute to the continuous success of its customers by providing advanced, powerful, stable, seamless and friendly HPC infrastructure. NEC aims to provide the most powerful computing system, with overwhelming sustained performance, using advanced hardware and software parallel technologies. It would have a user-friendly environment, with virtualisation technology for easy operation and new power control technology for low power dissipation."

He went on to say: "In the field of HPC, NEC's computer product strategy consists of leveraging two key technologies, namely high performance and high reliability. The most advanced high-speed, high-density VLSI, high-density packaging, high-efficiency cooling, high-speed interconnect, parallel processing and cluster control technologies are used. For high reliability it uses VALUMO platform technology, providing autonomy, virtualisation and fault tolerance, and enabling continuous operation of systems."

The new systems are likely to consist of vector and scalar processors plus other components, integrated with the GFS shared file system and a very fast interconnect. The vector and scalar processors will each execute the parts of a program best suited to them, cooperating to minimise total execution time. This system is the precursor for 1 Petaflop/s computing. In my view, the fact that NEC was chosen to study CPUs and memory subsystems for the 1 Petaflop/s Japanese national project is indicative of things to come.

When one looks at Grand Challenge applications, the sustained performance required to solve these problems is estimated as follows. In the biomedical field, electron state computation of proteins requires around 100 Tflop/s and 30 TB of memory, and screening for drug discovery around 800 Tflop/s and 200 TB of memory. In the automotive and aircraft field, coupled simulation of engine combustion and wing design requires around 500 Tflop/s and 100 TB of memory. In the climate and environment field, short-term climate prediction requires 20 Tflop/s and 10 TB of memory; this is already available on the Earth Simulator. Long-term climate prediction requires around 200 Tflop/s and 100 TB of memory, whilst forecasting local high-impact hazards, such as flash flooding, requires around 1 Petaflop/s and 500 TB of memory. In the nuclear energy field, complete plasma analysis, including electron structure, requires around 500 Tflop/s and 1 PB of memory. Finally, in the nanotechnology field, structural and functional prediction of compound materials requires around 200 Tflop/s and 400 TB of memory, and the creation of new materials requires around 1 Petaflop/s and as much as 2 PB of memory.

Clusters are dominating the TOP500 list. The PathScale example shows what these systems offer in network speeds. PathScale gave a live demonstration of its InfiniPath interconnect on a 16-node cluster at its exhibition booth. PathScale claims that InfiniPath, based on InfiniBand, has already posted an all-time record-low MPI latency of 1.32 microseconds and a peak unidirectional bandwidth of 952 MB/s, with half of this peak bandwidth achieved at a message size of 385 bytes (streaming). For applications that rely on IP traffic, it achieves a TCP/IP throughput of 583 MB/s with a one-way latency of 6.7 microseconds.

The actual performance of the InfiniPath interconnect exceeds even what PathScale anticipated. PathScale claims it is the industry's lowest-latency Linux cluster interconnect for message-passing (MPI) applications. InfiniPath plugs into standard HyperTransport-based HTX slots on AMD Opteron processor-based servers. InfiniPath is being integrated with and optimised for ParTec's ParaStation4, a robust and efficient cluster middleware solution that consists of high-performance communication tools and a software management layer.

As one can see, this type of interconnect with clusters is woefully inadequate for delivering the supercomputing performance required to solve the Grand Challenge problems. Clusters of course have a place in the market as capacity (throughput) computers, and the sooner this is understood, the better for the industry. With clusters in such a prominent position in the TOP500 list, they have become the new bottleneck for delivering absolute sustained performance to user applications, as well as guzzlers of electrical power and space. Use the right architecture for the right task.

In conclusion, let me quote a few other resonant statements from the closing remarks section of my book: "As for supercomputer parallelism, all supercomputers possess a high degree of parallelism... . Products will have to be upward compatible to use the wealth of mature software products users are currently running on present systems... . The supercomputer industry will require both new technology and new architectures... . The communication channels would have to be very fast... , constructed using fibre optics... . Massively parallel systems tend to use slower off-the-shelf technology. New generation systems will increasingly group their processors into modules to reduce interconnection overheads... . Each module will have an ever-increasing amount of local memory and special chips for vector floating point arithmetic... . There is a dearth of mature software tools for exploiting the raw power of new generation parallel computers. A massive injection of funds for work in this field is urgently needed. The tools needed range from new functional languages to more immediate objects such as data analysis tools... . As for biological computer devices they must still be considered as the technology for the twenty-first century, and so on [pp238-9]."

In the area of financial viability I said: "As far as the main supercomputer vendors are concerned, the uncertainty stems from the intentions of IBM. If IBM decides to enter the fray in earnest, the financial landscape will change dramatically [pp239] ..." Do the above comments and issues, from 1985 and 1988, ring any bells today?

At ISC2005, I interviewed Steve Scott, Chief Design engineer at Cray, so watch this space.

Chris Lazou

EnterTheGrid - PrimeurMagazine

James Stewartstraat 248

1325 JN Almere

The Netherlands

http://EnterTheGrid.com

mailto:primeur@hoise.com

© EnterTheGrid - PrimeurMonthly