EnterTheGrid - Primeur Live!

EnterTheGrid - Primeur is the premier Grid and Supercomputing information source in the world. With Primeur Live! it brings you Live reports from Europe's main Supercomputing and Grid events

Issue 27 June 2003
Interdependence of architecture and software for effective terascale computing
Heidelberg 27 June 2003 Dr. Thomas Sterling, California Institute of Technology, highlighted the important relationship between computer architecture and software in achieving effective computing in the terascale range. Performance, performance-to-cost ratio, efficiency, and programmability, as well as the opportunities afforded by technology advances, demand simultaneous and strongly interdependent innovation in system architecture and software. He discussed the potential and nature of these interdependencies and the innovation they will create.

The hardware vendors' dominant strategy for high-end computing at the Teraflops scale was to integrate COTS hardware components with supporting system software and tools that facilitate coordinated concurrent operation and parallel programming. This also allowed them to sell the same computers in the commercial market.

This strategy failed to provide a solution to the challenges of efficient and easily programmable high performance computing, as the components of these COTS architectures are not designed to support large scale parallel computing. They neither reflect a scalable execution model nor include mechanisms for efficient parallel computation; they merely represent the physical integration and interconnection of independent sequential processing elements. Thus the software has to provide the paradigm, methods, and tools for achieving effective programmable parallel computing. In most cases this has proven difficult, time consuming, and error prone, while often exhibiting low efficiency.

Performance Challenges and Opportunities

Sterling listed several factors that limit efficiency:

  • Latency
  • Overhead
  • Contention
  • Starvation

Latency is the number of cycles required to transfer a request to a remote resource (and back), and it affects the utilisation of critical resources. Cache systems attempt to avoid latency. Other methods, as in the Earth Simulator, partially hide latency. Latency can be predicted, but the additional delays caused at run time by contention of multiple requesting sources for shared resources (e.g. memory bank conflicts) cannot always be determined ahead of time. Starvation is the result of insufficient work for a processor, due either to a lack of programme parallelism or to poor load balancing. Conventional processor architecture incorporates little functionality to address these problems, so software and algorithmic techniques are the only opportunity to do so.
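
To make the effect of these factors concrete, the following is a minimal sketch of a textbook-style utilisation model. It was not part of the talk; the function names, parameters, and numbers are illustrative assumptions. The first function estimates how much of a processor's time is useful when remote requests cost a fixed latency and several threads are available to overlap the waits. The second shows starvation through poor load balancing.

```python
def utilisation(work, latency, threads=1, switch_overhead=0):
    """Fraction of cycles a processor spends on useful work (toy model).

    work:            cycles of computation between two remote requests
    latency:         round-trip cycles of one remote request
    threads:         number of threads available to overlap the waits
    switch_overhead: cycles lost on every thread switch
    All parameters are hypothetical and chosen for illustration only.
    """
    # Few threads: the processor still idles during part of each wait.
    latency_bound = threads * work / (work + switch_overhead + latency)
    # Enough threads: latency is fully hidden, only the overhead remains.
    overhead_bound = work / (work + switch_overhead)
    return min(latency_bound, overhead_bound)


def load_balance_efficiency(loads):
    """Useful work divided by total capacity while all processors
    wait for the most heavily loaded one to finish."""
    return sum(loads) / (len(loads) * max(loads))


if __name__ == "__main__":
    print(utilisation(work=10, latency=90))              # no latency hiding
    print(utilisation(work=10, latency=90, threads=10))  # latency hidden
    print(load_balance_efficiency([8, 2, 2, 4]))         # starved processors
```

In this model a single thread that computes for 10 cycles and then waits 90 cycles keeps the processor only 10% busy, while ten such threads hide the latency completely. Contention is left out on purpose, since, as noted above, its delays cannot always be determined ahead of time.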

The floating-point unit, once the critical and most expensive component, is today one of the least expensive to fabricate in VLSI. An FPU can take up as little as 1% of the die area of an entire chip, or even less. Half the die area of a modern microprocessor chip is consumed by cache.

Innovation in Architecture

Thomas Sterling discussed the streaming architecture being developed at Stanford University and the Trips architecture under development at the University of Texas. When temporal locality is low or absent, so that data access patterns fall into the category of touch-once data, the operations are best performed as close to the memory as possible, and the FPUs may be merged directly onto the DRAM dies. Here, latency and memory bandwidth are most important, and the merging of logic and memory is referred to as processor in memory, or PIM. He mentioned projects like DIVA at USC ISI, PIM-lite at the University of Notre Dame, and the MIND architecture at the California Institute of Technology as an advanced class of general-purpose PIM architectures.
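
Why a cache does not help with touch-once data can be illustrated with a small simulation. This is a sketch only, not material from the talk; the LRU replacement policy, the cache capacity, and the address traces are assumptions made for the example.

```python
from collections import OrderedDict


def lru_hit_rate(addresses, capacity):
    """Hit rate of a fully associative LRU cache over an address trace.

    The cache model is a deliberate simplification: one entry per
    address, no cache lines, no prefetching.
    """
    cache = OrderedDict()
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


if __name__ == "__main__":
    touch_once = range(1000)        # every address is used exactly once
    reused = [0, 1, 2, 3] * 250     # four addresses used over and over
    print(lru_hit_rate(touch_once, capacity=64))
    print(lru_hit_rate(reused, capacity=64))
```

The touch-once trace never hits in the cache, so every access pays the full memory latency, whereas the trace with high temporal locality misses only on the first use of each address. The first case is the access pattern for which moving the operations next to the memory is attractive.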

The Cascade project by Cray Inc., in support of the DARPA High Productivity Computing Systems programme, is intended to deliver sustainable Petaflops-scale performance. This system comprises a potentially large set of interconnected "Locales", each incorporating a heavyweight processor (HWP) with many tightly coupled FPUs and a number of PIM chips with multiple memory/processor nodes (LWP) on each device.

Cascade is a shared-memory architecture: any HWP or LWP can access any word within the entire system. With hardware support for thread context switching and message processing, near fine grain processing can be made efficient.

Future Software Roles and Relationships

A major component of the software system will be the run time system, which provides fine grain manipulation of resources and data in rapid response to both application requirements and operating system support. The run time system establishes a new relationship with the compiler to make best use of the knowledge available at each level of the decision tree. Dr. Sterling discussed some of the issues in getting more information out of the code and using it in the decision tree.

The new relationship between hardware architecture and software support (compiler and run time system) calls for a new autonomous intelligence to manage and control system operation. The executing application code must be fully virtualised from the physical hardware. Some of the tasks of autonomous computing, such as isolating and correcting faults, can be transferred to the software.

He proposed a control decision tree that acquires information from the user, the programme, the compiler, and the hardware at execution time to determine the best choices in resource allocation and task scheduling. An emergent run time software system comprising synergistic agents with introspective threads will play an increasingly important role.
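
The idea of such a decision tree can be sketched in a few lines. The example below is purely hypothetical and does not describe Sterling's proposal or any real run time system: the `Hints` fields, the precedence of the information sources, the locality threshold, and the default choice are all assumptions. It decides whether a task should run on a heavyweight processor (HWP) or on a PIM node (LWP), using the estimate of temporal locality from the source closest to execution time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hints:
    """Estimated temporal locality of a task, between 0.0 and 1.0,
    as seen by each information source (None means: no information).
    All fields are hypothetical."""
    user: Optional[float] = None       # annotation supplied by the user
    programme: Optional[float] = None  # derived from the programme text
    compiler: Optional[float] = None   # static analysis by the compiler
    hardware: Optional[float] = None   # measured at execution time


def choose_processor(hints, threshold=0.5):
    """Pick "HWP" for tasks with data reuse, "LWP" for touch-once data."""
    # Assumed precedence: later, more specific knowledge overrides earlier.
    sources = (hints.hardware, hints.compiler, hints.programme, hints.user)
    estimate = next((s for s in sources if s is not None), None)
    if estimate is None:
        return "HWP"  # assumed default when nothing is known
    return "HWP" if estimate >= threshold else "LWP"


if __name__ == "__main__":
    print(choose_processor(Hints(user=0.9)))
    print(choose_processor(Hints(user=0.9, compiler=0.1)))
    print(choose_processor(Hints(compiler=0.1, hardware=0.8)))
```

A real system would weigh many more criteria, such as current load and data placement. The sketch only shows how knowledge from several levels can be combined into one scheduling decision.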
Uwe Harms

EnterTheGrid - Primeur

James Stewartstraat 248

1325 JN Almere

The Netherlands

http://EnterTheGrid.com

mailto:primeur@hoise.com

© EnterTheGrid - Primeur Live!