EnterTheGrid - Primeur Live!

EnterTheGrid - Primeur is the premier Grid and Supercomputing information source in the world.

News digest 24 June 2004
University of Tennessee researchers analyse process fault tolerance on HPC systems
Heidelberg 24 June 2004 Graham Fagg and Edgar Gabriel won one of the ISC2004 Awards with their paper on extending the MPI specification for process fault tolerance on high performance computing systems. They were invited to present their work to the ISC2004 audience and talked about the trends in high-end systems with thousands of processors. They tried to define the behaviour of MPI in case an error occurs, but stressed that most current systems are robust and do not crash because of a node failure.

Graham Fagg stated that the application has to be given the possibility to recover from process failures. A regular, non-fault-tolerant MPI programme will nevertheless run using FT-MPI. He offered a detailed description of the process, summarizing that fault tolerance for MPI applications is an active research target for which a large number of models and implementations are available.

The semantics of FT-MPI are very close to the current design of the MPI specification and are in the spirit of MPI.

The speaker proceeded by giving a classification of FT message-passing systems, which can be automatic or non-automatic. The general steps when dealing with fault tolerance are failure detection and notification.

The different FT-MPI communicator modes include:

  • Abort: do as other implementations do
  • Blank: leave a hole
  • Shrink: re-order processes to make a contiguous communicator
  • Rebuild: re-spawn lost processes and add them to MPI_COMM_WORLD
  • Reset: ignore and cancel all currently active communications and requests in case an error occurs. The user will re-post all operations after recovery.
  • Continue
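
The effect of the Abort, Blank, Shrink and Rebuild modes on a communicator can be illustrated with a small simulation. This is only a sketch and not FT-MPI's API: the function name `apply_mode`, the list-of-names model of a communicator, and the choice to put a re-spawned process back at its old rank are assumptions made for the example.

```python
# Illustrative simulation only; none of these names belong to FT-MPI.
# A "communicator" is modelled as a list of process names indexed by rank.

def apply_mode(comm, failed, mode):
    """Return the communicator as the survivors see it after `failed` ranks die."""
    failed = set(failed)
    if mode == "abort":
        # Behave like a non-fault-tolerant MPI: the whole job ends.
        raise RuntimeError("job aborted after process failure")
    if mode == "blank":
        # Size and surviving ranks stay unchanged; failed ranks become holes.
        return [None if r in failed else p for r, p in enumerate(comm)]
    if mode == "shrink":
        # Drop failed ranks so the survivors form a contiguous communicator.
        return [p for r, p in enumerate(comm) if r not in failed]
    if mode == "rebuild":
        # Assumption: a replacement is re-spawned into the lost rank.
        return [p + "'" if r in failed else p for r, p in enumerate(comm)]
    raise ValueError("unknown mode: " + mode)


comm = ["p0", "p1", "p2", "p3"]
print(apply_mode(comm, [2], "blank"))    # ['p0', 'p1', None, 'p3']
print(apply_mode(comm, [2], "shrink"))   # ['p0', 'p1', 'p3']
print(apply_mode(comm, [2], "rebuild"))  # ['p0', 'p1', "p2'", 'p3']
```

In this model, Shrink changes the rank of every process behind the failed one, whereas Blank and Rebuild keep the surviving ranks stable.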

The collective communication modes can be atomic, meaning that either everybody succeeds or nobody does. This is good for the user, but bad for performance. They can also be non-atomic, meaning that if an error occurs the outcome of the collective operation has to be analysed.
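
The difference between the two collective modes can be sketched with a simulated broadcast. This is a toy model under stated assumptions, not FT-MPI code: `broadcast` and `who_missed` are hypothetical names, and a rank that did not obtain the data is represented by `None`.

```python
# Toy model of atomic versus non-atomic collectives; all names are hypothetical.

def broadcast(value, ranks, failed, atomic):
    """Simulate a broadcast and return {rank: received value or None}."""
    failed = set(failed)
    if atomic and failed & set(ranks):
        # Atomic: a single failure means that nobody commits the result.
        return {r: None for r in ranks}
    # Non-atomic: every surviving rank completes on its own.
    return {r: (None if r in failed else value) for r in ranks}


def who_missed(outcome):
    """Analyse an outcome: which ranks did not receive the data?"""
    return sorted(r for r, v in outcome.items() if v is None)


ranks = [0, 1, 2, 3]
print(who_missed(broadcast(42, ranks, [3], atomic=True)))   # [0, 1, 2, 3]
print(who_missed(broadcast(42, ranks, [3], atomic=False)))  # [3]
```

A real atomic collective needs some form of agreement among the processes before the result is committed, which is where the performance cost comes from. In the non-atomic case the application has to perform a check like `who_missed` itself.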

The implementation details include a user application, the MPI library layer, derived datatypes, buffer management, message lists, non-blocking queues, the MPI level, a datatype engine 1, message management, a datatype engine 2, Myrinet, TCP, ShMem, and the hardware level.

The architecture offers high-level services, running outside of a core using an MPI application. There is a pre-recovery, a recovery, and a post-recovery stage. In the recovery stage, the leader or co-ordinator collects failure information and uses user parameters to build a new global store and to spawn a replacement.

In the post-recovery stage, all processes filter MPI objects based on the agreed state and restart the previous communication channels.
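
The recovery and post-recovery stages can be sketched as follows. This is a minimal sketch under several assumptions that the talk summary does not confirm: the lowest surviving rank acts as leader, the user parameter is a single `mode` string, and the agreed state is a plain dictionary. The names `recover` and `post_recover` are hypothetical.

```python
# Sketch of leader-driven recovery; names and data layout are assumptions.

def recover(reports, size, mode="rebuild"):
    """Leader side: merge the failure reports into one agreed state.

    `reports` maps each surviving rank to the set of ranks it saw fail.
    """
    failed = set().union(*reports.values())
    survivors = [r for r in range(size) if r not in failed]
    leader = min(survivors)  # assumption: lowest surviving rank leads
    # The user parameter decides whether replacements are spawned.
    spawned = sorted(failed) if mode == "rebuild" else []
    return {
        "leader": leader,
        "alive": sorted(survivors + spawned),
        "spawned": spawned,
    }


def post_recover(state, channels):
    """Process side: keep only the channels whose peer is in the agreed state."""
    alive = set(state["alive"])
    return [peer for peer in channels if peer in alive]


reports = {0: {2}, 1: {2, 3}}          # two survivors report what they saw
state = recover(reports, size=4, mode="shrink")
print(state)                            # leader 0, alive [0, 1], nothing spawned
print(post_recover(state, [0, 1, 2, 3]))  # [0, 1]
```

With `mode="rebuild"` the same reports lead to replacements for ranks 2 and 3, so all four channels survive the filtering step.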

A performance comparison with the non-fault-tolerant case was made using a latency test suite for small messages. Graham Fagg also talked about the PSTSWM and HPL benchmarks.

Edgar Gabriel described how the fault-tolerant parallel CG solver is tightly coupled and can be used for all positive-definite RSA matrices in the Harwell-Boeing format.

If the application is to survive one process failure at a time, then there is a recovery procedure to rebuild the work-communicator, recover the data, and reset the iteration counter on each process. If the application is to survive two process failures, it is determined with x as in the Reed-Solomon algorithm.
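
The summary does not say how the lost data is recovered. One common approach, assumed here purely for illustration, is to keep an element-wise sum of all local data blocks on a spare process, so that a single lost block can be rebuilt by subtraction. The function names and the integer data are hypothetical.

```python
# Single-failure data recovery with a sum checksum (an assumed scheme,
# not necessarily the one used in the FT-PCG solver).

def make_checksum(blocks):
    """Element-wise sum of every process's local block."""
    return [sum(column) for column in zip(*blocks)]


def recover_block(blocks, checksum, lost):
    """Rebuild the block of rank `lost` from the checksum and the survivors."""
    survivors = [b for rank, b in enumerate(blocks) if rank != lost]
    return [c - sum(column) for c, column in zip(checksum, zip(*survivors))]


blocks = [[1, 2], [3, 4], [5, 6]]     # local data of ranks 0, 1 and 2
checksum = make_checksum(blocks)       # [9, 12], held on a spare process
print(recover_block(blocks, checksum, lost=1))  # [3, 4]
```

A single sum can only repair one missing block at a time. Surviving two simultaneous failures requires more than one independent checksum, which is the role a Reed-Solomon style encoding plays.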

The FT-PCG performance was shown on an AMD64 cluster. The speakers also presented a master-slave framework which is useful for parameter sweeps. The basic concept consists of a master that keeps track of the state of each process.
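
The idea of a master that tracks the state of each process can be sketched in a few lines. This is an illustrative model, not the framework presented in the talk: the class name `Master`, the three states "idle", "busy" and "dead", and the policy of re-queueing the task of a failed worker are all assumptions.

```python
from collections import deque

# Illustrative master for a parameter sweep; names and policy are assumptions.

class Master:
    def __init__(self, tasks, workers):
        self.todo = deque(tasks)
        self.state = {w: "idle" for w in workers}  # per-process state table
        self.running = {}                           # worker -> task in flight
        self.results = {}                           # task -> result

    def dispatch(self):
        """Hand one pending task to every idle worker."""
        for worker, status in self.state.items():
            if status == "idle" and self.todo:
                self.running[worker] = self.todo.popleft()
                self.state[worker] = "busy"

    def complete(self, worker, result):
        """Record a result and mark the worker idle again."""
        self.results[self.running.pop(worker)] = result
        self.state[worker] = "idle"

    def fail(self, worker):
        """Mark a worker dead and put its unfinished task back in the queue."""
        if worker in self.running:
            self.todo.appendleft(self.running.pop(worker))
        self.state[worker] = "dead"


m = Master(tasks=[1, 2, 3], workers=["a", "b"])
m.dispatch()            # a gets task 1, b gets task 2
m.fail("b")             # task 2 goes back to the front of the queue
m.complete("a", 10)
m.dispatch()            # a picks up task 2
print(m.state)          # {'a': 'busy', 'b': 'dead'}
```

Because each task of a parameter sweep is independent, losing a worker only costs the task it was running, and the sweep continues with the remaining workers.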

Fagg and Gabriel concluded by saying that FT-MPI currently implements the complete MPI-1.2 specification and some parts of MPI-2. In the future, FT-MPI will continue to advance towards full MPI-2. It is used as the basis for testing new methods in fault-tolerant application development and design. FT-MPI has also contributed to the new Open MPI project.

The first full release of FT-MPI was at SuperComputing 2003. The next release, 1.0.2, is expected at the end of August 2004 and will have more MPI-2 support, better collectives, and a new DDT.

More project information can be obtained at http://icl.cs.utk.edu/ftmpi

Leslie Versweyveld

EnterTheGrid - Primeur

James Stewartstraat 248

1325 JN Almere

The Netherlands

http://EnterTheGrid.com

mailto:primeur@hoise.com

© EnterTheGrid - Primeur Live!