Graham Fagg stated that an application has to be given the possibility to recover from process failures. At the same time, a regular, non-fault-tolerant MPI programme will still run under FT-MPI. He gave a detailed description of the process, noting that fault tolerance for MPI applications is an active research area in which a large number of models and implementations are available.
The semantics of FT-MPI are very close to the current MPI specification, and its design is in the spirit of MPI.
The speaker then classified fault-tolerant message-passing systems as either automatic or non-automatic. The general steps in dealing with fault tolerance are failure detection and notification.
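Failure detection and notification can be illustrated with a simple timeout-based heartbeat detector. The following is a minimal sketch, assuming that every process periodically reports a heartbeat. The class `HeartbeatDetector`, its `timeout` parameter, and the `notify` helper are hypothetical names for illustration and do not describe FT-MPI's actual detection mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Hypothetical timeout-based failure detector (not FT-MPI's mechanism)."""
    timeout: float                                 # seconds of silence before a rank is suspected
    last_seen: dict = field(default_factory=dict)  # rank -> time of its last heartbeat

    def heartbeat(self, rank: int, now: float) -> None:
        # Record that `rank` was alive at time `now`.
        self.last_seen[rank] = now

    def detect(self, now: float) -> list[int]:
        # Detection step: ranks whose last heartbeat is older than the timeout.
        return sorted(r for r, t in self.last_seen.items() if now - t > self.timeout)


def notify(failed: list[int], survivors: list[int]) -> dict[int, list[int]]:
    # Notification step: every surviving rank learns the full set of failed ranks.
    return {r: list(failed) for r in survivors}
```

For example, if ranks 0 to 3 report at time 0.0 and only ranks 0, 1 and 3 report again at time 3.0, then with a timeout of 2.0 a check at time 3.5 suspects rank 2, and `notify` delivers `[2]` to each survivor.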
The FT-MPI communicator modes are:
- Abort: abort the job, as other MPI implementations do
- Blank: leave a hole where the failed process was
- Shrink: re-order the processes to form a contiguous communicator
- Rebuild: re-spawn lost processes and add them to MPI_COMM_WORLD
The communication modes are:
- Reset: ignore and cancel all currently active communications and requests when an error occurs; the user re-posts all operations after recovery
- Continue
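The effect of the four communicator modes on rank layout can be modelled in a few lines. This is a minimal sketch under simplifying assumptions: a communicator is represented as a list indexed by the new rank, `None` marks a hole (Blank), and the string `"new"` marks a re-spawned replacement that keeps the failed process's rank (Rebuild). The function name `recover_communicator` and this representation are hypothetical and are not the FT-MPI API.

```python
def recover_communicator(size, failed, mode):
    """Return the communicator after a failure as a list indexed by new rank.

    Entries are the old rank, None for a hole (BLANK), or "new" for a
    re-spawned replacement (REBUILD). Hypothetical model for illustration.
    """
    failed = set(failed)
    if mode == "ABORT":
        # Behave like a conventional MPI implementation: give up.
        raise RuntimeError(f"process failure(s) {sorted(failed)}: aborting job")
    if mode == "BLANK":
        # Size and rank numbering are kept; failed slots become holes.
        return [None if r in failed else r for r in range(size)]
    if mode == "SHRINK":
        # Failed ranks are dropped; survivors are renumbered 0..n-1.
        return [r for r in range(size) if r not in failed]
    if mode == "REBUILD":
        # Replacements fill the failed slots, so size and ranks are unchanged.
        return ["new" if r in failed else r for r in range(size)]
    raise ValueError(f"unknown mode: {mode}")
```

With five processes and ranks 1 and 3 failed, Blank yields `[0, None, 2, None, 4]`, Shrink yields `[0, 2, 4]` (old rank 4 becomes new rank 2), and Rebuild yields `[0, "new", 2, "new", 4]`.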
Collective communication can be atomic, meaning that either every process succeeds or none does. This is convenient for the user but costly in performance. It can also be non-atomic, meaning that if an error occurs, the outcome of the collective operation has to be analysed.
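The difference between atomic and non-atomic collectives can be seen in a toy broadcast. This minimal sketch assumes a linear broadcast from rank 0 in which a failure stops the operation after `fail_after` non-root ranks have received the value; the function `broadcast` and the `"ERROR"` marker are hypothetical.

```python
def broadcast(value, size, fail_after, atomic):
    """Toy linear broadcast from rank 0 that may be interrupted by a failure.

    fail_after: number of non-root ranks that received the value before the
    failure (None means no failure). Returns rank -> value or "ERROR".
    """
    if fail_after is None:
        # No failure: every rank receives the value.
        return {r: value for r in range(size)}
    if atomic:
        # All-or-nothing: nobody may use a partial result; all report the error.
        return {r: "ERROR" for r in range(size)}
    # Non-atomic: ranks 0..fail_after hold the value and the rest do not, so
    # the application has to inspect each rank's outcome before continuing.
    return {r: value if r <= fail_after else "ERROR" for r in range(size)}
```

The atomic variant is simpler to program against because every rank sees the same outcome; providing that guarantee in a real implementation requires extra agreement among processes, which is where the performance cost comes from.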
The implementation is organised in layers: the user application at the top; the MPI library layer at the MPI level, with derived datatypes, buffer management, message lists, and non-blocking queues; message management with its two datatype engines; and the transports (Myrinet, TCP, and ShMem) at the hardware level.
The architecture offers high-level services that run outside the core of the MPI application. Recovery has three stages: pre-recovery, recovery, and post-recovery. During recovery, the leader (co-ordinator) collects failure information and uses the user's parameters to build a new global store and to spawn replacement processes.
In post-recovery, all processes filter their MPI objects against the agreed state and re-establish the previous communication channels.
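The three recovery stages can be sketched as a single sequential function. This is an illustrative model only, assuming that the lowest surviving rank acts as leader, that the user parameter is a simple flag `spawn_replacements`, and that each process's MPI objects are represented by a list of peer ranks with outstanding requests. `Proc` and `recover` are hypothetical names, and the real FT-MPI protocol is distributed rather than a single function call.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    rank: int
    alive: bool = True
    pending: list = field(default_factory=list)  # peers of outstanding requests (hypothetical)


def recover(procs, spawn_replacements):
    """Pre-recovery, leader-driven recovery, and post-recovery in one pass."""
    # Pre-recovery: identify survivors; the lowest live rank acts as leader.
    survivors = [p for p in procs if p.alive]
    leader = min(p.rank for p in survivors)

    # Recovery: the leader collects failure information and, guided by the
    # user parameter, spawns replacements and builds the new agreed state.
    failed = {p.rank for p in procs if not p.alive}
    spawned = [Proc(r) for r in sorted(failed)] if spawn_replacements else []
    new_procs = sorted(survivors + spawned, key=lambda p: p.rank)
    agreed = {"leader": leader,
              "members": [p.rank for p in new_procs],
              "failed": sorted(failed)}

    # Post-recovery: each process drops requests involving failed peers,
    # since those channels have to be re-established after recovery.
    for p in new_procs:
        p.pending = [peer for peer in p.pending if peer not in failed]
    return agreed, new_procs
```

With `spawn_replacements=True` the membership is restored to its original size; with `False` only the survivors remain, which corresponds roughly to choosing Rebuild or Shrink.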
Performance was compared against non-fault-tolerant MPI using a latency test suite for small messages. Graham Fagg also discussed the PSTSWM and HPL benchmarks.
Edgar Gabriel described a fault-tolerant parallel CG solver, a tightly coupled application that can be used for all symmetric positive-definite matrices stored in the Boeing-Harwell RSA (real symmetric assembled) format.
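For readers unfamiliar with conjugate gradients, a minimal sequential sketch follows. It uses dense Python lists rather than a distributed sparse matrix, and `conjugate_gradient`, `dot` and `matvec` are hypothetical helper names. The comment in `dot` marks where a parallel version needs a global reduction, which is one reason CG is tightly coupled.

```python
def dot(u, v):
    # In a parallel CG this is a global reduction over all processes,
    # so every iteration synchronises everyone (tight coupling).
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                     # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction, A-conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For the 2x2 system with A = [[4, 1], [1, 3]] and b = [1, 2], the iteration converges in two steps to x = [1/11, 7/11].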
If the application is to survive one process failure at a time, a recovery procedure rebuilds the work communicator, recovers the lost data, and resets the iteration counter on each process. If it is to survive two process failures, the lost data x is determined from additional checksums, as in the Reed-Solomon algorithm.
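The data-recovery step can be illustrated with checksum encoding. This sketch assumes each process holds one vector block, and that two checksum vectors are kept: a plain sum c1 and a weighted sum c2 with weight i+1 for block i. One lost block is recovered from c1 alone, and two lost blocks by solving a 2x2 linear system per element. Real Reed-Solomon codes operate over finite fields; the floating-point weighted sums here are a Reed-Solomon-style analogue chosen for illustration, and `encode`, `recover_one` and `recover_two` are hypothetical names.

```python
def encode(blocks):
    """Checksums c1 = sum_i x_i and c2 = sum_i (i+1) * x_i, element-wise."""
    n = len(blocks[0])
    c1 = [sum(b[k] for b in blocks) for k in range(n)]
    c2 = [sum((i + 1) * b[k] for i, b in enumerate(blocks)) for k in range(n)]
    return c1, c2


def recover_one(blocks, lost, c1):
    """Rebuild block `lost` (its entry may be None) from the plain checksum."""
    others = [b for i, b in enumerate(blocks) if i != lost]
    return [c1[k] - sum(b[k] for b in others) for k in range(len(c1))]


def recover_two(blocks, i, j, c1, c2):
    """Rebuild blocks i and j (i != j) from both checksums."""
    xi, xj = [], []
    for k in range(len(c1)):
        live = [(m, b) for m, b in enumerate(blocks) if m not in (i, j)]
        s1 = c1[k] - sum(b[k] for _, b in live)            # x_i + x_j
        s2 = c2[k] - sum((m + 1) * b[k] for m, b in live)  # (i+1) x_i + (j+1) x_j
        a = ((j + 1) * s1 - s2) / (j - i)                  # solve the 2x2 system
        xi.append(a)
        xj.append(s1 - a)
    return xi, xj
```

After the data is rebuilt, the solver would restart from the iteration at which the checksums were last computed, which is why the iteration counter has to be reset on every process.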
Performance results for FT-PCG on an AMD64 cluster were also shown. Another example was a master-slave framework, which is useful for parameter sweeps. Its basic concept is a master that keeps track of the state of each process.
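The master's bookkeeping can be sketched sequentially. This minimal sketch assumes that tasks are independent parameter values, that the "computation" is simply `p * p`, and that failures are injected through a hypothetical `fails_on` mapping. The master records each worker's state, marks a worker dead when its task fails, and re-queues the lost task for a surviving worker. `run_parameter_sweep` is a hypothetical name.

```python
from collections import deque


def run_parameter_sweep(params, workers, fails_on):
    """Master that tracks each worker's state and re-queues work lost to failures.

    fails_on: worker -> parameter on which that worker crashes (fault injection).
    Returns (results, state), with results mapping param -> param * param.
    """
    todo = deque(params)
    state = {w: "idle" for w in workers}   # the master's view of every worker
    results = {}
    while todo:
        alive = [w for w in workers if state[w] != "dead"]
        if not alive:
            raise RuntimeError("all workers failed")
        for w in alive:
            if not todo:
                break
            p = todo.popleft()
            state[w] = ("busy", p)
            if fails_on.get(w) == p:
                state[w] = "dead"          # failure detected: retire the worker
                todo.append(p)             # and hand its task to someone else
            else:
                results[p] = p * p
                state[w] = "idle"
    return results, state
```

Because each task is independent, losing a worker costs only the task it was running, which is what makes this pattern a good fit for parameter sweeps.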
Fagg and Gabriel concluded by saying that FT-MPI currently implements the complete MPI-1.2 standard and parts of MPI-2, and that it will continue to advance towards full MPI-2. It serves as the basis for testing new methods in the development and design of fault-tolerant applications. FT-MPI has also contributed to the new Open MPI project.
The first full release of FT-MPI was at SuperComputing 2003. The next release, 1.0.2, is expected at the end of August 2004 and will bring more MPI-2 support, better collectives, and a new DDT.
More project information can be obtained at http://icl.cs.utk.edu/ftmpi