Tutorials in High Performance Computing in Munich

Munich 18 February 2002 KONWIHR, the Kompetenz-Netzwerk für wissenschaftliches Hoch- und Höchstleistungsrechnen in Bayern, and LRZ, the Leibniz-Rechenzentrum, organise two supercomputing tutorials. OM March 21, 2002: Cache Based Iterative Algorithms and on March 22, 2002: Algorithms for Parallel Computers.

You can apply at the website http://wwwphp.lrz-muenchen.de/kursanmeldung/.

Details for each tutorial:

Cache Based Iterative Algorithms

Lecturer: Prof. Ulrich Rüde, Lehrstuhl für Informatik, Universität Erlangen

Date: March 21, 2002 (Thursday)

Contents: In order to mitigate the effect of the growing gap between the high execution speed of modern RISC CPUs and the comparatively poor main memory performance, computer architectures nowadays comprise several additional levels of smaller and faster cache memories. These cache modules are located physically between the processor and main memory.

Efficient program execution, i.e. high MFLOPS rates, can only be achieved if the codes respect the underlying hierarchical memory design. Unfortunately, today's compilers are still far away from automatically performing code transformations like the ones we apply in order to achieve remarkable speedups. As a consequence, much of this optimization effort is left to the programmer.

In this tutorial, we will first discuss the underlying hardware properties and then present both data layout optimizations, like array padding for example, and data access optimizations, like e.g., loop blocking. The application of these techniques to iterative numerical schemes --- like Gauss-Seidel and multigrid --- can significantly enhance their cache performance and thus reduce their execution times on a variety of machines. We will consider both structured and unstructured grid computations.

Algorithms for Parallel Computers: Classification, Design Principles and Implementation

Lecturer: Prof. Marian Vajtersic Technische Universität München, Institut für Informatik,

Lehrstuhl für Rechentechnik und Rechnerorganisation Date: March 22, 2002 (Friday)

Contents: The efficiency of the exploitation of the computing power of state-of-the-art parallel high-performance computers is heavily dependent on their algorithmic and programming equipment. The development of algorithms for these systems can be seen as a necessary prerequisite to obtain satisfactory performance figures when solving large scientific-computing applications. It is a known fact that for a parallel computer, the difference between the performance of a good and bad algorithm may be easily of a factor 1000 while this gap is much smaller for serial machines.

This course is devoted to the area of parallel algorithms. Its main goal is to present some design principles and selected algorithmic solutions for two of the most frequently used parallel computational models. We will not concern ourselves too closely with the architecture and memory organization of the machine, instead the intention is to extract the algorithmic kernel. Also programming issues are not in the center of our interest when speaking about the implementation, but rather a language-agnostic description of the data distribution and interprocessor communication during the computational process.

Selected algorithms are primarily from the problem area of parallel linear algebra. This is not because of historical reasons (the first parallel algorithms and computers where devoted to the iterative solution of partial differential equations), however the solution of the majority of scientific computing applications requires the use of linear algebra operations.

After some motivational arguments for exploiting parallelism, a brief overview of the parallel computing models will be introduced. Basic construction principles of parallel algorithmics will be the content of the next block. We will show that the design of a parallel algorithm does not necessarily imply straightforward modification of its serial counterpart, but with some intricacy also original parallel methods can be developed.

Discussing the simple problem of a summation of a finite number of vector components, the data-parallel model will be illustrated considering different interconnection topologies. Furthermore, parallelization of Cannon's, Winograd's and Strassen's matrix-multiplication techniques will be presented for this parallel computational model. An efficient approach for histogram evaluation will provide an example for bit-level SIMD (Single Instruction Multiple Data) computing.

The final block brings some algorithms designed for MIMD (Multiple Instruction Multiple Data) parallel computers. We will show how a system of linear algebraic equations with block matrices can be solved by the parallelized LU and matrix-decomposition methods. For a solution of the computationally intensive problem of SVD (Singular Value Decomposition), the two-sided block-Jacobi method will be chosen as a good candidate for MIMD-parallelization. Two approaches will be compared: the first one takes a given prescribed ordering for parallel solving the block-subproblems, while the latter one selects the subproblems in a greedy manner. The implementation results will accompany this discussion. As a concluding example, the application of the parallel Jacobi-based SVD solver will be presented.


Ad Emmen

[News on Advanced IT]   [Calendar]   [Analysis]   [IT in Medicine]