|
The starting point is a discrete problem with N > 350 million grid points, which leads to a system of N linear equations A*u = f, where u is the temperature on the grid and A is a sparse matrix. The objective: the compute time should be independent of the number of processors p if N/p is constant. He then listed the requirements on the number of operations (the complexity): the sequential part is constant and the parallelisable part is at most proportional to N. The communication costs are proportional to N/p. He proposed a multi-level iteration algorithm.
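The scalability objective can be illustrated with a small cost model. This is only a sketch of the stated requirements, not the model used in the talk: the function name `modeled_time` and the coefficients `t_seq`, `c_par` and `c_comm` are hypothetical placeholders, and the only property taken from the summary is the shape of the three terms (constant sequential part, parallel work and communication both proportional to N/p).

```python
def modeled_time(n, p, t_seq=1.0, c_par=1e-6, c_comm=1e-7):
    """Hypothetical run-time model for n grid points on p processors.

    t_seq  : constant sequential part (assumed value)
    c_par  : cost per grid point of the parallelisable work (assumed value)
    c_comm : communication cost per grid point held locally (assumed value)
    """
    local_points = n / p  # grid points per processor
    return t_seq + c_par * local_points + c_comm * local_points


# Weak scaling: doubling n and p together keeps n/p, and so the modeled time, fixed.
t_small = modeled_time(350_000_000, 100)
t_large = modeled_time(700_000_000, 200)
```

Under this model the time depends on n and p only through n/p, which is exactly the condition "compute time independent of p if N/p is constant".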
He then briefly described the Earth Simulator and the matrix-vector product (MVP) on it. The MVP needs 2 loads + 1 store per diagonal, but there is only 1 load/store pipeline. He therefore expects a performance of 1/3 * 8 GFlop/s, that is about 2.7 GFlop/s, with loop unrolling. By comparison, the NEC SX-5 has 1 load + 1 load/store pipeline, which results in 2/3 * 8 GFlop/s, about 5.3 GFlop/s.
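A minimal sketch of a diagonal-wise matrix-vector product and of the performance estimate may help here. It is plain Python for illustration, not the vectorised code discussed in the talk. The storage layout, the function names `dia_matvec` and `expected_rate`, and the simple "pipelines divided by memory operations" model are assumptions chosen to reproduce the arithmetic in the summary.

```python
def dia_matvec(offsets, diagonals, x):
    """Compute y = A*x for a square matrix stored by diagonals.

    Assumed layout: diagonals[k][i] holds A[i][i + offsets[k]];
    entries that would fall outside the matrix are ignored.
    """
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, diagonals):
        lo = max(0, -off)      # first row where column i + off is valid
        hi = min(n, n - off)   # one past the last valid row
        for i in range(lo, hi):
            # The summary counts 2 loads and 1 store per diagonal for this update.
            y[i] += diag[i] * x[i + off]
    return y


def expected_rate(peak_gflops, memory_ops, pipelines):
    """Hypothetical estimate: peak rate scaled by pipelines per memory operation."""
    return peak_gflops * pipelines / memory_ops


earth_simulator = expected_rate(8.0, 3, 1)  # 1 load/store pipeline, about 2.7 GFlop/s
nec_sx5 = expected_rate(8.0, 3, 2)          # 1 load + 1 load/store pipeline, about 5.3 GFlop/s
```

For example, the 3x3 matrix with 2 on the main diagonal and -1 on both neighbouring diagonals is passed as `offsets = [-1, 0, 1]` and `diagonals = [[0, -1, -1], [2, 2, 2], [-1, -1, 0]]`.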
Another topic was the grid distribution: there is a block distribution on p processors, used to achieve scalability, and a stripe distribution. He discussed the pros and cons for different cases. Lutz Gross's summary: "The (nearly) scalable algorithm may not be the fastest! But eventually will be for large p. Should/can we develop efficient code for large-scale computing?" This is still an open question. |