© The HOISe-NM Consortium 1997




TURBOMOLE speeds up quantum chemistry computations on Karlsruhe 256 processor SP2

Mannheim, 21-6-97 TURBOMOLE is a program package for quantum mechanical ab initio computations of the electronic structure of molecules. The program (about 500,000 lines of code, mostly Fortran) was developed for RISC workstations and optimised for cache structures and register reuse. At the Mannheim Supercomputer Seminar, Professor Reinhard Ahlrichs of Theoretical Chemistry at the University of Karlsruhe described porting it to parallel machines such as the new 256-node IBM RS6000/SP in Karlsruhe. Porting does not imply parallelisation of the program: Ahlrichs does not want to repeat the job every 3 years.

The TURBOMOLE package is used in academic and industrial environments. Ahlrichs and his group have computed systems with 300 atoms. He mentioned that they do not parallelise the program by hand for good reason: every 3 to 4 years a new method is introduced, which means that all the parallelisation effort would have to be repeated from scratch. To ease automatic parallelisation, Ahlrichs relies on the data replication and message passing paradigms.

He presented an example comparing two methods whose computing time is distributed differently. With DFT (density functional theory), 87% (82 hours) is spent in classical Coulomb, 11% (10.5 h) in DFT, 1% (0.9 h) in linear algebra and 0.3% (0.3 h) in the rest, which makes a total of 3 days, 22 hours.

The new method, RI-DFT (RI: resolution of the identity), needs 36% (6.6 h) in RI-DFT, 57% (10.5 h) in DFT, 5% (0.9 h) in linear algebra and 2% (0.4 h) for the rest, a total of 18.5 h. Because the computing time is distributed differently, this method requires a different parallelisation strategy.
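Why the changed time distribution matters can be illustrated with Amdahl's law, which the talk summary does not mention and which is used here purely as an illustration. The sketch below takes the hours quoted above and asks, hypothetically, what overall speedup is the best one could hope for if only the largest part of each method were parallelised. The function names are assumptions made for this example.

```python
# Wall-clock hours per program part, as quoted in the article.
dft_hours = {"Coulomb": 82.0, "DFT": 10.5, "linear algebra": 0.9, "rest": 0.3}
ri_dft_hours = {"RI-DFT": 6.6, "DFT": 10.5, "linear algebra": 0.9, "rest": 0.4}


def fractions(hours):
    """Return each part's share of the total runtime."""
    total = sum(hours.values())
    return {part: h / total for part, h in hours.items()}


def amdahl_limit(parallel_fraction):
    """Amdahl's law upper bound on speedup (unlimited processors)
    when only `parallel_fraction` of the runtime is parallelised."""
    return 1.0 / (1.0 - parallel_fraction)


dft_frac = fractions(dft_hours)
ri_frac = fractions(ri_dft_hours)

# Hypothetical scenario: parallelise only the dominant part of each method.
limit_dft = amdahl_limit(dft_frac["Coulomb"])
limit_ri = amdahl_limit(ri_frac["RI-DFT"])

print(f"DFT total:    {sum(dft_hours.values()):.1f} h, "
      f"Coulomb share {dft_frac['Coulomb']:.0%}, speedup limit {limit_dft:.1f}")
print(f"RI-DFT total: {sum(ri_dft_hours.values()):.1f} h, "
      f"RI-DFT share {ri_frac['RI-DFT']:.0%}, speedup limit {limit_ri:.1f}")
```

Under this assumption the bound is roughly 8 for DFT but only about 1.6 for RI-DFT, so with the new method the remaining parts, above all the 10.5 h DFT part, have to be parallelised as well.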

With DFT, SiAl14 needs about 720 minutes of wall clock time on 8 processors, compared with about 400 on 16, 230 on 32 and 130 on 64 processors; the ideal figure would be about 90 minutes on 64 processors. For the same molecule, the RI-DFT method needs 170 minutes on 8 processors, 110 on 16 and about 60 on 32. The speedup is lower with this faster, new method. The efficiency comes close to 60% and can be improved by bigger main memory (300 MB) to reduce paging, more memory per node and a highly parallelised and optimised library for matrix algebra.
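The scaling figures above can be turned into speedup and efficiency numbers with a few lines of arithmetic. The sketch below is an illustration only and assumes the 8-processor run as the baseline, since no single-processor time is quoted. The article does not state the baseline behind its 60% figure, so the values computed here need not match it. The function name `scaling` is made up for this example.

```python
# Wall-clock minutes for SiAl14 by processor count, as quoted in the article.
dft_minutes = {8: 720, 16: 400, 32: 230, 64: 130}
ri_dft_minutes = {8: 170, 16: 110, 32: 60}


def scaling(minutes, base=8):
    """Return (procs, measured, ideal, speedup, efficiency) tuples,
    all relative to the run on `base` processors (an assumption)."""
    t_base = minutes[base]
    rows = []
    for procs in sorted(minutes):
        measured = minutes[procs]
        ideal = t_base * base / procs   # perfect scaling from the baseline
        speedup = t_base / measured     # relative to the baseline run
        efficiency = ideal / measured   # 1.0 would be perfect scaling
        rows.append((procs, measured, ideal, speedup, efficiency))
    return rows


for name, data in (("DFT", dft_minutes), ("RI-DFT", ri_dft_minutes)):
    print(name)
    for procs, measured, ideal, speedup, eff in scaling(data):
        print(f"  {procs:3d} proc.: {measured:4d} min measured, "
              f"{ideal:6.1f} min ideal, speedup {speedup:4.2f}, "
              f"efficiency {eff:.0%}")
```

For DFT on 64 processors the ideal time comes out at 90 minutes, matching the figure quoted above, and the efficiency relative to the 8-processor run is about 69%.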

An important new element of the latest generation of parallel machines, for instance the IBM SP, is their bigger memory. In the past Ahlrichs always computed integrals on the fly, but now it is possible to store important data.

As a heavy supercomputer user, Ahlrichs has the following requirements for large parallel machines:

  • the machine has to run stably
  • the machine should be used for big computations only, e.g. a job with 24 hours wall clock time on 32 nodes should be delivered within 48 hours
  • only qualified access, not hundreds of jobs from users without workstations

Uwe Harms