
EnterTheGrid - PrimeurMonthly

EnterTheGrid - PrimeurMagazine is the premier Grid Computing and Supercomputing information source in the world. With PrimeurMonthly we provide you with a free update of grid computing and supercomputing news and in-depth analysis.

Contents March 2003
BASF connects supercomputers and office PCs with Platform's LSF
Munich, 10 February 2003. Badische Anilin- und Sodafabriken (BASF) has connected its IBM supercomputer, its Linux clusters and a virtual NT cluster of more than 300 office PCs with LSF from Platform Computing, Toronto. This allows scientific computations to run more efficiently and with more automation. It is also a real grid approach.

BASF, Ludwigshafen, is a transnational chemical company that aims to increase and sustain its corporate value through growth and innovation. The company's product range includes high-value chemicals, plastics, colorants and pigments, dispersions, automotive and industrial coatings, agricultural products and fine chemicals as well as crude oil and natural gas.

BASF's approach to integration, known in German as "Verbund", is one of its particular strengths, ensuring cost leadership and a unique competitive advantage. With sales of about EUR 36 billion (about $36 billion) in 2000 and over 90,000 employees, BASF is one of the world's leading chemical companies. BASF acts in accordance with the principles of Sustainable Development.

The department Polymer Research and Polymer Physics (GKP) is BASF's competence centre for physical problems in chemistry. Two of its teams work on molecular modelling. Dr. Erich Hädicke's laboratory carries out mesoscopic modelling with the programme Mesodyn as well as atomistic simulations. Dr. Horst Weiß's lab concentrates on quantum chemistry. His team uses the programmes CPMD and Turbomole to simulate the mechanisms of chemical reactions, for example the catalysis of polymerisation reactions. Both teams have impressive computing equipment.

The Computational Equipment

The numerical simulations require different types of computers. The actual workhorse is the parallel computer IBM RS/6000 SP with 4 SMP (symmetric multiprocessor) nodes, each equipped with 16 Power3 processors running at 375 MHz. The aggregate main memory is 64 GB. The BASF configuration uses a local 33 GB disc per node for scratch space and runs the operating system AIX. This machine is used for high-end, time-critical and very large Turbomole runs.

Since 1997 the teams have also run self-built Linux Beowulf clusters, now in their third generation. One consists of 24 CPUs (AMD MP 1900+) with 24 GByte of memory, the other of 40 CPUs of the same AMD type with 40 GByte.

Additionally, since July 1999 the team has used more than 300 office PCs of the plastics laboratory for quantum chemical computations with Turbomole. Self-written software schedules the jobs on this "leisure-time" cluster. This solution is ideal for "embarrassingly parallel" problems and for high-throughput work that is not time critical.

These heterogeneous computers, 128 Unix (AIX)/Linux CPUs and 300 NT PC CPUs, called for a uniform resource management system.
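
The figure of 128 Unix/Linux CPUs follows from the machines described above. A minimal check in Python, in which the dictionary layout and the cluster labels "A" and "B" are only illustrative and not BASF's own naming:

```python
# CPU counts as given in the article; labels are illustrative assumptions.
unix_linux_cpus = {
    "IBM RS/6000 SP (4 SMP nodes x 16 CPUs)": 4 * 16,
    "Linux Beowulf cluster A": 24,
    "Linux Beowulf cluster B": 40,
}

total_unix_linux = sum(unix_linux_cpus.values())
print(total_unix_linux)  # 128
```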

Some years ago, the team automated the computations. Before that, the scientist built molecule structures interactively, or accessed starting structures and modified them. He chose the computational method and produced the input. After discussing the usage of the computers with his colleagues, he selected an available computer and submitted the job. He also had to copy input and output files by hand between all the different machines and his own workstation. This was very time consuming.

The Automation of the Computational Procedures

First the scientist defines a chemical model and the technical approach. Here a best-practice approach is used: expert knowledge is built into the work flows. So far, however, there is no way to choose the best chemical model automatically. In the job preparation step he uses the directory structure and start structures and produces the inputs.

Then he submits the job, the job is maintained while it runs, and he collects and analyses the results. Horst Weiß defines a work flow as follows: "A work flow in our context is an automated procedure to carry out a full task of modelling. Examples are the geometry optimisation and single point energy, transition state search, the computation of a complete reaction sequence or the generation and pre-optimisation of a whole batch of start structures."

Technically, a work flow is a short shell script of fewer than 20 lines that calls other work flows or Perl programmes. All modules, the programmes that do the actual work, are written in Perl.
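
The idea of short work flows that call modules or other work flows can be sketched as follows. This is not BASF's code, which consists of shell scripts and Perl modules. It is a Python illustration in which every function name and the dictionary-based job state are assumptions made for the example:

```python
from typing import Callable, Dict

# A step takes the current job state and returns an updated copy (assumed design).
Step = Callable[[Dict[str, object]], Dict[str, object]]


def optimise_geometry(state: Dict[str, object]) -> Dict[str, object]:
    """Placeholder for a geometry optimisation module."""
    return {**state, "geometry": "optimised"}


def single_point_energy(state: Dict[str, object]) -> Dict[str, object]:
    """Placeholder for a single point energy module."""
    return {**state, "energy": "computed"}


def record_done(state: Dict[str, object]) -> Dict[str, object]:
    """Placeholder for a final bookkeeping module."""
    return {**state, "done": True}


def make_workflow(*steps: Step) -> Step:
    """Chain steps into one step, so a work flow can call other work flows."""
    def run(state: Dict[str, object]) -> Dict[str, object]:
        for step in steps:
            state = step(state)
        return state
    return run


# A work flow built from two modules ...
opt_and_energy = make_workflow(optimise_geometry, single_point_energy)
# ... and a larger work flow that reuses it as a single step.
full_task = make_workflow(opt_and_energy, record_done)

result = full_task({"structure": "start-structure-001"})
print(result)
```

Because a work flow has the same shape as a module, larger tasks such as a complete reaction sequence could be composed from smaller ones without changing the modules themselves.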

Load Sharing Facility

A central element in this work flow approach is the batch queuing system LSF (Load Sharing Facility) from Platform Computing, Toronto. LSF chooses the appropriate computer, executes and supervises the job, and handles the files. After intensive benchmarking, BASF decided to introduce LSF because of its proven stability, its flexibility, its co-operation with the IBM SP architecture and its well-designed concept.

Now BASF scripts query LSF for the current load of the different hosts. Then they define the resource requirements for the job: the number of processors needed and the appropriate architecture. LSF accepts the job and is used as a pure batch queuing system. Based on the resource requirements, LSF decides which computer is optimal. It puts the job into the queue and transmits it to the executing machine as soon as that machine is available. If several hosts are available, LSF sends the job to the one that is free or least loaded. The BASF and Platform experts extended the system with pre- and post-execution activities.
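
The selection rule described here (filter hosts by architecture and free processors, then prefer the least loaded one) can be illustrated with a small Python sketch. It does not use the LSF API and is not LSF's actual scheduling algorithm. The `Host` fields, the host names and the load values are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    arch: str       # e.g. "aix" or "linux" (assumed labels)
    free_cpus: int
    load: float     # 0.0 = idle, 1.0 = fully loaded


def pick_host(hosts: List[Host], arch: str, cpus: int) -> Optional[Host]:
    """Return the least loaded host that meets the requirements, or None."""
    candidates = [h for h in hosts if h.arch == arch and h.free_cpus >= cpus]
    if not candidates:
        return None  # no suitable host: the job would wait in the queue
    return min(candidates, key=lambda h: h.load)


hosts = [
    Host("sp-node1", "aix", 16, 0.90),
    Host("sp-node2", "aix", 16, 0.35),
    Host("beowulf-a", "linux", 8, 0.10),
]
chosen = pick_host(hosts, arch="aix", cpus=8)
print(chosen.name if chosen else "queued")
```

In this toy data the lightly loaded Linux host is skipped for an AIX job because the architecture requirement is applied before the load comparison.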

In the execution step, some job-specific adaptations are processed before the job starts, e.g. the automatic transfer of the input files from the user's workstation to the LSF scratch directory (pre-execution) and of the output files back (post-execution), as well as the queuing and distribution of the jobs to the hosts when they are free. These tasks in particular, implemented in self-written scripts, relieve the user of the time-consuming copying and transferring of files to the hosts.
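
The pre- and post-execution pattern can be sketched in Python as follows. This is a local illustration only: both directories live on one machine, the "job" is a stand-in function, and all names (`run_with_staging`, `fake_job`, `job.in`, `job.out`) are hypothetical. The BASF scripts and their LSF hooks are not shown in the article:

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_with_staging(workdir: Path, input_name: str, output_name: str,
                     job: Callable[[Path, Path], None]) -> Path:
    """Stage the input to a scratch directory, run the job, copy the output back."""
    scratch = Path(tempfile.mkdtemp(prefix="scratch-"))
    try:
        # Pre-execution: copy the input from the user's directory to scratch.
        shutil.copy2(workdir / input_name, scratch / input_name)
        # Execution: the job reads and writes only inside scratch.
        job(scratch / input_name, scratch / output_name)
        # Post-execution: copy the result back to the user's directory.
        shutil.copy2(scratch / output_name, workdir / output_name)
    finally:
        # Clean up scratch whether or not the job succeeded.
        shutil.rmtree(scratch, ignore_errors=True)
    return workdir / output_name


def fake_job(inp: Path, out: Path) -> None:
    """Stand-in for a real computation: upper-cases the input text."""
    out.write_text(inp.read_text().upper())


user_dir = Path(tempfile.mkdtemp(prefix="user-"))
(user_dir / "job.in").write_text("h2o geometry")
result_path = run_with_staging(user_dir, "job.in", "job.out", fake_job)
print(result_path.read_text())
```

Wrapping the staging in `try`/`finally` means the scratch space is released even when a job fails, which matters on nodes with a limited local disc.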

Grid Computing on Office PCs - Virtual NT Cluster in the Spirit of SETI

The flexibility of LSF allows the BASF PC batch system to be integrated into the work flows with little overhead. For example, twenty possible reactions with 5 parameter variations each result in 100 batch jobs, which can run on the PCs independently of each other and extremely cost-effectively, nearly free of cost.
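
The job count is simply the product of the two dimensions. A short Python sketch, in which the reaction and variation labels are invented placeholders, since the article gives only the counts:

```python
from itertools import product

# Hypothetical labels; only the counts (20 and 5) come from the article.
reactions = [f"reaction-{i:02d}" for i in range(1, 21)]
variations = [f"variation-{j}" for j in range(1, 6)]

# Each combination is an independent batch job.
jobs = [{"reaction": r, "variation": v} for r, v in product(reactions, variations)]
print(len(jobs))  # 100
```

Since no job depends on another, the list can be handed to the PCs in any order, which is what makes this kind of workload a good fit for the office PC cluster.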

This approach frees the big systems for jobs whose results are needed very quickly. In this case LSF sends the jobs to a predefined "interface host" and starts the BASF-developed job management system there. The reason is the privacy of the PCs. An LSF solution would be available, but is too expensive. Even with these mass jobs, LSF delivers the job outputs directly into the user's directory.

Currently the scientists use 324 PCs: 160 PCs in the job class "big" (>400 MHz), 86 PCs in the class "medium" (350 MHz) and 78 PCs in the class "small" (233 MHz).
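
The three class sizes add up to the stated total. The Python sketch below checks this and shows one possible way to map a clock speed to a job class. The article gives only one representative clock speed per class, so the exact cut-offs in `job_class` are assumptions:

```python
# PC counts per job class, as reported in the article.
pcs_per_class = {"big": 160, "medium": 86, "small": 78}
total_pcs = sum(pcs_per_class.values())
print(total_pcs)  # 324


def job_class(clock_mhz: int) -> str:
    """Map a PC clock speed to a job class (cut-offs are assumptions)."""
    if clock_mhz > 400:
        return "big"
    if clock_mhz >= 350:
        return "medium"
    return "small"
```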

LSF and the IBM SP

In December 2000 BASF installed its IBM SP with the new SP2 Switch and the IBM product LoadLeveler. In BASF's heterogeneous computer environment, however, the platform-independent batch queuing system LSF is used rather than LoadLeveler. A team of specialists from IBM and Platform Computing integrated LSF into the SP, including new components such as the Switch. Besides batch queuing and accounting, they implemented access to IBM's Switch tables, in co-operation with the IBM benchmark centre in Poughkeepsie, which answered questions concerning the APIs. Now the system administrator need not install and configure LoadLeveler.

With LSF, the BASF chemists can view their heterogeneous computing platforms (IBM SP, Beowulf clusters and more than 300 office PCs) as one computing system. Depending on the load, LSF automatically applies different strategies: it chooses a less optimal computer if the best one is overloaded. Another important aspect is the automatic copying of files from the user's workstation to the compute host and of the results back, without the manual operations that were needed before the LSF implementation. This frees the user from copying, managing and deleting files between the host and the workstation.

The Result

"The batch queuing system LSF proved successful in a heterogeneous computing environment. It freed the scientist from unnecessary manual work. I estimate that we saved about 20% of the working hours per employee. Now he can concentrate on his scientific and research work. This also results in better computer utilisation: on the IBM SP it grew from 70% to 95%. On the Linux cluster it is lower, but in total 70%. LSF is a building block for job scheduling and automation. Since October 1999 we have processed more than 150,000 standard jobs with 5 people. Compared to 2 years ago, we now process 10 times more structures because of less interactive work and automatic, more robust and faster methods and work flows", says Dr. Horst Weiß, head of the Quantum Chemistry Lab, BASF. "We have further demand and are now evaluating the accounting system in LSF for a better calculation of the individual projects."

Uwe Harms

EnterTheGrid - PrimeurMagazine

James Stewartstraat 248

1325 JN Almere

The Netherlands

http://EnterTheGrid.com

mailto:primeur@hoise.com

© EnterTheGrid - PrimeurMonthly