
EnterTheGrid - Primeur Live!

EnterTheGrid - Primeur is the premier Grid and Supercomputing information source in the world.

News digest 25 June 2004
ISC2004 AMD Award honours TeraGyroid experiment
Heidelberg 25 June 2004

Stephen Pickles, one of the winners of the ISC2004 AMD Award, presented the TeraGyroid experiment. The project, funded by EPSRC in the UK and NSF in the USA, joined the UK e-Science Grid and the U.S. TeraGrid. It is an application from RealityGrid, a three-and-a-half-year UK e-Science project, and includes work exhibited at SC 2003 and SC Global in November 2003. The project received the go-ahead from TeraGrid in mid-September 2003, and EPSRC funding was approved later. The main objective was to deliver high-impact science that could not have been performed without the combined resources of the U.S. and UK Grids.

The TeraGyroid experiment studies defect dynamics in liquid crystalline surfactant systems using lattice-Boltzmann methods, and the project featured the world's largest lattice-Boltzmann simulation. TRICEPS, the HPC Challenge component of this work, stands for Transcontinental RealityGrids for Interactive Collaborative Exploration of Parameter Space. It was honoured as the "most innovative data-intensive application" at SC 2003.

Stephen Pickles explained that a RealityGrid generalises the concept of a Reality Centre across a network of computational, visualization and data resources managed by Grid middleware. It optimises the scientific discovery process by integrating simulation, visualization and data from experimental facilities using real-time data mining. It builds on and extends the functionality of a Data Grid. New middleware issues arise because a RealityGrid must address the synchronicity of resources and their interactions.

The Lattice-Boltzmann 3D (LB3D) code is written in Fortran90 and parallelized using MPI. It scales linearly on all available resources and uses the parallel data format PHDF5. A single run can produce hundreds of gigabytes to terabytes of data. According to the speaker, such simulations require supercomputers, and analysing their output requires high-end visualization hardware and parallel rendering software.
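For readers unfamiliar with the method, the following is a minimal sketch of the basic collide-and-stream update of a lattice-Boltzmann scheme. It assumes a single-phase D2Q9 BGK model on a tiny periodic 2D grid, chosen only for brevity. This is not LB3D, which is a 3D, multi-component Fortran90/MPI code. The function names, the relaxation time `tau` and the grid size are illustrative assumptions.

```python
# Minimal single-phase D2Q9 lattice-Boltzmann (BGK) sketch on a periodic grid.
# Illustrative only: LB3D itself is a 3D, multi-component Fortran90/MPI code.

# Lattice velocities and weights of the standard D2Q9 model
# (rest particle, four axis directions, four diagonals).
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def macroscopic(fs):
    """Density and velocity recovered from the nine populations at one site."""
    rho = sum(fs)
    ux = sum(f * cx for f, (cx, _) in zip(fs, C)) / rho
    uy = sum(f * cy for f, (_, cy) in zip(fs, C)) / rho
    return rho, ux, uy


def step(f, nx, ny, tau):
    """One collide-and-stream update; f[x][y] holds the 9 populations of a site."""
    # Collision: relax every site towards its local equilibrium (BGK).
    post = [[None] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            fs = f[x][y]
            feq = equilibrium(*macroscopic(fs))
            post[x][y] = [fi - (fi - fe) / tau for fi, fe in zip(fs, feq)]
    # Streaming: move each population one site along its velocity (periodic).
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for i, (cx, cy) in enumerate(C):
                new[(x + cx) % nx][(y + cy) % ny][i] = post[x][y][i]
    return new


def total_mass(f):
    """Sum of all populations on the grid (conserved by collide-and-stream)."""
    return sum(sum(fs) for col in f for fs in col)


if __name__ == "__main__":
    nx, ny, tau = 8, 8, 0.8
    # Start at rest with a small density bump in the middle of the grid.
    f = [[equilibrium(1.1 if (x, y) == (4, 4) else 1.0, 0.0, 0.0)
          for y in range(ny)] for x in range(nx)]
    m0 = total_mass(f)
    for _ in range(20):
        f = step(f, nx, ny, tau)
    print(f"mass before: {m0:.6f}, after: {total_mass(f):.6f}")
```

Production codes such as LB3D distribute this same local update over MPI processes. Because each site needs only its neighbours' populations, the domain decomposes naturally, which is consistent with the linear scaling reported above.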

LB3D is instrumented for steering using the RealityGrid steering library. Stephen Pickles explained that malleable checkpoint/restart functionality allows users to "rewind" simulations and to migrate jobs across architectures at run time. Steering reduces storage requirements because the user can adapt the data-dumping frequency. It also saves CPU time, because users do not have to wait for a job to finish if they can already see that nothing relevant is happening. Instead of "task farming", parameter searches are accelerated by "steering" through the parameter space. Analysis time is significantly reduced because less irrelevant data is produced. The speaker showed how this was applied to the study of the gyroid mesophase of amphiphilic liquid crystals at unprecedented space and time scales.
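The savings described above come from a simulation that checks for user commands between time steps. Below is a minimal sketch of that polling pattern. The command names (`set_dump_interval`, `checkpoint`, `stop`), the queue standing in for the network transport, and the `run` function are all hypothetical. They are not the RealityGrid steering library API.

```python
from collections import deque


def run(max_steps, commands, dump_interval=10):
    """Toy simulation loop that polls for steering commands every step.

    commands: deque of (step, name, value) tuples, a hypothetical stand-in
    for messages arriving from a remote steering client.
    Returns the steps at which data were dumped and the checkpoints taken.
    """
    state = 0.0
    dumps, checkpoints = [], []
    for step in range(1, max_steps + 1):
        stop = False
        # Apply every command that is due at (or before) this step.
        while commands and commands[0][0] <= step:
            _, name, value = commands.popleft()
            if name == "set_dump_interval":
                dump_interval = value            # fewer dumps -> less storage
            elif name == "checkpoint":
                checkpoints.append((step, state))  # snapshot to "rewind" to later
            elif name == "stop":
                stop = True                      # nothing interesting: save CPU time
        if stop:
            break
        state += 0.1                             # stand-in for one solver time step
        if step % dump_interval == 0:
            dumps.append(step)
    return dumps, checkpoints


if __name__ == "__main__":
    cmds = deque([(25, "set_dump_interval", 50),
                  (60, "checkpoint", None),
                  (120, "stop", None)])
    dumps, cps = run(1000, cmds)
    print(dumps, [s for s, _ in cps])
```

In this example the user reduces the dump frequency at step 25, takes a checkpoint at step 60 and stops the run at step 120 instead of letting it reach step 1000. As a result, only four data dumps are written.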

The aim is to use the federated resources of the US TeraGrid and the UK e-Science Grid to accelerate the scientific process. The strategy is to map out parameter space using a large number of independent "small" simulations and to monitor their behaviour using on-line visualization, so that parameters for high-resolution simulations can be identified. Selected simulations were then used for long-time studies. As Stephen Pickles explained, all simulations were monitored and steered by a geographically distributed team of computational scientists at four different sites.
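The two-stage strategy runs many cheap exploratory simulations and then commits expensive resources to a few of them. It can be sketched as a coarse sweep followed by a selection step. Everything in this sketch is hypothetical. `cheap_run` is a made-up toy score standing in for a small simulation that scientists would judge by watching its visualization, and the parameters `a` and `b` are placeholders rather than the actual TeraGyroid surfactant parameters.

```python
import itertools


def cheap_run(a, b):
    """Hypothetical stand-in for a small, low-resolution simulation.

    Returns a toy 'interestingness' score. In TeraGyroid this judgement was
    made by scientists monitoring on-line visualizations, not by a formula.
    """
    return -((a - 0.3) ** 2 + (b - 0.7) ** 2)


def select_candidates(a_values, b_values, keep=3):
    """Map out parameter space with independent small runs; keep the best few."""
    results = []
    for a, b in itertools.product(a_values, b_values):
        # Each run is independent, so these could execute on different sites.
        results.append(((a, b), cheap_run(a, b)))
    results.sort(key=lambda r: r[1], reverse=True)
    return [params for params, _ in results[:keep]]


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]
    # The selected parameter sets would seed high-resolution, long-time runs.
    print(select_candidates(grid, grid, keep=3))
```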

Checkpoint/restart can itself be steered: by restarting from different checkpoints, the team can explore different branches of the tree and obtain different results. However, give computational scientists unrestrained access to a Grid and they will quickly litter it with input, output and checkpoint files. To keep track of them, the team stores checkpoint metadata in a checkpoint tree service. Migrating a running job involves contacting 15 services on up to 12 hosts, so users need friendly tools to manage it; the team built a "wizard" in Qt on top of the Globus scripts.
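A checkpoint tree records which checkpoint each run was restarted from, so the branches of a steered simulation can be traced back to their origin. Below is a minimal in-memory sketch of that data structure. The class names, fields and file locations are hypothetical. They say nothing about how the RealityGrid checkpoint tree service was actually implemented, which was a Grid service rather than a local object.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Checkpoint:
    """Metadata for one checkpoint (all fields are hypothetical)."""
    cp_id: str
    parent: Optional[str]      # checkpoint this run was restarted from
    step: int                  # simulation time step when it was taken
    params: Dict[str, float]   # steered parameters in force at that point
    location: str              # where the checkpoint files are stored


@dataclass
class CheckpointTree:
    nodes: Dict[str, Checkpoint] = field(default_factory=dict)

    def add(self, cp: Checkpoint) -> None:
        """Register a checkpoint; its parent must already be known."""
        if cp.parent is not None and cp.parent not in self.nodes:
            raise KeyError(f"unknown parent checkpoint {cp.parent!r}")
        self.nodes[cp.cp_id] = cp

    def children(self, cp_id: str) -> List[str]:
        """Branches that were restarted from the given checkpoint."""
        return [c.cp_id for c in self.nodes.values() if c.parent == cp_id]

    def lineage(self, cp_id: str) -> List[str]:
        """Path from the root checkpoint down to cp_id."""
        path = []
        current: Optional[str] = cp_id
        while current is not None:
            path.append(current)
            current = self.nodes[current].parent
        return list(reversed(path))


if __name__ == "__main__":
    tree = CheckpointTree()
    tree.add(Checkpoint("root", None, 0, {"g": 0.1}, "hostA:/scratch/root"))
    tree.add(Checkpoint("a", "root", 100, {"g": 0.1}, "hostA:/scratch/a"))
    # "Rewind" to root and branch with a different parameter value.
    tree.add(Checkpoint("b", "root", 100, {"g": 0.2}, "hostB:/scratch/b"))
    tree.add(Checkpoint("a2", "a", 200, {"g": 0.1}, "hostC:/scratch/a2"))
    print(tree.children("root"), tree.lineage("a2"))
```

Keeping the parent link, parameters and file location together is what allows scattered checkpoint files on many hosts to be found again and reused.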

The visualised output is streamed to a distributed set of collaborators at Access Grid nodes across the USA and UK, who also interact with the simulations. One of RealityGrid's central aims was to provide Grid-enabled collaboratories, and this has now been realised. As the speaker outlined, this distributed collaborative activity further accelerates the discovery of new scientific phenomena hidden in terabytes of data.

As the speaker told the audience, the TeraGyroid experiment used the computational power of more than 6000 processors, SGI visualization technology at multiple sites, a service registry in Manchester, and Access Grid nodes at Boston University and in Manchester, Martlesham and Phoenix. The project generated 20 TB of science data, of which 2 TB were moved to long-term storage for ongoing analysis.

The software infrastructure consisted of several Globus Toolkit versions, Access Grid 2.0, OGSI Lite, the RealityGrid steering library and toolkit, VTK 4.0 for visualization, malleable checkpoint/restart, and port-forwarding software from EPCC.

What the team learnt from the experiment was that porting and deploying applications is still hard, since the middleware offers little help. Users still have to find out where the compilers and libraries are, where they may write files and what the retention policy is. Stephen Pickles found that the evolution, expansion and federation of Grids requires too much co-ordination, and that the deployment overhead must be lower to engage end-users.

Support is also needed for third-party file transfers involving "dual-homed" systems. A client on host A wants to move file B from host C to host D using network E, but the authentication confuses the host identity with the network address. Moreover, GridFTP does not support different addresses for the control and data channels, and the situation is even harder when host A cannot contact the other hosts by their network E addresses.

There is still room for improvement in file-transfer performance, Stephen Pickles stated, and network debugging is hard. Researchers run into trouble when compute nodes are not directly connected to the Internet, which is the common situation on large clusters. Work-arounds are possible, but they are imperfect because they require port forwarding and process pinning.

The team is also still waiting for advance reservation and co-allocation. What they need is compute, visualization and network resources, together with Access Grid rooms, virtual venues and node operators ("nodops"), all available simultaneously at times that suit the researchers and without involving the system administrators.
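The port-forwarding work-around works like this: a process on a gateway node, which can reach both the outside world and the internal compute nodes, relays TCP traffic to a simulation that is not directly reachable. Below is a minimal sketch of such a relay, assuming plain TCP and one thread per direction. The function names and defaults are hypothetical. This is not the EPCC port-forwarding software used in the experiment, and it has no authentication or encryption.

```python
import socket
import threading


def _pipe(src, dst):
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # connection reset: stop relaying in this direction
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # propagate EOF to the far side
        except OSError:
            pass


def start_forwarder(target_host, target_port, listen_host="127.0.0.1", listen_port=0):
    """Relay TCP connections from (listen_host, listen_port) to the target.

    Meant to run on a gateway node that can reach the internal compute node.
    Returns (listening socket, actually bound port).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((listen_host, listen_port))
    server.listen()

    def accept_loop():
        while True:
            try:
                client, _ = server.accept()
            except OSError:
                break  # listening socket was closed
            try:
                upstream = socket.create_connection((target_host, target_port))
            except OSError:
                client.close()  # target unreachable: drop this client
                continue
            # One thread per direction so traffic can flow both ways at once.
            threading.Thread(target=_pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=_pipe, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, server.getsockname()[1]
```

Every steering or visualization connection then depends on this extra hop being up and correctly configured. That is one reason such work-arounds are described as imperfect.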

Still, the TeraGyroid experiment made it possible to apply bleeding-edge distributed collaborative technology to deliver new scientific results in materials science. Its unprecedented scale enabled the researchers to investigate phenomena that had previously been out of reach, concluded Stephen Pickles.

http://www.realitygrid.org/TeraGyroid.html

Leslie Versweyveld

EnterTheGrid - Primeur

James Stewartstraat 248

1325 JN Almere

The Netherlands

http://EnterTheGrid.com

mailto:primeur@hoise.com

© EnterTheGrid - Primeur Live!