The aim of DataGrid, as stated by Mr. Gagliardi, is to deliver the tools needed by the next generation of scientific exploration, which requires intensive computation and the analysis of enormous distributed databases. The need to share very large data volumes and computing resources among communities spread across the world is felt in many scientific disciplines, including particle physics, biology, and earth sciences.
Distribution, however, is difficult because of the inevitable diversity of the resources, the dispersion of the scientific users, the size of the databases, and the limited speed of networks. To solve these problems, DataGrid is developing the necessary computing tools in collaboration with the leading centres of excellence in this domain, as the speaker explained.
The contract signed with the European Union on December 29, 2000, provides 9.8 million euros of funding over three years to deploy the DataGrid project. The six principal partners are CERN, the European Particle Physics Laboratory in Switzerland; CNRS, the French National Centre for Scientific Research; ESA/ESRIN, the Centre of the European Space Agency in Italy; INFN, the Italian National Institute for Nuclear Physics; NIKHEF, the Dutch National Institute for Nuclear Physics and High Energy Physics; and PPARC, the British Particle Physics and Astronomy Research Council.
The other project partners are ITC-irst in Italy; the University of Helsinki in Finland; the Swedish Research Council; the Konrad Zuse Centre for Information Technology in Berlin (ZIB) and the University of Heidelberg, both in Germany; the Atomic Energy Commission (CEA) in France; IFAE in Spain; CNR in Italy; CESNET in the Czech Republic; KNMI and SARA in the Netherlands; and MTA-SZTAKI in Hungary.
Three industrial companies are also associated with the project, in partnership with all these organisations spread across Europe: Communication et Systèmes in France; Datamat in Italy; and IBM-UK in the United Kingdom.
Mr. Gagliardi summarised how a computing grid works as follows: when a user submits a query, it is analysed to estimate the computing resources required to fulfil the request. An arbitration algorithm then searches the grid for available resources and assigns them to the task. In the same way, the data needed to execute the task are retrieved and, if necessary, transmitted to the site where the computing cycles are available. DataGrid is developing all the tools involved: those that analyse the queries, track at every instant which resources are available on the grid, carry out the arbitration and the data transmission, follow up the operations, and record any errors.
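The principle described by Mr. Gagliardi can be illustrated with a minimal sketch. It is not DataGrid's actual software: the names `Site`, `Job`, and `schedule`, as well as the selection rule (prefer a site that already holds the data, then the one with the most free processors), are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical grid site: free processors plus the datasets stored there."""
    name: str
    free_cpus: int
    datasets: set = field(default_factory=set)


@dataclass
class Job:
    """A hypothetical user request, already analysed into its resource needs."""
    name: str
    cpus_needed: int
    dataset: str


def schedule(job, sites, log):
    """Assign a job to a site, copying data only when needed. Returns the site name or None."""
    # Step 1: find the sites that currently have enough free resources.
    candidates = [s for s in sites if s.free_cpus >= job.cpus_needed]
    if not candidates:
        log.append(f"ERROR {job.name}: no site has {job.cpus_needed} free CPUs")
        return None

    # Step 2: arbitrate (assumed rule): prefer a site that already holds
    # the data, then the site with the most free CPUs.
    best = max(candidates, key=lambda s: (job.dataset in s.datasets, s.free_cpus))

    # Step 3: retrieve and transmit the data if the chosen site lacks it.
    if job.dataset not in best.datasets:
        source = next((s for s in sites if job.dataset in s.datasets), None)
        if source is None:
            log.append(f"ERROR {job.name}: dataset {job.dataset} not found on the grid")
            return None
        best.datasets.add(job.dataset)
        log.append(f"TRANSFER {job.dataset}: {source.name} -> {best.name}")

    # Step 4: reserve the resources and record the operation for follow-up.
    best.free_cpus -= job.cpus_needed
    log.append(f"ASSIGN {job.name} -> {best.name}")
    return best.name
```

Calling `schedule` with a `Job`, a list of `Site` objects, and an empty list as the log returns the name of the chosen site and leaves a record of each transfer, assignment, or error in the log, mirroring the follow-up and error registration mentioned above.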
Another critical activity performed by DataGrid consists of monitoring the computer networks in order to determine the best quality of service and to anticipate transmission problems, among other things. According to Mr. Gagliardi, DataGrid will use Géant, the new European high-performance network infrastructure, to conduct the various tests.
DataGrid has selected applications from three different scientific domains to validate the computing grid. CERN, represented by Mr. Gagliardi, is piloting a high energy physics application that must find a way to store and process the enormous flow of data produced by the experiments at the new LHC particle collider, currently under construction at CERN. The aim of DataGrid is to demonstrate the feasibility of this approach by implementing an integrated computing service with access to the data used in the LHC experiments across an international distributed environment.
The analysis of data on the ozone layer, an application launched by ESA, poses a similar challenge because of the growing number of data sources fed by the many dedicated sensors. Finally, the biology applications, initiated by CNRS, provide transparent access to the many international databases and to the distributed processing of genome decoding algorithms and medical images through an access portal on the Internet.
All these applications are currently being deployed on a test-bed under the responsibility of CNRS, using the first integrated version of the DataGrid tool. Five major sites are taking part in these initial tests at the moment, in France, Switzerland, Italy, the Netherlands, and the United Kingdom. The certification authorities, which are qualified to issue the digital passports that allow users to move freely across the grid, have been established in a co-ordinated manner in each country, with mutually recognised identification.
Mr. Gagliardi announced that the large volume of data generated by the first tests will make it possible to deploy the applications on some forty European sites during 2002, while a second series of tests with additional and more complete features will be prepared for autumn 2002.
The DataGrid partners are keen to maintain a lasting link between academic research and eventual commercial opportunities. Various industrial companies are therefore kept informed about the progress of DataGrid at regularly organised forums. As Mr. Gagliardi pointed out, this should ensure that the new concepts take firm root both in industries seeking to increase their productivity by implementing their own grid and in those sectors that wish to commercialise grid services.