Data, data, metadata

Bristol, 03 December 98. A few decades ago, all the data used to write a scientific article were included in the article itself. Technical data were available in reference books that summarised all the basic data needed in a specific engineering area. As Kerstin Kleese pointed out at the RCI conference, we have come a long way since then. Specialised metadata centres, such as the one at the British Central Laboratory of the Research Councils (CLRC) for which Kleese is responsible, now host several specialised data centres.

The data sets themselves grow exponentially. For instance, the European Centre for Medium-Range Weather Forecasts (ECMWF) doubles the amount of data it stores every 18 months. Among the data centres hosted at CLRC are the World Data Centre for Solar-Terrestrial Physics, the Space Data Centre and the national British Atmospheric Data Centre.
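To give a feel for what an 18-month doubling period means, the short sketch below computes the growth factor 2^(t/18) after t months. It assumes the doubling time stays constant, which the ECMWF figure quoted above does not guarantee; the function name growth_factor is purely illustrative.

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Factor by which stored volume grows after `months`,
    assuming (hypothetically) a constant doubling period."""
    return 2.0 ** (months / doubling_months)

# Ten years (120 months) at one doubling every 18 months:
print(round(growth_factor(120), 1))  # prints 101.6
```

Under that assumption, a data store doubling every 18 months grows roughly a hundredfold in a decade.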

It is becoming increasingly important for these data centres to support data discovery and interpretation. They are therefore adding data analysis and visualisation software to their services and are moving towards interactive virtual reality.

Because of the sheer volume of data, the first phase of data selection and data reduction in particular has to be done as close to the data as possible, and the data centres are increasingly providing systems and tools for this. Parallel software to support I/O is, or will be, based mainly on MPI-2 or OpenMP; HPF and Co-array Fortran lack performance.

Not only is the amount of data growing rapidly; its quality is improving as well. The data centres therefore have to start asking themselves what to do with older data: keep it or destroy it. According to Kleese, this is a growing problem. More information is available on the CLRC data management web site.


Ad Emmen