EnterTheGrid - Primeur Live!

EnterTheGrid - Primeur is the premier Grid and Supercomputing information source in the world. With Primeur Live! it brings you live reports from Europe's main Supercomputing and Grid events.

Issue 25 June 2003
On-line Science: the World Wide Telescope as a prototype for the new computational science
Heidelberg, 25 June 2003. In his keynote talk, Jim Gray from Microsoft Research discussed new aspects of handling and analysing data held in databases and huge files. He covered the evolution of X-Info, the World Wide Telescope as an archetype, and data mining of the Sloan Digital Sky Survey.

First he presented the evolution of science, which he divided into four stages:

Observational Science

  • Scientist gathers data by direct observation
  • Scientist analyses data

Analytical Science

  • Scientist builds analytical model
  • Scientist makes predictions.

Computational Science

  • Scientist simulates analytical model
  • Scientist validates model and makes predictions

Data Exploration Science

  • Data captured by instruments, or generated by simulator
  • Processed by software
  • Placed in a database / files
  • Scientist analyses database / files

Information is growing into an avalanche because better observational instruments and better simulations produce huge amounts of data. He gave some examples: a turbulence simulation produces 100 TB, which the scientist then has to mine for information. Another extreme example is CERN, where the LHC will generate 1 GB/s, which sums up to 10 PB per year.

The next generation of data analysis looks for needles in haystacks, for example the Higgs particle. The haystacks themselves are, for example, dark matter and dark energy. Global statistics scale poorly: correlation functions cost N², likelihood techniques N³. As data and computers grow at the same rate, we can only keep up with algorithms that scale as N log N. He presented a way out: one has to discard the notion of optimal (data is fuzzy, answers are approximate) and cannot assume infinite computational resources or memory. Solving these problems requires a combination of statistics and computer science.
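
The scaling argument can be made concrete with a small sketch. The example below is an illustration and is not taken from the talk: it counts pairs of points on a line that lie within a distance r of each other, first with a naive double loop that costs N², then with a sort plus one binary search per point, which costs N log N. The function names and the one-dimensional, integer-coordinate setting are assumptions chosen for brevity.

```python
from bisect import bisect_right

def count_pairs_naive(xs, r):
    """Count pairs (i < j) with |xs[i] - xs[j]| <= r by testing every pair: O(N^2)."""
    n = len(xs)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if abs(xs[i] - xs[j]) <= r:
                count += 1
    return count

def count_pairs_sorted(xs, r):
    """Same count using one sort and one binary search per point: O(N log N)."""
    s = sorted(xs)
    count = 0
    for i, x in enumerate(s):
        # Index just past the last point whose value is <= x + r.
        hi = bisect_right(s, x + r)
        # Points at positions i+1 .. hi-1 lie within r of x.
        count += hi - (i + 1)
    return count
```

Both functions return the same count for integer coordinates and r >= 0; only the second remains practical as N grows, which is the sense in which one can "keep up" with N log N.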

Another important issue is data access. It is hitting a wall, as FTP and GREP are no longer adequate. One can GREP 1 MB in a second, 1 TB in 2 days, 1 PB in 3 years, and a petabyte means ~5,000 disks.

At some point one therefore needs indices to limit the search, together with parallel data search and analysis.
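
As a rough illustration of why an index limits the search, the sketch below compares a GREP-style scan, which reads every record, with a lookup in a prebuilt index that touches only the matching entries. The record layout and the function names are hypothetical and do not come from the talk.

```python
from collections import defaultdict

def scan(records, word):
    """GREP-style search: read every record and test it; cost grows with data size."""
    return [i for i, text in enumerate(records) if word in text.split()]

def build_index(records):
    """Build an inverted index in one pass: word -> list of record numbers."""
    index = defaultdict(list)
    for i, text in enumerate(records):
        for w in set(text.split()):
            index[w].append(i)
    return index

def lookup(index, word):
    """Indexed search: a single dictionary probe instead of a full scan."""
    return index.get(word, [])
```

Building the index costs one full pass over the data, which pays off when the same archive is queried many times.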

Smart data (active databases) allow one to take the analysis to the data and do all data manipulation in the database. One can build custom procedures and functions in the database and use integrated tools. Jim Gray proposed using clever data structures (trees, cubes) and fast approximate heuristic algorithms, and taking the cost of computation into account: the best result in a given time, given our computing resources.
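
A minimal sketch of taking the analysis to the data, using Python's built-in sqlite3 module as a stand-in for the active databases discussed in the talk. The table, its columns, and the `color_index` function are hypothetical; the point is only that the function is evaluated inside the SQL query, so the query returns the few matching rows rather than the whole table.

```python
import sqlite3

def color_index(g, r):
    """Hypothetical per-object quantity: the difference of two magnitudes."""
    return g - r

def red_objects(rows, threshold):
    """Load rows into an in-memory database and filter them with a function called from SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, g REAL, r REAL)")
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)
    # Register the Python function so SQL statements can call it by name.
    conn.create_function("color_index", 2, color_index)
    cur = conn.execute(
        "SELECT id FROM objects WHERE color_index(g, r) > ? ORDER BY id",
        (threshold,),
    )
    ids = [row[0] for row in cur]
    conn.close()
    return ids
```

In a production archive the custom function would more likely be a stored procedure on the database server; sqlite3 is used here only because it runs without any setup.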

Data Federations of Web Services

The massive datasets live near their owners: near the instrument's software pipeline, near the applications, and near data knowledge and curation. Supercomputer centres thus become superdata centres. Each archive can be published as a web service, so that scientists get "personalised" extracts and have uniform access to multiple archives. Web services can be the key. A web server, given a URL plus parameters, returns a web page (often dynamic). A web service, given an XML document (a SOAP message), returns an XML document.
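
The XML-in, XML-out contract of a web service can be sketched without any network code. In the example below, a hypothetical archive service is modelled as a plain function that accepts an XML request document and returns an XML response document. The archive contents and the element and attribute names are invented for illustration and do not follow SOAP or any real archive's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive contents: (object id, right ascension, declination) in degrees.
ARCHIVE = [("a1", 10.0, 5.0), ("a2", 10.5, 5.2), ("a3", 50.0, -3.0)]

def archive_service(request_xml):
    """Take an XML query document and return an XML result document."""
    req = ET.fromstring(request_xml)
    ra_min = float(req.get("raMin"))
    ra_max = float(req.get("raMax"))
    resp = ET.Element("result")
    for obj_id, ra, dec in ARCHIVE:
        if ra_min <= ra <= ra_max:
            # One child element per matching object.
            ET.SubElement(resp, "object", id=obj_id, ra=str(ra), dec=str(dec))
    return ET.tostring(resp, encoding="unicode")
```

A caller that federates several archives could send the same request document to each service and merge the returned documents, which is one way to read the uniform access described above.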

Then he discussed the World Wide Telescope as a virtual observatory. The Internet is the world's best telescope: it has data on every part of the sky in every measured spectral band (optical, x-ray, radio), and it is up when you are up. He discussed some of the questions researchers ask in astronomy and presented example searches for stars, some easy and some requiring great effort.

He closed with a call to action:

  • If you do data visualisation: we need you (and we know it).
  • If you do databases: here is some data you can practise on.
  • If you do distributed systems: here is a federation you can practise on.
  • If you do data mining: here is a dataset to test your algorithms.
  • If you do astronomy educational outreach: here is a tool for you.

http://research.microsoft.com/~gray

http://www.astro.caltech.edu/nvoconf/

http://www.voforum.org

http://www.sdss.jhu.edu/ScienceArchive/sxqt/sxQT/Example_Queries.html


Uwe Harms

EnterTheGrid - Primeur

James Stewartstraat 248

1325 JN Almere

The Netherlands

http://EnterTheGrid.com

mailto:primeur@hoise.com

© EnterTheGrid - Primeur Live!