Brain data and the knowledge Grid

Amsterdam 06 March 2001 - Many data and information sources that could be useful to neuroscience are available. However, they are stored in different formats and were put together by experts in specialised domains, so they are incompatible not only in format but also in the terms and wording they use. As Bertram Ludaescher, an NPACI representative from the San Diego Supercomputer Center, explained at the First Global Grid Forum Conference, the solution could be a mediator service that understands the user's question, translates it into the language of each information source, and combines the results into a composite answer.


Data are stored on storage media. To turn such data sources, for instance the CCMIR database from Caltech neuro-surgery, into a usable knowledge management infrastructure, one first has to build a data management infrastructure. The elements used at NPACI, which build on top of each other, include:

  • Data management infrastructure (DICE/NPACI)
  • MIX - Mediation in XML
  • MCAT information discovery - a meta-data catalogue
  • SRB data handling
  • HPSS storage

On the Brain Data Grid, resources are maintained by several scientific groups. They create data products, combine them into collections, add meta-data to them, and make them available for sharing. The size and packaging of the data are in many cases still a technical challenge, as is the heterogeneity of data types, storage technologies and transport mechanisms.

Making these resources available as a Data Grid also raises security problems. Data management and storage have to be handled in a distributed way, and a request-brokerage service has to be implemented on top of the meta-data. Resource-sharing problems must be solved as well, where resources include not only the data but also the network and the available compute cycles.
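To make the idea of request brokerage on meta-data concrete, here is a minimal sketch under stated assumptions: a toy in-memory catalogue in which groups register data products with descriptive attributes and physical replicas, users discover products by attribute, and a broker picks one replica. The class and method names, the "site:/path" replica format and the sample entries are all hypothetical; this is not the MCAT or SRB interface.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One logical data object: descriptive meta-data plus its physical replicas."""
    logical_name: str
    attributes: dict
    replicas: list = field(default_factory=list)  # hypothetical "site:/path" strings


class MetadataCatalogue:
    """Toy meta-data catalogue (hypothetical; not the MCAT or SRB API)."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_name, attributes, replicas):
        """A group publishes a data product with its meta-data and storage locations."""
        self._entries[logical_name] = CatalogueEntry(
            logical_name, dict(attributes), list(replicas))

    def find(self, **criteria):
        """Discovery: logical names whose meta-data match every given attribute."""
        return sorted(
            name for name, entry in self._entries.items()
            if all(entry.attributes.get(k) == v for k, v in criteria.items()))

    def broker(self, logical_name, preferred_site=None):
        """Request brokerage: choose one replica, preferring a given site if possible."""
        replicas = self._entries[logical_name].replicas
        if not replicas:
            raise LookupError(f"no replica registered for {logical_name}")
        for replica in replicas:
            if replica.split(":", 1)[0] == preferred_site:
                return replica
        return replicas[0]  # fall back to the first registered copy


catalogue = MetadataCatalogue()
catalogue.register("mouse-cerebellum-001", {"species": "mouse", "modality": "EM"},
                   ["siteA:/archive/001.tif", "siteB:/cache/001.tif"])
catalogue.register("rat-hippocampus-007", {"species": "rat", "modality": "EM"},
                   ["siteA:/archive/007.tif"])
print(catalogue.find(modality="EM"))  # ['mouse-cerebellum-001', 'rat-hippocampus-007']
print(catalogue.broker("mouse-cerebellum-001", preferred_site="siteB"))  # siteB:/cache/001.tif
```

Separating the logical name from its replicas is what lets the broker move or copy data between storage systems without users having to change their queries.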

On top of this data management infrastructure, the so-called Data Grid, one can then build a knowledge-based Grid infrastructure that extracts semantic information from the underlying data sources. NPACI implements a mediation architecture for this. In a client-server fashion, a user sends XML queries to the mediator, which searches the databases and composes an answer.
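The first thing a mediator does with an incoming XML query is turn it into something it can reason about. The sketch below assumes a made-up query format, a `<query>` root with one `<concept>` and any number of `<where attribute="..." value="..."/>` constraints; the actual NPACI/MIX query language is not described in this article, so both the format and `parse_query` are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_query(xml_text):
    """Turn a (hypothetical) XML query into a concept name and attribute constraints."""
    root = ET.fromstring(xml_text)
    if root.tag != "query":
        raise ValueError("expected a <query> root element")
    concept = root.findtext("concept")
    if not concept or not concept.strip():
        raise ValueError("a query needs a non-empty <concept> element")
    # Each <where attribute="..." value="..."/> adds one equality constraint.
    constraints = {w.get("attribute"): w.get("value") for w in root.findall("where")}
    return concept.strip(), constraints


query = ("<query><concept>protein</concept>"
         '<where attribute="location" value="dendrite"/></query>')
print(parse_query(query))  # ('protein', {'location': 'dendrite'})
```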

Integrating different services over the Grid is called federation of services. Each service is reached through a wrapper that handles the traffic to and from it, so all services look the same to the mediator, and all wrappers present the same interface. To the information server, in turn, the wrapper looks like just another of its query modules. The wrapper thus takes care of the translation.
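The wrapper arrangement is essentially the adapter pattern. In the minimal sketch below, every wrapper implements one `query(concept, constraints)` method and returns records in a shared vocabulary, so the mediator's federation loop contains no source-specific code. The two sources, their native formats and field names, and the placeholder protein names are hypothetical illustrations.

```python
from abc import ABC, abstractmethod


def matches(record, constraints):
    """True if the record satisfies every attribute=value constraint."""
    return all(record.get(k) == v for k, v in constraints.items())


class Wrapper(ABC):
    """The single interface the mediator sees; subclasses hide each source's native format."""
    name = "abstract"

    @abstractmethod
    def query(self, concept, constraints):
        """Return matching records as dicts in the mediator's common vocabulary."""


class ProteinLocationWrapper(Wrapper):
    """Hypothetical source that stores rows as 'protein;compartment' strings."""
    name = "protein-locations"

    def __init__(self, rows):
        self._rows = rows

    def query(self, concept, constraints):
        if concept != "protein":
            return []  # this source knows nothing about other concepts
        records = ({"protein": p, "location": c}
                   for p, c in (row.split(";") for row in self._rows))
        return [r for r in records if matches(r, constraints)]


class NeuronWrapper(Wrapper):
    """Hypothetical source keyed by cell type, using its own field name 'brain_area'."""
    name = "neuron-morphology"

    def __init__(self, table):
        self._table = table

    def query(self, concept, constraints):
        if concept != "neuron":
            return []
        # Translate the source's 'brain_area' into the shared field 'region'.
        records = ({"cell_type": t, "region": props["brain_area"]}
                   for t, props in self._table.items())
        return [r for r in records if matches(r, constraints)]


def federated_query(wrappers, concept, constraints):
    """Mediator side: loop over wrappers without any source-specific code."""
    return [{"source": w.name, **r}
            for w in wrappers for r in w.query(concept, constraints)]


sources = [ProteinLocationWrapper(["protein_A;dendrite", "protein_B;spine"]),
           NeuronWrapper({"Purkinje cell": {"brain_area": "cerebellum"}})]
print(federated_query(sources, "protein", {"location": "spine"}))
# [{'source': 'protein-locations', 'protein': 'protein_B', 'location': 'spine'}]
```

Adding a new database then means writing one new wrapper class; the mediator itself does not change.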

When, for instance, a medical researcher has a specific question, he or she sends it to the mediator. The mediator breaks the question down into queries to the different databases it can access through wrappers, for instance a protein location database or databases with neuro-information.
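Decomposition and composition can be sketched with two stand-in sources. Assume the question is "which proteins occur in cells of a given brain region?" and that no single database can answer it: a neuro-information source knows which cell types occur in a region, and a protein location source knows which proteins occur in a cell type. Both lookup tables and the placeholder protein names are hypothetical; the mediator issues one sub-query to the first source, fans out sub-queries to the second, and joins the results on the cell type.

```python
def neuro_db(region):
    """Hypothetical neuro-information source: cell types found in a brain region."""
    table = {"cerebellum": ["Purkinje cell", "granule cell"],
             "hippocampus": ["pyramidal cell"]}
    return table.get(region, [])


def protein_db(cell_type):
    """Hypothetical protein location source: (protein, compartment) pairs per cell type."""
    table = {"Purkinje cell": [("protein_A", "dendrite")],
             "granule cell": [("protein_B", "axon")]}
    return table.get(cell_type, [])


def mediator(region):
    """Answer 'which proteins occur in cells of this region?' by decomposing it."""
    answer = []
    for cell_type in neuro_db(region):                         # sub-query 1
        for protein, compartment in protein_db(cell_type):     # one sub-query per cell type
            answer.append({"region": region, "cell_type": cell_type,  # compose the answer
                           "protein": protein, "compartment": compartment})
    return answer


for row in mediator("cerebellum"):
    print(row)
```

The researcher only ever sees `mediator`; which sources were consulted, and in what order, stays hidden behind it.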

Simple as it sounds, in practice there are many problems. Each sub-domain, for instance, uses its own objects, terms and words, and different ones may have the same meaning. The mediator has to know about all of them. A knowledge-based semantic mediator must therefore integrate information from different worlds: it should join compatible terms and be able to make complex associations.
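Joining compatible terms can be illustrated with a synonym table that maps each sub-domain's local term onto a shared concept before records are joined. This is a minimal sketch assuming a hand-written table; the domains, terms and records are hypothetical, and a real semantic mediator would derive such mappings from an ontology rather than a dictionary.

```python
# Hypothetical synonym table: (sub-domain, local term) -> shared concept.
# Keying on the sub-domain matters because the same word can mean different
# things in different fields.
SYNONYMS = {
    ("anatomy", "Purkinje neuron"): "Purkinje cell",
    ("anatomy", "Purkinje cell"): "Purkinje cell",
    ("physiology", "PC"): "Purkinje cell",
}


def canonical(domain, term):
    """Map a sub-domain's local term to the shared concept, or None if unknown."""
    return SYNONYMS.get((domain, term))


def join_compatible(records_a, domain_a, records_b, domain_b, key):
    """Join two record lists whose `key` fields use different vocabularies."""
    index = {}
    for rec in records_b:
        concept = canonical(domain_b, rec[key])
        if concept is not None:  # unknown terms are skipped; a real mediator would report them
            index.setdefault(concept, []).append(rec)
    joined = []
    for rec in records_a:
        concept = canonical(domain_a, rec[key])
        for match in index.get(concept, []):
            joined.append({**match, **rec, key: concept})
    return joined


anatomy = [{"cell": "Purkinje neuron", "region": "cerebellum"}]
physiology = [{"cell": "PC", "recorded": True}]
print(join_compatible(anatomy, "anatomy", physiology, "physiology", "cell"))
# [{'cell': 'Purkinje cell', 'recorded': True, 'region': 'cerebellum'}]
```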

Many information sources contain unstated integrity constraints. They remain unstated because everyone in the specific field knows them and will never cross the border. Experts from related fields, however, may have difficulties with them. Hence they must be made explicit for the mediator.
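Making a constraint explicit can be as simple as writing it down as a named check that the mediator runs on every incoming record. The sketch below assumes that representation; the two constraints and the field names `depth_um`, `compartment` and `cell_class` are hypothetical examples of "obvious" rules, not constraints taken from any NPACI source.

```python
# Hypothetical constraints that domain experts take for granted, written down
# as named predicates so the mediator can check records from any source.
CONSTRAINTS = {
    "depth is non-negative": lambda r: r.get("depth_um", 0) >= 0,
    "dendrites only on neurons": lambda r: not (
        r.get("compartment") == "dendrite" and r.get("cell_class") != "neuron"),
}


def violations(record, constraints=CONSTRAINTS):
    """Names of all constraints the record breaks (empty list if consistent)."""
    return [name for name, check in constraints.items() if not check(record)]


def filter_consistent(records, constraints=CONSTRAINTS):
    """Split records into consistent ones and (record, violated names) pairs."""
    kept, rejected = [], []
    for r in records:
        bad = violations(r, constraints)
        if bad:
            rejected.append((r, bad))
        else:
            kept.append(r)
    return kept, rejected


ok = {"compartment": "dendrite", "cell_class": "neuron", "depth_um": 120}
odd = {"compartment": "dendrite", "cell_class": "astrocyte", "depth_um": -5}
kept, rejected = filter_consistent([ok, odd])
print(len(kept), rejected[0][1])  # 1 ['depth is non-negative', 'dendrites only on neurons']
```

Reporting which constraint failed, rather than silently dropping the record, is what helps a researcher from a neighbouring field understand why an answer was excluded.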

The approach is to use XML objects that serve as classes in a complete theory, which also supports queries with if-then rules. The tool used for this is Domain MAP. The sources are registered with the mediator server, and a Domain MAP registry provides the concepts of the knowledge domain space and the ontology. These techniques will later also be integrated into larger applications such as tele-microscopy.
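The combination of a concept map, if-then rules and registered sources can be sketched as forward chaining over (subject, relation, object) facts. This does not reproduce the Domain MAP format or engine, which the article does not describe; the facts, the two rules (part_of is transitive, and parts of a class are also parts of its subclasses), the source names and `sources_for` are all hypothetical. A query for a concept then reaches every source registered at that concept or at anything derived to be part of it.

```python
# Hypothetical domain map: (subject, relation, object) facts between concepts.
FACTS = {
    ("Purkinje cell", "is_a", "neuron"),
    ("dendrite", "part_of", "neuron"),
    ("spine", "part_of", "dendrite"),
}

# Hypothetical registry: which sources hold data about which concept.
REGISTRY = {"spine": ["spine-density-db"], "dendrite": ["dendrite-morphology-db"]}


def transitive_part_of(facts):
    """If A part_of B and B part_of C, then A part_of C."""
    return {(a, "part_of", c) for (a, r1, b) in facts if r1 == "part_of"
            for (b2, r2, c) in facts if r2 == "part_of" and b2 == b}


def inherit_parts(facts):
    """If X part_of N and S is_a N, then X part_of S."""
    return {(x, "part_of", s) for (x, r1, n) in facts if r1 == "part_of"
            for (s, r2, n2) in facts if r2 == "is_a" and n2 == n}


RULES = [transitive_part_of, inherit_parts]


def saturate(facts, rules):
    """Apply the if-then rules repeatedly until no new fact appears (forward chaining)."""
    facts = set(facts)
    while True:
        new = set().union(*(rule(facts) for rule in rules)) - facts
        if not new:
            return facts
        facts |= new


def sources_for(concept, facts, registry):
    """Sources registered at the concept or at any concept derived to be part of it."""
    closure = saturate(facts, RULES)
    related = {concept} | {a for (a, r, b) in closure if r == "part_of" and b == concept}
    return sorted(s for c in related for s in registry.get(c, []))


print(sources_for("Purkinje cell", FACTS, REGISTRY))
# ['dendrite-morphology-db', 'spine-density-db']
```

No source registered anything under "Purkinje cell" directly; both answers come from facts the rules derived.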

Ad Emmen

