The Grid is not only about high-performance computing, as Mr. Jean-Raoul Scherrer from the Swiss Institute of Bioinformatics explained at the Grid workshop organised by the European Commission in Brussels in March 2001. Hundreds of large medical and bio-molecular databases and information sources already exist that could be made more accessible through Grid technologies. "If it is allowed somehow to consider the Grid as an offspring of the HPCC action, one large part of the use of such an infrastructure should be devoted to Education, Research and Health (ERH) dealing with co-ordinated resource sharing and problem solving in dynamic, multi-institutional organisations", Mr. Scherrer noted.
The information on the World Wide Web is unstructured. Many distributed, multimedia and multilingual tools have been developed to help users search for useful information, such as subject hierarchies, general search engines, browsers, and search assistants. These tools, however, fall short, mainly in terms of precision, multilingual indexing, and distribution. That is why the Swiss have developed MARVIN, a multi-agent indexer. Starting from a set of Web pages, MARVIN follows links to new documents and retains the documents that are relevant to a given domain. Currently, MARVIN is implemented for medical information by Health on the Net (HON), and consists of a set of agents running in parallel which download, filter, and index the Web pages.
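The link-following and filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MARVIN's actual code: the names `focused_crawl`, `toy_fetch` and `mentions_diabetes` are hypothetical, the "Web" is a small in-memory dictionary, and the sketch assumes that links are followed from every page, relevant or not, which the article does not specify.

```python
from collections import deque

def focused_crawl(seeds, fetch, is_relevant):
    """Follow links breadth-first from the seed pages and keep relevant ones.

    fetch(url) must return (text, links); is_relevant(text) returns a bool.
    Both are supplied by the caller, so the sketch needs no network access.
    """
    queue = deque(seeds)
    seen = set(seeds)          # pages already queued, to avoid revisiting
    kept = []                  # relevant pages, in the order they were visited
    while queue:
        url = queue.popleft()
        text, links = fetch(url)
        if is_relevant(text):
            kept.append(url)
        # Assumption: links are followed even from pages judged irrelevant.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return kept

# Hypothetical three-page "Web" used only for illustration.
TOY_WEB = {
    "a": ("diabetes treatment overview", ["b", "c"]),
    "b": ("football results", ["a"]),
    "c": ("insulin therapy for diabetes", []),
}

def toy_fetch(url):
    return TOY_WEB[url]

def mentions_diabetes(text):
    return "diabetes" in text

relevant_pages = focused_crawl(["a"], toy_fetch, mentions_diabetes)
```

In a real indexer the relevance test would be far richer than a single keyword, and `fetch` would download and parse actual Web pages.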
The indexing can be performed by several agents running in parallel, even distributed over a cluster of workstations. The agents co-operate in order to synchronise their activities. As an example of co-operation, before filtering and indexing a page, an agent checks that the page has not already been analysed and is not currently being indexed by another agent. The agent broadcasts a message to all other agents and starts analysing the page only if the other agents approve the action. Agents can be specialised, for instance in analysing documents from a given Internet domain. When agents are run on remote workstations, the global index can therefore be distributed over several local indexes.
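The check-before-indexing rule can be illustrated with a small single-process simulation in Python. It is a sketch under assumptions, not the protocol MARVIN actually uses: the class `Agent` and its methods are hypothetical, the "broadcast" is a direct method call on each peer, and the case of two agents claiming the same page at the same instant (which a real distributed system would have to resolve) is left out.

```python
class Agent:
    """Hypothetical indexing agent that asks its peers before taking a page."""

    def __init__(self, name):
        self.name = name
        self.peers = []            # the other agents, set by connect()
        self.in_progress = set()   # pages this agent is indexing right now
        self.indexed = set()       # pages this agent has finished

    def approves(self, url):
        """Answer a peer's request: OK only if this agent never took the page."""
        return url not in self.in_progress and url not in self.indexed

    def try_claim(self, url):
        """Broadcast the request to all peers; start only if every one agrees."""
        if not self.approves(url):
            return False
        if all(peer.approves(url) for peer in self.peers):
            self.in_progress.add(url)
            return True
        return False

    def finish(self, url):
        """Mark a page as fully indexed."""
        self.in_progress.discard(url)
        self.indexed.add(url)

def connect(agents):
    """Make every agent aware of all the others."""
    for agent in agents:
        agent.peers = [other for other in agents if other is not agent]

a1, a2 = Agent("a1"), Agent("a2")
connect([a1, a2])
page = "http://example.org/page"       # placeholder URL
first = a1.try_claim(page)             # no one has the page yet
second = a2.try_claim(page)            # a1 is currently indexing it
a1.finish(page)
third = a2.try_claim(page)             # a1 has already analysed it
```

Here the first claim succeeds and the two later ones are refused, because each agent keeps only its own local record and answers peers from it, which matches the idea of a global index spread over several local indexes.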
MARVIN uses a purpose-built medical dictionary of 20,000 words, as well as the 33,000 Medical Subject Headings (MeSH) from the U.S. National Library of Medicine, to automatically retrieve the non-reviewed sites. Each word has been given a weight describing its relevance and specificity in documents on health and medicine. The weights have been determined by a statistical evaluation of all the words contained in a pre-selected set of 1,000 medical documents.
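How a weighted dictionary might be used to decide whether a page is medical can be sketched as follows. The article does not give MARVIN's scoring formula, so everything here is an assumption made for illustration: the four-word `WEIGHTS` table and its values, the averaging of weights over all words in the text, and the `threshold` of 0.2 are all hypothetical.

```python
import re

# Hypothetical weights; the real dictionary holds about 20,000 words.
WEIGHTS = {"insulin": 0.9, "diabetes": 0.8, "patient": 0.4, "treatment": 0.3}

def tokenize(text):
    """Split text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def medical_score(text, weights=WEIGHTS):
    """Average dictionary weight per word; unknown words count as zero."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(weights.get(token, 0.0) for token in tokens) / len(tokens)

def is_medical(text, threshold=0.2):
    """Assumed decision rule: keep the page if its score reaches the threshold."""
    return medical_score(text) >= threshold

medical_page = "Insulin treatment for diabetes"
other_page = "Football results from Saturday"
keep_medical = is_medical(medical_page)
keep_other = is_medical(other_page)
```

Averaging over the page length is one simple way to stop long pages from scoring highly merely because they contain many words; other normalisations would serve equally well in a sketch like this.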
MARVIN stores the selected documents in a database which users can query, for example through MedHunt, HON's own medical search engine. MARVIN is also applied to a variety of scientific domains, such as molecular biology and 2D electrophoresis, constantly feeding and updating the different databases. The MedHunt global database is organised in four categories: a general one including all the health and medical Web sites, and three others dedicated to hospitals, support groups, and conferences respectively. MedHunt also translates queries into eight languages simultaneously and offers a pre-formatted search for each translation.
According to Mr. Scherrer, being able to ask questions such as "How does a disease tissue compare with healthy tissue?", based on data acquired from large databases all around the world, will enormously increase our understanding of biology and bring us closer to understanding function. For more information and details, visit the Health on the Net Web site.