Frequent users of large databases and retrieval systems know that the search for relevant information can be very time-consuming, not only because enormous quantities of data are stored every day but also because maintaining the logical and hierarchical relationships between the data is far from trivial. In March 1998, five partners started a two-year Esprit-funded project to tackle the problem of effective information retrieval from medical databases by parallelizing the process on a scalable High Performance Architecture (HPA). The team has the strong ambition to develop a metacomputing search machine named CIPRESS, which stands for Complex Information Patterns Retrieval with a parallel distributed processing knowledge Engine Search System.
The Internet and medical databases, such as Medline, DrugLine and CANCERLIT, are good examples of vast sources of information. Today, it has become extremely difficult for professionals to select high-quality data of the required relevance in a reasonable amount of time. In order to assist health care providers in their search for specific data about diseases, drugs, diagnoses, therapies and even market information, the CIPRESS project team has based the development of the Knowledge Engine on the two principles of association and learning. The super search engine will be able to sort out the required data by detecting factual relations in documents and reports and by building up a hierarchical keyword dictionary that keeps track of previous contexts.
The starting point for the project is an existing high performance database engine for general free text retrieval. Incorporated in a parallelized HPA structure, the initial system will be extended with an analyser that converts the collected data into a standard hierarchical format for the generation of mutual associations. The user will be supported in the search by a highly sophisticated neural network that guides the retrieval process via a keyword dictionary, looking for terms which are actually present in the documents. The neural network is responsible for the smooth monitoring of both the associative task of the analyser and the dynamic interaction of the user with the system. Both the analyser and the neural network are being developed by Search&Find Technology, a Swedish company focusing on the design and commercialisation of search engines for databases.
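The idea of a keyword dictionary that only proposes terms actually present in the documents can be illustrated with a minimal sketch. It is not the CIPRESS analyser or its neural network: plain co-occurrence counting is swapped in for the learning component, and the function names (`build_associations`, `suggest`), the stopword list and the sample sentences are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations
import re

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = frozenset({"the", "a", "of", "and", "is", "in", "for"})


def tokenize(text):
    """Return the set of lowercase alphabetic terms found in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_associations(documents, stopwords=STOPWORDS):
    """Count how often two terms occur together in the same document."""
    assoc = defaultdict(Counter)
    for doc in documents:
        terms = sorted(tokenize(doc) - stopwords)
        for a, b in combinations(terms, 2):
            assoc[a][b] += 1
            assoc[b][a] += 1
    return assoc


def suggest(assoc, term, limit=3):
    """Propose related terms, most frequent first, ties broken alphabetically."""
    related = assoc.get(term.lower(), Counter())
    ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:limit]]


docs = [
    "Aspirin treats fever",
    "Aspirin treats headache",
    "Insulin treats diabetes",
]
associations = build_associations(docs)
print(suggest(associations, "aspirin"))
```

Because every suggestion comes from the association table, a term that never occurs in the collection yields no suggestions at all, which mirrors the requirement that the dictionary only guides the user towards terms present in the documents.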
The third component is a user interface that facilitates remote access to the CIPRESS services, located on the various servers, and exploits the resources of the different processors in the best possible way. The Italian company Aleph Informatica SRL, specialized in distributed object-oriented technologies, will build the platform-independent Java-based user interface, using the standard TCP/IP (Transmission Control Protocol/Internet Protocol) protocols for communication between client and server. Aleph will also take care of the synchronization between the interface and the database, so that the user's requests are dynamically adjusted to the functionality the server has available at a particular time.
To transform the CIPRESS Knowledge Engine into a metacomputing search system, high performance computing and networking (HPCN) techniques will have to be applied at three levels. First, the search engine has to be able to explore all textual and numerical databases in a parallel and distributed way to establish a library of logically structured and easy-to-search information, containing items like documents, paragraphs, sentences, words, numbers, records, and fields. The second level involves the analyser, which has to reformat the metadata for the building and maintenance of associations in the library. The third level is the neural network, which requires continuous updating and simulation of the learning process.
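The first of these levels, exploring several databases in parallel and merging the results, can be sketched as follows. This is only an illustration under stated assumptions: the two in-memory "databases", their contents and the function names are hypothetical, and a local thread pool stands in for the distributed processors of a real HPA.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for separate databases.
DATABASES = {
    "db_a": ["Aspirin reduces fever.", "Insulin regulates blood sugar."],
    "db_b": ["Fever is a common symptom.", "Aspirin may irritate the stomach."],
}


def search_one(name, documents, term):
    """Scan one database and return (database, position, text) for each hit."""
    term = term.lower()
    return [
        (name, i, doc)
        for i, doc in enumerate(documents)
        if term in doc.lower()
    ]


def parallel_search(databases, term, max_workers=4):
    """Search all databases concurrently and merge the hits.

    Futures are collected in database-name order, so the merged result
    is deterministic regardless of which worker finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(search_one, name, docs, term)
            for name, docs in sorted(databases.items())
        ]
        hits = []
        for future in futures:
            hits.extend(future.result())
    return hits


for hit in parallel_search(DATABASES, "aspirin"):
    print(hit)
```

Each hit carries the database name and the position of the item, a very small version of the structured library of documents, sentences and records described above.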
The CIPRESS team has selected the medical sector as the primary field for implementation, since a very large market for sophisticated information services already exists there. In fact, the Karolinska Institute, a medical research institute in Sweden renowned for awarding the Nobel Prize in Physiology or Medicine, will participate in the project, both as developer of the neural network and as an end-user specialized in virtual medical libraries. The Knowledge Engine will be made available through commercial services for doctors, pharmacists, and scientific and industrial researchers. In this regard, the Italian company Arakne and HealthGate Europe, the European subsidiary of the US HealthGate Data Corporation, will play a major role as online content providers of medical and health information services.
In the long run, the CIPRESS Knowledge Engine will be used in various other domains too, such as industrial and military intelligence, science, finance, and even marketing, as a powerful solution for information management and retrieval. For more details about the further development of this project, we refer readers to the CIPRESS home page.