The scientific community now has access to a large and rapidly growing number of online bioinformatics resources. In biomedical research in particular, researchers keep producing databases, software and other resources that should speed up scientific progress. However, discovering, locating and learning how to use new applications has a cost, especially in terms of time, that most researchers cannot afford. For this reason, existing resources need to be organized to make these search tasks as straightforward as possible.
Led by Professor Víctor Maojo, a team of researchers from the GIB at the UPM's Facultad de Informática (Guillermo de la Calle, Miguel García-Remesal, Diana de la Iglesia and Stefano Chiesa) has developed an innovative methodology designed to discover, retrieve and automatically classify bioinformatics resources from specialized scientific literature. The resulting index of resources is freely available via the web application hosted at the server.
The methodology is based on natural language processing and artificial intelligence techniques used to retrieve and automatically classify key information contained in scientific articles, primarily abstracts. Each article is analysed morphologically, syntactically and semantically in search of a set of predefined patterns. These patterns make it possible to identify, without user intervention, each resource's name, functionality, access URL and, in some cases, its inputs and outputs.
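As a rough illustration of pattern-based extraction, the sketch below pulls a resource name, its type and its access URL out of an abstract. It is a simplified stand-in, not the published method: plain regular expressions replace the morphological, syntactic and semantic analysis described above, and the patterns, the `extract_resource` function and the sample abstract are all hypothetical.

```python
import re

# Hypothetical surface patterns (assumptions for illustration only).
# The published method uses deeper linguistic analysis than regexes.
URL_PATTERN = re.compile(r"https?://[^\s,;()]+")
NAME_PATTERN = re.compile(
    r"\b([A-Z][A-Za-z0-9]+),? (?:is )?an? (?:[a-z-]+ )*?"
    r"(database|tool|server|program|package)\b"
)


def extract_resource(abstract):
    """Return the first resource name, type and URL found, or None for each."""
    name_match = NAME_PATTERN.search(abstract)
    url_match = URL_PATTERN.search(abstract)
    return {
        "name": name_match.group(1) if name_match else None,
        "kind": name_match.group(2) if name_match else None,
        # Drop sentence punctuation that trails the URL.
        "url": url_match.group(0).rstrip(".,") if url_match else None,
    }


# Made-up abstract, used only to exercise the patterns.
SAMPLE = (
    "We present ExampleDB, a curated database of protein motifs. "
    "It is freely available at http://example.org/exampledb."
)
print(extract_resource(SAMPLE))
```

A real system would need many more patterns and a way to rank competing candidates; this sketch simply returns the first match.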
Additionally, the resources are classified along two dimensions: (i) the application domain, e.g. DNA or proteins, and (ii) the category (functionality/type) of the resource, e.g. alignment, database or annotation. For classification, the application uses a specially designed taxonomy of domains and categories based on other existing taxonomies, for example, the Bioinformatics Links Directory (BLD).
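The two-dimensional classification can be sketched with a simple keyword lookup. This is a minimal sketch under stated assumptions: the miniature taxonomy, the keyword sets and the `classify` function are hypothetical, and keyword counting is a swapped-in technique, since the article does not say how the real classifier assigns labels.

```python
import re

# Hypothetical miniature taxonomy (assumption). The real taxonomy is
# larger and derived from existing ones such as BLD.
DOMAIN_KEYWORDS = {
    "DNA": {"dna", "genome", "nucleotide"},
    "proteins": {"protein", "proteins", "peptide"},
}
CATEGORY_KEYWORDS = {
    "alignment": {"alignment", "align", "aligning"},
    "database": {"database", "repository"},
    "annotation": {"annotation", "annotate"},
}


def _best_label(tokens, keyword_map):
    """Pick the label whose keywords overlap most with the tokens."""
    scores = {label: len(tokens & kws) for label, kws in keyword_map.items()}
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score > 0 else None


def classify(description):
    """Return a (domain, category) pair; either may be None if nothing matches."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    return (
        _best_label(tokens, DOMAIN_KEYWORDS),
        _best_label(tokens, CATEGORY_KEYWORDS),
    )


print(classify("A tool for multiple alignment of protein sequences"))
```

Keeping the two dimensions in separate maps means a resource can be assigned a domain even when no category matches, and vice versa.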
To validate the methodology, the UPM group ran a preliminary experiment on 400 articles indexed in the ISI Web of Knowledge. The group ran a search with the "bioinformatics resources" string and selected the 392 most relevant articles by impact factor. The other eight articles were unrelated to bioinformatics resources and were included as a control group to verify the robustness of the method. A total of 376 resource names were automatically retrieved from this set of articles. This amounts to a success rate of almost 95 percent.
Additionally, a web application based on web services has been built so that the scientific community can access the index and search resources by name, category and domain. The key advantage of this method over existing resource indexes is that the index is created and updated automatically. As it is a general-purpose methodology, it is being applied as part of the European ACTION-Grid project, the first European Grid Computing, Biomedical Informatics and Nano-informatics Initiative, co-ordinated by Professor Víctor Maojo.
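Searching the index by name, category and domain can be pictured with a small in-memory example. This is only a sketch: the `Resource` record, the `search` function and the sample entries are hypothetical and do not reflect the actual web services interface, which the article does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical index entry (assumed fields, not the real schema)."""
    name: str
    domain: str
    category: str
    url: str


def search(index, name=None, domain=None, category=None):
    """Return entries matching every criterion that was supplied."""
    results = []
    for resource in index:
        # Name matching is a case-insensitive substring test.
        if name is not None and name.lower() not in resource.name.lower():
            continue
        if domain is not None and resource.domain != domain:
            continue
        if category is not None and resource.category != category:
            continue
        results.append(resource)
    return results


# Made-up entries, used only to exercise the search.
INDEX = [
    Resource("ExampleAlign", "proteins", "alignment", "http://example.org/align"),
    Resource("ExampleDB", "DNA", "database", "http://example.org/db"),
]

print([r.name for r in search(INDEX, domain="DNA")])
```

Criteria left as `None` are ignored, so the same function covers searches by any combination of the three fields.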
Both the methodology and the results have been published in conferences and journals in the field, including BMC Bioinformatics, in a paper by Guillermo de la Calle, Miguel García-Remesal, Stefano Chiesa, Diana de la Iglesia and Víctor Maojo. The paper is titled "BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature". It was published in BMC Bioinformatics 2009, 10:320, doi:10.1186/1471-2105-10-320.