New software tool uses linguistic analysis to find the needle in the haystack of medical literature

Groningen 20 January 2001 Pharmaceutical substances that produce unpleasant side effects in some cases may have a healing effect in other medical situations. A post-graduate student at the University of Groningen has developed an analysis software tool that can search the vast body of existing medical literature for unsuspected properties of medicines. Since September last year, Dr. Marc Weeber has been testing the new system at the National Institutes of Health (NIH) in Washington.


Dr. Weeber is a linguist by training, but in the near future he might become one of the discoverers of a new medicine that can cure five serious conditions, including the muscular disease myasthenia gravis, acute pancreatitis, and chronic hepatitis C, a liver disease that can cause jaundice. In fact, the new medicine, called thalidomide, has already been around for almost half a century. It was brought to market in the late fifties as the drug Softenon and withdrawn in 1961 after a large number of serious birth defects occurred. In 1998, it was registered again, but this time as a treatment for the skin problems suffered by leprosy patients.

With the help of a group of pharmaceutical experts and immunologists, and using the software system he developed, Dr. Weeber found clues in the medical literature suggesting that thalidomide has beneficial effects in a series of auto-immune diseases. The tool makes it possible to search through tens of thousands of on-line publications in order to track down the side effects of chemical substances and to connect them with diseases that might be fought with precisely these properties. Marc Weeber is not the only researcher to point out that the side effects of certain medicines can be beneficial in other cases, but his tool is the first that is able to automate the search in practice.

"The whole idea behind the tool is that nobody today is able to consult the entire collection of medical literature. However, a lot remains to be discovered by combining the existing publications in the right manner. We have now developed a system which produces a list of potential relations that is short enough to be studied by an expert", stated Dr. Weeber. In the case of thalidomide, a substance that continues to attract a lot of interest among immunologists despite its tainted past, the system generated a list of 60 effects, together with the frequency with which each could be related to the substance.

Together with immunology expert Dr. Grietje Molema, Dr. Weeber went through the list. Within 3 seconds, Dr. Molema's eye was caught by interleukin-12, which was cited in thirteen different sources. This was remarkable because this substance is closely related to the five specific diseases mentioned above. In hindsight, the connection was quite easy to understand. In each of those diseases, the balance between two different types of immune cells plays a crucial role. By acting on interleukin-12, thalidomide prevents one of these two types from prevailing, and that imbalance is exactly what causes all five diseases. Yet nobody had thought of it in this way.
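
The linking step described above can be pictured with a small Python sketch. It is an illustration only, not Dr. Weeber's actual software: the function name, the data layout and the "sedation" entry with its count are assumptions made for the example, and only the interleukin-12 count of thirteen sources comes from the article.

```python
def link_diseases(drug_effects, disease_factors):
    """Connect a drug to diseases through the effects they have in common.

    drug_effects:    dict mapping an effect to the number of sources citing it
    disease_factors: dict mapping a disease to the set of factors involved in it
    Returns a dict mapping each linked disease to its shared effects,
    most frequently cited effect first.
    """
    links = {}
    for disease, factors in disease_factors.items():
        shared = set(factors) & set(drug_effects)
        if shared:
            links[disease] = sorted(shared, key=lambda c: (-drug_effects[c], c))
    return links


# Illustrative input. Only the count of 13 for interleukin-12 is from the
# article; "sedation" and its count are invented for the example.
drug_effects = {"interleukin-12": 13, "sedation": 4}

# Hypothetical factor sets, reduced to the single link the article mentions.
disease_factors = {
    "myasthenia gravis": {"interleukin-12"},
    "acute pancreatitis": {"interleukin-12"},
    "chronic hepatitis C": {"interleukin-12"},
    "hypothetical disease X": {"some unrelated factor"},
}

links = link_diseases(drug_effects, disease_factors)
```

In this sketch the three named diseases come out linked to the drug through interleukin-12, while the hypothetical fourth disease does not. Judging whether such a link makes medical sense remains the expert's job.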

Dr. Weeber's tool performs a linguistic analysis of the titles and summaries of medical articles, which are freely available on the Internet via the Medline service. From these texts, the software filters out mentions of effects by looking in sentences at the particular spots where "effect" and "side effect" are mentioned. The decisive immunological factors can be found there. The list of effects is then automatically related to the medical terminology in the Unified Medical Language System (UMLS) Metathesaurus. In its most recent edition, this catalogue contains 730,000 items and their backgrounds.
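
A much simplified Python sketch of this filtering step might look as follows. It is not the real system: the tiny `TOY_THESAURUS` dictionary is a hypothetical stand-in for the UMLS Metathesaurus, the function names are invented for the example, and plain keyword matching replaces the actual linguistic analysis.

```python
import re
from collections import Counter

# Hypothetical stand-in for the UMLS Metathesaurus:
# surface phrase (lower case) -> preferred term.
TOY_THESAURUS = {
    "interleukin-12": "Interleukin-12",
    "il-12": "Interleukin-12",
    "sedation": "Sedation",
}

# Matches "effect", "effects", "side effect" and "side effects" as whole words.
EFFECT_PATTERN = re.compile(r"\b(?:side\s+)?effects?\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def effect_concepts(abstract):
    """Return the thesaurus terms found in sentences that mention an effect."""
    found = set()
    for sentence in split_sentences(abstract):
        if not EFFECT_PATTERN.search(sentence):
            continue  # skip sentences that do not talk about effects
        lowered = sentence.lower()
        for phrase, concept in TOY_THESAURUS.items():
            if phrase in lowered:
                found.add(concept)
    return found


def rank_effects(abstracts, top_n=60):
    """Count in how many abstracts each term occurs, most frequent first."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(effect_concepts(abstract))
    return counts.most_common(top_n)
```

Given a list of abstracts, `rank_effects` returns pairs of a term and the number of abstracts mentioning it. That is the kind of short, ranked list an expert could then read through.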

This may seem a hopeless task: to a computer, a paragraph of ungrammatical nonsense looks no different from a paragraph of neat English, and words and expressions can have an unclear meaning. The linguistic system nevertheless performs well: effects and side effects are correctly tracked down in the texts, and useful relationships with known diseases are established. To test the system, Dr. Weeber last year reconstructed a number of famous accidental medical discoveries, in which side effects of substances turned out to be useful for purposes completely different from those originally intended.

A medical researcher who knows exactly what he is looking for will most probably prefer the normal search functions of databases such as Medline, according to Dr. Weeber. For researchers who need a broader view, Dr. Weeber is currently developing an enhanced interface for his software at the U.S. National Library of Medicine in Washington. The new interface will allow scientists to search for unsuspected beneficial side effects of pharmaceutical substances without a linguistic expert at their side to guide them, according to de Volkskrant.

Leslie Versweyveld

