Medical Semantics, Ontologies, Open Solutions and EHR Systems

Shepherdstown 12 August 2009

This article offers a high-level introduction for management to the topics of medical semantics and ontologies as they relate to healthcare and health IT systems. Medical semantics and ontologies tend to be poorly understood by many audiences and are subjects that sound more complex and difficult to grasp than they really are. In this article the authors briefly introduce and try to demystify these topics and, hopefully, generate some enthusiasm among skeptics. The article also explores some of the 'open source' tools that have emerged in this area, and how all this ties into the future of Electronic Health Records (EHR) and Health Information Exchange (HIE) systems.


Medical Semantics

Semantics is the study of meaning in communication. It is also the study of interpretation of signs as used by agents or communities within particular circumstances and contexts. Did anyone read or see "The Da Vinci Code"? A simple example involves the meaning of the word 'cold'. It has completely different meanings in the phrases "I am cold" versus "I have a cold". Words may have different meanings depending on the context in which the word is used. See

The application of semantic technology to the medical domain will provide IT systems with the ability to better understand terms and concepts as data is transmitted from one system to another, while preserving the meaning of the content. For this process to work effectively, extensive investments are being made involving the classification of medical terms and their meanings. Tools in this area make use of classification systems that produce controlled vocabularies, lexicons, taxonomies and ontologies.

For humans, the meaning of a given word is normally obtained by consulting a dictionary or by looking at the context where the word is being used. The computer does not make use of textual dictionary definitions and has no pre-existing repository of contexts, but instead requires a semantic representation that is simpler and more precise. Natural language processing systems represent the meaning of a given word or phrase using a symbol or code. For an Electronic Medical Record (EMR) system, "heart" and "cardiac" are two unrelated terms. For humans, however, both terms have the same 'semantic' meaning.
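The idea of representing meaning with a symbol or code can be sketched in a few lines of Python. This is only an illustration: the concept identifiers and the `TERM_TO_CONCEPT` table are invented for this example, and a real system would draw on a maintained terminology rather than a hand-written dictionary.

```python
from typing import Optional

# Hypothetical concept identifiers, invented for this sketch.
TERM_TO_CONCEPT = {
    "heart": "CONCEPT:HEART",
    "cardiac": "CONCEPT:HEART",
    "kidney": "CONCEPT:KIDNEY",
    "renal": "CONCEPT:KIDNEY",
}


def concept_for(term: str) -> Optional[str]:
    """Return the concept code for a term, or None if the term is unknown."""
    return TERM_TO_CONCEPT.get(term.strip().lower())


def same_meaning(a: str, b: str) -> bool:
    """Two terms share a meaning when both map to the same known concept."""
    code_a, code_b = concept_for(a), concept_for(b)
    return code_a is not None and code_a == code_b
```

With such a table in place, a system can treat "heart" and "cardiac" as the same concept even though the two strings have nothing in common.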

Increasingly, healthcare institutions have access to computerized patient medical records containing massive amounts of raw data. Much of the available data are in textual form as a result of transcription of dictated reports, use of speech recognition technology, and direct entry by health care providers. While textual data are convenient for tasks such as review by clinicians, they present significant obstacles for graphic presentation, searching, summarization, and statistical analysis. Natural language processing techniques translate the free text in the record into coded terms that can support those uses.

At this point in trying to provide a simple explanation, you're either 'nodding' your head in agreement, or 'nodding off' in sheer boredom. To briefly summarize, medical semantics is a mechanism for applying tools and techniques that leverage semantic knowledge to enhance the use and utility of healthcare IT systems.

Semantic Interoperability

IEEE defines 'Interoperability' as the ability of two or more systems or components to exchange information and to use the information that has been exchanged. (

'Semantic interoperability' is defined by the National Alliance for Health Information Technology (NAHIT) as "the ability of different information technology systems, software applications and networks to communicate and exchange data accurately, effectively and consistently so providers can use the information as they care for patients". (

It is important to emphasize that there are levels of interoperability, a sort of "Pyramid of Health Data Interoperability" if you will, and that the lower levels help make semantic interoperability and ontologies possible. What follows is a high-level description of the International Organization for Standardization (ISO) Open Systems Interconnection (OSI) model. Envision that this "pyramid of health interoperability" has the following layers:

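The ISO OSI Seven Layer Model for Networking

  • Layer 7: Application
  • Layer 6: Presentation
  • Layer 5: Session
  • Layer 4: Transport
  • Layer 3: Network
  • Layer 2: Data Link
  • Layer 1: Physical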


One of the most important uses of semantic interoperability services and resources in healthcare is reconciling clinical data contained in diverse EHR systems. Semantic interoperability should contribute to improvements in healthcare over time because it is intended to deliver the right meaning of medical terminology to each collaborating system user, via a service-oriented web-based solution.

Semantic Web

The Semantic Web provides a common framework that allows data to be shared and re-used across application, enterprise, and community boundaries. The Semantic Web is an evolving extension of the World Wide Web (WWW) in which web content can be expressed not only in natural language, but also in a format that can be read and used by software agents, thus permitting them to find, share and integrate information more easily. See

At its core, the Semantic Web comprises a philosophy, a set of design principles, collaborative working groups, and a variety of enabling technologies. Some elements of the Semantic Web are expressed in formal specifications such as the Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle), and notations such as RDF Schema and the Web Ontology Language (OWL), all of which are used to provide a formal description of concepts, terms, and relationships within a given knowledge domain.
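RDF describes knowledge as simple subject, predicate, object statements called triples. The Python sketch below stores a few such triples in a plain list and answers pattern queries against them. The `http://example.org/med#` namespace and the disease names are assumptions made for illustration; the two `rdfs` identifiers are the standard RDF Schema terms. A production system would use an RDF library and a triple store rather than a list.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/med#"  # hypothetical namespace for this sketch
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Each statement is one (subject, predicate, object) triple.
TRIPLES: List[Triple] = [
    (EX + "Myocarditis", RDFS_SUBCLASS_OF, EX + "HeartDisease"),
    (EX + "HeartDisease", RDFS_SUBCLASS_OF, EX + "Disease"),
    (EX + "Myocarditis", RDFS_LABEL, "myocarditis"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
```

For example, `match(p=RDFS_SUBCLASS_OF)` returns the two class hierarchy statements, which is the kind of question a software agent would ask of shared Semantic Web data.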


An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts. It is used to reason about the objects within that domain. Ontologies are used in artificial intelligence, the semantic web, software engineering, biomedical informatics, and information architecture as a form of knowledge representation about the world or some part of it. Ontologies generally describe:
  • Individuals: the basic or primary objects
  • Classes: sets, collections, or types of objects
  • Attributes: properties, features, characteristics, or parameters that objects can have and share
  • Relations: ways that objects can be related to one another
  • Events: the changing of attributes or relations
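The five building blocks listed above can be made concrete with a small Python sketch. All of the class names, the "part_of" relation and the "rhythm" attribute are invented for this example; it is meant to show how the pieces fit together, not how any particular ontology tool stores them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class OntologyClass:
    """A class: a set or type of objects, optionally with a parent class."""
    name: str
    parent: Optional["OntologyClass"] = None

    def is_a(self, other: "OntologyClass") -> bool:
        """True if this class is `other` or a descendant of it."""
        current: Optional[OntologyClass] = self
        while current is not None:
            if current is other:
                return True
            current = current.parent
        return False


@dataclass
class Individual:
    """An individual: a basic object, carrying its own attributes."""
    name: str
    cls: OntologyClass
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Ontology:
    """Holds relations between individuals and a log of change events."""
    relations: List[Tuple[str, str, str]] = field(default_factory=list)
    events: List[str] = field(default_factory=list)

    def relate(self, subject: Individual, relation: str, obj: Individual) -> None:
        self.relations.append((subject.name, relation, obj.name))

    def set_attribute(self, individual: Individual, key: str, value: str) -> None:
        # An event is recorded whenever an attribute changes.
        old = individual.attributes.get(key)
        individual.attributes[key] = value
        self.events.append(f"{individual.name}.{key}: {old} -> {value}")


organ = OntologyClass("Organ")
heart = OntologyClass("Heart", parent=organ)
person = OntologyClass("Person")

patient = Individual("patient-1", person)
patient_heart = Individual("patient-1-heart", heart)

onto = Ontology()
onto.relate(patient_heart, "part_of", patient)
onto.set_attribute(patient_heart, "rhythm", "regular")
```

The `is_a` check is the simplest form of reasoning over such a model: because "Heart" has "Organ" as its parent, any rule written for organs also applies to hearts.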

An ontology language is a formal language used to encode an ontology. There are a number of such languages, both proprietary and standards-based.

For example:

  • OWL - Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies, and is endorsed by the World Wide Web Consortium.
  • KIF - Knowledge Interchange Format (KIF) is a computer-oriented language for the interchange of knowledge among disparate computer programs.
  • Cyc is an artificial intelligence project that attempts to assemble a comprehensive ontology and database of everyday common sense knowledge, with the goal of enabling AI applications to perform human-like reasoning. It has its own ontology language called CycL.
  • RIF - The Rule Interchange Format (RIF) effort involves the development of a format for interchange of rules in rule-based systems on the semantic web. The goal is to create an interchange format for different rule languages and inference engines.

'Open Source' Semantic/Ontology Solutions

Open source describes a broad general type of software license that makes source code available to all with relaxed or non-existent copyright restrictions. It is an explicit 'feature' of open source that it places few or no restrictions on the use or distribution of the code by any organization or user. There are many open source projects and tools available related to semantics and ontologies that can be found at, such as:
  • OntoWiki and Powl - OntoWiki is a semantic collaboration platform for the development of Semantic Web knowledge bases. Powl is a web-based ontology authoring and management solution for the Semantic Web.
  • Nepomuk Semantic Desktop Project - NEPOMUK brings together researchers, industrial software developers, and users in a collaborative open source project to build the Social Semantic Desktop solution.
  • CuiTools - A Perl package for supervised word sense disambiguation (WSD) experiments that utilize features extracted from the Unified Medical Language System (UMLS). The word Cui comes from the Concept Unique Identifiers in the UMLS.

Many other related open source projects are underway such as:

Semantic/Ontology Projects & Tools in Healthcare

The following are examples of open source Medical Semantics/Ontology projects and tools:
  • ARTEMIS Project - A Semantic Web Service-based P2P Infrastructure for the Interoperability of Medical Information Systems. IST-1-002103-STP. In early stages of development.
  • MII Medical NLP Toolkit - This is a toolkit for medical natural language processing (NLP). The core engine is general enough to be used in a variety of text processing domains, though the toolkit includes specific support for medical reports and patient de-identification.
  • ONTODerm - ONTODerm is a specialty specific ontology for dermatology to integrate dermatology with medical software systems. In early stages of development.
  • Medical Language Processing - Natural language processing of free-text clinical documents into an information representation in XML accessible via a rich system of categories familiar to clinicians. In early stages of development.

Other major health care related projects, tools and organizations of interest include:

*Also take the time to visit the Semantic Web Tools wiki at

Some examples of published ontologies include:

OntoSelect monitors the web to provide an access point for ontologies on any possible topic or domain that is automatically updated, organized in a meaningful way and with support for ontology search and selection. Swoogle is another good semantic/ontology web search engine that is available for use. Also consider trying Ontaria which provides a searchable and browsable directory of semantic web data.

Social Security Administration (SSA) Health IT Semantic Interoperability Pilot Project

The Health IT Semantic Interoperability pilot was developed by the SSA in 2006 as a proposed proof-of-concept for integrating Health IT with the Disability Determination business process. In particular, this business process requires data sharing and processing across various governmental and private sector enterprises such as SSA, VA, CMS, HHS, NARA, hospitals, healthcare providers, insurance providers, legal communities and others.


Semantic Interoperability, Ontologies and Electronic Health Record (EHR) Systems

Medical information systems need to be able to communicate complex medical concepts unambiguously, even those expressed in different languages. This is obviously a difficult task and requires extensive analysis of the structure and the concepts of medical terminologies. It can be achieved by constructing medical domain ontologies for representing medical terminology systems. This is not a trivial task.

An information model is needed to describe the relationships of different data elements in a patient's medical record. Data elements and relationships in the information model are often tacitly assumed. Difficulty arises in the situation where two disparate EHR systems make different assumptions. This leads to the need for an Information Mediation Service (IMS) to perform application-to-application mappings between EHR systems. This is not a trivial matter. Work on medical domain ontologies and information models has already been in progress for almost two decades.
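The application-to-application mapping an IMS performs can be sketched as a table that says, for each field in one system, what it is called in the other and how its value must be converted. Everything in this Python example is hypothetical: the field names, the two record layouts and the `mediate` function are invented to show the idea and do not describe any real EHR product.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical mapping: source field -> (target field, value converter).
FIELD_MAP: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "pt_name": ("patientName", str),
    "sex": ("gender", lambda v: {"M": "male", "F": "female"}.get(v, "unknown")),
    # The source system records pounds; the target expects kilograms.
    "wt_lb": ("weightKg", lambda v: round(float(v) * 0.45359237, 1)),
}


def mediate(record: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Translate a source record; also report fields with no known mapping."""
    translated: Dict[str, Any] = {}
    unmapped: List[str] = []
    for name, value in record.items():
        if name in FIELD_MAP:
            target, convert = FIELD_MAP[name]
            translated[target] = convert(value)
        else:
            unmapped.append(name)
    return translated, unmapped
```

Reporting the unmapped fields, rather than silently dropping them, makes the differing assumptions of the two systems visible so that someone can decide how to handle them.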

VA VistA Electronic Health Record (EHR) System - Lexicon Utility

The adoption of a standardized reference for clinical terminology across Veterans Health Administration (VHA) facilities enables clinical information to be recorded, transmitted, retrieved, and analyzed in a precise manner independent of clinic or medical center. The scope of the Lexicon Utility, used within the VistA Computerized Patient Record System (CPRS), is to express diagnostic clinical problems in easy-to-understand terminology and associate these terms with coding systems such as ICD, DSM, NANDA, etc. It works in conjunction with other VistA applications such as the Problem List, Encounter Form, and Text Integration Utility (TIU) and provides a comprehensive API so that any application that needs to use standardized terminology can be interfaced. Major features or functionality include:
  • Provides a basis for a common language of terminology so that all members of a health care team can communicate with each other.
  • Provides terminology that is well defined, understandable, unique in concept, and encoded by multiple coding schemes.
  • Provides for site modification of text presentation, term definitions, synonyms, shortcuts, and keywords.
  • Provides the ability to upgrade coding systems (e.g., ICD-9-CM to ICD-10) and to add, change, and delete codes.
  • Provides for limited views of vocabulary (lexicon subsets).
  • Allows each site to add its own vocabulary to the lexicon.
  • Accepts the provider terminology if a search of the dictionary does not find a match.
  • Uses subsets of terms based on specialty or clinic.
  • Allows abbreviations or shortcuts to provide quick access to frequently used definitions.
  • Supports CPT terminology and codes.
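A few of these features (one term encoded by multiple coding schemes, synonyms and shortcuts, and acceptance of the provider's wording when no match is found) can be illustrated with a short Python sketch. This is not the VistA Lexicon Utility API; the `Lexicon` and `LexiconEntry` classes are invented for this example, and the codes shown are included for illustration only and should be checked against the official code sets.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class LexiconEntry:
    preferred: str                       # easy-to-understand display term
    codes: Dict[str, str]                # coding system -> code
    synonyms: Set[str] = field(default_factory=set)


class Lexicon:
    """A toy terminology lookup keyed on lower-cased text."""

    def __init__(self) -> None:
        self._index: Dict[str, LexiconEntry] = {}

    def add(self, entry: LexiconEntry) -> None:
        # Index the preferred term and every synonym or shortcut.
        for text in {entry.preferred, *entry.synonyms}:
            self._index[text.lower()] = entry

    def lookup(self, text: str) -> LexiconEntry:
        found = self._index.get(text.strip().lower())
        if found is not None:
            return found
        # No match: accept the provider's own terminology, uncoded.
        return LexiconEntry(preferred=text.strip(), codes={})


lexicon = Lexicon()
lexicon.add(LexiconEntry(
    preferred="Essential hypertension",
    codes={"ICD-9-CM": "401.9", "ICD-10-CM": "I10"},  # illustrative codes
    synonyms={"HTN", "high blood pressure"},
))
```

Here `lexicon.lookup("htn")` resolves the shortcut to the coded entry, while an unrecognized phrase comes back as an entry with the provider's text and an empty set of codes.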


In addition to the 'open source' or 'public domain' VA VistA system, there are other examples of projects aimed at demonstrating the applicability of semantic web and ontologies with EHRs. For example:

The Artemis Project and the Artemis Message Exchange Framework (AMEF) have been developed to provide the exchange of meaningful clinical information among healthcare institutes through semantic mediation. Some of the achievements of the Artemis project include:

  • Finding and retrieving clinical information about a particular patient from different healthcare organizations where concrete sources are unknown.
  • Demonstration of a very robust, but highly flexible approach to security and privacy.
  • The partnering entities of the Artemis project developed Web services for exposing their existing healthcare applications and patient data.


The Telemedicine and Advanced Technology Research Center (TATRC) within the U.S. Department of Defense (DoD) has initiated the TATRC Natural Language Processing (NLP) Systems Project. The purpose of this project is to design and develop a natural language processing engine that is compatible with the Armed Forces Health Longitudinal Technology Application (AHLTA) and is linked to the MEDCIN, UMLS and SNOMED CT ontologies. The goal is to process semi-structured or free text note sections of AHLTA and capture both contextual and structured terms for surveillance and data mining. The tool must show how these captured structured terms can be extracted and searched from the clinical data repository. The engine should allow providers to document their care in the electronic health record in a natural way, without forcing them to use structured documentation. Currently, much of the documentation is "too structured", forcing providers to use the very hierarchical structure of MEDCIN. There is significant evidence that this method causes substantial errors, and the result is a documented note that does not accurately capture the essence of the patient encounter. See
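To give a feel for what capturing "contextual and structured terms" from free text involves, the Python sketch below scans a note for a handful of known phrases and records whether each one is negated. It is a deliberately crude, rule-based illustration and is not the TATRC engine: the phrase table, the concept identifiers and the negation cue list are all assumptions made for this example.

```python
import re
from typing import Dict, List

# Hypothetical phrase-to-concept table, invented for this sketch.
PHRASES: Dict[str, str] = {
    "chest pain": "CONCEPT:CHEST_PAIN",
    "fever": "CONCEPT:FEVER",
    "cough": "CONCEPT:COUGH",
}

# A few whole-word cues that negate findings later in the same sentence.
NEGATION = re.compile(r"\b(no|denies|without)\b")


def extract_findings(note: str) -> List[Dict[str, object]]:
    """Return one record per known phrase found, with a negation flag."""
    findings: List[Dict[str, object]] = []
    for sentence in re.split(r"[.;\n]+", note.lower()):
        for phrase, concept in PHRASES.items():
            position = sentence.find(phrase)
            if position == -1:
                continue
            # Context: was a negation cue seen earlier in this sentence?
            negated = NEGATION.search(sentence[:position]) is not None
            findings.append(
                {"concept": concept, "text": phrase, "negated": negated}
            )
    return findings
```

For the note "Patient reports chest pain. Denies fever or cough." this yields a positive finding for chest pain and negated findings for fever and cough, which is the kind of context that a plain keyword search would miss.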

Semantic Interoperability and Privacy & Security

At the Health IT Definitions Project Public Forum held on January 16, 2008, Dr. Karen Bell, former Director of the Office of Health IT Adoption within the Office of the National Coordinator for Health IT (ONCHIT), said there were two things they wished they had done sooner: vocabulary harmonization and privacy & security. This gave rise to the establishment of the Health Information Technology Ontology Project (HITOP) working group by ONCHIT. See

A Semantic Web Information Infrastructure must comply with commonly accepted privacy and security policies and standards related to handling sensitive patient data contained in electronic medical records (EMRs). For example, the SAPHIRE Project in Great Britain employs comprehensive privacy and security mechanisms to complement their infrastructure, which is based on end-to-end and system-to-system connections with semantic interoperability. Specifically, EU directives 95/46/EC and 2002/58/EC presenting the general principles of processing of personal data were taken into account, with particular attention paid to recommendation R(97)5 of the Council of Europe discussing protection of medical data collected and processed automatically. See

Work by the Center for Clinical and Translational Sciences (CCTS) at the University of Texas Health Science Center at Houston may also be of some interest. The CCTS Environment, Documentation, and Authorization models enable the system to dynamically and automatically contextualize availability, access, utilization, and retrieval of all informational resources governed by the CCTS program and its collaborators through combinations of constraints based on role, investigator, research project or research question. Thus, CCTS utilizes Semantic Web technologies not only for integrating, repurposing and classifying multi-source clinical data, but also to construct a distributed environment for information sharing and online collaboration with security and privacy of personal data. See

Benefits of Using Ontologies & Semantic Interoperability Systems

The following are some ways the healthcare system can be improved by using medical ontologies and semantic interoperability tools and practices:
  • Improve accuracy of diagnoses by providing real time correlations of symptoms, test results and individual medical histories through standards-based systems for systematically cross-checking diagnoses.
  • Increase prompt payment of Medicare and Medicaid claims by reducing billing questions through adoption of IT standards for clinical care codes, medical nomenclature, lab tests, etc.
  • Reduce the burden of fraud on the overall system by enhancing the capability to detect fraud by the use of semantic interoperability tools.
  • Ontologies can help build more powerful and more interoperable information systems in healthcare.
  • Ontologies can support the need of the healthcare process to transmit, re-use and share patient data.
  • Ontologies can also provide semantic-based criteria to support different statistical aggregations for different purposes.
  • Possibly the most significant benefit that ontologies may bring to healthcare systems is their ability to support the indispensable integration of knowledge and data.


The set of technologies associated with semantics and ontologies in healthcare is, relatively speaking, still in its infancy or early childhood. While there are high expectations, only modest progress has occurred to date.

Partnerships between major technology vendors such as commercial database companies and large scale integrators working in collaboration on public-private sector EHR projects will help break through some of the existing major barriers.

With the ease of posting structured lists on the Internet, and with Extensible Markup Language (XML) as an emerging standard for such lists, it is likely that the next decade will witness an explosion of medical ontologies available in the public domain.

Next Steps

The following are some recommendations and next steps healthcare organizations should consider taking with regard to ontologies and semantic interoperability solutions:
  • Consider establishing a workgroup to identify functional requirements and/or potential uses of medical ontologies and semantic interoperability systems for use by your healthcare organization.
  • Conduct a detailed literature search and market survey annually and obtain lessons learned from medical ontologies and semantic interoperability projects underway at other institutions.
  • Identify potential organizations to collaborate with on the research, development, testing and use of medical ontologies and semantic interoperability, e.g. medical schools, vendors.
  • Investigate changes in clinical and IT practices that may need to be made in anticipation of utilizing medical ontologies and semantic interoperability systems.
  • Initiate and fund a pilot project(s) and complete a detailed cost benefit analysis of investments in this arena. The pilot may involve use of either a commercial or open source solution.



Peter J. Groen is an adjunct faculty member of the Computer & Information Science Department at Shepherd University in West Virginia. He is one of the founders of the Shepherd University Research Corporation (SURC) - see

Marc Wine works as a senior health systems advisor with Northrop Grumman Information Solutions and has served as senior advisor to the U.S. Department of Defense, Telemedicine & Advanced Technology Research Center (TATRC). He also served with the Veterans Health Administration (VHA) for most of his career. He is also an adjunct faculty member at The George Washington University where he teaches Health IT Systems Management. You can contact him at

Peter Groen, Marc Wine
