Under the United States Health Insurance Portability and Accountability Act (HIPAA), patient records that are to be shared within the research community must have all identifying information removed. However, manual removal of identifying information is prohibitively expensive, time-consuming and prone to error. These constraints have prompted considerable research into automated techniques for "de-identifying" medical records.
The MIT team aimed to solve this problem. "We've developed a free and open-source software package to allow researchers to accurately de-identify text in medical records in a HIPAA-compliant manner", stated Gari D. Clifford, a principal research scientist in the Harvard-MIT Division of Health Sciences and Technology (HST) who led the work with Principal Investigator Roger G. Mark, a professor in HST and MIT's Department of Electrical Engineering and Computer Science.
According to Dr. Zohara Cohen, programme director at the National Institute of Biomedical Imaging and Bioengineering, sponsor of the work, the information in patients' medical records is a "largely untapped treasure trove" that the biomedical research community could use to better understand diseases and their treatments.
"The automated de-identification software developed under the guidance of Dr. Mark is a big step forward in permitting the widespread sharing of patient information without the risk of compromised privacy and confidentiality", Dr. Cohen stated.
Gari D. Clifford, Roger G. Mark and colleagues tested their censoring software on 1,836 nursing notes, a total of 296,400 words. Using multiple experts and additional algorithms, they replaced all personal information with "fake" data. In their BMC paper, they report that "the software successfully deleted more than 94 percent of the confidential information, while wrongly deleting only 0.2 percent of the useful content. This is significantly better than one expert working alone, at least as good as two trained medical professionals checking each other's work and many, many times faster than either."
The team is providing researchers access to the evaluation dataset together with the software to allow others to improve their systems, and to allow the software to be adapted to other data types that may exhibit different qualities.
Gari D. Clifford and Roger G. Mark's co-authors are Ishna Neamatullah; Margaret M. Douglass; Li-wei H. Lehman, an HST research engineer; Andrew Reisner, an HST visiting scientist; Mauricio Villarroel, an HST visiting engineer; William J. Long, a principal research associate in MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL); Professor Peter Szolovits of the Department of Electrical Engineering and Computer Science and HST; and George B. Moody, HST sponsored research staff.
The paper, titled "Automated De-Identification of Free-Text Medical Records", can be accessed via the website of the journal BMC Medical Informatics and Decision Making.