Algorithms, Analysis And Adverse Events: How ICIJ Used Machine Learning To Help Find Medical Device Issues

By Emilia Díaz-Struck and Rigoberto Carvajal
/ December 19, 2018

Over the course of the Implant Files investigation, ICIJ and its partners filed more than 1,500 public records requests and collected more than 8 million device-related health records worldwide. These include recall notices, safety warnings, legal documents and corporate financial filings.

The largest share came from the U.S. Food and Drug Administration’s Manufacturer and User Facility Device Experience (MAUDE) database. More than 5.4 million “adverse event” reports sent to the FDA over the decade ending in 2017 were reviewed for the investigation. For some devices, more recent data available from the first half of 2018 was also analyzed.

These reports come from doctors, manufacturers, patients and even lawyers and describe cases where a device is suspected to have caused or contributed to a serious injury or death or has experienced a malfunction that would likely lead to harm if it were to recur.

In some cases, the connection between the harm described in an adverse event report and the device isn’t clear, and the FDA cautions that conclusions about a device’s safety or role in an injury or death cannot be made from an adverse event alone.

Nevertheless, ICIJ’s analysis, which included identifying devices that were sometimes listed under hundreds of different brand names or spellings, gives a never-before-seen look at potential harm. Several dates are linked to each event in the database; ICIJ decided to use the dates on which cases were reported to the FDA to track them over time. The team used the FDA’s product categorization system to cluster similar devices by their general purpose.
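To make the grouping step concrete, here is a minimal Python sketch of how spelling variants of a brand name might be collapsed and how reports might be counted by product category and reporting year. It is an illustration only, not ICIJ's code: the records, the brand names, the product codes "ABC" and "XYZ", and the function names are all made up for this example.

```python
import re
from collections import Counter
from datetime import date


def normalize_brand(name: str) -> str:
    """Lowercase and strip punctuation and extra spaces so variants compare equal."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", name.lower())
    return " ".join(cleaned.split())


def brand_variants(rows):
    """Map each normalized brand name to the set of spellings seen in the data."""
    variants = {}
    for brand, _code, _reported in rows:
        variants.setdefault(normalize_brand(brand), set()).add(brand)
    return variants


def count_by_code_and_year(rows):
    """Count reports per (product code, year the report reached the regulator)."""
    counts = Counter()
    for _brand, product_code, reported in rows:
        counts[(product_code, reported.year)] += 1
    return counts


# Hypothetical records: (brand as written, product code, date reported)
reports = [
    ("Acme Hip-System", "ABC", date(2015, 3, 2)),
    ("ACME HIP SYSTEM", "ABC", date(2015, 7, 19)),
    ("acme  hip system.", "ABC", date(2016, 1, 5)),
    ("Other Pump", "XYZ", date(2016, 4, 30)),
]

variants = brand_variants(reports)
counts = count_by_code_and_year(reports)
```

Real brand names vary in ways simple normalization cannot catch (abbreviations, model numbers, misspellings), so a rule like this would only be a first pass before manual review.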

Medical devices that broke, misfired, corroded, ruptured or otherwise malfunctioned after implantation or use were linked to more than 1.7 million injuries and nearly 83,000 deaths over the last decade through “adverse event” reports, an ICIJ analysis found.

ICIJ and its media partners also used device approval datasets, recall notices, “adverse event” report data available in different countries and U.S. Securities and Exchange Commission corporate filings for their analysis. As a result of the work, and because no global resource for recalls and safety notices exists, ICIJ decided to build one. The International Medical Devices Database gathers, for the first time, recalls, safety alerts and field safety notices (more than 70,000 from 11 countries) in a searchable portal that anyone can use to check whether a device has been flagged for an official safety concern.

Using algorithms to identify problems

To explore the MAUDE data and its event descriptions linked to the reports filed with the FDA, ICIJ used machine learning algorithms to screen the millions of records.

The machine learning programs Talend Real-time BigData Platform and Microsoft SQL Server 2017, as well as the programming language R, were used at different stages of the analysis. The entire process involved text mining, clustering, feature selection, association rules and classification algorithms to identify events that were not always described consistently in different parts of the data.

Machine learning algorithms were used, for example, to screen millions of records and identify reports in which the description of an adverse event indicated that a patient had died but the report was classified as a malfunction or an injury.
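The core of that screen can be pictured as a comparison between a report's official category and the language of its narrative. The Python sketch below is a simplified stand-in, not the classifier ICIJ built: the field names (`report_id`, `event_type`, `narrative`), the sample records and the four-phrase list are assumptions made for illustration.

```python
import re

# Illustrative phrases only; the real analysis used a far larger set.
DEATH_PHRASES = ["died", "death", "patient expired", "passed away"]


def build_pattern(phrases):
    """Compile one case-insensitive pattern, longest phrases first."""
    escaped = sorted((re.escape(p) for p in phrases), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_possible_misclassified(reports, pattern):
    """Return IDs of reports not labeled as deaths whose text suggests one."""
    flagged = []
    for report in reports:
        if report["event_type"] != "death" and pattern.search(report["narrative"]):
            flagged.append(report["report_id"])
    return flagged


# Hypothetical records for demonstration.
sample = [
    {"report_id": 1, "event_type": "malfunction",
     "narrative": "The patient expired two days after surgery."},
    {"report_id": 2, "event_type": "injury",
     "narrative": "Device was replaced without incident."},
    {"report_id": 3, "event_type": "death",
     "narrative": "Patient died."},
]

flagged = flag_possible_misclassified(sample, build_pattern(DEATH_PHRASES))
```

A keyword match like this only produces candidates. It cannot tell who died or whether the device was involved, which is why further filtering and human review are needed.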

The process involved teaching the computer to identify death reports based on language. The word “death” was not always present; other terms and sentences could reflect the event, such as “the patient expired.” ICIJ originally received a list of 121 death-related terms that appeared in the MAUDE data from Madris Tomes, founder and chief executive of U.S.-based Device Events. We refined it, then expanded it by running code through all real death reports to mine the text and develop a set of more than 3,400 key phrases to use for the analysis.
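One simple way to picture how a seed list can grow is to look for word sequences that recur in confirmed death reports but never appear in other reports. The Python sketch below shows that idea under stated assumptions: the sample sentences are invented, the function names are hypothetical, and the thresholds are arbitrary. It is not a reconstruction of the text mining ICIJ ran.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return the n-word sequences in a text, lowercased."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_phrases(death_texts, other_texts, n=2, min_count=2):
    """Phrases seen in at least min_count death reports and in no other report."""
    death_counts = Counter(g for t in death_texts for g in set(ngrams(t, n)))
    other_counts = Counter(g for t in other_texts for g in set(ngrams(t, n)))
    return sorted(
        phrase for phrase, count in death_counts.items()
        if count >= min_count and other_counts[phrase] == 0
    )


# Invented example narratives.
death_texts = [
    "The patient expired at home.",
    "Patient expired after the procedure.",
    "The patient passed away.",
]
other_texts = [
    "The patient recovered.",
    "The device was returned.",
]

candidates = candidate_phrases(death_texts, other_texts)
```

On these invented sentences the only surviving candidate is "patient expired": "the patient" also recurs in death reports but is dropped because it appears in a non-death report too. Candidates produced this way would still need human vetting before joining a phrase list.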

The process involved identifying sentences and recognizing nouns, verbs and adverbs in thousands of real death reports in the data, then refining the phrase set through a verification process run by journalists. This allowed us to discard false positives, such as cases in which a relative of a patient or even the device itself had expired, but not the patient.
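The false-positive problem comes down to asking who or what "expired." The Python sketch below uses a crude word-window heuristic in place of the part-of-speech analysis described above: it looks at the few words before "expired" and lets the nearest subject-like word decide. The word lists, the window size and the function name are assumptions for illustration, and a real system would need proper grammatical parsing.

```python
import re

# Hypothetical list of subjects that are not the patient.
NON_PATIENT_SUBJECTS = {
    "device", "battery", "warranty", "product",
    "husband", "wife", "mother", "father", "relative",
}
PATIENT_WORDS = {"patient", "pt"}


def patient_expired(sentence, window=4):
    """Return True if the nearest subject-like word before 'expired' is the patient."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for i, token in enumerate(tokens):
        if token != "expired":
            continue
        preceding = tokens[max(0, i - window):i]
        # Walk backward from the verb; the first recognized subject decides.
        for word in reversed(preceding):
            if word in NON_PATIENT_SUBJECTS:
                break
            if word in PATIENT_WORDS:
                return True
    return False
```

Because the possessive "patient's" is kept as a single token, a sentence such as "The patient's husband expired last year" is attributed to the husband rather than the patient.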

Once the computer learned to identify deaths based on variations of the language, ICIJ ran the analysis on the millions of records from the MAUDE database. At a second stage, another algorithm was used to assess whether it was possible to determine if the device contributed to the death. A team of journalists then checked the results one by one, reading each event description and comparing it to the classification obtained through the machine learning process, to complete the analysis.

ICIJ found 2,100 cases in which patients died, but their deaths were classified as device malfunctions or injuries. Of these, 220 reports indicated that devices may have caused or contributed to the deaths. The other reports did not include enough information to determine conclusively if the device played a role in the patients’ deaths.

Update, February 20, 2019: This story has been updated to include reference to early research assistance provided by the founder and chief executive of U.S.-based Device Events, Madris Tomes.