Using the power of machines to complete impossible reporting tasks

Earlier this year, the International Consortium of Investigative Journalists and our reporting partners faced a common problem in this line of work: we had received more leaked documents than we could hope to read from start to finish.

A whistleblower had given us more than 200,000 files from an offshore law firm based on the island tax haven of Mauritius. The stories buried in these documents would eventually shed light on a system that diverts tax revenue from poor nations back to the coffers of Western corporations.

But first, we had to figure out how to sort through this mass of data.

Rather than attempt to read every document, our reporting team turned to “machine learning” to automate the sorting process. This subset of artificial intelligence could learn to identify files more likely to contain stories, for instance by pulling information-rich tax returns out of the massive trove and queuing them up for review by our reporters.
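The article does not publish the team's code. The following is only a minimal sketch of the general idea, assuming reporters first hand-label a small set of documents as relevant (say, tax returns) or not. It uses multinomial naive Bayes, a simple classic text classifier chosen here for illustration and not necessarily the model used on the Mauritius files. The class name `NaiveBayesRanker` and all sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRanker:
    """Hypothetical two-class multinomial naive Bayes with add-one smoothing.

    Assumes the training set contains at least one document of each class.
    """

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def fit(self, docs, labels):
        # Count words separately for relevant (True) and other (False) documents.
        for text, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_joint(self, tokens, label):
        # log P(label) + sum of log P(word | label); unseen words are skipped.
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        prior = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        return prior + sum(
            math.log((counts[t] + 1) / denom) for t in tokens if t in self.vocab
        )

    def score(self, text):
        """Estimated probability that a document belongs to the relevant class."""
        tokens = tokenize(text)
        a = self._log_joint(tokens, True)
        b = self._log_joint(tokens, False)
        # Numerically stable logistic of the log-odds (a - b).
        if a >= b:
            return 1.0 / (1.0 + math.exp(b - a))
        return math.exp(a - b) / (1.0 + math.exp(a - b))

    def rank(self, docs):
        """Order documents so the likeliest relevant ones come first."""
        return sorted(docs, key=self.score, reverse=True)


# Hypothetical hand-labelled examples: True marks a tax-return-like file.
train_docs = [
    "income tax return taxable income deductions schedule",
    "tax return computation of chargeable income",
    "board meeting minutes lunch arrangements",
    "invoice for office cleaning services",
]
train_labels = [True, True, False, False]
model = NaiveBayesRanker().fit(train_docs, train_labels)

queue = model.rank([
    "minutes of the annual board meeting",
    "draft tax computation for taxable income year",
])
print(queue[0])  # → draft tax computation for taxable income year
```

A real collection would need far more labelled examples and a sturdier model, and each file the ranker pushes to the top of the queue still has to be read and verified by a reporter.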

This was just one example in a presentation on machine learning in journalism that John Keefe, an editor at Quartz and a member of the Mauritius Leaks reporting team, gave last week to ICIJ members.

Keefe, an early adopter of algorithms in reporting, defines machine learning as using complex code to create programs that can detect patterns and sort information faster than any team of humans. Such tools have helped parse massive datasets in many of ICIJ’s recent investigations, including the Paradise Papers and the Implant Files.

As datasets become bigger and more complex, the machine learning models that help reporters sort and analyze data are becoming not only more sophisticated but also more accessible to reporters everywhere, Keefe explained. Pretrained models, which once took major work to create, can now be readily adapted to new datasets, and some are even available for free online. “This has only happened in the past few years,” Keefe said.

Keefe pointed to several resources to help reporters navigate the quickly evolving world of machine learning, including his own Quartz AI Studio and fast.ai, a free, open-source deep learning library and course.

Keefe’s session was part of ICIJ Labs, a new webinar series in which ICIJ’s journalist members discuss their work with industry leaders. ICIJ members from more than two dozen countries, including India, Japan, Israel, Peru, France, Sweden, Germany, Russia, Egypt, Jordan and Slovenia, attended.

Keefe emphasized that the point of machine learning isn’t to produce perfectly crunched data, but to automate repetitive tasks that frequently involve sorting through text, numbers or visual information. “What this does is help you find more documents that you wouldn’t have been able to find with a plain text search,” Keefe said. “But you still have to go back and double check, just like with any source.”

Keefe said machine learning in journalism is evolving rapidly and that newsrooms have only begun to explore it. “We have not even scratched the surface on ways that we can use machine learning systems like this to help solve our problems,” he said.


Spencer Woodman
