Mar Cabra talking to journalists at a meeting in Washington, DC.

How ICIJ went from having no data team to being a tech-driven media organization

Technology is one of the ingredients of ICIJ’s secret sauce in projects like the Paradise Papers and the Panama Papers. The custom-built tools we’ve developed over the years have allowed hundreds of reporters around the world to access millions of files securely at the click of a mouse. Journalists who didn’t know each other collaborated hand in hand remotely, using a virtual newsroom to gather and share their findings. Complex issues seemed less so with the help of visualizations.

ICIJ has been producing global investigations for more than 20 years, but it was only three and a half years ago that ICIJ created an in-house data team. Mar Cabra, who was head of the Data & Research Unit from its creation until September this year, recalls how it all started, the iterations on the software based on the reporting needs and the lessons learned along the way. The story below is a chapter she contributed to the book Data Journalism: Past, Present and Future (Abramis, 2017), which was published in October, one month before the Paradise Papers broke.

This post also appeared in Source.

In April 2015, I had a conference call with my boss, the director of the International Consortium of Investigative Journalists, Gerard Ryle. He didn’t want to tell me the purpose of the call in writing. When we started talking, he spoke to me in code language to avoid naming names. The bottom line was that Süddeutsche Zeitung, a German newspaper with which we had worked in the past, had a leak of about one terabyte, too big for them to handle. They wanted ICIJ’s help, and Gerard was seeking my advice as the editor of the data team on how to proceed.

‘How on earth are we going to do this?’ I thought, but I didn’t say that to him. Even though I felt a bit overwhelmed by the situation, I knew I had a great team I could count on to tackle this challenge. What I didn’t imagine at that time was how big a role we, and our technology, would play in what became, at the time, the largest collaborative investigation in journalism history.

The so-called Panama Papers exposed like never before a system that enables crime, corruption and wrongdoing, hidden by secretive offshore companies. It had historic global effects. At least 150 inquiries, audits or investigations were announced in 79 countries around the world following its revelations. There were resignations of high-ranking officials, including the prime minister of Iceland. The prime minister of Pakistan was removed from office. ICIJ won almost twenty awards, including the Pulitzer Prize and the Data Journalism Award.

We were lucky that Süddeutsche Zeitung’s request for help came to us at that point in time. ICIJ was founded in 1997 as a global network of investigative journalists who collaborated on in-depth investigative stories, but it was not until 2014 that it added a data team to its newsroom. That doesn’t mean data had not been important to investigations before. Data was key in a two-year series on overfishing called Looting the Seas (2010-2012) and also to Skin & Bone (2012), an exposé on the human tissue trade. However, the project where its relevance became most evident was Offshore Leaks (2013).

Exposing the secrecy of the offshore economy

When Gerard became ICIJ director in the fall of 2011, he brought a hard drive with 260 gigabytes of documents that exposed the secrecy of the offshore economy. The investigation was not easy on many levels. One of the most difficult parts was making the data available to partners around the world. Seeing that assisting all of them would be too labour-intensive, we turned to technology to help us. We ended up putting the documents in the cloud and making them searchable securely on the web; we had an online forum to share leads and discuss the research, and we created a public website for our readers to explore the names of the people with companies in tax havens. Freelancers – including myself – and the data team at La Nación newspaper in Costa Rica, with which ICIJ collaborated, did most of the data work.

One of the lessons learned from Offshore Leaks – and its sequel, China Leaks – was that ICIJ needed data journalists and programmers in-house. When we started the next project, ICIJ hired two of the developers we had worked with before, Rigoberto Carvajal and Matthew Caruana Galizia, and ICIJ put me in charge of the team. In April 2014 – one year before that call from my boss – the ICIJ data team was created.

Our first year was hectic. ICIJ published three investigations over that period and a fourth was being reported – far more than the organization had averaged in recent years. Our team’s mission was nothing short of ambitious: ‘to add a data component to every project ICIJ does right from the start and not as an afterthought.’

The projects that took most of our time were those connected to leaks. Our first task was dealing with more than 1,000 image PDFs of secretive tax agreements between corporations and the Luxembourg government. We needed to make them searchable and available to reporters worldwide. It was a similar problem to the one we faced in Offshore Leaks, but this time we wanted to use open-source tools that would allow us to keep improving the system as the need grew. Matthew had the brilliant idea of using software called Project Blacklight, originally created for library catalogues, to allow reporters to search documents remotely. To improve the virtual newsroom where journalists interacted on a regular basis, Rigoberto proposed repurposing Oxwall, open-source social networking software meant for dating – among other things.

As we were working on this, the French newspaper Le Monde shared with ICIJ 60,000 leaked files from the bank HSBC. They were mostly spreadsheets with names of people connected to accounts in its Swiss subsidiary and the amounts of money in those accounts – in many cases, hidden from the tax authorities. We also used Blacklight and Oxwall in this project and signed an agreement with a French company to use its software, Linkurious, to visualize connections and follow the money more easily. In these two projects, we created the base of the stack that would later allow us to move quickly on the Panama Papers.

ICIJ’s Rigoberto Carvajal (left) with Süddeutsche Zeitung’s Frederik Obermaier to show reporters the power of graph databases.

As our tools and platforms solidified, the number of journalists working on ICIJ projects and their engagement grew. LuxLeaks (2014) involved more than 80 reporters in 26 countries. Swiss Leaks (2015) involved more than 140 reporters in 45 countries.

On top of giving reporters secure access to the documents, we performed data analysis – the key to strengthening the articles – and created interactive applications that were among the most viewed items on ICIJ’s website.

Becoming essential to ICIJ’s investigations

Leaks were not the only type of data we worked on. In Evicted & Abandoned, a project about how the World Bank regularly failed to protect people displaced by development, we estimated 3.4 million people had been affected in a decade and created a unique database of projects using public data. In Fatal Extraction, we combed corporate data and combined it with information from our reporters in the field to reveal deaths, injuries and community conflicts linked to Australian mining companies across Africa.

Within a year, we had grown to a team of five and made up around half of ICIJ’s small newsroom. We added Emilia Díaz-Struck as research editor and hired then-intern Cécile Schilis-Gallego as a data journalist. This is the team I was counting on to help me solve the Panama Papers data challenge after the director called me.

First, we travelled to Munich to get the data. Rigoberto flew in from San José, Costa Rica, and I from Madrid, Spain. We stayed in an Airbnb apartment, which we converted into our base camp to copy encrypted hard drives. During the first meeting with our German colleagues, we discovered the complexity of the data, and one of my first comments to my bosses was: we need to hire an extra developer for the team. A few weeks later, Miguel Fiandor joined us from Spain.

The data consisted mostly of emails, but it also had millions of PDFs and images that needed to be made machine-readable. We used more than 30 servers in the cloud to process them in parallel and make the first batch of data ready for reporters in less than two months. That was the most difficult part, because once the data was searchable, we could use the same tools we had created for the previous projects. In late June, ICIJ had its first meeting with a small group of reporters in Washington, D.C. to kick off the project, although most journalists joined in September after a meeting in Munich.
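To illustrate the idea of processing many independent files in parallel, here is a minimal Python sketch. It is not ICIJ's actual pipeline: the function names `extract_text` and `process_batch` are hypothetical, the text extraction is a trivial stand-in for real OCR, and a local thread pool stands in for the cloud servers mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_text(doc):
    """Hypothetical stand-in for OCR/text extraction on a single file.

    Takes a (name, raw_bytes) pair and returns (name, cleaned_text).
    A real pipeline would call an OCR engine here instead.
    """
    name, raw = doc
    text = raw.decode("utf-8", errors="replace")
    return name, " ".join(text.lower().split())


def process_batch(docs, workers=4):
    """Process independent documents in parallel; return {name: text}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each document is handled on its own, so the work splits cleanly.
        return dict(pool.map(extract_text, docs))


# Toy input: two "files" with made-up contents.
docs = [
    ("a.pdf", b"  Invoice   001 "),
    ("b.png", b"Shareholder Register"),
]
index = process_batch(docs)
```

Because no file depends on any other, the same pattern scales from threads on one machine to many machines, each working through its own share of the files.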

As the months progressed, the leak grew to 2.6 terabytes and 11.5 million files, which meant we had to continue processing data throughout the whole project. The number of reporters involved also skyrocketed – we had almost 400 when the investigation went live in April 2016. They produced more than 4,700 articles.

With more reporters, more needs appeared: we had to create a ‘support team’ to help them with problems on our platforms; we created manuals and conducted training in three languages for people on four continents, and we kept improving our tools until publication. For example, we incorporated a popular feature to search for lists of individuals and know, in one go, if there were any hits. We also updated the public database of offshore companies, making it the most-used product in the history of ICIJ. Today, it is used by reporters, investigators and authorities around the world to chase tax evaders.
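The list-search feature mentioned above can be pictured with a small Python sketch. This is only an illustration under simple assumptions, not how ICIJ's platform worked: the names `normalize` and `batch_search` are hypothetical, the documents are an in-memory dictionary with made-up contents, and matching is a plain case-insensitive substring test rather than a real search index.

```python
def normalize(text):
    """Lowercase and collapse whitespace so small variations still match."""
    return " ".join(text.lower().split())


def batch_search(names, documents):
    """Check a whole list of names in one go.

    Returns {name: [ids of documents that mention it]}; a name with no
    hits maps to an empty list.
    """
    normalized_docs = {doc_id: normalize(text) for doc_id, text in documents.items()}
    hits = {}
    for name in names:
        needle = normalize(name)
        hits[name] = [
            doc_id for doc_id, text in normalized_docs.items() if needle in text
        ]
    return hits


# Toy corpus with made-up names.
documents = {
    "doc-1": "Director: Jane  Example, appointed 2009",
    "doc-2": "Shareholder register lists John Sample",
}
result = batch_search(["Jane Example", "Nobody Here"], documents)
```

A reporter could paste in a list of local politicians, for instance, and see at a glance which names deserve a closer look.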

Conclusion

It’s impossible to know how the Panama Papers would have turned out without the work of ICIJ’s data team but, for sure, we could not have had so many reporters working on it. We would have missed many stories and would have had less impact. Technology and data worked together to make the Panama Papers part of history.

As we move into the future, three things are clear to me. First, massive leaks are the new standard, and we’ll see more – and bigger – leaks. Second, global collaboration is the only way to deal with the complex world in which we live. And finally, data journalism is here to stay. If you don’t believe it, let me share just one more fact: almost three and a half years after its creation, ICIJ’s data team now has 11 people.