Unlike previous ICIJ investigations such as Luanda Leaks, Mauritius Leaks, or Panama Papers, the FinCEN Files documents didn’t number in the hundreds of thousands or millions.

In fact, the entire investigation was based primarily on a cache of just over 2,600 documents that BuzzFeed News shared with the International Consortium of Investigative Journalists and 108 media partners, including more than 2,100 suspicious activity reports, several hundred spreadsheets, a few dozen Word documents, and emails.

But don’t let the size of the leak fool you. It took months of deliberate, meticulous data work and diligent reporting to eke out some of the investigation’s big stories and findings, like analyzing the more than $2 trillion in suspicious transactions flagged by U.S. banks, or identifying the networks of shell companies used to launder potentially dirty money.

With any data journalism project, it’s important to put a human face to the figures. Our Confidential Clients feature was a selection of businessmen, fraudsters and political leaders whose stories appeared in the leaked files. The aim was to show how global banks continued moving billions of dollars around the world for clients they suspected were funding illicit or illegal activities.

We quickly discovered that each profile offered a microcosm of the huge amount of reporting and analysis that would eventually go into every story across the entire investigation, whether it was a 7,000-word feature on British shell companies or a 300-word profile of a corrupt politician. Here’s what we learned along the way:

The data required reading between the lines

The files we were looking at were not the sort of frank exchanges between a client and their trusted wealth manager, accountant or lawyer that we’ve seen in past investigations. There were no details of intimate client meetings or clear indications of intentions or motivations.

The suspicious activity reports and spreadsheet lists of transactions we accessed were put together by banks’ compliance officers reporting their suspicions to the U.S. Treasury’s Financial Crimes Enforcement Network, known as FinCEN. These officers often had no direct connection to the transaction or the client, and were often either ill-informed or limited in the amount of detail they could provide to FinCEN.

The data we had was also quite patchwork. Most of the files were unstructured narratives outlining officers’ suspicions, written into documents from which we could extract some details about transactions. On occasion, we had access to transactional data in spreadsheets. Those files gave us very detailed information from the point of view of the filing bank; some came in the FinCEN Files with the corresponding reports. But there is no mandatory format for filing transactional data, so each bank had chosen its own way of presenting it. Only 46 of these spreadsheets came with contextual narrative files attached.

As a result, the reports had to be treated with a lot of caution, though they still contained a wealth of useful information to use as a starting point. For example, JPMorgan Chase reported hundreds of millions of dollars’ worth of transactions made by Paul Manafort, the former manager of Donald Trump’s 2016 presidential campaign, and his associates. The SAR itself wasn’t particularly detailed, but we fortunately had access to the corresponding list of transactions, which showed the money flows tied to Manafort and his companies.

For each Confidential Client profile, we had to assess what information we had, what information we needed to get, and then piece it all together.

The data was selective

The FinCEN Files documents represent less than 0.02% of the more than 12 million suspicious activity reports that financial institutions filed between 2011 and 2017. According to BuzzFeed News, some of the records were gathered as part of U.S. congressional investigations into Russian interference in the 2016 U.S. presidential election; others were put together following requests to FinCEN from law enforcement agencies. What we had was only a very small window into the world of a few banks, which were selectively picking the information they deemed worthy of being reported to the U.S. government.
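The share quoted above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: both inputs are the rounded "more than" figures given in this article (about 2,100 suspicious activity reports in the leak, about 12 million filed overall), so the result is approximate, and the variable names are ours.

```python
# Rounded figures quoted in the article; both are "more than" values,
# so the resulting share is an approximation, not an exact count.
sars_in_leak = 2_100
sars_filed_2011_2017 = 12_000_000

# Share of all filed reports that appear in the leak, as a percentage.
share_percent = sars_in_leak / sars_filed_2011_2017 * 100

print(f"Share of all SARs filed: about {share_percent:.4f}%")
```

With these rounded inputs the share comes out at roughly 0.0175%, consistent with the "less than 0.02%" figure.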

It wasn’t possible to pick a bank’s famous client and follow a steady trail of information exchanged between them over the years. Nor was it possible to carefully build a complete picture of each bank’s relationships with its correspondent clients around the world.

A few names of individuals and companies popped up quite quickly. Still, it took a lot of heavy lifting to fill in the blanks. For example, we knew Deutsche Bank took an interest in a few companies owned by Ukrainian businessman Ihor Kolomoisky, especially in 2016 when it filed several reports mentioning transactions made by Kolomoisky’s Ukraine International Airlines. But we couldn’t know what exactly prompted the reports to be filed in the first place, whether more reports were filed than those we had, or what happened after they were sent to FinCEN. What we had was not dissimilar to the feeling you get when driving on a sparsely lit highway. You can distinguish patches illuminated by streetlights, but you cannot know what lies in the darkness beyond.

The data was full of duplicates

One of the things we like to do in the ICIJ data team is to count things. We count documents, we add up amounts, we calculate date ranges, and we classify entities, such as banks, all in the name of contextualizing information and building understanding. I like to use the analogy of a person collecting shells on the beach: whereas most people would select the most beautiful or original shells and discard the rest, the data team goes through every shell there is, and considers each one against a methodology. In the end, the questions we need to answer are plentiful: is it useful? Is it representative? Can it be trusted? What can we safely say, and what don’t we have enough information about?

The first “confidential client” we looked at was Mukhtar Ablyazov, a Kazakh national who has lived in exile for more than ten years and is accused by Kazakh authorities of having embezzled billions of dollars during his tenure as chairman of a bank. Ablyazov has appeared in a number of previous ICIJ investigations. In the FinCEN Files, his companies were mentioned in suspicious activity reports filed by seven different banks. This gave us an indication that it was a good profile to focus on. Not only were we able to review the banks’ reports, but we also had access to FinCEN’s own summary reports about Ablyazov, which provided additional background information.

But because the suspicious activity reports came from numerous banks, filed over the course of several years, we ran the risk of having multiple banks reporting the same transactions. Different banks can be involved in the same transaction, or a series of transactions can partly intersect with another series reported by another institution. This risk of double-counting also existed when banks filed multiple reports and failed to explain what transactions they had included in previous reports. We had to cross-reference the transaction data from the suspicious activity reports, the transaction spreadsheets, and FinCEN’s own reports to be able to identify and set aside potentially duplicate records.
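To illustrate the kind of cross-referencing described above, here is a minimal sketch in Python. It is not ICIJ's actual pipeline: the field names (`source`, `date`, `amount`, `originator`, `beneficiary`), the matching rule, and the sample records are all assumptions made up for illustration. The idea is to build a comparison key from each transaction and set aside, rather than delete, any record whose key has already been seen.

```python
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase a name and collapse punctuation and whitespace so variants match."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def transaction_key(tx: dict) -> tuple:
    """Hypothetical matching rule: same date, amount, originator and beneficiary."""
    return (
        tx["date"],
        round(float(tx["amount"]), 2),
        normalize_name(tx["originator"]),
        normalize_name(tx["beneficiary"]),
    )


def split_duplicates(transactions: list) -> tuple:
    """Return (unique, flagged): the first record per key, and the later repeats."""
    groups = defaultdict(list)
    for tx in transactions:
        groups[transaction_key(tx)].append(tx)
    unique, flagged = [], []
    for group in groups.values():
        unique.append(group[0])
        flagged.extend(group[1:])  # potential duplicates, kept for manual review
    return unique, flagged


# Invented sample records: two banks describing the same transfer differently.
transactions = [
    {"source": "Bank A report", "date": "2014-03-01", "amount": "1000000.00",
     "originator": "Alpha Trading Ltd.", "beneficiary": "Beta Holdings"},
    {"source": "Bank B spreadsheet", "date": "2014-03-01", "amount": 1000000,
     "originator": "ALPHA TRADING LTD", "beneficiary": "Beta  Holdings"},
    {"source": "Bank B spreadsheet", "date": "2014-03-05", "amount": 250000,
     "originator": "Alpha Trading Ltd.", "beneficiary": "Gamma LLC"},
]

unique, flagged = split_duplicates(transactions)
total = sum(float(tx["amount"]) for tx in unique)
print(len(unique), "unique,", len(flagged), "flagged for review, total", total)
```

A rule this strict will also flag two genuinely separate transfers that share a date, amount and counterparties, which is why the sketch keeps flagged records for a reporter to review instead of discarding them.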

The data needed more data

Because so many of the rich and powerful rely on shell companies, including for legal reasons, it is often very difficult to identify the individuals who ultimately benefit from operating those companies. The U.S. banks themselves sometimes have a hard time tracking those ultimate beneficial owners (or UBOs), as shown in the FinCEN Files. We had to research which companies were owned by each of the individuals portrayed in the files, and check whether we had additional transactions in the data tied to those companies which hadn’t been identified by the banks.

This happened for transactions tied to Isabel dos Santos, which Standard Chartered flagged as part of a report about various companies called Unitel. Thanks to our previous Luanda Leaks investigation, we were able to identify the transactions that were relevant to the Unitel company partly owned by Isabel dos Santos, and include this amount in her Confidential Clients profile.

We came across several other cases where banks reported suspicious transactions we could tie to one of the Confidential Clients through outside research. One example is Oleg Deripaska, a Russian businessman whose companies used accounts at Expobank in Latvia. The Bank of New York Mellon reported suspicious transactions to FinCEN that included Deripaska companies using the Latvian bank, but the U.S. bank didn’t report on Deripaska directly. It’s only through contextualizing the documents that the profiles could be built. Thankfully, we could rely on a network of knowledgeable media partners to help with that contextualization, and also on ICIJ’s access to previous data leaks.
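The step of tying transactions to a client through outside research can be sketched as a simple lookup. This is an illustration under stated assumptions, not a description of ICIJ's tooling: the `ownership` table, the company and client names, and the transaction fields are all hypothetical placeholders standing in for research drawn from partners and earlier leaks.

```python
from collections import defaultdict

# Hypothetical ownership table built from outside research:
# normalized company name -> the client it was linked to.
ownership = {
    "example telecom sa": "Client A",
    "sample metals ltd": "Client B",
}


def attribute_transactions(transactions: list, ownership: dict) -> tuple:
    """Sum amounts per linked client; return (totals, unmatched transactions)."""
    totals = defaultdict(float)
    unmatched = []
    for tx in transactions:
        parties = (tx["originator"], tx["beneficiary"])
        # Collect every client linked to either side of the transaction.
        clients = {
            ownership[p.strip().lower()]
            for p in parties
            if p.strip().lower() in ownership
        }
        if not clients:
            unmatched.append(tx)  # no known link; needs more research
            continue
        for client in clients:
            totals[client] += tx["amount"]
    return dict(totals), unmatched


# Invented sample transactions in which no bank named the client directly.
transactions = [
    {"originator": "Example Telecom SA", "beneficiary": "Unrelated Supplier Inc",
     "amount": 500_000.0},
    {"originator": "Third Party LLC", "beneficiary": "Sample Metals Ltd",
     "amount": 120_000.0},
    {"originator": "Third Party LLC", "beneficiary": "Another Firm GmbH",
     "amount": 75_000.0},
]

totals, unmatched = attribute_transactions(transactions, ownership)
print(totals)
print(len(unmatched), "transaction(s) with no known owner")
```

Exact-name lookup is the simplest possible design choice here. In practice, several unrelated companies can share a name, as the Unitel example shows, so a match like this is a lead to verify, not a finding.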

Finally, ICIJ reached out for comment to the individuals, companies, and banks mentioned in the Confidential Clients interactive, including those that were part of the transactions we chose to visualize with each client. In some cases their responses helped contextualize and inform our reporting; in all cases we included their responses alongside their profiles.