Baltimore became the world’s capital of data journalism last week, as the National Institute for Computer-Assisted Reporting (NICAR) convened its annual conference. The event drew a record attendance of roughly 1,000 participants, including journalists and programmers who are among the leading international experts in using data to produce journalism.
A team from the International Consortium of Investigative Journalists (ICIJ) gave several presentations at the conference. In one of them, we explained how we analyzed the 2.5 million leaked files behind the Offshore Leaks investigation. In this post, we describe how journalists and other researchers can get the most out of this data in their own investigations.
ICIJ’s panel illustrated how we collaborated on the project with more than 100 journalists around the globe. The panel included members of the team who worked on it from our Washington, D.C. headquarters, such as ICIJ research editor Margot Williams and Center for Public Integrity data editor David Donald, as well as data journalist Sebastian Mondial, who took part in the investigation from Germany, and me (ICIJ’s project research manager and ICIJ member from Spain). Two other members of the team were in the audience: Emilia Díaz-Struck, who produced the stories about Venezuela, and Djordje Padejski, who investigated the role of offshore finance in Serbia. You can see our full presentation here:
One of the goals of the talk was to show how to make the most of the Offshore Leaks Database, which visually displays the identity of the owners behind more than 100,000 entities in 10 tax havens and the networks around them. Its data is also available for download. If you want to learn how to use it most effectively, follow these tips we gave to the NICAR audience.
To understand how the ICIJ Offshore Leaks Database was created and what it includes and doesn’t include, read these three pieces: How ICIJ’s Team Analyzed the Offshore Files, How We Built the Offshore Leaks Database and Data Caveats and Limitations (and don’t forget the glossary at the end!). You can also look at the stories we’ve done so far from every country in this map.
As a second step, run searches on the Offshore Leaks Database. Try filtering by your country to see if you recognize any of the names. For tips on searching efficiently, read this blog post. You can also watch a 5-minute online tutorial:
Find people you know
You can also go further than the basics outlined in the posts above. It’s possible that a person from your country is in the database but not associated with your country. In the original databases ICIJ obtained, he or she likely was not linked to an address from your country. For example, Spanish art collector Carmen Thyssen is associated with an Andorra address, not a Spanish one, so she only appears under the Andorra filter. You can read the story we wrote on her use of offshore structures to manage her collection here.
To discover more people, you can either run searches in the database (as you would in Google) to see if you get lucky, or you can try to match the whole database against a list of names you have.
Match our data with yours
To do these “massive” matches, download the raw data here. You can follow the steps outlined in this post to work with the data. The main file to use is nodes.csv. You can do the matches in many ways. However, because the data has not been altered and contains typos and misspellings, a simple VLOOKUP in Excel won’t catch every match. Instead, you can do a simple analysis that requires only Microsoft Excel 2010 (or later) and a PC, using the Fuzzy Lookup add-in, which can be downloaded for free here. Unfortunately, this does not work on Macs or older versions of Excel. Here is a simple tutorial to learn how to match data sets with Fuzzy Lookup. Other, more complex ways of fuzzy matching, using SQL or other programs, could also work.
Use the Fuzzy Lookup tool to match against your list of names, which has to be in an Excel file. If you want to give it a try with a sample dataset, see whether any names on the U.S. Department of the Treasury “Specially Designated Nationals List” (SDN) also appear in the Offshore Leaks data.
The SDN list tracks “individuals and companies owned or controlled by, or acting for or on behalf of, targeted countries. It also lists individuals, groups, and entities, such as terrorists and narcotics traffickers designated under programs that are not country-specific.” You can download this list from here.
Among the several matches you’ll get, you’ll end up finding Nalinee Taveesin, a former government minister who is currently Thailand’s international trade representative. In 2008, the U.S. Department of the Treasury’s Office of Foreign Assets Control (OFAC) designated her as one of four “Mugabe regime cronies.” The mining of the leaked offshore documents and the local reporting by an ICIJ member in Thailand led to this story.
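If you would rather script the match than use Excel, the same idea can be sketched in a few lines of Python using only the standard library. This is a rough illustration, not the method ICIJ used: the function names are hypothetical, the column header "name" is an assumption you should check against the file you download, the 0.85 similarity cutoff is an arbitrary starting point, and the sample names are invented for the example.

```python
import csv
import difflib


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def load_names(lines, column="name"):
    """Read one column from CSV text.

    `lines` is an open file or any iterable of lines. The default column
    name "name" is an assumption; check the header row of your file.
    """
    return [row[column] for row in csv.DictReader(lines) if row.get(column)]


def fuzzy_matches(my_names, leak_names, cutoff=0.85):
    """Return (my_name, leaked_name, score) tuples for near-identical names."""
    # Group the leaked names by their normalized form so that records
    # differing only in case or punctuation are compared once.
    by_key = {}
    for original in leak_names:
        by_key.setdefault(normalize(original), []).append(original)
    keys = list(by_key)

    results = []
    for name in my_names:
        target = normalize(name)
        # get_close_matches keeps candidates whose similarity ratio >= cutoff.
        for key in difflib.get_close_matches(target, keys, n=5, cutoff=cutoff):
            score = difflib.SequenceMatcher(None, key, target).ratio()
            for original in by_key[key]:
                results.append((name, original, round(score, 2)))
    return results


if __name__ == "__main__":
    # Invented sample rows standing in for the downloaded CSV file.
    sample_csv = ["name,country\n", "MARIA GONZALES,XX\n", "John Smith,YY\n"]
    leaked = load_names(sample_csv)
    for match in fuzzy_matches(["Maria Gonzalez"], leaked):
        print(match)
```

To run it on real data, pass an open file to load_names instead of the sample rows. Every hit is only a lead to verify by hand, since a lower cutoff catches more misspellings but also more false matches. This approach compares each of your names against every leaked name, so it can be slow on long lists, and it will not catch names written in a different word order.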
If you find that your own dataset has a match with a record in the ICIJ Offshore Leaks Database and you are a reporter, please contact us. We may have documents within the leaked files that will help you tell a better story.