
Data Harvest Conference: What You Missed

Some 150 investigative and data journalists and programmers from all across Europe took part in a great harvest of ideas and digital research methods.

David Leigh, Duncan Campbell, and Mar Cabra at Data Harvest.

BRUSSELS – On 3 and 4 May 2013 the third edition of the Data Harvest Conference took place at the Erasmushogeschool in Brussels. Some 150 investigative and data journalists and programmers from across the European continent took part in what was a great harvest of ideas and digital research methods.

While some thirty early adopters took part in the first conference in 2011 and last year’s edition attracted 100 journalists, this year’s #dataharvest13 brought together 154 participants from 22 countries. About 50 speakers ran up to six hands-on workshops and masterclasses at a time, divided into four tracks: cross-border, Farmsubsidy, wobbing and, finally, the data and journalism lab.

The cross-border track focused on international journalism collaborations, such as an investigation into false ‘Made in Italy’ labels on canned tomato puree, which was supported by a working grant. The Farmsubsidy track was all about data research into European farm subsidies, while the wobbing sessions dealt with the latest developments in information gathering through Freedom of Information (FOI) Acts. The lab primarily served as a workspace where new datasets were prepared and where journalists with ideas could meet hackers, and vice versa.

Here is a selection of the wide variety of lectures, debates and sessions that were organized.

“I’ve got something amazing”

The headliners at #dataharvest13 were, without doubt, several of the main figures behind the recently published Offshore Leaks, the collaboration between 86 journalists led by the International Consortium of Investigative Journalists (ICIJ).

Journalism superstar David Leigh (hear his presentation), until recently an investigative journalist at The Guardian, described how he was approached by Gerard Ryle, ICIJ’s Australian director, who told him: “I’ve got something amazing.” The last time he had heard that, Leigh confessed, was from WikiLeaks’s Julian Assange, when he was about to hand The Guardian a secret dataset about the war in Afghanistan. Whereas the Afghanistan dataset was 1.6 gigabytes (90,000 documents), the Offshore Leaks dataset is 200 GB in size. Quite a job to clean up and make usable.

That is where British data specialist Duncan Campbell (hear his presentation), the project’s data journalism manager, and Spanish journalist Mar Cabra (hear her presentation), the project’s data research manager, came in. They made sure the data were properly cleaned up and sorted.

In three sessions at the conference, the three of them and several other computer whiz kids involved in Offshore Leaks explained how they went about the work.

Olympic torchbearers

Paul Bradshaw, British blogger, journalism teacher and founder of the crowdsourcing platform Help Me Investigate, demonstrated how he investigated the Olympic torch relay in the build-up to the 2012 Olympics in London. Through a combination of cross-border research, data journalism and crowdsourcing, Bradshaw discovered that many torchbearers, officially people with an “inspiring story”, were in fact CEOs of Olympic sponsors, commercial partners, PR people or journalists.

Bradshaw made intensive use of datascraping for his investigation. In this research method you ‘scrape off’ information from online databases (in this case the Olympic torch relay’s), load it into a computer program and then use it for your own investigations. As one of the workshops showed, datascraping is far less complex than it sounds. By scraping on a regular basis, Bradshaw’s team noticed that as the first discoveries were reported in the press, some names disappeared from the database. This caught their attention and led to even more articles, which in turn drew more interest to the crowdsourcing platform and led to further contributions from the crowd.
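The article does not say which tools Bradshaw’s team used. The following is only a minimal sketch of the comparison step behind regular scraping, assuming each scrape has already been reduced to a set of names. The snapshot dates, the placeholder names and the functions `diff_snapshots` and `report_changes` are all hypothetical, chosen for illustration.

```python
from datetime import date

# Hypothetical scraped snapshots: scrape date -> set of torchbearer names
# seen on the relay website that day. Names and dates are placeholders.
snapshots: dict[date, set[str]] = {
    date(2012, 5, 20): {"Torchbearer A", "Torchbearer B", "Torchbearer C"},
    date(2012, 5, 27): {"Torchbearer A", "Torchbearer C", "Torchbearer D"},
}


def diff_snapshots(old: set[str], new: set[str]) -> tuple[set[str], set[str]]:
    """Return (removed, added): names that vanished and names that appeared."""
    return old - new, new - old


def report_changes(
    snapshots: dict[date, set[str]],
) -> list[tuple[date, set[str], set[str]]]:
    """Compare each scrape with the previous one, in date order."""
    dates = sorted(snapshots)
    changes = []
    for prev, curr in zip(dates, dates[1:]):
        removed, added = diff_snapshots(snapshots[prev], snapshots[curr])
        changes.append((curr, removed, added))
    return changes


for when, removed, added in report_changes(snapshots):
    print(f"{when}: removed {sorted(removed)}, added {sorted(added)}")
```

Keeping every snapshot rather than overwriting the previous one is what makes disappearances visible: a name that is silently deleted from the live site still exists in the older scrape.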

Bradshaw mentioned that he did all this more or less in his spare moments, “on the train, during lunch breaks, etc. It’s a mistake to think that data projects take a lot of time. And what’s more: you don’t have to be admitted into ICIJ or another network: it’s just one of the many different methods to do journalism.”

Photo: Shutterstock

What else?

The BBC’s internet research specialist Paul Myers demonstrated some advanced online search techniques that he uses to support the BBC’s journalists. Thanks to Myers we now know how to trace the owners of obscure websites, determine where a photo was taken and go undercover with a fake Facebook profile. From the director of the EU Ombudsman’s office and two lawyers specialising in FOI requests we heard how to fight the rejection of an FOI request. Stephen Grey, an investigative journalist at the Reuters news agency, explained the research methods he used to expose the corrupt CEO of a Greek bank. And so much more.

Spotted in the Farmsubsidy track: a few years ago, Austrian journalist Hans Weiss caused a stir with his book on financial and fiscal misconduct in Austrian agribusiness. For that investigation he had used public, easy-to-obtain EU and Austrian farm subsidy data rather than secret, leaked documents. Work on the fresh 2012 farm subsidy data had already begun at the conference, too.

Wobbing, making use of FOI legislation, can lead to strong results. German journalist Marco Maas and his team built a complete website that sets lobbyists’ proposals on EU data protection legislation on one side against the amendments tabled by MEPs and members of the parliamentary committees on the other. The two often turn out to be so similar that they can almost be called plagiarism. Hence the name of the website:

Among the speakers on the cross-border track was a Belgian freelancer, Damien Spleeters, who covers Belgian arms exports, particularly those of the arms manufacturer FN Herstal. By analysing Belgian archive documents and investigating in detail the serial numbers of Belgian weapons found in conflict zones, he discovered that, despite an embargo on arms exports to conflict zones, Belgian weapons often reach present-day war zones, particularly Libya, Syria and Mali, after detours lasting several years. Other cross-border cases presented included the Gaza Power Plant, on which millions in EU subsidies came to nothing, and the criminal networks behind Roma beggars, which can be traced back to a few Roma bosses at the top of the criminal pyramid.

All the while, the hackers, coders and other whiz kids were busy in the data and journalism lab, scraping and cleaning new datasets, preparing them for journalistic exploration…

A lot of interesting material from the various sessions can be found via the Twitter hashtag #dataharvest13: links to databases, audio recordings of presentations and more. The conference was closed by Brigitte Alfter, who asked participants for feedback and suggestions and announced #dataharvest14 (planned for 9 and 10 May 2014). Finally, she invited everyone to a closing drink in Brussels after two intense but hugely productive days.

Hear some of the presentations (recorded by Paul Bradshaw)

Translation by Rafael Njotea

ICIJ is dedicated to ensuring all reports we publish are accurate. If you believe you have found an inaccuracy, let us know.