A police officer gives a mask to a commuter in Valencia, Spain.

COVID-19

Investigating the coronavirus: Incomplete data creates headaches for reporters

ICIJ’s mission is to reveal injustice and hold the powerful to account. Do you know of someone exploiting the coronavirus crisis, or have another story to share? We’d like to hear it.

I’m a COVID-19 data skeptic.

The most widely used numbers to describe the impact of the pandemic, reported cases and deaths, are flawed. Limited testing in much of the world, including the United States, has meant that untold thousands (or hundreds of thousands) of infected people are not included in official tallies. There are too many blind spots where the disease may be incubating and set to explode.

I’m also a data journalist who believes reporters have an obligation to use data to explain the world’s biggest problems — including the coronavirus pandemic.

How can journalists reconcile the two and produce responsible stories that add to public understanding of the crisis? For starters, they should be transparent about the limitations of data when they share it.

The Los Angeles Times coronavirus tracking page, for one, doesn’t just offer up counts. It clearly points out the severe limitations of the tallies while still offering valuable insight into how the outbreak is spreading in California. Story links interspersed among the graphics guide readers to deep dives that explore how the numbers reflect reality.

For example, to accompany the map titled “Where are the cases in L.A.?” the paper links to a story that found the highest rates came from predominantly white, affluent areas. Crucially, it notes the caveat: “Experts say this is likely skewed by uneven access to testing, and in some instances by wealthy residents who traveled internationally.”

Journalists should also proceed with extreme caution when selecting and crunching datasets on their own. Experts and their modeled estimates will usually provide better guidance than news nerds crunching all manner of data drawn from disparate sources.

As awareness of the new virus increased in the U.S., The New York Times shared a case database compiled by reporters with researchers at Columbia University, who then combined it with Census Bureau transportation data to estimate how the pandemic might spread with and without effective measures to flatten the curve.

A screenshot of the New York Times data visualization.

The resulting story provided a powerful preview of what lay ahead in the U.S. and is an excellent example of how data gathered by reporters can help researchers.

The Financial Times coronavirus tracking page offers a great global snapshot of all things coronavirus. It includes a map, “How the world locked down due to COVID-19,” which uses a “stringency index” to indicate the severity of the lockdown measures. The index was built by researchers at the University of Oxford’s Blavatnik School of Government, from data on a range of government response measures, such as school and workplace closures and restrictions on travel and gatherings.

A report published by German daily Süddeutsche Zeitung used a set of charts to illustrate the concept of “exponential growth” and show why the coronavirus spreads so rapidly.
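For reporters who want to see the arithmetic behind “exponential growth” for themselves, here is a minimal Python sketch. It is not taken from the Süddeutsche Zeitung report: the function name, the starting count of 100 cases and the three-day and six-day doubling times are all hypothetical values chosen only for illustration, and real outbreaks do not keep a constant doubling time.

```python
def project_cases(initial_cases, doubling_days, days):
    """Project case counts for day 0 through `days`.

    Assumes a constant doubling time, which is a simplification:
    testing, behavior and policy all change the real growth rate.
    """
    return [initial_cases * 2 ** (day / doubling_days) for day in range(days + 1)]


# Hypothetical scenarios: 100 known cases, followed for 30 days.
fast = project_cases(100, doubling_days=3, days=30)  # cases double every 3 days
slow = project_cases(100, doubling_days=6, days=30)  # cases double every 6 days

print(round(fast[-1]))  # 102400 (ten doublings: 100 * 2**10)
print(round(slow[-1]))  # 3200 (five doublings: 100 * 2**5)
```

Under these assumed numbers, halving the pace of doubling cuts the 30-day total by a factor of 32, which is the kind of gap such charts are designed to make visible.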

As in the U.S., Spanish authorities reacted too slowly to the spreading pandemic, necessitating long periods of physical distancing to try to get the outbreak under control. El País produced a beautiful explainer about the power of physical distancing to slow the exponential spread. After three weeks of lockdown, it served as a valuable reminder of the necessity of prolonged confinement in a country with a large outbreak.

If you are a researcher with data that you think could help inform the public response to the coronavirus pandemic, let us know.