In my first few TechRepublic columns focused upon the intersection of technology and society, I looked at 24/7 connectivity, innovation and the surveillance state, data-driven policy and algorithmic transparency, and the potential and peril of “smarter cities.” This week, I focus on a topic that cuts across all of these subjects: data-driven journalism (DDJ).

Data journalism is an area that I’ve been researching and writing about for years, going back to my first year at O’Reilly Media right up to this very moment, as I continue a fellowship focused upon the topic at the Tow Center for Digital Journalism at Columbia Journalism School.

If you’re unfamiliar with the subject, don’t be turned off by jargon. Computer-assisted reporting has always been used to find data and tell stories. The outcome is still journalism, focused upon explaining the who, what, when, where, how, and why to a reader or watcher. The difference is that the work incorporates statistics and social science and interrogates databases as sources along with humans.

In its simplest form, that might be a figure with a box score comparing percentages of baseball players or a visualization showing employment rates and demographic information over time. In more advanced applications, journalists might use machine learning, powerful algorithms, and cloud computing to crunch through huge data sets, looking for patterns and connections between documents, contractors, or companies.

In recent years, skyrocketing Internet penetration and mobile devices have transformed how data is accessed, processed, presented, and published. As more data has flowed online, there is more demand for converting it to information that people can understand and consume, from smartphone applications to web services and narrative broadcasts.

As the amount of data generated continues to rapidly expand, the 21st century evolution of computer-assisted reporting will be a force for public good, holding profound importance for society. Data journalists are making sense of this flood and holding governments accountable as they pursue more data-driven policy and performance measures.

In a way, data journalism is the peanut butter to the jelly of open government data releases: Journalists are a crucial component of confirming that the data public officials describe has actually been released in a form and quality that can be consumed.

While I’m admittedly biased, I think the growth of DDJ is one of the most significant current trends in the media. It’s also top of mind this week because I attended the annual National Institute for Computer-Assisted Reporting (NICAR) conference in Baltimore, just up the road from Washington. NICAR is a joint program of Investigative Reporters and Editors, Inc. and the Missouri School of Journalism. They’ve been turning out journalists who know how to use computers to support their reporting for decades. In recent years, the NICAR conference has grown by leaps and bounds, nearly tripling in attendance since 2012, when I first attended.

Over the years, I’ve found that the best sources for explaining this work are the editors and reporters doing it day in and day out, some of whom are building the tools and platforms that their colleagues use. That’s no small detail: In an industry that has not always been quick to adapt and adopt new technology, the emergence of journalists collaborating on GitHub and creating news applications that can scale to millions of users and quickly render on mobile devices is an exciting, important trend.

In the video below, I moderate a Google+ Hangout with several notable practitioners of DDJ. We covered a lot of ground in 53 minutes, discussing what data journalism is, how journalists are applying it, the importance of storytelling, considering ethics, the role of open source software, “showing your work,” and much more.



With new media ventures that prominently feature DDJ about to launch from FiveThirtyEight, Vox Media, and The Washington Post, on top of the needs of existing newsrooms, the demand for people with the skills NICAR teaches may never have been higher.

The stories data journalists can tell with these new tools and techniques reach the most aspirational heights available to the profession, revealing the hidden channels of money, power, and influence in society to the public and government, and serving as a bulwark for democracy. That does not, however, make data journalism a panacea. Just as data-driven policy can be corrupted by bad data, hidden biases, or mistaken analyses, journalists may successfully clean and present data but fail to clearly tell a story to readers or wrap it in the necessary context. Skepticism and intellectual rigor become more important, not less, if journalists seek to apply a scientific mindset to their work.

While data journalism massive open online courses (MOOCs) offer bona fide new options for distributed learning, they are not a replacement for the experience of the hands-on workshops available to conference attendees. I’ll have more thoughts on education and learning when I publish a white paper on DDJ later this spring. We’ll update the article with the link to the white paper when it’s available.