The Open Data 500 study is an index of US companies using open government data to generate business and develop products. Read our interview with Joel Gurin, the project's director.
I've been trying to make dollars and sense of the open data economy for years. Does releasing data result in a more transparent healthcare marketplace? What business models for open data work? Will publishing open climate data increase community resilience? Will releasing more open government data make better laws and result in more accountable governments, improvements to public services, or trillions of dollars of additional economic activity, as McKinsey & Company projected in October 2013?
The somewhat frustrating, if accurate, answer to all of these questions is often "maybe," with anecdotal examples and many layers of caveats to ground the response in context. Before I go any further on that count, I think it would be useful to step back and consider what open data means.
Open data: What it is and what it means
First, for purposes of discussion, I'll use open data to describe data that is:
- free for use, reuse, and redistribution;
- released under a legally open license; and
- technically open, available in a machine-readable format.
Such data may come from the public or private sector, academia, or media, and describe an infinite number of things that those entities collect or generate data about or around.
Second, open data can and does have value for government or corporate transparency, not just for economies, whether it is aggregated and released by governments, nonprofits, academia, the private sector, or media. Open data may describe laws, regulations, performance, budgets, spending, services, sensor readings, or a host of other government functions or processes. When open data from regulatory agencies is released, it can improve services and protect or empower consumers.
Third, open data about the business or functions of government may be different from data about deliberative processes, particularly with respect to how decisions are made, by whom, and with what justification, which is to say how power is exercised. That means that not all of it may be directly useful for holding officials and executives to account. On that count, it's true that not all open data is about transparency. It's also true that some is, and that we should strive for nuance in discussing it. For those looking for more context on this count, read about the ambiguity of open government and open data, and this seminar on open data (PDF) at Crooked Timber.
All that being said, the answer to one of the questions I posed at the outset of this column is increasingly clear in April 2014: releasing open data fuels economic activity, creating value in both the public and private sectors. Research from McKinsey suggests that seven sectors could generate more than $3 trillion a year in additional value as a result of open data, which is already giving rise to hundreds of entrepreneurial businesses and helping established companies to segment markets, define new products and services, and improve the efficiency and effectiveness of operations.
While governments looking for economic outcomes from open data must focus on releasing assets with business value, the reach of that category across sectors is quite broad, as new research from New York University (NYU) makes clear.
The Open Data 500 study
The Open Data 500, conducted by the Governance Lab (the GovLab) at NYU with funding from the John S. and James L. Knight Foundation, bills itself as "the first comprehensive study of U.S. companies that use open government data to generate new business and develop new products and services."
This weekend, I spoke with Joel Gurin, senior advisor at the GovLab and director of the Open Data 500 project, about his book, Open Data Now, and what he has learned since its publication. Gurin was formerly the executive vice president of Consumers Union and editorial director of Consumer Reports. From 2009 to 2012, Gurin was the chief of the Consumer and Governmental Affairs Bureau of the Federal Communications Commission and served as chair of the White House Task Force on Smart Disclosure.
"We have better and better evidence that the commercial application of open data is widespread and comes in many forms," he said, in our interview. "Through the Open Data 500, we have been able to really document the breadth and diversity of companies using open data."
Gurin said these 500 examples show that open data is a key business resource, not a niche of the economy. "2014 feels like the year open data is coming of age as a source for business ideas," he said.
Of the 500 in the index, more than 180 responded to the GovLab's survey. Gurin says that the respondents have tended to be new companies, with two-thirds of them founded in the last five years. "These people are doing more than developing apps," said Gurin. "Half have more than 10 employees. They're building to do something bigger. We're seeing them all over the country. While California and New York are way ahead, there are companies in Massachusetts, Washington, Virginia, Illinois, Texas, and beyond. We're seeing businesses in 36 states. This is the kind of tech that can be applied everywhere."
The study broke down the business types into two broad categories: companies that generate revenue from advertising, lead generation, and subscriptions, and companies that do data management, analysis, licensing, or similar software development.
As the graphic below illustrates (a fully interactive version is available at the Open Data 500 website), these companies are acquiring open data from 30 agencies and offices at the federal government level. (Companies use state and local open data too, although the index didn't go as deep in that area.)
"Commerce supports all of the categories, largely through the U.S. Census and some NOAA data," said Gurin. "What this means is that even an agency like HHS shouldn't just chat with healthcare companies."
Gurin also highlighted increasing international interest, from the impact of open data on developing countries to developed countries moving to tap it as a national resource. (Research from the Oxford Internet Institute, based upon Open Knowledge Foundation data, suggests that the wealthiest countries may be the most open with their data.)
Given that this index and the book paint a generally rosy picture of the space and the outcomes from the release of open data, I asked Gurin where potential negative results might occur.
"The biggest fears — legitimate ones — is that publishers can't sufficiently anonymize aggregated personal data," he said. "There will be more demand. It's really hard to avoid variations on mosaic effects."
Gurin said that several presenters at a recent conference in Ohio that he chaired raised this issue to him, holding that many ways of anonymizing data will be breakable. (The recent controversy in England over the sharing of patient data from the National Health Service is useful context here.)
Gurin also highlighted the "nothing happened" scenario, where industry incumbents and public officials create obstacles around different kinds of open data releases.
"In terms of the use of open government data," he said, "perhaps the biggest risk is that the agencies involved will become overwhelmed by the enormity of what they're trying to do. There are so many horror stories about the terrible shape of data sets, in terms of technology, accuracy, and data gaps. We know that it's a mess. If you combine with people in agencies sensitive about being blamed for problems, and people on the business side finding workarounds and profiting from the conclusion, the risk is that it could be stuck."
Even so, Gurin is fundamentally optimistic about increased access to open data around the world, from the World Bank and development data to more open scientific data. "The foundations funding research are demanding it," he said. "Look at Alzheimer's, multiple myeloma, or Parkinson's. They really want data to be shared."
The huge extent and diversity of open data companies was surprising to Gurin, both as he researched the book and as he conducted the Open Data 500.
"We're seeing a real acceleration of interest and visibility, going from a quiet underground to a genuine business and government movement," he said.
"I'm hoping that's true. I think it's a good thing, because people and companies recognize value will drive more demand to make data better and more useful. There has been a supply-side model, where agencies pick data, and put it out. We're seeing a move to a more demand-driven model. As companies recognize value, they demand better quality and standards."
Given his career of consumer advocacy, however, the potential of open data to empower regular people animated Gurin the most when we spoke.
"We're moving into a reboot of the consumer movement," he said. "I spent 15 years at Consumer Reports. I loved the organization. We focused on putting trustworthy information online. The initial smart disclosure company was Consumer Reports. It evolved back in the 1930s, when data didn't exist. It was an expert-driven model. We spent $20 million dollars a year on lab testing when I was there. There will be cases where that's important in the future, where people want expert judgement. As more open data became available, however, we did definitely use it. In the same way we have shifted from thinking about expert reviews to social media, I think we'll see an analogous shift from expert opinion to a combination of government data, social media, and consumer comments."
Gurin specifically pointed to the open data releases of the Consumer Financial Protection Bureau as very interesting.
"As a new agency, they don't have legacy culture or constraints. They're using technology well. Their use of consumer complaints are a fascinating phenomenon. While they come from consumers, there is a government agency figuring out how to validate and release them. This "skeletons in the database" paper can level public accountability, and can change the balance of power between consumers, service providers, and manufacturers."
Gurin, however, would ask the public, media, and government to think about open data in the big picture.
"This really can have tremendous impact on science, society, technology, the economy, and consumer well-being," he said. "Smart disclosure is a big part of it being relevant. It has tremendous applications for agencies. Just think about how you could calculate ROI for a particular college, major, and institution based upon whatever data you can collect. It's a completely different way to think about healthcare, in finding the right doctor. When you look at the Climate Corporation, they set out to sell weather insurance, not focus on agriculture. Now, they're talking about adaptation to climate change globally."