Big Data

Why data-driven campaigning matters in the 2014 US elections and beyond

The 2012 Obama campaign took data-driven campaigning to a new level. In the future, more campaigns will leverage data to decide how to influence you to vote.


As is the case with every industry and sector in society, technological changes are having an impact upon the political process. In just over a month, Americans will head to the polls to vote in the midterm elections in the US. Despite historic lows in public approval, with just 14% of the people approving of the actions of what looks like the least productive Congress in history, if reelection trends hold up voters will send over 90% of incumbents back to office, continuing the stagnation. Why? Gerrymandering is one answer. Another is that, while people don't much care for the job that Congress is doing, they generally approve of their own representative. Plus ça change, plus c'est la même chose.

As is the case in every US election cycle where the presidency isn't at stake, voter participation rates are likely to be low. It's possible, however, that those dismal ratings might increase turnout. Combined, those trends could raise the stakes a bit for contested elections, which in turn puts a premium on the ability of campaigns to register new voters who support their candidates, activate electoral bases, and get the voters they care about to the polls. That's where data-driven campaigning matters, and why the combination of mobile devices, analytics, and social media has captured the attention of the smartest consultants around and the imaginations of aspiring politicians, threatened incumbents, and conspiracy theorists alike.

Campaigns and consultants have been using data for decades, of course, from public voter files to commercial data. Profiling voters isn't novel, nor is the use of demographic data. What's new is the explosion of data collection around other activities that can inform those profiles and models of behaviors: what you buy, what you watch, what you "like," what you share, or which people or brands you interact with online. In aggregate, that data can be used to predict any number of things, from which political party you're likely to be in to how you stand on a given issue to how likely you are to be persuaded to shift your vote. That last element can add up to a "persuadability score," and measuring it accurately and acting on it is something an increasing number of campaigns that want to win are paying attention to today.

"You're seeing analytics shops pop up left and right," said Patrick Ruffini, cofounder of Echelon Insights, a well-regarded new political intelligence firm, and chairman of Engage, a political advocacy consultancy, in an interview.

"The question is how many [campaigns] are taking advantage of them. I haven't looked at FEC files recently, but my sense is that most are reliant on national infrastructure from the Republican National Committee (RNC). It's filtering through systems designed, with Data Trust and the RNC. It's seen as a service the RNC will provide, and it's happening to a degree it did not happen before."

These data files and analytics are used to target potential voters online in search or social media and by canvassers offline going door-to-door using mobile devices, tailoring messages to individual voters. As Sasha Issenberg reported in TechReview and in his excellent book, "The Victory Lab," these techniques were how the Obama campaign used data to rally voters in 2012. At this point, the story of the engineers that built the software behind the 2012 Obama campaign has been well-told, but the work of the campaign's data science team was closely held until after the election, despite ProPublica's considerable efforts to report on the operation.

Last year, I was offered an unusual opportunity by my former colleague, Alistair Croll, to interview the Obama campaign's chief data scientist, Rayid Ghani, at the Strata Conference. Ghani, now the chief data scientist for the Urban Center for Computation and Data, was (to me) surprisingly frank and forthcoming about what they did. Our conversation is embedded below. (If you have an appetite for more insight, read Jim Rutenberg's feature on what happened next for other "digital masterminds" in the campaign.)

In 2012, data-driven decision-making enabled the Obama campaign to choose advertising buys, raise money, and model voter movements as the race came down to the wire. As Issenberg detailed in an analysis of the upcoming midterm elections, the more that data from actions can be tracked in experiments, the more efficiently and effectively campaigns can apply their resources:

"Individually targeted tactics like direct mail, phone calls, and canvass visits, however, are easy to test, particularly when measuring methods to register and turn out voters. Over the last 15 years, hundreds of experiments have yielded a clear understanding of which of those methods perform best. The most effective techniques now appear on a frequently updated Analyst Institute best-practices sheet that has become mandatory decor in Democratic field offices.

Accordingly, field operations have been transformed from busywork for volunteers into the most rigorously scientized corner of the trade. All the research suggests that the most effective form of outreach is also the most seemingly old-fashioned: a conversation on a doorstep between a potential voter and a well-trained volunteer. Experiments have even pointed the way toward the best kinds of volunteers; canvassers can be most successful when they're reaching out to non-voters of the same ethnicity or from the same zip code. Productivity has been tabulated, too. Surveying a decade's worth of experimental research for the 2008 edition of their book Get Out the Vote, Yale professors Don Green and Alan Gerber calculated that a typical canvasser can complete six encounters per hour. Assigning a monetary value to that labor makes it possible to put a price on democracy. One new vote mobilized through fieldwork costs $29."
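The arithmetic behind a figure like that can be sketched in a few lines. This is a minimal illustration, not Green and Gerber's actual calculation: the only number taken from the passage above is six contacts per hour, while the hourly cost, the experiment sizes, and the turnout counts are hypothetical placeholders, and the class and function names are invented for this example. It also treats everyone in the canvassed group as having been reached, which a real experiment would not assume.

```python
from dataclasses import dataclass


@dataclass
class ExperimentArm:
    """One arm of a turnout experiment: group size and how many voted."""
    size: int
    voted: int

    @property
    def turnout_rate(self) -> float:
        if self.size <= 0:
            raise ValueError("arm size must be positive")
        return self.voted / self.size


def turnout_lift(contacted: ExperimentArm, control: ExperimentArm) -> float:
    """Difference in turnout rate between contacted and uncontacted voters."""
    return contacted.turnout_rate - control.turnout_rate


def cost_per_vote(hourly_cost: float, contacts_per_hour: float,
                  lift_per_contact: float) -> float:
    """Cost per contact divided by the additional votes each contact yields."""
    if contacts_per_hour <= 0 or lift_per_contact <= 0:
        raise ValueError("contacts_per_hour and lift_per_contact must be positive")
    cost_per_contact = hourly_cost / contacts_per_hour
    return cost_per_contact / lift_per_contact


# Hypothetical experiment: 1,000 voters canvassed, 1,000 left alone.
contacted = ExperimentArm(size=1000, voted=480)
control = ExperimentArm(size=1000, voted=400)
lift = turnout_lift(contacted, control)

# Hypothetical $12/hour labor value; six contacts per hour is the quoted figure.
estimate = cost_per_vote(hourly_cost=12.0, contacts_per_hour=6,
                         lift_per_contact=lift)
print(f"Estimated cost per additional vote: ${estimate:.2f}")
```

The point of the sketch is the structure of the estimate: the control group supplies the baseline, so the lift counts only votes that would not have happened without the doorstep conversation.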

To the extent that campaigns can adopt and adapt this technical infrastructure, these techniques are now spreading around the country and the world, as campaign veterans and consultants bring their expertise to other races and the private sector. At present, the action is taking place primarily within groups that have national scope, where they're trying to duplicate the level of infrastructure, analytics, nightly modeling, and training that the Obama campaign achieved.

"Some larger campaigns are using these tools," said Ruffini, "but as you move down the chain, you'll see less and less use of sophisticated analytics at the local level. The challenge is how you use this effectively, where Congressional campaigns do have a lot of data. It's something I'm working on, so we can get it down to where it can be used by everyone in the space."

What many of those campaigns are still missing is capability and a "megafile" like the one Michael Scherer detailed in the "secrets of the data crunchers," the treasure trove of detailed profiles that helped Obama's campaign win:

"The new megafile didn't just tell the campaign how to find voters and get their attention; it also allowed the number crunchers to run tests predicting which types of people would be persuaded by certain kinds of appeals. Call lists in field offices, for instance, didn't just list names and numbers; they also ranked names in order of their persuadability, with the campaign's most important priorities first. About 75% of the determining factors were basics like age, sex, race, neighborhood and voting record. Consumer data about voters helped round out the picture. "We could [predict] people who were going to give online. We could model people who were going to give through mail. We could model volunteers," said one of the senior advisers about the predictive profiles built by the data. "In the end, modeling became something way bigger for us in '12 than in '08 because it made our time more efficient."

Early on, for example, the campaign discovered that people who had unsubscribed from the 2008 campaign e-mail lists were top targets, among the easiest to pull back into the fold with some personal attention. The strategists fashioned tests for specific demographic groups, trying out message scripts that they could then apply. They tested how much better a call from a local volunteer would do than a call from a volunteer from a non-swing state like California."
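A call list ranked by persuadability, as Scherer describes, amounts to sorting voter records by a model score. The sketch below assumes each voter already has a score between 0 and 1; the `Voter` record, the `rank_call_list` function, and the sample names, numbers, and scores are all hypothetical and say nothing about how the campaign's own models or files were built.

```python
from typing import List, NamedTuple, Optional


class Voter(NamedTuple):
    name: str
    phone: str
    persuadability: float  # hypothetical model score between 0 and 1


def rank_call_list(voters: List[Voter],
                   limit: Optional[int] = None) -> List[Voter]:
    """Return voters ordered from most to least persuadable.

    If limit is given, keep only the top entries, for example the
    number of calls a field office can make in one evening.
    """
    ranked = sorted(voters, key=lambda v: v.persuadability, reverse=True)
    return ranked if limit is None else ranked[:limit]


# Made-up records for illustration only.
voters = [
    Voter("A. Rivera", "555-0101", 0.35),
    Voter("B. Chen", "555-0102", 0.82),
    Voter("C. Okafor", "555-0103", 0.57),
]

for voter in rank_call_list(voters, limit=2):
    print(f"{voter.name}\t{voter.phone}\t{voter.persuadability:.2f}")
```

The sorting is trivial; the hard part, as the rest of this article discusses, is producing scores that actually predict who will change their mind.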

Ruffini's perspective is that the two major parties in the US are close to, if not at, equal terms on data collection.

"There has been an effort to do this, to move the actual data file outside of the RNC so it can talk to third-party organizations," he said. "There's concern about it being fragmented, which you can analogize to NGP VAN and Catalist. There are splits on the Democratic side, too: some firms are more Clinton-aligned, others more Obama-aligned. There are steps that have been taken to improve data sharing, like Data Trust." (For a deep dive into the history of this space, read Issenberg's book or his 2012 column on the RNC and Democratic National Committee getting into data mining.)

"We can basically say that there is going to be parity on data," said Ruffini. "It's relatively easy to collect public data files. This is something Democrats were behind on in 2004, and they caught up and leapfrogged Republicans. Today, we're talking about sharing the same information and data sets. Democrats tackled it in 2010 and 2012. Republicans are doing it now."

As in so many other contexts, however, collecting or gaining access to data and cleaning it is only a first step, if an essential one. Putting it to work is another matter.

"I view the analytics as something that is at this point fairly easy to do," said Ruffini. "I could log onto a system today and launch 2,000 survey calls into a state. With a voter file, I could match up and do modeling. With the modeling scores, I can ship names to an online ad vendor and target ads. This is all getting to the point where more people have the skills to do this."

If that is the case, why isn't every campaign doing this? Cost, culture, and capability remain barriers to entry, although that's changing.

"There's both a cost factor and technical complexity," explained Ruffini, "and people [in the campaign] even understanding what it is they're going to be doing. The first barrier is even explaining what a 'persuadability score' is. Within every organization, there will be people who understand. This has only been introduced in a couple cycles. There isn't much good information about what's happening until the campaign is over, and then everyone jumps on the shiny object."

Candidates or incumbents in 2014 may wish to tap into the same abilities, but for many the capability may remain out of reach for this cycle.

"Not every campaign has those resources," said Ruffini. "The best way is through field experiments, where you measure what's different between those contacted and those who were not. There are various other ways to get at how movable someone may be, through self-reporting, though that's not trustworthy. It can get us 90% of the way there to knowing the likelihood of something to change their mind."

If you fit the profile for a swing voter or an independent in a contested area, you may well already be seeing personalized ads on Google and Facebook.

"It's an interesting frontier," said Ruffini. "It's now very easy to get that data and onboard it onto cookies. Certainly data providers and data platforms are making it very easy to match a voter profile of certain people and certain segments to match ads against people. It's at the point where I can get a highly customized audience of people most likely to change minds and undecideds in a given race. This is not a 'be-all and end-all tool,' though: there are many tools in the arsenal. You can send direct mail, or match universes to TV rating systems to figure out which buys to make."

The 2012 Obama campaign also combined techniques from social science and consumer advertising with data analysis to "win friends and influence people" on Facebook, along with figuring out which voters would be best reached with non-digital outreach like calls or visits.

"It was a really interesting innovation and something we're working on," said Ruffini. "We are doing persuadability at the Congressional level. In terms of Obama's campaign and how they built this, they went door-to-door and then measured the effects."

Ruffini, who was the webmaster of the 2004 Bush-Cheney campaign, has been at the cutting edge of digital politics and advocacy for a long time. (A decade might as well be a century, in internet time.) He explained to me that the use of these tools in campaigns is where the use of the web was three campaign cycles ago. "Data science is now where the web was in 2008," he said.

One thing that neither Ruffini nor Ghani talked about in our discussions (and I didn't bring it up) was the power that technology platforms, and the companies and executives that control them, now have to move voters or votes. On Election Day 2012, for instance, both Google and Facebook chose to help voters find their polling place online, a capability powered by open data supplied by the Voting Information Project. (This year, I bet they'll do it again, hopefully with even more accuracy.) In 2012, Facebook also did something else that led to public policy manager Katie Harbath ending up in the "Politico 50": they included an "I voted" button that political scientists concluded drove more young people to vote, through "social contagion." The button was used more than four million times in India's recent elections and is rolling out around the world.

Increased civic engagement through the world's social network sounds good, but it comes with a catch: a lack of transparency about who the button is shown to and when. That's what led Harvard computer science professor Jonathan Zittrain to warn about the potential for digital gerrymandering earlier this year, where an election is subtly influenced without anyone knowing.

Even if the platforms themselves stay neutral, the possibility that campaigns or third parties, like the "super political action committees" that have emerged in the US, foreign governments, or non-state actors, could engage in engineering the public is no longer the stuff of science fiction. There's already a gap between what we know about elections and the reality. If democratic states are going to be able to detect that kind of effort, more investigative journalists will need to learn some data science and practice data journalism, and campaign and election regulators will need more technical expertise to conduct audits and enforce algorithmic accountability.

About

Alex Howard writes about how shifts in technology are changing government and society. A former fellow at Harvard and Columbia, he is the founder of "E Pluribus Unum," a blog focused on open government and technology.
