Election Tech: How to make yourself a DIY data scientist

TechRepublic has used big data and social media as a lens to understand the 2016 US presidential race. You can use our simple method to gain deep insights from social media for your business.

Image: iStock/shironosov

"Campaigns are all small businesses," said author Jonathan Tasini, surrogate for the Bernie Sanders campaign in New Hampshire. "Social media data helps us better understand the issues voters care about, and how to best use limited campaign resources."

This campaign cycle, TechRepublic is covering the relationship between social media, big data, and political campaigns. There are a number of parallels between campaigns and startups, and small businesses can learn a lot by watching how campaigns gather and analyze social media data to fine-tune messaging, react to the competition, and better understand issues.

Each day TechRepublic tracks several data points related to increasing and decreasing interest in candidate Twitter accounts through the campaign cycle. Our goal is to determine if, in fact, there is a relationship between Twitter activity and real-world results.

Twitter is our initial data source because the network is widely used by campaigns and the media. Over the course of the campaign, we hope to add a variety of data points and social networks to our analysis.

Each day at around 11:00 pm Eastern Standard Time, we pull publicly available data from presidential candidate Twitter accounts through the public API and log it in an Excel spreadsheet. We rely on public data such as new followers, account follow-back ratio, relative growth, and the text of Tweets because this information is not proprietary and is available to everyone.

TechRepublic Election Tech Chart | Image: William Stodden/Excel

We use Excel to log our daily snapshots. We keep a master log of Twitter data, as well as individual sheets for debates, elections, conferences, and speeches as needed. This allows us to monitor historical trends and zoom in on specific events.

Our method is simple and direct. In each row of our daily tracking sheet, we list the candidate name, followed by our priority data points. Through the primaries, we track the number of Tweets, the number of Followers, the Following-to-Follower ratio, and the number of times each account has been added to a list. We add a new column each day. The only difference in our event tracking sheet is that we log by hour rather than by day.
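For readers who would rather script the log than type it into a spreadsheet, here is a minimal Python sketch of the same layout: one row per candidate, one new column per day. The candidate names and follower counts are hypothetical, the function names are ours, and we assume the Following-to-Follower ratio means accounts followed divided by followers.

```python
import csv
import io


def following_to_follower_ratio(following, followers):
    """Assumed definition: accounts followed divided by followers.

    Returns None when the account has no followers.
    """
    if followers == 0:
        return None
    return following / followers


def add_daily_column(sheet, date_label, snapshot):
    """Return a copy of the sheet with one more column for the new day.

    sheet: list of rows; the first row is the header
           ["candidate", "day 1", "day 2", ...].
    snapshot: dict mapping candidate name -> follower count for that day.
    Candidates missing from the snapshot get an empty cell.
    """
    header, *rows = sheet
    new_sheet = [header + [date_label]]
    for row in rows:
        name = row[0]
        new_sheet.append(row + [snapshot.get(name, "")])
    return new_sheet


# Hypothetical data, for illustration only.
sheet = [
    ["candidate", "day 1"],
    ["Candidate A", 1000],
    ["Candidate B", 4000],
]
sheet = add_daily_column(sheet, "day 2", {"Candidate A": 1050, "Candidate B": 4100})

# Write the sheet as CSV text, which Excel, Google Sheets, and Numbers can open.
buffer = io.StringIO()
csv.writer(buffer).writerows(sheet)
print(buffer.getvalue())
```

The same function works for an hourly event sheet if you pass an hour label instead of a date.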

By logging information in this way, each day we can calculate nominal growth (the raw number of additional followers) and relative growth (the percentage increase in an account's followers).
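The two growth figures are simple arithmetic on consecutive snapshots. The sketch below shows one way to compute them in Python; the follower counts are made up, and the function names are ours rather than part of any tool.

```python
def nominal_growth(previous, current):
    """Raw number of followers gained (or lost) between two snapshots."""
    return current - previous


def relative_growth(previous, current):
    """Percentage change between two snapshots.

    Returns None when the previous count is zero, because a percentage
    increase from zero is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) * 100 / previous


# Hypothetical daily follower counts for one account.
daily_followers = [20000, 20500, 21115]

for previous, current in zip(daily_followers, daily_followers[1:]):
    gained = nominal_growth(previous, current)
    percent = relative_growth(previous, current)
    print(f"{previous} -> {current}: {gained:+d} followers, {percent:.2f}% growth")
```

Relative growth is what makes accounts of very different sizes comparable: a gain of 500 followers is 2.5% for an account with 20,000 followers but only 0.05% for one with a million.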

This information, logged over time, allows us to use Excel's built-in graphing tool to generate charts that present the data visually.

We do not yet have a theoretical understanding of how real-world events correlate with social media data, but charts generated from historical data produce data-driven insights that can help us form theories and speculate more effectively.

Our simple process can also be reproduced using Google Sheets and Apple Numbers. Additionally, a number of powerful, professional-grade tools like Stata, Tableau (Tech Pro Research review), Apache Spark, Informatica, and the R project are available for professional and enterprise users. These tools perform powerful visualization tasks and can work with more diverse data sets.

Third-party sites can be useful for acquiring large and specific data sets. A mountain of social media metadata is available from sites like Keyhole.co, Datasift, and Gnip. Sites like TweetStats.com, TwitterCounter.com, and Foller.me can be useful for aggregating social media account data for free or at low cost. As with all third-party sites, we strongly advise you to examine the privacy policy on each site before diving in.

Our method, however, can be applied easily by gathering information manually and directly from Twitter, Facebook, Instagram, and other social sites. If you're consistent about your data capture routine, this is the simplest method of building an information library and producing insightful charts.

If you are running a business, for example, you could apply the same kind of analysis we're doing with presidential candidates to your competitors. You could track their progress on social media and compare it to yours. You could also track hashtags, product names, product keywords, and industry jargon to detect changes in customer demand over time.
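As a rough illustration of keyword tracking, the Python sketch below counts how often chosen hashtags or product terms appear in posts you have collected, grouped by day. The posts, dates, and the "widget" product name are hypothetical, and the sketch assumes you already have the post text in hand; it handles single-word terms only and counts "#widget" and "widget" as separate terms.

```python
import re
from collections import Counter


def count_terms_by_day(posts, terms):
    """Count tracked terms per day.

    posts: iterable of (day, text) pairs.
    terms: single-word terms or hashtags to track (case-insensitive).
    Returns a dict mapping each day to a Counter of term -> mentions.
    """
    wanted = {term.lower() for term in terms}
    counts = {}
    for day, text in posts:
        # Split the text into words, keeping a leading "#" on hashtags.
        tokens = re.findall(r"#?\w+", text.lower())
        day_counts = counts.setdefault(day, Counter())
        for token in tokens:
            if token in wanted:
                day_counts[token] += 1
    return counts


# Hypothetical posts, for illustration only.
posts = [
    ("2016-02-01", "Loving the new #widget"),
    ("2016-02-01", "Is the widget worth it?"),
    ("2016-02-02", "#widget #widget everywhere"),
]

for day, day_counts in sorted(count_terms_by_day(posts, ["#widget", "widget"]).items()):
    print(day, dict(day_counts))
```

Logged daily, these counts can go into the same kind of tracking sheet as follower numbers and be charted the same way.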

Over the course of the campaign, we will continue to perform simple data analysis. In the future, we hope to correlate sentiment with follower actions like retweets and likes, and to uncover additional and unique insights. If you're a data scientist, social media professional, or inquisitive TechRepublic reader, we'd love your ideas on how to inspect campaign social media data. Please leave a comment below, or ping us on Twitter @TechRepublic.

About

Dan is a Senior Writer for TechRepublic. He covers cybersecurity and the intersection of technology, politics and government.
