Not long ago, an online UK retailer ran some unconventional analytics on its Internet data and discovered that the wives of devoted soccer fans went on online shopping sprees while their husbands were away at the games.
This isn’t your “average” consumer demographic, but the finding uncovered hidden connections between consumer buying behavior and seemingly unrelated events, and it gave the company a significant competitive edge.
This isn’t an isolated occurrence, either. More and more companies are evolving (or will soon evolve) beyond the first stage of Big Data and transactional analytics. They are sharpening their marketing staffs’ insights beyond first-tier consumer demographics (e.g., where consumers live, how old they are, whether they are male or female). They want to know the other elements that can trigger consumer behavior, even something as unlikely as what consumers do while soccer games are being played.
If IT is to support these deeper-rooted analytics, in which many factors are related to one another to extract meaning, it must stay on top of tools in the marketplace that can position the Big Data amassing in its databases so this data can be queried in new and innovative ways.
To date, Apache Hadoop has become a de facto standard for sorting through big, unstructured data. Hadoop is capable of dishing out multiple threads of Big Data to parallel processors on analytics servers and crunching through this data quickly. But one thing Hadoop does less well is capture the connective relationships between the pieces of data it is processing. In other words, if your system uses Hadoop alone, it could take some time to come up with the conclusion that soccer fans’ wives in the UK go on online shopping binges while their husbands are at the game.
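To see why, it helps to recall the shape of the map/reduce pattern that Hadoop parallelizes: it is very good at independent, key-based aggregation, not at following relationships between records. Here is a minimal, single-process Python sketch of that pattern (this simulates the idea only; it is not Hadoop itself, and the purchase records are invented for illustration):

```python
from collections import defaultdict

# Hypothetical purchase-log records: (customer_id, hour_of_day).
records = [("c1", 15), ("c2", 15), ("c3", 9), ("c4", 15), ("c5", 20)]

# Map phase: emit a (key, 1) pair per record, keyed by hour.
# In Hadoop, these pairs would be produced by many mappers in parallel.
mapped = [(hour, 1) for _, hour in records]

# Shuffle + reduce phase: group pairs by key and sum the counts.
counts = defaultdict(int)
for hour, n in mapped:
    counts[hour] += n

print(dict(counts))  # purchases per hour of day
```

Each record is handled independently, which is what makes the work easy to spread across many servers. Correlating the hour-15 spike with kickoff times, however, is a relationship between datasets, and that is exactly the kind of question Hadoop alone does not answer directly.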
Social media outlets are already using new tools for this type of analysis, and a prime example is Facebook. When I signed on to Facebook the other day, I immediately got an introduction to the site’s new “graph” database tool. The graph tool presented new dimensions of Big Data intelligence to me as a user, such as “Click on this link to find people in Seattle who also like cycling,” or “Click on this link to see which restaurants in London your friends have recently visited.” Indeed, these very precise and highly complex drilldowns into Big Data seem every bit as inter-relational as the earlier example of the soccer fans’ wives going on shopping sprees during the games.
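Conceptually, a query like “friends in Seattle who also like cycling” is a graph traversal: follow friendship edges, then filter on the attributes of the nodes you reach. A toy Python sketch makes the idea concrete (the people and attributes here are invented; Facebook’s actual graph infrastructure is far larger and more elaborate):

```python
# Toy social graph: each node (person) carries attributes,
# and a set of friendship edges connects "me" to other nodes.
people = {
    "alice": {"city": "Seattle", "likes": {"cycling", "coffee"}},
    "bob":   {"city": "Seattle", "likes": {"hiking"}},
    "carol": {"city": "London",  "likes": {"cycling"}},
}
my_friends = {"alice", "bob", "carol"}

# "Friends in Seattle who also like cycling":
# traverse the friendship edges, filter on node attributes.
matches = [p for p in my_friends
           if people[p]["city"] == "Seattle"
           and "cycling" in people[p]["likes"]]

print(sorted(matches))  # ['alice']
```

The point is that the relationship (who is connected to whom) is a first-class part of the data model, rather than something you must reconstruct after the fact from independent records.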
To provide this more inter-relational analysis of Big Data, Facebook uses HBase, another Apache product. HBase organizes data into tables, each defined with its own unique primary (row) key. Within each table is a series of columns that contain attributes for the table’s row key. For example, if the row key is a person (e.g., “John Smith”), the attributes contained in the columns for that key might be hobbies (e.g., “likes cycling”) or place of residence (e.g., “lives in Seattle”). Where HBase can outperform batch processing with Hadoop alone is in its ability to analyze the connections between various pieces of Big Data, and what they might mean, faster and in a more granular way.
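The row-key-and-columns layout described above can be sketched as a nested dictionary: one entry per row key, mapping column names (HBase qualifies columns as family:qualifier) to attribute values. This is a toy model of HBase’s data model only, not the HBase API, and the rows are invented for illustration:

```python
# Toy model of an HBase-style table: row key -> {column: value}.
# Column names follow HBase's family:qualifier convention.
table = {
    "John Smith": {
        "info:hobby": "likes cycling",
        "info:residence": "lives in Seattle",
    },
    "Jane Doe": {
        "info:hobby": "likes cycling",
        "info:residence": "lives in London",
    },
}

# A relationship-style query: which row keys share the hobby "likes cycling"?
cyclists = [key for key, cols in table.items()
            if cols.get("info:hobby") == "likes cycling"]

print(sorted(cyclists))  # ['Jane Doe', 'John Smith']
```

Because attributes hang directly off the row key, finding every key that shares an attribute value is a straightforward scan-and-filter, which is the kind of granular, relationship-oriented lookup the article describes.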
So what is the takeaway for IT?
Very simply, that as your organization’s Big Data analysis matures, you should also be on the lookout for new software and databases that can address these more sophisticated needs. Hadoop is a great place to start with Big Data, but the process shouldn’t end there.