Big Data has become a daunting proposition. The technology is fairly complex, and many of the leading analysis applications undergo almost weekly updates and are hardly enterprise stalwarts. Even if the technology were fairly simple, the salaries commanded by people who can actually perform the complex analytics required to glean useful information from these data are climbing into the US$300K range, and that’s assuming you can even find someone with the right credentials looking for a job.
Should you get the technology and people in place, the other major challenge in most companies is that their data are a mess, to state the situation kindly. In the best case, a company might have a reasonably well-maintained data warehouse that’s current and properly managed to eliminate duplicate and conflicting data. The more likely scenario, however, is the company with business-critical data stashed across a wide variety of systems and databases, with the interrelationships among all these different sources unknown, and multiple versions of “the truth” existing depending on your role or the application being examined.
While this situation is far from optimal, the good news with Big Data is that it need not be an all-or-nothing proposition. Data are eminently portable, allowing an IT leader and his or her business counterparts to dip their toes in the Big Data waters without building a complex and costly internal capability. Throw an RFP into the vendor-infested waters, and you’ll likely have potential partners lined up at your door, each claiming to be able to rapidly provide some sort of answer from the mass of data you propose to send them. Before jumping to vendor selection, try the following:
Create a hypothesis
While one of the promises of Big Data is that it can provide answers to questions you haven’t even asked, this is a scenario generally reserved for companies that already have a deep analytic competency. Sony in its prime had a deeper understanding of consumers than most, allowing it to conceptualize the Walkman. For the average company, however, it’s easier to examine what you already know about your market and customer base and create a series of hypotheses, in effect questions, that Big Data can validate and expand upon. If I’m in the movie business, I might hypothesize that consumers are no longer willing to pay for movies on a physical disc. A Big Data analysis might examine sales trends and macro data, validate or refute that claim, and provide some insight as to when such a shift might occur.
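To make the movie-disc hypothesis concrete, here is a minimal sketch of how a declining sales trend could be checked and roughly projected forward. It fits a straight line to yearly unit sales and estimates when that line would reach zero. The sales figures and the names `fit_trend` and `projected_zero_year` are invented for illustration, and a real analysis would use far richer data and models than a single straight line.

```python
def fit_trend(years, units):
    """Fit an ordinary least-squares line through (year, units) points."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(units) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, units))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def projected_zero_year(slope, intercept):
    """Return the year the fitted line reaches zero, or None if sales are not falling."""
    if slope >= 0:
        return None
    return -intercept / slope


# Hypothetical figures, invented purely for illustration (not real sales data).
years = [2008, 2009, 2010, 2011, 2012]
disc_units_millions = [100.0, 90.0, 80.0, 70.0, 60.0]

slope, intercept = fit_trend(years, disc_units_millions)
zero_year = projected_zero_year(slope, intercept)
print(f"Change per year: {slope:.1f}M units; projected zero year: {zero_year}")
```

A negative slope supports the hypothesis that disc sales are falling, and the projected year is only as trustworthy as the assumption that the decline stays linear.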
Examine your data assets
With a list of hypotheses in hand, examine the data you already own. While Big Data is useful with general and broad public data sets, it’s most strategic when leveraging data that are unique to your organization. Most companies have years of data, ranging from sales to customer demographics, that can be mined to test your hypotheses.
If you’re in a high-volume, transactional business, Big Data may be relevant in identifying early trends in your business in near real-time. From detecting potential security breaches to identifying sales trends on an hourly basis, existing transactional data feeds might be the best first step in leveraging Big Data in your business.
In either case, your data will likely need some cleansing. Look for data sets that will validate a few of your hypotheses, while leveraging data that are relatively clean and readily accessible. While your first foray into Big Data may not answer your most burning questions, strive for success with your initial Big Data efforts, rather than perfection.
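One rough way to judge which data sets are "relatively clean" is to count duplicate and incomplete records before committing to one. The sketch below does this for a small list of records. The field names, the sample customers and the `quality_report` helper are all hypothetical, and real cleansing work would also have to deal with conflicting values and near-duplicates.

```python
def quality_report(records, key_field, required_fields):
    """Count duplicate keys and rows with missing required values."""
    seen = set()
    duplicates = 0
    incomplete = 0
    for row in records:
        key = row.get(key_field)
        if key in seen:
            duplicates += 1  # same key already encountered earlier
        else:
            seen.add(key)
        # A row is incomplete if any required field is absent, None, or empty.
        if any(row.get(field) in (None, "") for field in required_fields):
            incomplete += 1
    return {
        "rows": len(records),
        "duplicate_rows": duplicates,
        "incomplete_rows": incomplete,
    }


# Hypothetical sample records, invented purely for illustration.
customers = [
    {"customer_id": 1, "email": "a@example.com", "region": "East"},
    {"customer_id": 2, "email": "", "region": "West"},
    {"customer_id": 1, "email": "a@example.com", "region": "East"},
    {"customer_id": 3, "email": "c@example.com", "region": None},
]

report = quality_report(customers, "customer_id", ["email", "region"])
print(report)
```

Running the same report across several candidate data sets gives a simple, comparable measure of which one needs the least cleansing for a first project.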
Grab your partner
With relevant data in hand, mapped to a hypothesis or two, take a look at the many consulting shops now providing Big Data services. While historically the domain of the big guys, these types of services are moving down market. Usually, an initial effort will involve providing the vendor with a data set and your proposed hypothesis and having their data scientists create an analytical model to test the hypothesis. With the model in place, you can continue performing ongoing analytical efforts with the provider, or eventually build the tools in-house while getting your models elsewhere. For all but the most data-driven companies, this is an approach that allows you to experiment with Big Data without a major investment in building an internal capability.
As your organization matures, expand your data cleansing net, allowing more of your company’s data assets to become available for Big Data analytics. With these structures in place, you can rapidly march toward the level of analytical capability that is right for your organization.