Big Data is fraught with challenges: untested and emerging technologies, difficulty finding capable employees and partners, and unclear or unrealistic expectations around what this technology can actually accomplish. On top of these worries, a major problem with Big Data is readying your enterprise applications to supply the data needed for analysis. Before dashing off RFPs for a Big Data initiative or signing purchase orders for fancy new hardware and software, consider the readiness of your enterprise applications.
A major mistake in Big Data efforts is failing to get a handle on the location, structure, and quality of your existing data. At its core, Big Data is an analytical process designed to produce actionable information. Just as knowing the process of baking bread will never result in a tasty loaf without a sack of flour, Big Data will never result in anything actionable without appropriate data.
While it seems trite, the first step in analyzing data is knowing where to find the data. In large companies, there may be tens of thousands of enterprise applications connected in a spaghetti-like web of interfaces, middleware, and data stores. Seemingly simple information like customer or sales data may be scattered across loosely correlated systems, and merely identifying where to find all the attributes of one of these objects may take a project in itself.
Even if you have no immediate plans for Big Data projects, it’s worth locating and rationalizing your data. At a minimum, understand how critical data are stored and processed through your environment, and consider application rationalization or data warehousing projects to streamline and centralize your data.
Finding the truth
If you’ve ever participated in an enterprise application implementation, you’ve probably heard the term “one source of the truth.” A major problem these systems are meant to solve is providing a single location for critical data. Despite that intention, most enterprises hold multiple copies of the same data, each at a different stage of its lifecycle. Engineering might view a product as “released” in its systems, while supply chain systems consider the product long discontinued.
If you’re serious about Big Data and analytics, eliminating these multiple sources of the truth will go a long way toward streamlining data gathering. If you lack confidence in whether the datum you’re considering is “the truth,” how can you have any confidence in an analysis based on that datum?
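As a rough illustration of the released-versus-discontinued problem, the sketch below compares product status across two systems and reports where they disagree. The system names, product IDs, and status values are invented for the example, and real extracts would come from the applications themselves rather than hard-coded dictionaries.

```python
from collections import defaultdict

# Hypothetical extracts from two systems; the system names, product IDs,
# and status values are invented for illustration.
engineering = {"P-100": "released", "P-200": "released", "P-300": "in design"}
supply_chain = {"P-100": "released", "P-200": "discontinued"}


def find_conflicts(sources):
    """Return {key: {system: value}} for keys whose values disagree.

    A key that is missing from a system is not counted as a disagreement.
    """
    seen = defaultdict(dict)
    for system, records in sources.items():
        for key, value in records.items():
            seen[key][system] = value
    return {
        key: by_system
        for key, by_system in seen.items()
        if len(set(by_system.values())) > 1
    }


conflicts = find_conflicts(
    {"engineering": engineering, "supply_chain": supply_chain}
)
for product_id, by_system in sorted(conflicts.items()):
    print(product_id, by_system)
```

Even a crude report like this gives you a starting list of records where someone has to decide which system holds the truth.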
Crawl before you walk
Telling your business counterparts that your organization is simply not ready for analytics is unlikely to be a popular option, so in the short term consider “3-C initiatives” that consolidate, cleanse, and centralize data. These need not be solely data focused; by implementing an enterprise-wide accounting package, for example, you’ll get functional and process improvements and be forced to clean and consolidate your data along the way. Before launching a major Big Data initiative, consider some more traditional enterprise analytics to gauge your organizational readiness for gathering, cleaning, and analyzing a massive trove of data.
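To make the three Cs concrete, here is a minimal sketch of one possible approach, assuming customer records from two systems can be matched on email address. The field names, sample rows, and the "first non-empty value wins" rule are all assumptions made for illustration, not a recommended matching policy.

```python
# Hypothetical customer extracts from two systems; all names and fields
# are invented for illustration.
crm_rows = [
    {"email": " Ann@Example.com ", "name": "Ann Lee", "phone": ""},
    {"email": "bob@example.com", "name": "Bob Ray", "phone": "555-0100"},
]
billing_rows = [
    {"email": "ann@example.com", "name": "", "phone": "555-0199"},
]


def cleanse(row):
    """Trim whitespace and lowercase the email so equivalent records match."""
    cleaned = {field: value.strip() for field, value in row.items()}
    cleaned["email"] = cleaned["email"].lower()
    return cleaned


def consolidate(*extracts):
    """Merge rows sharing an email; the first non-empty value per field wins."""
    merged = {}
    for rows in extracts:
        for row in map(cleanse, rows):
            target = merged.setdefault(row["email"], {})
            for field, value in row.items():
                if value and not target.get(field):
                    target[field] = value
    return merged


# "Centralize": one keyed store that downstream analysis reads from.
central_store = consolidate(crm_rows, billing_rows)
```

Here the two records for Ann collapse into one entry that carries her name from the first extract and her phone number from the second.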
These efforts aren’t as exciting as the evolving world of Big Data, but you’ll be better equipped and more effective in the long run if you’ve prepared your enterprise applications for this new world, rather than attempting to wrangle your data for each and every analytical effort that comes your way.