Mary Shacklett offers tips on making the transition from tracking traditional transaction data to handling the unstructured, rich data formats that come in from the web.
Research cited by Red Hat predicts that by 2015, global Internet traffic alone will reach the zettabyte threshold, a four-fold increase in data under enterprise management from 2010. Research firm IDC reports that the volume of data under management in enterprises is doubling every 18 months.
Despite this, few enterprises regard storage, and how data is stored, as strategic areas of concern. Unlike IT networks, applications and databases, storage has almost no certifications offered by private or public institutions.
This is the result of decades of IT thinking that has viewed storage as a "commodity decision area." Storage professionals are seldom invited to IT planning sessions. The reason? Hard disks have always been inexpensive, not even eliciting a blink in the budget when they are ordered up like so many flapjacks.
But with the growth of Web-facing applications and data, the look of enterprise data is changing: from the concise, structured record formats typical of traditional transaction data to the unstructured, rich data formats that come in from the Web. Now add to this the fact that almost every business is asking its IT department to add business analytics to its workloads.
What to do?
Define your analytics/Big Data business cases
If you are a CIO, you need to quickly get your finger on the pulse of what the business needs the analytics for. Is it to predict consumer demand for products? Or to preempt operational issues by detecting trends (say, in shipping or the supply chain) before they impact production? Once these needs are defined, it is possible to "slim down" the data that will be needed for the analytics.
Ask yourself if your data is ready to go
Business analytics that keys off present and historical transaction data is easy to work with, but the picture gets murky if the business intelligence you seek comes from unstructured big data that is not stored or organized in any particular way. This data must ultimately be mined for analytics, and before it is ready for that, it must be "cleaned" so that duplicate or erroneous data is eliminated. The effort can require massive housekeeping that, even with data deduplication tools, takes time while the business waits.
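To make "cleaning" a little more concrete, here is a minimal Python sketch of the two chores named above: removing erroneous records and removing duplicates. It assumes records arrive as dictionaries with hypothetical `customer_id`, `email` and `amount` fields, treats a record as erroneous if a field is missing or the amount is non-numeric or negative (an assumed rule), and deduplicates on an assumed `(customer_id, email)` key after trimming and lowercasing. Commercial deduplication tools typically go much further, for example by matching near-duplicates rather than exact ones.

```python
from typing import Iterable

# Hypothetical record layout, for illustration only: real web data will differ.
REQUIRED_FIELDS = ("customer_id", "email", "amount")


def _text(value) -> str:
    """Convert a value to trimmed text, mapping None to an empty string."""
    return "" if value is None else str(value).strip()


def normalize(record: dict) -> dict:
    """Return a copy with IDs trimmed and emails trimmed and lowercased."""
    cleaned = dict(record)
    cleaned["customer_id"] = _text(record.get("customer_id"))
    cleaned["email"] = _text(record.get("email")).lower()
    return cleaned


def is_valid(record: dict) -> bool:
    """Assumed rule: reject missing fields and non-numeric or negative amounts."""
    if any(record.get(f) in (None, "") for f in REQUIRED_FIELDS):
        return False
    try:
        return float(record["amount"]) >= 0
    except (TypeError, ValueError):
        return False


def clean(records: Iterable[dict]) -> list[dict]:
    """Normalize records, drop erroneous ones, and keep the first copy of each duplicate."""
    seen = set()
    result = []
    for raw in records:
        rec = normalize(raw)
        if not is_valid(rec):
            continue
        key = (rec["customer_id"], rec["email"])  # assumed deduplication key
        if key in seen:
            continue
        seen.add(key)
        result.append(rec)
    return result


if __name__ == "__main__":
    raw = [
        {"customer_id": "42", "email": "Ann@Example.com ", "amount": "19.99"},
        {"customer_id": "42", "email": "ann@example.com", "amount": "19.99"},  # duplicate once normalized
        {"customer_id": "43", "email": "", "amount": "5"},                     # missing email
        {"customer_id": "44", "email": "bo@example.com", "amount": "-3"},      # negative amount
    ]
    print(len(clean(raw)))  # 1
```

Even in this toy form, most of the logic is about deciding what counts as "the same" or "wrong," which is exactly the business judgment that makes real cleanup projects slow.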
Ask yourself if your management team is prepared to wait
Like software and computer conversions, data cleaning projects are massive, tedious and unrewarding. Even when they are finished, dollars have been spent and the business has yet to see any value from the work. This is an area where the CIO must obtain both time and "buy-in" from the business for the cleanup. The CIO must also be in a position to secure the best data deduplication and cleaning tools, and the best consulting, for the job, because you don't want to leave your business folks without an analytics solution for long.
Assess your IT infrastructure for big data readiness
Processing big data analytics in real time or near real time doesn't work on traditional transaction servers. You're going to need different kinds of processors, processing logic and storage approaches. There is an upfront cost for this, and also for getting IT staff trained. If you choose to in-source business analytics, you've got to get support for these investments from the business and the CEO. This requires plain-English explanations of why the resources are needed and what their roles will be in harvesting the company's big data.
Revisit the role of storage professionals
Ultimately, big data needs processing, but it also needs the right storage strategy. IT will be hard-pressed to achieve the latter if it doesn't start inviting storage professionals to IT infrastructure meetings. This is an area where CIOs need to reassess priorities.