IT departments preside over a mother lode of unstructured data, which has largely been overlooked as a source of insight in the gold rush for big data. Unstructured data includes content such as PowerPoint decks describing company strategy, spreadsheets of sales leads, emails between coworkers, and customers' social media interactions, all of which can hold valuable information.
Knowledge workers spend much of their workday creating and managing PowerPoint presentations, spreadsheets, email, and other unstructured data. That is a large investment of time, and the products of that labor are invaluable. But all too often, the right data doesn't reach the right people at the right time, so insights are missed or work is re-created. The result is lost time and money for organizations.
Mining the value from this data is not easy. Unstructured data can't be sorted, searched, visualized, or analyzed the same way as, say, stock prices; extracting intelligence from it, sharing it, and delivering value with it frequently require new tools and processes. Organizations need a better way to mine this gold, and getting to that value involves four steps.
1: Draw a map to the gold - Identify sources
The first crucial step is to identify the sources of unstructured data that matter to your organization. These typically include file servers, collaboration tools like SharePoint, and even virtual machines run by the IT department. When analyzing and sharing this data, keep in mind that each of these sources may have its own security settings.
2: Create a legend - Add context and automate
Small and midsize organizations in particular need ways to cut the time it takes to answer strategic business questions based on unstructured data. Marketing teams want to know what content, customer, and collaboration trends are evident in their social channels. Security and compliance leaders need faster discovery methods. Healthcare professionals want to improve patient care by learning which workflows are effective in other parts of their organizations. Lawyers want to connect events to piece together complete pictures of past activities. In short, business users need all the help they can get to find the information they are looking for quickly.
Context and metadata (i.e., information about the data that describes who created it, what it's about, and what other documents it references) are key to this process. They let you cross data silos and give stakeholders cohesive, contextual, and complete answers. This context can't be generated manually at any significant scale; instead, organizations need to track or calculate it automatically and in real time, without overwhelming file servers, blowing up storage budgets, or depending on expert data scientists.
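To make "automatically generating context" concrete, here is a minimal sketch that derives the three kinds of metadata named above (who, what about, which other documents) for one document. It assumes the document's text and author are already available; the stop-word list, the rule that references look like office file names, and all function names are hypothetical. Topic detection here is plain word-frequency counting, a deliberately simple stand-in for the richer classification a production tool would use.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical list of filler words to ignore when guessing a document's topic.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on",
             "is", "are", "with", "this", "that", "see"}

# Assumed convention: references to other documents appear as file names
# with common office extensions, e.g. "Q3_plan.pptx".
REFERENCE_PATTERN = re.compile(r"\b[\w-]+\.(?:pptx|xlsx|docx|pdf)\b", re.IGNORECASE)


@dataclass
class DocumentMetadata:
    path: str
    author: str                                            # who created it
    topics: list[str]                                      # what it's about
    references: list[str] = field(default_factory=list)   # what it references


def extract_metadata(path: str, author: str, text: str, top_n: int = 3) -> DocumentMetadata:
    """Derive simple context for one document from its text."""
    references = sorted(set(REFERENCE_PATTERN.findall(text)))
    # Remove file names first so extensions like "pptx" don't count as topics.
    body = REFERENCE_PATTERN.sub(" ", text).lower()
    words = re.findall(r"[a-z]+", body)
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    topics = [word for word, _ in counts.most_common(top_n)]
    return DocumentMetadata(path=path, author=author, topics=topics, references=references)


def build_reference_index(docs: list[DocumentMetadata]) -> dict[str, list[str]]:
    """Invert references so each file lists the documents that mention it."""
    index: dict[str, list[str]] = {}
    for doc in docs:
        for ref in doc.references:
            index.setdefault(ref, []).append(doc.path)
    return index
```

The inverted index is what lets a user cross silos: starting from a spreadsheet of leads, they can see every deck or note that points to it, regardless of which server holds those files.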
3: Use your mining equipment - Visualize data
Any solution for unstructured data analysis will need to incorporate visualization if users are to have any hope of deriving useful intelligence from their data. Structured data analysis tools (the mining equipment in this analogy) have done this well for some time. Users get an executive-level view of information, and then they can drill down through progressively finer levels of detail. Similar options are becoming available for unstructured data analysis as well. With visualization tools, users can easily spot anomalies and find the information they need to respond quickly and accurately.
Visualization technology designed for unstructured data analysis must be able to synthesize information from multiple sources and deliver the results in a unified view. To be more valuable to end users, such tools should also be able to overlay those results on structured data. That capability could, for example, allow an analyst to evaluate social media messages, news stories, and other unstructured information from a given time period and then relate them to structured information about something quantifiable, such as stock price fluctuations.
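The overlay described above boils down to aligning unstructured signals and structured values on a shared key, here the calendar day. The sketch below counts messages mentioning a keyword per day and lines them up with daily closing prices and day-over-day price changes, producing rows a chart could plot on two axes. The data shapes, function names, and simple substring match are all hypothetical; a real pipeline would pull messages from social or news feeds and prices from a market-data source, and would use far better text matching.

```python
from collections import Counter
from datetime import date
from typing import Optional


def daily_mentions(messages: list[tuple[date, str]], keyword: str) -> Counter:
    """Count messages per day that mention the keyword (case-insensitive substring match)."""
    keyword = keyword.lower()
    return Counter(day for day, text in messages if keyword in text.lower())


def overlay(messages: list[tuple[date, str]],
            prices: dict[date, float],
            keyword: str) -> list[tuple[date, int, float, Optional[float]]]:
    """Align mention counts with closing prices, one row per trading day.

    Each row is (day, mentions, close, change from previous close);
    the first day has no previous close, so its change is None.
    """
    mentions = daily_mentions(messages, keyword)
    rows = []
    previous_close: Optional[float] = None
    for day in sorted(prices):
        close = prices[day]
        change = None if previous_close is None else close - previous_close
        rows.append((day, mentions.get(day, 0), close, change))
        previous_close = close
    return rows
```

Keying on trading days from the price series means messages posted on days without a price (weekends, holidays) are left out; a real tool might roll them forward into the next trading day instead.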
4: Enable profit - Act on the intelligence
IDC estimates that information workers now burn about 20 percent of their time on the job just tracking down data. It's easy to see why. In most companies, data is created and accessed ad hoc, and it isn't organized to encourage accessibility. Without clearly defined schemas for the vast majority of the data companies are generating, end users can't massage it, visualize it, or manipulate it in any meaningful way.
Buying server clusters or hiring data scientists is not a realistic option for most companies. Instead, organizations need solutions that can automatically analyze this unstructured data and present it visually for more effective analysis. Once they have those tools, organizations can begin to act on these insights and profit from their valuable stores of data and the intelligence previously buried within them.
Steve Kearns is the director of product management for DataGravity, focused on defining and delivering data intelligence. He has spoken at conferences around the world about the power of search and analytics and has worked with many of the world's most successful companies and government agencies implementing these technologies.