Big data exploration usually starts at a high level of data abstraction and then gradually plumbs the depths of the data as companies learn more from it.
The approach has worked well and is used in many different types of applications.
For instance, GIS and mapping systems use data to visualize a big-picture map and then zoom in on a specific point or location. As the data analyst drills down to this location, they can look at other related data attached to it, such as the demographics of the people who live there or the number of traffic accidents that have occurred there.
However, there is also a ground-up approach that can unlock the hidden value of big data. This approach starts at the lowest level of the data and works its way up to more sophisticated data structures to deliver insights that are helpful to management and staff.
Here is an example:
"A single pixel display can reveal the visible color of a point, but also the infrared value, which can be used to measure vegetative health," said Layton Hobbs, research and development director and vice president at Woolpert, an architecture, engineering and geospatial solutions firm.
Hobbs is talking about the potential for agriculture and forestry companies to go beyond the basic geospatial data they collect and unlock the information buried in it, such as data on topography, soil, ground cover, plant health, and tree canopies.
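To make the infrared point concrete, here is a minimal sketch of one common way analysts turn a pixel's red and infrared values into a vegetation-health score: the Normalized Difference Vegetation Index (NDVI). Hobbs does not name a specific index, so the choice of NDVI is an assumption, and the pixel values below are hypothetical.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index for a single pixel.

    Returns a value between -1 and 1; higher values generally
    indicate denser, healthier vegetation.
    """
    total = nir + red
    if total == 0:
        # No signal in either band, so report a neutral value.
        return 0.0
    return (nir - red) / total


# Hypothetical 12-bit pixel (each band ranges from 0 to 4095).
pixel = {"red": 600, "green": 900, "blue": 500, "nir": 3000}

score = ndvi(pixel["red"], pixel["nir"])
print(round(score, 3))  # 0.667
```

The same pixel that supplies the visible color also supplies the health score, with no additional data collection.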
"Most geospatial data is created for one specific reason or need, but there is so much more information in geospatial data that is underutilized or not recognized," added Woolpert's associate and geospatial discipline leader, Joe Cantz. "Particularly with the newer technologies, the data-rich information is growing exponentially, but we are using only a small percentage at this point."
According to Woolpert officials, geospatial data pixels are capable of storing a much wider range of values than the traditional 256 values of an 8-bit image. "These modern systems often store four bands of data (red, green, blue and infrared) at up to 12 bits or around 4,000 values for each band," said Hobbs. "Combining those four bands for image interpretation creates 256 trillion possible combinations at one spatial location! This is definitely overkill for most applications but shows the potential for big-data applications of imagery."
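The arithmetic behind that figure is easy to check. The short sketch below assumes exactly four bands at 12 bits each; it shows that the 256 trillion number follows from rounding each band to 4,000 values, while the exact count for 12 bits per band is somewhat higher.

```python
BANDS = 4           # red, green, blue, infrared
BITS_PER_BAND = 12

values_per_band = 2 ** BITS_PER_BAND          # exact values per 12-bit band
exact_combinations = values_per_band ** BANDS # same as 2 ** 48
rounded_combinations = 4000 ** BANDS          # using "around 4,000" per band

print(values_per_band)                 # 4096
print(f"{exact_combinations:,}")       # 281,474,976,710,656
print(f"{rounded_combinations:,}")     # 256,000,000,000,000
```

Either way, a single spatial location can take on hundreds of trillions of distinct four-band values, compared with 256 for one 8-bit band.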
Why does this matter for company big data projects?
IoT data, such as data captured and emitted by sensors, immediately comes to mind.
With IoT, you can start with your own top-down big data initiatives and analytics for the data and imagery sent from sensors on board drones. But what if you looked into each individual pixel the drone was sending back and discovered additional data that could answer questions you aren't asking today but might ask in the future?
Here's how you can optimize data for both current and future use:
Analyze what is possible to extract from a given unit of data (e.g., a pixel), even though you may not care about all of this information today.
This is easy to do. To use Layton Hobbs' example, maybe you don't care about the health of the forest floor today, but if you one day want to restore the forest after a harvest, understanding something about its health will help. At that point, knowing everything you can obtain from the big data you manage becomes significant.
Catalog the information capture that is possible at the lowest unit of big data.
If you are dealing with a pixel and you know that forest health and topography can be analyzed from this data, make a record of it. A written record makes it much easier to remember the information potential of your data and to activate it if and when you need to.
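A catalog like this does not need to be elaborate. The sketch below is one possible shape for it, not a prescribed design: the class names, the capability names, and the band lists are all hypothetical and chosen only to mirror the forestry example.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    """One kind of information that can be derived from a data unit."""
    name: str
    inputs: tuple          # hypothetical list of bands or attributes needed
    in_use: bool = False   # True if the company analyzes this today


@dataclass
class DataUnitCatalog:
    """Records what a lowest-level unit of data (e.g., a pixel) can yield."""
    unit: str
    capabilities: list = field(default_factory=list)

    def register(self, name, inputs, in_use=False):
        self.capabilities.append(Capability(name, tuple(inputs), in_use))

    def dormant(self):
        """Names of capabilities that are recorded but not yet used."""
        return [c.name for c in self.capabilities if not c.in_use]


catalog = DataUnitCatalog(unit="pixel")
catalog.register("timber stand mapping", ["red", "green", "blue"], in_use=True)
catalog.register("forest health", ["red", "infrared"])
catalog.register("topography", ["elevation"])

print(catalog.dormant())  # ['forest health', 'topography']
```

Listing the dormant entries gives analysts a quick answer when management asks what else the existing data could tell them.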
Don't lose yourself in the details
While it is important to catalog the information potential of your big data at its lowest level, it is equally important not to lose yourself in the details. If your job today is simply to map a forest and identify stands of harvestable timber, stick with that. Don't get off course with other data explorations that aren't relevant to the task at hand.
Anticipating lessons learned
When I was running a marketing department for a bank, we used demographics for one of our checking campaigns by identifying persons in certain locations by age group, and then linking checking products to the various life cycle stages that customers were in. Later, we wanted to improve results, and we added occupation as well as age for targeting our checking products.
This is a common scenario for companies. They want to go back to the data to see if they can add more information so they can improve results.
By assessing and cataloging the potential information yield of big data at its lowest level, data analysts are positioned to open up the data to more comprehensive analytics that can answer the questions the company will want to ask next.
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.