For many IT organizations, data storage is an afterthought and not a strategic concern. However, when it comes to big data management, storage should occupy center stage.
Unstructured data documents key events in images and video, captures paper documents in free-form digital format, and reports on company operations through sensors and other Internet of Things (IoT) devices. Yet a 2020 NewVantage survey of C-level executives revealed that only 37.8% of companies surveyed felt they had created a data-driven culture, and over half (54.9%) felt they could not compete with other companies on data and analytics.
SEE: Snowflake data warehouse platform: A cheat sheet (free PDF) (TechRepublic)
"About 43% of all data that organizations capture goes unutilized, representing enormous untapped value in regard to unstructured data. The importance of understanding, integrating and exploiting that unstructured data is critical to business efficiency and growth. Unstructured data serves little purpose unless it is put to good use," said Jeff Fochtman, senior VP of marketing at Seagate, which provides S3 storage-as-a-service. Fochtman was describing the challenge of managing unstructured big data, which, he said, represented 90% of all data worldwide in 2020, according to research conducted by IDC.
A major issue is data management. To get on top of data management, companies need data architectures, tools, processing and expertise, but they also need to think through their big data storage strategy.
To do this, unstructured data must first be catalogued and analyzed, but the cost burden often prevents companies from performing these processing-intensive operations, which require large data centers and cloud architectures built on very high-capacity, hard-drive-powered storage systems. Second, once the data is processed, it must be replicable and repurposable so it can be distributed to the many departments and sites throughout an enterprise, each of which needs different types of data.
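As a minimal sketch of what that first cataloging step might look like (the file types, source labels and fields here are hypothetical, not any vendor's schema), a catalog can start by simply recording each asset's origin, format and size so later analysis jobs can find it:

```python
import os
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    path: str
    fmt: str          # file format derived from the extension
    size_bytes: int
    source: str       # hypothetical origin label, e.g. "iot-sensor", "scanned-doc"

def catalog_file(path: str, source: str, size_bytes: int) -> CatalogEntry:
    """Build a minimal catalog record from a file path and known metadata."""
    fmt = os.path.splitext(path)[1].lstrip(".").lower() or "unknown"
    return CatalogEntry(path=path, fmt=fmt, size_bytes=size_bytes, source=source)

# Example: a vibration log captured by a plant-floor sensor
entry = catalog_file("plant7/pump_vibration.csv", "iot-sensor", 1_048_576)
```

In practice a real catalog would also record checksums, capture timestamps and retention class, but even this skeleton makes unstructured assets searchable by format and source.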
"The need to access unstructured data near its source and to move it, as needed, to a variety of private and public cloud data centers to be used for different purposes, is driving the shift from closed, proprietary and siloed IT architectures to open, hybrid models," Fochtman said.
SEE: Bridging the gap between data analysts and the finance department (TechRepublic)
In these hybrid models, data storage must be orchestrated so that different types of data are stored at different points in the enterprise. For instance, IoT data that in real time tracks operational effectiveness might be stored on a server at a manufacturing plant at the edge of the enterprise, whereas data that is stored for compliance and intellectual property reasons might be stored on premises in the corporate data center.
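The orchestration described above amounts to a placement policy: each class of data maps to a storage location. A minimal sketch, assuming three invented classification labels (none of these names come from Seagate or the article):

```python
def storage_target(data_class: str) -> str:
    """Map a data classification to a storage location (hypothetical policy)."""
    policy = {
        "realtime-iot": "edge-server",        # low-latency operational telemetry
        "compliance": "on-prem-datacenter",   # regulated or IP-sensitive records
        "analytics": "public-cloud",          # bulk processing and repurposing
    }
    # Default unclassified data to elastic cloud capacity
    return policy.get(data_class, "public-cloud")
```

The design point is that the policy is data-driven, not hard-coded into applications, so it can change as the hybrid architecture evolves.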
Because unstructured data is, by definition, unstructured, it must be tagged for meaning and purpose before subsets of it can be disseminated to the different points of the enterprise that have varying needs to know.
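Once assets carry tags, carving out a subset for a given audience becomes a simple filter. A sketch under assumed tag names (the `dept` and `purpose` keys and the sample asset IDs are illustrative, not from any real catalog):

```python
from typing import Dict, List

# Hypothetical tagged assets: each carries key/value tags assigned at ingest.
assets = [
    {"id": "img-001", "tags": {"dept": "marketing", "purpose": "event-photo"}},
    {"id": "doc-014", "tags": {"dept": "legal", "purpose": "scanned-contract"}},
    {"id": "iot-772", "tags": {"dept": "operations", "purpose": "sensor-log"}},
]

def subset_for(dept: str, assets: List[Dict]) -> List[str]:
    """Return the asset IDs a given department is entitled to receive."""
    return [a["id"] for a in assets if a["tags"].get("dept") == dept]
```

Object stores expose the same idea natively; for example, Amazon S3 supports per-object tags that access policies and lifecycle rules can filter on.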
The magnitude of data storage, cataloging, security and dissemination operations is daunting. It is pushing more enterprises toward cloud-based storage that can be procured as needed, without cost-prohibitive upgrades to corporate data centers with high-capacity storage drives.
"Every industry handling mass data sets from 100TB to multiple petabytes faces data transport and analysis challenges," Fochtman said. "For instance, consider the healthcare industry. The 100TB+ of data the industry collects is integral to protecting and treating the mental and physical health of communities. Hidden within the raw format of those massive data sets may be correlations between illnesses we may not otherwise understand, a more accurate analysis of cancer data or other learnings that could save lives. But with such quantities of unstructured data, what's the first step to derive value from this data? Often, it's putting that data in motion."
SEE: How to effectively manage cold storage big data (TechRepublic)
This makes sense when you want to derive maximum value from your big data, which every company wants to do. It also brings the conversation back to storage, which is too often left off IT strategic planning agendas.
Instead, the strategic focus should be on cost-agile and data-agile storage that can be expanded (or reduced) as needed. Cloud-based storage is best suited to this task, with a more circumscribed role for on-prem data centers, which would focus on retaining highly sensitive data for corporate compliance and intellectual property.
Attention should also be placed on how the data under management is distributed.
"We live in a data-driven world," Fochtman said. "Successful enterprises realize that if their mass data sets cannot move in an agile, cost-effective manner and if the data cannot be easily accessed, business value suffers."
- Geospatial data is being used to help track pandemics and emergencies
- Akamai boosts traffic by 350% but keeps energy use flat thanks to edge computing
- How to become a data scientist: A cheat sheet (TechRepublic)
- Top 5 programming languages data admins should know (free PDF) (TechRepublic download)
- Data Encryption Policy (TechRepublic Premium)
- Big data: More must-read coverage (TechRepublic on Flipboard)