Cold storage is the term for infrequently accessed but necessary data. Learn how cold storage works, and how it can help with big data.
There has been significant enterprise focus on tiered storage algorithms that route data to the most appropriate storage media based on how frequently the data needs to be accessed. Particular attention has gone to tier one, the fastest storage, which holds data that requires quick and frequent retrieval.
But back in the "data dungeon," where up to 85% of all enterprise data sits in seldom-accessed storage, an equally pressing problem looms: how to manage and maintain this data at the lowest cost, with appropriate storage, retrieval, security, and access policies in place.
The name for this infrequently accessed but nonetheless necessary data is "cold storage." Determining whether data is "hot" (frequently accessed), "warm" (moderately accessed), or "cold" (infrequently accessed) is often the job of a storage administrator who assesses how long it has been since various categories of data have been accessed. In some cases, data centers are even beginning to use automated storage tiering software to make these data storage decisions.
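The hot/warm/cold decision described above can be sketched in a few lines of Python. This is only an illustration: the 30-day and 180-day cutoffs, the function name, and the sample datasets are assumptions made for the example, since the cutoffs an administrator or tiering product actually uses will vary by site.

```python
from datetime import datetime, timedelta

# Hypothetical cutoffs chosen for illustration; real sites set their own.
HOT_MAX_AGE = timedelta(days=30)
WARM_MAX_AGE = timedelta(days=180)


def classify_tier(last_accessed: datetime, now: datetime) -> str:
    """Label data hot, warm, or cold by the time since its last access."""
    age = now - last_accessed
    if age <= HOT_MAX_AGE:
        return "hot"
    if age <= WARM_MAX_AGE:
        return "warm"
    return "cold"


# Made-up dataset names and last-access dates.
now = datetime(2015, 1, 1)
datasets = {
    "orders_current": datetime(2014, 12, 20),
    "orders_last_quarter": datetime(2014, 9, 1),
    "audit_logs_2011": datetime(2012, 3, 15),
}
tiers = {name: classify_tier(ts, now) for name, ts in datasets.items()}
```

Automated tiering software applies the same kind of rule continuously, rather than waiting for an administrator to review access dates by hand.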
Big data factors into the discussion because there is so much of it. Sites have to look for low-cost, slower cold storage so they can affordably keep this seldom-accessed data under management, and they have three reasons to do so: governance (data must be retained even though it isn't regularly used), business continuation (big data, like "regular" data, needs multiple repositories for disaster recovery failover), and the plain need to know where everything is.
"We recognized the need for cold storage with big data when we studied the market and saw that streaming all the big data out that organizations were accumulating was a growing problem," said Jeff Flowers, Founder and CEO of Storiant, a cold data service provider for private cloud environments.
Flowers says Storiant can retain data securely behind a firewall for as little as $0.01 per gigabyte per month. Large-scale public cloud service providers like Amazon also offer cold storage services. The difference is that Storiant delivers cold storage for use in enterprise private clouds, which appeals to the many companies that are leery of using public cloud services for their data.
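To put the quoted price in perspective, the arithmetic can be sketched as follows. The one-cent rate is the "as little as" figure from the article, so it is a floor rather than a quote; the decimal conversion (1 TB = 1,000 GB), the function name, and the 500 TB archive are assumptions made for the example.

```python
from decimal import Decimal

PRICE_PER_GB_MONTH = Decimal("0.01")  # "as little as" rate cited in the article
GB_PER_TB = 1000                      # assumption: decimal terabytes


def monthly_cost_usd(terabytes: int, price_per_gb: Decimal = PRICE_PER_GB_MONTH) -> Decimal:
    """Estimate the monthly storage bill for a given capacity in terabytes."""
    return terabytes * GB_PER_TB * price_per_gb


# A hypothetical 500 TB archive at the floor rate.
monthly = monthly_cost_usd(500)
yearly = monthly * 12
```

At that rate the hypothetical 500 TB archive works out to $5,000 a month, or $60,000 a year, before any retrieval or transfer charges a provider might add.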
How cold storage works, and how it can help with big data
"Big data comes in large blocks, and big data analytics are often required to process against large data objects that are terabytes in size," said Flowers. "Using cold storage, we become a 'data lake' mass of storage that can be scanned through by a Hadoop compute node."
Solutions like Storiant containerize large data objects that hold the unstructured data characteristic of most big data, as well as the Internet of Things (IoT) data (like website log files) that increasingly makes up big data. With a cold storage solution, corporate IT can sort through and classify all of this data, deciding which storage container each chunk (or object) of big data goes into. At the same time, permissions assigned to each container establish who has access to the data in it.
Flowers says that internet service providers are moving quickly to implement this style of cloud-based cold storage for big data because it is elastic: it can expand or contract as needed. Cloud-based cold storage is also financially agile, because it replaces the long-term amortization of data center capital expenses (CAPEX) with a pay-for-use operating expense (OPEX) model that enterprises can control in the short term.
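The container-plus-permissions idea can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Storiant's API: the `Container` class, its `readers` set, and the sample user and object names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical storage container: a named bucket of objects with a reader list."""
    name: str
    readers: set = field(default_factory=set)   # users allowed to read
    objects: dict = field(default_factory=dict)  # object key -> raw bytes

    def put(self, key: str, data: bytes) -> None:
        """Store an object in this container."""
        self.objects[key] = data

    def get(self, user: str, key: str) -> bytes:
        """Return an object only if the user has read permission on the container."""
        if user not in self.readers:
            raise PermissionError(f"{user} may not read container {self.name}")
        return self.objects[key]


# IT classifies objects by choosing which container each one goes into.
web_logs = Container("web-logs", readers={"analyst"})
web_logs.put("2015-01-01.log", b"GET /index.html 200")

data = web_logs.get("analyst", "2015-01-01.log")
try:
    web_logs.get("intern", "2015-01-01.log")
    denied = False
except PermissionError:
    denied = True
```

Attaching permissions to the container rather than to each object means access policy is set once, at classification time, for everything sorted into that container.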
"We believe that the Internet of Things will continue to exponentially increase the amount of big data that enterprises will need to manage, with millions of devices and data sources from around the world feeding in large volumes of big data," said Flowers. "Financial institutions, pharmaceutical companies, and government are already in need of large-scale, low-cost cold storage for big data. In the end, enterprises are going to have to find a way to safely and securely run low-cost analytics, and cold storage services in a private cloud setting provides this."