If you’re responsible for managing enterprise data, you know that storage can be both a blessing and a curse. On the one hand, having access to large amounts of data can be immensely helpful in making business decisions. On the other hand, storing all that data can be expensive and downright chaotic to manage.
That’s where hierarchical storage management comes in. In this guide, we give you a short crash course on HSM: what it is, how it works and some of the benefits it can offer your organization.
What is HSM?
HSM, or hierarchical storage management, is a data storage technique that automatically moves data between faster, more expensive storage media and slower, less expensive media to keep storage costs under control. The basic idea behind HSM is to store data on the most appropriate type of storage media, depending on how frequently the data is accessed.
For example, data that is accessed frequently can be stored on more expensive, higher-performance storage media such as solid-state drives, while data that is accessed less frequently can be stored on less expensive, lower-performance storage media such as hard disk drives. Data that is rarely touched can be archived to even cheaper media such as magnetic tape.
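The placement rule described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular HSM product decides placement: the three tiers (including a tape archive tier), the `Tier` enum, `choose_tier` and the accesses-per-day thresholds are all hypothetical values chosen for the example.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical three-tier layout, fastest (and most expensive) first."""
    SSD = "ssd"    # hot data
    HDD = "hdd"    # warm data
    TAPE = "tape"  # cold, archival data


# Hypothetical thresholds in accesses per day; real systems tune these.
HOT_THRESHOLD = 10.0
WARM_THRESHOLD = 0.1


def choose_tier(accesses_per_day: float) -> Tier:
    """Pick the cheapest tier that still suits how often the data is read."""
    if accesses_per_day >= HOT_THRESHOLD:
        return Tier.SSD
    if accesses_per_day >= WARM_THRESHOLD:
        return Tier.HDD
    return Tier.TAPE


if __name__ == "__main__":
    for rate in (50.0, 1.0, 0.01):
        print(f"{rate} accesses/day -> {choose_tier(rate).name}")
```

With these made-up thresholds, data read 50 times a day lands on SSD, once a day on HDD and once every few months on tape.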
HSM is a long-standing idea, but it has changed drastically since its conception as storage and networking technology have advanced. Capacities and access times today bear little resemblance to those of early systems, yet many of the original concepts remain popular, albeit on a much grander scale when dealing with big data.
How do HSM systems work?
HSM systems work by automatically moving data among different storage tiers based on how often that data needs to be accessed. Data that is accessed frequently is kept on fast, expensive storage media like SSDs, while data that is accessed infrequently is moved to slower, cheaper storage media. This gives users quick access to the data they use most while minimizing storage costs and power usage.
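A rough sketch of that automatic movement, assuming a simple recency-based rule: a background `sweep` demotes data that has sat idle past a per-tier threshold, and reading demoted data recalls it to the fastest tier. The class names, the tier list and the 30- and 180-day thresholds are hypothetical; real HSM products use their own policies and recall mechanisms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

DAY = 24 * 3600  # seconds

# Hypothetical tiers, fastest first, each with the minimum idle time
# (seconds since last access) before data is moved there.
TIERS: List[Tuple[str, int]] = [("ssd", 0), ("hdd", 30 * DAY), ("tape", 180 * DAY)]
TIER_NAMES = [name for name, _ in TIERS]


def target_tier(idle_seconds: float) -> str:
    """Return the slowest tier whose idle threshold has been reached."""
    chosen = TIERS[0][0]
    for name, min_idle in TIERS:
        if idle_seconds >= min_idle:
            chosen = name
    return chosen


@dataclass
class Record:
    tier: str
    last_access: float


@dataclass
class HsmStore:
    records: Dict[str, Record] = field(default_factory=dict)

    def write(self, name: str, now: float) -> None:
        self.records[name] = Record("ssd", now)  # new data starts on the fast tier

    def read(self, name: str, now: float) -> str:
        rec = self.records[name]
        rec.tier = "ssd"  # recall demoted data to the fast tier on access
        rec.last_access = now
        return rec.tier

    def sweep(self, now: float) -> List[Tuple[str, str, str]]:
        """Background pass: demote idle data and report (name, from, to) moves."""
        moves = []
        for name, rec in self.records.items():
            new_tier = target_tier(now - rec.last_access)
            if TIER_NAMES.index(new_tier) > TIER_NAMES.index(rec.tier):
                moves.append((name, rec.tier, new_tier))
                rec.tier = new_tier
        return moves


if __name__ == "__main__":
    store = HsmStore()
    store.write("report.csv", now=0)
    store.write("logs.tar", now=0)
    store.read("report.csv", now=40 * DAY)
    print(store.sweep(now=45 * DAY))
```

Running it prints `[('logs.tar', 'ssd', 'hdd')]`: the report was read recently and stays on flash, while the untouched log archive is demoted one tier. The sweep only ever demotes; promotion happens on access.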
HSM is comparable to the memory hierarchy in most computers. A CPU keeps frequently used data in a small, fast SRAM cache and, when room is needed for new data, evicts less recently used data, which then has to be served from slower but much larger DRAM main memory. HSM applies the same principle to storage devices.
HSM components and algorithms
HSM systems typically consist of three key components: a data migration policy, algorithms for managing data and a mechanism for tiering or caching data. The data migration policy defines how data should be moved between different storage devices based on factors such as frequency of use or importance. The algorithms used by HSM systems help determine which data should be stored on which device based on criteria such as how often the data is accessed or its size. The tiering or caching mechanism then carries out those decisions.
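One way to express a data migration policy is as an ordered list of rules, where the first rule that matches a file decides its tier. The sketch below assumes that approach; the `FileStats` fields, the `pinned` importance flag, the rule thresholds and the tier names are hypothetical illustrations, not the configuration format of any specific HSM product.

```python
from dataclasses import dataclass
from typing import Callable, List

GIB = 1 << 30


@dataclass
class FileStats:
    path: str
    size_bytes: int
    accesses_last_30_days: int
    pinned: bool = False  # hypothetical "business-critical" flag


@dataclass
class Rule:
    description: str
    matches: Callable[[FileStats], bool]
    target_tier: str


# Hypothetical policy: rules are checked in order and the first match wins.
POLICY: List[Rule] = [
    Rule("important data stays on flash", lambda f: f.pinned, "ssd"),
    Rule("busy data stays on flash", lambda f: f.accesses_last_30_days >= 20, "ssd"),
    Rule(
        "large, idle files go to tape",
        lambda f: f.accesses_last_30_days == 0 and f.size_bytes >= GIB,
        "tape",
    ),
    Rule("everything else goes to disk", lambda f: True, "hdd"),
]


def apply_policy(stats: FileStats, policy: List[Rule] = POLICY) -> str:
    """Return the target tier chosen by the first matching rule."""
    for rule in policy:
        if rule.matches(stats):
            return rule.target_tier
    raise ValueError(f"no rule matched {stats.path}")


if __name__ == "__main__":
    files = [
        FileStats("/data/ledger.db", 10 * GIB, 0, pinned=True),
        FileStats("/data/q3_export.parquet", 5 * GIB, 0),
        FileStats("/data/notes.txt", 4096, 3),
    ]
    for f in files:
        print(f.path, "->", apply_policy(f))
```

Keeping the policy as data rather than code makes it easy to review and reorder; the catch-all rule at the end guarantees every file gets a tier.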
Common algorithms include Least Recently Used (LRU) replacement, which moves the data that has gone the longest without being accessed to lower-performing storage tiers. Size-Temperature Replacement is another commonly used algorithm that applies thresholds to both data size and temperature (how frequently the data is accessed) to determine when to migrate data. Heuristic Threshold is a newer algorithm that uses machine learning technologies to predict more accurately when data should be migrated.
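To make LRU replacement concrete, here is a minimal Python sketch of a fast tier with a fixed byte budget. When a new item would overflow the budget, the least recently used items are demoted to a slow tier. The class name, the byte-based capacity and the two-tier setup are assumptions for illustration; it implements plain LRU only, not Size-Temperature Replacement or Heuristic Threshold.

```python
from collections import OrderedDict
from typing import Dict, List


class LruFastTier:
    """Fast tier with a byte budget; overflow is demoted in LRU order."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        # name -> size in bytes, ordered from least to most recently used
        self.fast: "OrderedDict[str, int]" = OrderedDict()
        self.slow: Dict[str, int] = {}  # data demoted to the slower tier

    def access(self, name: str, size: int) -> List[str]:
        """Touch an item on the fast tier and return the names demoted.

        `size` is only used when the item is not already on the fast tier.
        """
        if name in self.fast:
            self.fast.move_to_end(name)  # now the most recently used
            return []
        if size > self.capacity:
            raise ValueError(f"{name} does not fit on the fast tier")
        self.slow.pop(name, None)  # recall it if it had been demoted
        self.fast[name] = size
        self.used += size
        demoted = []
        while self.used > self.capacity:
            victim, victim_size = self.fast.popitem(last=False)  # least recent
            self.used -= victim_size
            self.slow[victim] = victim_size
            demoted.append(victim)
        return demoted


if __name__ == "__main__":
    tier = LruFastTier(capacity_bytes=100)
    for item in ["a", "b", "a", "c"]:
        print(item, "demoted:", tier.access(item, 40))
```

With `capacity_bytes=100`, accessing `a`, `b`, `a` and then `c` (40 bytes each) demotes `b`, because `a` was touched more recently.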
Tiered and cached HSM
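The debate regarding tiering versus caching is one that HSM system designers face when deciding how best to use faster and slower storage tiers together. With tiering, each piece of data lives on one tier at a time and is physically moved when its access pattern changes; with caching, a copy of frequently accessed data is placed on the fast tier while the primary copy stays on the slower tier. Tiering generally suits access patterns that are stable over the long term, while caching can respond more quickly to short-term bursts of activity.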
Whichever approach you choose, the goal is the same: frequently accessed data is served from faster devices, while less frequently accessed data is kept on slower, cheaper devices.
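The difference between the two designs comes down to whether data is moved or copied. The sketch below assumes a simple two-tier volume and read caching of unmodified blocks only (no write-back); the class and method names are hypothetical.

```python
from typing import Set


class TieredVolume:
    """Tiering: each block lives on exactly one tier and is moved between them."""

    def __init__(self) -> None:
        self.fast: Set[str] = set()
        self.slow: Set[str] = set()

    def write(self, block: str) -> None:
        self.slow.add(block)  # new data lands on the slow tier in this sketch

    def promote(self, block: str) -> None:
        self.slow.discard(block)
        self.fast.add(block)  # moved: no copy is left on the slow tier

    def demote(self, block: str) -> None:
        self.fast.discard(block)
        self.slow.add(block)


class CachedVolume:
    """Caching: the slow tier holds every block; the fast tier holds copies."""

    def __init__(self) -> None:
        self.fast: Set[str] = set()
        self.slow: Set[str] = set()

    def write(self, block: str) -> None:
        self.slow.add(block)

    def cache(self, block: str) -> None:
        if block in self.slow:
            self.fast.add(block)  # copied: the original stays on the slow tier

    def evict(self, block: str) -> None:
        self.fast.discard(block)  # dropping an unmodified copy loses no data


if __name__ == "__main__":
    tiered, cached = TieredVolume(), CachedVolume()
    for vol in (tiered, cached):
        vol.write("blk-1")
    tiered.promote("blk-1")
    cached.cache("blk-1")
    print("tiered:", tiered.fast, tiered.slow)
    print("cached:", cached.fast, cached.slow)
```

One practical consequence: in the tiered design the fast tier adds to usable capacity, whereas in the cached design it does not, because every cached block duplicates one already stored on the slow tier.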
Benefits of HSM
- Cost savings: HSM systems keep data on less expensive storage media whenever possible, which lowers an enterprise’s overall storage costs.
- Increased performance: HSM systems allow you to store frequently accessed data on higher-performance storage media such as SSDs, thus improving your system’s overall performance.
- Increased security: Hierarchical storage management systems allow you to route sensitive data to storage media with built-in protections, such as self-encrypting SSDs or HDDs. These options can help increase your system’s overall security.
- Improved manageability: Because HSM systems automate the placement of data across different types of storage media, administrators spend less time moving files by hand. Migrated files typically remain visible in the same namespace, which makes it easier to find and retrieve specific files when needed.
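The cost-savings argument is easy to check with a weighted average. The sketch assumes hypothetical per-gigabyte monthly prices and a hypothetical split of data across tiers; substitute your own figures, since real prices vary widely by vendor and over time.

```python
from typing import Dict

# Hypothetical prices in dollars per GB per month; not real vendor quotes.
PRICES: Dict[str, float] = {"ssd": 0.10, "hdd": 0.02, "tape": 0.005}


def blended_cost_per_gb(shares: Dict[str, float], prices: Dict[str, float]) -> float:
    """Weighted average price per GB, given the fraction of data on each tier."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(share * prices[tier] for tier, share in shares.items())


if __name__ == "__main__":
    capacity_gb = 100_000
    all_flash = blended_cost_per_gb({"ssd": 1.0}, PRICES)
    # Hypothetical split: 10% hot, 30% warm, 60% cold.
    tiered = blended_cost_per_gb({"ssd": 0.1, "hdd": 0.3, "tape": 0.6}, PRICES)
    print(f"all flash: ${all_flash * capacity_gb:,.0f}/month")
    print(f"tiered:    ${tiered * capacity_gb:,.0f}/month")
```

With these made-up figures, the tiered layout works out to about $1,900 per month versus $10,000 for keeping all 100,000 GB on flash. The actual savings depend entirely on real prices and on how much of your data is genuinely cold.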
Top HSM solutions
There are many different HSM solutions available on the market today. Some of the top HSM solutions for big data include:
- IBM Spectrum Scale
- EMC Celerra/VNX
- NetApp FAS/AFF
- HPE 3PAR StoreServ
- Huawei OceanStor Dorado
- Qumulo Core
- Red Hat Ceph Storage
Note that this list is not exhaustive and is in no particular order. Make sure your storage engineers and other data professionals take some time to research several options to find the best solution for your company’s needs.