
An enterprise storage dictionary for non-experts

This list of must-know enterprise storage definitions might be especially useful for non-technical executives or users who want to learn the basics.

The Storage Networking Industry Association's recently updated technical dictionary has a potentially critical flaw—readers almost need to be experts to understand some of it—so we decided to make our own storage glossary for everyone else.

Where the SNIA smorgasbord offers more than 100 terms for the letter 'A,' we give you a tasty nosh of fewer than 40 must-know definitions spanning the alphabet. Learn these storage terms, and you'll sound smart enough to make the IT director listen.

Active:active / active:passive: All modern enterprise storage arrays include controllers (servers) that move data between their drives and the external network. If all controllers are working all the time, then it's an active:active system. If half the controllers are working all the time and the other half are on standby, then it's an active:passive system. Inside baseball: Controversy rages at storage companies when their marketing and sales staff disagree over which parts of the controllers are active or passive in various situations.
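
To make the difference concrete, here is a minimal sketch of how requests might be routed in each mode. It assumes two controllers with the hypothetical names "ctrl-a" and "ctrl-b," and the ControllerGroup class is purely illustrative, not any vendor's logic. In active:active mode every healthy controller takes turns; in active:passive mode the standby only gets work once the active controller fails.

```python
from itertools import cycle


class ControllerGroup:
    """Toy model of how an array routes I/O requests to its controllers."""

    def __init__(self, mode, controllers=("ctrl-a", "ctrl-b")):
        if mode not in ("active:active", "active:passive"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.controllers = list(controllers)
        self.healthy = set(controllers)
        self._round_robin = cycle(self.controllers)

    def fail(self, name):
        """Simulate a controller going offline."""
        self.healthy.discard(name)

    def route(self):
        """Pick the controller that should handle the next request."""
        if self.mode == "active:active":
            # Every healthy controller takes turns serving requests.
            for _ in range(len(self.controllers)):
                candidate = next(self._round_robin)
                if candidate in self.healthy:
                    return candidate
        else:
            # The first healthy controller is active; the rest wait on standby.
            for candidate in self.controllers:
                if candidate in self.healthy:
                    return candidate
        raise RuntimeError("no healthy controller available")
```

For example, an active:active group alternates between ctrl-a and ctrl-b, while an active:passive group sends everything to ctrl-a until ctrl-a is marked as failed.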

Array: A storage array is a chassis filled with drives and a controller. The drives are usually enterprise-class hard disks or solid-state (flash) drives, but they can also be lower-cost serial ATA (SATA) drives. The array controller is a server that acts as a traffic cop for all data going in and out of the array. Enterprise arrays tend to be expensive, and over the past decade companies have realized that they can sometimes build their own for less cost and complexity.

Backup: A backup is an extra copy of your data for those just-in-case moments. On an individual computer, copy your files onto another drive or to a cloud service. On a corporate network, things are exponentially more complicated. You can use traditional backup software, which puts a small application (an agent) on every device, PC, and server and then initiates full or incremental backups at predetermined intervals. You can use a backup appliance or a so-called hyperconverged appliance, which might make two copies of everything and save it automatically. You can back up to on-site servers, a tape farm across a wide-area network, or a cloud. You'll need to routinely test those backups to ensure they are valid when the time comes to restore them.
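
The difference between a full and an incremental backup comes down to which files get copied. This is a minimal sketch assuming the only test is a file's last-modified time; the FileInfo type and files_to_back_up function are hypothetical names, and real backup software also tracks deletions, permissions, and changes within files.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileInfo:
    path: str
    modified: datetime  # last-modified timestamp


def files_to_back_up(files, mode, last_backup=None):
    """Return the paths a backup run should copy.

    mode="full": copy everything.
    mode="incremental": copy only files changed since the previous run.
    With no previous run on record, an incremental falls back to a full backup.
    """
    if mode == "full" or last_backup is None:
        return [f.path for f in files]
    if mode == "incremental":
        return [f.path for f in files if f.modified > last_backup]
    raise ValueError(f"unknown mode: {mode}")
```

Incrementals are smaller and faster, but a restore needs the last full backup plus every incremental taken since, which is one reason to test restores routinely.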

Block storage: You know what a file is, but what's a block? Put simply, a block is raw data: a fixed-size chunk of bytes with a machine-readable address. Unlike a file, it doesn't have a human-readable name, extension, or metadata. With block storage (such as on a SAN), the storage system presents servers with raw volumes made of addressable blocks, and each server's own file system keeps track of which blocks collectively make up each of your files. The advantage of block storage is speed: with no file-level overhead on the storage side, blocks are fast to read, write, and move around.
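
A toy model can show the division of labor. In this sketch (the BlockDevice and ToyFileSystem classes are hypothetical, and the 4,096-byte block size is an assumption, common but not universal), the "device" knows only numbered blocks of bytes, while the host-side file system maps human-readable names to lists of block addresses.

```python
BLOCK_SIZE = 4096  # bytes per block; an assumed, commonly used size


class BlockDevice:
    """Stands in for a SAN volume: numbered blocks, no names or metadata."""

    def __init__(self):
        self.blocks = {}  # address -> raw bytes
        self.next_free = 0

    def write_block(self, data: bytes) -> int:
        address = self.next_free
        self.blocks[address] = data
        self.next_free += 1
        return address

    def read_block(self, address: int) -> bytes:
        return self.blocks[address]


class ToyFileSystem:
    """Runs on the server and decides which blocks make up each file."""

    def __init__(self, device: BlockDevice):
        self.device = device
        self.table = {}  # file name -> ordered list of block addresses

    def save(self, name: str, data: bytes) -> list:
        addresses = [
            self.device.write_block(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)
        ]
        self.table[name] = addresses
        return addresses

    def load(self, name: str) -> bytes:
        return b"".join(self.device.read_block(a) for a in self.table[name])
```

Saving a 10,000-byte file produces three blocks (addresses 0, 1, and 2); the device never learns the file's name.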

Cloud storage: There is no magical cloud floating in the sky that holds computer data. In the internet meme a boy asks his father, "Daddy, what are clouds made of?" and the father replies, "Linux servers, mostly." Clouds are simply computer networks on which you rent processing or storage space, not unlike renting time on a mainframe back in the 1960s and 1970s. A cloud can be private or public. Beware companies advertising "on-premise" clouds; that's a marketing spin on your own network. The big advantage of cloud computing or cloud storage is that someone else manages it for you. The big disadvantage is also that someone else manages it for you. Which do you value more, convenience or control?

Cold backup: Sometimes you want your backup data instantly available at your fingertips. Sometimes you just need to know there is a backup, but it can safely reside off-site in a data center far away in case it's ever needed. Backups close to you, at the ready, are hot. Backups far from you, waiting in reserve, are cold. Generally you put hot backups on faster, more expensive storage, and cold backups on slower, cheaper storage.

Deduplication: If a company is only you and one other person, there's a chance you each have a few of the same files on your hard drive. If a company is 500, 1,000, 10,000, or 100,000 people strong, then there's a high likelihood of vast storage resources being wasted on redundant files. Deduplication ("de-duping") is software that reduces this problem by finding the extra copies and eliminating them.
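
A common way to spot duplicates is to fingerprint content with a cryptographic hash and keep one stored copy per fingerprint. This minimal sketch works at the whole-file level with SHA-256 from Python's standard hashlib module; the deduplicate function is a hypothetical name, and many commercial products dedupe smaller chunks or blocks rather than whole files.

```python
import hashlib


def deduplicate(files: dict[str, bytes]):
    """Store each distinct file body once; map every name to its content hash."""
    store = {}  # content hash -> the single stored copy
    refs = {}   # file name -> content hash (a pointer, not a copy)
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # keep only the first copy seen
        refs[name] = digest
    return store, refs
```

If Alice and Bob both save the same slide deck, the store holds one copy and both names point to it.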

Data lake: This is a set of unstructured data. Every company has some whether they know it or not; the key is how to manage it and what to do with it. A common approach is to wrangle unstructured data into cheap storage or onto a cloud, manage it with the Hadoop Distributed File System (HDFS), and analyze it for useful trends that can solve corporate problems. In that sense a data lake is something you do, not something you buy.

Exabyte: You know about bits, bytes, kilobytes, megabytes, gigabytes, and terabytes. You may know about petabytes (1,024 terabytes) if your organization is massive enough. What comes next? Exabytes (1,024 petabytes), zettabytes (1,024 exabytes), yottabytes (1,024 zettabytes), and the unofficial brontobytes (1,024 yottabytes). You'll probably never need more, but that's what we thought about double-sided floppy disks.
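
The arithmetic is just repeated multiplication by 1,024. This small sketch uses that binary convention; brontobytes are left out because they aren't an official unit. Keep in mind that drive makers usually quote decimal units (powers of 1,000), so the same drive can look smaller when reported in 1,024-based units.

```python
# Each unit is 1,024 times the one before it (binary convention).
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes"]


def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes."""
    return amount * 1024 ** UNITS.index(unit)


def convert(amount, from_unit, to_unit):
    """Convert between any two units in the list."""
    return to_bytes(amount, from_unit) / 1024 ** UNITS.index(to_unit)
```

For example, convert(3, "petabytes", "terabytes") returns 3072.0, and one exabyte works out to 1024 ** 6 bytes.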

Fabric: This is the buzzword du jour for a storage network and its components.

Fiber channel: The official trademarked spelling is fibre. Fiber channel is a fiber-optic networking system for high-end storage networks. It became popular in the 1990s because fiber optics are very fast and Ethernet at the time was quite slow. Ethernet has gained a foothold in modern storage networks because it's cheaper, it's closing the speed gap, and it's easier to manage. But the consensus is that fiber channel will remain for high-performance uses.

Flash array: Until a few years ago, storage arrays contained a controller (server) and gobs of hard drives. Now those hard drives are being replaced partially or completely with solid-state drives using flash memory. The big knock on flash arrays, just like the downside of putting an SSD in your computer, is that the pathways between the drives and the processor are still older types built with spinning disks in mind.

HBA: A host-bus adapter is the storage equivalent of a network card for fiber-channel networks.

InfiniBand: This is a less-common alternative to fiber channel or Ethernet in storage networking systems. It's popular in supercomputers and other very high-end applications.

JBOx: Sometimes your application doesn't need a formal storage array: it simply needs "just a bunch of disks," ergo JBOD. Gradually, the fourth letter in JBOx is becoming 'F,' as in JBOF, for enterprise flash drives (which ship in the same form factor as conventional drives; we're not talking USB sticks).

LTO: Linear Tape-Open is the industry-standard format for commodity tape drives and cartridges. IBM and Oracle notably make non-LTO tape drives for higher-capacity applications.

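Mirror: This is a method of automatically making two copies of data every time your application saves something (RAID 1 is a common example). Because you've got copies from the start, a single failed drive doesn't force you to restore from a backup. Mirroring doesn't replace backups, though: if a file is deleted or corrupted, the mirror faithfully deletes or corrupts its copy too.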
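
Here is a toy two-way mirror, assuming each "drive" is just a Python dictionary; the MirroredVolume class is hypothetical and skips everything a real controller does (resynchronization, write ordering, caching). It shows both sides of the story: reads survive a failed drive, but deletions are mirrored just like writes.

```python
class MirroredVolume:
    """Toy two-way mirror: every write and delete goes to both drives."""

    def __init__(self):
        self.drives = [{}, {}]  # each dict stands in for one physical drive
        self.failed = set()     # indexes of drives that have crashed

    def _working(self):
        return [d for i, d in enumerate(self.drives) if i not in self.failed]

    def write(self, name, data):
        for drive in self._working():
            drive[name] = data

    def delete(self, name):
        # Deletions are mirrored too, which is why a mirror is not a backup.
        for drive in self._working():
            drive.pop(name, None)

    def read(self, name):
        # Any surviving copy will do.
        for drive in self._working():
            if name in drive:
                return drive[name]
        raise FileNotFoundError(name)

    def fail_drive(self, index):
        self.failed.add(index)
```

After writing a spreadsheet and failing drive 0, a read still returns the data from drive 1; after a delete, neither drive has it.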

NAS: Network-attached storage is basically the concept of using a storage array in place of a conventional file server. Most modern arrays from major companies such as Dell EMC, HP, Hitachi, IBM, and NetApp can connect to your network in your choice of NAS or a storage-area network (SAN) configuration. These same companies also sell versions of the hardware tailored for one option or the other. NAS is generally easier to set up and costs less than a SAN; because of this, NAS is more popular for smaller businesses and in departments or branch offices of larger organizations. NAS also serves files over the network, whereas a SAN serves blocks.

NVMe: Non-Volatile Memory Express is an emerging standard for connecting solid-state drives to processors and memory over PCI Express (PCIe) rather than through the older, slower Advanced Host Controller Interface (AHCI), which was designed with hard disks in mind. By moving these connections to PCIe, processors and memory can make better use of flash drives' speed advantage vs. conventional hard disks. There's also a fiber channel version in the works.

Object storage: Object storage is a way of saving files and copious metadata as single units and arranging them in a flat address space. The result is a storage system that's slower but more useful for certain kinds of applications, particularly for unstructured data or in markets such as healthcare.
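
The "flat address space" idea is easier to see in code. In this toy object store (ToyObjectStore, put, get, and find are hypothetical names, loosely in the spirit of put/get object APIs but not any product's interface), keys like "scans/2018/patient-17.dcm" only look like folder paths: the slashes are just part of a single flat name, and each object carries its own metadata.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class ToyObjectStore:
    """Flat namespace: one dict from key to object, no directory tree."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # Data and metadata are saved together as a single unit.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key):
        return self._objects[key]

    def find(self, **criteria):
        """Return sorted keys whose metadata matches every criterion."""
        return sorted(
            key for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )
```

In a healthcare-style example, find(modality="MRI") would return only the MRI scans, regardless of how their keys are named.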

RAID: Redundant array of independent disks (RAID) is a technique dating from the 1980s that protects data by copying (mirroring) or splitting (striping, often with parity) your information across many drives. In a RAID environment, one or more drives could crash, and in theory your data would still be available.
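
Parity-based RAID levels such as RAID 5 rely on a simple trick: XOR the data blocks together to get a parity block, and any one missing block can be rebuilt by XORing everything that survives. This sketch shows only that math; it ignores how real arrays rotate parity across drives, and a single parity block covers only one failed drive at a time (RAID 6 adds a second parity block for two).

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover one lost data block from the survivors plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])
```

With data blocks b"ABCD", b"EFGH", and b"IJKL" on three drives and their parity on a fourth, losing the middle drive is recoverable: rebuild_missing([b"ABCD", b"IJKL"], parity) returns b"EFGH".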

Retention: Data retention is the concept of how long to retain backed-up information. Retention can be determined based on application requirements, company policies, government regulations, legal discovery laws, and/or user preferences.
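
At its simplest, a retention policy is a cutoff date. This sketch assumes a single fixed retention period in days; the split_by_retention function is hypothetical, and real policies often layer on rules such as keeping monthly copies longer or holding data for legal discovery.

```python
from datetime import date, timedelta


def split_by_retention(backup_dates, today, retention_days):
    """Split backups into those still inside the retention window and those to expire."""
    cutoff = today - timedelta(days=retention_days)
    keep = [d for d in backup_dates if d >= cutoff]
    expire = [d for d in backup_dates if d < cutoff]
    return keep, expire
```

With a 30-day policy evaluated on April 1, 2018, the cutoff is March 2, so a March 25 backup is kept while January 1 and March 1 backups are expired.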

SAN: Storage-area networks are the king of data for high-end applications. A traditional SAN connects storage arrays to the relevant servers through a fiber-channel network. Ethernet is beginning to make inroads for SAN use; however, most experts believe your SAN should still remain isolated on its own network regardless of the cabling.

Snapshot: What happens if an application tries to save a file, block, or object while that data is currently being backed up? Hint: It's not pretty. A snapshot is a way for a backup program to take a virtual picture of data and then make the official backup from that rather than working with the live data, in order to maintain high availability.
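
One common way to implement snapshots is copy-on-write: after a snapshot is taken, the first overwrite of each block saves the original, so the backup program can read a frozen view while applications keep writing. This is a sketch under simplifying assumptions (the CowVolume class and its methods are hypothetical, blocks are a dict of address to bytes, and real products may use redirect-on-write or other schemes).

```python
class CowVolume:
    """Toy volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # live data: address -> bytes
        self.snapshots = []         # per snapshot: address -> original bytes (None = didn't exist)

    def take_snapshot(self):
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, address, data):
        # Before the first overwrite after each snapshot, preserve the original.
        for saved in self.snapshots:
            if address not in saved:
                saved[address] = self.blocks.get(address)
        self.blocks[address] = data

    def read_snapshot(self, snap_id):
        """Frozen view: live blocks, except where originals were preserved."""
        view = dict(self.blocks)
        for address, original in self.snapshots[snap_id].items():
            if original is None:
                view.pop(address, None)  # block was created after the snapshot
            else:
                view[address] = original
        return view
```

Taking a snapshot is instant because nothing is copied up front; only blocks that change afterward cost extra space.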

Software-defined: It's all the rage for IT companies to advertise "software-defined" this or that, though the term doesn't mean anything particularly different from "virtual" a decade ago. As such, software-defined storage means an information space that's defined, created, and/or destroyed as needed by your application. The space may be on a single drive, on multiple drives in an array, or spread across many arrays in a SAN; the idea is that it doesn't matter. But what's new is the term more so than the concept.

SSD: Currently, solid-state drives (SSDs) are built from flash memory, which is non-volatile like a conventional drive but all-electronic like RAM. Flash is still much slower than RAM, and non-volatile forms of RAM may eventually take over such drives. SSD storage is becoming very popular, and there are predictions that it'll soon be as cheap as regular hard drives; when that happens, hard drives may quickly become obsolete. The irony is that tape storage for long-term cold backup remains popular.

Storage management: Dozens of companies obtained venture capital in the 2000s to make storage management software. They correctly predicted that big data (before it was a buzzword) would require applications to keep your storage in line and online. But it turned out that most of the storage management features you need would be included with your array no matter its configuration as NAS, SAN, object storage, or good old direct-attached server storage; backup specialists such as Veritas got the rest of the storage management pie. There were industry attempts to build storage management standards, but these were successful only in name, not in practice.

Switch: Fiber-channel networks need switches just like Ethernet networks. Brocade was the original fiber-channel switch company, McData followed, Cisco entered the game, and Brocade bought McData. It remains to be seen how long Brocade can hang on against the much larger Cisco.

Tape library: These products are the stalwarts of big-business data backup. Libraries hold thousands of tapes, each containing a bar code and computer chip. Tapes are managed by software and moved around by robotic arms. For all the hype about cloud storage, serial ATA drives, backup appliances, and more, most big corporations are using tape libraries for the grunt work of long-term data storage.

Tiering: This is the concept of using software policies to keep your data on the medium that's most appropriate to its nature. Fresh and important data might be readily available on enterprise-class SSDs in your SAN, in-between data may live on serial ATA drives on an older, slower SAN or NAS (or on your cloud), while long-term retention is relegated to the tape library. This is only an example; companies can tier their data up, down, or sideways in many combinations. Tiering software helps storage managers devise policies and puts them into motion.
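
A tiering policy can be as simple as "how long since anyone touched this?" This sketch assumes hypothetical thresholds (30 days on SSD, up to a year on SATA, tape after that) and hypothetical tier names; real tiering software weighs many more factors, such as access frequency, cost, and compliance rules.

```python
from datetime import date

# Hypothetical policy: (maximum days since last access, tier name), checked in order.
TIER_POLICY = [(30, "ssd"), (365, "sata")]
COLDEST_TIER = "tape"


def choose_tier(last_accessed: date, today: date) -> str:
    """Pick the storage tier for data based on how long ago it was last used."""
    age_days = (today - last_accessed).days
    for max_age, tier in TIER_POLICY:
        if age_days <= max_age:
            return tier
    return COLDEST_TIER
```

Evaluated on June 1, 2018, data last read a week earlier stays on SSD, data untouched since New Year's Day moves to SATA, and data last read in 2016 goes to tape.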

Zone: Your big, expensive SAN can be virtually divided into different compartments called zones. Different zones can be assigned based on criteria such as applications, departments, types of data, or security levels. Servers and storage devices in one zone can't access devices in another zone.
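
Conceptually, a zone is a named group of members, and two devices can talk only if some zone contains both. In this sketch the zone and device names are made up for illustration; in real fiber-channel switches, members are typically identified by World Wide Names (WWNs) or switch port numbers.

```python
# Hypothetical zones and members; real fabrics identify members by WWN or switch port.
ZONES = {
    "finance": {"host-fin-01", "array-a-port-1"},
    "engineering": {"host-eng-01", "host-eng-02", "array-b-port-3"},
}


def can_communicate(initiator, target, zones=ZONES):
    """Two devices may talk only if at least one zone contains both of them."""
    return any(initiator in members and target in members
               for members in zones.values())
```

Here an engineering server can reach the engineering storage port, but it can't see the finance array at all.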

Share your feedback

If you think other enterprise storage terms are absolutely essential, or if you feel we really don't understand something correctly, then post a comment or send us an email.

About Evan Koblentz

Evan became a technology reporter during the dot-com boom of the late 1990s. He published a book, "Abacus to smartphone: The evolution of mobile and portable computers" in 2015 and is executive director of Vintage Computer Federation, a 501(c)3 non-p...