
Storage virtualization: Pros and cons

Scott Reeves takes a look at the advantages and disadvantages of storage virtualization.

Storage virtualization is not new, and there are several different approaches to it. One method is to clump all your storage into a big virtual storage pool and provision that pool to a server, which then handles the provisioning of storage to other servers. No matter the method, the main difference now is the scale.

Host-based

One type of virtualization is host-based. This means using software on the hosts, such as Logical Volume Manager (LVM), which maps the physical disks to logical pools of storage. The pool of disks can then be allocated to servers. It does not matter where the disk is provisioned from; it could be a SAN, a NAS, or direct-attached storage.
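
The mapping from physical disks to a logical pool can be illustrated with a minimal Python sketch. This is a toy model, not how LVM is implemented: the `VolumeGroup` class, the 4 MB extent size, and the disk names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

EXTENT_MB = 4  # assumed extent size, chosen only for this example


@dataclass
class VolumeGroup:
    """Toy pool of free extents drawn from several physical disks."""
    free: list = field(default_factory=list)     # (disk_name, extent_index) pairs
    volumes: dict = field(default_factory=dict)  # volume name -> list of extents

    def add_disk(self, name: str, size_mb: int) -> None:
        # Split the disk into fixed-size extents and add them to the pool.
        self.free.extend((name, i) for i in range(size_mb // EXTENT_MB))

    def create_volume(self, name: str, size_mb: int) -> list:
        needed = -(-size_mb // EXTENT_MB)  # ceiling division
        if needed > len(self.free):
            raise ValueError("not enough free extents in the pool")
        # Take the first free extents, wherever they physically live.
        self.volumes[name], self.free = self.free[:needed], self.free[needed:]
        return self.volumes[name]


vg = VolumeGroup()
vg.add_disk("san_lun0", 16)   # hypothetical SAN disk, 4 extents
vg.add_disk("local_sdb", 16)  # hypothetical direct-attached disk, 4 extents
extents = vg.create_volume("data", 24)  # needs 6 extents, so it spans both disks
disks_used = sorted({disk for disk, _ in extents})
```

The volume named `data` ends up on extents from both disks, yet a consumer of the volume only ever deals with one logical device.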

Storage devices have abstracted the physical to the logical for many years. Storage systems can usually be configured in RAID 0, 1, 10 or any other RAID configuration you may require. Once the configuration is completed, logical volumes can be created. When the logical volumes are assigned to a server, the server “sees” each one as a single disk, even though the data may in reality be spread over several physical disks. A further layer of virtualization can be added at the host by using host-based virtualization such as LVM.
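
To show how one logical disk can be spread over several physical disks, here is a small Python sketch of RAID 0 style striping. The function name `raid0_locate` and its parameters are assumptions for illustration; real arrays add parity, caching, and much more.

```python
def raid0_locate(logical_block: int, num_disks: int, stripe_blocks: int = 1):
    """Map a logical block number to (disk index, block offset on that disk)."""
    stripe = logical_block // stripe_blocks  # which stripe unit the block is in
    disk = stripe % num_disks                # stripe units rotate across disks
    offset = (stripe // num_disks) * stripe_blocks + logical_block % stripe_blocks
    return disk, offset


# Six consecutive logical blocks striped over three disks.
layout = [raid0_locate(block, num_disks=3) for block in range(6)]
```

The host only ever addresses logical blocks 0 to 5; which physical disk holds each block is decided below the layer the host can see.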

Appliance-based

What if you want a disk pool that spans two or more (maybe many more) storage arrays? A further layer of storage virtualization allows this. This is the appliance model of storage virtualization. In this model, a node controls the provisioning of storage from the storage devices to the servers. This node is usually two or more servers: one server does the actual I/O work, whilst the others act as failovers.

The idea of appliance storage virtualization is similar to a volume group in LVM. The difference is the scale; one or more storage arrays are clumped together to create storage pools. Depending on the vendor, you can then create virtual disks (IBM call them vdisks, for example). The virtual disks can then be allocated to the servers (which still see them as a disk), and host-based virtualization can be used on the server side.

What advantages does this have? For one thing, you can have more than one type of storage array attached to your SAN, but it does not look any different to the servers. For another, you can often do migrations swiftly. Virtualization on this scale usually comes with software mirroring, making migration of data fast and with little or no downtime to the applications using the storage.
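
The mirroring approach to migration can be sketched in Python as follows. This is an illustrative model only: the `VirtualDisk` class, its method names, and the array names are hypothetical and do not correspond to any vendor's interface.

```python
class VirtualDisk:
    """Toy virtual disk that can hold mirrored copies on several arrays."""

    def __init__(self, name, array, blocks):
        self.name = name
        self.copies = {array: list(blocks)}  # array name -> block contents

    def write(self, index, value):
        # Every write goes to all copies, so the mirrors stay in step.
        for blocks in self.copies.values():
            blocks[index] = value

    def read(self, index):
        # Any copy will do, since all copies hold the same data.
        return next(iter(self.copies.values()))[index]

    def add_mirror(self, source, target):
        # Synchronize a new copy on the target array.
        self.copies[target] = list(self.copies[source])

    def remove_copy(self, array):
        if len(self.copies) == 1:
            raise ValueError("cannot remove the last copy")
        del self.copies[array]


vd = VirtualDisk("vdisk0", "array_a", ["a", "b", "c"])
vd.add_mirror("array_a", "array_b")  # step 1: mirror onto the new array
vd.write(1, "B")                     # the host keeps writing during migration
vd.remove_copy("array_a")            # step 2: drop the old copy
result = [vd.read(i) for i in range(3)]
```

The host keeps reading and writing `vdisk0` throughout; only the set of backing arrays changes underneath it.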

Another aspect of using this type of storage virtualization is manageability. Instead of someone having to manage multiple arrays, it can all be done on one console, possibly by one person. This is an advantage if there are many mid-range to enterprise level storage devices.

There are some disadvantages, however. Whilst having a single console to manage all your storage sounds good in theory, it becomes a significant risk when upgrading the software on the node controlling the storage. Software upgrades normally proceed smoothly, and failovers occur with no fuss, but there is still a possibility that an upgrade fails and the failover node hangs for some reason. This could lead to problems with applications using the provisioned storage.

Another problem is that the storage virtualization software may not scale in some areas. For example, there are hard limits on the number of virtual disks you can allocate. This may not be an issue early on, but as the environment grows and more servers are added, you can start to bump into these limits.
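
A simple way to keep an eye on such a limit is to track headroom as virtual disks are created. The Python sketch below assumes a hypothetical `ApplianceCluster` class and an invented limit of 4; actual limits vary by vendor and software version and should be taken from the vendor's documentation.

```python
class VdiskLimitReached(Exception):
    """Raised when the cluster cannot hold any more virtual disks."""


class ApplianceCluster:
    def __init__(self, max_vdisks: int):
        self.max_vdisks = max_vdisks  # hard limit (hypothetical value)
        self.vdisks = []

    def headroom(self) -> int:
        # Number of virtual disks that can still be created.
        return self.max_vdisks - len(self.vdisks)

    def create_vdisk(self, name: str) -> None:
        if self.headroom() <= 0:
            raise VdiskLimitReached(f"limit of {self.max_vdisks} vdisks reached")
        self.vdisks.append(name)


cluster = ApplianceCluster(max_vdisks=4)  # assumed limit, for illustration only
for server in ("web1", "web2", "db1"):
    cluster.create_vdisk(f"{server}_disk")
remaining = cluster.headroom()
```

Checking `headroom()` as part of capacity planning gives early warning before a new server request fails outright.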

Despite the disadvantages outlined above, there are still advantages to using appliance storage virtualization. It may be that the nodes that provision storage can be split up. This may negatively affect manageability, but it does make for a more robust environment, especially around software upgrade time on the nodes.

Storage has been virtualized for some time now. In particular, the technology around RAID and host-based storage virtualization is quite mature. Whilst appliance virtualization is relatively new, it is making some inroads and can ease the management of storage.

About

Scott Reeves has worked for Hewlett Packard on HP-UX servers and SANs, and has worked in similar areas in the past at IBM. Currently he works as an independent IT consultant, specializing in Wi-Fi networks and SANs.

1 comment
eclypse

I would definitely be sad if I had to give up my SVC and/or go back to non-virtualized storage. While I imagine that this is not as much of a problem if you have IBM or HP or similar servers, the main problem we used to run into was that when we used internal disk, by the time one would fail, you couldn't buy an exact replacement for it. You would end up having to waste half (or more) of the space on the replacement disk.


When updating/upgrading the SVC, there is still the "pucker factor" just because things can always go wrong. However, it does only update one node at a time, and that node has to come back up and run for 30 minutes before it will update the second node. Now, if the first node dies or breaks or whatever, then yes, you are screwed if you only have a two-node cluster. However, if the failover doesn't work properly, then you probably did it wrong to begin with and never tested failover.