
Calculate IOPS in a storage array

What drives storage performance? Is it the iSCSI/Fibre Channel choice? The answer might surprise you. Scott Lowe provides insight into IOPS.

When it comes to measuring a storage system's overall performance, Input/Output Operations Per Second (IOPS) is still the most common metric in use. There are a number of factors that go into calculating the IOPS capability of an individual storage system.

In this article, I provide introductory information and calculations that will help you figure out what your system can do. Specifically, I explain how individual storage components affect overall IOPS capability. I don't go into seriously convoluted mathematical formulas, but I do provide practical guidance and some formulas that might help you in your planning. Here are three notes to keep in mind when reading the article:

  • Published IOPS calculations aren't the end-all be-all of storage characteristics. Vendors often measure IOPS under only the best conditions, so it's up to you to verify the information and make sure the solution meets the needs of your environment.
  • IOPS calculations vary wildly based on the kind of workload being handled. In general, there are three performance categories related to IOPS: random performance, sequential performance, and a combination of the two, which is measured when you assess random and sequential performance at the same time.
  • The information presented here is intended to be very general and focuses primarily on random workloads.

IOPS calculations

Every disk in your storage system has a maximum theoretical IOPS value that is based on a formula. Disk performance -- and IOPS -- is based on three key factors:

  • Rotational speed (aka spindle speed). Measured in revolutions per minute (RPM), most disks you'll consider for enterprise storage rotate at 7,200, 10,000, or 15,000 RPM, with the latter two being the most common. A higher rotational speed is generally associated with a higher-performing disk. This value is not used directly in the calculation, but it is highly important: the other two values depend heavily on the rotational speed, so I've included it for completeness.
  • Average latency. The time (in ms) it takes for the sector of the disk being accessed to rotate into position under a read/write head -- on average, half a rotation.
  • Average seek time. The time (in ms) it takes for the hard drive's read/write head to position itself over the track being read or written. There are both read and write seek times; take the average of the two values.

To estimate the IOPS for a single disk, divide 1 by the sum of the average latency and the average seek time, both expressed in seconds: IOPS = 1 / (average latency in seconds + average seek time in seconds). Equivalently, divide 1,000 by the sum of the two values in milliseconds.

Sample drive:

  • Model: Western Digital VelociRaptor 2.5" SATA hard drive
  • Rotational speed: 10,000 RPM
  • Average latency: 3 ms (0.003 seconds)
  • Average seek time: (4.2 ms read + 4.7 ms write) / 2 = 4.45 ms (0.00445 seconds)
  • Calculated IOPS for this disk: 1 / (0.003 + 0.00445) = about 134 IOPS

So, this sample drive can support about 134 IOPS. Compare this to the typical planning values listed later in this article, and you'll see that it falls within the observed real-world performance exhibited by 10K RPM drives.
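
To make that arithmetic concrete, here is a minimal Python sketch of the formula above, using the sample drive's figures; the function name and structure are my own illustration, not part of any vendor tool.

```python
# Rough per-disk IOPS estimate: 1 / (average latency + average seek time), in seconds.
# The values below are the VelociRaptor sample figures from this article.

def estimated_iops(avg_latency_ms: float, avg_seek_ms: float) -> float:
    """Theoretical random IOPS for a single spindle."""
    return 1000.0 / (avg_latency_ms + avg_seek_ms)

avg_latency_ms = 3.0             # half a rotation at 10,000 RPM: (60,000 ms / 10,000) / 2
avg_seek_ms = (4.2 + 4.7) / 2    # mean of read and write seek times = 4.45 ms

print(round(estimated_iops(avg_latency_ms, avg_seek_ms)))  # ~134
```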

However, rather than working through a formula for your individual disks, there are a number of resources available that outline average observed IOPS values for a variety of different kinds of disks. For ease of calculation, use these values unless you think your own disks will vary greatly for some reason.

For rough planning purposes, the values I've seen and used in my own environment are approximately 75 IOPS for a 7,200 RPM drive, 125 IOPS for a 10K RPM drive, and 175 IOPS for a 15K RPM drive; these are the per-disk values used in the examples later in this article, and they don't radically change from source to source.


Note: The drive's interface type doesn't enter into the equation at all. Sure, SAS disks will generally perform better than most SATA disks, but that's only because SAS disks are typically built for enterprise applications, with higher reliability as reflected in their mean time between failure (MTBF) values. If a vendor decided to release a 15K RPM SATA disk with low latency and seek time values, it would have a high IOPS value, too.

Multidisk arrays

Enterprises don't install a single disk at a time, so the above calculations are pretty meaningless unless they can be translated to multidisk sets. Fortunately, it's easy to translate raw IOPS values from a single disk to a multiple-disk implementation: it's a simple multiplication. For example, if you have ten 15K RPM disks, each with 175 IOPS of capability, your disk system has 1,750 IOPS worth of raw performance capacity -- but only if you opt for a RAID 0 or just a bunch of disks (JBOD) implementation. In the real world, RAID 0 is rarely used, because the loss of a single disk in the array results in the loss of all data in the array.
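
A quick sketch of that aggregation, assuming the 175 IOPS planning value for a 15K RPM disk:

```python
# Raw aggregate IOPS for a JBOD or RAID 0 set: per-disk IOPS multiplied by disk count.
per_disk_iops = 175   # rough planning value for a 15K RPM disk
disk_count = 10
print(per_disk_iops * disk_count)  # 1750 raw IOPS, before any RAID write penalty
```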

Let's explore what happens when you start looking at other RAID levels.

The IOPS RAID penalty

Perhaps the most important component of an IOPS calculation to understand is the write penalty associated with a number of RAID configurations. With the exception of RAID 0, which is simply an array of disks strung together to create a larger storage pool, RAID configurations share a common characteristic: a single write operation from the host actually results in multiple writes to the array. This characteristic is why different RAID configurations are suitable for different tasks.

For example, each random write request under RAID 5 requires multiple disk operations (reading the existing data and parity, then writing the new data and new parity), which has a significant impact on raw IOPS calculations. For general purposes, accept that RAID 5 writes require 4 IOPS per write operation. RAID 6's double-parity protection, which tolerates two simultaneous drive faults, is even worse in this regard, resulting in an IO penalty of 6 operations; in other words, plan on 6 IOPS for each random write operation. For read operations under RAID 5 and RAID 6, an IOPS is an IOPS; there is no read penalty. Also, be aware that RAID 1 imposes a write penalty of 2, since every write is committed to both mirrored disks.

The read and write penalties for the most common RAID levels are summarized below:

  • RAID 0: read penalty 1, write penalty 1
  • RAID 1 (and RAID 10): read penalty 1, write penalty 2
  • RAID 5: read penalty 1, write penalty 4
  • RAID 6: read penalty 1, write penalty 6

Parity-based RAID levels also introduce additional processing overhead that results from the need to calculate parity information; the more parity protection you add to a system, the more processing overhead you incur. As you might expect, the overall penalty you actually experience depends heavily on the balance between read and write workloads.

A good starting point formula is below. This formula does not use the array IOPS value; it uses a workload IOPS value that you would derive on your own or by using some kind of calculation tool, such as the Exchange Server calculator.

(Total Workload IOPS * Percentage of workload that is read operations) + (Total Workload IOPS * Percentage of workload that is write operations * RAID IO Penalty)

Source: http://www.yellow-bricks.com/2009/12/23/iops/

As an example, let's assume the following:

  • Total IOPS need: 250 IOPS
  • Read workload: 50%
  • Write workload: 50%
  • RAID level: 6 (IO penalty of 6)

Result: You would need an array capable of 875 IOPS ((250 * 0.5) + (250 * 0.5 * 6) = 125 + 750) to support a 250 IOPS RAID 6-based workload that is 50% writes.

This could be an unpleasant surprise for some organizations, as it indicates that the number of disks might be more important than their capacity (i.e., you'd need twelve 7,200 RPM, seven 10K RPM, or five 15K RPM disks to support this IOPS need).
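
Here is a minimal Python sketch that puts the workload formula above and the spindle math together. The per-disk planning values (75, 125, and 175 IOPS) and the RAID 6 write penalty of 6 come from this article; the function and variable names are my own.

```python
import math

def required_array_iops(workload_iops: float, read_fraction: float, write_penalty: int) -> float:
    """Functional IOPS an array must deliver for a given workload mix and RAID write penalty."""
    write_fraction = 1.0 - read_fraction
    return (workload_iops * read_fraction) + (workload_iops * write_fraction * write_penalty)

needed = required_array_iops(workload_iops=250, read_fraction=0.50, write_penalty=6)  # RAID 6
print(needed)  # 875.0

# Spindle counts implied by the rough per-disk planning values used in this article.
for label, per_disk in [("7,200 RPM", 75), ("10K RPM", 125), ("15K RPM", 175)]:
    print(label, math.ceil(needed / per_disk))  # 12, 7, and 5 disks respectively
```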

The transport choice

It's also important to understand what is not included in the raw numbers: the transport choice -- iSCSI or Fibre Channel. While the transport choice is an important consideration for many organizations, it doesn't directly impact the IOPS calculations. (None of the formulas consider the transport being used.)

If you want more proof that the iSCSI/Fibre Channel choice doesn't necessarily directly impact your IOPS calculations, read this article on NetApp's site.

The transport choice is an important one, but it's not the primary choice that many would make it out to be. For larger organizations that have significant transport needs (i.e., between the servers and the storage), Fibre Channel is a good choice, but this choice does not drive the IOPS wagon.

Summary

To understand your IOPS needs in detail, you need to know quite a lot, including specific disk characteristics, your workload's breakdown of reads versus writes, and the RAID level you intend to use. Once you implement your solution, you can use tools tailor-made for IOPS analysis, such as Iometer, to get specific, real-time performance values -- assuming you have a solution in place that you can measure.

If you're still in the planning stages or a deep level of analysis simply isn't necessary for your needs, the generalities presented in this article will help you figure out your needs.


About

Since 1994, Scott Lowe has been providing technology solutions to a variety of organizations. After spending 10 years in multiple CIO roles, Scott is now an independent consultant, blogger, author, owner of The 1610 Group, and a Senior IT Executive w...

20 comments
merdos

Hello, FYI -- you know that the formula that came from Yellow Bricks has read IOPS twice and no write IOPS. It is the correct formula on the Yellow Bricks website. Mark

sentral

Hi there, we are looking for a high-IOPS device; we are considering a hybrid device, as outlined in the article, with options for HBAs and a 10 Gb/s switch fabric. Has anyone tested Open-E SAN Storage and have any IOPS results? http://www.sentralsystems.com/open-e-dss-storage-servers/ IBM Storwize V7000 Unified Storage does provide higher IOPS, and I would like to compare the performance of the Open-E system with the IBM. Ideally, performance of 100k IOPS or higher would be good.

tdpl

There are also plenty of other points of contention within the host-to-array stack, which makes this whole subject more complicated. However, when comparing different drive speeds, calculating a worst-case scenario gives a good indication of how differing drives will perform.

JohnRMartin

Disclosure: NetApp employee. It's a personal pet peeve of mine, but most of the latency rules of thumb are at best only marginally informative and at worst downright misleading. I'm not having a go at the author or his sources, but this kind of stuff has been so often repeated that it has become a self-supporting "truth". These "one figure" IOPS figures ignore the effect of the advanced queuing and elevator algorithms that have been present in disk drives for the last 10-odd years, which means a SATA drive has about 10 IOPS at a 13ms latency and 130 IOPS at 100ms latency. I covered that and the impact of various RAID and caching configurations in one of my blog posts here: http://storagewithoutborders.com/2010/07/19/data-storage-for-vdi-part-2-disk-latencies/

Also, it should be noted that in enterprise arrays, because of SATA's unchangeable 512-byte block size, the usual checksum mechanisms used on FC disks won't work (those use 520-byte blocks, which include an 8-byte checksum). These checksum mechanisms (e.g., slip mask on EMC, zoned checksum on NetApp) all impose some extra level of IO, which impacts both reads and writes.

Rules of thumb are all well and good, but they're no substitute for consultation between the people who truly understand the performance characteristics of your workload and the array architecture you are going to implement. Regards, John Martin

lloyd.havekost

Shouldn't the formula read (total workload * % read workload) + (total workload * % write workload * RAID write penalty)? The formula above doesn't reflect the % write workload. Good information.

oldbaritone

The IOPS formula also ignores the benefits of OS Optimization. There's no reason why the OS can't handle I/O operations in "disk order" or "availability order" instead of "request order." With a little planning and foresight, the average request time can be improved substantially from the "average" performance in the article.

jekidd

There is no mention of the size of the IO in this equation. A 2K IO is a lot different from a 32K, 128K, or 256K IO. 2,000 2K IOPS (4,000K of data) is a lot less data than 2,000 256K IOPS (512,000K of data).

The Igneous Group

This is great information, but there are VERY few tools that help you understand how many IOPS your system is actually using and whether you are hitting that "wall". Vendors of hardware always give you IOPS numbers, and vendors of software almost NEVER do. Even high-end enterprise arrays don't always provide a direct "IOP-o-meter", especially when the LUN is virtualized across many (shared) drives. This is helpful, but not the holy grail of performance.

micheldufrenoy

You have fallen victim to the common misunderstanding about MTBF, which is perpetuated by the industry. The error is partly due to the fact that the units are incorrectly stated. The units of MTBF are something like "units/hours". MTBF is a measure of how many units will fail in a given amount of time. Thus, an MTBF of 100,000 says, "out of 100,000 units, one unit will fail per hour." This is a useful measure for planning the number of replacement parts necessary (in rather large environments, perhaps). Some examples: a battery which might only last a couple of hours might still have an MTBF of 100,000; out of 100,000 batteries, one battery will "fail", while all of the other batteries still only last a couple of hours. Another example: men aged 30 have an MTBF of 1,000 (roughly). Out of 1,000 men, one man will die. However, to the best of my knowledge, no man will live to be 1,000 years old. As stated, MTBF is a useful measure for stocking purposes, or, in this case with men, women need to keep approximately 1,000 men "on hand."

micheldufrenoy

This is yet another article written for the database administrator which claims to have universal importance to all server administrators. In our environment, random I/O is not the primary concern. Working with photographic images ranging in size from hundreds of megabytes to a few gigabytes, sequential read/write performance is key. For sequential read/write performance, server system memory and network transport (we use bonded gigabit Ethernet at server and workstation) become highly critical to serving files to workstations. Indeed, workstation disk performance can be a limiting factor, as the server can push files faster over the network than the workstation can save them. Other factors are clearly important as well. I mention these as they are ignored in the article.

thattommyhall

I think it is worth pointing out that you can calculate the latency from the RPM value: 10,000 RPM / 60 = 166.7 rotations per second, and 1 / 166.7 = 0.006 seconds per rotation, so half a turn = 0.003 s. As the seeking by the head and the spinning of the disk occur concurrently, I would just take the slower of the two. I remember reading Jeff Bonwick at Sun saying you can get the seek to keep pace with the platter spinning, but it's the linear velocity of the outside edge of 3.5" HDs that stops you spinning much faster (prompting a move to 2.5" enterprise disks). http://blogs.smugmug.com/don/2007/10/08/hdd-iops-limiting-factor-seek-or-rpm/ (first comment)

bogd.no.spam

@micheldufrenoy I know I'm resurrecting a very old discussion, but I really don't want people who come across this article to be misled by the comments. MTBF is just what the name says: a Mean Time Between Failures (that is to say, an average of the time it takes the units to fail). And just like time, it is measured in hours, not in number of units.

Yes, you can use it to predict how many units you need to keep in stock, but that doesn't mean it's measured in "units". 

Let's take your analogy further: you say that 30-year-old men have an MTBF of 1,000 hours. That only means that if you take a large number of men and watch them, they will (on average) die 1,000 hours after reaching 30 years of age. In no way does this mean that every woman must keep 1,000 men "on hand" (otherwise, our species would have gone extinct a long time ago).

Scott Lowe

Would it have been more accurate if I had described it as the time until an unrecoverable read error occurs? That's what I was going for... Scott

Scott Lowe

In order to keep the article sane, I did choose to focus on random workloads since that's what most of the IO is in our data center. We don't do massive amounts of work with sequential workloads. I didn't intend for the work to be considered as a general "all purpose" piece targeted at every administrator out there; I apologize if that wasn't clearly written. Scott

zackers

While you're right that the type of IOs from the application level affects overall storage performance, what goes on at the actual storage device may not resemble the application's view. Writes on RAID 5, for instance, require reads and calculation of parity as well as the actual writes. What you think is a sequential write at the application level is a whole lot of IOs at the storage device that involve factors such as extra rotational latency. Virtual storage systems do all kinds of slicing and dicing of what the application thinks is a sequential write. The reason for this is that storage subsystems have to worry about space management, backup, overall performance, and so on -- things that the application programmer seldom worries about.

Most enterprise storage subsystems can be designed so that the data paths are not the limiting factor. While historically it has happened, it's rare that the data transfer interface to a disk drive cannot keep up with the fastest media transfer speed of the disk. It's then a question of how much saturation of the common data paths further upstream you want to tolerate versus how much you want to spend. In general, it's been harder to speed up the media transfer speeds of disks than their data interfaces. The transition to vertical disk recording, for example, took about a decade of R&D (roughly 1995 to 2005). In that time, faster parallel SCSI and then Fibre Channel interfaces went through several iterations and easily kept up.

zackers

You seem to have made the implicit assumption that while the head is seeking the proper sector is always rotating closer to the head. The truth is that about half the time the sector is rotating *away* from the head (i.e., the head comes on track but just misses the sector). The two latencies are additive, but there's a random component to just how additive. And even if the disk controller has perfect knowledge of where the head and sector are at all times, it can't do away with all rotational latency even though it can optimize the order in which IOs are performed to minimize overall latency. You first have to move the head to the proper track and then you still have to wait for the proper sector to come under the head. This is what makes modeling disk IOs so difficult. Mr. Bonwick is correct about seek speeds versus rotational speeds. It comes down to moving a relatively light head stack versus spinning a relatively heavy platter stack.

david

This article is profoundly simplistic and does not take into consideration numerous factors, including: 1) Degraded performance: how does a SATA vs. a SAS array perform when a bad block is encountered, or when there is degradation due to a failed disk? 2) Data corruption: SAS disks typically have 10X or a greater number of ECC bits than SATA drives, and with multi-TB arrays you pick up ECC errors quite frequently. 3) Drive controllers matter: more intelligent controllers with larger caches will generate fewer IOs to disk as they coalesce IO requests. Furthermore, the added intelligence of the Cache control mode page (08h) provided by disks that use the SCSI protocol allows for tuning that one cannot do with SATA disks. As such, for almost any given load, one can tune SAS disks to do fewer IOs. This paper considers IOs to be absolute and not affected by the drive technology, and that is simply not the case. Finally, Iometer does NOT analyze the IOs that are actually performed by the disk drive. While it is a nice tool, it cannot be used to measure physical disk IOs; it only measures IO requests sent by high-level read/write requests, and it is incapable of measuring the actual IOs performed by the disk drives behind a RAID controller.

thattommyhall

You are right! At the point the seek is done, you need to wait for the platter to spin into place, on average half a turn. I got all fired up, convinced I was right, damn hidden assumptions.

zackers

You're right to point out that reassignments take a huge amount of time, but they just don't happen frequently enough to affect overall performance. The same goes for ECC errors; a lot of small ECC errors can even be corrected on the fly by the read logic without any extra action by the drive's microprocessor. Smart controllers with cache do have an impact; in particular, faster controller microprocessors with faster data paths are real advantages. However, most storage systems have layers of cache (disk drive, controller, system, etc.), and there are diminishing returns to be had from cache. And when manufacturers publish their own benchmarks, they often arrange the tests so that as many of the IOs as possible can occur out of cache. Thus, while cache does help in the real world, it often contributes to manufacturers overselling the performance of their storage subsystems.
