Storage optimize

What kind of reliability can you expect from flash/solid state storage?


How reliable will your flash or solid state storage device remain over time? This is the question I'll be exploring here, along with the basics of how these devices actually work. (I use the terms "flash" and "solid state" interchangeably.)

Your flash-based device will eventually die. And, it won't die a slow, horrible death like some hard drives. In most cases, the device will work one day and then, all of a sudden, it won't work anymore. The most common cause of flash death lies in the Achilles' heel of flash-based storage -- limited write cycles. That is, each cell or block of a flash-based storage device can be written to only so many times before "wearing out." After enough erase and write operations, the insulating oxide layer around the cell breaks down to a point after which the cell is unusable.

Wear leveling

Wear leveling is designed to spread erase and write operations out over the entire flash device rather than focusing on one specific area. For example, suppose you have a flash device on which you've stored a few dozen documents that you change on a regular basis. Without wear leveling, each time you modified a document, the data would, theoretically, be written to the exact same spot on your flash device. Before that write could happen, the storage block would have to be "flashed," or erased. This constant barrage on the same blocks will eventually lead to the failure of your flash device.

Wear leveling is a process designed to "spread the wealth." That is, instead of constantly writing information to the same locations over and over, erasures and writes are distributed across all blocks of the device, thus making sure that no single cell is constantly assaulted.

Most flash-based storage devices are rated for anywhere from 10,000 to 1 million write cycles per block, although I've seen ratings as high as 2 million, too. With wear leveling, those cycles are spread across the whole device, so it will last a lot longer.
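To make the difference concrete, here is a minimal sketch of a toy device, written under loud assumptions: the block count and endurance figure are made up for illustration, and "always pick the least-worn block" is a simple stand-in for whatever policy a real controller uses. It counts how many writes the device absorbs before any one block hits its cycle limit.

```python
# Toy model only: NUM_BLOCKS and ENDURANCE are illustrative assumptions,
# not the specs of any real flash device.

def writes_until_first_failure(num_blocks, endurance, wear_leveling):
    """Count writes until any single block reaches its erase/write limit."""
    wear = [0] * num_blocks  # erase/write count per physical block
    total_writes = 0
    while True:
        if wear_leveling:
            # Simplified policy: send the write to the least-worn block.
            target = wear.index(min(wear))
        else:
            # No wear leveling: the same physical block is rewritten each time.
            target = 0
        wear[target] += 1
        total_writes += 1
        if wear[target] >= endurance:
            return total_writes

NUM_BLOCKS = 50   # assumed, tiny on purpose
ENDURANCE = 200   # assumed cycles per block

without_wl = writes_until_first_failure(NUM_BLOCKS, ENDURANCE, wear_leveling=False)
with_wl = writes_until_first_failure(NUM_BLOCKS, ENDURANCE, wear_leveling=True)
print(without_wl, with_wl)  # 200 9951
```

In this toy model the leveled device survives roughly as many times more writes as it has blocks, which is the whole point of spreading the wear.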

Sparing

What happens when a cell does eventually fall into the bit bucket and become unusable? Some flash-based storage devices have spare sectors that can be brought into play to replace dead ones, thus extending the useful life of your storage device. When a cell fails, the data is written to one of the spare cells.
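The idea can be sketched as a small logical-to-physical map with a pool of spares. This is a hypothetical illustration only: the class name, its methods, and the assumption that the old data is still readable when a block is retired are all mine, not a description of any particular controller.

```python
class SparedDevice:
    """Toy logical-to-physical block map with a pool of spares (hypothetical)."""

    def __init__(self, data_blocks, spare_blocks):
        # Logical block i starts out mapped to physical block i.
        self.mapping = {i: i for i in range(data_blocks)}
        # Spares are the physical blocks located after the data area.
        self.spares = list(range(data_blocks, data_blocks + spare_blocks))
        self.storage = {}  # physical block number -> stored data

    def write(self, logical, data):
        self.storage[self.mapping[logical]] = data

    def read(self, logical):
        return self.storage.get(self.mapping[logical])

    def retire(self, logical):
        """Remap a worn-out block to a spare; return False if no spares remain."""
        if not self.spares:
            return False
        old = self.mapping[logical]
        new = self.spares.pop(0)
        # Assumption: the old contents are still readable and can be copied.
        if old in self.storage:
            self.storage[new] = self.storage.pop(old)
        self.mapping[logical] = new
        return True


dev = SparedDevice(data_blocks=4, spare_blocks=1)
dev.write(2, b"report")
first = dev.retire(2)    # succeeds: logical block 2 now lives on the spare
second = dev.retire(0)   # fails: the only spare has been used up
print(first, second, dev.read(2))
```

Once the spare pool is empty, the next failed block has nowhere to go, which is why sparing extends the life of a device rather than making it unlimited.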

Error correction

Flash-based storage uses ECC (error-correcting code) to keep single-bit errors from laying waste to your data. The controller stores a few extra check bits alongside the data and uses them to detect and repair a flipped bit when the data is read back.
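For a feel of how single-bit correction works, here is a minimal sketch using the classic Hamming(7,4) code. This is a swapped-in teaching example, not the code your flash device uses: real controllers protect much larger chunks of data with stronger codes. The function names are my own.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as 7 bits laid out [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three checks spell out the 1-based position of the bad bit (0 = none).
    error_pos = s1 + 2 * s2 + 4 * s3
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


stored = hamming74_encode([1, 0, 1, 1])
damaged = list(stored)
damaged[4] ^= 1  # simulate one cell reading back the wrong value
recovered = hamming74_decode(damaged)
print(recovered)  # [1, 0, 1, 1]
```

Note that this code can repair only one bad bit per 7-bit word; two flipped bits in the same word would be "corrected" to the wrong value, which is one reason real devices use heavier-duty codes.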

No moving parts

I'm not focusing only on what methods flash designers have implemented in order to protect flash storage. There is one feature inherent in flash-based storage that lends itself to reliability: the lack of moving parts. Moving parts create friction, increase heat output, and, in general, introduce an element of mechanical instability into a system.

  • No worries about platter damage.
  • Shock and vibration are far less of a concern.
  • Able to withstand a greater temperature range, since there are no moving parts subject to expansion and contraction. This, of course, doesn't mean that you should put your flash drive into a fire, but it can probably stand the heat a little more.

One of the people I work with recently put his flash drive through the dryer... literally. The plastic is somewhat melted, but the device still works! That's resilience!


About

Since 1994, Scott Lowe has been providing technology solutions to a variety of organizations. After spending 10 years in multiple CIO roles, Scott is now an independent consultant, blogger, author, owner of The 1610 Group, and a Senior IT Executive w...

14 comments
ozhawk50

At times we can forget the purpose of this type of storage: a temporary method, e.g., to transfer data between computers or users. I doubt they were ever designed to replace HDDs as a storage medium. I tell my users to use it for temp storage, not permanent. They need to be looked at just like the old floppy disk: a good short-term storage medium.

micromediaabuja

Flash-based storage seems to provide a cheap solution for mobile storage independent of a computer's hardware drive bay. I think efforts to increase the durability and capacity of flash drives must be encouraged. They are becoming increasingly relevant to common users and even to expert users, not only to transfer data here and there, but also to serve as long-term personal mobile storage media. In conclusion, we need flash storage more than ever imagined!

tim

Thanks for the good article on reliability. I am still curious about the longevity of flash memory, particularly just sitting on the shelf. I have read that it can be 10 years, but I haven't found any detail that supports that. I have clients that want storage in terms of decades for things like video productions and satellite data (most use optical, a couple use tape). They are interested in flash, but the lack of data on longevity scares them away.

Flash00

Flash memory cell failure shows up as the memory transistor threshold gradually falling, until it gets to the point where it is not possible to determine if a 1 or a 0 was stored. This takes on the order of a million erase/write cycles. (See the Wikipedia entry for Flash memory.) Your article is so misleading as to be completely wrong. Flash memory failure will manifest itself as increasing read errors, not the sudden and catastrophic failure which is characteristic of hard disk drives and results in total loss of everything stored on the drive.

david

Unfortunately Scott doesn't address another failure point which is the instability of the devices when inserted and removed from the USB bus. There isn't much talk about this issue, but being one who has been involved in the design of these devices, users should be aware that it is a concern especially when using them for backup of data which more people are doing. While they are much more stable on insertion/extraction than early units, you often get what you pay for with the cheaper designs taking shortcuts protecting the data on power cycling. As for me, I don't consider them acceptable for data backup nor will I recommend them to my clients. Keep in mind, I am talking about reliability for use as a data backup device not data transfer which they are outstanding for.

gordonmcke

Scott: You should add a distinguishing comment that DDR solid-state disk, such as is used in SAN cache or storage, is not subject to the reliability issues that flash memory is. Gordon McKemie, Ohio Valley Storage Consultants, Anchorage, KY

ton

But alas, flash is not always the solution. I use an external memory to transport data between my many working places so that I always have the same data available everywhere. Simple, but it works even if there is no Internet. This data includes my MS Office 2003 Outlook .pst data file. Recently 4GB flash drives became affordable, so I used one instead of a small hard disk. Now, for some reason that I do not understand, Outlook becomes unstable when using a .pst file on a flash drive. So back to the old-fashioned external hard disk... Ton de Liefde

James McP

There are two kinds of failures that flash can experience: the data cells or the controller. The controller is what takes the read/write requests and performs the low-level functions of accessing the data and performing the wear leveling. Data cell failure can, like a hard drive, result in increasing numbers of dead sectors that reduce total size but don't necessarily stop the drive from functioning or even cause data loss (yay ECC functionality). Cell failure accelerates as the wear leveling is applied to an ever-decreasing pool of cells. In contrast, controller failure causes a total crash. The life of a flash device is more likely limited by the controller than by the cells.
We can check this rather quickly. Take a 1GB flash stick used as ReadyBoost, or as the swap drive of a laptop. For simplicity's sake we'll say the cells are 32kB sectors, giving us roughly 30,000 cells. Let's max out the USB 2.0 bandwidth at 60MB/s. The stick will completely cycle its cells roughly every 16 seconds. Let's make it a dud where the first cell fails after 100,000 writes and an additional cell fails every 100 writes after that. Note: The ECC/wear leveling kicks in during reads, writing the data to a new cell to ensure that repeat reads don't cause the cells' charge to degrade, so reads aren't particularly easier than writes on flash life.
It will take about 19 days of continuous, flat-out operation to hit that first cell failure, reducing our 1GB unit by 32kB (100,000 writes x 30,000 cells x 32kB / 60MB/s = 1.6 million seconds). The unit will then begin dying, shedding cells in 32kB increments until it is completely dead. In reality, no flash device exists yet that works at full USB 2.0 speeds. The fastest I've heard of is around 30MB/s. So *IF* ReadyBoost could fully max out that 1GB unit, it would take about 37 days to reach the first failure. In reality you won't be at 100% utilization around the clock, even on a server; at a 1% write duty cycle, that first failure is about 10 years away.
Therefore, under that kind of everyday load, the cells themselves should be good for about a decade even with a TTF of only 0.1 million writes. Given that, controller failure is the real problem. There's no workaround other than providing plenty of airflow (heat is the enemy of circuitry), avoiding putting flash drives on the same controller as power-hungry USB-powered devices that can cause fluctuations (I'm looking at you, USB coffee warmer!) and having duplicates of key data. What, you've never heard of redundancy?
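As a rough check on this kind of back-of-the-envelope estimate, here is a minimal sketch that takes the same inputs (30,000 cells of 32kB, 100,000 writes per cell, 60MB/s or 30MB/s) and works out the time to the first cell failure. It assumes decimal units, perfect wear leveling, and no write amplification; the function name and the 1% duty cycle are illustrative assumptions of mine.

```python
SECONDS_PER_DAY = 86_400

def days_to_first_failure(capacity_bytes, cycles_per_cell, bytes_per_second, duty_cycle=1.0):
    """Days of use before every cell has been written cycles_per_cell times.

    Assumes perfect wear leveling, so all cells wear out at the same moment.
    duty_cycle is the fraction of the time the device is writing at full speed.
    """
    total_bytes = capacity_bytes * cycles_per_cell
    seconds = total_bytes / (bytes_per_second * duty_cycle)
    return seconds / SECONDS_PER_DAY

CAPACITY = 30_000 * 32_000  # 30,000 cells x 32kB, decimal units assumed
CYCLES = 100_000            # writes before the first cell fails

flat_out_60 = days_to_first_failure(CAPACITY, CYCLES, 60_000_000)
flat_out_30 = days_to_first_failure(CAPACITY, CYCLES, 30_000_000)
light_use_30 = days_to_first_failure(CAPACITY, CYCLES, 30_000_000, duty_cycle=0.01)

print(round(flat_out_60, 1))         # 18.5 days at 60MB/s, nonstop
print(round(flat_out_30, 1))         # 37.0 days at 30MB/s, nonstop
print(round(light_use_30 / 365, 1))  # 10.1 years at 30MB/s, writing 1% of the time
```

The flat-out figures come out in days, not years, so it is the duty cycle that does the heavy lifting: how long the cells last depends far more on how much of the time the device is actually being written than on the raw cycle rating.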

BALTHOR

DIGITAL I have in my lab an audio tape recorder. I record a beep tone five seconds into the tape. The tape transport has a rate of travel and the recording electronics has a method of imparting my beep to the magnetic tape. That beep exists in one place only and that's five seconds into the tape. Now I would like to record on to a magnetic medium without any moving parts whatsoever! Well, maybe some movement is needed here. If I could magnetically record pathways and caches I might have it. I could direct the signal to a chip leg with a magnetic voltage path but I would have to prepare the magnetic surface to receive my data. The magnetic material would have instructions embedded in it and the chip would receive a clock pulse. The clock pulse is like the tape transport and the cache would be the locations for the primal data bits. I could glue some magnetic material to a substrate and then pass the assemblage past a recording head. The bit locations would be like little boxes for magnetic bits to exist or not exist. The cache would contain a file that tells the bits how to receive information. The chip's input takes all information and records it into the bit boxes; the boxes advance because the chip is being clocked. Bits only can exist in the boxes that the file has created. If I raised the frequency the file would produce even more bit boxes. The legs on the chip could go to control various devices. There is no other material in this chip other than the magnetic material. The file does it all. Balthor

ITCowboy

On your remark that flash drive failures show up as read/write errors, I would have to disagree. In my experience, when flash drives fail, it is exactly as the article implies. Yes, I may get a read or write error, but that is it: the data is useless, gone, irretrievable... all of it. I have seen this every time with a failing flash drive. No warning, kaput, the end. A hard drive, on the other hand, usually gives me warnings that I can catch if I pay attention. Sometimes errors, sometimes noise, sometimes excessive heat. Other times an old-fashioned boot failure. Then I can prepare the drive, retrieve data, and await its destruction. Even when a drive suddenly dies, data is sometimes still retrievable. To say that the article is "so misleading as to be completely wrong" is a little harsh. Although you may not have had the problems he mentioned, I for one have seen exactly that, and believe it to be the general truth due to the numerous times it has happened.

James McP

Volatile memory (e.g. DDR) is prone to cell failures. Hence motherboards performing memory tests at each boot and error correcting circuitry (ECC) to compensate on the fly. Everything dies. Random electron migration, tin whiskers, freak neutrino collision, whatever, everything will fail. Buy quality components and the operating lifespan is likely to exceed the utility life for either volatile or non-volatile memories. The advantage DDR SANs have over flash is speed. Volatile memory tech (DDR & other RAMs) is so much faster than non-volatile flash and is generally cheaper. The counter is that a flash SAN will survive an extended power failure while a DDR SAN is only as good as the UPS protection (which adds cost).

James McP

Probably due to the bandwidth of the device or the sleep cycle on the USB chain. Given that you're using a 4GB drive, you probably have a pretty beefy PST file. I've seen Outlook act funny with large PSTs kept on a file server. Consider either breaking it up into multiple PSTs that don't take as much time to get into/out of RAM or getting one of the high speed USB 2.0 flash drives. It could be when you went to a 4GB drive that you wound up with a slower device. And don't trust the market data too much. Hit a hardware review site and see which models they show had high read/write speeds.

James McP

You suffered a controller failure, not a memory cell failure. Imagine that Flash memory cells are SCSI disks integrated to a SCSI RAID controller. If the SCSI controller goes haywire you can lose everything in the volume even though the disks are completely fine. Controller failures are harder to diagnose but if you've had failures with multiple flash devices of different make I would suspect your motherboard or USB hub is the issue. First make sure you always, always, use the "undock" tool in the OS to disconnect the flash drive before you unplug it. You could essentially be shorting the device out when you remove it. Consider getting a good hub with an independent power supply as it could be an issue with the quality of power over the USB ports causing a "brown out" of the controller.