Oftentimes, storage isn't given enough attention in system architecture, but it can make or break the service level agreement (SLA) for your application response times. Understanding how to build a cost-effective, high-performance storage system can save you money not only in the storage subsystem, but in the rest of the system as well.
Storage is a huge topic, but this article will give you a high-level look at how it all fits together.
DAS, SAN, and NAS storage subsystems
Direct attached storage (DAS), storage area network (SAN), and network attached storage (NAS) are the three basic types of storage. DAS is the basic building block in a storage system, and it can be employed directly or indirectly when used inside SAN and NAS systems. NAS is the highest layer of storage and can be built on top of a SAN or DAS storage system. SAN is somewhere between a DAS and a NAS.
DAS — direct attached storage
DAS is the most basic storage subsystem that provides block-level storage, and it's the building block for SAN and NAS. The performance of a SAN or NAS is ultimately dictated by the performance of the underlying DAS, and DAS will always offer the highest performance levels because it's directly connected to the host computer's storage interface. DAS is limited to a particular host and can't be used by any other computer unless it's presented to other computers either over a specialized storage network (a SAN) or over a data network by a NAS server.
Storage devices used to build a DAS storage subsystem
SCSI — Small computer system interface is one of the oldest forms of storage interfaces traditionally used in server or workstation class computers. It's been through many revisions, from SCSI-1 all the way up to Ultra-320 SCSI, which is the modern SCSI interface. (There is an Ultra-640 standard, but that isn't common.) The 320 and 640 numbers represent MB/s, megabytes per second. SCSI-1 started out at 5 MB/s. SCSI is still used in modern servers, but the interface is starting to lose market share to SAS. The most recent versions of SCSI can handle up to 15 hard drives on a single shared cable.
While the cable sharing mechanism is relatively efficient, there is a maximum theoretical cap of 320 MB/s, but that limit is reduced further by SCSI overhead. It's theoretically possible that 15 modern SCSI hard drives could have an aggregate throughput of 1350 MB/s, so they would be forced to share a 320 MB/s interface. But in the vast majority of applications, where there will inevitably be some random I/O in the hard drives, the mechanical latency of the hard drives seeking data means it's unlikely that an Ultra-320 interface will be fully saturated.
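The arithmetic behind that bottleneck can be sketched in a few lines. The bus cap, drive count, and 1350 MB/s aggregate come from the text; the 90 MB/s per-drive figure is an assumption inferred by dividing 1350 by 15, and SCSI protocol overhead is ignored, so treat the output as a rough upper bound rather than a measurement.

```python
# Figures from the text: 320 MB/s bus cap, 15 drives, 1350 MB/s aggregate.
# Assumption: 90 MB/s sequential throughput per drive (1350 / 15).
BUS_CAP_MBPS = 320
DRIVE_COUNT = 15
PER_DRIVE_MBPS = 90


def shared_bus_throughput(drive_count, per_drive_mbps, bus_cap_mbps):
    """Compare what the drives could deliver with what a shared bus allows."""
    aggregate = drive_count * per_drive_mbps
    delivered = min(aggregate, bus_cap_mbps)  # the bus is the ceiling
    return {
        "aggregate_mbps": aggregate,
        "delivered_mbps": delivered,
        "per_drive_share_mbps": delivered / drive_count,
        # Number of drives streaming flat out that would fill the bus.
        "drives_to_saturate": bus_cap_mbps / per_drive_mbps,
    }


result = shared_bus_throughput(DRIVE_COUNT, PER_DRIVE_MBPS, BUS_CAP_MBPS)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions fewer than four drives streaming sequentially would fill the bus, which is why the shared interface matters only when the workload is sequential enough to keep the drives from seeking.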
PATA — Parallel advanced technology attachment (originally called ATA and sometimes known as IDE or ATAPI) was the most dominant desktop computer storage interface from the late 1980s until recently, when the SATA interface took over. PATA hard drives are still being utilized today, especially in external hard drive boxes, but they're becoming rare. Some cheaper high-end server storage devices have also used PATA. Like SCSI, PATA has also gone through many revisions. The most recent version of PATA is UDMA/133 which supports a throughput of 133 MB/s.
Although PATA supports two devices per connector in a master/slave configuration, the performance penalty of sharing a PATA port is severe, so sharing is not recommended if performance is important to the user. The 40-pin connector and cabling are also extremely wide, which makes them difficult to use in a high-density environment and tends to block proper airflow. The size of the connector also presents problems for smaller 2.5" hard drives, which require a special shrunken connector.
SATA — Serial advanced technology attachment is the official successor to PATA. So far, there have been two basic versions of SATA: SATA-150 and SATA-300. The numbers 150 and 300 represent the number of MB/s that the interfaces support. SATA doesn't have any performance problems due to cable/port sharing, but that's because it doesn't permit sharing at all. One SATA port permits one device to connect to it. The downside is that it's much more expensive to buy an eight-port SATA controller than an Ultra-320 SCSI controller that allows 15 devices to connect to it. The upside is that each drive gets a theoretical 300 MB/s. Current SATA hard drives, however, barely get 80 MB/s, so the bus interface is a bit of overkill for now.
SATA uses a small seven-pin connector and a thin cable, which is more conducive to denser installations and airflow. That's important, especially inside a storage array with 15 hard drives, because you'll need one port and one cable for every drive, whereas SCSI lets you hook up one or two ports to the backplane that the drives attach to. SATA drives are used in smaller servers and some less expensive storage arrays.
SAS — Serial attached SCSI is the latest storage interface that's gaining dominance in the server and storage market. SAS can be seen as a merged SCSI and SATA interface, since it still uses SCSI commands yet it is pin-compatible with SATA. That means you can connect SAS hard drives or SATA hard drives or CD/DVD ROM or burner drives. SAS has a signaling rate of 187.5, 375, 750, and eventually, 1,500 MB/s. But storage controller technology has historically been rated by actual data throughput, which is lower than the signaling rate. To make these numbers comparable to the numbers listed above, the actual data rates are 150, 300, 600, and eventually, 1,200 MB/s. Note how the two lower data rates match up with SATA.
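The gap between signaling rate and data rate can be reproduced with a short calculation. This sketch assumes line rates of 1.5, 3, 6, and 12 Gbit/s and 8b/10b line encoding (every 8 data bits travel as 10 bits on the wire); neither detail is stated in the article, so read it as one plausible explanation of the numbers rather than the author's own derivation.

```python
# Assumptions (not from the article): SAS line rates in Gbit/s and
# 8b/10b encoding, which spends 10 line bits on every data byte.
LINE_RATES_GBIT = (1.5, 3.0, 6.0, 12.0)


def raw_rate_mb_per_s(line_rate_gbit):
    """Signaling rate expressed in MB/s, counting every bit on the wire."""
    return line_rate_gbit * 1_000_000_000 / 8 / 1_000_000


def data_rate_mb_per_s(line_rate_gbit):
    """Usable MB/s once the 8b/10b encoding overhead is removed."""
    return line_rate_gbit * 1_000_000_000 / 10 / 1_000_000


for gbit in LINE_RATES_GBIT:
    print(f"{gbit:>4} Gbit/s  raw {raw_rate_mb_per_s(gbit):7.1f} MB/s"
          f"  data {data_rate_mb_per_s(gbit):7.1f} MB/s")
```

The data column lands on the 150, 300, 600, and 1,200 MB/s figures quoted above, and the raw column lands at or very near the signaling rates.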
SAS connectors are keyed such that SATA devices can connect to SAS but SAS devices can't connect to SATA ports. The ports and cabling look similar, but SAS cables can be 8 meters long, whereas SATA cabling is limited to 1 meter. The longer cabling support is due to higher signal voltages, but the voltage is dropped to SATA levels whenever a SATA device is connected.
SAS is designed for the high-end server and storage market, whereas SATA is mainly intended for personal computers. Unlike SATA, SAS can be connected to multiple hard drives through expanders, and the protocol used to share a SAS port has lower overhead than SCSI. Coupled with the fact that the ports are faster to begin with, SAS offers the best of SCSI and SATA in addition to superior performance.
FC — Fibre channel is both a direct connect storage interface used on hard drives and a SAN technology. FC offers speeds of 100, 200, and 400 MB/s. Native FC interface hard drives are found in very high-end storage arrays used in SAN and NAS appliances, although the technology may ultimately give way to SAS.
Flash — Flash memory isn't a storage interface, but it is used for very high-end storage applications because it doesn't have the mechanical latency issues of hard drives. Flash memory can be packaged into the shape of a hard drive with any of the above interfaces so that it can be used in a storage array. The benefit of flash memory is that it can offer more than 100 times the read IOPS (input/output operations per second) and 10 times the write IOPS performance of hard drives, which is extremely valuable to database applications.
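To see why mechanical latency caps hard drive IOPS, here is a rough estimate. The 15,000 RPM spindle speed and 3.5 ms average seek time are illustrative assumptions, not figures from this article, and the model ignores caching and command queuing.

```python
def hdd_random_iops(rpm, avg_seek_ms):
    """Estimate random IOPS from average seek time plus rotational latency."""
    # On average the platter must turn half a revolution to reach the data.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms


# Assumed drive characteristics (hypothetical, for illustration only).
drive_iops = hdd_random_iops(rpm=15_000, avg_seek_ms=3.5)

# Multipliers from the text: 100x read IOPS and 10x write IOPS.
flash_read_iops = 100 * drive_iops
flash_write_iops = 10 * drive_iops

print(f"hard drive: about {drive_iops:.0f} random IOPS")
print(f"flash read: about {flash_read_iops:.0f} IOPS")
print(f"flash write: about {flash_write_iops:.0f} IOPS")
```

With these assumptions a fast hard drive manages under 200 random operations per second, so a device with no seek or rotational delay has a lot of room to outrun it.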
The downside of flash memory is that it's very expensive per gigabyte (cost proportional to the performance advantage) and it tolerates only a limited number of writes and rewrites. Flash memory cells will begin to fail after anywhere between 10,000 and 1,000,000 write cycles. To deal with this limitation, flash devices use a mechanism called wear leveling to spread out the damage so that the device will last longer, but even that has its limits.
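Wear leveling can be illustrated with a toy model. The sketch below is a deliberately simplified scheme (always write to the least-worn free block and remap the logical address), not how any particular flash controller works; the class name, block counts, and the idea of counting each write as one wear cycle are all assumptions made for the example.

```python
class WearLeveledFlash:
    """Toy flash device that steers each write to the least-worn free block."""

    def __init__(self, physical_blocks, endurance):
        self.wear = [0] * physical_blocks      # writes absorbed per block
        self.endurance = endurance             # assumed write-cycle limit
        self.mapping = {}                      # logical block -> physical block
        self.free = set(range(physical_blocks))

    def write(self, logical_block):
        if not self.free:
            raise RuntimeError("no free physical blocks")
        # Pick the free block with the lowest wear (ties go to the lowest index).
        target = min(self.free, key=lambda b: (self.wear[b], b))
        if self.wear[target] >= self.endurance:
            raise RuntimeError("all free blocks are worn out")
        self.free.remove(target)
        old = self.mapping.get(logical_block)
        if old is not None:
            self.free.add(old)                 # the stale copy becomes reusable
        self.mapping[logical_block] = target
        self.wear[target] += 1
        return target


# Rewrite the same logical block 800 times on a device with 8 physical blocks.
device = WearLeveledFlash(physical_blocks=8, endurance=10_000)
for _ in range(800):
    device.write(logical_block=0)

print(device.wear)
```

Without remapping, one physical block would have absorbed all 800 writes; here the wear is spread evenly, so the device reaches its endurance limit roughly eight times later. That is also why wear leveling has limits: it delays wear-out, it doesn't remove it.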
RAM — Random access memory is also not traditionally seen as a storage medium, but it can be used as an ultra-fast storage device. RAM can be adapted to any of the storage interfaces above to emulate traditional storage devices connected through SCSI or ATA, but it can also emulate a storage device through software called RAM drives. RAM doesn't suffer the same limited number of write cycles as flash memory, but it is by far the most expensive form of storage. For super high-end storage applications, its high cost may be justifiable.
SAN — Storage area network
A SAN offers a higher level of functionality than DAS because it permits multiple hosts (server computers) to attach to a single storage device at the block level. It does not permit simultaneous access to a single storage volume within the storage device, but it does allow one server to relinquish control of a volume and then another server to take over the volume. This is useful in a clustering environment, where a primary server might fail and a backup server has to take over and connect to the same storage volume. Because a SAN offers block-level storage to the host, it fools the application into believing it's using a DAS storage subsystem, which offers a lot of compatibility advantages.
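The one-host-at-a-time rule and the failover handoff can be modeled in a few lines. This is a conceptual sketch only: `SanVolume`, `fail_over`, and the host names are hypothetical, and a real cluster would rely on the SAN's own reservation or fencing mechanisms rather than an in-memory flag.

```python
class SanVolume:
    """Toy model of a SAN volume that only one host may own at a time."""

    def __init__(self, name):
        self.name = name
        self.owner = None

    def attach(self, host):
        if self.owner is not None and self.owner != host:
            raise PermissionError(f"{self.name} is already owned by {self.owner}")
        self.owner = host

    def release(self, host):
        if self.owner != host:
            raise PermissionError(f"{host} does not own {self.name}")
        self.owner = None


def fail_over(volume, failed_host, standby_host):
    """Hand the volume from a failed host to its standby."""
    volume.release(failed_host)    # in practice the cluster fences the failed host
    volume.attach(standby_host)


volume = SanVolume("db-volume")    # hypothetical volume name
volume.attach("primary")

# A second host cannot attach while the primary still owns the volume.
try:
    volume.attach("backup")
    rejected = False
except PermissionError:
    rejected = True

fail_over(volume, failed_host="primary", standby_host="backup")
print(rejected, volume.owner)
```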
FC — Fibre channel is one of the older, established high-end forms of a SAN. It's common for FC SANs to use native FC hard drives, but they're not limited to them. There are FC SAN implementations that use SCSI or even ATA hard drives. FC SANs typically use 1, 2, or 4 gigabit fiber optic cabling, but less expensive copper cabling and interfaces are used for shorter distances.
FC storage arrays can be directly attached to a server. However, that defeats the ability to reconnect to other servers on the fly if one server fails, so they're typically attached via FC switches. The downside is that FC switches are very expensive per port, especially for the higher-end 4 gigabit variety. It's common for 16-port FC switches to cost tens of thousands of dollars. While the performance is high and the technology is well established, it requires a different knowledge set to manage an FC SAN.
iSCSI — Internet SCSI is a low-cost alternative to FC that's considered easier to manage and connect because it uses the common TCP/IP protocol and common Ethernet switches. Because any network engineer is familiar with TCP/IP and Ethernet switch configuration, and gigabit Ethernet adapters and switches are cheap, the cost advantages over FC SANs are compelling. A 16-port gigabit switch can be anywhere from 10 to 50 times cheaper than an FC switch and is far more familiar to a network engineer. Another benefit to iSCSI is that because it uses TCP/IP, it can be routed over different subnets, which means it can be used over a wide area network for data mirroring and disaster recovery.
Most iSCSI implementations use gigabit Ethernet 1000BASE-T, but speeds can be scaled to 10 gigabits per second with 10GBASE-CX4 and soon with the less expensive 10GBASE-T using twisted pair CAT-6 or CAT-7 copper cabling. It's possible to mix gigabit and 10 gigabit Ethernet such that a high-end storage array uses 10 gigabit Ethernet, but the multiple servers fed by the array connect to the switch using single gigabit Ethernet.
The downside to iSCSI is that it is computationally expensive for high storage throughput because it has to encapsulate the SCSI protocol into TCP packets. This means that it either incurs high CPU utilization (not much of a problem with modern multicore processors) or it requires an expensive network card with TOE (TCP offload engine) capability in the hardware.
iSCSI targets (iSCSI servers — the source of the storage) can come in the form of hardware storage arrays that speak the iSCSI protocol or they can come in the form of software added to a server. A server with iSCSI target software loaded is functionally the same as a hardware iSCSI target, but you can build it on top of any major server OS from BSD to Linux to Windows Server. There are open source Linux iSCSI targets and there is commercial iSCSI target software for Windows. Using a software solution allows you to serve a wide variety of devices as iSCSI targets that can be remotely mounted by iSCSI initiators (iSCSI clients) over TCP/IP. Hardware iSCSI targets are merely dedicated servers specifically designed to act as an iSCSI target, and they sometimes simultaneously behave as NAS devices. iSCSI initiator software is natively included in almost every operating system.
AoE — ATA over Ethernet is the most recent SAN technology to emerge, created as an even lower-cost alternative to iSCSI. AoE is a technology that encapsulates ATA commands into low-level Ethernet frames and avoids using TCP/IP. That means it doesn't incur CPU penalty or require high-end TOE-capable Ethernet adapters to support high storage throughput. This makes AoE a high-performance, very low-cost alternative to either FC or iSCSI. Its proponents also boast that the AoE specification fits onto eight pages, compared with the 257-page iSCSI specification.
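To give a feel for how thin that encapsulation is, the sketch below packs the common AoE header that follows the Ethernet addresses. The field layout (version/flags, error, major, minor, command, tag) and the 0x88A2 EtherType reflect my reading of the AoE specification; the MAC address, shelf/slot numbers, and tag are hypothetical, and the iSCSI figure assumes IPv4 and TCP headers without options plus the 48-byte iSCSI basic header segment.

```python
import struct

AOE_ETHERTYPE = 0x88A2     # EtherType used by ATA over Ethernet
AOE_VERSION = 1
CMD_ISSUE_ATA = 0          # AoE command 0: issue ATA command
CMD_QUERY_CONFIG = 1       # AoE command 1: query config information


def build_aoe_header(dst_mac, src_mac, shelf, slot, command, tag):
    """Pack the Ethernet header plus the common AoE header (network byte order)."""
    ver_flags = AOE_VERSION << 4   # version in the high nibble, no flags set
    error = 0                      # only meaningful in responses
    return struct.pack(
        "!6s6sHBBHBBI",
        dst_mac, src_mac, AOE_ETHERTYPE,
        ver_flags, error,
        shelf,      # "major" address: 16 bits
        slot,       # "minor" address: 8 bits
        command,
        tag,        # echoed back by the target so replies can be matched
    )


header = build_aoe_header(
    dst_mac=bytes.fromhex("ffffffffffff"),   # broadcast, as used for discovery
    src_mac=bytes.fromhex("020000000001"),   # hypothetical initiator address
    shelf=1, slot=0, command=CMD_QUERY_CONFIG, tag=0x1234,
)

AOE_HEADER_BYTES = len(header)
# Ethernet + IPv4 + TCP + iSCSI basic header segment, all without options.
ISCSI_HEADER_BYTES = 14 + 20 + 20 + 48

print(f"AoE headers: {AOE_HEADER_BYTES} bytes")
print(f"iSCSI headers: {ISCSI_HEADER_BYTES} bytes")
```

Header size is only a rough proxy for cost. The larger saving is that AoE skips TCP's connection state, checksumming, and retransmission logic on the host.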
Because AoE doesn't use TCP/IP, it isn't a routable technology, but then again, neither are FC SANs. Most SAN implementations don't require routability, and the fact that you might use AoE on a particular initiator or target doesn't prohibit you from using iSCSI. A lot of add-on initiator/target software will support both iSCSI and AoE. Most WAN applications are low-bandwidth, so running iSCSI over the WAN won't incur a lot of CPU utilization anyway. This means you can use AoE for the high throughput LAN/SAN environment and use iSCSI for the WAN at the same time without TOE Ethernet adapters.
AoE software initiator support is now native in Linux and BSD, but it isn't natively included in Windows, and you'll have to purchase third-party initiators. Coraid, which is a major supporter/supplier of AoE, provided the original FreeBSD device drivers.
NAS — Network attached storage
NAS is a file-level storage technology built on top of SAN or DAS technology. It's basically another name for "file server." NAS devices are usually just regular servers with stripped down operating systems that are dedicated to file serving. NAS devices typically use SMB (server message block) for Microsoft compatibility or NFS (network file system) for UNIX compatibility.
The benefit of a NAS over a SAN or DAS is that multiple clients can share a single volume, whereas SAN or DAS volumes can be mounted by only a single client at a time. The downside to a NAS is that not all applications will support it because they're expecting a block-level storage device, and most clustering solutions are designed to run on a SAN.
Many modern NAS appliances will support SAN technologies like iSCSI, and you can build essentially the same hybrid storage solution yourself by running a general purpose operating system like Linux, BSD, or Windows on your own hardware.