Anatomy of hard disk clusters

Understanding the anatomy of hard disk clusters will help you interpret what goes on behind the scenes during your basic maintenance functions. Talainia Posey gives you the details.


Practically every day I get questions about hard drive care. Often these questions relate to topics like FAT versus FAT32 or fragmentation issues. Both of these topics are closely related to the cluster structure on your hard disk. In this Daily Drill Down, I’ll discuss the cluster structure. I’ll also explain how it relates to such issues as disk space and performance.

Clusters
Each partition on your hard disk is subdivided into clusters. A cluster is the smallest unit of disk space that the file system can allocate to a file. The size of a cluster depends on two things:
  • The size of the partition
  • The file system installed on the partition

Regardless of the size of your partition, the maximum number of clusters remains the same. For example, in a standard FAT file system, a partition can contain no more than 65,536 clusters. Because of this limitation, cluster sizes vary depending on partition size. Basically, a small partition uses small cluster sizes while a large partition uses large cluster sizes.

In a world where bigger is better and size does matter, you might assume that a big hard drive with huge cluster sizes is the way to go. However, this isn’t the case. As you’ll recall, clusters are the smallest unit of storage space on a hard disk. This means that you can’t share a cluster among multiple files. If you have a tiny file and a huge cluster, the portion of that cluster unused by the file is wasted.

To understand the above statement, it’s necessary to look at some real numbers. In a standard FAT file system, the smallest size that a cluster can be is 2 KB. Therefore, if you save a 1-byte file on a hard disk using a 2-KB cluster size, the file will actually occupy the entire 2 KB (2,048 bytes). If you had a hard disk full of 1-byte files, you’d waste 2,047 bytes for every byte you actually stored.
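Here is a minimal Python sketch of this rounding-up rule. The helper names `clusters_needed` and `slack_bytes` are hypothetical and exist only for illustration. They assume that every file of at least 1 byte occupies a whole number of clusters, as described above.

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    # Integer ceiling division: a partly filled cluster still counts as a whole one.
    return -(-file_size // cluster_size)


def slack_bytes(file_size: int, cluster_size: int) -> int:
    # Bytes allocated to the file but never used by it.
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


CLUSTER = 2 * 1024  # 2-KB cluster, the smallest FAT cluster size in this article
wasted = slack_bytes(1, CLUSTER)
print(f"1-byte file: {wasted} bytes wasted per byte stored")  # 2047
```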

What makes this fact even scarier is that, as I mentioned before, 2 KB is the smallest size that a cluster can be. However, don’t forget about that 65,536-cluster limit. This means if you want to use 2-KB clusters, your partition size can be no more than 128 MB (2-KB clusters multiplied by 65,536 total clusters equals 134,217,728 bytes, or 128 MB). If you want to have a FAT partition larger than 128 MB, the cluster size will have to grow.
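The relationship between cluster size and the 65,536-cluster limit can be sketched in Python. The sketch follows the article's assumption that FAT cluster sizes start at 2 KB, double at each step, and top out at 32 KB. The names `max_partition_bytes` and `smallest_cluster_for` are hypothetical helpers.

```python
FAT16_MAX_CLUSTERS = 65_536  # cluster limit used throughout this article
KB = 1024
MB = 1024 * KB


def max_partition_bytes(cluster_size: int) -> int:
    # Largest partition that 65,536 clusters of this size can cover.
    return cluster_size * FAT16_MAX_CLUSTERS


def smallest_cluster_for(partition_bytes: int) -> int:
    # Start at 2 KB and double until 65,536 clusters cover the partition.
    cluster = 2 * KB
    while max_partition_bytes(cluster) < partition_bytes:
        cluster *= 2
    if cluster > 32 * KB:
        raise ValueError("partition exceeds the 2-GB FAT limit")
    return cluster


print(max_partition_bytes(2 * KB))           # 134217728
print(smallest_cluster_for(500 * MB) // KB)  # 8
```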

As the size of your partition increases, the cluster size doubles at each step to accommodate it. Below is a chart that shows the largest partition each cluster size can accommodate. Remember that the maximum size of a FAT partition is 2 GB.

Cluster size   Maximum partition size (bytes)   Maximum partition size (MB)
2 KB           134,217,728                      128
4 KB           268,435,456                      256
8 KB           536,870,912                      512
16 KB          1,073,741,824                    1,024 (1 GB)
32 KB          2,147,483,648                    2,048 (2 GB)

In the above chart, we derived the partition size in bytes by first multiplying the cluster size by 1,024 to convert it from kilobytes to bytes. For example, 2 KB multiplied by 1,024 equals 2,048 bytes. Next, we multiplied the byte count by the maximum number of clusters per partition (65,536) to get the partition size in bytes. For example, 2,048 multiplied by 65,536 is 134,217,728 bytes. Finally, we divided this number by 1,024 to convert bytes to kilobytes, and then divided the result by 1,024 again to convert kilobytes to megabytes. For example, 134,217,728 bytes divided by 1,024 equals 131,072 KB, and 131,072 KB divided by 1,024 equals 128 MB.
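The chart's arithmetic can be reproduced with a short Python sketch. It assumes the same 65,536-cluster limit. `fat_chart` is a hypothetical helper name.

```python
def fat_chart(max_clusters: int = 65_536):
    """Return (cluster KB, partition bytes, partition MB) rows for FAT cluster sizes."""
    rows = []
    for cluster_kb in (2, 4, 8, 16, 32):
        cluster_bytes = cluster_kb * 1024                # KB -> bytes
        partition_bytes = cluster_bytes * max_clusters   # bytes in a full partition
        partition_mb = partition_bytes // 1024 // 1024   # bytes -> KB -> MB
        rows.append((cluster_kb, partition_bytes, partition_mb))
    return rows


for kb, size_bytes, size_mb in fat_chart():
    print(f"{kb:>2} KB  {size_bytes:>13,}  {size_mb:>5,} MB")
```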

As you can see from our chart, on a standard FAT partition, the maximum cluster size is 32 KB. This means that a 1-byte file on a 2-GB partition will consume a massive 32 KB, or 32,768 bytes.

Fortunately, in the real world, there aren’t a whole lot of 1-byte files. So, you may be wondering how the file system handles larger files. For example, say you were using a 2-KB cluster size and needed to save an 11-KB file. In a situation like this, the file would have to span multiple clusters. Each cluster would contain a portion of the file, and the file system would record which cluster contains the next segment of the file. (On a FAT partition, these links are kept in the file allocation table itself, the structure that gives the file system its name, rather than inside the clusters.) An 11-KB file in a partition with a 2-KB cluster size would consume six clusters. Five of the clusters would be completely used, while 1 KB of the last cluster would be wasted.

By way of comparison, suppose you saved that same 11-KB file onto a partition that uses 32-KB clusters. Only one cluster would be used as opposed to the six clusters used in the smaller partition. However, because the file is only 11 KB and the cluster is 32 KB, 21 KB of space would be wasted.
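Both cases can be compared with a small Python sketch. It assumes the same round-up-to-whole-clusters rule described above. `usage` is a hypothetical helper that returns the number of clusters used and the bytes wasted.

```python
KB = 1024


def usage(file_size: int, cluster_size: int):
    """Return (clusters used, bytes wasted) for one file."""
    clusters = -(-file_size // cluster_size)  # round up to whole clusters
    wasted = clusters * cluster_size - file_size
    return clusters, wasted


for cluster_kb in (2, 32):
    clusters, wasted = usage(11 * KB, cluster_kb * KB)
    print(f"{cluster_kb}-KB clusters: {clusters} cluster(s), {wasted // KB} KB wasted")
```

With 2-KB clusters this reports six clusters and 1 KB wasted. With 32-KB clusters it reports one cluster and 21 KB wasted, which matches the figures above.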

FAT32
As you can see, the FAT file system has two basic problems. First, it’s limited to 2 GB of space, which is way too small for many of today’s common applications and tasks. Second, those 32-KB clusters can waste a lot of space. This is where FAT32 comes in. FAT32 works its magic by drastically increasing the maximum number of clusters available on a partition. This solves both problems: It increases the total allowable partition size while decreasing the cluster size. In fact, FAT32 allows so many clusters that a partition of 8 GB or smaller uses 4-KB clusters. Because FAT32 uses 32-bit table entries rather than the 16-bit entries of standard FAT, you might expect a limit of 4,294,967,296 (2^32) clusters. In practice, the top 4 bits of each entry are reserved, so cluster numbers are 28 bits wide, which caps a FAT32 partition at just under 268,435,456 (2^28) clusters.
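A quick check of the 8-GB example in Python shows why FAT32 needs more clusters rather than bigger ones. The sketch assumes that 1 GB equals 1,073,741,824 bytes, which matches the chart earlier in this article. `clusters_for` is a hypothetical helper.

```python
KB, GB = 1024, 1024 ** 3
FAT16_MAX_CLUSTERS = 65_536  # 16-bit FAT entries allow 2**16 cluster numbers


def clusters_for(partition_bytes: int, cluster_size: int) -> int:
    # Number of clusters needed to cover the whole partition.
    return -(-partition_bytes // cluster_size)


needed = clusters_for(8 * GB, 4 * KB)
print(f"{needed:,} clusters")        # 2,097,152 clusters
print(needed // FAT16_MAX_CLUSTERS)  # 32 times the standard FAT limit
```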

It’s possible to write a very long article on FAT32, but I won’t cover all the FAT32 intricacies in this Daily Drill Down. I did want to tell you how to upgrade to FAT32, though. You need to keep two things in mind. First, the only operating systems besides Windows 98 that can read a FAT32 partition are Windows 95 OEM Service Release 2 and Windows 2000. So, if you’re dual-booting with an operating system other than these, either you won’t be able to use FAT32 or the other operating system won’t be able to use the partition. Second, if you have any old disk utilities written for the 16-bit FAT that you’ve grown fond of, check for newer versions, because utilities that don’t understand FAT32 can damage a FAT32 partition. The utilities that come with Windows 98 will work fine.

To convert an existing partition to FAT32, use the Drive Converter program. You can access this program by selecting Start | Programs | Accessories | System Tools | Drive Converter (FAT 32). If you’re creating a new partition and want to use FAT32, you can do so by running the FDISK program at a command prompt. The first thing FDISK will ask you is whether you want to enable large disk support. If you answer Yes to this question, you’ll be able to create partitions larger than 2 GB and Windows 98 will automatically use FAT32 when you format the new partition.

Defragmenting your hard disk
The entire cluster structure that I’ve just discussed has a lot more to do with your system than just how much hard disk space is wasted and how big your partition can be. Cluster arrangements also affect overall hard disk performance.

As you may recall, when I discussed wasted space within clusters, I mentioned that, if a file consumed multiple clusters, the file system would record where the next cluster in the chain was located on the hard disk. This is a very important piece of information, because through the natural process of using your computer, a file’s clusters can become scattered all over your hard disk.

To see how this occurs, follow my example. Say you have a newly formatted hard disk that uses 2-KB clusters (we’re using 2 KB to keep the math easy). Now, suppose you write a 4-KB file to the hard disk. As you might suspect, the file will consume the first two clusters on the hard disk. Even though the two clusters are right next to each other, the file system still records that the second cluster follows the first. Now let’s say you write a 6-KB file to the hard disk. Because the first two clusters are already taken, Windows will search for the next available cluster, which is the third cluster on the disk. The 6-KB file will consume clusters 3, 4, and 5. Now suppose you write another 4-KB file to the hard disk. Windows will look for the next available cluster, which is number 6. Therefore, the file will consume clusters 6 and 7.

As the example above illustrates, any time a file is written to the hard disk, Windows uses the first available cluster and the clusters that follow it to store the file. This process works great until you (or the operating system) start erasing files. In the example above, your hard disk contains a 4-KB file, a 6-KB file, and another 4-KB file. But if you erase the 6-KB file, clusters 3, 4, and 5 will be marked as free. Now suppose you write an 8-KB file to the hard disk. Windows will see that the first available cluster is cluster 3 and will begin saving the file in clusters 3, 4, and 5 because they’re empty. However, the 8-KB file needs four clusters. Because the gap between files contains only three clusters, Windows will use those three clusters and then look for another free cluster elsewhere on the disk. In the case of our example, the next available cluster is cluster 8.
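The allocation behavior in this example can be modeled with a toy Python class. `ToyDisk` is hypothetical and deliberately simplified. It numbers clusters from 1, always takes the lowest-numbered free clusters, and keeps each file's cluster links in a table that is similar in spirit to a file allocation table. Real file systems and Windows' own allocator are more involved.

```python
END = -1  # marks the last cluster of a file's chain


class ToyDisk:
    """Toy model of the example: clusters numbered from 1, lowest free cluster first."""

    def __init__(self, total_clusters: int, cluster_kb: int = 2):
        self.cluster_kb = cluster_kb
        self.free = set(range(1, total_clusters + 1))
        self.next_cluster = {}   # FAT-style links: cluster -> next cluster (or END)
        self.first_cluster = {}  # file name -> first cluster of its chain

    def write(self, name: str, size_kb: int) -> list:
        count = -(-size_kb // self.cluster_kb)  # round up to whole clusters
        if count > len(self.free):
            raise OSError("disk full")
        # Take the lowest-numbered free clusters, even if they are not adjacent.
        chain = sorted(self.free)[:count]
        self.free.difference_update(chain)
        for here, nxt in zip(chain, chain[1:] + [END]):
            self.next_cluster[here] = nxt
        self.first_cluster[name] = chain[0]
        return chain

    def clusters_of(self, name: str) -> list:
        # Follow the links from the file's first cluster until END.
        chain, cluster = [], self.first_cluster[name]
        while cluster != END:
            chain.append(cluster)
            cluster = self.next_cluster[cluster]
        return chain

    def delete(self, name: str) -> None:
        # Mark every cluster in the file's chain as free again.
        for cluster in self.clusters_of(name):
            del self.next_cluster[cluster]
            self.free.add(cluster)
        del self.first_cluster[name]


disk = ToyDisk(total_clusters=16)
disk.write("first 4-KB file", 4)   # clusters [1, 2]
disk.write("6-KB file", 6)         # clusters [3, 4, 5]
disk.write("second 4-KB file", 4)  # clusters [6, 7]
disk.delete("6-KB file")           # frees 3, 4, and 5
print(disk.write("8-KB file", 8))  # [3, 4, 5, 8]
```

The 8-KB file fills the three-cluster gap and then jumps to cluster 8, as in the walkthrough.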

This means that our 8-KB file is not linear like the other files on the hard disk. Instead, it’s fragmented, meaning that the file is stored in two or more pieces in different places on the disk. So, when Windows tries to read the file, it will initially read the first three clusters. Before it can read the fourth cluster, though, the link recorded for the third cluster will show that the next piece is in cluster 8, so the drive must move its head to a different location before continuing to read the file.
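Reading a fragmented file amounts to following its chain of links. The Python sketch below is a simplified model and does not reflect how Windows actually reads files. It hard-codes the links for the 8-KB file from the example (3, 4, 5, then 8) and counts each time the next cluster isn't physically adjacent, which is when the drive head would have to move. `read_file` and the `END` marker are hypothetical names.

```python
END = -1  # marks the last cluster of a file's chain

# FAT-style links for the fragmented 8-KB file from the example: 3 -> 4 -> 5 -> 8.
next_cluster = {3: 4, 4: 5, 5: 8, 8: END}


def read_file(first_cluster: int, links: dict):
    """Follow the chain and count how often the head must jump elsewhere."""
    order, jumps = [], 0
    cluster = first_cluster
    while cluster != END:
        if order and cluster != order[-1] + 1:
            jumps += 1  # next cluster is not physically adjacent
        order.append(cluster)
        cluster = links[cluster]
    return order, jumps


order, jumps = read_file(3, next_cluster)
print(order, jumps)  # [3, 4, 5, 8] 1
```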

By itself, the process of moving the drive head to a new location to read a portion of a file is no big deal. It takes only a few milliseconds. However, suppose you have a really big file scattered among hundreds or even thousands of fragments. All of that head movement adds up: a thousand moves at a few milliseconds each costs several seconds just to read one file.

This is where defragmentation comes into play. Defragmentation is the process of rearranging each file’s scattered fragments so that its clusters sit next to each other in a linear string. Once a file is stored this way, the drive head doesn’t have to jump around to read it, which can dramatically decrease the amount of time it takes Windows to load the file.

There are many different algorithms for defragmenting a drive. These various algorithms organize files differently. But, to get a basic understanding of how the process might work, let’s return to our earlier example in which your hard disk contains a 4-KB file, the first three clusters of an 8-KB file, another 4-KB file, and the last cluster of the 8-KB file. In a situation such as this, the disk defragmenter will look at the first file and see that it’s already linear. The disk defragmenter will then look at the second file and determine that it’s fragmented. Because the file is fragmented, the disk defragmenter will temporarily move it to the end of the drive, thus freeing up the three clusters in the middle and the one cluster at the end. The disk defragmenter will then move the next file into the gap created by moving out the 8-KB file. Finally, the defragmenter will reassemble the 8-KB file in its entirety directly behind the other two files. The new disk layout will consist of the first 4-KB file, the second 4-KB file, and the 8-KB file. Of course, the disk layout isn’t directly visible to the end user, but as I mentioned, it does affect performance.
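The end result of that process can be sketched in Python. This is a simplified model and not the algorithm Windows' Disk Defragmenter actually uses. It represents the disk as a list of cluster owners, with `None` for a free cluster. It keeps files that are already contiguous in their original order and then rebuilds each fragmented file in one piece after them. It reproduces the final layout described above but not the intermediate moves. `defragment` and `is_contiguous` are hypothetical names.

```python
def is_contiguous(layout: list, name: str) -> bool:
    # True if all of this file's clusters form one unbroken run.
    positions = [i for i, owner in enumerate(layout) if owner == name]
    return positions == list(range(positions[0], positions[0] + len(positions)))


def defragment(layout: list) -> list:
    """Simplified sketch: unfragmented files first, then fragmented files rebuilt whole."""
    names = list(dict.fromkeys(owner for owner in layout if owner is not None))
    linear = [n for n in names if is_contiguous(layout, n)]
    fragmented = [n for n in names if not is_contiguous(layout, n)]
    new_layout = []
    for name in linear + fragmented:
        new_layout += [name] * layout.count(name)
    # Any free clusters end up together after the last file.
    new_layout += [None] * (len(layout) - len(new_layout))
    return new_layout


# A = first 4-KB file, B = fragmented 8-KB file, C = second 4-KB file (2-KB clusters).
before = ["A", "A", "B", "B", "B", "C", "C", "B"]
print(defragment(before))  # ['A', 'A', 'C', 'C', 'B', 'B', 'B', 'B']
```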

If you’ve never defragmented your hard drive, I highly recommend doing so. The primary tool for defragmenting your hard drive is the Disk Defragmenter. You can access it by selecting Start | Programs | Accessories | System Tools | Disk Defragmenter. I recommend defragmenting your hard drive at least weekly.

Conclusion
In this Daily Drill Down, I’ve discussed the basic anatomy of hard disk clusters. I also discussed what’s going on behind the scenes during some basic maintenance functions.

Talainia Posey learned to handle PCs the old-fashioned way: by reading manuals and doing on-the-job troubleshooting. Her experience also includes installing networks for several small companies. When she's not working on computers, Talainia loves to shop for toys, watch cartoons, or spend time with her cat, Beavis.
