Upon close examination of your hard drive in Windows, you will notice that the number of gigabytes actually available is not as much as advertised.
I always appreciate it when one article leads to another. It makes my job as a technical writer for TechRepublic so much easier. While writing my last article, "Maximize the Performance of Microsoft Vista and Intel Matrix RAID: Part 2," the issue of what hard drive capacity is and how it is reported in Windows reared its ugly head.
That article wasn't the place to discuss why my Samsung 750GB hard drive showed only 698.6GB in Windows. As you will see, the explanation is a little more complicated than you might guess. In this blog post, I want to face the topic head on in my own inimitable way.
The history
How all this madness started is a fascinating stroll down the halls of computer history. Anyone who knows anything at all about computers knows that computers use binary numbers to count. Each piece of information used by a computer is a bit (short for binary digit), which has one of two values, 0 or 1. Eight bits are combined to create a byte. By the very nature of memory (Table A), more bytes are usually added by doubling them from 1 to 2 and then doubling them again nine more times until we reach 1,024 bytes.
Binary and decimal values up to 1,024 decimal
And it is the 1,024 number of bytes where our story and problems start. Since 1,024 is close enough to 1,000, everyone started calling 1,024 bytes one kilobyte -- kilo being the SI or metric prefix meaning 1,000. The computer industry "borrowed" the SI prefixes and continued to use them even though the SI standards were defined as decimal-only values in 1960. It was so much easier to say 1K rather than one thousand and twenty four bytes, or 1.024 kilobytes. Besides, no one seemed to care about a measly little 24-byte discrepancy.
Most people have probably heard of Moore's law. Memory and storage capacity have followed similar paths to the large sizes and capacities that are common today. My first computer was a Northstar Horizon microcomputer. It had four 16K S-100 RAM boards for a total of 64K RAM.
This Northstar 16K RAM board has four rows by eight columns of 4,096-bit dynamic MOS memory chips delivering a total of 16,384 bytes. The symbol for kilobytes in 1978 was K and not the familiar KB used today.
This 2008 1GB SDRAM DIMM provides 1,073,741,824 bytes of memory or 65,536 times more memory than the Northstar 16K RAM board manufactured 30 years earlier. 1024MB is clearly marked on the label.
Thousands of bytes soon turned to millions of bytes and now to billions of bytes of memory. Storage capacity is now measured in trillions of bytes.
The problem isn't just that the numbers increase as capacity grows, but the percentage difference also increases as you move from MB to GB and GB to TB (see Table B). The difference between the binary and decimal values for the 16K Northstar RAM board is 2.4%. The difference between the binary and decimal values for the 1GB SDRAM DIMM is 7.37%.
As capacity increases, the percentage difference between binary bytes and decimal bytes also increases.
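The growth in the gap can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original tables; the function name `percent_difference` is made up for illustration. It compares 1,024 raised to a power with 1,000 raised to the same power.

```python
# Compare each binary multiple (powers of 1,024) with the decimal
# multiple of the same name (powers of 1,000).
PREFIXES = ["K", "M", "G", "T"]

def percent_difference(power: int) -> float:
    """Percentage by which 1024**power exceeds 1000**power."""
    binary = 1024 ** power
    decimal = 1000 ** power
    return (binary - decimal) / decimal * 100

for power, prefix in enumerate(PREFIXES, start=1):
    print(f"{prefix}B: binary is {percent_difference(power):.2f}% larger than decimal")
```

The kilobyte row gives the 2.4% quoted for the 16K board, and the gigabyte row gives the 7.37% quoted for the 1GB DIMM.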
As the differential grows, it becomes more and more important that IT professionals and their team members fully understand exactly what number is being used.
Take a look at online forums discussing computer topics and you will find a lot of people asking why their 750GB hard drive shows only 698.6GB in Windows. A "loss" of 51.4GB is no small matter.
Consumers need to know exactly how many bytes, decimal or binary, they can store on a drive before they make a purchase. The paragraph under "Details" in this SSD ad is an excellent example of the kind of full disclosure the consumer should see. Also, consider how confusing this must be for the average consumer.
Conversion error woes
It's a more serious problem than consumers feeling that they have been deceived by hard drive manufacturers.
One conversion error that wasn't caught until it was too late occurred with the Mars Climate Orbiter.
"The MCO MIB has determined that the root cause for the loss of the MCO spacecraft was the failure to use metric units in the coding of a ground software file ...
"... At the time of Mars insertion, the spacecraft trajectory was approximately 170 kilometers lower than planned. As a result, MCO either was destroyed in the atmosphere or re-entered heliocentric space after leaving Mars' atmosphere.
"... the impulse bit data contained in the AMD file was delivered in lb-sec instead of the specified and expected units of Newton-sec."
Quoted from ftp://ftp.hq.nasa.gov/pub/pao/reports/1999/MCO_report.pdf
The failure was not due to a conversion error between decimal and binary bytes, but it is only a matter of time before a similar failure occurs due to the decimal byte and binary byte confusion.
The solution -- the IEC standard
Wikipedia details the International Electrotechnical Commission (IEC) standard IEC 60027-2, which proposed new binary prefixes beginning in 1998 and 1999. The standard requires that all measurements in binary bytes be changed from the SI-style symbol KB to the IEC symbol KiB, MB to MiB, and so on down the list of IEC names. The IEEE (Institute of Electrical and Electronics Engineers) issued a nearly identical standard, IEEE 1541-2002, which became a full-use standard in 2005 (Table C).
The symbols and names that the standard defines
The IEC standard for binary byte nomenclature
Kibibytes? Mebibytes? You hear any computer types say mebibyte lately? I haven't seen it written, but I have seen some usage of the short notation symbols in the Internet forums. Personally, I have decided that mebibyte is one word that shall not pass through these lips.
And what about all the existing documentation? Who is going to change all those old documents? My guess is no one. If anything, adoption of the IEC standard could cause more confusion years from now when you have to review old documents and try to determine if KB meant 1,024 bytes or 1,000 bytes.
The IEC standard has done little to resolve the confusion. No doubt their motives were pure -- provide an alternative to the incorrect use of the SI symbols. It should be clear by now that the computer geeks aren't going to give back the SI prefixes. The damage, what damage there is, has already been done.
Leave the current binary-byte naming conventions in place. After all, it was the computer types who created and defined the terms kilobytes, megabytes, gigabytes, etc.
Since adoption of the IEC standard has been abysmal, the IEC standard is the solution that never was.
An alternative solution
The IEC standard requires that all binary-based numbers be changed to KiB, MiB, etc. This seems like a rather backward way to set the standard. Why not let KB mean 1,024 bytes? Doesn't it make more sense to redefine the decimal-byte nomenclature? (See Table D.) You wouldn't have to worry about changing most existing documentation, and the standard could be adopted slowly over time as IT professionals, software manufacturers, and hardware manufacturers began to use decimal bytes and adopt the new standard.
Proposed naming convention for binary bytes and decimal bytes. A much more interesting naming convention for decimal bytes that is easier on the tongue.
Under this proposed system, KB would stand for KiloBinary bytes, or 1,024 bytes -- just as it has in recent history. The simple symbol K would stand for decimal Kilo bytes, or 1,000 bytes. There is the possibility of confusion in older documentation, but the impact would likely be negligible.
The symbols bit or bits for "bit" would be acceptable only for less than 1,000 bytes (8,000 bits) or 1K.
Just for fun, I added another column with some more interesting proposed names for your esteemed review.
Hardware and binary bytes / decimal bytes
Hardware follows a mixed bag of binary- and decimal-byte nomenclature. Data transfer speeds (Table E) are typically measured in decimal bits per second. The exceptions are the cases where bytes per second are used instead.
Common PC protocols and their data transfer rates in binary or decimal measurements
Memory sizes are measured in binary bytes. PC component data transfer rates are typically measured in decimal bytes. PC component capacities are measured in both binary bytes and decimal bytes.
Hard drive and flash drive manufacturers routinely use decimal bytes to report capacity. Optical disc manufacturers use both binary bytes and decimal bytes to measure capacity. CDs are measured in binary bytes. BD and DVD capacity is measured in decimal bytes. Floppies use neither binary nor decimal bytes! Confusing? You bet; see Table F.
Common PC components and removable media and their speeds/data transfer rates, memory sizes/capacities and cache sizes in decimal or binary measurements. Data transfer rates are typically but not always measured in decimal bits per second.
Is it any wonder the average computer user is confused when CDs are measured in binary bytes but DVDs are measured in decimal bytes? I like to call this type of confusion consistently inconsistent confusion.
Software and binary bytes / decimal bytes
Software, like hardware, is a mixed bag of short notation names. Some software follows the IEC binary naming convention, most notably the Linux kernel and GNU Core Utilities.
Most software reports numeric values in binary bytes. In Windows, file sizes, memory sizes, storage device capacities, and partition sizes are all reported in binary bytes. Interestingly, my home network is reported as 1Gbps by Vista, which is a decimal gigabit.
Like much of the IT industry, Windows uses the SI naming convention improperly. For the most part, the symbols KB, MB, and GB used in Windows are binary symbols and should be changed to KiB, MiB, and GiB to conform to the IEC standard.
The IEC standards were released in 1999, yet Microsoft has not adopted them. Why not? To be fair, I haven't adopted them myself, nor have most IT professionals.
There is another solution. Windows and all software should report values in decimal bytes using decimal prefixes whenever short notation is used.
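To make the idea concrete, here is a minimal Python sketch of a size formatter that can report the same byte count either way. The function name `format_bytes` and its `base` parameter are hypothetical. As an assumption, it labels decimal values with the familiar KB/MB/GB symbols and binary values with the IEC KiB/MiB/GiB symbols, rather than the names proposed in Table D.

```python
def format_bytes(n: int, base: int = 1000) -> str:
    """Render a byte count in decimal (base=1000) or binary (base=1024) units."""
    if base == 1000:
        units = ["bytes", "KB", "MB", "GB", "TB"]      # decimal labels (assumed)
    elif base == 1024:
        units = ["bytes", "KiB", "MiB", "GiB", "TiB"]  # IEC binary labels
    else:
        raise ValueError("base must be 1000 or 1024")

    value = float(n)
    index = 0
    # Divide down until the value fits the current unit or units run out.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1

    if index == 0:
        return f"{n} bytes"
    return f"{value:.2f} {units[index]}"

drive = 750_156_373_504  # byte count of the 750GB drive from the notes
print(format_bytes(drive))        # decimal: 750.16 GB
print(format_bytes(drive, 1024))  # binary:  698.64 GiB
```

With decimal reporting, the number on the box and the number on the screen agree.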
The four mysteries
I can't make a strong case for the adoption of the IEC standard, but I can make a strong case for the use of decimal bytes:
- The case of the missing gigabytes: "Can someone here please tell me why my 750GB hard drive only shows 698.6GB? I know that there is some overhead for the file system but more than 50GB? I paid for 750GB, but I got only 698.6GB!"
- The case of the mysterious partition size: "I tried to create a 40GB partition. I entered the number 40000 into the Simple Volume Size in MB text box, but when the format finished the new partition was only 39.06GB? What happened? Why isn't it 40GB?"
- The case of the unexpected coaster: "I needed to burn some files to a DVD. Explorer showed that I had 4,650MB of data that I wanted to burn. 'Perfect,' I thought. 'That will almost completely fill up the DVD.' When I tried to burn the files to the DVD, the burn failed because it ran out of space. Why? Now my DVD is good only for my oversized coffee cup."
- The case of the 3-digit mishap: "I submitted an article to my editor. It was several days later during some quiet time that I realized that I had made a mistake with some of the numbers. What was my mistake? I had taken numbers like 1,234MB from Explorer and converted them to 1.23GB by moving the decimal point three places to the left. I knew better, but it's an easy mistake to make."
The first three stories are fictional, but the scenarios are all too real. The last story happened to me when I submitted the "Automate Custom Vista Installs with vLite" article to Mark Kaelin. Fortunately the article hadn't been published yet, and Mark kindly fixed the numbers without any snide remarks.
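The unexpected coaster is easy to reproduce with arithmetic. The Python sketch below assumes a nominal single-layer DVD capacity of 4.7 decimal GB and ignores file system overhead. The names `DVD_CAPACITY` and `fits_on_disc` are made up for illustration.

```python
MIB = 1024 ** 2               # one "MB" as Explorer reports it (binary)
DVD_CAPACITY = 4_700_000_000  # assumed nominal capacity: 4.7 decimal GB

def fits_on_disc(explorer_mb: float, capacity_bytes: int = DVD_CAPACITY) -> bool:
    """True if a size reported in binary MB fits within a capacity given in bytes."""
    return explorer_mb * MIB <= capacity_bytes

print(f"{4650 * MIB:,} bytes")  # what 4,650 binary MB really is
print(fits_on_disc(4650))       # False: larger than 4.7 decimal GB
print(fits_on_disc(4482))       # True: about the most that fits
```

Under these assumptions, 4,650MB in Explorer is about 4.88 billion bytes, well past what the disc holds.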
There are some workarounds. For the case of the mysterious partition size, multiply the number of gigabytes you want by 24 and add the result to the partition size in decimal MB. For example, if you want a 30GB partition, enter (24 * 30) + 30,000, or 30720, into the Simple Volume Size in MB text box.
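The partition workaround amounts to multiplying by 1,024 instead of 1,000. A short Python sketch, with the hypothetical helper name `volume_size_mb`, shows the rule and the mistake from the partition story side by side.

```python
def volume_size_mb(target_gb: int) -> int:
    """Value to type into the 'Simple Volume Size in MB' box for a binary-GB target."""
    # 1,000 MB per GB plus the extra 24 MB per GB is the same as target_gb * 1024.
    return target_gb * 1000 + target_gb * 24

print(volume_size_mb(30))  # 30720
print(volume_size_mb(40))  # 40960
print(40000 / 1024)        # 39.0625, the "39.06GB" partition from the story
```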
The mistake I made in The case of the 3-digit mishap can be avoided by looking at the information pane at the bottom of the Explorer window. In Windows Vista and Windows 7, files 1,000KB to 999,999KB are shown in MBs. Files 1,000,000KB and up are shown in GBs. The same information can be displayed in XP by enabling Status Bar from Explorer's View menu item. You can also right-click on the file and select Properties to see the file size in bytes and in MB or GB. A third option is to move the mouse to the file name and hover over the name to see a pop-up window with the same information.
These four mysteries are no doubt no longer a mystery to you, patient reader. To the average computer user, they are a total mystery. The online forums are filled with stories just like these.
The case for base 10 in Microsoft Windows
As Paul Allen and Bill Gates knew, and IBM soon learned, "He who controls the operating system controls the computer." And so it seems that until Microsoft decides on a solution, the unconventional naming convention and the case of binary byte v. decimal byte will continue.
Civilized humans have been using the decimal system since the fifth century. It wasn't until computers came along, and personal computers in particular, that the binary system came into wide use.
Quick, how many GBs are 2,406.4 binary MBs? The answer, 2.35 binary GBs, isn't easy to determine without a calculator. I have bookmarked this handy Web site that does the calculation for me. The point is that I shouldn't have to use a calculator when the conversion is so easy in base 10.
Quick, how many GBs is 2,406.4 decimal MBs? The answer, 2.4064 decimal GBs, is easily determined. Simply move the decimal point three places to the left, and you have the answer.
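The two conversions differ only in the divisor, which is the whole problem. A minimal Python sketch follows; the function name `mb_to_gb` is an illustrative assumption.

```python
def mb_to_gb(megabytes: float, base: int) -> float:
    """Convert MB to GB, dividing by 1024 (binary) or 1000 (decimal)."""
    return megabytes / base

print(round(mb_to_gb(2406.4, 1024), 2))  # binary:  2.35
print(round(mb_to_gb(2406.4, 1000), 4))  # decimal: 2.4064
```

The decimal answer can be read off by shifting the decimal point. The binary answer needs a division that few people can do in their heads.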
Each new version of Windows is touted as a major release that improves productivity. Here is an opportunity for Microsoft to make a real improvement in productivity simply by reporting decimal bytes.
One more point: If Microsoft does change the way that Windows presents bytes to the user, they should also use new decimal prefixes so that there is no confusion with the historical usage of the SI prefixes.
I get a headache every time this topic comes up -- and it comes up more and more often these days. As I mentioned at the beginning of this article, it arose in my last article, "Maximize the Performance of Microsoft Windows and Intel Matrix RAID: Part 2." I seriously considered using the IEC standards and writing MiB and GiB where appropriate and then sanity returned and, like Microsoft, I decided against it.
I got a lot of headaches researching and writing this article too. We probably wouldn't be having this discussion were it not for the fact that the hard drive industry decided to use the accurate and true definition of the SI symbols M and G. Shame on them for that!
The modem industry and networking industry followed a similar model by stating their products' performance in decimal numbers. They also inflated those numbers by using bits instead of bytes. Sneaky bit of marketing there?
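To see how much the choice of bits flatters the figure, the Python sketch below converts a 1Gbps link rate into bytes per second. It assumes 8-bit bytes and ignores protocol overhead. The function name is made up for illustration.

```python
def gbps_to_bytes_per_second(gbps: float) -> float:
    """Convert decimal gigabits per second to bytes per second (8-bit bytes assumed)."""
    return gbps * 1_000_000_000 / 8

rate = gbps_to_bytes_per_second(1)
print(rate / 1000 ** 2)            # 125.0 decimal MB per second
print(round(rate / 1024 ** 2, 2))  # 119.21 binary MB per second
```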
But now that the topic has been forced to the light of day perhaps it is time to look again at how our software reports memory sizes, capacities, and speeds to us. As much as I hate to admit it, the hard drive marketers are right. Decimal bytes are the best way to report hard drive capacities. No doubt their motives weren't pure as the cynical side of me so loudly says. The "rosy glasses" side of me whispers that they were just ahead of their time.
You may have noticed that I have written memory sizes and the binary/decimal byte values in Table B in total bytes. I have also prefaced short notation bytes with binary or decimal throughout this article. That seemed to be the simplest way to differentiate between 1,073,741,824 binary bytes and 1,000,000,000 decimal bytes when the short notation 1GB was used. You may have found reading all those numbers to be irksome. I can tell you that reading nine zeros is much less tiresome than having to write all of them!
And that is the bottom-line reason why a new standard is needed. It is so much more convenient to write 12.34GB than 12,340,000,000 bytes or 12.34 decimal gigabytes. However, the person reading 12.34GB needs to fully understand, without confusion, exactly how many bytes 12.34GB really is.
Blame it on the hard drive manufacturers or the DVD manufacturers or whomever; it doesn't matter at this point. A solution needs to be found and accepted industry-wide soon. It should have been done years ago.
It is clear that the IEC standard has failed. Other than some UNIX and Linux devotees, its adoption has been poor. Microsoft's adoption of decimal bytes in all their software would go a long way toward solving the binary-byte and decimal-byte confusion.
I say gigabyte. You say gibibyte. Let's call the whole thing off.
Credit to George Gershwin and Ira Gershwin for the words borrowed from their song "Let's Call the Whole Thing Off."
For the sake of accuracy I would like to include the following notes:
- 750 decimal GB is 698.5 binary GB, yet my Samsung 750GB drives are reported as 698.6GB in Windows Vista. The reason for this discrepancy is that the drive has 1,465,149,167 LBAs * 512 bytes for a total of 750,156,373,504 bytes or 698.637565135956 binary gigabytes.
- Bytes are typically 8 bits, particularly in personal computing, but can vary depending on the operating system or hardware.
- For all practical purposes, KB has become the standard symbol in the computing industry for kilobyte. The symbol KB does not follow the SI or metric standard for kilo. The SI standard uses a lowercase "k" to denote kilo with a capital "K" reserved for Kelvin, a measurement of temperature.
- There is no SI standard for bits or bytes. IEEE 1541 recommends the symbol "b" for bits and "B" for bytes. IEC 60027-2 uses the symbol "B" for bytes but defines the symbol "bit" instead of "b" for bits.
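The figures in the first note can be reproduced from the LBA count. This minimal Python sketch uses the values from the note; the constant and function names are made up for illustration.

```python
LBA_COUNT = 1_465_149_167  # logical blocks on the drive, from the note above
SECTOR_BYTES = 512         # bytes per logical block, from the note above

def binary_gb(byte_count: int) -> float:
    """Convert a byte count to binary gigabytes (units of 1024**3 bytes)."""
    return byte_count / 1024 ** 3

total_bytes = LBA_COUNT * SECTOR_BYTES
print(f"{total_bytes:,} bytes")
print(round(binary_gb(total_bytes), 1))     # 698.6, what Windows Vista reports
print(round(binary_gb(750 * 10 ** 9), 1))   # 698.5, an exact 750 decimal GB
```

The drive holds slightly more than 750 billion bytes, which accounts for the extra tenth of a binary gigabyte.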