
Learn why 250GB is never reported as 250GB

Upon close examination of your hard drive in Windows, you will notice that the number of gigabytes actually available is not as much as advertised.

I always appreciate it when one article leads to another. It makes my job as a technical writer for TechRepublic so much easier. While writing my last article, "Maximize the Performance of Microsoft Vista and Intel Matrix RAID: Part 2," the issue of what hard drive capacity is and how it is reported in Windows reared its ugly head.

That article wasn't the place to discuss why my Samsung 750GB hard drive showed only 698.6GB in Windows. As you will see, the explanation is a little more complicated than you might guess. In this blog post, I want to face the topic head-on in my own inimitable way.


The history

How all this madness started is a fascinating stroll down the halls of computer history. Anyone who knows anything at all about computers knows that computers use binary numbers to count. Each bit of information used by a computer has one of two values, 0 or 1; "bit" is short for binary digit. Eight bits are combined to create a byte. Because memory is addressed in binary, memory sizes usually grow by doubling (Table A): start with 1 byte, double it to 2, and then double it nine more times to reach 1,024 bytes.

Table A

Binary and decimal values up to 1,024 decimal
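The doubling described above is easy to reproduce. The short Python sketch below prints each power of two from 1 to 1,024 in decimal and in binary. It only illustrates the kind of values Table A lists. The original table's exact layout is not known, and the function name `doubling_values` is made up for this example.

```python
# Start at 1 and keep doubling until 1,024 is reached,
# recording each value in decimal and as a binary string.
def doubling_values(limit=1024):
    """Return (decimal, binary_string) pairs for 1, 2, 4, ... up to limit."""
    values = []
    n = 1
    while n <= limit:
        values.append((n, format(n, "b")))
        n *= 2
    return values

for decimal, binary in doubling_values():
    print(f"{decimal:>5}  {binary:>11}")
```

The list has 11 entries: the starting value 1 plus ten doublings. The last entry, 1,024, is written as a 1 followed by ten zeros in binary.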

The problem

And it is the 1,024 number of bytes where our story and problems start. Since 1,024 is close enough to 1,000, everyone started calling 1,024 bytes one kilobyte -- kilo being the SI or metric prefix meaning 1,000. The computer industry "borrowed" the SI prefixes and continued to use them even though the SI standards were defined as decimal-only values in 1960. It was so much easier to say 1K rather than one thousand and twenty-four bytes, or 1.024 kilobytes. Besides, no one seemed to care about a measly little 24-byte discrepancy.

Most people have probably heard of Moore's law. Memory and storage capacity have followed similar paths to the large sizes and capacities that are common today. My first computer was a Northstar Horizon microcomputer. It had four 16K S-100 RAM boards for a total of 64K RAM.

Figure A

This Northstar 16K RAM board has four rows by eight columns of 4,096-bit dynamic MOS memory chips delivering a total of 16,384 bytes. The symbol for kilobytes in 1978 was K and not the familiar KB used today.

Figure B

This 2008 1GB SDRAM DIMM provides 1,073,741,824 bytes of memory or 65,536 times more memory than the Northstar 16K RAM board manufactured 30 years earlier. 1024MB is clearly marked on the label.
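The arithmetic behind Figures A and B can be checked in a few lines. This sketch uses only the figures given in the captions: a 4 x 8 grid of 4,096-bit chips, four boards in the Northstar, and a 1GB DIMM holding 2^30 bytes. The variable names are illustrative.

```python
# Figures A and B, checked with integer arithmetic.
CHIP_BITS = 4_096          # each dynamic MOS chip stores 4,096 bits
ROWS, COLUMNS = 4, 8       # chip layout on the Northstar 16K board
BITS_PER_BYTE = 8

board_bytes = ROWS * COLUMNS * CHIP_BITS // BITS_PER_BYTE  # one 16K board
system_bytes = 4 * board_bytes                              # four boards = 64K
dimm_bytes = 2 ** 30                                        # the 1GB (1024MB) DIMM

print(board_bytes)                # 16384
print(system_bytes)               # 65536
print(dimm_bytes // board_bytes)  # 65536
```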

Thousands of bytes soon turned to millions of bytes and now to billions of bytes of memory. Storage capacity is now measured in trillions of bytes.

The problem isn't just that the absolute difference grows as capacity grows; the percentage difference also increases as you move from KB to MB, MB to GB, and GB to TB (see Table B). The difference between the binary and decimal values for the 16K Northstar RAM board is 2.4%. The difference between the binary and decimal values for the 1GB SDRAM DIMM is 7.37%.

Table B

As capacity increases, the percentage differences between binary bytes and decimal bytes also increase.
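The growing gap is simple to compute. The sketch below compares 1,024^n with 1,000^n for each prefix and measures the difference relative to the decimal value, which is how the 2.4% and 7.37% figures above work out. It illustrates the trend Table B shows; it is not a reproduction of that table.

```python
# Percentage by which a binary unit exceeds the decimal unit of the same name.
PREFIXES = ["K", "M", "G", "T"]

def binary_vs_decimal(power):
    """Compare 1024**power with 1000**power; return (binary, decimal, % difference)."""
    binary = 1024 ** power
    decimal = 1000 ** power
    percent = (binary - decimal) / decimal * 100
    return binary, decimal, percent

for power, prefix in enumerate(PREFIXES, start=1):
    binary, decimal, percent = binary_vs_decimal(power)
    print(f"1{prefix}B: {binary:>17,} vs {decimal:>17,} bytes (+{percent:.2f}%)")
```

The gap rises from 2.40% at K to 4.86% at M, 7.37% at G, and 9.95% at T, or roughly 2.4 percentage points per step.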

As the differential grows, it becomes more and more important that IT professionals and their team members fully understand exactly what number is being used.

Take a look at online forums discussing computer topics and you will find a lot of people asking why their 750GB hard drive shows only 698.6GB in Windows. A "loss" of 51.4GB is no small matter.
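Here is a minimal sketch of where the "missing" gigabytes go. It assumes the drive holds exactly its advertised number of decimal gigabytes and converts that number to the binary gigabytes Windows displays. A real drive usually holds slightly more than its nominal capacity, which is why my drive shows 698.6GB rather than 698.5GB. The function name is illustrative.

```python
def advertised_to_binary_gb(advertised_gb):
    """Convert a drive's advertised (decimal) GB to the binary GB figure Windows shows."""
    total_bytes = advertised_gb * 1000 ** 3  # the manufacturer's gigabyte
    return total_bytes / 1024 ** 3           # Windows' "GB"

for size in (250, 750, 1000):
    print(f"{size}GB advertised -> {advertised_to_binary_gb(size):.2f}GB in Windows")
```

A nominal 250GB drive shows about 232.83GB, a 750GB drive about 698.49GB, and a 1,000GB drive about 931.32GB.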

Consumers need to know exactly how many bytes, decimal or binary, they can store on a drive before they make a purchase. The paragraph under "Details" in this SSD ad is an excellent example of the kind of full disclosure the consumer should see. Also, consider how confusing this must be for the average consumer.

Conversion error woes

It's a more serious problem than consumers feeling that they have been deceived by hard drive manufacturers.

One conversion error that wasn't caught until it was too late occurred with the Mars Climate Orbiter.

"The MCO MIB has determined that the root cause for the loss of the MCO spacecraft was the failure to use metric units in the coding of a ground software file ...

"... At the time of Mars insertion, the spacecraft trajectory was approximately 170 kilometers lower than planned. As a result, MCO either was destroyed in the atmosphere or re-entered heliocentric space after leaving Mars' atmosphere.

"... the impulse bit data contained in the AMD file was delivered in lb-sec instead of the specified and expected units of Newton-sec."

Quoted from ftp://ftp.hq.nasa.gov/pub/pao/reports/1999/MCO_report.pdf

The failure was not due to a conversion error between decimal and binary bytes, but it is only a matter of time before a similar failure occurs due to the decimal byte and binary byte confusion.

The solution -- the IEC standard

Wikipedia details the International Electrotechnical Commission (IEC) IEC 60027-2 standard that proposed new binary prefixes beginning in 1998 and 1999. The standard requires that all measurements in binary bytes be changed from the SI symbol KB to the IEC symbol KiB, MB to MiB, and so on down the list of IEC names. The IEEE (Institute of Electrical and Electronics Engineers) issued a nearly identical standard, IEEE 1541-2002, in 2005 (Table C).

Table C

The symbols and names that the standard defines

The IEC standard for binary byte nomenclature
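To show what IEC-style output would look like, here is a small formatter that writes a byte count with either IEC binary prefixes (KiB, MiB, GiB) or SI decimal prefixes (kB, MB, GB). It follows the SI convention of a lowercase k. The function and unit lists are a sketch made up for this example, not any particular program's behavior.

```python
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]  # powers of 1,024
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB"]        # powers of 1,000

def format_size(num_bytes, binary=True):
    """Format a byte count with IEC (1,024-based) or SI (1,000-based) prefixes."""
    step = 1024 if binary else 1000
    units = IEC_UNITS if binary else SI_UNITS
    if num_bytes < step:
        return f"{num_bytes} B"
    value = float(num_bytes)
    for unit in units[1:]:
        value /= step
        # Stop at the first unit that keeps the number below one step,
        # or at the largest unit available.
        if value < step or unit == units[-1]:
            return f"{value:.2f} {unit}"

print(format_size(1_073_741_824))                # 1.00 GiB
print(format_size(1_073_741_824, binary=False))  # 1.07 GB
```

The same 1,073,741,824 bytes read as 1.00 GiB with the IEC prefixes and 1.07 GB with the SI prefixes, and the symbol alone tells you which one is meant.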

Kibibytes? Mebibytes? You hear any computer types say mebibyte lately? I haven't seen it written, but I have seen some usage of the short notation symbols in the Internet forums. Personally, I have decided that mebibyte is one word that shall not pass through these lips.

And what about all the existing documentation? Who is going to change all those old documents? My guess is no one. If anything, adoption of the IEC standard could cause more confusion years from now when you have to review old documents and try to determine if KB meant 1,024 bytes or 1,000 bytes.

The IEC standard has done little to resolve the confusion. No doubt their motives were pure -- provide an alternative to the incorrect use of the SI symbols. It should be clear by now that the computer geeks aren't going to give back the SI prefixes. The damage, what damage there is, has already been done.

Leave the current binary-byte naming conventions in place. After all, it was the computer types who created and defined the terms kilobytes, megabytes, gigabytes, etc.

Since adoption of the IEC standard has been abysmal, the IEC standard is the solution that never was.

An alternative solution

The IEC standard requires that all binary-based numbers be changed to KiB, MiB, etc. This seems like a rather backward way to set the standard. Why not let KB mean 1,024 bytes? Doesn't it make more sense to redefine the decimal-byte nomenclature? (See Table D.) You wouldn't have to worry about changing most existing documentation, and the standard could be adopted slowly over time as IT professionals, software manufacturers, and hardware manufacturers began to use decimal bytes and adopt the new standard.

Table D

Proposed naming convention for binary bytes and decimal bytes. A much more interesting naming convention for decimal bytes that is easier on the tongue.

Under this proposed system, KB would stand for KiloBinary bytes, or 1,024 bytes -- just as it has in recent history. The simple symbol K would stand for decimal kilobytes, or 1,000 bytes. There is the possibility of confusion in older documentation, but the impact would likely be negligible.

The symbols bit or bits for "bit" would be acceptable only for less than 1,000 bytes (8,000 bits) or 1K.

Just for fun, I added another column with some more interesting proposed names for your esteemed review.
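Table D is not reproduced here, so this sketch covers only the rule spelled out in the text: a prefix followed by "B" (KB) means binary, and a bare prefix (K) means decimal. Extending that rule to M, G, and T is my assumption. The parser itself, `parse_proposed`, is hypothetical and exists only to show how unambiguous the proposed notation would be to read.

```python
import re

# Hypothetical parser for the proposed notation:
#   "KB", "MB", "GB", "TB" -> binary (powers of 1,024)
#   "K",  "M",  "G",  "T"  -> decimal (powers of 1,000)
# Applying the KB/K rule to M, G and T is an assumption.
PREFIX_POWER = {"K": 1, "M": 2, "G": 3, "T": 4}
SIZE_PATTERN = re.compile(r"^\s*([\d.]+)\s*([KMGT])(B?)\s*$")

def parse_proposed(text):
    """Return the number of bytes a size string denotes under the proposal."""
    match = SIZE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, prefix, binary_marker = match.groups()
    base = 1024 if binary_marker else 1000
    return round(float(number) * base ** PREFIX_POWER[prefix])

print(parse_proposed("1KB"))   # 1024
print(parse_proposed("1K"))    # 1000
print(parse_proposed("750G"))  # 750000000000
```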

Hardware and binary bytes / decimal bytes

Hardware follows a mixed bag of binary- and decimal-byte nomenclature. Data transfer speeds (Table E) are typically measured in decimal bits per second. There are some exceptions, where bytes per second is used instead.

Table E

Common PC protocols and their data transfer rates in binary or decimal measurements
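Because link speeds use decimal bits, converting one to the binary megabytes Windows uses takes two steps: divide by 8 to get bytes, then divide by 1,024 twice. The sketch below does this for my 1Gbps home network. It ignores protocol overhead such as framing and headers, so real file-copy throughput will be lower still.

```python
def bits_per_second_to_bytes(bps):
    """Convert a link rate in bits per second to bytes per second (8 bits per byte)."""
    return bps / 8

gigabit = 1_000_000_000  # 1Gbps uses a decimal giga
rate = bits_per_second_to_bytes(gigabit)
print(f"{rate / 1000 ** 2:.2f} decimal MB/s")  # 125.00 decimal MB/s
print(f"{rate / 1024 ** 2:.2f} binary MB/s")   # 119.21 binary MB/s
```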

Memory sizes are measured in binary bytes. PC component data transfer rates are typically measured in decimal bytes. PC component capacities are measured in both binary bytes and decimal bytes.

Hard drive and flash drive manufacturers routinely use decimal bytes to report capacity. Optical disc manufacturers use both binary bytes and decimal bytes to measure capacity. CDs are measured in binary bytes. BD and DVD capacity is measured in decimal bytes. Floppies use neither binary nor decimal bytes! Confusing? You bet. See Table F.

Table F

Common PC components and removable media and their speeds/data transfer rates, memory sizes/capacities and cache sizes in decimal or binary measurements. Data transfer rates are typically but not always measured in decimal bits per second.

Is it any wonder the average computer user is confused when CDs are measured in binary bytes but DVDs are measured in decimal bytes? I like to call this type of confusion consistently inconsistent confusion.
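Here is a sketch of the three conventions side by side. The figures are the nominal label capacities of a "1.44MB" floppy, a "700MB" CD, and a "4.7GB" single-layer DVD, used here as typical examples. The recordable capacity of a particular disc can differ slightly. The "1.44MB" floppy is 1,440 binary kilobytes, which is why it is neither decimal nor binary.

```python
MB = 1000 ** 2   # decimal megabyte
KIB = 1024       # binary kilobyte
MIB = 1024 ** 2  # binary megabyte
GB = 1000 ** 3   # decimal gigabyte

# Nominal capacities, as the labels state them:
floppy = 1440 * KIB   # "1.44MB" = 1.44 x 1,000 x 1,024 bytes (mixed!)
cd = 700 * MIB        # "700MB" CD: binary megabytes
dvd = 47 * GB // 10   # "4.7GB" DVD: decimal gigabytes

for name, size in (("floppy", floppy), ("CD", cd), ("DVD", dvd)):
    print(f"{name:>6}: {size:>13,} bytes = {size / MB:9.2f} decimal MB"
          f" = {size / MIB:9.2f} binary MB")
```

The floppy works out to 1.47 decimal MB or 1.41 binary MB, so its "1.44" label matches neither.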

Software and binary bytes / decimal bytes

Software, like hardware, is a mixed bag of short notation names. Some software follows the IEC binary naming convention, most notably the Linux kernel and GNU Core Utilities.

Most software reports numeric values in binary bytes. In Windows, file sizes, memory sizes, storage device capacities, and partition sizes are all reported in binary bytes. Interestingly, my home network is reported as 1Gbps by Vista, which is a decimal gigabit.

Like much of the IT industry, Windows uses the SI naming convention improperly. For the most part, the symbols KB, MB, and GB used in Windows are binary symbols and should be changed to KiB, MiB, and GiB to conform to the IEC standard.

The IEC standards were released in 1999, yet Microsoft has not adopted the standards. Why not? To be fair, I haven't adopted them myself, nor have most IT professionals.

There is another solution. Windows and all software should report values in decimal bytes using decimal prefixes whenever short notation is used.

The four mysteries

I can't make a strong case for the adoption of the IEC standard, but I can make a strong case for the use of decimal bytes:

  1. The case of the missing gigabytes: "Can someone here please tell me why my 750GB hard drive only shows 698.6GB? I know that there is some overhead for the file system but more than 50GB? I paid for 750GB, but I got only 698.6GB!"
  2. The case of the mysterious partition size: "I tried to create a 40GB partition. I entered the number 40000 into the Simple Volume Size in MB text box, but when the format finished the new partition was only 39.06GB? What happened? Why isn't it 40GB?"
  3. The case of the unexpected coaster: "I needed to burn some files to a DVD. Explorer showed that I had 4,650MB of data that I wanted to burn. 'Perfect,' I thought. 'That will almost completely fill up the DVD.' When I tried to burn the files to the DVD, the burn failed because it ran out of space. Why? Now my DVD is good only for my oversized coffee cup."
  4. The case of the 3-digit mishap: "I submitted an article to my editor. It was several days later during some quiet time that I realized that I had made a mistake with some of the numbers. What was my mistake? I had taken numbers like 1,234MB from Explorer and converted them to 1.23GB by moving the decimal point three places to the left. I knew better, but it's an easy mistake to make."
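The unexpected coaster comes down to units. Explorer's 4,650MB are binary megabytes, but the disc's 4.7GB are decimal gigabytes. The sketch below assumes the nominal 4,700,000,000-byte capacity. Actual recordable space varies slightly by disc format, and file system overhead takes a little more. The function name is made up for this example.

```python
MIB = 1024 ** 2
DVD_NOMINAL_BYTES = 4_700_000_000  # "4.7GB" single-layer DVD, decimal gigabytes

def fits_on_dvd(explorer_mb, capacity=DVD_NOMINAL_BYTES):
    """Explorer's MB are binary; compare the real byte count with the disc capacity."""
    needed = explorer_mb * MIB
    return needed <= capacity, needed

ok, needed = fits_on_dvd(4_650)
print(ok, f"{needed:,}")            # False 4,875,878,400
print(DVD_NOMINAL_BYTES // MIB)     # 4482
```

By this estimate, about 4,482 of Explorer's megabytes fill a nominal 4.7GB disc, not 4,700.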

The first three stories are fictional, but the scenarios are all too real. The last story happened to me when I submitted the "Automate Custom Vista Installs with vLite" article to Mark Kaelin. Fortunately, the article hadn't been published yet, and Mark kindly fixed the numbers without any snide remarks.

There are some workarounds. For The case of the mysterious partition size, multiply the number of gigabytes you want by 24, then add that number of gigabytes multiplied by 1,000. For example, if you want a 30GB partition, enter (24 * 30) + 30,000, or 30720, into the Simple Volume Size in MB text box. The result is the same as multiplying the number of gigabytes by 1,024.
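The partition workaround is just multiplication by 1,024 in disguise. The sketch below shows that the "24 per gigabyte plus 1,000 per gigabyte" rule and a straight multiplication by 1,024 give the same number. It also shows why typing 40000 produced a 39.06GB partition. Both function names are illustrative.

```python
def partition_mb_for(gb):
    """Value to type into 'Simple Volume Size in MB' to get `gb` binary GB."""
    return gb * 1024

def authors_formula(gb):
    # The workaround from the text: 24 MB per gigabyte, plus the size in decimal MB.
    return 24 * gb + gb * 1000

print(partition_mb_for(30))     # 30720
print(authors_formula(30))      # 30720
print(round(40_000 / 1024, 2))  # 39.06, which is why 40000 gave "39.06GB"
```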

The mistake I made in The case of the 3-digit mishap can be avoided by looking at the information pane at the bottom of the Explorer window. In Windows Vista and Windows 7, files 1,000KB to 999,999KB are shown in MBs. Files 1,000,000KB and up are shown in GBs. The same information can be displayed in XP by enabling Status Bar on Explorer's View menu. You can also right-click on the file and select Properties to see the file size in bytes and in MB or GB. A third option is to hover the mouse pointer over the file name to see a pop-up window with the same information.

These four mysteries are no doubt no longer a mystery to you, patient reader. To the average computer user, they are a total mystery. The online forums are filled with stories just like these.

The case for base 10 in Microsoft Windows

As Paul Allen and Bill Gates knew, and IBM soon learned, "He who controls the operating system controls the computer." And so it seems that until Microsoft decides on a solution, the unconventional naming convention and the case of binary byte v. decimal byte will continue.

Civilized humans have been using the decimal system since the fifth century. It wasn't until computers came along, and personal computers in particular, that the binary system came into wide use.

Quick, how many GBs are 2,406.4 binary MBs? The answer, 2.35 binary GBs, isn't easy to determine without a calculator. I have bookmarked this handy Web site that does the calculation for me. The point is that I shouldn't have to use a calculator when the conversion is so easy in base 10.

Quick, how many GBs is 2,406.4 decimal MBs? The answer, 2.4064 decimal GBs, is easily determined. Simply move the decimal point three places to the left, and you have the answer.
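The contrast is easy to see in code. Converting binary megabytes to binary gigabytes means dividing by 1,024. Converting decimal units means dividing by 1,000, which is the same as moving the decimal point three places. The third line repeats my 3-digit mishap: 1,234 binary MB is about 1.205 binary GB, not 1.234.

```python
def binary_mb_to_gb(mb):
    return mb / 1024  # needs a calculator

def decimal_mb_to_gb(mb):
    return mb / 1000  # just move the decimal point three places

print(round(binary_mb_to_gb(2_406.4), 4))   # 2.35
print(round(decimal_mb_to_gb(2_406.4), 4))  # 2.4064
print(round(binary_mb_to_gb(1_234), 3))     # 1.205, not 1.234
```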

Each new version of Windows is touted as a major release that improves productivity. Here is an opportunity for Microsoft to make a real improvement in productivity simply by reporting decimal bytes.

One more point: If Microsoft does change the way that Windows presents bytes to the user, they should also use new decimal prefixes so that there is no confusion with the historical usage of the SI prefixes.

Conclusion

I get a headache every time this topic comes up -- and it comes up more and more often these days. As I mentioned at the beginning of this article, it arose in my last article, "Maximize the Performance of Microsoft Windows and Intel Matrix RAID: Part 2." I seriously considered using the IEC standards and writing MiB and GiB where appropriate. Then sanity returned, and, like Microsoft, I decided against it.

I got a lot of headaches researching and writing this article too. We probably wouldn't be having this discussion were it not for the fact that the hard drive industry decided to use the accurate and true definition of the SI symbols M and G. Shame on them for that!

The modem industry and networking industry also followed a similar model by stating their products' performance in decimal numbers. They inflated the same numbers by using bits instead of bytes. Sneaky bit of marketing there?

But now that the topic has been forced to the light of day, perhaps it is time to look again at how our software reports memory sizes, capacities, and speeds to us. As much as I hate to admit it, the hard drive marketers are right. Decimal bytes are the best way to report hard drive capacities. No doubt their motives weren't pure, as the cynical side of me so loudly says. The "rosy glasses" side of me whispers that they were just ahead of their time.

You may have noticed that I have written memory sizes and the binary/decimal byte values in Table B in total bytes. I have also prefaced short notation bytes with binary or decimal throughout this article. That seemed to be the simplest way to differentiate between 1,073,741,824 binary bytes and 1,000,000,000 decimal bytes when the short notation 1GB was used. You may have found reading all those numbers to be irksome. I can tell you that reading nine zeros is much less tiresome than having to write all of them!

And that is the bottom-line reason why a new standard is needed. It is so much more convenient to write 12.34GB than 12,340,000,000 bytes or 12.34 decimal gigabytes. However, the person reading 12.34GB needs to fully understand, without confusion, exactly how many bytes 12.34GB really is.

Blame it on the hard drive manufacturers or the DVD manufacturers or whoever; it doesn't matter at this point. A solution needs to be found and accepted industry-wide soon. It should have been done years ago.

It is clear that the IEC standard has failed. Other than among some UNIX and Linux devotees, its adoption has been poor. Microsoft's adoption of decimal bytes in all its software would go a long way toward solving the binary-byte and decimal-byte confusion.

I say gigabyte. You say gibibyte. Let's call the whole thing off.


Author's note:

Credit to George Gershwin and Ira Gershwin for the words borrowed from their song "Let's Call the Whole Thing Off."

For the sake of accuracy I would like to include the following notes:

  • 750GB decimal bytes are 698.5GB binary bytes; my Samsung 750GB drives are reported as 698.6GB in Windows Vista. The reason for this discrepancy is that the drive has 1,465,149,167 LBAs * 512 bytes for a total of 750,156,373,504 bytes or 698.637565135956 binary gigabytes.
  • Bytes are typically 8 bits, particularly in personal computing, but can vary depending on the operating system or hardware.
  • For all practical purposes, KB has become the standard symbol in the computing industry for kilobyte. The symbol KB does not follow the SI or metric standard for kilo. The SI standard uses a lowercase "k" to denote kilo, with a capital "K" reserved for the kelvin, the SI unit of temperature.
  • There is no SI standard for bits or bytes. IEEE 1541 recommends the symbol "b" for bits and "B" for bytes. IEC 60027-2 uses the symbol "B" for bytes but defines the symbol "bit" instead of "b" for bits.
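The LBA arithmetic in the first note can be checked directly. This sketch uses the LBA count and the 512-byte sector size given in that note.

```python
SECTOR_BYTES = 512
lba_count = 1_465_149_167  # LBA count reported for my Samsung 750GB drive

total_bytes = lba_count * SECTOR_BYTES
print(f"{total_bytes:,} bytes")                     # 750,156,373,504 bytes
print(f"{total_bytes / 1000 ** 3:.2f} decimal GB")  # 750.16 decimal GB
print(f"{total_bytes / 1024 ** 3:.2f} binary GB")   # 698.64 binary GB
```

The binary figure, 698.64, rounds to the 698.6GB that Windows Vista reports.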

About

Alan Norton began using PCs in 1981, when they were called microcomputers. He has worked at companies like Hughes Aircraft and CSC, where he developed client/server-based applications. Alan is currently semi-retired and starting a new career as a wri...

59 comments
danwat1234

Awesome article. Why Microsoft lies and says a 1000GB hard drive is 931GB in Explorer, when really they mean 931GiB, is retarded. They should fix their error in notation. Then all these hard drive manufacturers will stop having to deal with angry customers.

oldbaritone

Have fun, call it whatever new word they coin - Bottom line, 2^10 ~= 10^3 no matter what you call it, it's not the same. Sure, changing the name of 22/7 to "Pi" might make some student's life easier, but it wouldn't be the same number. And changing the name of 1,024 to 1,000 will merely add confusion to an already-confused convention. (and yes, I still have a couple of tubes of 4116 DRAMS)

oldbaritone

Most Partitioning software supports MB -OR- %. If you have a 30GB Drive, and you want to save 9GB for something else, just set up one partition as 30% and the other as 70% - and you'll get what you expect.

TRster

Read KB as KiloBinary bytes and K as Kilo bytes? Are you kidding? It creates even more mess. Kilo is only multiplier (x1000). It's like "ten" (x10). Ten of what or thousand of what exactly? You have to tell to what unit it applies. If it applies to meters - use Km. To grams - use Kg. To bytes - use KB (the last character B in upper case indicates that it's Bytes and not bits). So, if you make that correction and start using proper abbreviation you have to use KBB (as KiloBinary Byte) and KB (as Kilo Byte). Is it better than in SI standard (respectively) KiB and KB? I don't think so...

helder_figueira

If there was any conclusive proof of the existence of a stupidity gene in the human genome, this must be it. It represents the worst example of human intelligence according to the Theory of Multiple Intelligences: - Logical - Mathematical intelligence (Maths): zero, total inability to identify the difference between 1024 and 1000. - Visual - Kinesthetic Intelligence (Building): zero, inability to understand the simplest of numbering building conventions. - Verbal - Linguistic Intelligence (Reading): zero - plainly incorrect use of a simple word. - Visual - Spacial Intelligence (Art): reasonable effort in creative effort, but no valid use, aesthetic or otherwise. - Music Intelligence: Zero, still no valid use - Intrapersonal Intelligence (Personal Management): Zero - just serves to confuse oneself. - Extrapersonal Intelligence (Working Together): Zero - just serves to confuse others and cause communication problems. - Naturalistic Intelligence (Nature Awareness): Reduces human computer science communication to the level of apes... The most alarming thing though, is that it is found at the heart of our most "perfect" of sciences....lol. Yes, if there is a God, he/she does have a wicked sense of humour. Lets get rid of this idiotic convention... binary kilobyte... really !!! The only honourable mention it really needs is a nomination for the Nobel Prize for Stupidity and Ignorance.

BlueCollarCritic

I've not read thru every post so someone else may have already said something to this effect... I doubt that the drive manufacturers will ever change this labeling of Drive Size until they are forced to do so and by some governing body that can enforce that change. I hate to say it because the government already is too involved in our lives and in business in general but until something like a government body that can penalize a manufacturer for not making a change, gets involved I highly doubt this will change. Why is this the most likely outcome? Because in the end it's all about money and for the drive manufacturers to be honest and truthful in labeling on their products how much usable space a drive has would automatically raise the perceived price of every hard drive. By that I mean that if every hard drive label (both the box labeling and those on the actual drive) were to include "Usable Drive Space" then every drive would automatically cost more $$ per MB/GB because the amount of drive space the user gets for $x will have decreased simply by listing the true useable drive space.

CaptOska

tell me, what else in the computer hardware industry does not report 1024 bytes as 1KB?? floppies conform, thumb drives conform, system memory conform, OS reporting of file sizes conform, and every type of silicon based memory type conforms. Hard drives are the only type of memory that does not conform. I would guess that some marketing whiz kid figured that reporting "1K = 1000 bytes" allowed them to put larger numbers on the drive and charge that couple extra dollars for their product. If the hard drive industry started to conform now, they would admit the ruse. I suggest that if the hard drive industry does not wish to conform to the 1024 bytes = 1KB convention, they use scientific notation to indicate hard drive size. so in hard drive speak, 1000 bytes = 1.0e3 bytes, and 250MB = 2.5e8 bytes.

dav532000

The manufacturers already know that there are 1.024kb in a GB and not 1.000kb, so I see no reason why a 750GB Hard Drive should not be 750GB and not 698.5GB, and as you state the larger the Drive the more the Discrepancy in what Windows sees.

conceptual

Bytes aren't decimal. Microprocessors aren't decimal. Address and data busses aren't decimal. Using decimal notation allows hard drive makers to weasel and waffle about the capacity of their devices. Gigawatts are decimal, Digabytes aren't and never were.

drewvous

Really interesting stuff to learn this, especially the maths behind it.

Gis Bun

Forget about the 250GB. I think the manufacturers should be forced to use the proper conversion of 1024 and none of this [lazy assed] 1000 stuff. On the other hand, I think most computer professionals know already that 750 GB ain't 750 GB but less than that. I'm sure the hard disk manufacturers are pissed off getting comments about the missing disk space just like when an OS is using part of the shared memory for video [i.e. you see 510 MB of memory instead of 512 MB]. But at least they're giving the full 512 MB of memory.

firstaborean

If one uses Power Desk software, the sizes of files are shown in bytes, using decimal numbers, and no conversion is needed. This helps only with the confusion one encounters with Windows Explorer, but any little help is useful, isn't it?

mmorganIBM

Kudos to Alan Norton for an entertaining and yet scholarly treatment of a rather pedestrian topic!

ideason88

This was a mind bender for me - I had to get out my calculator and check out the website calculator you book marked but I finally got it. I'm building a new system today and armed with this new info I should be able to get the size partitions I want. Thanks!

john3347

Would it be terribly difficult or impractical to advertise a 250 GB harddrive as "250 GB (nominal)" and state in the fine print somewhere that actual capacity is 232.x GB? A simple label change would fix the whole issue. The typical harddrive purchaser chooses a harddrive with some amount of "growing room" anyway, and the techs already know the difference. The "nominal" designation should satisfy the lawyers. To the author: This would have been a good time to have included a couple of sentences describing the difference between "B" and "b". This causes much confusion to the non-technical world when they read about a certain Mb/s transfer rate and expect a MB/s transfer rate. ISP's always advertise bits because it makes for a bigger number. Kinda like a grocer labeling a 5 lb. bag of sugar as 80 oz.

dwood

I'm not a Mac Fan, but they have already changed their OS to base 10. See the excerpts below: In Mac OS X v10.6 Snow Leopard, storage capacity is displayed as per product specifications (base 10). A 200 GB drive show 200 GB capacity (for example, if you select the hard drive's icon and choose Get Info from the Finder's File menu, then look at the Capacity line). This means that, for example, if you upgrade from an earlier version of Mac OS X, your drive may show more capacity than in the earlier Mac OS X version. The storage drive in your Apple product, like all storage drives, uses some capacity for formatting, so actual storage available for applications will be less. In addition, other factors, such as pre-installed systems or other software and media, will also use part of the available storage capacity on the drive. In Snow Leopard, Apple has changed this convention and redefined "kilo", "mega", and "giga" in the system software to be their English definitions. As such, drive and file sizes will be calculated to be a slightly larger number than what they currently are. For example, if you had a drive that your old Leopard system said was 100GB, based on the computer notation convention it would actually have the capacity of 107,374,182,400 bytes (100*2^30) and not the inferred 100,000,000,000 bytes (100*10^9) as the prefix "giga" describes in English. With a drive that has exactly 100GB available on it in OS X 10.5 "Leopard", upgrading to Snow Leopard would have the system report the drive as 107.3GB (making it appear to be a full 7 GB larger), even though nothing has been done to the drive itself.

rwbyshe

Why in the world do drive manufacturers have to label the drive in whole GB or TB numbers. Wouldn't the simpler resolution be if manufacturers simply adopted your first "Authors Note" and simply label the product as 695.8GB instead of using the wrong mathematical calculation of 750GB. This appears to be another "metric system fiasco" like we went through years ago. I vote for the KISS standard (keep it simple stupid) and TRUTH IN ADVERTISING. Label the darn drive with its actual capacity. Let's face it, 695.8GB is certainly more honest than 750GB.

toomas.mottus

I was a product manager for a computer retailer back then. I do not remember what manufacturer it was, but it changed numbering so that 1MB was 1000 KB-s. It could report higher capacity for its drive. I suspect it was SanDisk. After that everyone else changed their marketing materials. Before that it was 1KB = 1024B and 1MB = 1024KB. 1GB is still 1024MB in RAM memories.

mail

I note Alan's comment about capital K being used to denote Kelvin. I also appreciate that this is slightly 'off topic'. However, something that seems to be getting some traction in the UK is the use of case. This is especially so with comms speeds where 10KBps will mean 10 kilobytes per second and 10Kbps will mean 10 kilobits per second. Personally, I must admit with memory capacities, I like the idea of 'gigglebytes' & 'tribblebytes'. Something does need to be done about the problem and I think that the IEC/IEEE proposal is probably the best solution. However, I would officially make the current names of megabyte, gigabyte & terabyte as the decimal names. I do this simply because the average person in the street already uses these words and it is too late to change them. It probably isn't too late for pebibyte though.

dogknees

I disagree with the idea of reporting "useable" size as it's basically dishonest. The disk has a certain capacity based on sectors/tracks and sides. This is the true capacity of the disk. The amount available is dependent on the file system and will vary, but that's the OS vendor's problem, not the disk manufacturer's. Once again we're getting advised to call something other than what it is to appease the ignorant. Great!

dogknees

Why don't we just keep things as they are, and educate each group of new users that comes along, the same as we've been doing for 20 years? It's simply a matter of interpreting the K's and M's and so on appropriately in each situation. It's what we've been doing for decades, why change now?

Tony Hopkinson

1 Kb is 1024 bytes, it can be fully addressed with ten bits. The confusion engendered by a decimal number of bytes is a deliberate ploy, not a misunderstanding. No one capable of making a hard drive that works, is unaware of basic digital electronics.

LyleTaylor

Why would a hard drive manufacturer market and sell a 698.5GB hard drive when they can make it sound bigger (and, hence, better) by listing the size in decimal? Personally, I couldn't care less about the abbreviation standard. I think it would make more sense to standardize on whether things are listed using binary or decimal values.

pjboyles

The issue here is that hard drive manufacturers polluted the de facto standard that xB was understood to be binary (1024) in the name of advertising and bragging rights. Older drives reported size on the base 2 standard. (MFM / RLL anyone?) What the International standards committee should have done was codify the de facto standard rather than make up a new one. And ding the hard drive manufacturers. So: xB should be base 2 and bytes; xb should be base 2 and bits. And for modems it was not base 10, they used bits and not bytes in their data transmissions rate standards. xB = xb/8. Thus your 2400 baud (bit/sec) connection is 300 Byte/sec. Now for your next blog you can do the rest of the story for net usable Bytes on a hard drive and net data transmission throughput after error correction and checksums.

QA_In_Vegas

Rather than trying to accommodate the shortcoming, why not just oversize the drives so that once the diminishing return sets in, we have the size that is actually stated on the box? And not for us so much as for the less-techie folks out there who ask the very question you state about "Why?". Then when you get your drive and it says 2TB, you can be spared doing the increasing-value-of-diminishing-returns calculations. Moore's Law states this won't be a financial concern of any consequence soon enough, so hard drive manufacturers can just step up and say "2TB FULL CAPACITY HARD DRIVE" and you know you're getting 2TB. I know I'd shell out a few more bucks to get what it SAYS I'm paying for.

rameeti

Do note that Apple did accomplish the change to the IEEE standard when they released OS X Snow Leopard and the sky did not come falling down. The users did not really even notice much beyond the fact that they appeared to gain hard drive space. Apple now reports a 1 TB drive as having 1 TB of available space. It really isn't all that difficult. The user can do a Get Info and find out exactly how many bytes a file takes on the drive if they need to know.

jherring

Even before computers, when we let the manufacturers TELL US that a piece of lumber 3 1/2 inches wide and 1 1/2 inches thick was a 2x4, we lost it.

Alan Norton

Hello John. I did address this in the very last item under "Author's note". Take a look at the link I provided for the 'full disclosure' disclaimer: http://www.memoryc.com/products/description/128GB_G_Skill_Falcon_SSD_Solid_State_Disk_MLC-64MB_cache-230MB_read_190MB_write_speed_/index.html It gets quite confusing trying to explain to the potential customer that the SSD capacity is measured in decimal bytes and the cache in binary bytes. Not exactly a KISS solution. And then there is the data transfer rate...

Alan Norton

Thank you for that fascinating information. I only wish I had found it before completing the article. I'm not a Mac fan either but only because I haven't used Apple's products. I have a lot of respect for Apple's innovation and plain good common sense. Once again Apple has beaten Microsoft to the punch. I like the use of decimal bytes but I don't like the idea of using the same SI symbols.

wojnar

I think you missed one important item: the original designations were caused by addressability and the binary architecture. Now, the best way is always the simplest: accuracy in the naming conventions will fix all. Ever since my EE days, I have wondered why we would rather 'say' 1M and mean 1024k. Since 64-bit addressability is just as valid as the old 8-bit, let's all just use the correct designations according to measurable results: 1M is exactly 1000k. (And just think, you can upgrade your older hard drives by simply changing to measured capacity in many cases!)

Alan Norton

I wondered when someone would comment on these. You get a gold star for being the first. I created my own naming convention just for fun. All right, they are silly. And then I looked at the names a second time and thought, "why not?" Quark and quasar are amusing terms and why should physicists and astronomers have all the fun? I can just hear Scotty saying it now, "Cap'n I can nae beam you up. We are havin' trouble with the tribblebytes!" "Something does need to be done about the problem and I think that the IEC/IEEE proposal is probably the best solution. However, I would officially make the current names of megabyte, gigabyte & terabyte as the decimal names. I do this simply because the average person in the street already uses these words and it is too late to change them. It probably isn't too late for pebibyte though." It might end up working just as you say.
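For illustration, here is roughly what the IEC/IEEE approach looks like in practice: the same byte count labeled once with SI (decimal) prefixes and once with IEC binary prefixes such as GiB, so the symbols no longer collide. `format_size` is a hypothetical helper, and the one-decimal rounding is an arbitrary choice.

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB"]      # powers of 1000
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]  # powers of 1024


def format_size(num_bytes: int, binary: bool) -> str:
    """Label a byte count with the largest fitting SI or IEC prefix."""
    base = 1024 if binary else 1000
    units = IEC_UNITS if binary else SI_UNITS
    value = float(num_bytes)
    i = 0
    # Step up one prefix at a time while the value is at least one of the next unit.
    while value >= base and i < len(units) - 1:
        value /= base
        i += 1
    return f"{value:.1f} {units[i]}"


drive = 750 * 10 ** 9  # a "750GB" drive as labeled by the manufacturer
print(format_size(drive, binary=False))  # 750.0 GB
print(format_size(drive, binary=True))   # 698.5 GiB
```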

Alan Norton

Hi Tony, I suspect that if you asked IT techs they would agree with you. If you asked the average computer user they would say "What?" :-) But if they fully understood the issue I strongly suspect that they would choose decimal bytes. The question for Microsoft should be which is best for their end consumer.

Alan Norton

Hello Lyle, You are probably right about how hard drive manufacturers report their drive capacities. There is no going back now. The utility of short notation using symbolic prefixes is that it is short. Adding the terms 'decimal' or 'binary' defeats the whole purpose of prefix nomenclature. Other than that I agree that it is not a bad solution.

alan

To paraphrase Star Trek's Dr. McCoy: It's a baud, but not as we know it, Jim!!

Before there were PCs and DOS, there was BAUD. In those innocent days 2400 Baud was also known as 2400 dibits per second. That was restricted to a data + overhead transfer rate of 2400 bits per second maximum. The overhead could be minimal given a synchronous communication protocol on the digital interface, assuming negligible breaks in the flow of data (and bit-stuffing-inserted sync overheads as required).

My experience was only with the asynchronous protocol, which added to each 8-bit chunk of data a single start bit plus 1 or more stop bits. Using a single stop bit, 10 dibits conveyed 8 data bits, or 1 byte, surrounded by 2 overhead bits, so 2400 dibits per second would only carry 240 bytes per second along with 480 overhead bits per second. That is a 20% shortfall against the 300 bytes per second that might have been hoped for.

Actually it was much worse in real-life industrial situations. Data is NOT information. An 8-bit data byte of INFORMATION needs at least two stop bits. Otherwise any slight break in a data link would cause the serial interface to resynchronize upon the next transition from Stop bit level to Start bit level, BUT any data byte which included 0-to-1 bit level transitions would also qualify, and there were data patterns that could be totally and CONTINUOUSLY misunderstood following a disruption with only a single stop bit, whereas a double stop bit could rapidly resync.

I submit that 2400 Baud was typically capable of handling 240 bytes per second of data, or 218 bytes per second of INFORMATION. In my view a 2.4% discrepancy is an irritation, but I have known worse!!

Alan
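Alan's framing arithmetic is easy to reproduce. This sketch assumes, as he does, that the line carries 2400 bits per second and that every character gets one start bit; `async_bytes_per_second` is a hypothetical helper.

```python
def async_bytes_per_second(line_bps: int, data_bits: int = 8,
                           parity_bits: int = 0, stop_bits: int = 1) -> float:
    """Characters per second on an asynchronous serial line."""
    # Each character is framed by one start bit plus its data, parity and stop bits.
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return line_bps / bits_per_char


one_stop = async_bytes_per_second(2400, stop_bits=1)  # 240.0
two_stop = async_bytes_per_second(2400, stop_bits=2)  # about 218.2
print(f"2400 bit/s, 1 stop bit:  {one_stop:.1f} bytes/s")
print(f"2400 bit/s, 2 stop bits: {two_stop:.1f} bytes/s")
print(f"Naive bits/8 estimate:   {2400 / 8:.1f} bytes/s")
```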

Alan Norton

Hi Peter, Thank you for the well-thought-out feedback. "What the International standards committee should have done was codify the de facto standard rather than make up a new one. And ding the hard drive manufacturers." This would certainly have made more sense if they had done so back in 1999 when the IEC standards were released. And now?? "And for modems it was not base 10, they used bits and not bytes in their data transmissions rate standards." Modems do use bits per second, as you say, but they use decimal, not binary, prefixes whenever short notation is used to state data transfer rates: 56Kb/sec = 56,000 bits per second. http://en.wikipedia.org/wiki/Bit_rate#Prefixes http://www.scotsnewsletter.com/best_of/dtrct.htm "Now for your next blog you can do the rest of the story for net usable Bytes on a hard drive and net data transmission throughput after error correction and checksums." Thank you. Every good idea for a new article is gratefully accepted. :-) You are right, though: actual numbers do not match advertised numbers for the very reasons you state. There IS even more to the story.

rwbyshe

This would be treating the symptom and not the disease. The disease is the relative dishonesty and marketing practices that we allow manufacturers to use in presenting the info on their products to us. Simply eliminate all the exceptions that are allowed to the concept of "Truth in Advertising" and make them label the product accurately. That is the simplest solution. Instead we are suddenly trying to educate everyone on how to interpret the false info the manufacturers give us. Let's face it the folks in the Marketing and Sales departments are happy with the status quo because it's easier for them to sell a 750GB hard drive than it is to sell a 698.5GB drive.

Alan Norton

Western Digital was sued over this issue. What was the final resolution? The Western Digital lawsuit settlement required a disclaimer noting the number of bytes in 1GB or 1TB. Their comment about a baker's dozen is rather amusing: http://en.wikipedia.org/wiki/Binary_prefix#Legal_disputes Seagate was also sued over the same issue: http://blogs.zdnet.com/Ou/?p=850 The general practice in the hard drive industry today is to provide a similar disclaimer noting the number of bytes in 1GB or 1TB.

SirWizard

Just for accuracy: the rough-cut 2 x 4 ends up as 1-5/8 x 3-5/8. We don't want to shortchange the standard any worse than it is already.

Tony Hopkinson

didn't plane down from 4x2 to achieve those dimensions. :p

Tony Hopkinson

the units of measure could be Friesian heifers. If you are technical and you don't understand it, you're on the wrong career path. If I'm doing a capacity check for 256 ints, I sure as heck am not thinking 1k plus another 24 bytes. Start down that route and the code to report the size will fall over... Besides, even if you did it with Windows, what about other OSes? As Mentor of Arisia was wont to say, loose and muddy thinking, young Norton...
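Tony's 256-int example is exact only in binary units. The sketch assumes a 4-byte `int`, which is common on current compilers but not guaranteed by the C standard; `buffer_bytes` is a hypothetical helper.

```python
INT_SIZE = 4  # assumption: a 32-bit int, as on most current C compilers


def buffer_bytes(count: int, item_size: int = INT_SIZE) -> int:
    """Bytes needed to hold `count` items of `item_size` bytes each."""
    return count * item_size


size = buffer_bytes(256)
print(size, "bytes =", size // 1024, "KiB exactly")  # 1024 bytes = 1 KiB exactly
print(size / 1000, "decimal kB")                      # 1.024 decimal kB
```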

don.howard

Remember when monitors and TVs were sold on a total diagonal measurement, even though part of that produced no image and was actually under the bezel? It is the same thing with hard drives. Pure marketing numbers. Consumers expect some fudging of the numbers; it is just that ever-expanding capacities compound the error. But you know, if they took that 750 GB drive and labeled it as 700 GB, few would care that there are only 698.6 GB on the system. Then they would still have the round numbers for marketing purposes.

oldbaritone

The typical 9600,n,8,1 transmission lost another 20% of "real" data throughput because of start and stop bits. In this example, each byte sent through the serial line had two additional bits added to the data by the UART, so for every 8 bits of data, there were an additional two bits (20%) of overhead. Those bits were sent through the line, but not really part of the data; they were stripped and discarded upon successful receipt. So the effective data throughput of the line was only 7680 bps. Transmissions with parity were even worse, typically sending 7 data bits with 3 overhead bits (30%) - 9600,e,7,1 - or an effective data rate of 6720 bps. But the vendors wanted people to think it was "fast" - and actually the extra bits WERE being sent, they just weren't worth anything. And boy, did you EVER notice that extra 20% or 30% in the A-J and Novation Cat 103J days! (that's 300 baud to you youngsters...)
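The same start/stop-bit accounting applies to the familiar rate,parity,data,stop notation. A small sketch, assuming any parity letter other than `n` adds one bit and that stop bits are whole numbers (1.5 stop bits are not handled); `effective_data_bps` is a hypothetical helper:

```python
def effective_data_bps(settings: str) -> int:
    """Payload bit rate for a serial setting such as '9600,n,8,1'."""
    fields = settings.split(",")
    rate = int(fields[0])
    parity = fields[1].strip().lower()
    data_bits = int(fields[2])
    stop_bits = int(fields[3])
    parity_bits = 0 if parity == "n" else 1  # e, o, m and s each add one bit
    frame_bits = 1 + data_bits + parity_bits + stop_bits  # the 1 is the start bit
    return rate * data_bits // frame_bits


for s in ("9600,n,8,1", "9600,e,7,1", "300,n,8,1"):
    print(f"{s}: {effective_data_bps(s)} bit/s of data")
```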

QA_In_Vegas

The 750GB drive (to take your example) would, instead of having 698.5GB of usable space, deliver the full 750GB of space. I know I will sound stupid asking this, so I'll be the first to acknowledge my ignorance, but why can't they oversize the drives so that there's (at minimum) 750GB of space (to continue the example) and be honest about it on the box? Okay, so 750GB would really be rounding down (instead of UP!) from what... 762GB or something? You think anyone is going to mind getting extra? Did we before? Isn't there SOME reasonable (and yes, cost-effective) way to go just over the stated capacity and sell it at that size? Again, Moore's Law states it will catch up and pass those numbers in time anyway, so why shouldn't some company be the first to say "2TB! REALLY!" and deliver it? I know which one I'd buy between that and one that just says "2TB"... and for an incremental difference in cost as well... sure... when you need the space, you NEED THE SPACE! :)
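To put a rough number on the rounding question: for Windows to show a given number of binary gigabytes, the drive needs about 7.4% more decimal gigabytes (2^30 / 10^9 is about 1.074). This sketch ignores file system overhead, so a real drive would need a little more; `decimal_gb_needed` is a hypothetical name.

```python
BINARY_GB = 2 ** 30   # bytes in a Windows "GB"
DECIMAL_GB = 10 ** 9  # bytes in a drive-label "GB"


def decimal_gb_needed(windows_gb: float) -> float:
    """Decimal GB a drive must hold so Windows shows `windows_gb` binary GB."""
    return windows_gb * BINARY_GB / DECIMAL_GB


for target in (750, 2000):
    print(f"To show {target} GB in Windows: {decimal_gb_needed(target):.1f} decimal GB")
```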

SirWizard

Consider the broadcasting industry as another prime example where truth is ignored completely. The great unwashed masses accept standard television broadcasts of movies and series that are chopped into fragments, time compressed, excised of content, and squashed to illegibility in places. A higher standard of false "truth" applies to most cable and satellite television premium feeds that claim to show movies, but do not show them in their complete form. Rather, part of the screen gets obliterated by a logo or obnoxious advertising, and the closing credits are overrun with loud bombast about other programming. To anyone who thinks that's okay because it's only a small portion of the broadcast, would any sensible person accept similarly adulterated orange juice sold with 4% of the juice obviously replaced by seawater?! A still higher standard of false "truth" applies to such channels as Independent Film Channel (IFC), which claims on its website to show programming "uncut and uncensored" but censors a portion of every movie with an omnipresent IFC logo and interruptions of other text detritus. Even the exalted Turner Classic Movies (TCM), which shows movies "uncut and commercial free", flashes text commercials for its website during every movie in the form of a "TCM.com" text label. ("TCM" would be a logo because that's an initialism for their name, but "TCM.com" is a commercial website.) The "TCM.com" advertisement persists for the entirety of some short features. It's time for a lawsuit against the broadcasters who sell adulterated products.

QA_In_Vegas

Thanks, Alan. I'll take a read of the articles when time better permits... sounds kinda like reversed logic on the surface.

fnanfne

Manufacturers should sell their products as their consumers would find them. Instead of advertising their hard drive as 1TB, they should advertise it as 931GB instead of just converting 1TB to bytes. Easy peasy, lemon squeezy.

NickNielsen

Back in '82, on the way to a new assignment in the USAF, I went to a class on the primary equipment at my new assignment. The equipment was a high-power radio used primarily for voice transmission, but it was "capable of low-speed data transmission at 300 or 600 baud." As I had just worked two years on a system with a standard data rate of 50 baud (and a max of 75!), I found this statement hilarious.