Professor Ronald Rivest of MIT created the MD5 cryptographic hash function in 1991 to replace the earlier MD4 algorithm. It produces a 128-bit hash value, typically expressed as a 32-digit hexadecimal number. For instance, the MD5 hash of an OpenOffice.org download (v2.3.0 for Win32, English language) looks like this:
beda08800f9505117220b6db1deb453a
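To make the relationship between 128 bits and 32 hexadecimal characters concrete, here is a minimal sketch using Python's standard hashlib module. The helper name md5_hex is a hypothetical illustration. The raw digest is 16 bytes, and each byte is written as two hex digits.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hexadecimal string."""
    return hashlib.md5(data).hexdigest()


raw = hashlib.md5(b"abc").digest()
print(len(raw) * 8)     # 16 bytes -> 128 bits
print(md5_hex(b"abc"))  # each byte becomes two hex digits -> 32 characters
```

The input "abc" is one of the test vectors in RFC 1321, which lists its digest as 900150983cd24fb0d6963f7d28e17f72.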
Since then, MD5 has been specified in RFC 1321, has become a de facto Internet standard, and has come to be used for a great many purposes. While I am not aware of any statistical studies that support or dispute this, I believe the two most common uses are:
- hash comparison for password authentication
- hash comparison to verify file integrity
In either case, the MD5 algorithm is used to generate a hash value from the known good data: the original password in the former case, or the original file in the latter. For password authentication, whenever someone attempting to log in enters a password, the system generates a hash from the entered password and compares it against the stored hash. If they match, the system considers authentication successful.
For file integrity verification, such as when downloading an application installer, an MD5 hash (often called a “checksum”) is frequently provided along with the download. To verify that the file is the original, uncorrupted file you wanted, generate a new hash from the downloaded file and compare it against the MD5 hash provided with the download.
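Both kinds of comparison can be sketched in a few lines of Python using the standard hashlib and hmac modules. The function names below are hypothetical. Plain unsalted MD5 is used only because it mirrors the scheme described above; modern systems store passwords with salted, deliberately slow functions instead.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    # Illustration only: plain unsalted MD5, matching the scheme described above.
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def check_password(entered: str, stored_hash: str) -> bool:
    # Hash the entered password and compare it with the stored hash.
    # hmac.compare_digest takes time independent of where the strings differ.
    return hmac.compare_digest(hash_password(entered), stored_hash.lower())


def file_md5(path: str, chunk_size: int = 65536) -> str:
    # Read the file in chunks so a large installer need not fit in memory.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, published_checksum: str) -> bool:
    # Compare a freshly generated hash with the checksum published alongside the download.
    return file_md5(path) == published_checksum.strip().lower()


stored = hash_password("correct horse")
print(check_password("correct horse", stored))  # True
print(check_password("wrong guess", stored))    # False
```

The published checksum is normalized to lowercase and stripped of whitespace because checksums copied from a web page or a text file often carry a trailing newline or uppercase digits.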
There are at least two good reasons to verify the integrity of a software download, for example with an MD5 hash:
- The file may have been corrupted during download, such as by an interrupted transfer or an unreliable network connection.
- Someone may have tampered with the download, so that you get a modified or different file that can compromise the security of your computer when executed.
If you use the software management system of most open source Unix-like OSes, such as portupgrade for FreeBSD or APT for Debian GNU/Linux, it should handle hash comparisons for you automatically, behind the scenes. That is one of the benefits of a modern software management system: it simplifies the end user’s part in making software installation as secure as it reasonably can be.
If you are a developer, an alpha tester, or a user of an OS that does not provide this sort of protection for most software installations, you may need to install software that is not handled by a software management system. In such cases, it is still (at least usually) a good idea to verify hashes to make sure you are getting exactly what you expect.
The OpenOffice.org website provides some instructions on how you can verify MD5 hashes on a variety of platforms. As of this writing, it provides instructions for verification using the MD5 Hash Tool extension for the Firefox browser regardless of OS, the digestIT tool for MS Windows, and the md5sum command line tool for Linux systems.
BSD Unix systems such as FreeBSD provide a command simply called md5, which works in much the same way as md5sum on Linux systems such as Debian GNU/Linux. The following example generates an MD5 hash from a file called “test.txt”, where > is the shell’s command prompt:
> md5 test.txt
MD5 (test.txt) = d76b04fbbf392f6917e119bedf78d2ef
As you can see by comparing this with the OpenOffice.org Using MD5 Checksums page, the FreeBSD md5 utility can be used the same way as the Linux md5sum utility. The only difference is the format of the output: md5 prints MD5 (filename) = hash, while md5sum prints the hash followed by the filename.
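Because the two tools differ only in how they print the result, a short Python sketch can produce either layout from the same digest and pull the hash back out of a line in either format. The function names and regular expressions are hypothetical illustrations, not part of either tool. The md5sum pattern assumes its usual layout of hash, a space, a space or asterisk (text or binary mode), then the filename, and it assumes ordinary filenames.

```python
import hashlib
import re


def file_md5(path: str) -> str:
    # Hash the file in chunks, as both md5 and md5sum do for large files.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def bsd_style(path: str) -> str:
    # FreeBSD md5 layout: MD5 (test.txt) = <hash>
    return f"MD5 ({path}) = {file_md5(path)}"


def gnu_style(path: str) -> str:
    # GNU md5sum default (text mode) layout: <hash>, two spaces, filename
    return f"{file_md5(path)}  {path}"


BSD_LINE = re.compile(r"^MD5 \((?P<name>.+)\) = (?P<hash>[0-9a-fA-F]{32})$")
GNU_LINE = re.compile(r"^(?P<hash>[0-9a-fA-F]{32}) [ *](?P<name>.+)$")


def parse_line(line: str):
    # Return (filename, lowercase hash) for a line in either format, or None.
    for pattern in (BSD_LINE, GNU_LINE):
        match = pattern.match(line.strip())
        if match:
            return match.group("name"), match.group("hash").lower()
    return None


print(parse_line("MD5 (test.txt) = d76b04fbbf392f6917e119bedf78d2ef"))
print(parse_line("d76b04fbbf392f6917e119bedf78d2ef  test.txt"))
```

Both calls print the same ('test.txt', 'd76b04fbbf392f6917e119bedf78d2ef') tuple, which is the point: the hash itself is identical, and only the surrounding text differs.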
While MD5 is no longer considered a strong cryptographic hash function (practical collision attacks against it have been demonstrated), it is still generally useful for verifying file integrity when downloading software. Because so many open source software development projects publish MD5 hashes for verification, it is a good idea to learn how to use them and to keep an MD5 hash generating tool handy in case you ever need to go outside a secure software management system when installing software.