Data Centers

Verify integrity with md5deep

Almost everyone who has downloaded software has no doubt come across MD5 sums used to verify the integrity of a package or program. For those of you who haven't used them, the MD5 signature is a 32-character hexadecimal number that is usually displayed next to a filename. While it's theoretically possible for two different files to produce the same hash, it is almost impossible for a file to be modified and have its MD5 signature remain unchanged. This makes crosschecking MD5 signatures a great way to verify a file's authenticity. If you download an executable file from a third-party website and its MD5 signature doesn't match the one provided by the developer, don't run it!

It's very easy to generate an MD5 signature of a file using a small program called md5sum, which is available for both Windows and Linux.
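As a quick illustration of the 32-character signature and of how sensitive it is to change, the sketch below hashes two strings that differ by a single letter. It assumes a GNU-style md5sum, which reads standard input when no file is named; the strings and variable names are arbitrary examples.

```shell
# Hash two inputs that differ only in the case of the first letter.
# md5sum reads standard input when no file name is given.
sum1=$(printf 'hello' | md5sum | cut -d ' ' -f 1)
sum2=$(printf 'Hello' | md5sum | cut -d ' ' -f 1)

echo "$sum1"     # 5d41402abc4b2a76b9719d911017c592
echo "$sum2"     # a completely different 32-character value
echo "${#sum1}"  # length of the signature in characters: 32
```

Changing one letter of the input leaves no visible relationship between the two signatures, which is why a mismatch is such a reliable sign that a file has changed.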

Usage is simple. To create an MD5 signature for a file:

# md5sum filename > md5sums.md5

Inside md5sums.md5 you will see:

76c6dafd6569222312357fdfdbace3e5  filename

To check the MD5 signature against the file:

# md5sum -c md5sums.md5

filename: OK
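For comparison, here is a sketch of what a failed check looks like. It creates a throwaway file, records its signature, alters the file, and checks again. The names demo.txt and demo.md5 are placeholders invented for this example, and the output format described in the comments is that of GNU md5sum.

```shell
# Work in a temporary directory so nothing real is touched.
cd "$(mktemp -d)"

printf 'original contents\n' > demo.txt
md5sum demo.txt > demo.md5

md5sum -c demo.md5           # prints "demo.txt: OK"

# Simulate tampering by appending one line.
printf 'extra line\n' >> demo.txt

# The check now prints "demo.txt: FAILED" and exits with a non-zero status.
if md5sum -c demo.md5 2>/dev/null; then
    result=unchanged
else
    result=modified
fi
echo "$result"               # modified
```

Because md5sum -c signals a mismatch through its exit status as well as its output, the check is easy to drop into a script that refuses to continue when a file has changed.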

This is all very useful for one or two files, but md5sum does not descend into subdirectories on its own, so in practice its scope is limited to the files you name in one directory. Clever scripting could of course overcome this limitation, but if somebody else has already done the hard work, then why bother? md5deep is a neat little open source tool that can be run in recursive mode, allowing it to traverse entire file systems and generate a report including every file in every directory.
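To give an idea of the scripting that md5deep saves you, here is a rough equivalent built from find and GNU md5sum. It is only a sketch: the directory tree and file names are made up for the demonstration, and file names containing newlines are not handled.

```shell
# Build a small throwaway tree to hash.
top=$(mktemp -d)
mkdir -p "$top/data/sub/deeper"
printf 'one\n'   > "$top/data/a.txt"
printf 'two\n'   > "$top/data/sub/b.txt"
printf 'three\n' > "$top/data/sub/deeper/c.txt"
cd "$top"

# Hash every regular file below data/, at any depth.
find data -type f -exec md5sum {} + > data.md5

# Later: change one file, then verify the whole tree.
printf 'changed\n' > data/sub/b.txt

# --quiet prints only the files that fail. md5sum exits non-zero
# when any file fails, which is expected here.
md5sum -c --quiet data.md5 > report.txt 2>/dev/null || true
failed=$(cut -d ':' -f 1 report.txt)
echo "$failed"               # data/sub/b.txt
```

This works, but it is already a fair amount of plumbing for a job that md5deep does with a single option.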

There are a few more optional arguments for md5deep. These enable you to include file size data in the output or choose between different match options during verification.

To generate a list of file signatures for a directory and all subdirectories:

# md5deep -r /data0 > data0md5.sum

And then to check integrity later on:

# md5deep -rx data0md5.sum /data0

This will output a list of files whose MD5 signatures do not appear in the provided data file.

md5deep is available for almost all platforms, including Windows, Linux and BSD.


We are all aware of "hash collisions" with MD5 and other algorithms. I, however, have a different question that I have not yet seen posed. What if, instead of tracking just 2 coordinates (file_name, file_hash), the world were to track 3 coordinates (file_name, file_hash, file_length)? Would all these expressions of "The Sky is Falling!" go away? In summary, what is the technical commentary of the mathematicians who otherwise caution about hash collisions when you suggest comparing the 3 coordinates I described (not just the 2)? Mathematically, does additionally tracking/publishing the length of the file greatly reduce the threat of hash collisions?


Thanks. Great way to know a file has *not* changed or to know when it did. VMS (1980s) used to keep versions, storing not the whole file but a base plus the commands to change from one version to the next. Hence you also had an edit history. Sincerely, another old dinosaur us PCs :o)


For those with a GUI affliction, you can use a product called MD5Summer on Windows. You can find it at


MD5 is broken. Unix communities now use sha1sum in place of md5sum.