Software Development

Use cryptographic hashes for validation

You can use cryptographic hash functions to provide a little more security when exchanging files.

Much of the software that helps us maintain secure computing environments depends on cryptographic hash functions. Many well-known cryptographic hash functions are designed around a block-cipher-like algorithm that, given an input string, produces an output string that is effectively unique to that input. For instance, if you input the string "Keep it simple!" and get "a" as your output, this is only really useful if nobody can find another input string that also produces "a".

The output of a cryptographic hash function has a fixed length. Whether the input string is three characters long or three million, the output string (known as a "hash" or "checksum") is always the same length. Because there are far more possible inputs than outputs, some inputs must share an output. What matters is that finding two such inputs should be computationally infeasible, and that changing even a single character of the input should produce a completely different hash. Furthermore, it should be effectively impossible to predict the hash of a given string without actually computing it.
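To make the fixed-length and single-character properties concrete, here is a small sketch using Python's standard `hashlib` module with SHA-256. SHA-256 is chosen here only for illustration, and the helper name `sha256_hex` is hypothetical.

```python
import hashlib

# Hypothetical helper for illustration; not part of any library API.
def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short = sha256_hex("abc")
long_digest = sha256_hex("x" * 3_000_000)  # three million characters of input
print(len(short), len(long_digest))  # 64 64

# Changing one character of the input yields an unrelated-looking digest.
a = sha256_hex("Keep it simple!")
b = sha256_hex("Keep it simple?")
print(a)
print(b)
differing = sum(1 for x, y in zip(a, b) if x != y)
print(f"{differing} of 64 hex digits differ")
```

The exact count of differing digits varies with the inputs, but for a well-designed hash it should look like what you would expect from two random strings.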

These cryptographic hash functions serve myriad purposes. Password authentication systems, for instance, often store only a hash of each password, so they can validate a user's input by hashing it and comparing the result without ever handling a stored plaintext password. Hash functions are also commonly used to check quickly and easily that a software download wasn't corrupted in transit or compromised by a malicious security cracker before it got to you. When you digitally sign an email with an OpenPGP tool, it is typically the hash of the message that actually gets signed, for performance and other reasons.
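A minimal sketch of the password case follows, assuming Python's standard library. Note that it swaps in PBKDF2 (`hashlib.pbkdf2_hmac`), a salted, deliberately slow construction built on SHA-256, rather than a single plain hash, because one fast hash of a password is cheap to brute-force. The function names and the iteration count are illustrative assumptions, not a prescribed configuration.

```python
import hashlib
import hmac
import os

# Illustrative work factor (an assumption); tune it for your own hardware.
ITERATIONS = 600_000

# hash_password and check_password are hypothetical helper names.
def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store instead of the plaintext password."""
    salt = os.urandom(16)  # a random per-user salt defeats precomputed tables
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key from the user's input and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_key)

salt, key = hash_password("correct horse")
print(check_password("correct horse", salt, key))  # True
print(check_password("wrong guess", salt, key))    # False
```

Only the salt and derived key are stored; the plaintext password exists just long enough to be hashed.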

The most widely used and well-known cryptographic hash functions are probably MD5 and SHA-1. Significant weaknesses in both MD5 and SHA-0 were demonstrated in 2004 and 2005. SHA-1 is a strengthened revision of SHA-0, but because the two are so closely related, the SHA-0 weakness suggests a weakness in SHA-1 as well. Even so, many people still consider MD5 and SHA-1 strong enough, and they are so ubiquitously available that they remain in very common use. More security-conscious parties try to stick to stronger algorithms such as SHA-256, which so far appears to be free of the weaknesses that plague SHA-0 and SHA-1.

For some low-priority, relatively uncritical purposes, there is no problem with using MD5 or SHA-1 to generate cryptographic hashes, also known as "digests". Most Unix-like operating systems include utilities for generating and comparing MD5 and SHA-1 hashes in their default core toolsets, and similar utilities are available for other systems such as Apple Mac OS X and even Microsoft Windows. Most high-level programming languages, including Perl, PHP, Python, and Ruby, also include such functionality in their standard libraries.
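As an illustration of that standard-library support, Python's `hashlib` can compute all three digests discussed in this article and report their sizes. The sample input is arbitrary; for the same bytes, the hex values match what command-line tools such as GNU coreutils' `md5sum`, `sha1sum`, and `sha256sum` report.

```python
import hashlib

data = b"Keep it simple!"  # arbitrary sample input
sizes = {}

for name in ("md5", "sha1", "sha256"):
    h = hashlib.new(name, data)
    sizes[name] = h.digest_size * 8  # digest_size is reported in bytes
    print(f"{name:>6}: {sizes[name]:3d} bits  {h.hexdigest()}")
```

MD5 yields 128 bits, SHA-1 160 bits, and SHA-256 256 bits. A longer digest is not the whole story, since the MD5 and SHA-1 weaknesses are structural, but it does raise the cost of brute-force collision searches.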

Additional libraries bring stronger cryptographic hash functions to all of the more common modern programming languages: not only the dynamic languages mentioned above, but also lower-level and statically typed languages such as C, C++, Java, and C# (and other .NET languages).

When downloading software for use on your system, you should favor stronger cryptographic hash algorithms for checking that the downloaded files are unmodified and uncorrupted. If at all possible, refuse to use software unless a checksum is available for it and you can get your hands on the hashing utility needed to verify that checksum.
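The check described above might look like the following sketch, again in Python with `hashlib`. It assumes the publisher gives you a hex-encoded SHA-256 checksum. `file_digest_hex` and `verify_download` are hypothetical helper names, and the temporary file stands in for a real download. Python 3.11 and later also provide `hashlib.file_digest()` for the chunked-reading part.

```python
import hashlib
import os
import tempfile

# Hypothetical helpers for illustration; not a library API.
def file_digest_hex(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in fixed-size chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path, expected_hex, algorithm="sha256"):
    """Return True only if the file's digest matches the published checksum."""
    # Published checksums vary in letter case and may carry stray whitespace.
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()

# Usage: a throwaway file stands in for a real download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
published = "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD"
ok = verify_download(tmp.name, published)
tampered = verify_download(tmp.name, "0" * 64)
os.remove(tmp.name)
print(ok, tampered)  # True False
```

Any mismatch, whether from corruption or tampering, makes the function return False, which is the signal to discard the download.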

More importantly, if you offer software and other files that may be subject to corruption or compromise as downloads, there is no excuse for failing to provide a good checksum generated with a common cryptographic hash utility. A simple Perl or Ruby script, for instance, doesn't need such validation as much as a compiled C program does, because it consists of human-readable plain text that recipients can inspect for themselves. Larger, more complex files and file formats, such as executable binaries, jar files, and OpenOffice.org word processor documents, should always be checksummed for the protection of the people who receive them.
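For the publishing side, here is a sketch of generating a checksum file to distribute alongside your downloads. It writes lines in the `<hex digest>  <file name>` text layout used by GNU `sha256sum`, so recipients with coreutils can verify with `sha256sum -c SHA256SUMS` from the same directory. The manifest name `SHA256SUMS`, the archive name `release.tar.gz`, and the helper functions are illustrative assumptions.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical helpers for illustration; not a library API.
def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_checksum_manifest(files, manifest_path):
    """Write '<hex digest>  <file name>' lines, the text-mode layout sha256sum uses."""
    lines = [f"{sha256_of_file(p)}  {Path(p).name}" for p in files]
    Path(manifest_path).write_text("\n".join(lines) + "\n")
    return lines

# Usage: in a scratch directory, checksum a hypothetical release archive.
with tempfile.TemporaryDirectory() as d:
    release = Path(d) / "release.tar.gz"
    release.write_bytes(b"abc")
    manifest = Path(d) / "SHA256SUMS"
    lines = write_checksum_manifest([release], manifest)
    manifest_text = manifest.read_text()
print(manifest_text, end="")
```

Because the manifest records bare file names without directories, it belongs in the same directory as the files it describes.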

Cryptographic hash functions won't ensure that downloaded software is safe to use, of course. They will, however, help ensure that what you received is exactly what its creator published, so that it is as trustworthy as the person who created the file and the hash.

About

Chad Perrin is an IT consultant, developer, and freelance professional writer. He holds both Microsoft and CompTIA certifications and is a graduate of two IT industry trade schools.

apotheon

Since today is Thanksgiving, and almost nobody in the US (which I assume is home to most of my readership) will want to divert too much blood from their stomachs to their brains, I decided to keep today's article pretty simple. Next week, maybe I'll take a whack at explaining how one might use cryptographic hash functions to develop simple security-related software.

apotheon

MD5 certainly is far from perfect, but it's better than nothing -- and since I'm talking about validating via whatever validation mechanism others make available, among other things, I figured it'd be remiss of me to fail to mention it. In my follow-up article, directed more at programmers who might use cryptographic hash libraries, I might mention the desirability of using something other than MD5 in particular.