Data Management

Always check for whitespace

Last month Justin talked about his quest for fundamental programming theory resources. One fundamental I keep running into is people who forget to check for whitespace — or don't even know how to check for whitespace.

I can't tell you the number of times I've seen some poor engineer spend countless hours trying to debug a problem that came down to extraneous whitespace. I don't care what programming language you're using, the string " hello " will not equal the string "hello". The little extra whitespace before and after doesn't affect the readability to us humans but is generally fatal to string comparison operators.

For some reason, the biggest culprit I've found is extra whitespace stored in a database string. I think it happens more often there because when you query a table in a tool like TOAD, the result comes back in a neat little spreadsheet-type display. It's difficult to tell at a glance that you might have some extra trailing whitespace on some of the data.

The simplest trick I've found (which I use pretty much everywhere) is to wrap the string in some special characters like square brackets; that way, I can see when there's extra whitespace. Now I am primarily an Oracle developer these days, so the syntax for that might look something like this:

select '[' || keyword || ']' from keyword_table;

This wraps square brackets around every keyword so I can tell at a glance that there's extra space. I assume there's similar syntax in other variants of SQL.

If the strings don't have internal whitespace, you could also just search for anything that has a space in it — though this method only catches regular spaces and would miss stuff like tabs. Plus, it could be really slow since it might result in a straight old-fashioned table scan:

select keyword from keyword_table where keyword like '% %';

Whenever some engineer comes and asks me to help debug why their system isn't logging into the database correctly, the first thing I do is check for whitespace in the login credentials they're using. (The second, of course, is to check the credentials themselves.) I've found that erroneous whitespace is the culprit more often than you might think. 

Editor's Picks