The safest way to sanitize input: avoid having to do it at all

Sanitizing user input is a critical part of secure software development, but software can be made more secure by avoiding having to sanitize input altogether.

Last weekend, I finished some work on a small development project for a new client. Thanks to that, I found myself in a frame of mind that gave me the urge to write more code. I turned that to good use by finally getting to work on the long neglected task of writing a new contact page back end for my professional Website. I had a contact page there, of course, but it was essentially an ugly hack of a contact page back end I had written in PHP for a completely different Website a few years ago. Worse, it bore no resemblance to the rest of the site, and I had not bothered to give it a navigation element (i.e., what users tend to call a "menu").

The back end for this contact page would be written in Ruby, of course. I wrote out a rather pretty script, if I do say so myself, that made use of the TMail library, a tool that abstracts away a lot of the behind-the-scenes drudgery of specifying email headers and content and preparing it for transmission using SMTP. I wrote it such that it would work equally well from the browser and the command line. Then, I tested it.

It worked brilliantly on my laptop when I executed it from the shell. It worked brilliantly on the server when I executed it from the shell, too. It failed utterly when I tried entering the URL for it in the browser. I spent entirely too long beating my head against the intractable problem of getting it working from the browser. I also tried RubyMail, an alternative to TMail, and ran into the same problems. As it turns out, someone saw fit to change some configuration option on the Webserver so that installing and using gems -- that's the term for Ruby libraries and utilities packaged up for use through Ruby's own software management system -- no longer works. This is the sort of thing that makes me think I should get myself a virtual server account for my professional Website, but it's difficult to justify spending more than required by a shared hosting account considering the rather minimal technical requirements I have for the site.

I won't get into the sordid details of how exactly Ruby gems no longer install and work properly on my shared hosting account. After a day plus of finding out that no amount of finesse, dirty hackery, or pleading with the server on my hands and knees would do any good, I gave up. Ruby, like many high level dynamic languages that are either interpreted or JIT compiled, provides extremely easy to use functionality for accessing other programs through outside shell processes. For instance, if you want to access the mail command from within Ruby, you can just wrap the command in backticks or execute it by way of the Kernel#system or Kernel#exec method, each of which provides subtly (but importantly) different functionality.
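The difference between the three is easiest to see side by side. This is a minimal sketch, assuming a Unix-like system where the echo, true, and false commands are available; the variable names are only illustrative.

```ruby
# Backticks run the command and return whatever it wrote to
# standard output, as a String.
output = `echo hello`

# Kernel#system runs the command, lets it write to this process's own
# standard output, and returns true or false depending on the exit
# status (nil if the command could not be run at all).
ok     = system("true")
failed = system("false")

# Kernel#exec replaces the running Ruby process with the command, so
# nothing after it would execute. It is left commented out for that reason.
# exec("echo goodbye")
```

Backticks are the natural choice when the program's output matters, system when only success or failure does.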

It's surprisingly easy to send a command to the shell via one of these methods. I chose backticks. I started out writing something like this for the actual mail sending code:

`echo "#{body}" | mail -s "#{subject}" #{recipient}`

In that one line of code, I had a way to cram the body of a message (stored in the body variable) into an email that would arrive in my inbox with a subject that had been stored in the subject variable. My email address (since this was for a contact page) was specified by the recipient variable, since the point was to create a way for people to send emails to me without having to make that email address available to the world at large (and to make contact page emails stand out in my inbox). It was all quick, easy, and neat. Now comes the messy part: because the body of the email contained user supplied input, I needed to sanitize it so malicious input wouldn't result in a vulnerability in my code that, oh, deleted everything in my directory on the server, for instance.
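To see why, here is a harmless sketch of the same interpolation pattern. The body value is a hypothetical hostile submission, and cat stands in for mail so nothing is actually sent; a Unix-like shell is assumed.

```ruby
# Hypothetical form input: it closes the opening quote, runs its own
# command, then reopens a quote so the rest of the line still parses.
body = %q{hi"; echo INJECTED; echo "}

# Same shape as the mail line above, with `cat` standing in for `mail`.
command = %Q{echo "#{body}" | cat}

# The shell now sees three commands instead of one:
#   echo "hi"; echo INJECTED; echo "" | cat
output = `#{command}`
```

Swap echo INJECTED for something destructive and the problem is obvious: the shell cannot tell which parts of the line came from me and which came from the user.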

I ran around in circles on that for a little while, soliciting some outside help with figuring out how to make sure all the holes were plugged in the input sanitization code (thanks, Sterling and ruby-talk). It was a lot of work, but I was making progress. Then, of course, someone on ruby-talk (thanks, James Gray) pointed out the obvious -- I shouldn't send the content of the body variable to the shell at all. What I should do instead is open a pipe to the mail command as an IO stream and just write to it like any other file handle.

The new code, then, ended up looking something like this:

open( %Q{| mail -s "#{subject}" #{recipient} }, 'w' ) do |msg|
  msg << body
end

Perhaps even more important, in terms of handling user input, is the fact that the `subject` and `recipient` variables are set by me, and not by user input.

As a result of taking this approach with both the content and headers of emails handled by this contact page, the need for me to sanitize input before I sent it to the shell simply evaporated. I was no longer sending any user input to the shell with my own code, at all.
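The same idea can be pushed one step further. What follows is not the code from the contact page, just a sketch of a related technique: IO.popen accepts the command as an array, in which case Ruby starts the program directly and no shell parses the arguments at all. The pipe_to helper is a hypothetical name, and cat stands in for mail so the example runs without a configured mailer.

```ruby
# Hypothetical helper: writes `body` to a command's standard input and
# returns whatever the command prints. Because `argv` is an array, Ruby
# runs the program directly, with no shell in between.
def pipe_to(argv, body)
  IO.popen(argv, "r+") do |io|
    io << body
    io.close_write   # signal end of input so the command can finish
    io.read
  end
end

# With the real mail command, the call might look like this:
#   pipe_to(["mail", "-s", subject, recipient], body)
# Here `cat -` simply echoes its standard input back.
hostile = %q{hi"; echo INJECTED; echo "}
echoed  = pipe_to(["cat", "-"], hostile)
```

The hostile text comes back exactly as it went in, because nothing ever tried to interpret it. Newer Ruby releases have also deprecated the leading-pipe form of open, so IO.popen is the more future-proof spelling.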

The Moral of the Story

I was reminded, in a forehead slapping moment, of one of the cardinal laws of input handling. The first thought that occurs to most people when they think about security and accepting arbitrary input from unknown users -- such as on a Website contact form -- is that all input must be sanitized. That's really a secondary rule, though.

One might get a little closer, in concept, to the first law of sanitizing data by observing the rule that one should not reinvent wheels, when it comes to security especially, without very good reason. In other words, if you have to sanitize data, use someone else's well tested code to do it if you can, because code written from scratch is prone to error until it has been thoroughly tested. I would have done just that, if I had found something in the core language or standard library for Ruby that would do the kind of input sanitizing I actually needed. Alas, what I needed was not something like URL escaping, which the standard library provides via the CGI module.
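For comparison, current Ruby versions do ship a shell-escaping helper, Shellwords.escape, in the standard library. The sketch below assumes a Unix-like shell and uses a hypothetical hostile string, with printf standing in for whatever command would receive the value. It is an illustration of leaning on tested code, not what the contact page used.

```ruby
require "shellwords"

# Hypothetical hostile input, as might arrive from a form field.
hostile = %q{hi"; echo INJECTED; echo "}

# Shellwords.escape backslash-escapes every character the shell would
# otherwise treat specially, so the value stays a single argument.
escaped = Shellwords.escape(hostile)

# printf receives the original text as one argument and prints it back.
round_trip = `printf %s #{escaped}`
```

Even so, escaping only helps if it is applied every single time a value reaches the shell.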

The actual first rule of thumb for input sanitizing is in some ways much more obvious than the preceding two guidelines, but simultaneously far less well observed. I, myself, forgot it until a helpful soul on the ruby-talk mailing list reminded me (in a roundabout way) that:

It's safer to write code that doesn't require input sanitizing than to try to sanitize it.

It's not a lesson I'll forget again, any time soon. I'm just glad I didn't learn it the hard way -- by writing broken input sanitizing code and suffering the consequences when I exposed it to the Internet.


Chad Perrin is an IT consultant, developer, and freelance professional writer. He holds both Microsoft and CompTIA certifications and is a graduate of two IT industry trade schools.


Very Interesting... and simple... and secure :) That is the best way to have it, IMHO, quick, simple, AND secure. Keep it simple enough, like your example, and it helps to minimize the chances of "hey, that's MY code you thief". That is the other edge of using someone else's code instead of re-inventing the wheel, and a secondary reason to examine the given code and its licences.


Not only does it tell me another interesting thing about writing code (I'm just a spectator), but I think the central concept here may be abstracted and put to use elsewhere. Being a spectator, I also found these interesting, although only one of the links is remotely related to this article. Off-topic, one might say.

Tokeneer case study serves as an example of writing low-defect highly-reliable code, researchers claim

Secure OS Gets Highest NSA Rating, Goes Commercial - DarkReading


That was my thought, too. I couldn't figure out why, in retrospect, that didn't occur to me in the first place.


This story is an example of the art in the (wannabe) science of programming. With "newer" languages there are so many features that there usually is more than 1 (or 2 or 3 or ...) way of getting tasks done. Depending on the specifics of the situation the "best" one may vary, and as shown in this case, one may even be "wrong". I can hardly wait until the "scientific" approach of programming catches up with us "artistic" (...???autistic???... ;-) ) types. The Tokeneer case may be an example of the scientific approach. It certainly is interesting reading. The only other "Zero Defect" coding project I've heard of was the code written for NASA Apollo and Shuttle projects. I also heard that with all of the checking, it cost approx $1,000 (1960's dollars) PER LINE. PS: quoting the 39 lines per day "overall" is not a "fair" measure. On the other hand, 203 lines, better than 3 pages, per day during coding phase is pretty darn good.


I meant to write an article about the Tokeneer thing a while ago, and completely forgot about it. I'll have to look into that and decide whether it's still worth writing -- and, if so, you'll get a look at (some of) my thoughts on it.


I haven't read a lot about it yet, but I'd definitely be interested in reading your thoughts on Tokeneer. (Or your thoughts on Green Hills' Integrity, or the Open Group's SKPP standard.)
