Validating form input in Perl with CGI::Validate

Input validation is a necessary safeguard for the integrity of your application's database. If you use Perl, this task is made simpler by the CGI::Validate module, which provides a set of built-in checks to verify the data entered by the user.

Consider a simple Web form that asks users for their age and e-mail address, then saves the data into a database when they submit the form. Most genuine users will enter correct data in these fields, it will go to your database, and everything will be hunky-dory. But a single malicious user or prankster might enter a string into the age field, and an incomplete e-mail address into the address field. If you don't have a data validation routine guarding the entrance to your database, this incorrect data might get saved and cause you all manner of heartache later (can't you just see MySQL spitting out "illegal data type" errors?).

Getting started

If you've used Getopt::Long before, you know it can be used to read and validate command-line arguments to a Perl script. The CGI::Validate module brings this capability to the Web, combining the parsing and validation routines of Getopt::Long with the methods in the CGI module.

You can set rules for each form field, defining its data type and whether it is required or optional. Values that do not match these predefined rules are flagged and placed in an exception list, which can be used to display an error page.

Detailed installation instructions are provided in the download archive, but by far the simplest way to install the module is to use the CPAN shell, as follows:

shell> perl -MCPAN -e shell
cpan> install CGI::Validate

If you use the CPAN shell, dependencies will be automatically downloaded for you (unless for some strange reason you've set your shell to ignore dependencies). If you manually download and install the package, you may need to download and install all required dependencies before CGI::Validate can be installed.

A simple example

Once you have the package installed, create the following sample form in your sandbox area under your Web server's document root:


<form action="/cgi-bin/validate.cgi" method="post">
Enter your name <input type="text" name="name">,
age <input type="text" name="age" size="2" maxlength="2"> and
email address <input type="text" name="email">
<input type="submit" value="Validate">
</form>


Notice that it includes three separate data types: string, integer and e-mail address (this last one is not a traditional data type, but because it's one of the most commonly tested patterns on the Web, CGI::Validate treats it as though it were a distinct type).

Suppose we want to make all three fields required. If the form is submitted with any of its fields empty, or with any field containing an invalid datatype, form processing should stop and an error message should be generated asking the user to correct the bad values.

Listing A is the Perl script that implements these requirements through CGI::Validate. Remember to save the script under your Web server's cgi-bin directory and to give it execute permission, or else it won't run.

This script is somewhat involved, so let's take it step by step. Right at the top, I've imported the CGI::Validate module and (since it inherits from the base CGI module) created a CGI() object. This CGI() object provides built-in methods to send the HTML page header and initial HTML declarations.

Once the basic page framework is ready, you invoke the GetFormData() method to validate the data entered into the form. This method forms the core of CGI::Validate's functionality, and it accepts a hash of form variable names and type identifiers. This hash lays down the basic validation rules for the form data.

Each key of the hash consists of a form variable name, followed by a specifier. The specifier uses an equality symbol (=) to indicate that the field is required, or a colon (:) to indicate that it is optional.

Following this comes a type specifier, which can be any one of six possible values: "s" (string), "i" (integer), "f" (float), "w" (word), "e" (e-mail address) or "x" (user-defined type).

The corresponding hash value is a reference to a Perl variable that will hold the input value for that field. This variable can be used to access the value further along in the script. Thus, the key-value pair

("pin=i" => \$pin)

would imply that the form variable "pin" is required, must be an integer value, and will be assigned to the Perl scalar $pin.

Note: In the absence of any type rules, CGI::Validate assumes ":s"—that the value is an optional string.
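To make the key format concrete, here is a small standalone sketch that decodes a specification string into its parts. The parse_spec() helper and the %TYPE_NAME table are hypothetical names made up for illustration; they are not part of CGI::Validate and only mirror the rules described above.

```perl
use strict;
use warnings;

# Readable names for the single-letter type codes described above.
my %TYPE_NAME = (
    s => 'string',
    i => 'integer',
    f => 'float',
    w => 'word',
    e => 'e-mail address',
);

# parse_spec() is a hypothetical helper, not a CGI::Validate function.
# It splits a key such as "pin=i" into field name, required flag and type.
sub parse_spec {
    my ($spec) = @_;

    # Field name, then optionally "=" or ":" followed by a type code.
    # The "x" code is followed by the name of a user-defined type.
    my ($name, $flag, $code, $custom) =
        $spec =~ /^(\w+)(?:([=:])(?:([sifwe])|x(\w+)))?$/
        or die "Unrecognized field specification: $spec\n";

    # A bare field name is treated as ":s", an optional string.
    $flag = ':' unless defined $flag;
    $code = 's' unless defined $code || defined $custom;

    return {
        name     => $name,
        required => $flag eq '=' ? 1 : 0,
        type     => defined $custom ? "custom ($custom)" : $TYPE_NAME{$code},
    };
}

# Print one line per specification: name, required/optional, type.
for my $spec ('pin=i', 'nickname:w', 'comment', 'num=xRange') {
    my $rule = parse_spec($spec);
    printf "%-10s %-8s %s\n",
        $rule->{name},
        ($rule->{required} ? 'required' : 'optional'),
        $rule->{type};
}
```

Reading a specification this way is a quick sanity check before you pass it to GetFormData(): the symbol after the name decides whether an empty field is an error, and the letter after that decides which values are accepted.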

Now let's look at how you handle validation errors with CGI::Validate.

Handling validation errors

Errors that occur when validating the form get stored in one of four global hashes, named %Missing, %Invalid, %Blank, and %InvalidType (see the script in Listing A for what each one catches). Errors also get stored in a catch-all global error variable, $Error. By checking for this error variable, you can find out if any errors occurred during validation, and then obtain the error messages from the four global hashes.

That's what the if() tests in Listing A do—they check for errors and print the corresponding error messages.
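As a rough sketch of a validation script along these lines (not the original listing), the code below validates the three fields of the sample form and reports any problems. It assumes that GetFormData() returns a false value when validation fails and that the exception hashes and $Error live in the CGI::Validate package, which is why they are written with their full package names. The "Missing form elements" label and the escape_html() helper are illustrative additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Validate;

# Scalars that receive the validated field values.
my ($name, $age, $email);

# Assumption: GetFormData() returns a false value if any rule is violated.
my $query = CGI::Validate::GetFormData(
    'name=s'  => \$name,     # required string
    'age=i'   => \$age,      # required integer
    'email=e' => \$email,    # required e-mail address
);

print "Content-type: text/html\n\n";

if ($query) {
    print "<p>Thank you. All three fields passed validation.</p>\n";
}
else {
    # One entry per exception hash, in the order they are reported.
    my @checks = (
        [ 'Missing form elements'         => \%CGI::Validate::Missing     ],
        [ 'Invalid form elements'         => \%CGI::Validate::Invalid     ],
        [ 'Blank form elements'           => \%CGI::Validate::Blank       ],
        [ 'Invalid data types for fields' => \%CGI::Validate::InvalidType ],
    );

    print "<p>Could not process form because of the following errors:</p>\n<ul>\n";
    my $reported = 0;
    for my $check (@checks) {
        my ($label, $fields) = @$check;
        next unless %$fields;
        my $names = join ' ', map { escape_html($_) } sort keys %$fields;
        print "<li>$label: $names</li>\n";
        $reported++;
    }

    # Fall back on the catch-all message if none of the hashes was filled.
    print '<li>', escape_html($CGI::Validate::Error // 'Unknown error'), "</li>\n"
        unless $reported;
    print "</ul>\n";
}

# Hypothetical helper: escape text before echoing it into the HTML page.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}
```

Field names are escaped before printing because the names in %Invalid come straight from the submitted request and should not be trusted.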

To see how this works in practice, submit the form with no name and e-mail address, and with a string assigned to the age field. Here is the error message you should see:

Could not process form because of the following errors:
     * Blank form elements: email name
     * Invalid data types for fields: age

If you add a new field to the form—say, a hidden field called userid—and submit it to the same script without first telling CGI::Validate about the field, you'll see an error message like this:

Could not process form because of the following errors:
     * Invalid form elements: userid

If no errors occur, it means that the data was valid and can be used for further processing.

If you'd like to have CGI::Validate *not* return an error in the event of a mismatch between the actual form fields and the variables listed in GetFormData(), tell it so with the following line of code:

$CGI::Validate::IgnoreNonMatchingFields = 1;

Dealing with arrays

You can also use CGI::Validate with multiple-select form fields, by reading them into an array. Take the following simple form,


<form action="/cgi-bin/validate.cgi" method="post">
Pick your favorite colors:
<select name="colors" multiple>
    <option value="strawberry">Strawberry</option>
    <option value="orange">Orange</option>
    <option value="azure">Azure</option>
    <option value="chrome">Chrome</option>
    <option value="grape">Grape</option>
</select>
<input type="submit" value="Validate">
</form>


and its validating script in Listing B.

In this case, the GetFormData() function checks to see if one or more elements of the multi-select form field have been selected, and reads them into the @colors array. This array is then processed with a foreach() loop.

In the event that no value is selected, an error will be generated, as the field is explicitly marked as a required field with the equality (=) symbol.
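A minimal sketch of such a script might look like the following. It is not the original listing, and it assumes that passing an array reference is what tells GetFormData() to collect every selected value, as the description above suggests. The reply is sent as plain text to keep the example short.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Validate;

my @colors;    # receives every option the user selected

# "=s" marks the field as a required string; the array reference
# (assumed behavior) collects all submitted values of the field.
my $query = CGI::Validate::GetFormData('colors=s' => \@colors);

print "Content-type: text/plain\n\n";

if ($query) {
    print "You picked:\n";
    foreach my $color (@colors) {
        print " - $color\n";
    }
}
else {
    # Nothing selected, or some other rule was violated.
    print "Please select at least one color.\n";
}
```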

Using custom data types

If the five built-in data types are too primitive for your needs, CGI::Validate also allows you to define a custom data type with its special "x" type specifier. To make it work, call the addExtensions() function and pass it two values as a hash: a name for your custom data type, and the conditional test to use when validating it. You can then use your custom data type in GetFormData() by giving its name prefixed with "x".

Let's look at an example:


<form action="/cgi-bin/validate.cgi" method="post">
Pick a number: <input type="text" name="num" size="4">
<input type="submit" value="Validate"> </form>


Suppose you want to restrict input values to the range 1 to 999. Take a look at the code in Listing C. Here, I've created a new data type called Range, and specified that values must lie between 1 and 999, inclusive. This information is passed to CGI::Validate via the addExtensions() function.

Once the type has been defined, I can use it in the normal way, by naming it inside the GetFormData() function. Note that since this is a user-defined type, the name must be prefixed with an "x".
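The conditional test itself can be tried out on its own before it is wired into a form. The sketch below defines a hypothetical in_range() subroutine for the Range type and runs it against a few sample values. The commented-out registration lines assume that addExtensions() accepts a code reference that receives the submitted value as its first argument, so check the module's documentation before relying on that.

```perl
use strict;
use warnings;

# Hypothetical test for a custom "Range" type: true only for whole
# numbers from 1 to 999 inclusive.
sub in_range {
    my ($value) = @_;
    return 0 unless defined $value && $value =~ /^[0-9]+\z/;
    return ($value >= 1 && $value <= 999) ? 1 : 0;
}

# In a CGI script the test would be registered and used roughly like
# this (assumed calling convention):
#
#   addExtensions(Range => \&in_range);
#   GetFormData('num=xRange' => \$num);

# Try the test on values inside, on the edge of, and outside the range.
for my $candidate (0, 1, 500, 999, 1000, 'abc') {
    printf "%-5s %s\n", $candidate, in_range($candidate) ? 'valid' : 'invalid';
}
```

Rejecting anything that is not made up of digits first keeps the numeric comparison from running on strings such as "abc", which Perl would otherwise treat as 0 with a warning.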

To see how it works, try entering a number into the form. If it's over 999 or below 1, you should see the following error:

Could not process form because of these errors:
Invalid data types for fields: num

Using CGI::Validate can significantly reduce the amount of time you spend on creating custom input validation routines. Make it a part of your standard Web development toolkit, and save yourself some time the next time you have forms to validate!
