Consider a simple Web
form that asks users for their age and e-mail address, then saves the data into
a database when they submit the form. Most genuine users will enter correct
data in these fields, it will go to your database, and everything will be hunky-dory.
But a single malicious user or prankster might enter a string into the age
field, and an incomplete e-mail address into the address field. If you don’t
have a data validation routine guarding the entrance to your database, this
incorrect data might get saved and cause you all manner of heartache later
(can’t you just see MySQL spitting out “illegal data type” errors?).

Input validation is a necessary
safeguard for the integrity of your application database. If you use Perl, this
task is made simpler by the CGI::Validate module, which provides a bunch of
built-in methods to verify the data entered by the user.

Getting started

If you’ve used
Getopt::Long before, you know it can be used to read and validate command-line
arguments to a Perl script. The CGI::Validate module brings this capability
to the Web, combining the parsing and validation routines of Getopt::Long
with the methods in the CGI module.

You can set rules for
each form field, defining its data type, and whether it is required or
optional. Values that do not match these predefined rules are flagged and
placed in an exception list, which can be used to display an error page.

Detailed installation
instructions are provided in the download archive, but by far the simplest way
to install it is to use the CPAN shell, as follows:

shell> perl -MCPAN -e shell
cpan> install CGI::Validate

If you use the CPAN
shell, dependencies will be automatically downloaded for you (unless for some
strange reason you’ve set your shell to ignore dependencies). If you manually
download and install the package, you may need to download and install all
required dependencies before CGI::Validate can be installed.

A simple example

Once you have the
package installed, create the following sample form in your sandbox area under
your Web server’s document root:

<html><head></head>
<body>

<form action="/cgi-bin/validate.cgi" method="post">
Enter your name <input type="text" name="name">
age <input type="text" name="age" size="2" maxlength="2"> and
email address <input type="text" name="email">
<input type="submit" value="Validate">
</form>

</body></html>

Notice that it includes
three separate data types: string, integer and e-mail address (this last one is
not a traditional data type, but because it’s one of the most commonly tested
patterns on the Web, CGI::Validate treats it as though it were a distinct type).

Suppose we want to
make all three fields required. If the form is submitted with any of its fields
empty, or with any field containing an invalid datatype, form processing should
stop and an error message should be generated asking the user to correct the bad
values.

Listing A is the Perl script that implements these requirements through CGI::Validate.
Remember to save the script as validate.cgi (the name the form’s action points to)
under your Web server’s cgi-bin directory and to give it execute permission,
or else it won’t run.

This script is
somewhat involved, so let’s take it step by step. Right at the top, I’ve imported
the CGI::Validate module and (since it inherits from the base CGI module)
created a CGI() object. This CGI() object provides built-in methods to send the
HTML page header and initial HTML declarations.

Once the basic page
framework is ready, you invoke the GetFormData() method to validate the data
entered into the form. This method forms the core of CGI::Validate’s
functionality, and it accepts a hash of form variable names and type
identifiers. This hash lays down the basic validation rules for the form data.

Each key of the hash
consists of a form variable name, followed by a specifier. The specifier uses
an equals sign (=) to indicate that the field is required, or a colon (:)
to indicate that it is optional.

Following this comes a
type specifier, which can be any one of six possible values: “s”
(string), “i” (integer), “f” (float), “w” (word),
“e” (e-mail address) or “x” (user-defined type).

The corresponding hash
value is a reference to a Perl variable that will hold the input value for that
field. This variable can be used to access the value further along in the
script. Thus, the key-value pair

("pin=i" => \$pin)

would imply that the
form variable “pin” is required, must be an integer value, and will
be assigned to the Perl scalar $pin.

Note: In the absence
of any type rules, CGI::Validate assumes “:s”—that the value is an
optional string.
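
To make the key format concrete, here is a minimal sketch in plain Perl that splits a rule key into its parts. It is not CGI::Validate’s own code: the parse_rule() helper and the field names other than “pin” are hypothetical, and the pattern simply follows the rules described above (name, then = or :, then a type letter, with “:s” as the default).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI::Validate): split a rule key such as
# "pin=i" into the field name, a required flag, and a type specifier.
sub parse_rule {
    my ($key) = @_;

    # A name, then optionally "=" (required) or ":" (optional) followed by a
    # type letter. The "x" type is followed by the name of a custom type.
    my ($name, $flag, $type, $custom) =
        $key =~ /\A(\w+)(?:([=:])(?:([swife])|x(\w+)))?\z/
        or die "Malformed rule key: $key\n";

    return {
        name     => $name,
        required => (defined $flag && $flag eq '=') ? 1 : 0,
        type     => defined $custom ? 'x' : ($type // 's'),  # default is ":s"
        custom   => $custom,
    };
}

for my $key ('pin=i', 'nickname:w', 'comment', 'num=xRange') {
    my $rule = parse_rule($key);
    printf "%-10s %-8s type=%s%s\n",
        $rule->{name},
        $rule->{required} ? 'required' : 'optional',
        $rule->{type},
        defined $rule->{custom} ? " ($rule->{custom})" : '';
}
```

Running it prints one line per key, showing that a bare name such as “comment” is treated as an optional string.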

Now let’s look at how
you handle validation
errors with CGI::Validate.

Handling validation errors

Errors that occur when
validating the form get stored in one of four global hashes, named %Missing,
%Invalid, %Blank, and %InvalidType (see the script in Listing A for what each one
catches). Errors also get stored in a catch-all global error variable, $Error.
By checking for this error variable, you can find out if any errors occurred
during validation, and then obtain the error messages from the four global
hashes.

That’s what the if()
tests in Listing A do—they check for errors and print the corresponding error
messages.
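
The reporting step can be sketched without the module. In the example below, the four hashes are local stand-ins for the module’s globals, error_report() is a hypothetical helper, and the hash values (1) are placeholders. The “Missing form elements” label is also an assumption, since this article only shows the other three messages.

```perl
use strict;
use warnings;

# Hypothetical helper: build an error report from ordered (label, hash ref)
# pairs, where each hash is keyed by the names of the offending fields.
sub error_report {
    my @pairs = @_;

    my @lines;
    while (my ($label, $fields) = splice @pairs, 0, 2) {
        my @names = sort keys %$fields;
        push @lines, "     * $label: @names" if @names;
    }
    return '' unless @lines;    # nothing went wrong
    return join "\n",
        'Could not process form because of the following errors:', @lines;
}

# Simulated outcome: name and email left blank, a string typed into age.
my %Missing     = ();
my %Invalid     = ();
my %Blank       = (name => 1, email => 1);
my %InvalidType = (age => 1);

my $report = error_report(
    'Missing form elements'         => \%Missing,      # label is an assumption
    'Invalid form elements'         => \%Invalid,
    'Blank form elements'           => \%Blank,
    'Invalid data types for fields' => \%InvalidType,
);
print "$report\n" if $report;
```

An empty return value means every hash was empty, which is the cue to carry on with normal processing.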

To see how this works
in practice, submit the form with no name and e-mail address, and with a string
assigned to the age field. Here is the error message you should see:

Could not process form because of the following errors:
     * Blank form elements: email name
     * Invalid data types for fields: age

If you add a new field
to the form—say, a hidden field called userid—and
submit it to the same script without first telling CGI::Validate about the
field, you’ll see an error message like this:

Could not process form because of the following errors:
     * Invalid form elements: userid

If no errors occur, it
means that the data was valid and can be used for further processing.

If you’d like to have
CGI::Validate *not* return an error in the event of a mismatch between the
actual form fields and the variables listed in GetFormData(), tell it so with
the following line of code:

$CGI::Validate::IgnoreNonMatchingFields = 1;

Dealing with arrays

You can also use
CGI::Validate with multiple-select form fields, by reading them into an array.
Take the following simple form,

<html><head></head>
<body>

<form action="/cgi-bin/validate.cgi" method="post">
Pick your favourite colors:
<select name="colors" multiple>
            <option value="strawberry">Strawberry</option>
            <option value="orange">Orange</option>
            <option value="azure">Azure</option>
            <option value="chrome">Chrome</option>
            <option value="grape">Grape</option>
</select>
<input type="submit" value="Validate">
</form>

</body></html>

and its validating
script in Listing B.

In this case, the
GetFormData() function checks to see if one or more elements of the
multi-select form field have been selected, and reads them into the @colors
array. This array is then processed with a foreach() loop.

If no value is selected,
an error will be generated, because the field is explicitly
marked as required with the equals sign (=).
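
The difference between a scalar target and an array target can be illustrated in plain Perl. This is only a sketch of the idea, not the module’s implementation: store_values() and its arguments are hypothetical, and the submitted values are hard-coded rather than read from a form.

```perl
use strict;
use warnings;

# Hypothetical helper: copy submitted values into the variable a rule
# points at. Returns false when a required field arrives with no values.
sub store_values {
    my ($target, $required, @values) = @_;

    return 0 if $required && !@values;   # required field left empty

    if (ref $target eq 'ARRAY') {
        @$target = @values;              # multi-select: keep every value
    }
    else {
        $$target = $values[0];           # single field: keep the first value
    }
    return 1;
}

my (@colors, $name);
store_values(\@colors, 1, 'orange', 'grape') or die "No colors selected\n";
store_values(\$name,   1, 'Alice')           or die "No name entered\n";

foreach my $color (@colors) {
    print "You picked: $color\n";
}
```

Passing an array reference is what signals that every selected value should be kept, which is why the colors field is read into @colors rather than a scalar.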

Using custom data types

If the five built-in
data types are too primitive for your needs, CGI::Validate also allows you to
define a custom data type with its special “x” type specifier. To
make it work, call the addExtensions() function and pass it
a hash that maps a name for your custom data type to the conditional test
to use when validating it. You can then use your custom data type in the GetFormData()
function by giving its name, prefixed with “x”, as the type specifier.

Let’s look at an
example:

<html><head></head>
<body>

<form action="/cgi-bin/validate.cgi" method="post">
Pick a number: <input type="text" name="num" size="4">
<input type="submit" value="Validate">
</form>

</body></html>

Suppose you want to restrict
input values to the range 1 to 999. Take a look at the code in Listing C.
Here, I’ve created a new data type called Range, and specified
that the values must lie between 1 and 999, inclusive. This information is
passed to CGI::Validate via the addExtensions() function.

Once the type has been
defined, I can use it in the normal way, by naming it inside the GetFormData()
function. Note that since this is a user-defined type, the name must be
prefixed with an “x”.
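
The range test itself is easy to try out on its own. The sketch below assumes the conditional test is written as a code reference that returns true for valid input. The %extensions table and is_valid() helper are hypothetical stand-ins, so this shows the shape of such a test rather than the module’s internals.

```perl
use strict;
use warnings;

# A table of named tests, each a code reference that returns true for
# valid input. "Range" accepts whole numbers from 1 to 999, inclusive.
my %extensions = (
    Range => sub {
        my ($value) = @_;
        return $value =~ /\A[0-9]+\z/ && $value >= 1 && $value <= 999;
    },
);

# Hypothetical helper: look up a named test and apply it to one value.
sub is_valid {
    my ($type, $value) = @_;
    my $test = $extensions{$type} or die "Unknown type: $type\n";
    return $test->($value) ? 1 : 0;
}

for my $input (qw(1 42 999 0 1000 abc)) {
    printf "%-5s %s\n", $input,
        is_valid('Range', $input) ? 'ok' : 'rejected';
}
```

Checking the digits-only pattern before the numeric comparison keeps non-numeric input such as “abc” from reaching the comparison at all.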

To see how it works,
try entering a number into the form. If it’s over 999 or below 1, you should see
the following error:

Could not process form because of these errors:
Invalid data types for fields: num

Using CGI::Validate
can significantly reduce the amount of time you spend on creating custom input
validation routines. Make it a part of your standard Web development toolkit,
and save yourself some time the next time you have forms to validate!