A very common task when building Web sites involves
validating user-supplied e-mail addresses. This is of particular importance to
sites which require a valid e-mail address for transactions—e-commerce sites,
Web mail sites, mailing lists and so on.

If your Web site uses PHP, you’re in luck. It’s extremely easy to validate
user-supplied e-mail addresses in PHP, thanks largely to the powerful regular
expression engine built into the language. In this article, I’ll
demonstrate how easy it is.

To begin, assume you have the following Web form, which asks
the user to enter an e-mail address (Listing A):

Listing A

<form action="validate.php" method="post">
Enter e-mail address: <input type="text" name="e-mail">
<input type="submit" value="Submit">
</form>

As the code above shows, this form is submitted to the PHP
script validate.php. Assuming the
e-mail address is an important input into the next transaction, you should
verify that it is valid before using it.

A simple way to accomplish this is with a regular
expression, which checks the format of the e-mail address (not whether the
mailbox actually exists) and ensures that it
conforms to the general format user@domain.ext. Here’s an example (Listing B):

Listing B

<?php
// check e-mail address
// display success or failure message
$email = $_POST['e-mail'] ?? '';  // empty string if the field is missing
if (!preg_match('/^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/', $email)) {
    die("Invalid e-mail address");
}
echo "Valid e-mail address, processing...";

Try it for yourself and see. The script will flag all e-mail
addresses that are not in the format user@domain.ext. This is accomplished with the regular
expression /^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/. Let’s look
at each bit of it in detail:

  • The caret (^) indicates
    the beginning of the string.
  • The expression ([a-zA-Z0-9])+ indicates the
    range of allowed characters for the user part of the e-mail address. The plus (+) symbol appended to the end of
    this range indicates that at least one character from this range is mandatory.
  • The @ symbol is
    exactly what it looks like: the literal @ symbol used
    in an e-mail address.
  • The expression ([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+ represents
    the domain.ext
    part of the e-mail address. Notice that the first range does not include a
    period (.), while the second group begins with one. This ensures that the
    domain starts with at least one non-period character and is followed by at
    least one period-separated extension. Again, the plus (+) symbols scattered throughout
    the pattern indicate that at least one valid character (or group) is required.
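
To see how these pieces behave together, here is a minimal sketch that runs the Listing B pattern against a handful of sample addresses. The helper name matches_basic_pattern() and the sample addresses are my own, chosen for illustration; only the pattern itself comes from the listing.

```php
<?php
// Pattern copied from Listing B. Note that it has no end-of-string anchor.
const BASIC_EMAIL_PATTERN = '/^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/';

// Hypothetical helper (not part of the original script):
// returns true when $address matches the Listing B pattern.
function matches_basic_pattern(string $address): bool
{
    return preg_match(BASIC_EMAIL_PATTERN, $address) === 1;
}

// Sample addresses, chosen for illustration only.
$samples = [
    'user@example.com',        // plain user@domain.ext
    'user@example',            // no extension
    '@example.com',            // no user part
    'first.last@example.com',  // period in the user part
    'user@example.com!!!',     // trailing characters after the extension
];

foreach ($samples as $address) {
    $verdict = matches_basic_pattern($address) ? 'accepted' : 'rejected';
    printf("%-24s %s\n", $address, $verdict);
}
```

With this pattern, the first and last samples are accepted and the other three are rejected. The last one slips through because nothing in the pattern anchors the match to the end of the string, so anything after a valid-looking prefix is ignored.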

Of course, this expression isn’t perfect: it will reject
addresses in the format first.last@domain.ext,
and pass invalid domain extensions. You can tighten up the regular expression a
little by allowing periods in the username part and restricting the length of
the domain extension. Here’s an example (Listing C):

Listing C

<?php
// check e-mail address
// display success or failure message
$email = $_POST['e-mail'] ?? '';  // empty string if the field is missing
if (!preg_match('/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)*\.([a-zA-Z]{2,6})$/', $email)) {
    die("Invalid e-mail address");
}
echo "Valid e-mail address, processing...";

Some of the interesting enhancements here are:

  • The username part of the e-mail address now has two ranges:
    the first allows only alphabetic and numeric characters, while the second
    also allows periods (.), underscores and dashes. This allows
    usernames of the form first.last@domain.ext, while still requiring the
    address to begin with a letter or digit.
  • The extension part of the e-mail address, ([a-zA-Z]{2,6}),
    now has
    a size specifier enclosed in curly braces. This forces the extension to be
    between 2 and 6 characters long. Older extensions such as com, net, org and
    museum fall within this range, but many newer top-level domains are longer
    than six characters, so raise the upper limit if you need to accept them.
    Caution: This restriction, while reducing the incidence of too-long or
    too-short domain extensions, doesn’t solve the problem entirely; users can
    still input invalid extensions between 2 and 6 characters long. This can be
    rectified by replacing the final part of the expression with a rigid list of
    valid extensions (it hasn’t been done here because it significantly increases
    the length of the expression and reduces its processing efficiency).
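  • The dollar ($) symbol is
    the end-of-string anchor. It makes the match fail if extra characters follow
    the extension (although PCRE still tolerates a single trailing newline
    unless the D modifier is used).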
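
The following sketch exercises the Listing C pattern and also illustrates the "rigid list" idea mentioned in the caution above. Both function names are hypothetical, and the default list of extensions (com, net, org) is only a tiny sample for demonstration, not a complete or authoritative registry.

```php
<?php
// Pattern copied from Listing C.
const STRICT_EMAIL_PATTERN =
    '/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)*\.([a-zA-Z]{2,6})$/';

// Hypothetical helper: true when $address matches the Listing C pattern.
function matches_strict_pattern(string $address): bool
{
    return preg_match(STRICT_EMAIL_PATTERN, $address) === 1;
}

// Hypothetical variant: replaces the {2,6} length check with a fixed list
// of extensions. The default list is illustrative only.
function matches_with_extension_list(string $address, array $extensions = ['com', 'net', 'org']): bool
{
    // Escape each entry so it is treated literally inside the pattern.
    $escaped = array_map(function ($ext) {
        return preg_quote($ext, '/');
    }, $extensions);

    // The i modifier lets COM, Com and com all match.
    $pattern = '/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)*\.('
        . implode('|', $escaped) . ')$/i';

    return preg_match($pattern, $address) === 1;
}

// Sample addresses, chosen for illustration only.
$samples = [
    'first.last@example.com',
    'user@example.c',
    'user@example.com!!!',
    'user@example.zzzz',
];

foreach ($samples as $address) {
    printf(
        "%-24s length check: %-8s list check: %s\n",
        $address,
        matches_strict_pattern($address) ? 'accepted' : 'rejected',
        matches_with_extension_list($address) ? 'accepted' : 'rejected'
    );
}
```

In this sketch the length-based pattern accepts the first and last samples, while the list-based variant accepts only the first, since zzzz is not in the sample list. The trade-off is the one described above: the list has to be maintained by hand, and the pattern grows with every extension you add.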

These are just two examples of regular expressions you can
use to validate e-mail addresses. Many more variants exist, each with its own
advantages and drawbacks. Remember that, given efficiency constraints, no
pattern is completely foolproof, and so you should choose a pattern that has an
appropriate combination of rigidity and performance for your needs. Happy