A very common task when building Web sites involves
validating user-supplied e-mail addresses. This is of particular importance to
sites which require a valid e-mail address for transactions—e-commerce sites,
Web mail sites, mailing lists and so on.
If your Web site uses PHP,
you’re in luck. It’s extremely easy to validate user-supplied e-mail
addresses in PHP, thanks largely to a very powerful regular
expression engine built into the language. In this article, I’ll
demonstrate how easy it is.
To begin, assume you have the following Web form, which asks
the user to enter an e-mail address (Listing A).
Listing A
<html>
<head></head>
<body>
<form action="validate.php" method="post">
Enter e-mail address: <input type="text" name="e-mail">
<input type="submit" value="Submit">
</form>
</body>
</html>
As the code above shows, this form is submitted to the PHP
script validate.php. Assuming the
e-mail address is an important input into the next transaction, it’s very
important to verify that it is valid before using it.
The best way to accomplish this is with a regular
expression, which checks the format of the e-mail address and ensures that it
conforms to the standard format of user@domain.ext. Here’s an example (Listing B):
Listing B
<?php
// check e-mail address
// display success or failure message
$email = $_POST['e-mail'] ?? ''; // fall back to an empty string if the field is missing
if (!preg_match('/^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/', $email)) {
    die('Invalid e-mail address');
}
echo 'Valid e-mail address, processing…';
?>
Try it for yourself and see. The script will flag all e-mail
addresses that are not in the format user@domain.ext. This is accomplished with the regular
expression /^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/. Let’s look
at each bit of it in detail:
- The caret (^) indicates the beginning of the string.
- The expression ([a-zA-Z0-9])+ indicates the range of allowed characters for the user part of the e-mail address. The plus (+) symbol appended to the end of this range indicates that at least one character from this range is mandatory.
- The @ symbol is exactly what it looks like: the literal @ symbol used in an e-mail address.
- The expression ([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+ represents the domain.ext part of the e-mail address. Notice that the first range does not include a period (.), while the second group begins with one. This ensures that the domain part contains at least one character before the first period, and that at least one period-separated extension follows it. Again, the plus (+) symbols throughout the pattern indicate that at least one valid character is required.
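To see how these pieces behave together, here is a minimal sketch that runs the Listing B pattern against a handful of addresses. The sample addresses and the variable names $pattern and $samples are made up for illustration; only the pattern itself comes from the listing.

```php
<?php
// Pattern copied from Listing B.
$pattern = '/^([a-zA-Z0-9])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+/';

// Hypothetical sample addresses, chosen for illustration only.
$samples = [
    'user@example.com',        // plain user@domain.ext
    'user@example',            // no extension
    '@example.com',            // empty user part
    'first.last@example.com',  // period in the user part
    'user@example.com!!!',     // trailing junk after a valid-looking address
];

foreach ($samples as $address) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $result = preg_match($pattern, $address) === 1 ? 'accepted' : 'rejected';
    echo str_pad($address, 24), $result, PHP_EOL;
}
```

Run from the command line, this sketch reports the first sample as accepted and the next three as rejected. The last sample is also accepted, because the pattern has no end-of-string anchor and so ignores whatever follows the extension.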
Of course, this expression isn’t perfect—it will fail on
addresses in the format first.last@domain.ext,
and pass invalid domain extensions. You can tighten up the regular expression a
little, by allowing periods in the username part and restricting the length of
the domain extension. Here’s an example (Listing C):
Listing C
<?php
// check e-mail address
// display success or failure message
$email = $_POST['e-mail'] ?? ''; // fall back to an empty string if the field is missing
if (!preg_match('/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)*\.([a-zA-Z]{2,6})$/', $email)) {
    die('Invalid e-mail address');
}
echo 'Valid e-mail address, processing…';
?>
Some of the interesting enhancements here are:
- The username part of the e-mail address now has two ranges: the first allows only alphabetic and numeric characters, while the second also allows periods (.), underscores and dashes. This allows usernames of the form first.last@domain.ext.
- The extension part of the e-mail address, ([a-zA-Z]{2,6}), now has a size specifier enclosed in curly braces. This forces the extension to be between 2 and 6 characters long. This range covered the domain extensions in common use when the pattern was written; longer top-level domains have since been introduced, so raise the upper limit if your site needs to accept them.
  Caution: This restriction reduces the incidence of too-long or too-short domain extensions, but it doesn’t solve the problem entirely; users can still input invalid extensions that are between 2 and 6 characters long. This can be rectified by replacing the final part of the expression with a rigid list of valid domains (it hasn’t been done here because it significantly increases the length of the expression and reduces its processing efficiency).
- The dollar ($) symbol is the end-of-string delimiter, so nothing may follow the extension.
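The rigid list mentioned in the caution above could look something like the following sketch. The function name isAllowedEmail, the $allowed array and the sample addresses are hypothetical, and the four extensions are only a stand-in for a real list, which would be far longer. The rest of the pattern is taken from Listing C, with the i modifier added so that the listed extensions match in either case.

```php
<?php
// Hypothetical helper: the Listing C pattern with its final {2,6} range
// replaced by a fixed list of allowed extensions.
function isAllowedEmail(string $address, array $extensions): bool
{
    // With no extensions to check against, nothing can be valid.
    if (count($extensions) === 0) {
        return false;
    }

    // Escape each extension so it is treated as literal text in the pattern.
    $quoted = array_map(function ($ext) {
        return preg_quote($ext, '/');
    }, $extensions);
    $alternation = implode('|', $quoted);

    // Same user and domain parts as Listing C; only the last group differs.
    $pattern = '/^([a-zA-Z0-9])+([\.a-zA-Z0-9_-])*@([a-zA-Z0-9_-])+'
             . '(\.[a-zA-Z0-9_-]+)*\.(' . $alternation . ')$/i';

    return preg_match($pattern, $address) === 1;
}

// Illustrative list only; a real site would need a much longer one.
$allowed = ['com', 'net', 'org', 'uk'];

var_dump(isAllowedEmail('first.last@example.com', $allowed));
var_dump(isAllowedEmail('user@mail.example.co.uk', $allowed));
var_dump(isAllowedEmail('user@example.xyzzy', $allowed));
```

The first two calls return true and the third returns false. The unmodified Listing C pattern would accept the third address, since xyzzy is five letters long, which is the gap the caution describes. The cost is the one the article points out: every extension added to the list makes the pattern longer and gives the engine more alternatives to try.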
These are just two examples of regular expressions you can
use to validate e-mail addresses. Many more variants exist, each with its own
advantages and drawbacks. Remember that, given efficiency constraints, no
pattern is completely foolproof, and so you should choose a pattern that has an
appropriate combination of rigidity and performance for your needs. Happy
coding!