If you’re coding in Perl, it’s pretty obvious when there’s
an error in your code — the parser will spew all kinds of error messages over
your screen, alerting you to the problem and letting you take immediate action
to fix it. If you’re developing HTML pages, though, no such early warning
system exists — any error in your markup will usually be silently ignored by
the browser. Worse, some browsers even attempt to “automatically”
correct for common markup errors, introducing a whole set of new problems in
the process.

The simplest solution, then, is to check (or
“lint”) your HTML before it goes live. And that’s where a useful
little CPAN module called HTML::Lint comes in.
This Perl module, built atop the popular HTML::Parser module, checks your
markup for common problems, such as unclosed or unopened tags and unknown
elements or attributes, that could cause it to “break” or render badly in
client browsers.

This document explores some of HTML::Lint’s capabilities,
using it to check HTML pages and display the errors it finds. To begin with,
download and install the module (if you don’t already have it) by running the
following command at your shell prompt:

shell> perl -MCPAN -e 'install HTML::Lint'

Linting Files

Once you’ve got it installed, create and save the following HTML
file (as abc.html):

<html>
<head></head>
A is for apple, B is for baby
</body>
</html>

As you can see, this file has a deliberate error — the
opening <body> tag is
missing. It’s pretty obvious here, but if you had a larger and more complex
file, such missing tags would be harder to detect. That’s why the next step is
to write some Perl code to detect this error using HTML::Lint.

Create and save the following script (as linter.pl):

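#!/usr/bin/perl
use strict;
use warnings;

# initialize linter
use HTML::Lint;
my $lint = HTML::Lint->new();

# parse file (parse_file() returns false if the file cannot be opened)
$lint->parse_file('abc.html') or die "Cannot open abc.html: $!\n";

# check for errors (errors() returns the error count in scalar context)
print scalar( $lint->errors ) ? "Your code stinks!\n" : "Your code rocks!\n";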

This is pretty simple: the script initializes an HTML::Lint object and
then uses the object’s parse_file() method to parse the HTML file created
previously. The linter records every problem it finds, and the script prints
a message to the console depending on whether there were any.

Here’s the output you might see:

shell> ./linter.pl
Your code stinks!
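parse_file() is not the only way to feed markup to HTML::Lint. If the HTML is already in memory (for example, generated by another script), the module's parse() method accepts it as a string, and eof() then tells the linter that the document is complete so its end-of-document checks run. The sketch below assumes that workflow; lint_string() and $html are hypothetical names used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Lint;

# Hypothetical helper: lint a string of HTML rather than a file on disk
# and return the list of HTML::Lint::Error objects found.
sub lint_string {
    my ($html) = @_;

    my $lint = HTML::Lint->new();
    $lint->parse($html);    # feed the markup to the linter
    $lint->eof();           # mark the end of the document so final checks run

    return $lint->errors;   # in list context: the error objects
}

# The same broken markup as abc.html, held in a (hypothetical) variable.
my $html = "<html>\n<head></head>\nA is for apple, B is for baby\n</body>\n</html>\n";

my @errors = lint_string($html);
print scalar(@errors) ? "Your code stinks!\n" : "Your code rocks!\n";
```

For larger documents you can call parse() several times with successive chunks of markup, as long as eof() is called once at the end.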

Of course, this is somewhat impractical if you have a large
number of files to lint. In such cases, you’d want to pass the HTML file name
and path to the script at run-time, rather than hard-coding it into the script.
Listing A is a revision of the previous script which lets you do just this.

Listing A

#!/usr/bin/perl
use strict;
use warnings;

use HTML::Lint;

# read file name from command line
my $file = $ARGV[0] or die "ERROR: No file name provided\n";

# initialize linter
my $lint = HTML::Lint->new();

# parse file
$lint->parse_file($file) or die "ERROR: Cannot open $file: $!\n";

# check for errors
print scalar( $lint->errors ) ? "Your code stinks!\n" : "Your code rocks!\n";

# print error count (errors() returns the count in scalar context)
print "Errors found: ", scalar( $lint->errors ), "\n";

In this case, the script expects a file path as its first command-line
argument, which Perl makes available in the special @ARGV array. The script
then looks for this file, parses it and displays a message depending on
whether or not it found an error. The last line in the script is new: it
prints the number of errors the linter recorded.

And here’s how to use it:

shell> ./linter.pl /tmp/abc.html
Your code stinks!
Errors found: 1
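Because the script now knows how many errors it found, a natural next step is to report that number through the exit status, so that a shell script, makefile or deployment hook can stop a broken page from going live. This is a minimal sketch, not something HTML::Lint does for you: count_errors() is a hypothetical helper, and the exit codes (0 for clean, 1 for errors, 2 for an unreadable file) are an arbitrary convention.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Lint;

# Hypothetical helper: return the number of errors HTML::Lint finds in
# $file, or undef if the file cannot be opened.
sub count_errors {
    my ($file) = @_;

    my $lint = HTML::Lint->new();
    $lint->parse_file($file) or return undef;

    return scalar( $lint->errors );   # errors() gives the count in scalar context
}

# Only act as a command when a file name is given.
if (@ARGV) {
    my $count = count_errors( $ARGV[0] );
    if ( !defined $count ) {
        warn "ERROR: Cannot open $ARGV[0]\n";
        exit 2;
    }
    print "Errors found: $count\n";
    exit( $count ? 1 : 0 );   # non-zero status tells the caller the page failed
}
```

A deployment script could then run this check first and copy the page to the web server only when it exits with status 0.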

Handling Errors

While it should now be clear how HTML::Lint can find
errors in your code, there’s still one problem — printing messages about how
much your code stinks is amusing, but not really helpful in diagnosing the
problem. What you’d really like are detailed error messages that indicate both
the nature of each error and the line number on which it occurred.

Fortunately, HTML::Lint records this
information for every error it finds, and
it’s easy to extract and display it. The next example (Listing B) builds on the
previous one to display more detailed error information.

Listing B

#!/usr/bin/perl
use strict;
use warnings;

use HTML::Lint;

# read file name from command line
my $file = $ARGV[0] or die "ERROR: No file name provided\n";

# initialize linter
my $lint = HTML::Lint->new();

# parse file
$lint->parse_file($file) or die "ERROR: Cannot open $file: $!\n";

# check for errors
print scalar( $lint->errors ) ? "Your code stinks!\n" : "Your code rocks!\n";

# print detailed error list: location, then description
foreach my $e ( $lint->errors ) {
    print $e->where(), ": ", $e->errtext(), "\n";
}

In this case, the list of errors is processed using a foreach loop. For
each error object, the script prints the location of the error and a
description of it, using the object’s where() and errtext() methods.

Here’s an example of what you might see:

shell> ./linter.pl /tmp/abc.html
Your code stinks!
(4:1): </body> with no opening <body>
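The where() and errtext() methods cover most needs, but HTML::Lint::Error objects also expose individual fields through accessors such as line(), errcode() and errtext(). Splitting them out makes the output easier to feed to tools like grep, cut or sort. The sketch below assumes those accessors are available in your installed version (check the HTML::Lint::Error documentation); format_errors() and the tab-separated layout are hypothetical choices for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Lint;

# Hypothetical helper: turn each error into one tab-separated line of
# "line number, error code, message", which is easy to grep, cut or sort.
sub format_errors {
    my ($lint) = @_;
    return map { join( "\t", $_->line(), $_->errcode(), $_->errtext() ) }
        $lint->errors;
}

# Run on the file named on the command line, if there is one.
if ( my $file = $ARGV[0] ) {
    my $lint = HTML::Lint->new();
    $lint->parse_file($file) or die "ERROR: Cannot open $file: $!\n";
    print "$_\n" for format_errors($lint);
}
```

The error code is handy for filtering: unlike the message text, it stays the same regardless of which tag triggered the problem.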

Of course, you can modify the script above to lint multiple
files at once, log errors to a log file instead of displaying them, or even
filter out all but the most critical errors. For examples of these and other
tricks, see the HTML::Lint documentation on CPAN. Take a look
when you have a minute, and happy coding!
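As a starting point for those ideas, here is a minimal sketch that lints every file named on the command line, keeps only structural errors (the STRUCTURE category, which covers problems such as unclosed or unopened tags) via the only_types() method, and appends them to a log file. log_structural_errors() and the lint.log file name are hypothetical; adjust them to suit your setup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Lint;
use HTML::Lint::Error;   # provides the STRUCTURE error-type constant

# Hypothetical helper: lint each file in @files, write structural errors to
# the already-open filehandle $log, and return how many were written.
sub log_structural_errors {
    my ( $log, @files ) = @_;
    my $count = 0;

    for my $file (@files) {
        # A fresh linter per file, so errors from one file don't carry over.
        my $lint = HTML::Lint->new();

        # Keep only STRUCTURE errors, skipping less critical categories.
        $lint->only_types( HTML::Lint::Error::STRUCTURE );

        $lint->parse_file($file) or die "ERROR: Cannot open $file: $!\n";

        for my $e ( $lint->errors ) {
            print {$log} "$file ", $e->where(), ": ", $e->errtext(), "\n";
            $count++;
        }
    }
    return $count;
}

# Only act as a command when file names are given.
if (@ARGV) {
    my $log_file = 'lint.log';   # hypothetical log file name
    open( my $log, '>>', $log_file ) or die "ERROR: Cannot open $log_file: $!\n";
    my $count = log_structural_errors( $log, @ARGV );
    close($log) or die "ERROR: Cannot close $log_file: $!\n";
    print "Structural errors found: $count (see $log_file)\n";
}
```

Creating a new HTML::Lint object for each file is the simplest way to keep each file's error list separate.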