Nothing is more annoying than clicking a link on a Web site and getting an error indicating the link is invalid. One aspect of regular site maintenance is ensuring links are valid both internally and externally.

Link types

Normally, Web applications contain a vast number of links. These links may go to a resource within the site (internal links) or outside of the current application (external links). In addition, other sites may link to your site (inbound links). First, I concentrate on links within a site and how broken ones may be located and resolved.
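To make the internal/external distinction concrete, here is a minimal Python sketch. It assumes that "internal" simply means the link resolves to the same host as the page it appears on; the function name classify_link and the example.com URLs are made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def classify_link(page_url: str, href: str) -> str:
    """Classify a link found on page_url as 'internal', 'external', or 'other'."""
    # Resolve relative links (such as "../c.html") against the page they appear on.
    target = urlparse(urljoin(page_url, href))
    site = urlparse(page_url)
    # mailto:, javascript:, and similar schemes are not Web resources to check.
    if target.scheme not in ("http", "https"):
        return "other"
    # Assumption: same host name means same site.
    if target.netloc.lower() == site.netloc.lower():
        return "internal"
    return "external"
```

A real site may span several host names (www and non-www, for example), so the host comparison would likely need to be loosened in practice.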

Finding broken links

You may plop down in front of a Web browser and manually try every link within a site. This allows you to find the broken links, but the time required makes it impractical for large applications.

Thankfully, there are a variety of tools available to automate this process, allowing you to concentrate on fixing the problem links. Basically, these tools crawl a site and verify all links found. They often include options to define what should be checked, which links to ignore, and more. The following list provides a selection of these tools:

  • Xenu Link Sleuth: This is my preferred tool. It is fast and free. It provides great output via a detailed report of problems encountered. In addition to checking links, it verifies a variety of linked resources including images, frames, plug-ins, backgrounds, local image maps, style sheets, scripts, and Java applets.
  • LinkAlarm: This commercial service allows you to check the validity of all links within a site or page. It provides a very detailed report that is color-coded to highlight problems, as well as graphs (and everybody loves a good graph).
  • W3C Link Checker: This online Web application allows you to validate links within a Web application. It dives into a Web resource and provides information on invalid links or error messages.
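The crawl-and-verify approach these tools take can be roughed out in a few lines of Python using only the standard library. This is a sketch of the general idea, not how any of the tools above actually work; the names LinkCollector, extract_links, and check_link are hypothetical.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Which attribute holds the URL for each tag we care about.
URL_ATTRIBUTES = {"a": "href", "link": "href", "img": "src", "script": "src"}


class LinkCollector(HTMLParser):
    """Collects absolute URLs from links, images, style sheets, and scripts."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRIBUTES.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every linked URL found in an HTML string, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def check_link(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived.

    Note: urlopen follows redirects, so a 301 or 302 shows up here as the
    status of the final destination rather than the redirect itself.
    """
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code  # the server answered with an error status such as 404
    except OSError:
        return None  # bad hostname, refused connection, or timeout
```

Feeding each page's HTML to extract_links and passing the results to check_link gives a crude report. A dedicated tool adds throttling, reporting, and exclusion rules on top of this.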

Once a link is identified as a problem, you must decide how to address it.

Fixing broken links

The status code returned when trying to access a Web resource can reveal a lot about what may be wrong. The following list describes codes that may be returned when attempting to access a resource via a link:

  • 301: The target resource was permanently moved. Strictly speaking, this is a redirect status rather than an error.
  • 302: The target resource was temporarily moved. Like 301, this is a redirect status.
  • 401: The request was not authorized, meaning the resource may require a logon for access.
  • 404: The target resource was not found, which usually means it no longer exists at that address.
  • 408: The request to access the resource timed out.
  • 500: A generic catch-all error signaling a problem on the server hosting the target resource. The platform for the target resource may provide more information.
  • 904: This code signals a bad hostname in the link. It is not a standard HTTP status code; link-checking tools may report it when the host cannot be resolved.

A tool like Xenu Link Sleuth provides the error code returned for a broken link. A timeout error may mean the link is valid but the server was busy when tested, so you can retest it manually. The rest of the errors signal the link should be removed or replaced.
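That rule of thumb can be written down as a small Python triage function. This is only a sketch of the guidance above: the name suggest_action and the returned labels are made up, and codes outside the list are flagged for manual review instead of guessed at.

```python
# Codes described in the list above. 904 is tool-specific, not standard HTTP.
PROBLEM_CODES = {301, 302, 401, 404, 408, 500, 904}


def suggest_action(status: int) -> str:
    """Map a status code from a link check to a suggested next step."""
    if 200 <= status < 300:
        return "ok"
    if status == 408:
        # A timeout may just mean the server was busy during the check.
        return "retest"
    if status in PROBLEM_CODES:
        return "remove or replace"
    # Anything else is outside the list, so leave the decision to a person.
    return "review manually"
```

For example, suggest_action(404) returns "remove or replace", while suggest_action(408) returns "retest".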

When dealing with internal links in an application, you may examine the target resource to identify what problems may exist within the page source code. An error code of 500 with an internal page usually signals a code error, so the error may be resolved with a code fix. The link will be fixed if the target page is fixed, but you will want to disable or remove the link until the problems with the target resource are addressed.

Unfortunately, there is not much you can do when dealing with external links to sites over which you have no control. In these instances, you will need to remove or replace the link to avoid user problems.

Inbound links

The beauty of the Web is the ability to link to other sites, and other sites may link to yours. These inbound links may generate errors as well. They pose a greater threat, because potential users or customers may be quickly turned away when a link on another site leads to an error on yours. These broken links may be caused by a deleted or renamed page, an old entry in a search index, a bad bookmark, or an incorrect URL.

One way to approach these errors is to set up your Web application to gracefully handle the errors outlined earlier. For example, a custom error page may be created for each error, so the custom page is displayed whenever the error occurs. This custom page can contain a user-friendly message, as well as valid links within the application. A good example is creating a custom 404 page to handle situations where a linked resource no longer exists. The setup for such pages will depend on the Web application platform.
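As one illustration of a custom 404 page, here is a minimal sketch using Python's built-in http.server module. The PAGES and HELPFUL_LINKS tables and the FriendlyHandler class are hypothetical stand-ins; a real platform (IIS, Apache, a Web framework) would have its own configuration for this.

```python
from html import escape
from http.server import BaseHTTPRequestHandler

# Hypothetical site content and the links offered on the error page.
PAGES = {"/": "<html><body><h1>Home</h1></body></html>"}
HELPFUL_LINKS = {"/": "Home", "/sitemap.html": "Site map"}


def render_not_found(path: str) -> str:
    """Build a friendly 404 page that points the visitor at valid links."""
    items = "".join(
        f'<li><a href="{escape(href)}">{escape(label)}</a></li>'
        for href, label in HELPFUL_LINKS.items()
    )
    # escape() keeps a hostile or malformed path from injecting markup.
    return (
        "<html><body><h1>Page not found</h1>"
        f"<p>Sorry, {escape(path)} is not available.</p>"
        f"<p>Try one of these instead:</p><ul>{items}</ul></body></html>"
    )


class FriendlyHandler(BaseHTTPRequestHandler):
    """Serves known pages and answers everything else with the custom 404."""

    def do_GET(self):
        if self.path in PAGES:
            status, page = 200, PAGES[self.path]
        else:
            status, page = 404, render_not_found(self.path)
        body = page.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass FriendlyHandler to http.server.HTTPServer and call serve_forever(). The important detail is that the response still carries the 404 status, so link checkers and search engines continue to see the link as broken while visitors get something useful.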

Another way to address errors is by implementing redirects that automatically send a user to another site resource when an error comes up. Again, the setup and usage of redirects depends upon your platform.
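A redirect setup often boils down to a lookup table from old addresses to new ones. The Python sketch below shows that idea in isolation; the REDIRECTS table and its paths are invented for illustration, and the function follows chains of moves so a visitor can be sent to the final address in one hop.

```python
from typing import Optional

# Hypothetical map of retired paths to their replacements.
REDIRECTS = {
    "/products.htm": "/products.html",
    "/products.html": "/catalog/",
    "/loop-a": "/loop-b",
    "/loop-b": "/loop-a",
}


def resolve_redirect(path: str) -> Optional[str]:
    """Return the final destination for a moved path, or None if there is none."""
    seen = {path}
    target = REDIRECTS.get(path)
    # Follow chains (a page moved more than once) to the last address.
    while target in REDIRECTS:
        if target in seen:
            return None  # circular entries: safer to fall back to the error page
        seen.add(target)
        target = REDIRECTS[target]
    return target
```

When resolve_redirect returns a path, the application would answer with a 301 (permanent) or 302 (temporary) status and a Location header pointing at it; when it returns None, the normal error handling takes over.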

Lastly, you may try to keep an eye on external sites with broken links to your site through tools like Google Webmaster Tools, which include the ability to view crawl errors and the sites that contain the invalid links.

No time to relax

Pushing a site to production does not give you any time to relax, as regular maintenance must be performed to keep the site up and available. One part of regular maintenance should be link validation to make sure users don’t experience problems while using the application.

Do you or someone within your organization regularly perform such maintenance on your Web applications? If so, what tools or methods do you prefer? Share your thoughts with the Web developer community by posting to the discussion.
