Broken links on a web page are one of the top reasons users submit comments and complaints about website functionality, so it makes sense to review links on a regular basis. A systematic link-checking policy helps ensure that external links keep working.
External links are harder to track than internal links: a third-party website may have changed its content management system (CMS) or updated its URL naming convention, and if no automatic redirect was set up for the old URLs, it's up to you to find and fix the resulting dead and broken external links.
I previously wrote about using Dreamweaver to find and repair broken internal links in intranet or local files. Now I'll provide an overview of one tool that can be used to find broken external links on web pages, and share links and short descriptions for additional link-checking applications.
LinkChecker is a free, GPL-licensed website validator maintained by Bastian Kleineidam; the project can be found in the wummel/linkchecker GitHub repository. The latest version, LinkChecker 8.5, was released on December 24, 2013, and is available for download from the website as an exe, deb, or tar.xz file.
LinkChecker's features include recursive and multithreaded link checking and site crawling; it supports a command-line interface, a GUI client, and a CGI web interface; it provides cookie and HTML5 support; and it can check HTML and CSS syntax. The exe file downloads as LinkChecker-8.5.exe, is just over 11 MB, and the straightforward installation takes under one minute to complete.
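To see what "multithreaded link checking" means in practice, here is a rough sketch of the worker-pool pattern such tools use, built with Python's standard library. This is illustrative only, not LinkChecker's actual code; `fetch_status` and the `FAKE_RESPONSES` table are stand-ins for real HTTP requests.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for real HTTP requests; an actual checker would issue a
# HEAD/GET with urllib.request and return the response status code.
FAKE_RESPONSES = {
    "http://example.com/": 200,
    "http://example.com/about": 200,
    "http://example.com/missing": 404,
}

def fetch_status(url):
    """Pretend to fetch a URL and return its HTTP status code."""
    return FAKE_RESPONSES.get(url, 404)

def check_links(urls, workers=4):
    """Check many URLs concurrently and return {url: status}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = pool.map(fetch_status, urls)  # preserves input order
    return dict(zip(urls, statuses))

results = check_links(list(FAKE_RESPONSES))
broken = [u for u, s in results.items() if s >= 400]
print(broken)  # → ['http://example.com/missing']
```

Because each check is an independent network request, running them on a thread pool rather than one at a time is what lets a checker crawl a large site in minutes instead of hours.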
Find the program in your list of installed applications and open it. The GUI is displayed in Figure A. (Note: All screenshots are from the application running on a Windows OS.)
You can test a web page by entering a fully qualified URL (e.g., http://www.domainname.com) into the GUI client or web interface and then pressing the Start button at the top right. The link check recursively validates all pages under the parent URL; external links pointing outside the parent URL are checked, but the third-party pages they lead to are not crawled recursively. For more information on options, configuration, output types, proxy support, and other topics, check out the online manual.
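The crawl policy described above — recurse into pages under the parent URL, but only check (not crawl) external links — comes down to classifying each link found on a page. A minimal sketch of that classification step, using only the standard library (the sample HTML and helper names are illustrative, not LinkChecker's internals):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def classify_links(page_url, html, parent_url):
    """Split a page's links into internal ones (crawl recursively)
    and external ones (check once, don't crawl)."""
    parser = LinkExtractor()
    parser.feed(html)
    internal, external = [], []
    for href in parser.links:
        absolute = urljoin(page_url, href)  # resolve relative links
        if absolute.startswith(parent_url):
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external

html = '<a href="/about.html">About</a> <a href="http://other.example.org/">Other</a>'
internal, external = classify_links(
    "http://www.domainname.com/index.html", html, "http://www.domainname.com/")
print(internal)  # → ['http://www.domainname.com/about.html']
print(external)  # → ['http://other.example.org/']
```

Internal links would be queued for their own crawl pass, while external links only get a single status check — exactly the behavior LinkChecker shows when it stops at third-party pages.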
Figure B shows LinkChecker scanning a sample URL.
The link check on the example page http://wummel.github.io/linkchecker/index.html found 48 URLs, with 12 warnings and 0 invalid URLs. The URL properties for the first (highlighted) URL, http://wummel.github.io/linkchecker/faq.html, show a warning about a redirect that should be updated: line 74 of the faq.html file links to http://seleniumhq.org, which ultimately redirects to http://docs.seleniumhq.org/. While the redirect works in this case, it's probably a good idea for Bastian to update the link on the page so as not to rely on Selenium's due diligence in keeping the redirect active.
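A checker flags this kind of stale-but-working link by comparing the URL it requested with the URL it finally landed on. In a live check the final URL comes from following the redirect (for example, `urllib.request.urlopen(url).geturl()` returns it); the comparison itself can be sketched offline. The normalization rule below (ignore a trivial trailing-slash difference) is an assumption for illustration, not LinkChecker's exact logic:

```python
from urllib.parse import urlparse

def is_redirected(requested, final):
    """True when the server sent us somewhere other than the URL we
    asked for, ignoring a trivial trailing-slash difference."""
    def norm(url):
        parts = urlparse(url)
        return (parts.netloc, parts.path.rstrip("/") or "/")
    return norm(requested) != norm(final)

# The Selenium link from the LinkChecker FAQ page:
print(is_redirected("http://seleniumhq.org", "http://docs.seleniumhq.org/"))  # → True
# Same page with only a trailing slash added — not worth a warning:
print(is_redirected("http://example.com/faq", "http://example.com/faq/"))  # → False
```

Links that trip this check still work today, but each one is a maintenance debt: the moment the third party drops the redirect, the link breaks.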
Additional link-checking applications
Xenu is a free download by Tilman Hausherr; Xenu, Xenu's Link Sleuth, and Link Sleuth are trademarked for software products and services. The latest working version of the software is 1.3.8, from September 4, 2010. For more information, check out the official Description page.
W3C Link Checker
With the W3C's free link checker, you enter a URL into the form field and get options to show a summary only, hide redirects (for all redirects or for directories only), and check linked documents recursively to an assigned depth. You can also save your link-checking options in a cookie.
What do you use to check external links?
What link-checking tool do you use for your websites? Let us know in the discussion.