Despite the abundance of search engines available on the Web, none will deliver all of the information you want on a given subject. The “Invisible Web,” or “Deep Web,” is where you’ll find a level of information not accessible on standard Web sites, says Gary Price in the recently published The Invisible Web: Uncovering Information Sources Search Engines Can’t See (Information Today, Inc.; $23.96), a book he coauthored with Chris Sherman. Price is a Washington, DC-based library and information research consultant, and Sherman is president of Searchwise, a consulting firm in Boulder, CO.
The 430-page tome is a vast storehouse of information that pros can use, but it’s also valuable for anyone searching for information on any subject on the Web. Mostly, it’s a timesaving tool for getting information about technology and the workplace.
This article provides an overview of the Invisible Web, offers tips on how to use it, and shares some helpful sites for more information.
Visible vs. invisible
Price and Sherman can’t take credit for discovering the Invisible Web. Nat Kroll, president of an executive search firm in Washington, DC, coined the term in 1996.
The visible Internet is made up of HTML Web pages that search engines have chosen to include in their indices (collections of Web pages), Price said. The Invisible Web is harder to define. Price describes it as text pages, files, or other often high-quality authoritative information available on the Web that general-purpose search engines cannot—due to technical limitations—or will not—due to deliberate choice—add to their indices of Web pages. (TechRepublic, for example, blocks search-engine crawlers from its site to prevent poor performance due to the heavy traffic that crawlers generate.)
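The article does not say how a site keeps crawlers out. One widely used mechanism is a robots.txt file, and the sketch below assumes that approach purely for illustration. It uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler would read such a file and decide to skip a page. The policy text, the `ExampleBot` name, and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt policy that asks every crawler to stay away
# from the entire site. Pages behind it never reach a search index.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical policy that places no restrictions on crawlers.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/article.html"
    print(crawler_may_fetch(BLOCK_ALL, "ExampleBot", page))
    print(crawler_may_fetch(ALLOW_ALL, "ExampleBot", page))
```

A page excluded this way is still open to anyone who visits it directly. It simply never appears in a general search engine's results, which is what makes it part of the Invisible Web.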
Theoretically, search engines could index some parts of the Invisible Web, but doing so would be impractical because of the expense, according to Price. Search engines also have technical trouble reaching certain kinds of content, such as information stored in databases.
“There are thousands, perhaps millions, of databases containing high-quality information that are accessible via the Web,” he said.
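To see why database content tends to stay out of reach, consider a toy model of a crawler that discovers pages only by following links. This is a simplified sketch, not a description of how any particular search engine works. The site map, the record names, and both function names are made up for the example.

```python
from collections import deque
from typing import Optional, Set

# Hypothetical site: each static page and the links it contains.
STATIC_PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # a query form; it links to no individual records
}

# Hypothetical database records, returned only in response to a typed query.
DATABASE = {
    "acme": "Acme Corp. annual report",
    "globex": "Globex Inc. quarterly filing",
}


def crawl(start: str) -> Set[str]:
    """Breadth-first walk over links, the way a simple crawler finds pages."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def search_database(query: str) -> Optional[str]:
    """What a person gets by typing a query into the site's search form."""
    return DATABASE.get(query.lower())


if __name__ == "__main__":
    print(sorted(crawl("/")))       # only the static pages are discovered
    print(search_database("Acme"))  # the record exists, but no link leads to it
```

The crawler reaches the search form but has nothing to type into it, so the records behind the form never enter its index. A person who knows the database exists can retrieve them with a single query.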
For example, if you need financial information on a company at which you are interviewing, you could go the conventional route and gather background information on the company using “top-level pages” from a general search engine. But you’d learn more if you went to EDGAR Online and looked at the company’s Securities and Exchange Commission filings, which report its finances in far greater detail.
More up-to-date information
Information placed on the Web today typically cannot be found through a general search engine until four to six weeks later, according to Price.
“Most general search engines such as Google and AltaVista don’t crawl the Web in real time,” Price explains. “With technology changing daily, search engines are not up-to-the-minute resources. The different search engines are also refreshed at different times, which makes some more current than others.”
Additionally, the depth of material is limited on most search engines.
“If 500 pages of a document are available, a search engine doesn’t guarantee every page will be available,” Price said. “And some material is searchable on one search engine but not on another. Most search engines are not making .pdf format files available. If you were searching for information in a .pdf document, few search engines would retrieve it.”
(Google, however, has begun crawling the Web for .pdf documents this year.)
Invisible Web tour guides
Two sites, InvisibleWeb.com and the authors’ invisible-web.net, provide directories of resources available on the Invisible Web. SearchEngineWatch.com also offers newsletters on search engines, along with tips and information about searching the Web.
Here are a few sites recommended by invisible-web.net:
- Meta-List.net allows you to search 17,000 newsletters and discussions on various topics, including computers and the Internet.
- Search Engine Guide offers a directory of specialty search engines and daily news about search engines and the search engine industry, including a subcategory about computing.
- SecuritySearch.net offers industry news, a search engine for security Web sites, and downloadable tools.
What’s your favorite search engine?
Have you found an invaluable site on the visible or the Invisible Web? What search engine do you use for general searches and specific questions? Send us your recommendation.