The Commonwealth Scientific and Industrial Research Organisation (CSIRO) made the finding when it compared the capabilities of its commercial heuristic search engine Panoptic, developed at the Advanced Computational Systems Cooperative Research Centre, with those of the search engines used on enterprise-scale Web sites.
"One of the main reasons why people are happy with Google searches than with the search engines on the Web a few years ago is because of Google's ability to home in on the homepage of an organisation," CSIRO information retrieval expert Dr David Hawking told ZDNet Australia . For example, typing "Microsoft" into Google will result in the Microsoft Corporation's home page coming up in the number one rank.
"We've analysed query logs from a number of organisations and found there's quite a high proportion of navigational query submitted within Web sites," said Hawking. "For example, in the Sony site a visitor might type in the name of the product. In each case they're looking for the entry page to the site that is the entry page for that topic."
As most Internet users are probably aware, the search engine used on a company's site often does not return the most appropriate page. Hawking said this was because the engines were restricted to searching text within a Web page. Earlier Web search engines had similar problems, which Google addressed by ranking pages based on incoming links, anchor text and URL structure, among other things.
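As a rough illustration of why link evidence helps with navigational queries, the Python sketch below ranks two made-up pages. It is not Google's or Panoptic's actual algorithm: the Page class, the score function, the example.com URLs and the weights are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    body: str                                         # text on the page itself
    anchor_texts: list = field(default_factory=list)  # text of links pointing at the page


def score(page: Page, query: str) -> float:
    """Toy relevance score; the weights are arbitrary assumptions."""
    q = query.lower()
    # Text-only evidence: does the page body mention the query at all?
    body_hit = 1.0 if q in page.body.lower() else 0.0
    # Links whose anchor text mentions the query act as votes for this page.
    anchor_hits = sum(1 for a in page.anchor_texts if q in a.lower())
    # Every incoming link adds a small amount, whatever its anchor text says.
    inlinks = len(page.anchor_texts)
    return body_hit + 2.0 * anchor_hits + 0.1 * inlinks


def rank(pages, query):
    """Return pages ordered from highest to lowest score."""
    return sorted(pages, key=lambda p: score(p, query), reverse=True)


pages = [
    Page("http://example.com/press/2003/review.html",
         "A long review that mentions Walkman many times. Walkman walkman."),
    Page("http://example.com/walkman/",
         "Welcome.",
         ["Walkman", "Sony Walkman home", "click here"]),
]
best = rank(pages, "walkman")[0]
print(best.url)
```

A text-only engine would prefer the review, since the entry page never mentions the product name in its own text. Once anchor text and incoming links are counted, the entry page comes out on top.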
"We've used similar technology and worked out which ones are effective within a typical organisation," said Hawking. He emphasised the importance of an effective search engine to businesses.
"There is plenty of research to show that return on investment from Web sites is highly dependent on how efficiently that site can be searched for information and services - missed hits mean lost customers," said Hawking. "It is crucial to an organisation's business to have the right page appear in the top five, preferably at number one, when a customer is searching for information."
There are also ways to optimise Web sites for search engines. One reason heuristic search engines such as Hawking's own Panoptic find it difficult to rank pages in the most appropriate order is that organisations make poor use of their content management systems. Hawking agrees content management systems are good, but said that if left on their default settings they typically publish pages with URLs such as "http://www.drugemporium.com/cec/cstage?eccookie=@eccookie@&ecaction=de_ecwalkin&template=de_walkin.en.htm".
These URLs make ranking pages difficult, since the heuristic engines cannot use the structure of the URL to allocate importance to a page. Hawking advises that companies make a conscious effort to create sensible URLs, touting as an added benefit the fact that more people will link to a page if they understand how the URL works than if it is gibberish.
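One way to picture how URL structure can feed into ranking is a simple query-independent score that favours short, readable paths. The sketch below is only an assumption about how such a heuristic might look, not a description of Panoptic; the url_prior function, its halving penalty for query strings and the example.com addresses are hypothetical.

```python
from urllib.parse import urlsplit


def url_prior(url: str) -> float:
    """Hypothetical score: higher for URLs that look like entry pages."""
    parts = urlsplit(url)
    # Count the non-empty path segments, e.g. /physics/1styear.html has two.
    segments = [s for s in parts.path.split("/") if s]
    score = 1.0 / (1 + len(segments))   # shallower paths score higher
    if parts.query:
        score *= 0.5                    # opaque query strings reveal little structure
    return score


cms_url = ("http://www.drugemporium.com/cec/cstage?eccookie=@eccookie@"
           "&ecaction=de_ecwalkin&template=de_walkin.en.htm")

for u in ["http://example.com/",
          "http://example.com/physics/",
          "http://example.com/physics/1styear.html",
          cms_url]:
    print(round(url_prior(u), 3), u)
```

Under these assumptions the site root scores highest and the default content management system URL scores lowest, because its path says nothing about where the page sits in the site.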
"If you have a well organised site like www.anu.edu.ua/physics/1styear.html, that [URL] contains useful information for people using the site and makes it easier for search engines," said Hawking. "People are more likely to link to a logical URL than a 40 digit ID."
He said CSIRO could advise companies on how to optimise their sites for use with the Panoptic search engine, with the added benefit that the sites would also be optimised for Google.