Kevin Ferguson

Craig Macfarlane, chief technology officer of, smiles incredulously as he studies the numbers: 4.7 million unique users logged on to his company’s Web site in December. No, wait—make that 1.75 million. Or rather, 1.13 million.

The first of the three numbers Macfarlane has to choose from was generated using a homegrown Web traffic-analysis program, the second was offered by PC Data Online, and the third was provided by Media Metrix. At least two of them are inaccurate. But most likely, all three are off the mark. “It’s scientific, but it’s messy,” says Macfarlane.

It’s also a hassle. “If we are undercounted and don’t appear in the top-tier sites, we don’t get to meet with agencies or significant advertisers,” he said. “This certainly does affect our ability to sell advertising.” So the Web site has to scrutinize the three numbers to figure out which ones are closest to reality.

Despite considerable improvements to the methodologies employed by Web traffic-analysis tools over the past two years, it seems that deriving meaningful results with them is still often two parts art and one part science. Regardless, the results they produce are indispensable.

Editor’s note

At the time of this writing, PC Data Online had ceased doing business. Certain assets of the company had been acquired by comScore Networks, which is honoring the contracts of former PC Data Online customers.

Three ways to measure Web traffic
Despite the lack of certainty in Web traffic numbers, analyzing your traffic can be invaluable. For example, do you want to know why your customers abandon their virtual shopping carts before hitting the checkout line? Look at the page-navigation statistics. Can’t understand why visitors never make it to the fourth page of your online catalog? See how much time they’re forced to spend on the first three pages. Sensing a dramatic shift in the demographics of your visitors? Analysis tools can show you which Web sites they visited before coming to you.

Broadly speaking, you have three ways to go about measuring and analyzing Web traffic:

  • You can install traffic-analysis software, such as that sold by NetGenesis, WebTrends, and Accrue Software, on your own servers (regardless of whether you host your own site or use an ISP for that purpose).
  • You can outsource this task to a service provider, such as WebSideStory, that specializes in traffic reporting.
  • You can subscribe to an independent tracking service, such as Nielsen/NetRatings or Media Metrix.

More recently, traffic-analysis vendors have blurred these categories, offering a mix of products to attract a broader audience. For example, in March 2001, WebSideStory began selling its first packaged software, an analytical tool called HitBox DataWise, and software developer WebTrends now offers a hosted service called WebTrends Live.

Each product has its pros and cons. WebSideStory’s HitBox Enterprise, for instance, is very customizable and good at tracking larger historical Web trends but requires labor-intensive steps to track traffic minutiae, such as a specific page’s ebb and flow of visitors over several weeks. (Such trends are easier to track using WebSideStory’s recent release, DataWise.) And PC Data Online captures useful details about Web usage, such as repeated visits to the same site, but it generates those details by collecting data from only a sample of consumer-oriented Web users. While the consumer sample is vast (120,000 and growing by 3,000 a month), specialized sites such as’s Notre Dame souvenir shop are likely underrepresented.

Check out CNET Enterprise Business

This article appears courtesy of CNET’s Enterprise Business section, where you can explore IT business solutions on various topics, including ASPs, Linux, groupware, information systems infrastructure, and supply chain management.

How to track cached pages
A bigger technical watershed separating traffic-reporting tools, though, may be their ability to track cached pages. When PC users request Web pages through Internet service providers, they often view copies that were cached once in the ISP’s data centers and then served to multiple users. Because those requests never reach the site’s own Web server, some page views are never counted, which can throw off Web traffic reports. How many are missed? It’s difficult to say. Critics of traffic-analysis tools put the number as high as 10 percent for some sites, but the vendors of those tools say the number is negligible.

The Web traffic tools most susceptible to missing cached pages, critics argue, are those, such as WebTrends and NetGenesis, that rely on data-collection methods known as log-file analysis and network packet sniffing. Network packet sniffers, which usually reside on stand-alone servers between the Web server and the firewall, scan Web server data packets that stream past, copy them, and then forward them to a database. Log-file analysis reads the requests logged by Web, proxy, and other Internet servers, noting such things as the visitor’s IP address and the time it took to process the request, and then sends them to a database for subsequent number crunching.
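As a rough illustration of log-file analysis, the sketch below parses a few lines in the Common Log Format written by servers such as Apache and tallies page views and distinct IP addresses. The sample lines, the `summarize` function, and the decision to treat each IP address as one visitor are assumptions made for illustration; they do not describe how WebTrends or NetGenesis work internally.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)


def summarize(lines):
    """Count successful requests per path and collect distinct client IPs."""
    page_views = Counter()
    visitors = set()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        if match.group("status") != "200":
            continue  # ignore errors and redirects
        page_views[match.group("path")] += 1
        visitors.add(match.group("ip"))
    return page_views, visitors


# Hypothetical sample data (documentation-only IP addresses).
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Dec/2000:13:55:36 -0700] "GET /catalog/page1.html HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Dec/2000:13:56:02 -0700] "GET /catalog/page2.html HTTP/1.0" 200 1830',
    '198.51.100.7 - - [10/Dec/2000:14:01:11 -0700] "GET /catalog/page1.html HTTP/1.0" 200 2326',
    '198.51.100.7 - - [10/Dec/2000:14:01:40 -0700] "GET /missing.html HTTP/1.0" 404 512',
]

page_views, visitors = summarize(SAMPLE_LOG)
```

A page served from an ISP’s cache produces no line in this log at all, which is why critics say this approach can undercount.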

Proponents of log-file analysis insist that tracking cached pages is not the problem it was with the Web design tools available just a few years ago. “I understand why advertisers are concerned about this, but it’s a bit of an urban legend now,” said Kevin Epstein, director of product management in Inktomi’s networks products division. “All you need to do is put a noncacheable object in your page, like a piece of text.” That way, even if graphics and banner ads are served from cache memory, the pages will still be tracked.
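One way to make an object noncacheable, sketched here as an assumption and not as Inktomi’s recommendation, is to serve it with HTTP headers that ask browsers and proxy caches not to store it. The function names and the plain-text body are hypothetical; the header names are standard HTTP, but whether a particular cache honors them is outside the site’s control.

```python
def tracking_object_headers():
    """Headers asking browsers and proxy caches not to store the response."""
    return {
        "Content-Type": "text/plain",
        "Cache-Control": "no-store, no-cache, must-revalidate",  # HTTP/1.1 caches
        "Pragma": "no-cache",  # older HTTP/1.0 caches
        "Expires": "0",        # treat the response as already expired
    }


def format_response(body):
    """Assemble a raw HTTP response for the small tracked object."""
    headers = tracking_object_headers()
    headers["Content-Length"] = str(len(body.encode("utf-8")))
    head = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    return f"HTTP/1.1 200 OK\r\n{head}\r\n{body}"


response = format_response("x")
```

Because the object must be fetched from the site’s server on every view, each page view leaves a log entry even when the rest of the page comes from a cache.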

But even vendors that consistently count cached pages aren’t always on the same page. WebSideStory and PC Data Online, for example, both capture traffic routed through caching servers, but their services still report different traffic numbers for the same Web pages over the same period. Why? Again, different methods. WebSideStory uses a technique known as page tagging, in which the company’s clients place a few lines of code at the bottom of each Web page they want tracked. Each time that page is requested, whether or not the page has been cached, WebSideStory is notified.
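The collecting side of page tagging can be sketched as follows, assuming the tag in each page makes the visitor’s browser request a URL on the tracking service with the page name in the query string. The `stats.example.com` address, the `acct` and `page` parameters, and the `record_hit` function are hypothetical; WebSideStory’s actual tag and service are not described in enough detail here to reproduce.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def record_hit(request_url, counts):
    """Tally one page view from the URL a page tag requested."""
    params = parse_qs(urlsplit(request_url).query)
    page = params.get("page", ["(unknown)"])[0]  # parse_qs returns lists
    counts[page] += 1
    return page


# Hypothetical tag requests arriving at the tracking service.
counts = Counter()
for url in [
    "https://stats.example.com/hit?acct=demo&page=%2Fcatalog%2Fpage1.html",
    "https://stats.example.com/hit?acct=demo&page=%2Fcatalog%2Fpage1.html",
    "https://stats.example.com/hit?acct=demo&page=%2Fcheckout.html",
]:
    record_hit(url, counts)
```

The count is driven by the tag’s request to the tracking service, not by the site’s own server log, so it does not matter where the page itself was served from.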

PC Data Online, on the other hand, doesn’t code each page but captures the URL requested by placing tracking software on survey participants’ hard drives. (PC Data’s tracking software, @PC Data, starts tracking Internet usage as soon as users open their browsers. @PC Data collects and temporarily stores a log of participants’ Web activities for 15 minutes. The data is then sent in real time in an encrypted message to PC Data.) Caching, therefore, is not an issue for either WebSideStory or PC Data Online.

The best bet is to use a combination of third-party auditing tools, such as those offered by Media Metrix or PC Data, and analysis tools from NetGenesis, WebTrends, and the like. The auditing tools will help you compare your site to others in your market segment. The analysis tools will give you more specifics on your site.

How to choose the best tools
What are the characteristics of the best traffic-analysis tools? Enterprise users suggest you consider these five points:

Scalability—Pick software that handles quickly expanding sites; busy Web sites can generate gigabytes of traffic reports each day. “Most vendors’ software can’t handle the volume,” said Dan Vesset, a senior analyst at IDC. “That’s the biggest reason why businesses change software vendors.” Case in point: ABC Distribution switched from WebTrends to WebSideStory 15 months ago because WebTrends couldn’t handle the 6 GB of data generated each day by the online gift catalog’s more than 100,000 visitors. WebTrends has since released software designed for high-traffic sites.

Available reports—You can choose from hundreds of reports that measure types of activities. Some show the amount of time a user spends on each Web page; others show the paths users take to navigate your site; and still others note how much time a user spends with offline applications, such as Microsoft Word, before returning to the Web. You won’t need 80 percent of the available reports, but the ones you pick can be crucial. It all depends on your business and the context in which the numbers are read. For example, a report that shows that users spend an average of 20 minutes per visit sounds wonderful—unless you also look at the paths they take in navigating your site. You might find that they spend so much time not because they love your site, but because they keep getting lost.
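The 20-minute example can be made concrete with a small sketch. It assumes a hypothetical clickstream of (visitor, seconds since the first request, page) records and uses repeat visits to the same page as a crude sign that a visitor is going in circles; real products may define visits and paths differently.

```python
from collections import defaultdict


def visit_stats(events):
    """Summarize each visitor's time on site, path, and repeated pages.

    events: iterable of (visitor_id, seconds_since_start, page) tuples.
    """
    by_visitor = defaultdict(list)
    for visitor, seconds, page in events:
        by_visitor[visitor].append((seconds, page))

    stats = {}
    for visitor, hits in by_visitor.items():
        hits.sort()  # order requests by time
        path = [page for _, page in hits]
        # Time between first and last request; time on the last page is unknown.
        minutes = (hits[-1][0] - hits[0][0]) / 60
        revisits = len(path) - len(set(path))
        stats[visitor] = {"minutes": minutes, "path": path, "revisits": revisits}
    return stats


# Hypothetical sample data: two visits of the same length.
EVENTS = [
    ("a", 0, "/home"), ("a", 300, "/search"), ("a", 600, "/home"),
    ("a", 900, "/search"), ("a", 1200, "/home"),
    ("b", 0, "/home"), ("b", 400, "/catalog"), ("b", 1200, "/checkout"),
]

stats = visit_stats(EVENTS)
```

Both visitors spend 20 minutes on the site, but visitor "a" bounces between two pages three extra times while visitor "b" moves straight to checkout, a difference the time-per-visit report alone would hide.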

Customization—Static monthly reports can take you only so far. Consider those that let you customize online reports and easily integrate data into other applications, such as e-CRM programs.

Price—Dust off your wallet. Typical of software in its class, NetGenesis 5.0 will cost enterprises about $160,000, which includes approximately $60,000 for NetGenesis consultants to spend six weeks analyzing your business and deploying the product. WebTrends Enterprise Reporting Server, a browser-based program that exemplifies the middle tier of Web traffic products, starts at $4,100 for one server. The average customer eventually spends about $30,000, the company said. Hosted solutions, such as those from WebSideStory, will vary in cost, depending on the volume of traffic analyzed. But expect to pay $2,000 to $5,000 per month.
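For a rough sense of how these prices compare, the sketch below computes how many months of hosted fees would add up to each one-time software outlay quoted above. It is simple arithmetic on the article’s figures and assumes no other costs (hardware, staff, upgrades, or maintenance), which a real comparison would need to include.

```python
HOSTED_MONTHLY_RANGE = (2_000, 5_000)  # hosted service, dollars per month
PACKAGES = {
    "NetGenesis 5.0 (with consulting)": 160_000,
    "WebTrends (average customer spend)": 30_000,
}


def months_to_match(upfront_cost, monthly_fee):
    """Months of hosted fees that add up to a one-time software outlay."""
    return upfront_cost / monthly_fee


def break_even_table():
    """Map each package to (months at the high fee, months at the low fee)."""
    low_fee, high_fee = HOSTED_MONTHLY_RANGE
    table = {}
    for name, cost in PACKAGES.items():
        # The higher monthly fee reaches the software outlay sooner.
        table[name] = (months_to_match(cost, high_fee),
                       months_to_match(cost, low_fee))
    return table


for name, (soonest, latest) in break_even_table().items():
    print(f"{name}: {soonest:.0f} to {latest:.0f} months")
```

On these figures, hosted fees would match the NetGenesis outlay after 32 to 80 months and the average WebTrends spend after 6 to 15 months.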

Platform—This isn’t the headache it was 18 months ago. Previously, some tools were available only for Windows NT, requiring Herculean efforts by larger Web site hosts to port data over to UNIX-based servers. Now, in their efforts to attract larger enterprises, vendors have released UNIX-compatible applications. However, not many Linux tools are available yet. Web server support is often not an issue either. Most traffic tools now support the usual suspects, including Apache, Microsoft IIS, and Netscape Enterprise.

Kevin Ferguson has covered the computer industry for 16 years. Prior to writing for CNET Enterprise, he was executive editor for Computer Reseller News and editor in chief of Computer Retail Week.
