General discussion

Locked

Validity of user agents for tracking.

By normhaga
Let me see: the discussion revolves around whether the user agent (UA) string sent by a browser is a workable way to track unique visitors to a web site, and whether this information is useful in forming a reasonably accurate assessment of which browsers and operating systems are predominant.

One participant maintains that this is a viable method to track the browser and the OS used to access the website, and that given a large enough population it is useful in tracking the desired information. This participant did bring up that the user agent and its associated information can be altered to indicate something other than what is really being used, but also that such spoofing is relatively rare and statistically insignificant. The other participant maintains that, because of routing issues and spoofing, UA-based methods of browser and OS determination are only slightly better than useless.

This discussion grew from the Linux/Windows religious discussion.

This is a continuation of that discussion.

It is my position that even though UAs can be spoofed without difficulty, they are a useful indicator of what OS and browser are being used. The reason for this is that spoofing, while not illegal, smacks of the methods that criminal hackers/crackers use. It is so close to the border that in some US jurisdictions, such as Wisconsin, there is a very real possibility of finding criminal charges levied against you. I make no judgment as to how valid those charges would be, only that the possibility exists. I myself spoof my UA when needed; one example is a bank I use that supported only IE 6 and 7. Because I routinely use Linux as my preferred OS, and we all know that Internet Explorer works with Linux, I have to spoof the UA presented by Firefox to indicate IE 6 or 7 on a 32- or 64-bit Windows platform to access my account information. (The bank in question added FF as a supported browser when I presented these concerns and the news articles from WI showing an ongoing criminal prosecution for UA spoofing.)
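To show how little effort this kind of spoofing takes, here is a minimal sketch in Python. It only builds a request that claims to come from IE 7 on Windows and never sends it. The URL and the exact UA string are placeholders chosen for illustration, not the ones used with the bank.

```python
from urllib.request import Request

# Illustrative IE 7 on Windows XP user agent string (an assumption for this sketch).
IE7_UA = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

def build_spoofed_request(url, user_agent=IE7_UA):
    """Build, but do not send, a request that presents the given User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})

# bank.example is a placeholder host, not a real site.
req = build_spoofed_request("https://bank.example/login")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

The server only ever sees the header it is given, which is why a UA string by itself proves nothing about the client.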

Is UA tracking by itself viable for the given purposes? No. However, when used in conjunction with time-expiring cookies it might be a viable method. UA strings survive proxies, unless the proxy is set up to strip the UA or to spoof it. The reason is that when the web page finally returns to the originator, it needs to present the proper layout to the requesting browser. Sometimes funny things happen when you read a Safari layout in FF or IE 7. With my own website, hosted by Yahoo Small Business Web Hosting, I track individual users, OSs, and browsers with the UA string and time-expiring cookies. It is a little more complex than simple cookie/UA tracking because I also use IP tracking to indicate whether the user is a unique IP. I then analyze this information to determine what browsers to support. For those concerned, I only collect UA, OS, and IP information for the stated purpose; if I see a large number of repeated IPs I might do a reverse DNS to make a sales call, but so far have resisted the temptation. I also use this method as an experiment in tracking and blocking spam.
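A rough sketch of the idea in Python, for anyone who wants to see it concretely. This is not my actual setup: the function names, the sample hits, and the crude substring matching are all assumptions made for illustration, and cookie expiry is not modelled. Each hit is a (cookie ID, IP, UA string) tuple; a visitor is counted once per cookie ID, and the distinct IPs are collected separately.

```python
from collections import Counter

def classify(ua):
    """Very rough browser/OS guess from a UA string (illustrative only)."""
    if "MSIE" in ua:
        browser = "IE"
    elif "Firefox" in ua:
        browser = "Firefox"
    elif "Safari" in ua:
        browser = "Safari"
    else:
        browser = "Other"
    if "Windows" in ua:
        os_name = "Windows"
    elif "Mac OS X" in ua:
        os_name = "Mac OS X"
    elif "Linux" in ua:
        os_name = "Linux"
    else:
        os_name = "Other"
    return browser, os_name

def tally_unique(hits):
    """Count each cookie ID once; also collect the distinct IPs seen."""
    seen_cookies = set()
    unique_ips = set()
    counts = Counter()
    for cookie_id, ip, ua in hits:
        unique_ips.add(ip)
        if cookie_id in seen_cookies:
            continue  # returning visitor, already counted
        seen_cookies.add(cookie_id)
        counts[classify(ua)] += 1
    return counts, unique_ips

# Hypothetical log entries; IPs are from documentation-only ranges.
FF_LINUX = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080207 Firefox/2.0.0.12"
IE7_WIN = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
hits = [
    ("c1", "203.0.113.5", FF_LINUX),
    ("c1", "203.0.113.5", FF_LINUX),  # repeat visit, same cookie
    ("c2", "203.0.113.9", IE7_WIN),
]

counts, unique_ips = tally_unique(hits)
for (browser, os_name), n in counts.items():
    print(browser, os_name, n)
print("unique IPs:", len(unique_ips))
```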

Can this method return spurious data? Certainly. One legitimate example would be several employees of a large company viewing my website from behind the same server in a short time period. Could someone use proxies to obscure their IP? Certainly. There are many faults with this method, but I assert that for the purpose given it is a viable method.
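The shared-server case can be sketched as follows. The data is made up: three employees behind one outbound IP, two of them with identical browsers. Counting by IP alone sees one visitor, IP plus UA sees two, and only the cookie separates all three.

```python
def unique_by(hits, key):
    """Number of distinct visitors when hits are identified by key(hit)."""
    return len({key(hit) for hit in hits})

# Hypothetical hits as (cookie_id, ip, ua); all share one documentation-range IP.
IE7_WIN = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
FF_WIN = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12"
office_hits = [
    ("a", "198.51.100.7", FF_WIN),
    ("b", "198.51.100.7", IE7_WIN),
    ("c", "198.51.100.7", IE7_WIN),
]

by_ip = unique_by(office_hits, key=lambda h: h[1])
by_ip_and_ua = unique_by(office_hits, key=lambda h: (h[1], h[2]))
by_cookie = unique_by(office_hits, key=lambda h: h[0])
print(by_ip, by_ip_and_ua, by_cookie)
```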

What do you think?

Next contestant in "The Guess is Right," poleeaaase.

This conversation is currently closed to new comments.


Neon_Samari's response

by normhaga In reply to Validity of user agents f ...

Good lord, they want to extend wire fraud to include manipulating your browser response? That's as bad as banning networking tools rather than the malicious use of them. I can think of a few legitimate reasons:

- testing firewall rules or troubleshooting network issues by changing your test system's MAC.

- testing your webserver config and logging or any browser specific functions by changing your browser response.

- the same subscriber using two machines on the same account (suspicious I know but stick with me here). I was on vacation in sunny Florida and the resort provided subscribed access to internet over wireless.

I have a PDA always with me which needs wireless connectivity when around the resort. Back at the room, I have my notebook which is also used for various networking needs but not carried around the resort with me. Only one machine would be online at a time and both would be used by myself as the week long ISP subscriber. This was a concern because I didn't want to go through the online subscription forms, have my mobile MAC recorded then not be able to use the notebook without a second fee and MAC recording.

After asking the support desk, I was told that the browser-entered username and password were the only authentication needed to get out of the resort router. MAC filtering was not used, and the same subscriber could use two different machines provided it was not at the same time.

Had MAC been a component of the authentication, I think using the same MAC on both machines would have been valid. I can use my access at any of the internet cafe machines around the resort and the resort specified that multiple machines could be used individually.

Penalize those who spoof their information for malicious purposes; absolutely. Heck, burn out their NIC if you can get a spike back down the network cable. But, for the love of rational thought, don't ban the software, ban the idiot using it.

Your website stats are probably accurate enough for your needs. If not, then they at least tailor both your website and the end users who will consciously or unconsciously respond to the results of those statistics. This probably produces an accurate enough cross section between the subtle adjustments on both sides of the wire.

My own issue is more a habit born out of computers and experience now as a business analyst. With my computers, it's binary; it works or it doesn't. I either hit the command and enter or I don't. Like any computer, they are dumb, frightfully dumb; they can only do what you ask and exactly that, be it what you intended or not.

At work, it is even more so the case. I can't stand putting out analysis based on best guesses and fiction (numbers not from the database, or compared between two incomparable systems). Most of my time is spent figuring out how to get the most accurate information, with only a few regular tasks requiring incomparable data consolidated into the same reporting. Questions like "these numbers look a little low/high, can we just change those down a little before you send it out?" always get the response "that's what the database says it is; we can look further into why that is if you'd like". Basically, questionable source data and having to base real analysis off it is something I can't stand doing. 1+1!=3, ever (this isn't accounting).

Those are probably the biggest reasons for my own distrust of market share research between software options and web server statistics beyond what is fully trustworthy. I still view the two separately though with very similar reasons for distrust.

I'm also one of the people who screws up your statistics, since cookies are wiped at each session end by all my browsers. Each visit, I'm a new browser/OS combination inflating your figures. I also often browse from FF under Mandriva, so I then don't get counted as an active Windows install though I also use it for many other tasks; MS gets me tagged on the Windows Update stats though.
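To put a number on that inflation, here is a small worked example in Python. Every figure is invented for illustration: ten real visitors making five visits each, nine on IE/Windows who keep their cookies and one on Firefox/Linux who wipes them every session.

```python
from collections import Counter

def cookie_based_counts(visitors, visits_each):
    """visitors: list of (label, keeps_cookies) pairs, one per real person."""
    counts = Counter()
    for label, keeps_cookies in visitors:
        # A visitor who keeps cookies is counted once; one who wipes them
        # looks like a brand-new visitor on every visit.
        counts[label] += 1 if keeps_cookies else visits_each
    return counts

visitors = [("IE/Windows", True)] * 9 + [("Firefox/Linux", False)]
counts = cookie_based_counts(visitors, visits_each=5)

true_share = 1 / len(visitors)
measured_share = counts["Firefox/Linux"] / sum(counts.values())
print(f"true share {true_share:.0%}, measured share {measured_share:.0%}")
```

In this made-up case the one cookie-wiping visitor is counted five times, so a real 10% share shows up as 5 out of 14, roughly 36%.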

I can accept the stats from the logs as estimations to some degree for browser hits. I still can't shake the distrust of data that cannot be definitively analyzed without, at least, fewer variables.
