One promise of the Internet is the ability to access boundless information with relative anonymity. Those seeking a higher level of privacy can turn to open proxies or to specialized services designed for anonymous online activity, such as Tor.
A new study by Balachander Krishnamurthy, a researcher at AT&T Labs, and Craig E. Wills, a professor of computer science at the Worcester Polytechnic Institute in Massachusetts, paints a different picture. Indeed, the rise of social networking sites appears to be changing the dynamics of online privacy as we understand it.
Their paper “On the Leakage of Personally Identifiable Information via Online Social Networks” can be accessed online here (PDF). Alternatively, I have summarized the technical root of the issue and highlighted some potential repercussions below.
Personally identifiable information
The study looked at 12 popular social networking sites: Bebo, Digg, Facebook, Friendster, Hi5, Imeem, LiveJournal, MySpace, Orkut, Twitter, Xanga, and LinkedIn. As you are no doubt aware, each of these sites holds varying degrees of personally identifiable information (PII) that is accessible to friends or to the general public.
The heart of the issue is how online anonymity is undermined if the usernames or IDs that identify users on these sites are ever leaked to third-party aggregator sites. What are some of these aggregator sites, you ask? Well, how about ad networks like DoubleClick, Google AdSense, and Omniture, or analytics services like Google Analytics, WebTrends, and StatCounter?
Leakage via HTTP
The problem lies in how HTTP leaks information through several parts of a request: specifically, the Referer header (which carries the address of the referring page), the request-URI, and cookies.
Popular sites such as MySpace or Facebook, for example, carry advertisements from ad networks. When loading a page, the typical Web browser also requests the ads that appear on it from the ad network's servers. If the user is logged in and the address of the page contains his or her ID or username, that identifier is transmitted to the unrelated site as part of the HTTP request.
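To make the mechanism concrete, here is a minimal sketch of what an ad server could do with the Referer header it receives. The host names, the `id` query parameter, and the `extract_profile_id` helper are all hypothetical, chosen only for illustration; real sites use a variety of URL layouts.

```python
from urllib.parse import urlsplit, parse_qs


def extract_profile_id(referer, id_param="id"):
    """Return the value of id_param in the Referer URL's query string, or None."""
    query = parse_qs(urlsplit(referer).query)
    values = query.get(id_param)
    return values[0] if values else None


# Headers a browser might send to a hypothetical ad server while rendering
# a profile page that embeds one of its ads (all values are made up).
ad_request_headers = {
    "Host": "ads.example.net",
    "Referer": "http://social.example.com/profile.php?id=123456789",
    "Cookie": "tracker=abc123",
}

# The ad server now holds both its own tracking cookie and the profile ID.
leaked_id = extract_profile_id(ad_request_headers["Referer"])
linked_pair = (ad_request_headers["Cookie"], leaked_id)
```

Nothing here requires cooperation from the social networking site: the browser supplies the Referer header on its own.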
The same goes for when you click a link that takes you outside the site, or use third-party applications (such as Facebook games) that access external resources via HTTP. Once a user is identified, it is a simple matter to link them with any number of tracking cookies and establish an overview of the sites that they have visited.
Of slightly lesser concern are the sites that have unwisely opted to store identifiable information in cookies.
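The linking step can be sketched as follows. This is an illustration under assumed data, not a description of how any real aggregator works: the log entries, the host names, and the `build_profiles` function are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Hypothetical aggregator log: (tracking cookie, Referer of the page that
# embedded the ad or analytics script).
log = [
    ("cookie-A", "http://news.example.org/politics"),
    ("cookie-A", "http://social.example.com/profile.php?id=123456789"),
    ("cookie-B", "http://shop.example.net/cart"),
    ("cookie-A", "http://health.example.org/conditions"),
]


def build_profiles(records, social_host="social.example.com", id_param="id"):
    """Group visited hosts by cookie and attach a leaked profile ID if one appears."""
    profiles = defaultdict(lambda: {"identity": None, "sites": []})
    for cookie, referer in records:
        parts = urlsplit(referer)
        profile = profiles[cookie]
        profile["sites"].append(parts.hostname)
        if parts.hostname == social_host:
            ids = parse_qs(parts.query).get(id_param)
            if ids:
                # One leaked ID is enough to name the whole browsing history.
                profile["identity"] = ids[0]
    return dict(profiles)


profiles = build_profiles(log)
```

In this sketch a single visit to a profile page ties every site recorded under `cookie-A`, before or after that visit, to one social networking identity, while `cookie-B` remains anonymous.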
Unfortunately, the most effective defense against leaking PII via social networks has to be implemented at the site level. This would involve eliminating the Referer field from outgoing requests and judiciously managing what gets transferred via cookies and URIs.
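The study does not prescribe a specific mechanism for this. As one possible sketch, a site could ask browsers to omit the Referer header by sending the standard `Referrer-Policy: no-referrer` response header. The WSGI middleware below and the `profile_page` stand-in application are assumptions made for illustration, not part of any site's actual code.

```python
def add_referrer_policy(app, policy="no-referrer"):
    """Wrap a WSGI app so every response carries a Referrer-Policy header."""
    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Drop any existing policy so ours is the only one sent.
            kept = [(k, v) for k, v in headers if k.lower() != "referrer-policy"]
            kept.append(("Referrer-Policy", policy))
            return start_response(status, kept, exc_info)
        return app(environ, patched_start_response)
    return middleware


def profile_page(environ, start_response):
    """Hypothetical stand-in for a social network's profile page."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>profile</html>"]


def response_headers(app):
    """Call a WSGI app once and return the response headers it sent."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = headers

    list(app({"REQUEST_METHOD": "GET", "PATH_INFO": "/profile.php"}, start_response))
    return captured["headers"]


protected = add_referrer_policy(profile_page)
```

This only addresses the Referer field. Identifiers that the site itself places in cookies or in the URLs of third-party requests would still need to be handled separately.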
Where the end user is concerned, proxy servers that automatically remove the Referer field offer some protection, as does blocking tracking cookies via security software or browser add-ons. However, these are half-measures at best and do not prevent at least a partial profile of one’s activities from being tracked. In the same vein, the “privacy modes” in Google Chrome and Mozilla Firefox protect privacy only on the client side.
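A rough sketch of why header filtering is only a half-measure: the function below mimics what a filtering proxy might do, using a hypothetical `scrub_request` helper and made-up URLs. It assumes, for illustration, that the identifier also appears in the request-URI of the third-party request.

```python
def scrub_request(url, headers, blocked=("referer", "cookie")):
    """Return the request a filtering proxy would forward: same URL, fewer headers."""
    cleaned = {k: v for k, v in headers.items() if k.lower() not in blocked}
    return url, cleaned


# Hypothetical third-party request whose URL itself embeds the user's ID.
url = "http://ads.example.net/banner?page=profile&uid=123456789"
headers = {
    "Host": "ads.example.net",
    "Referer": "http://social.example.com/profile.php?id=123456789",
    "Cookie": "tracker=abc123",
    "User-Agent": "ExampleBrowser/1.0",
}

forwarded_url, forwarded_headers = scrub_request(url, headers)

# The headers are clean, but the identifier still travels in the request-URI.
still_leaking = "123456789" in forwarded_url
```

Removing the Referer and Cookie headers closes two of the three channels described earlier, but the proxy cannot know which query parameters in a URL are identifiers, so the third channel stays open.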
The implications are sobering and call for a reexamination of how we interact with the Web. Since tracking cookies have been in use for years, aggregator sites with historical records could in theory link our social networking profiles with all of our past accesses stored in their databases.
What precautions do you take with social networking? Let us know in the discussion area.