Every time you use Google search, Google records information specific to that search: the search query, the user IP address, and the user cookie. Google isn’t specific about how the search data is stored on their servers, but it has disclosed that individual user information was initially stored intact and for an indefinite period.

Why retain this information?

Google system developers explain that retaining search logs allows them to analyze usage patterns, perform fraud detection/prevention, combat denial-of-service attacks, and help diagnose system issues. For a more detailed explanation of why Google retains this data, please refer to their blog post, “Why Data Matters.”

Google anonymizes search logs

For some time, Google has been under significant pressure from privacy advocates to increase user anonymity. To their credit, Google made some initial changes to the search log data retention policy in March of last year. The changes amounted to anonymizing portions of the search log, but only after an individual log entry was 18-24 months old. Anonymizing, according to Google, means changing certain bits of the IP address, which makes it very unlikely that the address can be associated with a specific computer and thereby with the user. Google also changed the cookie in such a way as to make it impossible to identify the user.
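To make the idea of changing "certain bits" concrete, here is a minimal sketch in Python. Google has not said which bits it changes or how, so everything here is an assumption made for illustration: the function name anonymize_ipv4, the bits_to_clear parameter, and the choice to zero the lowest 8 bits (the final octet) of an IPv4 address.

```python
import ipaddress


def anonymize_ipv4(address: str, bits_to_clear: int = 8) -> str:
    """Zero the lowest `bits_to_clear` bits of an IPv4 address.

    Hypothetical illustration only: which bits Google actually alters
    is not stated in the source, so 8 is an assumed default.
    """
    if not 0 <= bits_to_clear <= 32:
        raise ValueError("bits_to_clear must be between 0 and 32")
    ip = ipaddress.IPv4Address(address)  # raises ValueError on bad input
    # Build a 32-bit mask whose low bits are 0 and high bits are 1.
    mask = (0xFFFFFFFF << bits_to_clear) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(int(ip) & mask))


# 192.0.2.57 is a reserved documentation address, not a real user.
print(anonymize_ipv4("192.0.2.57"))
print(anonymize_ipv4("192.0.2.57", bits_to_clear=16))
```

Under these assumptions, clearing 8 bits means that 256 different addresses collapse into the same logged value. That makes a single log entry harder to tie to one computer, but it narrows the field rather than hiding it completely.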

Google’s next step to increase privacy

On September 8, 2008, Google announced another change to the search log data retention policy as mentioned in their blog post, “Another Step to Protect User Privacy.” The change reduces the age when search logs become anonymized from 18-24 months down to nine months. This new change was again due to continued pressure from policymakers and privacy regulatory agencies in Europe and the United States.

It appears that Google can restore the search log

I’m not sure the new change is significant, since user search logs still remain intact for nine months. This need for specific user information really piqued my curiosity, so I tried to find out why Google needs information that detailed. In the process, I stumbled across Google’s “Log Retention Policy FAQ” (pdf). I still didn’t find any definitive answers, but I did find something of interest:

“Will governments be able to subpoena server log data after it is anonymized? Will anonymized data still be able to identify an individual user by cookie or IP address?

“Google does comply with valid legal process, such as search warrants, court orders, or subpoenas seeking personal information. Logs anonymization does not guarantee that the government will not be able to identify a specific computer or user, but it does add another layer of privacy protection to our users’ data.”

To me, that sounds like the anonymizing process is reversible. Whether that’s significant or not, I’m still undecided. I suspect there are governmental regulations pertaining to data retention and Google has to abide by them.

Final thoughts

I appreciate the fact that Google is paying attention to user privacy, but I really tip my hat to the EU and U.S. privacy advocates. They are the real heroes for bringing this kind of pressure on the search giant.