Using Search Results to Microaggregate Query Logs Semantically
Query log anonymization has become an important challenge. A query log contains the search history of users, as well as the results they selected and the position of those results in the ranking. These data are used to provide personalized re-ranking of results and to support trend studies. However, query logs can disclose sensitive information about users. Hence, query logs must undergo an anonymization process that guarantees two things: that no sensitive information can be linked to an identity, and that analysis of the anonymized data produces results similar to those obtained from the original data, i.e. that data distortion is minimized. The latest anonymization approaches use microaggregation, a statistical disclosure control technique that provides privacy comparable to k-anonymity while attempting to minimize data distortion.