Provable De-Anonymization of Large Datasets With Sparse Dimensions

There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. The authors' goal is to analytically explain why such attacks work. Specifically, they analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. They prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks.

Provided by: Carnegie Mellon University
Topic: Big Data
Date Added: Feb 2012
Format: PDF
