
The Cost of Privacy: Destruction of Data-Mining Utility in Anonymized Data Publishing


Executive Summary

Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "quasi-identifier" attributes such as ZIP code and birthdate. This paper asks whether generalization and suppression of quasi-identifiers offer any benefits over trivial sanitization, which simply separates quasi-identifiers from sensitive attributes. Previous work showed that k-anonymous databases can be useful for data mining, but k-anonymization does not guarantee any privacy. By contrast, the authors measure the tradeoff between privacy (how much can the adversary learn from the sanitized records?) and utility, measured as the accuracy of data-mining algorithms executed on the same sanitized records.
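To make the two approaches compared in the summary concrete, the following is a minimal sketch, not the paper's implementation. The toy records, the field names (`zip`, `birth_year`, `diagnosis`), the generalization rules (three-digit ZIP prefix, birth decade), and the use of shuffling to break the link between the two tables are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy records: two quasi-identifiers and one sensitive attribute.
RECORDS = [
    {"zip": "78705", "birth_year": 1984, "diagnosis": "flu"},
    {"zip": "78712", "birth_year": 1987, "diagnosis": "asthma"},
    {"zip": "78751", "birth_year": 1991, "diagnosis": "flu"},
    {"zip": "78758", "birth_year": 1993, "diagnosis": "diabetes"},
]


def generalize(records):
    """Coarsen quasi-identifiers but keep each one linked to its sensitive value."""
    out = []
    for r in records:
        decade = r["birth_year"] // 10 * 10
        out.append({
            "zip": r["zip"][:3] + "**",      # keep only the 3-digit ZIP prefix
            "birth_year": f"{decade}s",      # replace the year with its decade
            "diagnosis": r["diagnosis"],     # sensitive attribute left intact
        })
    return out


def trivial_sanitize(records, seed=0):
    """Publish quasi-identifiers and sensitive values as two unlinked tables."""
    quasi = [{"zip": r["zip"], "birth_year": r["birth_year"]} for r in records]
    sensitive = [r["diagnosis"] for r in records]
    # Shuffle so that row order no longer ties a diagnosis to a person.
    random.Random(seed).shuffle(sensitive)
    return quasi, sensitive


if __name__ == "__main__":
    print(generalize(RECORDS))
    print(trivial_sanitize(RECORDS))
```

In this sketch the generalized table still pairs every (coarsened) quasi-identifier with a diagnosis, while the trivially sanitized output preserves only the overall distribution of diagnoses. The summary's question is whether the first form gives data-mining algorithms any real advantage over the second at a comparable level of privacy.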

  • Format: PDF
  • Size: 548.1 KB