V-MDAV: A Multivariate Microaggregation With Variable Group Size
Microaggregation is a clustering problem with a minimum size constraint on the resulting clusters or groups; the number of groups is unconstrained, and within-group homogeneity should be maximized. In the context of privacy in statistical databases, microaggregation is a well-known approach to obtaining anonymized versions of confidential microdata. Optimally solving microaggregation on multivariate data sets is known to be NP-hard, so heuristic methods are used in practice. This paper presents a new heuristic approach to multivariate microaggregation that produces variable-sized groups (and thus higher within-group homogeneity) at a computational cost similar to that of fixed-size microaggregation heuristics.
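To make the objective concrete, the following is a minimal sketch of how a candidate partition can be evaluated. It is not the V-MDAV heuristic itself. It assumes that within-group homogeneity is measured by the within-group sum of squared errors (SSE) around each group centroid, a common choice that the abstract does not specify. The function names, the toy data, and both partitions are hypothetical and chosen by hand for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]
Group = List[Point]


def centroid(group: Group) -> List[float]:
    """Component-wise mean of the records in one group."""
    n = len(group)
    dim = len(group[0])
    return [sum(p[d] for p in group) / n for d in range(dim)]


def within_group_sse(groups: List[Group]) -> float:
    """Sum of squared distances from each record to its group centroid.

    Assumption: lower SSE is taken to mean higher within-group homogeneity.
    """
    total = 0.0
    for group in groups:
        c = centroid(group)
        for p in group:
            total += sum((x - ci) ** 2 for x, ci in zip(p, c))
    return total


def satisfies_min_size(groups: List[Group], k: int) -> bool:
    """Check the minimum group size constraint: every group has >= k records."""
    return all(len(group) >= k for group in groups)


def microaggregate(groups: List[Group]) -> List[Group]:
    """Replace every record by the centroid of its group."""
    return [[centroid(group) for _ in group] for group in groups]


# Hypothetical toy data: two natural clusters of 4 and 5 records, with k = 3.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5)]
k = 3

# Hand-made partition with variable group sizes (4 and 5).
variable_groups = [cluster_a, cluster_b]

# Hand-made partition with a fixed group size of k = 3; one group must mix
# records from both clusters.
fixed_groups = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(1.0, 1.0), (10.0, 10.0), (10.0, 11.0)],
    [(11.0, 10.0), (11.0, 11.0), (10.5, 10.5)],
]

sse_variable = within_group_sse(variable_groups)
sse_fixed = within_group_sse(fixed_groups)
anonymized = microaggregate(variable_groups)
```

Both partitions satisfy the size constraint, but on this toy data the variable-sized one follows the natural clusters and yields a lower SSE, which is the effect the abstract attributes to variable group sizes.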