Optimal Multivariate 2-Microaggregation for Microdata Protection: A 2-Approximation
Microaggregation is a special clustering problem in which the goal is to partition a set of points into groups of at least k points each, such that the groups are as homogeneous as possible. Microaggregation arises in the anonymization of statistical databases for privacy protection (k-anonymity), where points correspond to database records. A common group homogeneity criterion is minimization of the within-groups sum of squares (SSE). For multivariate points, optimal microaggregation, i.e. with minimum SSE, has been shown to be NP-hard.
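To make the homogeneity criterion concrete, the following minimal sketch computes the SSE of a given partition and checks the minimum group size k. It assumes the usual definition of SSE as the sum of squared Euclidean distances from each point to its group centroid. The function names (`centroid`, `group_sse`, `microaggregation_sse`) and the sample points are hypothetical and chosen for illustration only. The sketch evaluates a partition and does not search for an optimal one.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(group: List[Point]) -> List[float]:
    """Component-wise mean of the points in a group."""
    n = len(group)
    dim = len(group[0])
    return [sum(p[d] for p in group) / n for d in range(dim)]


def group_sse(group: List[Point]) -> float:
    """Sum of squared Euclidean distances from each point to the group centroid."""
    c = centroid(group)
    return sum(sum((p[d] - c[d]) ** 2 for d in range(len(c))) for p in group)


def microaggregation_sse(groups: List[List[Point]], k: int) -> float:
    """Total within-groups SSE of a partition; rejects groups smaller than k."""
    for g in groups:
        if len(g) < k:
            raise ValueError(f"group of size {len(g)} violates minimum size k={k}")
    return sum(group_sse(g) for g in groups)


# Illustrative data (assumed): four points in the plane, grouped with k = 2.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
loose = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

sse_tight = microaggregation_sse(tight, k=2)  # 4.0
sse_loose = microaggregation_sse(loose, k=2)  # 100.0
```

Both partitions satisfy the size constraint for k = 2, but the first groups nearby points together and therefore has a much smaller SSE. The microaggregation problem asks for the feasible partition with the smallest such value.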