Association for Computing Machinery
In this paper the authors propose FADD and G-FADD, scalable algorithms that detect outliers from multi-dimensional points despite large duplicates. Their main contributions are: no degeneracy. They re-design the standard outlier detection algorithm to overcome the problem of duplicate points in large, real world data; running time. Their algorithms enjoy near-linear runtime compared to the near-quadratic running time of the existing algorithm; discovery. They analyze large, real world data, and find interesting outliers. They propose several ways to eliminate the problem; they report wall-clock times and their time savings; and they show that their methods give either exact results, or highly accurate approximate ones.