PARDICLE: Parallel Approximate Density-based Clustering
DBSCAN is a widely used isodensity-based clustering algorithm for particle data, well known for its ability to isolate arbitrarily shaped clusters and to filter out noise. The algorithm is super-linear (O(n log n)) and computationally expensive for large datasets. To address this cost, the authors propose a fast heuristic algorithm for DBSCAN based on density-based sampling, which performs as well as exact algorithms in clustering quality but is more than an order of magnitude faster. Their experiments on massive astrophysics and synthetic datasets (8.5 billion numbers) show that the approximate algorithm is up to 56x faster than exact algorithms with almost identical quality.
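
For readers unfamiliar with the baseline, the following is a minimal sketch of exact DBSCAN, not the authors' approximate heuristic, whose sampling scheme is not described here. It uses a brute-force O(n^2) neighbor search rather than the spatial index that gives the O(n log n) bound, and the names `dbscan`, `eps`, `min_pts`, and `NOISE` are illustrative assumptions rather than anything taken from the paper.

```python
from math import dist

NOISE = -1  # hypothetical label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Brute-force exact DBSCAN; returns one cluster label per point."""
    n = len(points)
    # eps-neighborhood of every point (a point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # only core points expand
        cluster += 1
    return labels


# Two tight square groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(sample, eps=1.5, min_pts=3))
```

On this toy input the two squares come out as separate clusters and the isolated point is labeled as noise. The full pairwise distance computation is the part an approximate method would aim to avoid on billions of points.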