A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure
DBSCAN is a well-known density-based clustering algorithm capable of discovering arbitrarily shaped clusters and eliminating noise data. However, parallelizing DBSCAN is challenging because it exhibits an inherently sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy, which can easily cause an unbalanced workload and hence result in low parallel efficiency. The authors present a new parallel DBSCAN algorithm (PDSDBSCAN) based on graph algorithmic concepts. More specifically, they employ the disjoint-set data structure to break the access sequentiality of DBSCAN.
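
To illustrate how a disjoint-set (union-find) structure can remove the dependence on visiting order, the following is a minimal sequential sketch in Python. It is not the authors' PDSDBSCAN implementation: the names `DisjointSet` and `dsdbscan`, the brute-force neighborhood search, and the convention that a point counts as its own neighbor are all assumptions made for illustration. The idea shown is only that every point starts as its own tree, and core points merge trees with their neighbors through `union`, so points can be processed in any order.

```python
from math import dist  # Euclidean distance, Python 3.8+


class DisjointSet:
    """Union-find over the integers 0..n-1 (hypothetical helper)."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root

    def find(self, x):
        # Walk to the root, halving the path along the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Link the larger root index under the smaller one.
            self.parent[max(ra, rb)] = min(ra, rb)


def dsdbscan(points, eps, min_pts):
    """Return one label per point: a tree root index, or -1 for noise."""
    n = len(points)
    # Assumption: brute-force eps-neighborhoods that include the point itself.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    ds = DisjointSet(n)
    assigned = [False] * n  # True once a point belongs to some cluster
    for i in range(n):
        if not is_core[i]:
            continue
        assigned[i] = True
        for j in neighbors[i]:
            if is_core[j]:
                ds.union(i, j)       # core-core: merge the two trees
            elif not assigned[j]:
                assigned[j] = True   # border point joins exactly one cluster
                ds.union(i, j)

    return [ds.find(i) if assigned[i] else -1 for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dsdbscan(points, eps=1.5, min_pts=3)
```

In this sketch, the first three points end up in one tree, the next three in another, and the isolated last point is labeled as noise. Because the merging is expressed purely as `union` operations, the loop over points could in principle be split across workers that merge local trees afterwards; how that merging is actually organized in PDSDBSCAN is not shown here.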