A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure

DBSCAN is a well-known density-based clustering algorithm capable of discovering arbitrarily shaped clusters and eliminating noise data. However, parallelizing DBSCAN is challenging because it has an inherently sequential data access order. Moreover, existing parallel implementations adopt a master-slave strategy, which can easily cause an unbalanced workload and hence low parallel efficiency. The authors present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, they employ the disjoint-set data structure to break the access sequentiality of DBSCAN.
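To illustrate why a disjoint-set (union-find) structure helps, here is a minimal sequential sketch, assuming Euclidean distance and a brute-force O(n^2) neighbor search. It is not the authors' PDSDBSCAN and does not reproduce their parallel algorithm. The names `DisjointSet` and `dbscan_union_find` are hypothetical. Instead of growing one cluster at a time by breadth-first expansion, it unions every pair of neighboring core points, so clusters emerge as the connected components of the union-find forest.

```python
import math


class DisjointSet:
    """Union-find with path halving and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shorter tree under the taller one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def dbscan_union_find(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise (hypothetical helper)."""
    n = len(points)
    # Brute-force eps-neighborhoods; each list includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    ds = DisjointSet(n)
    assigned = list(is_core)  # core points always belong to some cluster
    for i in range(n):
        if not is_core[i]:
            continue
        for j in neighbors[i]:
            if is_core[j]:
                ds.union(i, j)  # core-core merges: result is order-independent
            elif not assigned[j]:
                ds.union(i, j)  # border point joins the first core that reaches it
                assigned[j] = True
    # Map each tree root to a compact cluster id.
    labels = [-1] * n
    root_to_label = {}
    for i in range(n):
        if assigned[i]:
            root = ds.find(i)
            if root not in root_to_label:
                root_to_label[root] = len(root_to_label)
            labels[i] = root_to_label[root]
    return labels
```

For example, two tight groups of three points plus one isolated point:

```python
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan_union_find(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The design point is that the core-core `union` calls can be applied in any order and still yield the same connected components, so the work no longer depends on a single sequential traversal. Border points, as in classic DBSCAN, are still assigned to whichever cluster reaches them first.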

Resource Details

Provided by:
Institute of Electrical and Electronics Engineers
Topic:
Data Centers
Format:
PDF