An Improved Density Biased Sampling Algorithm for Clustering Large-scale Datasets

Provided by: Binary Information Press
Topic: Big Data
Format: PDF
Simple random sampling is one of the most popular data reduction methods in large-scale data mining, but it often loses small clusters when applied to unevenly distributed datasets. Grid-based density biased sampling avoids this problem; however, both its efficiency and its effectiveness are limited by the grid granularity. To overcome these drawbacks, a density biased sampling algorithm based on variable grid division is proposed, in which each dimension of the original dataset is divided according to that dimension's own data distribution.
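The abstract does not give the exact division rule or sampling formula. The following is only a minimal sketch of the general idea under two labeled assumptions. First, "divided according to the corresponding distribution" is read here as equal-frequency (quantile) cut points per dimension. Second, the density bias is modeled by giving each grid cell with n_i points a sampling weight of n_i**a for an exponent a < 1, so dense cells are under-represented and sparse cells over-represented relative to their size. The names `quantile_edges`, `cell_of`, and `density_biased_sample` and the parameter `a` are hypothetical and are not the authors' method.

```python
import bisect
import random
from collections import defaultdict


def quantile_edges(values, bins):
    """Hypothetical variable-width division: interior cut points at equal-frequency quantiles."""
    s = sorted(values)
    n = len(s)
    return [s[(k * n) // bins] for k in range(1, bins)]


def cell_of(point, edges):
    """Map a point to its grid cell; bisect_right yields a bin index in [0, len(edges_d)]."""
    return tuple(bisect.bisect_right(e, x) for x, e in zip(point, edges))


def density_biased_sample(points, bins=4, sample_size=100, a=0.5, seed=0):
    """Sketch of density biased sampling over a variable (per-dimension quantile) grid.

    Assumption: a cell holding n_i points gets weight n_i ** a. With a < 1,
    dense cells contribute proportionally fewer points than uniform sampling
    would give them, and every non-empty cell keeps at least one point, so
    sparse regions (small clusters) are not silently dropped.
    """
    rng = random.Random(seed)
    dims = len(points[0])
    # Divide each dimension separately, following that dimension's distribution.
    edges = [quantile_edges([p[d] for p in points], bins) for d in range(dims)]

    # Group points by grid cell.
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, edges)].append(p)

    weights = {c: len(m) ** a for c, m in cells.items()}
    total = sum(weights.values())

    sample = []
    for c, members in cells.items():
        # Target share of the sample for this cell, at least 1, at most the cell size.
        k = min(len(members), max(1, round(sample_size * weights[c] / total)))
        sample.extend(rng.sample(members, k))
    return sample
```

Because of rounding and the one-point minimum per cell, the returned sample only approximates `sample_size`. Setting `a = 1` makes each cell's share roughly proportional to its size (close to stratified uniform sampling), while `a = 0` takes about the same number of points from every occupied cell.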
