Binary Information Press
Simple random sampling, one of the most popular data reduction methods in large-scale data mining, often loses small clusters when applied to unevenly distributed datasets. A grid-based density biased sampling algorithm can avoid this problem, but both its efficiency and its effectiveness are limited by the grid granularity. To overcome this drawback, a density biased sampling algorithm based on variable grid division is proposed. Each dimension of the original dataset is divided according to its own distribution.
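The idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how each dimension is divided, so the sketch assumes quantile-based cut points per dimension as a stand-in for "variable grid division". It also assumes a commonly used density biased rule in which a point in a cell holding n points is kept with probability proportional to n^(-e). The function names and the parameters `bins_per_dim`, `e`, and `target_size` are hypothetical.

```python
import bisect
import random
import statistics
from collections import Counter


def variable_grid_cells(points, bins_per_dim=4):
    """Map each point to a grid cell with data-dependent edges.

    Assumption: each dimension is cut at its own quantiles, so dense
    regions get narrower intervals than sparse ones.
    """
    dims = len(points[0])
    edges = []
    for j in range(dims):
        column = [p[j] for p in points]
        # bins_per_dim - 1 cut points; duplicates are merged.
        cuts = sorted(set(statistics.quantiles(column, n=bins_per_dim)))
        edges.append(cuts)
    # A cell is identified by the tuple of interval indices per dimension.
    return [
        tuple(bisect.bisect_right(edges[j], p[j]) for j in range(dims))
        for p in points
    ]


def density_biased_sample(points, target_size, e=0.5, bins_per_dim=4, seed=0):
    """Return indices of sampled points.

    A point in a cell with n points is kept with probability
    min(1, alpha * n**(-e)), where alpha is chosen so that the expected
    sample size is about target_size. e = 0 gives uniform sampling;
    e = 1 gives every non-empty cell the same expected share.
    """
    rng = random.Random(seed)
    cells = variable_grid_cells(points, bins_per_dim)
    counts = Counter(cells)
    alpha = target_size / sum(n ** (1.0 - e) for n in counts.values())
    sample = []
    for i, cell in enumerate(cells):
        keep_prob = min(1.0, alpha * counts[cell] ** (-e))
        if rng.random() < keep_prob:
            sample.append(i)
    return sample
```

For example, `density_biased_sample(points, target_size=100, e=1.0)` returns roughly 100 indices drawn about evenly across the occupied cells, so points in sparse cells are kept at a higher rate than under simple random sampling. Note that pure quantile cuts tend to place a very small cluster in an outer interval shared with the tail of a larger one, so a real implementation would likely need a more careful division rule.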