Association for Computing Machinery
Distribution data arise naturally in many domains, such as meteorology, biology, geology, industry, and economics. However, relatively little attention has been paid to data mining for large sets of distributions. Given n distributions over multiple categories and a query distribution Q, the authors want to find the clouds (i.e., distributions) most similar to Q, and to discover patterns, rules, and outlier clouds. To address this problem they present D-Search, an efficient algorithm for similarity search in large distribution datasets.
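
To make the problem statement concrete, the following is a minimal sketch of a naive linear-scan baseline for this kind of query. It is not D-Search and says nothing about how D-Search works. The abstract does not name a distance measure, so the symmetric Kullback-Leibler divergence used here is an assumption, as are the function names `symmetric_kl` and `top_k_similar`, the smoothing constant `eps`, and the representation of each distribution as a list of category probabilities.

```python
import math


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence, KL(p||q) + KL(q||p), between two discrete
    distributions over the same categories. The choice of measure is an
    assumption made for illustration only."""
    total = 0.0
    for pi, qi in zip(p, q):
        # Clamp probabilities away from zero so the logarithm stays finite.
        pi = max(pi, eps)
        qi = max(qi, eps)
        # KL(p||q) + KL(q||p) simplifies to sum((p_i - q_i) * log(p_i / q_i)).
        total += (pi - qi) * math.log(pi / qi)
    return total


def top_k_similar(distributions, query, k=1):
    """Return the indices of the k distributions closest to `query`,
    found by scanning all n distributions (hypothetical baseline)."""
    scored = [(symmetric_kl(d, query), i) for i, d in enumerate(distributions)]
    scored.sort()  # ascending divergence, ties broken by index
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    # Three toy distributions over three categories, plus a query Q.
    clouds = [
        [0.7, 0.2, 0.1],
        [0.1, 0.2, 0.7],
        [0.6, 0.3, 0.1],
    ]
    q = [0.65, 0.25, 0.1]
    print(top_k_similar(clouds, q, k=2))
```

This scan compares Q against every one of the n distributions, so its cost grows linearly with the size of the dataset; that cost is what motivates looking for a more efficient search algorithm on large distribution sets.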