Compression, Clustering and Pattern Discovery in Very High Dimensional Discrete-Attribute Datasets

Executive Summary

This paper presents an efficient framework for error-bounded compression of high-dimensional discrete-attribute datasets. Such datasets arise in a wide variety of applications and pose some of the most significant challenges in data analysis. Sub-sampling and compression are the two key techniques for analyzing them. The proposed framework, PROXIMUS, reduces a large dataset to a much smaller set of representative patterns, to which traditional (expensive) analysis algorithms can then be applied with minimal loss of accuracy.
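To make the idea of error-bounded compression into representative patterns concrete, here is a minimal sketch in Python. It is not the PROXIMUS algorithm, which the summary does not describe; it swaps in a simple greedy "leader" clustering and assumes binary attributes, Hamming distance as the error measure, and a user-chosen bound. The names `hamming`, `compress`, and `max_error` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Row = Tuple[int, ...]


def hamming(a: Row, b: Row) -> int:
    """Number of attribute positions where two rows differ."""
    return sum(x != y for x, y in zip(a, b))


def compress(rows: List[Row], max_error: int) -> Tuple[List[Row], List[int]]:
    """Greedy leader clustering (an illustrative stand-in, NOT PROXIMUS).

    Each row is assigned to the first existing pattern within `max_error`
    Hamming distance; if none qualifies, the row itself becomes a new
    pattern. Every row is therefore within `max_error` of its pattern.
    """
    patterns: List[Row] = []
    assignment: List[int] = []
    for row in rows:
        for idx, pattern in enumerate(patterns):
            if hamming(row, pattern) <= max_error:
                assignment.append(idx)
                break
        else:
            # No pattern is close enough: promote this row to a pattern.
            patterns.append(row)
            assignment.append(len(patterns) - 1)
    return patterns, assignment


# Toy dataset: five rows over five binary attributes (made-up values).
rows = [
    (1, 1, 0, 0, 1),
    (1, 1, 0, 0, 0),
    (0, 0, 1, 1, 0),
    (0, 0, 1, 1, 1),
    (1, 1, 0, 1, 1),
]
patterns, assignment = compress(rows, max_error=1)
```

With `max_error=1`, the five toy rows collapse to two patterns, and a downstream analysis could run on the patterns (weighted by how many rows each one represents) instead of on the full dataset. The greedy pass depends on row order and makes no optimality claim; it only guarantees the per-row error bound.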

  • Format: PDF
  • Size: 1651.8 KB