Scalable Discovery of Unique Column Combinations

Provided by: VLD Digital
Topic: Big Data
Format: PDF
The discovery of all unique (and non-unique) column combinations in a given dataset is at the core of any data profiling effort. The results are useful for a large number of areas of data management, such as anomaly detection, data integration, data modeling, duplicate detection, indexing, and query optimization. However, discovering all unique and non-unique column combinations is an NP-hard problem, which in principle requires verifying an exponential number of column combinations for uniqueness on all data values. Thus, achieving efficiency and scalability in this context is a tremendous challenge by itself.
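To make the cost concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm, only an illustration of the naive baseline the abstract alludes to: enumerate column combinations by increasing size and check each one against every row. The function names (`is_unique`, `minimal_uniques`) and the sample rows are hypothetical, chosen for illustration.

```python
from itertools import combinations


def is_unique(rows, cols):
    """Return True if no two rows agree on every column index in cols."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uniques(rows, num_cols):
    """Brute-force search for minimal unique column combinations.

    Candidates are visited smallest first, so any candidate that contains
    an already-found unique is unique but not minimal and can be skipped.
    In the worst case this still inspects 2**num_cols - 1 candidates.
    """
    found = []
    for size in range(1, num_cols + 1):
        for cols in combinations(range(num_cols), size):
            if any(set(u) <= set(cols) for u in found):
                continue
            if is_unique(rows, cols):
                found.append(cols)
    return found


# Hypothetical sample data: (first name, last name, age).
rows = [
    ("Ann", "Lee", 30),
    ("Ann", "Kim", 30),
    ("Bob", "Lee", 41),
]
result = minimal_uniques(rows, 3)
```

On this sample no single column is unique, so the search has to move on to pairs. Every uniqueness check scans all rows and the number of candidates doubles with each added column, which is why this approach does not scale.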
