DataSynth: Generating Synthetic Data using Declarative Constraints

A variety of scenarios, such as database system and application testing, data masking, and benchmarking, require synthetic database instances, often with complex data characteristics. The authors present DataSynth, a flexible tool for generating synthetic databases. DataSynth uses a simple yet powerful declarative abstraction based on cardinality constraints to specify data characteristics, and uses sophisticated algorithms to efficiently generate database instances that satisfy the specified characteristics. The paper showcases the various features of DataSynth using two real-world data generation scenarios.
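A cardinality constraint, in general terms, states that a query over the generated database must return a given number of rows. The short Python sketch below illustrates that idea on the simplest possible case: one integer column and range-selection constraints of the form |σ_{lo ≤ A < hi}(R)| = k. It is a hypothetical toy, not DataSynth's API or its generation algorithm. The names `CardinalityConstraint` and `generate` are made up for illustration, and the sketch only handles disjoint ranges, whereas real constraints can involve overlapping predicates, joins, and multiple tables.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CardinalityConstraint:
    """Hypothetical encoding of |sigma_{lo <= A < hi}(R)| = count."""
    lo: int
    hi: int
    count: int

    def holds(self, column):
        # The constraint holds if exactly `count` values fall in [lo, hi).
        return sum(self.lo <= v < self.hi for v in column) == self.count


def generate(constraints, table_size, domain=(0, 100), seed=0):
    """Naive generator for one integer column; assumes disjoint ranges."""
    rng = random.Random(seed)
    ranges = sorted((c.lo, c.hi) for c in constraints)
    for (_, hi1), (lo2, _) in zip(ranges, ranges[1:]):
        if lo2 < hi1:
            raise ValueError("sketch only handles disjoint ranges")

    # Place exactly `count` values inside each constrained range.
    column = []
    for c in constraints:
        column += [rng.randrange(c.lo, c.hi) for _ in range(c.count)]

    remaining = table_size - len(column)
    if remaining < 0:
        raise ValueError("constraints exceed table size")

    # Fill the rest with values outside every constrained range,
    # so the filler rows cannot change any constraint's count.
    free = [v for v in range(*domain)
            if not any(lo <= v < hi for lo, hi in ranges)]
    if remaining and not free:
        raise ValueError("no free values to fill remaining rows")
    column += [rng.choice(free) for _ in range(remaining)]
    rng.shuffle(column)
    return column
```

For example, asking for 100 rows in which 30 values lie in [0, 10) and 5 lie in [50, 60) yields a column that satisfies both constraints. Even this toy shows why declarative specification is attractive: the user states *what* counts must hold, and the generator decides *how* to produce rows that meet them.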

