Association for Computing Machinery
Testing the performance of database systems is commonly done with synthetic data and workload generators such as TPC-H and TPC-DS. Customer data and workloads are hard to obtain because they are sensitive and often prohibitively large. As a result, data management systems are often not properly tested before release, and performance bugs are frequently discovered only after deployment, when they are very costly to fix. In this paper, the authors propose RSGen, an approach to generating datasets from customer metadata, including schemas, integrity constraints, and statistics.
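The summary does not describe how RSGen works internally. To show roughly what "generating data from metadata" involves, here is a minimal Python sketch. It is not RSGen's method. It makes several assumptions: a hypothetical two-table schema (`customers` and `orders`), a primary key and a foreign key as the integrity constraints, and hypothetical min/max/distinct-count statistics for one numeric column (`amount`). The generator never reads any real rows. It produces values that satisfy the constraints and match the supplied statistics.

```python
import random
from dataclasses import dataclass


@dataclass
class ColumnStats:
    # Hypothetical per-column statistics, as a system catalog might report them.
    min_value: int
    max_value: int
    distinct_count: int


def generate_parent(row_count: int) -> list:
    # Primary key constraint: ids 1..row_count are unique and non-null.
    return [{"id": i} for i in range(1, row_count + 1)]


def generate_child(row_count: int, parent_ids: list,
                   stats: ColumnStats, rng: random.Random) -> list:
    if stats.distinct_count > stats.max_value - stats.min_value + 1:
        raise ValueError("distinct_count exceeds the size of [min_value, max_value]")
    # Fix a pool of distinct values inside [min_value, max_value].
    pool = rng.sample(range(stats.min_value, stats.max_value + 1),
                      stats.distinct_count)
    rows = []
    for i in range(row_count):
        # Emit each pool value once first, so the generated column reaches the
        # reported distinct count whenever row_count >= distinct_count.
        amount = pool[i] if i < len(pool) else rng.choice(pool)
        rows.append({
            "id": i + 1,
            # Foreign key constraint: each reference points at an existing parent row.
            "parent_id": rng.choice(parent_ids),
            "amount": amount,
        })
    return rows


def satisfies_constraints(parents: list, children: list) -> bool:
    # Check primary key uniqueness on the parent and referential integrity on the child.
    parent_ids = {p["id"] for p in parents}
    pk_unique = len(parent_ids) == len(parents)
    fk_valid = all(c["parent_id"] in parent_ids for c in children)
    return pk_unique and fk_valid


# Usage with hypothetical metadata for a customers/orders schema.
rng = random.Random(42)
customers = generate_parent(100)
orders = generate_child(
    1000,
    [c["id"] for c in customers],
    ColumnStats(min_value=1, max_value=500, distinct_count=50),
    rng,
)
print(satisfies_constraints(customers, orders))  # True
```

A realistic metadata-driven generator would also have to respect richer statistics, such as histograms or correlations between columns. This sketch deliberately leaves those out.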