Myriad - Parallel Data Generation on Shared-Nothing Architectures

The need for efficient data generation for testing and benchmarking newly developed massively parallel data processing systems has increased with the emergence of Big Data problems. As synthetic data model specifications evolve over time, the data generator programs implementing these models have to be adapted continuously, a task that can become complex as the set of model constraints grows. In this paper, the authors present Myriad, a new parallel data generation toolkit. Data generators created with the toolkit can produce very large datasets by exploiting a completely parallel execution model, while at the same time maintaining cross-partition dependencies, correlations and distributions in the generated data.
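To make the idea of fully parallel generation with cross-partition dependencies concrete, here is a minimal sketch of one common way to achieve it. It is not taken from the paper and is not Myriad's API. It assumes every field value is a pure function of a global seed and the record's position, so any node can recompute a record owned by another node instead of communicating with it. All names (record_value, customer, order, generate_partition) and the table layout are hypothetical, and a hash function stands in for whatever random number scheme the toolkit actually uses.

```python
import hashlib


def record_value(seed: int, table: str, index: int, field: str, modulus: int) -> int:
    """Derive a field value from (seed, table, index, field) alone.

    No shared state is involved, so every node computes the same value
    for the same record position.
    """
    key = f"{seed}:{table}:{index}:{field}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def customer(seed: int, index: int) -> dict:
    # Hypothetical "customer" record with one generated attribute.
    return {"id": index, "nation": record_value(seed, "customer", index, "nation", 25)}


def order(seed: int, index: int, n_customers: int) -> dict:
    # Pick the referenced customer deterministically from the order's position.
    cust_id = record_value(seed, "order", index, "customer", n_customers)
    # Cross-partition dependency: recompute the customer's attribute locally
    # rather than looking it up on the node that generated that customer.
    nation = customer(seed, cust_id)["nation"]
    return {"id": index, "customer_id": cust_id, "ship_nation": nation}


def generate_partition(seed: int, node: int, n_nodes: int,
                       n_orders: int, n_customers: int) -> list:
    # Each node owns a contiguous index range and needs no communication.
    start = node * n_orders // n_nodes
    end = (node + 1) * n_orders // n_nodes
    return [order(seed, i, n_customers) for i in range(start, end)]


if __name__ == "__main__":
    for node in range(4):
        print(node, generate_partition(42, node, 4, 10, 100))
```

Because each record depends only on the seed and its index, concatenating the partitions in node order yields the same dataset as a single sequential run, regardless of how many nodes are used.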

Provided by: Technische Universität Berlin
Topic: Big Data
Date Added: Sep 2011
Format: PDF
