Big Data

Macro-level Scheduling of ETL Workflows

Date Added: Aug 2011
Format: PDF

Extract-Transform-Load (ETL) workflows extract data from various sources, transform, cleanse, and homogenize these data, and populate a target data store (e.g., a data warehouse). Such processes typically must complete within strict time windows, so ETL workflow optimization is of significant interest. In this paper, the authors address the problem of scheduling the execution of ETL activities, with the goal of minimizing ETL execution time and allocated memory. Apart from a simple, fair scheduling policy, they also experiment with two other policies: the first aims to empty the largest input queue of the workflow, and the second activates the activity with the maximum tuple consumption rate.
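
The three policies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Activity` class, its fields, and the three `pick_*` functions are hypothetical names introduced here, the fair policy is assumed to be a round-robin over activities with pending input, and each activity's consumption rate is assumed to be known in advance.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    # Hypothetical model of one ETL activity (not taken from the paper).
    name: str
    queue_len: int           # tuples currently waiting in the input queue
    consumption_rate: float  # tuples consumed per time unit (assumed known)


def pick_round_robin(activities, last_index):
    """Fair policy (assumed round-robin): next activity after last_index
    that has pending input. Returns an index, or None if all queues are empty."""
    n = len(activities)
    for step in range(1, n + 1):
        i = (last_index + step) % n
        if activities[i].queue_len > 0:
            return i
    return None


def pick_largest_queue(activities):
    """Activate the activity whose input queue currently holds the most tuples."""
    ready = [i for i, a in enumerate(activities) if a.queue_len > 0]
    return max(ready, key=lambda i: activities[i].queue_len, default=None)


def pick_max_consumption(activities):
    """Activate the ready activity with the highest tuple consumption rate."""
    ready = [i for i, a in enumerate(activities) if a.queue_len > 0]
    return max(ready, key=lambda i: activities[i].consumption_rate, default=None)
```

Each function returns the index of the activity to run next, or None when no activity has input. In this sketch an activity with an empty queue is never selected, even if its consumption rate is the highest; whether the paper treats idle activities the same way is not stated in the abstract.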