Optimized Incremental ETL Jobs for Maintaining Data Warehouses

ETL jobs are used to integrate data from distributed and heterogeneous sources into a data warehouse. A well-known challenge in this context is developing incremental ETL jobs that efficiently maintain warehouse data when source data is updated. In this paper, the authors present a new transformation-based approach for automatically deriving incremental ETL jobs. To this end, they simplify the underlying update propagation process by computing so-called safe updates instead of the true ones. Additionally, they identify limitations of previously proposed incremental solutions and overcome them by employing Magic Sets, which leads to dramatic performance gains.
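The abstract does not spell out the derivation rules, so the following is only a minimal Python sketch of the general idea under stated assumptions: relations are in-memory sets of tuples, only insertions are propagated, the warehouse view is a single equi-join, and a "safe" update is taken to mean a superset of the true change that may still contain ineffective insertions (tuples already in the view). Applying such a safe update with set union is still correct, so the exact net change never has to be computed. The Magic Sets step is approximated here by a semi-join reduction that reads only base tuples whose join key occurs in a delta. All names (`join`, `magic_restrict`, `safe_insert_delta`) and the sample data are hypothetical and not taken from the paper.

```python
Row = tuple  # hypothetical row type: a plain tuple, join keys addressed by column index


def join(left, right, lk, rk):
    """Hash equi-join: concatenate every left/right pair whose key columns match."""
    index = {}
    for r in right:
        index.setdefault(r[rk], []).append(r)
    return {l + r for l in left for r in index.get(l[lk], [])}


def magic_restrict(rows, keys, k):
    """Magic-set-style semi-join reduction: keep only rows whose key is a relevant binding."""
    return [row for row in rows if row[k] in keys]


def safe_insert_delta(r_old, s_old, dr, ds, lk=0, rk=0):
    """Insertion-only delta for V = R join S:  dR join S_new  union  R_old join dS.

    The result is 'safe' (assumed meaning): it may include tuples already in V,
    because no effectiveness test against the old view is performed.
    """
    s_new = s_old | ds
    # Bindings ("magic sets") derived from the deltas limit which base tuples are read.
    magic_r = {t[lk] for t in dr}
    magic_s = {t[rk] for t in ds}
    part1 = join(dr, magic_restrict(s_new, magic_r, rk), lk, rk)
    part2 = join(magic_restrict(r_old, magic_s, lk), ds, lk, rk)
    return part1 | part2


# Usage: orders R(cust_id, order_id) joined with customers S(cust_id, name).
r_old = {(1, "o1"), (2, "o2")}
s_old = {(1, "Ann"), (2, "Bob")}
view = join(r_old, s_old, 0, 0)

dr = {(1, "o3"), (2, "o2")}  # (2, "o2") is already in R: an ineffective insertion
ds = {(3, "Cy")}             # new customer without orders yet

safe = safe_insert_delta(r_old, s_old, dr, ds)
true_delta = safe - view          # the exact change, only if it is ever needed
refreshed = view | safe           # applying the safe update directly is still correct
```

The design choice the sketch illustrates: skipping the comparison against the old view keeps the derived job simple, and restricting base-table reads to the keys present in the deltas avoids scanning data that cannot contribute to the change.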

Provided by: Association for Computing Machinery
Topic: Big Data
Date Added: Aug 2010
Format: PDF
