Big Data

Optimized Incremental ETL Jobs for Maintaining Data Warehouses

Date Added: Aug 2010
Format: PDF

ETL jobs are used to integrate data from distributed and heterogeneous sources into a data warehouse. A well-known challenge in this context is the development of incremental ETL jobs that efficiently maintain warehouse data in the presence of source data updates. In this paper, the authors present a new transformation-based approach for automatically deriving incremental ETL jobs. To this end, they consider a simplification of the underlying update propagation process that computes so-called safe updates instead of true ones. Additionally, they identify limitations of previously proposed incremental solutions and address them by employing Magic Sets, which leads to dramatic performance gains.
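
For readers unfamiliar with incremental maintenance, the general idea can be sketched as follows. This is a generic, insert-only illustration of propagating source deltas into a joined warehouse table. It is not the authors' transformation-based derivation, their safe-update computation, or a Magic Sets rewriting. The table layouts, the names `join` and `incremental_join`, and the sample rows are all assumptions made for the example.

```python
# Hypothetical source tables, modeled as sets of tuples:
#   orders:    (order_id, cust_id, amount)
#   customers: (cust_id, region), with cust_id assumed unique

def join(orders, customers):
    """Full (non-incremental) join of orders with customers on cust_id."""
    region_of = dict(customers)
    return {
        (oid, cid, amt, region_of[cid])
        for (oid, cid, amt) in orders
        if cid in region_of
    }

def incremental_join(view, orders, customers, new_orders, new_customers):
    """Refresh the joined view using only the inserted rows (deltas).

    Rows contributed by the deltas are:
      new_orders joined with (customers plus new_customers), and
      old orders joined with new_customers.
    Deletions and updates are deliberately out of scope in this sketch.
    """
    all_customers = customers | new_customers
    delta = join(new_orders, all_customers) | join(orders, new_customers)
    return view | delta

# Hypothetical sample data.
orders = {(1, "a", 10.0), (2, "b", 5.0)}
customers = {("a", "EU")}
view = join(orders, customers)

new_orders = {(3, "a", 7.5), (4, "c", 1.0)}
new_customers = {("b", "US")}

refreshed = incremental_join(view, orders, customers, new_orders, new_customers)
recomputed = join(orders | new_orders, customers | new_customers)
```

In this sketch, `refreshed` equals `recomputed`, but the incremental version only joins the newly inserted rows against the sources rather than rebuilding the whole view from scratch.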