Partitioning Real-Time ETL Workflows

Many organizations aim to move from traditional batch ETL to Real-Time ETL (RT-ETL). This move is motivated by the need to analyze and make decisions on data that is as fresh as possible. RT-ETL engines operate on the abstraction of a data flow executed on parallel architectures. To achieve high throughput and low response times, the data must be partitioned across the many nodes in the engine. In this paper, the authors consider the problem of partitioning real-time ETL flows and propose a high-level architecture for it.
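
To make the idea of partitioning data over nodes concrete, the following is a minimal sketch of key-based hash partitioning. It is an illustration only: the abstract does not state which partitioning scheme the authors use, and the names partition_for, partition_records, and the customer_id field are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index in [0, num_partitions).

    A cryptographic digest is used instead of the built-in hash() because
    Python randomizes string hashes per process, which would route the same
    key to different partitions on different nodes.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_records(records, key_field, num_partitions):
    """Group records into partitions by the hash of one field (hypothetical helper)."""
    buckets = defaultdict(list)
    for record in records:
        index = partition_for(str(record[key_field]), num_partitions)
        buckets[index].append(record)
    return dict(buckets)


# Illustrative input: a small stream of records keyed by customer_id.
records = [
    {"customer_id": "c1", "amount": 10},
    {"customer_id": "c2", "amount": 25},
    {"customer_id": "c1", "amount": 7},
    {"customer_id": "c3", "amount": 3},
]
partitions = partition_records(records, "customer_id", num_partitions=4)
```

Under this scheme, all records sharing a key land on the same partition, so per-key operations such as aggregations can run on a single node without coordination. A skewed key distribution would still overload some partitions, which is one reason partitioning a real flow is harder than this sketch suggests.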