Provided by:
Aristotle University of Thessaloniki
Topic:
Big Data
Format:
PDF
Data-intensive analytic flows, such as populating a data warehouse or analyzing a click stream at runtime, are very common in modern business intelligence scenarios. Current state-of-the-art data flow management techniques rely on users to specify the flow structure and perform no automated optimization of that structure. In this paper, the authors introduce a declarative way to specify flows, based on annotated descriptions of the output schema of each flow activity. They show that their approach is adequate to capture both a wide range of arbitrary data transformations, which cannot be supported by traditional relational operators, and the precedence constraints between the various stages in the flow.
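To make the idea of schema-driven precedence constraints more concrete, the following is a minimal sketch and not the authors' actual notation or algorithm. It assumes each activity is annotated with the attributes it reads and the attributes it adds to its output schema. The activity names, the `reads`/`adds` keys, and the helper functions are all hypothetical. Under that assumption, an activity must follow whichever activity produces an attribute it reads, and any topological order of the resulting graph is a valid flow structure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical annotations: what each activity reads from its input
# and what it adds to its output schema.
ACTIVITIES = {
    "extract_clicks": {"reads": set(), "adds": {"user_id", "url", "ts"}},
    "parse_url": {"reads": {"url"}, "adds": {"domain"}},
    "sessionize": {"reads": {"user_id", "ts"}, "adds": {"session_id"}},
    "aggregate": {"reads": {"session_id", "domain"}, "adds": {"visits"}},
}


def derive_precedence(activities):
    """Map each activity to the set of activities that must run before it."""
    # Record which activity produces each attribute
    # (assumes every attribute has exactly one producer).
    producers = {}
    for name, spec in activities.items():
        for attr in spec["adds"]:
            producers[attr] = name

    # An activity depends on the producer of every attribute it reads.
    deps = {name: set() for name in activities}
    for name, spec in activities.items():
        for attr in spec["reads"]:
            deps[name].add(producers[attr])
    return deps


def valid_order(activities):
    """Return one execution order that respects the derived constraints."""
    return list(TopologicalSorter(derive_precedence(activities)).static_order())


if __name__ == "__main__":
    print(valid_order(ACTIVITIES))
```

In this sketch the user never states the order of `parse_url` and `sessionize`. Only their dependence on `extract_clicks` is derived, so an optimizer would be free to choose either order between the two.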