Performance Optimization for Distributed Intra-Node-Parallel Streaming Systems
The performance of intra-node parallel dataflow programs in streaming systems depends mainly on two parameters: the degree of parallelism and the batch size of each node of the dataflow program. In state-of-the-art systems, the user has to specify these values manually, and manual tuning of both parameters is necessary to get good performance. However, this process is difficult and time-consuming, even for experts. In this paper, the authors introduce an optimization algorithm that tunes both parameters automatically. They define a novel cost model for intra-node parallel dataflow programs with user-defined functions.
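To make the tuning problem concrete, the following minimal sketch searches over the two parameters for a small pipeline. It is not the authors' cost model or algorithm. It assumes a hypothetical toy model in which each batch incurs a fixed overhead plus a per-tuple cost, and it uses a plain exhaustive search in place of the paper's optimizer. The names `Node`, `node_throughput`, and `tune`, and all numeric values, are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Node:
    """One dataflow node (hypothetical toy description, not from the paper)."""
    name: str
    per_tuple_cost: float      # assumed seconds of UDF work per tuple
    per_batch_overhead: float  # assumed fixed seconds spent per batch


def node_throughput(node, parallelism, batch_size):
    """Tuples per second of one node under the assumed toy model."""
    batch_time = node.per_batch_overhead + batch_size * node.per_tuple_cost
    return parallelism * batch_size / batch_time


def tune(nodes, total_cores, batch_sizes):
    """Exhaustively pick (parallelism, batch size) per node.

    Maximizes the throughput of the slowest node (the bottleneck) while
    keeping the summed degree of parallelism within total_cores.
    """
    best_config, best_rate = None, 0.0
    options = list(product(range(1, total_cores + 1), batch_sizes))
    for choice in product(options, repeat=len(nodes)):
        if sum(p for p, _ in choice) > total_cores:
            continue  # configuration needs more cores than are available
        rate = min(node_throughput(n, p, b) for n, (p, b) in zip(nodes, choice))
        if rate > best_rate:
            best_config = {n.name: pb for n, pb in zip(nodes, choice)}
            best_rate = rate
    return best_config, best_rate


if __name__ == "__main__":
    pipeline = [
        Node("parse", per_tuple_cost=1e-3, per_batch_overhead=1e-2),
        Node("join", per_tuple_cost=4e-3, per_batch_overhead=1e-2),
    ]
    config, rate = tune(pipeline, total_cores=5, batch_sizes=[1, 10, 100])
    print(config, round(rate, 2))
```

Even in this toy setting the search space grows exponentially with the number of nodes, which illustrates why manual tuning is hard. The sketch also ignores latency, so the largest batch size always wins; a realistic model would have to trade throughput against latency.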