Peeking into the Optimization of Data Flow Programs with MapReduce-Style UDFs

Data flows are a popular abstraction for defining data-intensive processing tasks. To support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to the UDFs known from relational DBMSs, MapReduce-style UDFs have less strict templates. These templates alone do not provide all the information needed to decide whether a UDF can be reordered with relational operators and other UDFs. However, it is well known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude.

Provided by: Humboldt-Universität zu Berlin Topic: Data Management Date Added: Dec 2012 Format: PDF
