RCFile: A Fast and Space-Efficient Data Placement Structure in MapReduce-Based Warehouse Systems

Provided by: Institute of Electrical and Electronics Engineers
Topic: Big Data
Format: PDF
MapReduce-based data warehouse systems play an important role in supporting big data analytics, helping typical Web service providers and social network sites (e.g., Facebook) quickly understand the dynamics of user behavior trends and needs. In such a system, the data placement structure is a critical factor that can fundamentally affect warehouse performance. Based on their observations and analysis of Facebook production systems, the authors have characterized four requirements for the data placement structure: fast data loading, fast query processing, highly efficient storage space utilization, and strong adaptivity to highly dynamic workload patterns.
