RCFile: A Fast and Space-Efficient Data Placement Structure in MapReduce-Based Warehouse Systems

MapReduce-based data warehouse systems play an important role in supporting big data analytics, helping typical Web service providers and social network sites (e.g., Facebook) quickly understand the dynamics of user behavior trends and user needs. In such a system, the data placement structure is a critical factor that can affect warehouse performance in a fundamental way. Based on their observations and analysis of Facebook production systems, the authors have characterized four requirements for the data placement structure: fast data loading, fast query processing, highly efficient storage space utilization, and strong adaptivity to highly dynamic workload patterns.

Provided by: Institute of Electrical & Electronic Engineers
Topic: Big Data
Date Added: May 2011
Format: PDF
