Invisible Loading: Access-Driven Data Transfer from Raw Files into Database Systems

Commercial analytical database systems suffer from a high "time-to-first-analysis": before data can be processed, it must be modeled and schematized, transferred into the database's storage layer, and optionally clustered and indexed. For many types of structured data, this upfront effort is unjustifiable, so the data is instead processed directly over the file system using the Hadoop framework, even though processing it in an analytical database system would yield cumulative performance benefits. In this paper, the authors describe a system that achieves the immediate gratification of running MapReduce jobs directly over a file system, while still making progress towards the long-term performance benefits of database systems.

Resource Details

Provided by: Association for Computing Machinery
Topic: Data Management
Format: PDF