In-Situ MapReduce for Log Processing
Log analytics are a bedrock component of running many of today's Internet sites. Application and click logs form the basis for tracking and analyzing customer behaviors and preferences, and they form the basic inputs to ad-targeting algorithms. Logs are also critical for performance and security monitoring, debugging, and optimizing the large compute infrastructures that make up the compute "Cloud", thousands of machines spanning multiple data centers. With current log generation rates on the order of 1 - 10 MB/s per machine, a single data center can create tens of TBs of log data a day. While bulk data processing has proven to be an essential tool for log processing, current practice transfers all logs to a centralized compute cluster.