Mining Large Distributed Log Data in Near Real Time
Analyzing huge amounts of log data is often difficult, especially when it must be done in real time (e.g., fraud detection) or when the analysis requires large amounts of stored data. Graphs are a data structure often used in log analysis; examples of graph-based analyses are clique analysis and communities of interest (COI). However, little attention has been paid to large distributed graphs that allow a high throughput of updates with very low latency. In this paper, the authors present a distributed graph mining system that is able to process around 39 million log entries per second on a 50-node cluster while providing processing latencies below 10 ms.
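
To make the graph-based view of log data concrete, the following is a minimal single-process sketch, not the system presented in the paper. It assumes each log entry reduces to a pair of interacting entities, keeps a weighted undirected graph that is updated once per entry, and takes a node's community of interest to be its k most frequent contacts. The names CoiGraph, update, community_of_interest, and partition_for are hypothetical, and the hash-based partitioning is only one plausible way to spread nodes over a cluster.

```python
import zlib
from collections import defaultdict


class CoiGraph:
    """Toy in-memory graph: edge weights count interactions between two entities."""

    def __init__(self):
        # adj[a][b] is the number of log entries that involve both a and b.
        self.adj = defaultdict(lambda: defaultdict(int))

    def update(self, src, dst):
        """Apply one log entry by incrementing the undirected edge weight."""
        self.adj[src][dst] += 1
        self.adj[dst][src] += 1

    def community_of_interest(self, node, k=2):
        """Return the k neighbors with the highest edge weight (ties broken by name)."""
        neighbors = self.adj.get(node, {})
        ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:k]]


def partition_for(node, num_partitions):
    """Assumed placement rule: map a node to a partition by a stable hash of its name."""
    return zlib.crc32(node.encode("utf-8")) % num_partitions


# Hypothetical log entries, each reduced to a (caller, callee) pair.
log = [("alice", "bob"), ("alice", "carol"), ("alice", "bob"), ("dave", "alice")]

graph = CoiGraph()
for src, dst in log:
    graph.update(src, dst)

top_contacts = graph.community_of_interest("alice", k=2)
print(top_contacts)
```

Each update touches only the two endpoints of an edge. Under the assumed hash partitioning, a single log entry would therefore involve at most two partitions, which is the kind of locality a high-throughput, low-latency design depends on.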