NEC Laboratories America
Systems in which several components interact to accomplish challenging tasks are ubiquitous; examples include large server clusters providing cloud computing, manufacturing plants, and automobiles. Relentless efforts to improve the capabilities of these systems inevitably increase their complexity, as more components are added or more dependencies are introduced between existing ones. To cope with this surge in the complexity of distributed systems, system operators collect continuous monitoring data from various sources, including hardware- and software-based sensors. A large amount of this data either takes the form of time series or is contained in logs, e.g., operator activity logs, system event logs, and error logs.