Provided by: Microsoft
System logs are an important tool in studying the conditions (e.g., environment mis-configurations, resource status, erroneous user input) that cause failures. However, production system logs are complex, verbose, and lack structural stability over time. These traits make them hard to use, and make solutions that rely on them susceptible to high maintenance costs. Additionally, logs record failures after they occur: by the time logs are investigated, users have already experienced the failures' consequences. The authors evaluate the effectiveness of this approach by replicating the behavior of a service used in production at Microsoft, and testing the ability to predict failures using a synthetic workload on a 650 million events production trace.