Statistical Machine Learning Makes Automatic Control Practical for Internet Datacenters

Date Added: May 2009
Format: PDF

Horizontally scalable Internet services on clusters of commodity computers appear to be a great fit for automatic control: there is a target output (the service-level agreement), an observed output (actual latency), and a gain controller (adjusting the number of servers). Yet few datacenters are automated this way in practice, due in part to well-founded skepticism about whether the simple models often used in the research literature can capture complex real-life workload/performance relationships and keep up with changing conditions that might invalidate those models. The authors argue that these shortcomings can be fixed by importing modeling, control, and analysis techniques from statistics and machine learning.
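
The control loop named in the abstract (target latency, observed latency, server count as the actuator) can be illustrated with a minimal sketch. This is not the authors' method: it is a plain proportional controller of the simple kind the abstract says practitioners are skeptical of, and the function name, parameter names, gain, and server limits are all hypothetical values chosen for illustration.

```python
import math


def next_server_count(current_servers, observed_latency_ms, target_latency_ms,
                      gain=0.5, min_servers=1, max_servers=100):
    """One step of a proportional controller (all names and defaults are assumptions)."""
    # Relative error: positive when the service is slower than the SLA target.
    error = (observed_latency_ms - target_latency_ms) / target_latency_ms
    # Scale the fleet in proportion to the error; `gain` damps the reaction.
    desired = current_servers * (1.0 + gain * error)
    # Round up to err on the side of extra capacity, then clamp to the limits.
    return max(min_servers, min(max_servers, math.ceil(desired)))


if __name__ == "__main__":
    # Latency is 50% over target, so the controller asks for more servers.
    print(next_server_count(10, observed_latency_ms=150, target_latency_ms=100))
```

A fixed gain like this assumes latency responds to server count in a simple, stable way, which is exactly the assumption the abstract says can break down under real workloads.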