According to Wikipedia, load balancing was designed to distribute workloads across multiple computing resources: to maximize throughput, minimize response time, and prevent the overloading of any single server.

In Facebook’s never-ending quest to improve data-center efficiency, the company looked hard at load balancing as a possible way to save electricity. Qiang Wu, an infrastructure software engineer at Facebook, wrote the blog post Making Facebook’s software infrastructure more energy efficient with Autoscale to highlight what he and other engineers at Facebook have learned.

Wu described Facebook’s current load-balancing policy as being based on a modified round-robin algorithm. Wu wrote, “This means every server receives roughly the same number of page requests (Facebook handles billions of page requests each day) and utilizes roughly the same amount of CPU.”

The problem with the round-robin approach

Wu stated that engineers determined their web servers consume power at roughly the following rates:

  • 60 watts when idling
  • 130 watts at low-level CPU usage
  • 150 watts at medium-level CPU usage

Given those numbers, Wu and the other engineers realized Facebook’s load-balancing system did not take advantage of off-peak hours (late at night), when the number of page requests drops off. Rather than running every server at a reduced load, it made more sense, and saved electricity, to load a subset of servers up to the medium-usage level (150 watts) and let the unused servers idle.
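A quick back-of-the-envelope calculation shows why consolidation wins. Only the three wattage figures below come from the article; the pool size and the number of active servers are illustrative assumptions.

```python
# Power figures reported in the article, per web server.
IDLE_W = 60      # watts when idling
LOW_W = 130     # watts at low-level CPU usage
MEDIUM_W = 150   # watts at medium-level CPU usage

SERVERS = 100    # illustrative pool size (not from the article)

# Round-robin at off-peak: every server stays lightly loaded.
round_robin_power = SERVERS * LOW_W

# Consolidation: assume the same off-peak work fits on 40 servers
# running at medium load, while the other 60 sit idle.
ACTIVE = 40
consolidated_power = ACTIVE * MEDIUM_W + (SERVERS - ACTIVE) * IDLE_W

print(round_robin_power)    # 13000 watts
print(consolidated_power)   # 9600 watts
```

Under these assumed numbers, idling the unused servers cuts the cluster's draw by roughly a quarter, which is in line with the off-peak savings reported later in the article.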

Introducing Autoscale

Knowing what to do and accomplishing it are two different things. Wu described the task ahead of them: “Though the idea sounds simple, it is a challenging task to implement effectively and robustly for a large-scale system.”

However, not to be denied, the engineers came up with an idea they call Autoscale.

Figure A

Autoscale consists of three parts (Figure A): a custom load balancer, the Autoscale controller, and the physical pool of servers. Put simply, Autoscale:

  • adjusts the size of the active server pool to current conditions, and
  • ensures that each active server is loaded to the medium-usage level (150 watts).

Wu then described how Autoscale accomplishes the scaling and load balancing:

  1. Autoscale collects usage information (CPU load and request queue) from all active servers.
  2. Autoscale decides the optimal number of servers to keep in the active pool.
  3. Autoscale informs Facebook load balancers as to the number of servers in the active pool.
  4. The load balancers spread the work amongst the active servers.

If that sounds simple to do, it’s not. As Wu alluded to earlier, a great deal of software heavy lifting goes into getting Autoscale to know when to do what. Wu described what “knowing what to do” entails:

We want to make an optimal decision that will adapt to the varying workload, including workload surges or drops due to unexpected events. On one hand, we want to maximize the energy-saving opportunity. On the other, we don’t want to over-concentrate the traffic in a way that could affect site performance.

Preliminary results are promising

Facebook data centers are already using Autoscale to control production web server loading. The Facebook graph in Figure B shows the total power used by a single web cluster during one 24-hour period. Wu wrote, “Autoscale led to a 27 percent power savings around midnight. The average power saving over a 24-hour cycle is about 10-15 percent for different web clusters.”

Figure B

With billions of page requests each day, Facebook presumably operates a great many web clusters, so the power reduction obtained through Autoscale should translate into significant savings.

Wu stated that Facebook engineers are still tweaking Autoscale’s algorithms and foresee further gains. Since Facebook is a leading member of the Open Compute Project, there is hope that Autoscale will be made available for others to use.