No One (Cluster) Size Fits All: Automatic Cluster Sizing for Data-Intensive Analytics
Infrastructure-as-a-Service (IaaS) cloud platforms have brought two unprecedented changes to cluster provisioning practices. First, any (non-expert) user can provision a cluster of any size on the cloud within minutes to run her data-processing jobs. The user can terminate the cluster once her jobs complete, and she needs to pay only for the resources used and the duration of use. Second, cloud platforms enable users to bypass the traditional middleman, the system administrator, in the cluster-provisioning process. These changes give tremendous power to the user but also place a major burden on her shoulders.