How to trim your cloud infrastructure costs

Moving away from on-demand pricing and analyzing the ROI of compute-intensive workloads are two ways to reduce cloud spend.

As the coronavirus shutdown stretches on, more and more companies are planning layoffs and budget cuts in response. In PwC's fourth COVID-19 Pulse Survey, CFOs had no good news:

  • 80% expect that COVID-19 will decrease revenue and/or profits this year
  • 86% are considering cost containment measures
  • 53% are projecting losses to be greater than 10% this year 
  • 32% anticipate layoffs in the next six months

One way IT teams can cut costs is by reviewing cloud infrastructure spending: considering new pricing models, optimizing workloads, and looking for ways to add more work to existing infrastructure.

Ashish Thusoo, cofounder and CEO at Qubole, describes the elastic nature of cloud infrastructure as the technology's biggest strength and its biggest weakness.

"This agility also means if you are not careful, you can spend a lot on infracture that you don't need or infrastructure which may not be the right infrastructure for the workload," he said.

Thusoo also said that Qubole clients who initially prioritized computing power and turnkey solutions are now focused on how they can get more work out of existing infrastructure. Qubole is a data lake platform for machine learning, streaming, and ad-hoc analytics.

If your department is looking at budget cuts, here are some ideas for cutting your cloud spend.

Cost-savings infrastructure changes 

Jean Atelsek, an analyst on 451 Research's Cloud Transformation team and Digital Economics unit, said that some of the most common cloud infrastructure mistakes are: 

  • Failure to delete unused block storage volumes that are left behind when the instances they were attached to are terminated
  • Infrastructure and storage sprawl due to insufficient tagging of instances and volumes
  • Unnecessary data egress charges because of multi-AZ redundancy for workloads that don't require this resiliency 
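
The first two mistakes in the list above lend themselves to a simple audit. The sketch below is illustrative only: it runs on hand-written inventory records rather than a real provider API, and the field names (`id`, `attached_to`, `tags`) and the required tag keys are assumptions, not any vendor's schema.

```python
from typing import Dict, List

# Hypothetical tag keys a team might require on every volume.
REQUIRED_TAGS = {"owner", "project"}


def audit_volumes(volumes: List[Dict]) -> Dict[str, List[str]]:
    """Flag volumes that are unattached or missing required tags."""
    unattached = []
    untagged = []
    for vol in volumes:
        # A volume with no attached instance is a candidate for deletion.
        if vol.get("attached_to") is None:
            unattached.append(vol["id"])
        # Missing tags make it hard to tell who owns the spend.
        if not REQUIRED_TAGS.issubset(vol.get("tags", {}).keys()):
            untagged.append(vol["id"])
    return {"unattached": unattached, "untagged": untagged}


# Made-up inventory records for illustration.
inventory = [
    {"id": "vol-a", "attached_to": "i-1", "tags": {"owner": "data", "project": "etl"}},
    {"id": "vol-b", "attached_to": None, "tags": {"owner": "data"}},
    {"id": "vol-c", "attached_to": None, "tags": {}},
]
report = audit_volumes(inventory)
print(report)
```

In practice the inventory would come from the cloud provider's own listing tools, and flagged volumes would be reviewed by a person before anything is deleted.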

Atelsek also suggested using scheduling tools to turn instances off during periods when they're not likely to be used.

"Defaulting to turning resources off during non-working hours can save plenty of dough, especially if you're running expensive resources such as those using accelerators for compute-intensive jobs," she said.

Another change to consider is moving to cloud-native architectures such as containerized workloads, which generally enable more efficient use of resources.

"Also, serverless architectures make it possible to run tasks only in response to events, so savings can be considerable versus having an always-on instance waiting for work to come in," Atelsek said.

Thusoo recommended looking for ways to add new workloads to existing cloud infrastructure.

"You pay for the machine for the whole time but you might be using it at only half capacity, so you're wasting money," he said.

Atelsek said auto-scaling is another way to increase utilization and better match spending to demand.

Finally, consider storage lifecycle management for infrequently accessed volumes. Many cloud providers offer tools that will automatically move data to cheaper storage tiers if it hasn't been accessed for a user-specified period of time, Atelsek said.
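
The rule behind such lifecycle tools can be sketched in a few lines. The tier names and the 30-day and 90-day thresholds below are hypothetical; real providers define their own tiers, minimum storage durations, and retrieval fees.

```python
from datetime import date


def choose_tier(
    last_accessed: date,
    today: date,
    cold_after_days: int = 30,      # assumed threshold
    archive_after_days: int = 90,   # assumed threshold
) -> str:
    """Pick a storage tier from the number of days since the last access."""
    idle_days = (today - last_accessed).days
    if idle_days >= archive_after_days:
        return "archive"
    if idle_days >= cold_after_days:
        return "infrequent"
    return "standard"


today = date(2020, 6, 1)
print(choose_tier(date(2020, 5, 25), today))  # accessed a week ago
print(choose_tier(date(2020, 4, 15), today))  # idle for 47 days
print(choose_tier(date(2020, 1, 1), today))   # idle for 152 days
```

The user-specified period Atelsek mentions corresponds to the two threshold parameters here.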

Consider a different pricing model

Atelsek recommends using a combination of pricing models for the greatest efficiency: Reserved instances for 24/7 applications and predictable usage; spot instances for bursty or batch workloads; and on-demand for unpredictable needs. 
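
As a rough illustration, that rule of thumb could be written as a small decision function. The parameter names are assumptions made for this sketch, and a real decision would also weigh commitment terms and the fact that spot capacity can be reclaimed by the provider.

```python
def pick_pricing_model(runs_24x7: bool, predictable: bool, bursty_or_batch: bool) -> str:
    """Map a workload profile to the pricing model suggested in the text."""
    # Steady, always-on, or predictable usage: commit in exchange for a discount.
    if runs_24x7 or predictable:
        return "reserved"
    # Bursty or batch work: use cheaper spot capacity.
    if bursty_or_batch:
        return "spot"
    # Everything else: pay the on-demand rate for flexibility.
    return "on-demand"


# Hypothetical workloads for illustration.
workloads = {
    "customer-facing API": pick_pricing_model(True, True, False),
    "nightly batch job": pick_pricing_model(False, False, True),
    "ad-hoc analysis": pick_pricing_model(False, False, False),
}
for name, model in workloads.items():
    print(f"{name}: {model}")
```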

Atelsek also suggested looking for third-party cost optimization tools in cloud provider marketplaces. These services will log usage and spending across clouds and suggest savings based on usage in a customer's account. 

"Some of these tools only analyze spending retrospectively, so you get a forensic view that may be too late to prevent wasted spending on, say, zombie EBS volumes for instances that have been terminated," she said.

Thusoo suggests moving as much infrastructure as possible from on-demand pricing to spot pricing.

"Platforms that can use spot instances to do things become very important because when you are running thousands of machines, it can save a lot of costs on infrastructure," Thusoo said.

Another way to control costs is to analyze the ROI of a particular workload to determine whether the cost of the infrastructure is covered by business results, such as increased user engagement or revenue. Qubole's platform includes a Cost Explorer that does this math. Thusoo used the example of workloads designed to identify product or content recommendations to increase user engagement.

"For that data pipeline, they will know that user engagement went up by 1%, which can be converted into this much revenue," he said. "If you know the workload uses so many machines for this much time and you've paid this much for it, you've got the ROI."
He said that with always-on, long-running workloads, cloud engineers have to optimize constantly to control costs.
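
The ROI arithmetic Thusoo describes can be sketched as below. This is not how Qubole's Cost Explorer works internally; the function and every number in the example are assumptions chosen only to show the calculation.

```python
def workload_cost(machines: int, hours: float, hourly_rate: float) -> float:
    """Infrastructure cost: machines x time x price per machine-hour."""
    return machines * hours * hourly_rate


def workload_roi(machines: int, hours: float, hourly_rate: float, attributed_revenue: float) -> float:
    """Return on investment as (revenue - cost) / cost."""
    cost = workload_cost(machines, hours, hourly_rate)
    if cost <= 0:
        raise ValueError("cost must be positive to compute ROI")
    return (attributed_revenue - cost) / cost


# Assumed example: a recommendation pipeline on 20 machines for 100 hours
# at $0.50 per machine-hour, credited with $1,500 of added revenue.
cost = workload_cost(20, 100, 0.50)
roi = workload_roi(20, 100, 0.50, 1500.0)
print(f"cost ${cost:.2f}, ROI {roi:.0%}")
```

With those made-up inputs the workload costs $1,000 and returns 50% on that spend. The hard part in practice is the revenue attribution, which the sketch simply takes as given.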

Image: Cloud computing technology futuristic design, double exposure with Hong Kong city. Credit: cofotoisme, Getty Images/iStockphoto