How to tame cloud infrastructure sprawl with open source CloudQuery

Commentary: The cloud makes infrastructure sprawl easier and worse than ever. Here's an open source tool to help you keep it under control.


The cloud has been a fantastic boon to developers, but one of its dirty secrets is infrastructure sprawl. Developers, DevOps engineers and sysadmins struggle with this every day.

Today, developers must write a pile of custom scripts and code, and maintain it all over time, just to understand their infrastructure sprawl. Even if you use only AWS, you likely have multiple accounts and no programmatic way to inventory assets or automate queries across them.

For example, let's say you use AWS and Okta. How do you get visibility into how they work together, what new resources are being created, how they are connected and so on? Your team writes custom scripts and code. That's how.

A new open source startup called CloudQuery solves this problem across multiple clouds with a developer-friendly approach that relies on the popular SQL query language. CloudQuery just announced a $3.5 million seed round to help accelerate product and go-to-market development. Because the project is open source, companies have already started using CloudQuery to tame their infrastructure sprawl, including Bloomberg, CloudBees, Zendesk and Salesforce.


Open sourcing a fix for infrastructure sprawl

For Salesforce lead security engineer Kinnaird McQuade, CloudQuery's appeal comes down to getting more done in less time. "CloudQuery, which is relatively new to the ecosystem, is one that I'm particularly excited about," McQuade said. "It would have saved me a lot of time in previous cases. The amount of Python scripts that people had to write to substitute a tool like this is kind of ridiculous. So I think it's going to have a big future ahead."

CloudQuery solves the infrastructure visibility problem by extracting the configurations of all your cloud environments and accounts. It then transforms and normalizes that information into a relational database, giving developers visibility into every corner of their infrastructure and an automated way to perform orchestration with SQL instead of code. There is a wide array of potential use cases, but most early adopters appear to be using it for security and compliance.
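
The extract, normalize and query flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not CloudQuery's implementation: the snapshot format, the buckets table and its columns are all invented for the example, and an in-memory SQLite database stands in for whatever relational database you would actually use.

```python
import sqlite3

# Hypothetical configuration snapshots from two accounts. The field names
# are invented for this sketch and do not match any real cloud API.
RAW_SNAPSHOTS = [
    {"account": "prod", "Buckets": [
        {"Name": "logs", "PublicAccess": False},
        {"Name": "site-assets", "PublicAccess": True},
    ]},
    {"account": "dev", "Buckets": [
        {"Name": "scratch", "PublicAccess": True},
    ]},
]


def load_buckets(conn, snapshots):
    """Flatten the nested snapshots into one relational table."""
    conn.execute(
        "CREATE TABLE buckets (account TEXT, name TEXT, is_public INTEGER)"
    )
    rows = [
        (snap["account"], bucket["Name"], int(bucket["PublicAccess"]))
        for snap in snapshots
        for bucket in snap["Buckets"]
    ]
    conn.executemany("INSERT INTO buckets VALUES (?, ?, ?)", rows)


def public_buckets(conn):
    """A compliance-style check written as a single SQL query."""
    cursor = conn.execute(
        "SELECT account, name FROM buckets "
        "WHERE is_public = 1 ORDER BY account, name"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
load_buckets(conn, RAW_SNAPSHOTS)
print(public_buckets(conn))
```

Once the data sits in one table, the check across every account is a single SELECT rather than a script per account, which is the kind of security and compliance question early adopters are reportedly asking.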

CloudQuery took a page from HashiCorp's Terraform playbook, and it seems to be working. GitHub stars are an imperfect measure, but within months of its open source release, CloudQuery already had more than 1,000 of them. This is perhaps not surprising given the potential upside: The company claims that it can take as little as half an hour to add a cloud service provider.


"The enterprise cloud vendors provide tooling, but they only work in the context of that cloud," said Yevgeny Pats, founder and CEO of CloudQuery. "When you have multiple tools, developers have to bash them together with Python scripts. SQL has been around for 50 years, and it's something everyone knows through a join and select."
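
To make the "join and select" point concrete, here is a minimal sketch of the AWS and Okta question raised earlier, again using Python's built-in sqlite3 module. The table names, columns and sample rows are hypothetical and are not CloudQuery's actual schema; the point is only that once both providers' data lives in one database, a cross-provider check is an ordinary join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, already-normalized tables for two providers.
conn.executescript("""
    CREATE TABLE aws_iam_users_demo (account TEXT, user_name TEXT, email TEXT);
    CREATE TABLE okta_users_demo (email TEXT, status TEXT);

    INSERT INTO aws_iam_users_demo VALUES
        ('prod', 'alice', 'alice@example.com'),
        ('prod', 'bob',   'bob@example.com'),
        ('dev',  'carol', 'carol@example.com');

    INSERT INTO okta_users_demo VALUES
        ('alice@example.com', 'ACTIVE'),
        ('bob@example.com',   'DEPROVISIONED');
""")

# Find AWS users whose identity-provider account is missing or not active.
STALE_ACCESS = """
    SELECT a.account, a.user_name, COALESCE(o.status, 'MISSING') AS okta_status
    FROM aws_iam_users_demo AS a
    LEFT JOIN okta_users_demo AS o ON o.email = a.email
    WHERE o.status IS NULL OR o.status != 'ACTIVE'
    ORDER BY a.account, a.user_name
"""

stale = conn.execute(STALE_ACCESS).fetchall()
for account, user_name, okta_status in stale:
    print(account, user_name, okta_status)
```

The LEFT JOIN keeps AWS users that have no matching Okta row at all, so the query reports both deprovisioned and unknown identities.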

Not that SQL is a perfect approach. SQL's primary upside and downside are the same: It has been around for so long that it is well known but not necessarily modern. CloudQuery could eventually support GraphQL or an alternative, non-relational query language, but the company believes it's easier for developers to build when they have all their data in a relational database rather than the other way around.

And, finally, open source. For CloudQuery, open source isn't merely a marketing exercise, but rather a way to grow the community around the project. Open source not only encourages contributors to add integrations, but also lets users share best practices for managing infrastructure sprawl.

Intrigued? Give CloudQuery a try.

Disclosure: I work for MongoDB, but the views expressed herein are mine.
