As more and more enterprises embrace cloud-native applications and microservices, some argue that there is a need for a reimagined software stack. For cloud-native applications, there are new networking abstractions that engineers have to layer on (writing new logic) to achieve reliability between services.
Currently, a consensus is forming among distributed systems developers around a new networking layer for this stack, dubbed the "service mesh." It was pioneered by startups such as Buoyant with its Linkerd project, and closely followed by Google, IBM, and Lyft with Istio. In a conversation with Phil Calçado, who just joined Buoyant from DigitalOcean (and before that SoundCloud), I asked him how tackling scaling and reliability issues at SoundCloud and DigitalOcean led to the introduction of the service mesh, and how companies copying one another drives innovation.
Monkey see, monkey do
Riffing on Picasso, the late Apple CEO Steve Jobs once said, "We have always been shameless about stealing great ideas." He's not alone. Indeed, while the industry has spent years trying to convince itself that the only way to innovate was through copyrights and patents, open source has turned all that on its head. The best, most innovative software being released these days tends to be open source (TensorFlow, Apache Spark, etc.), and it results directly from developers learning from, and copying, one another.
According to Calçado, this is precisely how the service mesh was born.
While at SoundCloud, he and his team spent years building a sophisticated microservices/cloud-native platform that allowed engineering teams to move fast. When he moved to DigitalOcean, however, he had to start from scratch, effectively rebuilding what he'd had at SoundCloud.
"A lot of the internal tooling we had to build at DigitalOcean was a one-for-one copy of what we had built at SoundCloud, and a lot of what we did at SoundCloud was a copy of what Twitter, Netflix, and others had," Calçado said.
Open source wouldn't have helped much in this case, as SoundCloud's stack was built on Scala while DigitalOcean was exclusively a Go shop.
Even so, Calçado feels strongly that it's time for the industry to stop rewriting the underlying microservices platform company by company and developer by developer. He told me:
In software engineering, we are pretty good at pushing commodity software to the infrastructure; that's why we have operating systems and network stacks. After the industry has been through a few iterations on how to do microservices, I think we have a good enough understanding to start pushing down the stack, to the underlying platform. We need this code that each company writes over and over again to be as commonplace as the TCP/IP stack present in every operating system.
This next step in infrastructure is what a service mesh provides. It shouldn't be closed within any particular company.
Getting to a service mesh
Not all companies yet realize how much they need (or will soon need) a service mesh. For Calçado, however, the need was drilled into him while at SoundCloud.
According to Calçado, the move toward microservices was motivated by the need to move fast while still complying with the "crazy requirements of the music industry." Following this strategy, Calçado's SoundCloud team was finally able to execute on its product without being slowed down by the engineering and change-management bottlenecks around the monolith, something he has written about extensively.
And yet the "how" has largely gone unanswered.
"To take us there," he said, "we basically had to dedicate about 30% of our engineering time, sometimes much more, to build the tools and platforms we needed to make sure that adding more services wouldn't hit the overall productivity." This started with each team writing a little tool here and there; eventually, SoundCloud consolidated all these efforts into a core engineering team whose main task was building and maintaining these tools.
It was a "massive investment," he said, but it freed his engineering team to work on actual product features, the work that could make or break the company, instead of infrastructure. It was also a lonely effort: back then, neither the open source community nor vendors offered much in the way of container and microservice tooling and platforms the team could build on.
The reality, Calçado said, is that "behind the increased productivity and reduction in time-to-market there was a massive investment from the organization in making sure this new architecture would work." SoundCloud was an Alexa top 100 platform with 150 million users per month and could afford this investment, but every day other companies go down this path knowing little about the fixed costs of microservices.
Fortunately, this pioneering work on the service mesh is coming together. While Linkerd and Istio ostensibly compete, the two camps increasingly collaborate so that the two projects complement each other seamlessly. It's how innovation works in an open source world, and everyone is richer as a result.
Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.