
Mesosphere tackles container orchestration in a big way

In this interview, Mesosphere cofounder and CTO Tobias Knaup talks about mastering the challenges of operating containers at scale.


It seems there might be something to this container revolution after all.

First, Docker starts breaking loose from its test-and-dev roots. Next, Mesosphere raises another $73 million to continue its push to manage containers at massive scale. If there was any doubt that mainstream enterprises are interested, two of the investors in Mesosphere's Series C round were Hewlett Packard Enterprise (HPE) and Microsoft.

Neither company is interested in test-and-dev science projects. Both get paid to tackle thorny enterprise problems.

I recently sat down with Mesosphere cofounder and CTO Tobias Knaup to learn more about Mesosphere's vision for container orchestration and beyond. Knaup designed the original data infrastructure at Airbnb on Apache Mesos with Mesosphere cofounder and CEO Florian Leibert. Knaup thinks we're at an industry pivot point in the cloud where infrastructures are shifting from managing virtual machines to mastering the challenges of operating containers at scale.

TechRepublic: How is managing containers different from managing virtual machines?

Knaup: Containers are conceptually similar to virtual machines, but in practice are actually quite different and can provide much more functionality. They're different in many ways that developers and operators prefer.

For example, containers start much faster — often in milliseconds as compared with tens of seconds — so they can be used for different kinds of workloads. Containers are more lightweight, and with schedulers like Marathon we can pack more of them onto the same machines. This lets users operate more flexibly and efficiently on the same server footprint, or even on a smaller footprint.

Containers encourage a more agile workflow, with code deployment happening much more frequently. If managed correctly, this can have major benefits for the quality of applications and developer agility. Containers are not black boxes like VMs, so we can look inside and get statistics on what's running. This allows for some pretty cool optimizations and cost savings.

TechRepublic: Why is container operations more than orchestration? What do you mean by container operations?

Knaup: With container operations, we're talking about the whole lifecycle from code being pushed to a repo all the way to a container running in production, potentially for a long time. This lifecycle encompasses a wide range of considerations and capabilities, including orchestration, but also building and testing, artifact storage, security scanning, health monitoring, load management, debugging, and much more.

So, if you actually want to operate large numbers of containers in production, everyone from developers to the operations team needs the right pieces in place to do their jobs. Ideally, these pieces should be automated, scalable, simple, and everything people have come to expect from next-generation IT products.

TechRepublic: Marathon manages more than containers, including long-running services like databases, and even app servers like Tomcat. Some speculate that you might be broadly tuning Marathon to be the runtime for distributed applications in general...?

Knaup: It's more than speculation: Marathon was designed as a generic workload manager from the beginning, even before Docker was a big thing. The goal was to build a bridge from the old world to the new so that existing applications could be deployed and scaled with little or no changes. As a result, it's a great fit for anything that's long running, whether it's a modern application architecture like microservices or a more traditional three-tier enterprise app.

The project actually started by trying to capture the patterns we saw while deploying web services at various places, such as Airbnb and Twitter. Many of the things you want to run in a data center follow the same fundamental scheduling pattern: "Take this blueprint of my app, run 100 instances, and make sure all of them are healthy all the time."

This applies to modern microservices as well as more traditional apps.

This is what Marathon does really well. While we want to keep it laser-focused on this strength, we also see folks using it to run all kinds of other workloads, like distributed databases, message queues, and even Marathon itself. So it really is a great general-purpose runtime. For those more advanced distributed systems that require complex scheduling patterns, DCOS also runs purpose-built packages (for Apache Kafka and Spark, for example), which provide custom-tailored scheduling and management.
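The scheduling pattern Knaup describes ("run 100 instances, and make sure all of them are healthy all the time") can be illustrated with a small desired-state loop. The Python sketch below is only an illustration under stated assumptions: the names `App`, `Instance`, and `reconcile` are hypothetical, it does not use or reflect Marathon's actual API or implementation, and instance health is simulated with a flag rather than real health checks.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    # Hypothetical stand-in for one running copy of an app.
    id: int
    healthy: bool = True


@dataclass
class App:
    # Hypothetical "blueprint": a name plus the desired instance count.
    name: str
    desired: int
    instances: list = field(default_factory=list)


_ids = count(1)  # simple source of unique instance ids


def reconcile(app: App) -> App:
    """One reconciliation pass: drop unhealthy instances, then start or
    stop instances until the running count matches the desired count."""
    app.instances = [i for i in app.instances if i.healthy]
    while len(app.instances) < app.desired:
        app.instances.append(Instance(id=next(_ids)))
    del app.instances[app.desired:]  # scale down if there are too many
    return app


# "Take this blueprint of my app, run 100 instances..."
app = reconcile(App(name="web", desired=100))

# Simulate two failures, then let the loop repair them.
app.instances[3].healthy = False
app.instances[42].healthy = False
reconcile(app)

print(len(app.instances), all(i.healthy for i in app.instances))
```

A real scheduler would run a pass like this continuously and would also decide where each instance is placed. The sketch shows only the core idea of comparing desired state with observed state and correcting the difference.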

TechRepublic: Kubernetes and Swarm have been taking off the gloves a little bit lately. Swarm, for example, cited some performance statistics that they suggested indicated superiority over Kubernetes. What is Marathon's strength in container orchestration relative to these two?

Knaup: These stats remind me of the time when PC vendors were competing on who offers the most gigahertz per dollar. Sure it's an interesting data point, but you have to look at many other things if you want to buy a PC that's useful. This stat about container orchestration performance is probably not going to matter that much.

While we happen to believe Marathon is a high-performance system in its own right, we also have a long list of production workloads — at large companies, no less — to establish Marathon's status. Mesosphere is working closely with Microsoft on its Azure Container Service, which runs a collection of DCOS components, including Marathon, as a foundation. Verizon, Yelp, Samsung, Autodesk, Orbitz, Bloomberg, and many more are already running Marathon in production to power all sorts of interesting systems.

Part of this, of course, has to do with the stability of our DCOS and Apache Mesos. They provide a foundation for Marathon that has been proven over the years at Twitter, Apple, and other large companies, while also offering advanced features around usability, security, networking, and performance. DCOS also lets users run Marathon alongside other workloads, including big data systems, that need more than simple container orchestration to operate.

Really, Mesosphere is trying to address the issues of running an entire 21st-century data center. That involves container orchestration for sure, but also a whole lot more.


    About

    Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.
