4 critical lessons DevOps admins can learn from Netflix's container journey

Netflix chose to build their own container orchestration system. Here are some of the key lessons the company learned along the way.

It's difficult to transform a large enterprise IT operation. Applications that were bleeding edge when written become legacy in only a few short years.

Take web-scaler Netflix, for example. Netflix remains a prime example of an organization that leverages public cloud for extensive operations. Most of Netflix's applications have run within virtual machines, but the firm recently began offering containers as an option within its infrastructure. Here are four lessons to draw from its experience.

1. Governance

Netflix is a bottom-up organization, and that governance model drove many of its container orchestration design decisions. Operations didn't dictate which applications must go in containers. It remained up to the individual application teams to determine which of their services went into containers and which remained in virtual machines.

Enterprises should always start with governance when considering a container strategy. I've seen many organizations deploy cloud-native technology only to see it go unused. The primary challenge is culture: either there's no incentive to adopt the technology, or no sponsorship to force adoption. In Netflix's case, the container team's motivation began with providing value to its application community.

2. Kubernetes vs. Titus

Netflix chronicled their container journey in a white paper. Running containers at scale requires orchestration, and Netflix started their journey near the beginning of the Kubernetes open source project. Netflix had to decide if it would build its own orchestration platform or adopt an existing platform.

Netflix chose to build a dedicated container orchestration platform called Titus. While Netflix claims most organizations look to write greenfield applications on new container platforms such as Kubernetes, its team wanted to support existing applications as well. Therefore, Netflix chose to build its Titus container management system on top of Apache Mesos.

Today, Kubernetes has broad support for brownfield applications. For example, Docker now supports running Kubernetes alongside Swarm. Operations teams can also package legacy apps into Docker containers and deploy those containers to Kubernetes clusters.

3. Container networking

Organizations have to give considerable thought to container networking. Networking is especially important as organizations design interactions between containerized and legacy applications. Netflix's Titus enabled container-to-container networking to conserve IP address space. The solution also allows containers to be placed directly on the routable network address space of existing applications.

A common approach within enterprise deployments of containers is to adopt a network overlay. Major network vendors such as VMware, Cisco, Juniper, Extreme Networks, and Big Switch offer Kubernetes container support. Each solution plugs into Kubernetes to provide both overlay and security support, and applications can use native Kubernetes network APIs to control security policies.
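To make the idea of controlling security policy through native Kubernetes network APIs more concrete, here is a minimal Python sketch that builds a NetworkPolicy manifest. The policy name, namespace, labels, and port are hypothetical values chosen for illustration and do not come from Netflix or any vendor. The sketch only constructs the manifest as JSON, which could then be applied with kubectl; it assumes the cluster runs a network plugin that actually enforces NetworkPolicy objects.

```python
import json


def network_policy(name, namespace, target_labels, allowed_labels, port):
    """Build a NetworkPolicy manifest admitting ingress only from allowed pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy protects.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only pods carrying these labels may connect...
                    "from": [{"podSelector": {"matchLabels": allowed_labels}}],
                    # ...and only on this TCP port.
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical example: let the frontend reach a legacy billing service.
policy = network_policy(
    name="allow-frontend-to-billing",
    namespace="legacy",
    target_labels={"app": "billing"},
    allowed_labels={"app": "frontend"},
    port=8080,
)
print(json.dumps(policy, indent=2))
```

Once a policy like this selects the billing pods, ingress traffic to them that does not match the rule is dropped by the enforcing plugin.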

4. Public cloud

As noted in an earlier TechRepublic post, Netflix is an extremely large consumer of Amazon Web Services (AWS). However, integrating containers with AWS Identity and Access Management (IAM) proved an operational challenge. In Titus, Netflix created a proxy service that lets legacy applications remain unchanged. Titus uses IAM roles so that a single Titus node can adopt the appropriate IAM role on behalf of the containers running on that node. As a result, Titus must take IAM security into consideration as part of workload placement.
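The proxy idea can be illustrated with a small Python sketch. This is not Netflix's implementation: the class name, the lookup by container IP address, and the injected `fetch_credentials` callable are all assumptions made for illustration. The point is only that a per-node service can decide which role applies to each container, so the application inside the container asks for credentials the same way it always did.

```python
from typing import Callable, Dict


class MetadataProxy:
    """Hypothetical per-node proxy that answers credential requests for containers."""

    def __init__(self, fetch_credentials: Callable[[str], dict]):
        # fetch_credentials stands in for whatever exchanges a role ARN for
        # temporary credentials (for example, an STS AssumeRole call).
        self._fetch = fetch_credentials
        self._roles: Dict[str, str] = {}

    def register(self, container_ip: str, role_arn: str) -> None:
        """Record which IAM role a container is allowed to use."""
        self._roles[container_ip] = role_arn

    def credentials_for(self, container_ip: str) -> dict:
        """Return credentials for the caller's role, or refuse unknown callers."""
        role_arn = self._roles.get(container_ip)
        if role_arn is None:
            raise PermissionError(f"no role registered for {container_ip}")
        return self._fetch(role_arn)


def fake_fetch(role_arn: str) -> dict:
    """Stub used here so the sketch runs without AWS access."""
    return {"RoleArn": role_arn, "AccessKeyId": "EXAMPLE"}


proxy = MetadataProxy(fake_fetch)
proxy.register("10.0.0.12", "arn:aws:iam::123456789012:role/billing")
print(proxy.credentials_for("10.0.0.12")["RoleArn"])
```

In this sketch, a container that was never registered receives a `PermissionError` rather than the node's own credentials, which is the isolation property a multi-tenant host needs.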

Another consideration is the use of EC2 instances as container hosts. Prior to container adoption, Netflix struggled to use EC2 capacity efficiently. Containers allow Netflix to slice EC2 instances into smaller units by placing multiple workloads on a single instance, and Netflix has seen a higher level of efficiency as a result.
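The efficiency gain from placing multiple workloads on one instance is essentially a bin-packing effect. The Python sketch below uses first-fit decreasing, a common textbook heuristic, to show it; this is not a description of the Titus scheduler, and the workload names, vCPU sizes, and instance capacity are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A hypothetical EC2 instance modeled only by its vCPU capacity."""
    capacity: int
    workloads: list = field(default_factory=list)

    @property
    def used(self) -> int:
        return sum(cpu for _, cpu in self.workloads)

    def fits(self, cpu: int) -> bool:
        return self.used + cpu <= self.capacity


def pack(workloads: dict, capacity: int) -> list:
    """Place workloads on instances using first-fit decreasing."""
    instances = []
    # Consider the largest workloads first, then reuse the first instance with room.
    for name, cpu in sorted(workloads.items(), key=lambda kv: kv[1], reverse=True):
        if cpu > capacity:
            raise ValueError(f"{name} needs {cpu} vCPUs, more than one instance offers")
        target = next((inst for inst in instances if inst.fits(cpu)), None)
        if target is None:
            target = Instance(capacity)
            instances.append(target)
        target.workloads.append((name, cpu))
    return instances


def utilization(instances: list) -> float:
    """Fraction of provisioned vCPUs that are actually in use."""
    total = sum(inst.capacity for inst in instances)
    return sum(inst.used for inst in instances) / total if total else 0.0


# Invented workload sizes, in vCPUs, packed onto 8-vCPU instances.
demo = {"api": 4, "batch": 6, "cache": 2, "web": 3, "cron": 1}
packed = pack(demo, capacity=8)
print(len(packed), utilization(packed))
```

With these invented numbers, the five workloads fit on two 8-vCPU instances, whereas giving each workload its own 8-vCPU virtual machine would take five instances and leave most of the capacity idle.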

    About Keith Townsend

    Keith Townsend is a technology management consultant with more than 15 years of related experience designing, implementing, and managing data center technologies. His areas of expertise include virtualization, networking, and storage solutions for Fo...
