Streaming giant Netflix has revealed how it is making the most of the versatile programming language Python.
The company has detailed the ways it uses Python, one of the world’s fastest-growing languages, for everything from operations management and analysis through to security and networking.
Netflix relies on a mix of well-known packages and in-house software libraries, with Python seemingly used in nearly every corner of the business, which is largely run on the Amazon Web Services (AWS) cloud platform.
“We use Python through the full content lifecycle, from deciding which content to fund all the way to operating the CDN that serves the final video to 148 million members,” write Netflix engineers in a blog post.
Here’s how Netflix uses Python.
Netflix’s demand engineering team builds resiliency into the network by providing regional failovers and orchestrating the distribution of Netflix’s traffic.
“We are proud to say that our team’s tools are built primarily in Python,” the team writes.
“The ability to drop into a bpython shell and improvise has saved the day more than once.”
Tools used by the team include:
- NumPy and SciPy to perform numerical analysis
- Boto3 to make changes to AWS infrastructure
- rq to run asynchronous workloads
- Flask APIs to wrap the orchestration tools above
- Jupyter Notebooks and nteract to analyze operational data and prototype visualization tools. Netflix also uses Python to build custom extensions to the Jupyter server that allow engineers to manage tasks like logging, archiving, publishing and cloning notebooks.
Meanwhile, the big data orchestration team provides services and tooling for scheduling and executing ETL (Extract, Transform, Load) jobs and ad hoc data pipelines.
Python has also been used to develop a time series correlation system, as well as a distributed worker system to parallelize large analytic workloads.
On top of that, Python is also typically used for automation tasks, data exploration and cleaning, and visualization.
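Netflix has not published the internals of its time series correlation system in this article, but the basic idea of scoring how closely two metrics move together can be sketched in a few lines. The example below is only an illustration, assuming a plain Pearson correlation over two equal-length series; the function name `pearson` and the sample readings are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factors cancel
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-minute readings for two operational metrics
requests_per_sec = [120, 135, 150, 170, 160, 180]
cpu_percent = [30, 34, 38, 43, 40, 45]

score = pearson(requests_per_sec, cpu_percent)
print(f"correlation: {score:.3f}")
```

A score close to 1 suggests the two metrics rise and fall together, while a score close to -1 suggests they move in opposite directions. A production system would likely add windowing, lag handling and parallel execution on top of a calculation like this.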
Monitoring and automated response
Netflix’s Insight Engineering team is responsible for building and operating the tools for generating alerts, diagnostics, and automatic remediation.
They now support Python clients for most of their services, including the Spectator Python client library, a library for recording dimensional, time-series metrics.
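To give a sense of what "dimensional" means here, the sketch below shows a toy in-memory counter registry in which each metric is identified by a name plus a set of tags, so the same measurement can be sliced by region, status code and so on. It is not the Spectator API: the class `MetricRegistry`, its methods and the tag values are all hypothetical, and the time-series side (timestamps and periodic publishing) is left out.

```python
from collections import defaultdict


class MetricRegistry:
    """Toy in-memory registry for dimensional counters (not the Spectator API)."""

    def __init__(self):
        self._counters = defaultdict(int)

    def increment(self, name, tags=None, amount=1):
        # A metric's identity is its name plus its sorted tag pairs
        key = (name, tuple(sorted((tags or {}).items())))
        self._counters[key] += amount

    def total(self, name, **match):
        """Sum every counter with this name whose tags include `match`."""
        wanted = set(match.items())
        return sum(
            value
            for (metric, tags), value in self._counters.items()
            if metric == name and wanted <= set(tags)
        )


registry = MetricRegistry()
registry.increment("server.requests", {"region": "us-east-1", "status": "200"})
registry.increment("server.requests", {"region": "us-east-1", "status": "500"})
registry.increment("server.requests", {"region": "eu-west-1", "status": "200"}, amount=3)

print(registry.total("server.requests"))                      # all requests
print(registry.total("server.requests", region="us-east-1"))  # one region only
```

Because the tags are part of the metric's identity, the same counter can be aggregated along any dimension after the fact, which is what makes this style of metric useful for alerting and diagnostics.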
Netflix’s information security team uses Python for a wide variety of tasks, including security automation, risk classification, auto-remediation, and vulnerability identification.
Python projects include:
- Security Monkey, an open-source Netflix library for monitoring AWS, Google Cloud Platform, OpenStack, and GitHub for changes to assets.
- The Bless SSH Certificate Authority to protect SSH resources.
- Repokid to help tune IAM (Identity and Access Management) permissions.
- Lemur to help generate TLS certificates.
Netflix also uses the Diffy forensics triage tool, which it built entirely in Python.
Metaflow, a Python framework that makes it easy to take ML projects from the prototype stage to production, is used across the company at scale. With Metaflow, Netflix relies on well-parallelized and optimized Python code to fetch data at 10 Gbps, handle hundreds of millions of data points in memory, and orchestrate computation over tens of thousands of CPU cores.
Jupyter Notebooks are also used for working up new experiments.
Netflix’s scientific computing team for experimentation provides a platform for scientists and engineers to analyze A/B tests and other experiments.
Among the Python frameworks they use is the Metrics Repo, which is based on PyPika and allows users to write reusable, parameterized SQL queries.
Meanwhile Netflix’s visualizations library is based on Plotly.
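The Metrics Repo itself is internal to Netflix, so the following is only a sketch of the general idea of a reusable, parameterized SQL query. It deliberately uses Python's built-in sqlite3 module and `?` placeholders rather than PyPika, and the table `ab_results`, its columns, the function `conversion_rate_query` and the sample rows are all hypothetical.

```python
import sqlite3


def conversion_rate_query(min_date):
    """Return (sql, params) for a reusable, parameterized query."""
    sql = (
        "SELECT cell, AVG(converted) AS rate "
        "FROM ab_results "
        "WHERE signup_date >= ? "
        "GROUP BY cell ORDER BY cell"
    )
    # The value travels separately from the SQL text, never spliced into it
    return sql, (min_date,)


# Hypothetical A/B test results held in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ab_results (cell TEXT, signup_date TEXT, converted INTEGER)"
)
conn.executemany(
    "INSERT INTO ab_results VALUES (?, ?, ?)",
    [
        ("A", "2019-04-01", 1),
        ("A", "2019-04-02", 0),
        ("B", "2019-04-01", 1),
        ("B", "2019-04-03", 1),
        ("A", "2019-03-01", 1),  # falls before the cutoff used below
    ],
)

sql, params = conversion_rate_query("2019-04-01")
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Keeping the query in a function and passing values as parameters means the same metric definition can be reused across experiments with different date ranges, without rewriting or string-formatting the SQL each time.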
Video encoding and automated content analysis
Netflix has a team dedicated to encoding the Netflix catalog and using machine learning to analyze it, for example to extract the best stills from a movie.