In Kubernetes terms, an operator is a piece of software designed to run routine operations for specific pieces of software on a Kubernetes cluster. The name comes from human operators, which used to be an actual job.
When I was graduating from college, I interviewed for a position as a computer operator. While most positions in IT were expected to grow, the career counselors told me that demand for human operators was expected to shrink. The role involved running shell scripts, FTPing files, and handling exceptions, and it was due to be automated away by tools like cron.
That advice turned out to be correct. Today, too many people running Kubernetes are spending too much time doing routine operations. Now those jobs are being automated as well.
Portworx, a cloud storage provider, has created its own operator for Kubernetes, one of only seven certified to the highest level of capability. I asked Michael Ferranti, a vice president of product at Portworx, to explain how operators evolved.
How did we come to need Kubernetes operators?
Ferranti describes Kubernetes itself as a solution to the operator problem. Google needed to hire system administrators and reliability engineers for its ever-growing cloud, and it simply could not hire enough. The problem was not budget, as Google is wildly profitable. The problem was finding enough people who were smart enough, who would choose to be system administrators, and who would move to Silicon Valley, and then building enough buildings to house them. So, Google built a generic framework to manage a cluster of virtual machines running any application.
While Kubernetes is sometimes described as an all-singing, all-dancing system, it can actually require intervention and management, for example to deal with scaling as workloads shift. Ravi Lachhman, an evangelist at Harness, claims that Kubernetes was initially designed for relatively simple, autonomous, stateless systems that could handle their own failover.
What Kubernetes does not have is domain expertise. Lachhman coined the term the “trifecta of a stateful app.” The three pieces of his trifecta are clustering, load balancing, and replication. These features are typically associated with a need for high availability. According to Lachhman, the trifecta greatly increases the effort needed to manage the application. That is the point where people either add human operators or write code to manage the infrastructure. The Kubernetes-native way to do this is to code an operator.
What do Kubernetes operators do?
The Kubernetes project describes five levels of capability for operators. Sometimes called levels of maturity, these broadly correspond to the skill level of a human operator. Level one provides a basic install of the tool, including provisioning or negotiating with the cluster for the resources to run the application. Level two provides seamless upgrades for patches and minor versions, while level three includes backup and failure recovery. At level four, the operator handles alerts, log processing, and workload analysis, while at level five the operator solves some of the scaling issues Kubernetes fails to address out of the box, along with advanced topics like tuning the configuration or scheduling of workloads.
Once an operator exists for a specific tool, such as Redis, CouchDB, or Kafka, the author can put it on GitHub and open-source it. Creating the operator might take a person-year and then save half a person-year of operations work, every year, for one cluster. Deploy that over a thousand companies and a thousand clusters, and we are talking about some savings.
If you are running Kubernetes, or thinking about it, it can’t hurt to check OperatorHub for your most popular open-source packages. If your application isn’t there, or your software is in-house, you can always write your own.
Should you write your own Kubernetes operator?
An in-house application that needs Lachhman’s trifecta (clustering, load balancing, and replication) brings ongoing maintenance costs. If you have Google-scale problems and don’t want to hire teams upon teams of software reliability engineers to do what can, to some extent, be programmed, then writing your own operator might make sense.
Operators consist of two pieces: the code that performs the commands (written in Go, Ansible, or, for simpler work, Helm) and the Custom Resource Definition (CRD). The CRD maps the operator code back to the kubectl command. That makes calling an operator’s features as simple as creating a YAML file and passing it to Kubernetes through kubectl apply.
Then again, someone on the team has to know Go, Ansible, or Helm. That code probably needs to be stored in version control and managed as a software development artifact. To make it more than a single point of failure, the team will want a second or third programmer fluent in Go or Ansible.