How SIL International leveraged DevOps

Learn how DevOps brought better efficiency to one organization using a new set of tools and frameworks.

DevOps (a combination of "Development" and "Operations") impacts companies by allowing them to "rapidly deliver software and security updates internally and to customers," writes TechRepublic's James Sanders in "DevOps: A cheat sheet." 

Sanders continues that "DevOps—a workflow that emphasizes communication between software developers and IT pros managing production environments—is at the forefront when considering how to shape an IT department to fit an organization's internal needs and best serve its customers."

SEE: Telephone interview cheat sheet: Software developer (TechRepublic Premium)
 
I spoke to Phillip Shipley, director of IT Software Engineering at SIL International, a language development organization, about how SIL leveraged DevOps for better work efficiencies.
 
Scott Matteson: Can you provide some specific examples as to how SIL automated tedious tasks via DevOps?

Phillip Shipley: Our efforts to streamline our operations and automate everything that made sense started with testing and deployment. As a first step, we worked on implementing continuous integration to automate the execution of tests and report their status. 
 
We chose CloudBees CodeShip for this function. We didn't want to run and manage our own CI/CD platform, so we looked for something robust, easy to use, and affordable, and found that CodeShip checked all the boxes. Using a hosted platform also helped create a consistent testing environment, so we could avoid many of the "works on my machine" issues found when developers only run tests on their local machines. 
 
After that, we moved on to automating deployment of applications. With Docker, this means CodeShip builds our Docker images, runs our tests in containers from those images, pushes the images to a registry for us (either Docker Hub or Amazon's ECR service), and then deploys the updated images into their target environment, either staging or production.
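For readers who want a concrete picture of that flow, here is a minimal sketch of a build/test/push/deploy sequence written as a plain Python script. The image name, test command, cluster, and service names are hypothetical placeholders, and CodeShip actually defines these steps in its own configuration rather than in a script like this.

```python
# Illustrative only: mirrors the CI/CD flow described above with hypothetical names.
import subprocess

IMAGE = "123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp"  # hypothetical ECR repo
TAG = "latest"

def run(cmd):
    # Stop the pipeline if any step exits non-zero.
    subprocess.run(cmd, check=True)

def pipeline():
    # 1. Build the Docker image from the repository's Dockerfile.
    run(["docker", "build", "-t", f"{IMAGE}:{TAG}", "."])
    # 2. Run the test suite inside a container created from that image.
    run(["docker", "run", "--rm", f"{IMAGE}:{TAG}", "pytest"])
    # 3. Push the tested image to the registry (Docker Hub or Amazon ECR).
    run(["docker", "push", f"{IMAGE}:{TAG}"])
    # 4. Roll the updated image out, e.g. by forcing the ECS service to redeploy.
    run(["aws", "ecs", "update-service", "--cluster", "staging",
         "--service", "myapp", "--force-new-deployment"])

if __name__ == "__main__":
    pipeline()
```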
 
Once those processes were established and second nature to us, we moved on to automating our infrastructure with Terraform. This has been hugely beneficial as it enabled us to rapidly scale and reproduce complex sets of infrastructure. We developed modules for many common infrastructure components like auto-scaling groups, VPCs, ECS clusters and services, databases, etc. When we need to spin up a new environment for an application, it is a relatively simple exercise of picking out the modules we need and wiring them up.

SEE: How iRobot used data science, cloud, and DevOps to design its next-gen smart home robots (cover story PDF) (TechRepublic)

Another example is automating more intelligent processes for scaling infrastructure. AWS has some nice abilities to scale clusters of servers up and down, but due to all the abstractions it is hard to scale specifically on the metrics you want. For example, you cannot auto-scale an auto-scaling group (ASG) based on the desired capacity of an ECS cluster. So we wrote a script to check all the desired ECS tasks for a given cluster, determine how many EC2 instances are needed to support that load, and adjust the ASG accordingly. This isn't possible or supported out of the box from AWS. We deployed this script as a serverless Lambda process and scheduled it to run every few minutes. So, when we deploy a new service into a cluster, the infrastructure capacity to support it increases; when we decommission a service, the capacity decreases as well.
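As a rough illustration of that idea (not SIL's actual code), a Lambda handler along these lines could sum the desired ECS task counts, estimate how many instances that load needs, and resize the auto-scaling group. The cluster name, ASG name, and tasks-per-instance figure below are all hypothetical assumptions.

```python
# Illustrative sketch: size an ASG from the desired ECS task count in a cluster.
import math
import boto3

ecs = boto3.client("ecs")
autoscaling = boto3.client("autoscaling")

CLUSTER = "example-cluster"      # hypothetical ECS cluster name
ASG_NAME = "example-ecs-asg"     # hypothetical auto-scaling group name
TASKS_PER_INSTANCE = 8           # rough packing assumption per EC2 instance

def handler(event, context):
    # Sum the desired task count across every service in the cluster.
    desired_tasks = 0
    paginator = ecs.get_paginator("list_services")
    for page in paginator.paginate(cluster=CLUSTER,
                                   PaginationConfig={"PageSize": 10}):
        arns = page["serviceArns"]
        if not arns:
            continue
        described = ecs.describe_services(cluster=CLUSTER, services=arns)
        desired_tasks += sum(s["desiredCount"] for s in described["services"])

    # Translate tasks into instances, always keeping at least one instance.
    needed_instances = max(1, math.ceil(desired_tasks / TASKS_PER_INSTANCE))

    # Only adjust the ASG if its desired capacity is out of step with the load.
    asg = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME])["AutoScalingGroups"][0]
    if asg["DesiredCapacity"] != needed_instances:
        autoscaling.set_desired_capacity(
            AutoScalingGroupName=ASG_NAME,
            DesiredCapacity=needed_instances,
            HonorCooldown=False)
    return {"desired_tasks": desired_tasks, "instances": needed_instances}
```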
 
A final example is automating replacement of EC2 instances in a way that doesn't disrupt live services. Rather than manually patching and maintaining servers, we simply replace them with updated instances on a regular basis. When AWS releases an updated AMI, we run Terraform to discover it and update the launch configuration for our auto-scaling groups. We then run a script we wrote that intelligently replaces instances one at a time while monitoring ECS task availability, so that we always have redundant tasks running before terminating the instance they may be running on. With this process we can completely replace our production servers during business hours without impacting users.
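A stripped-down version of that rolling-replacement loop might look like the following: terminate one instance at a time, let the ASG launch a replacement from the updated launch configuration, and wait for ECS to report the cluster back at its baseline task count before continuing. The names and timings are hypothetical, and a production script would add connection draining and error handling.

```python
# Rough sketch of a rolling instance replacement; not SIL's actual script.
import time
import boto3

autoscaling = boto3.client("autoscaling")
ecs = boto3.client("ecs")

CLUSTER = "example-cluster"      # hypothetical ECS cluster
ASG_NAME = "example-ecs-asg"     # hypothetical auto-scaling group

def running_tasks(cluster):
    # Number of tasks ECS currently reports as running in the cluster.
    return ecs.describe_clusters(clusters=[cluster])["clusters"][0]["runningTasksCount"]

def replace_instances():
    asg = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[ASG_NAME])["AutoScalingGroups"][0]
    baseline = running_tasks(CLUSTER)

    for instance in asg["Instances"]:
        # Terminate one instance without lowering desired capacity, so the ASG
        # immediately launches a replacement from the current launch config/AMI.
        autoscaling.terminate_instance_in_auto_scaling_group(
            InstanceId=instance["InstanceId"],
            ShouldDecrementDesiredCapacity=False)

        # Give ECS time to register the lost instance, then wait until the
        # cluster is back to its baseline task count before moving on, so
        # redundant tasks are always running.
        time.sleep(60)
        while running_tasks(CLUSTER) < baseline:
            time.sleep(30)

if __name__ == "__main__":
    replace_instances()
```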

Time-saving initiatives

Scott Matteson: What are some examples of time-saving initiatives that produced better results faster?

Phillip Shipley: First off, choosing CodeShip for a hosted, easy-to-use CI/CD service saved us a lot of time compared with selecting a self-hosted option, provisioning the infrastructure to run it, installing it, configuring it, learning it, and so on. I highly recommend using a cloud-based CI/CD service. 
 
Automating testing and deployment saved a ton of time. Before CI/CD, we had to build and package software and send it to another team with a request to deploy during an approved change window. With automation, we developed a high degree of trust in our testing as well as in the deployment process, so now we deploy as fast as we can write code, review it, and approve the pull requests. Multiple deployments in a single day are common.
 
Automating infrastructure provisioning has also been a major time saver. One of our services is composed of half a dozen microservices as well as several AWS services. Configuring a new environment for the service takes at least a day to do manually using the AWS console and is very prone to human error. Using Terraform, however, it takes one to two hours, depending on how long AWS takes to provision a few of their services, and because everything is codified and templatized, the chance of human error is quite low.

SEE: IT leader's guide to making DevOps work (TechRepublic Premium)

Day-to-day tasks

Scott Matteson: What are some day-to-day tasks and long-term tasks being performed via DevOps methodologies?  

Phillip Shipley: Day-to-day tasks are primarily in the CI/CD space. We do use Terraform several times a week to create or destroy environments as needed, but maybe not daily. 
 
On the longer-term end, beyond just the tools, we also combined our development and operations teams as we found our disciplines becoming more similar and collaborative in nature.

Training involved

Scott Matteson: What sort of DevOps skills/training was involved?


Phillip Shipley: Most of our training was on-the-job learning through experimentation and trial and error. We are also fortunate to have a budget for a few of us to attend a couple of conferences each year. Conferences like DockerCon, HashiConf, and GopherCon are great opportunities to connect with others and learn through sessions. Many of us participate regularly in our local Docker Meetup and DevOpsDays conferences. Typically one or two people on the team will focus on learning something, and if it's of value for the rest of the team, they'll teach and coach us on how to use the new technology or methodology.

SEE: Special report: Riding the DevOps revolution (free PDF) (TechRepublic)

Pitfalls in transition

Scott Matteson: Were there any pitfalls in the transition?

Phillip Shipley: From a technologies perspective we got on the Docker boat a bit too early. It was long before "Docker for Windows" and "Docker for Mac," and the process of setting up a consistent development environment for developers on different operating systems was quite involved. Docker has come a long way in the past four years, and now we hardly think about things like setting up a local development environment with Docker.
 
One of the bigger challenges in adopting more of a DevOps approach was on the software design and testing side. Writing and running automated unit or functional tests in a local environment is one thing, but designing them to run in an ephemeral containerized environment is another. We changed how we develop applications with this in mind, making our local development environment just as ephemeral as any other environment so that we were not testing against, and depending on, local changes that would not exist elsewhere.

While making this transition was challenging at times, it has been tremendously valuable; I believe the quality of the software we produce is much higher than it was before because we're designing for more portability and reproducibility.

Lessons learned

Scott Matteson: What did SIL learn from the experience?

Phillip Shipley: Aside from a lot of cool technology, we learned that developers appreciate the opportunity to learn and experiment. The freedom to try new things and fail and grow in skills and understanding along the way is incredibly valuable and usually worth more than cash. 

This isn't true for everyone, though, and I think it is a key indicator of how successful a DevOps transformation can be in an organization. If a team is made up of people who do not have a desire to learn and improve, there will be lots of pushback and dragging of feet. When embarking on a DevOps journey, you need to be ready to move people around and even let them "get off the bus" if they aren't on board with the changes.
