
Fix once, automate the solution and then deploy many times. That’s the philosophy behind Shoreline, a company that aims to make life easier for site reliability engineers.

Anurag Gupta, founder and CEO of Shoreline, has plenty of experience solving operational problems for cloud deployments. He was a vice president at AWS for almost eight years, where he ran the analytic and relational database services on the AWS Database team. He also spent more than three years as a vice president of engineering at Oracle.

Gupta has two goals for the company:

  1. Make managing, debugging and repairing a fleet of servers as simple as repairing an individual box
  2. Create a tool that makes fixing a problem permanently as easy as fixing it the first time

With Shoreline, a site reliability engineer maps out a fix for a new or recurring problem, and the platform works out how to run that operation in parallel across the entire fleet. The operation can be a one-line command or a remediation loop.
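To make the idea of running one fix in parallel across a fleet more concrete, here is a minimal Python sketch of a remediation loop (check, fix, re-check) fanned out to many hosts with a thread pool. It is not Shoreline's implementation: the `Host` class, the simulated disk metric and the `clean_temp_space` action are all hypothetical stand-ins, and nothing here touches a real server.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    disk_used_pct: float  # simulated metric; a real tool would query the host


def needs_fix(host: Host, threshold: float = 90.0) -> bool:
    # Condition the SRE cares about: disk usage at or above the threshold.
    return host.disk_used_pct >= threshold


def clean_temp_space(host: Host) -> None:
    # Hypothetical remediation: pretend deleting temp files frees 30 points of disk.
    host.disk_used_pct = max(0.0, host.disk_used_pct - 30.0)


def remediate(host: Host, max_attempts: int = 3) -> tuple[str, bool]:
    # Remediation loop: check, fix, re-check, giving up after max_attempts fixes.
    for _ in range(max_attempts):
        if not needs_fix(host):
            return host.name, True
        clean_temp_space(host)
    return host.name, not needs_fix(host)


def run_across_fleet(fleet: list[Host], workers: int = 8) -> dict[str, bool]:
    # Fan the same remediation out to every host in parallel; each thread
    # works on a different Host, so no shared state is mutated concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(remediate, fleet))


if __name__ == "__main__":
    fleet = [Host("web-1", 95.0), Host("web-2", 40.0), Host("db-1", 99.0)]
    print(run_across_fleet(fleet))
```

The point of the sketch is the split of responsibilities the article describes: the engineer writes only `needs_fix` and the fix, while the fan-out and retry logic is generic and reusable for every host.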

“Rather than taking a month or two months to build an automation, you can do it in a few minutes,” he said.

Gupta sees automating incident response as the next logical step in the DevOps evolution.

“We’ve solved QA automations through pipelines, and now we need automation with configuration and deployment,” he said. “Production ops is a hard problem, and the number of problems scales with the size of your fleet.”


Gupta said that fixing a problem the first time is an interesting challenge, but by the fifth time the same problem shows up, fixing it has become a frustrating waste of time.

“One of the tough things that I used to see is that you’ve got your best guy doing one on-call shift but everyone is calling them all the time so they’re always on call,” he said. “You tell us what to do and then we figure out all the resources and distribution and how to do it.”

The platform also has the potential to capture institutional knowledge and make it more widely available.

“Deciding what to do when something happens, it’s not straightforward,” he said. “Shoreline can delete temp space or move logs or do whatever is appropriate in your environment.”

Gupta compared the new automation tool to Rundeck and Splunk. “The difference between us and Splunk is that we let you change your environment and not just look at it,” he said.

Shoreline includes Op, a domain-specific language with a pipe-delimited syntax that can execute any task that can be run at the Linux command prompt.

Gupta said that Op uses resources, metrics and Linux commands as the nouns, adjectives and verbs of operations.

“In one line you can say, ‘If the JVM is running hot, do a deep stack dump and then bounce the JVM,’” he said.
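The "nouns, adjectives and verbs" framing can be illustrated with a small pipeline. The Python sketch below is not Op syntax, which the article does not show; it only mimics the idea by overloading `|` so that a resource selector (noun), a metric filter (adjective) and actions (verbs) compose left to right. Every name in it (`Resource`, `select`, `hot`, `run`, `cpu_pct`, `deep_stack_dump`, `bounce_jvm`) is hypothetical, and the actions are merely recorded rather than executed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Resource:
    name: str
    kind: str
    metrics: dict[str, float]
    log: list[str] = field(default_factory=list)  # actions applied, for illustration


class Stage:
    """One pipeline step; `a | b` feeds the output of a into b."""

    def __init__(self, fn: Callable[[list[Resource]], list[Resource]]):
        self.fn = fn

    def __or__(self, other: "Stage") -> "Stage":
        return Stage(lambda rs: other.fn(self.fn(rs)))

    def __call__(self, rs: list[Resource]) -> list[Resource]:
        return self.fn(rs)


def select(kind: str) -> Stage:
    # Noun: choose which resources the pipeline operates on.
    return Stage(lambda rs: [r for r in rs if r.kind == kind])


def hot(metric: str, threshold: float) -> Stage:
    # Adjective: keep only resources whose metric exceeds the threshold.
    return Stage(lambda rs: [r for r in rs if r.metrics.get(metric, 0.0) > threshold])


def run(command: str) -> Stage:
    # Verb: act on each remaining resource (here we only record the action).
    def apply(rs: list[Resource]) -> list[Resource]:
        for r in rs:
            r.log.append(command)
        return rs
    return Stage(apply)


# "If the JVM is running hot, do a deep stack dump and then bounce the JVM."
fix_hot_jvms = select("jvm") | hot("cpu_pct", 90.0) | run("deep_stack_dump") | run("bounce_jvm")
```

Applied to a list of resources, `fix_hot_jvms(fleet)` returns only the hot JVMs, and each of them records the stack dump followed by the restart, in that order.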

Shoreline currently works with AWS. According to Gupta, the Azure integration will be done this quarter, and Google Cloud will be next.

“Then we’ll abstract across all three so SREs will have a single pane of glass,” he said.

Shoreline is hiring engineers in its California and Romania locations, and the company is looking for IT professionals who have experience with distributed systems, functional programming and DevOps.