A U.S. Air Force pilot and veteran of the
Korean and Vietnam wars, Colonel John Boyd is well known in military
and business circles for his impact on strategic decision-making. Boyd developed
a concept called the OODA loop (for observe,
orient, decide, and act) to aid military combat. The idea came about as a
way to handle decision-making under pressure.

The basic premise was that victory could occur
by creating situations in which one can make appropriate decisions faster and
catch the opponent off guard. Harry Hillaker (chief designer of the F-16) said
of the OODA theory: “Time is the dominant parameter. The pilot who goes
through the OODA cycle in the shortest time prevails because his opponent is
caught responding to situations that have already changed.”

This is exactly the scenario that IT managers
face today given the continuous change within cloud computing, virtual systems
and modern, web-based applications. The
confluence of cloud computing and open source code has made it easier than ever
to quickly deploy systems into production and iterate frequently.

However, this agility often comes at a price: as
the pace of development speeds up, IT operations staff are constantly playing
catch-up with the current state of the system. Visibility and context suffer. Ops
and Dev people become adversaries instead of allies. In modern IT operations,
the OODA loop process can help companies stay ahead of issues and react quickly
and accurately. Here’s the construct:


Observe

Through collecting early and detailed
information about infrastructure and application changes, IT has the foundation
to make winning decisions that save the day. This relies upon the ability to
acquire quality data, all the time. In the past, the constraints of data
engineering techniques and hardware have made it difficult to obtain a
high-quality data set without maxing out resources. The result has been a
low-resolution slideshow that is a vague abstraction of the dynamic system it’s
meant to represent. Now, advances in computing power and the advent of
low-cost, on-demand cloud computing and SaaS have made it viable to gather and
analyze terabytes of real-time data. Adopting modern monitoring tools is like
purchasing the latest digital camera: you get a sharp, high-resolution picture
of application and network behavior, second by second. This ultimately feeds
insights that are critical for managing performance.
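
To see why resolution matters, consider a minimal Python sketch. The latency
values, the spike, and the `downsample` helper are all hypothetical, made up
for illustration rather than taken from any particular monitoring tool. It
compares a one-minute average with per-second samples of the same data:

```python
from statistics import mean

# Hypothetical per-second latency samples (ms) for one minute:
# a steady 20 ms with a 5-second spike to 900 ms.
per_second = [20.0] * 60
for t in range(30, 35):
    per_second[t] = 900.0


def downsample(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


one_minute_view = downsample(per_second, 60)  # one data point per minute
one_second_view = downsample(per_second, 1)   # one data point per second

print(one_minute_view)
print(max(one_second_view))
```

In this made-up minute, the per-minute average comes out at roughly 93 ms and
looks merely elevated, while the per-second view exposes the 900 ms spike.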


Orient

Orientation is the process of contextualizing
data in a larger narrative to gain insight. In a typical monitoring system this
is the point where humans and machines interact. The system summarizes data as
graphs, tables, dashboards, and other visualizations. It’s the job of the operator to
absorb these representations and construct the narrative behind them. The
ability to review the data and see a clear timeline and a storyline is a vital
modern IT operations requirement. This
provides both the bigger-picture view and all the data points that
relate to it. It takes experience to weave raw data together into
the big picture, and the tools should augment this experience and guide
operations professionals to the right conclusions.


Decide

As it grows, the narrative guides IT Operations
towards the contributing factors in a pending failure. An important point:
there is no single root cause of application issues in modern, hybrid or
cloud-based environments. The larger the failure, the more complex the causes
behind it typically are, which often leads to slower resolution. Therefore,
it’s important to take immediate action when an issue is critical, based on the
data you have now.


Act

In the above scenario, the operator will adjust
the request timeouts in the Web tier to 1 second. The premise is that adjusting
the timeouts will significantly lower the rate of request failures. The management
system should provide real-time feedback as to whether the action you take is
actually the right one. However, as in science, positive results are only tentative
confirmation. Only through successive
iterations of observation and orientation will the theory hold firm.
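
The feedback check described here can be sketched in Python. This is only an
illustration: the function names, the 20% improvement threshold, and the sample
counts are assumptions, not part of any real management system.

```python
def failure_rate(outcomes):
    """Fraction of failed requests; `outcomes` holds booleans (True = failed)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def action_helped(before, after, min_relative_drop=0.2):
    """Tentative verdict: did the failure rate fall by at least `min_relative_drop`?"""
    rate_before = failure_rate(before)
    if rate_before == 0.0:
        return False  # no failures beforehand, so nothing to improve
    rate_after = failure_rate(after)
    return (rate_before - rate_after) / rate_before >= min_relative_drop


# Hypothetical request outcomes sampled before and after the timeout change.
before_change = [True] * 30 + [False] * 70  # 30% of requests failed
after_change = [True] * 5 + [False] * 95    # 5% of requests failed

print(action_helped(before_change, after_change))
```

A single favorable comparison is only tentative. Rerunning the check on each
new observation window is what builds confidence that the action was right.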

Underpinning the four steps for completing the
OODA loop is speed: the loop must run faster than the rate at which the system
can change. If the feedback from action or inaction takes too long to observe
or orient, then decisions will be made against either stale data or data
resulting from a natural change in the system dynamics.
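
One way to make this speed requirement concrete is the small Python sketch
below. The stage durations, the `change_interval_s` parameter, and both
function names are hypothetical; how often a real system changes would have to
be measured.

```python
def loop_duration(observe_s, orient_s, decide_s, act_s):
    """Total time in seconds for one pass through the OODA loop."""
    return observe_s + orient_s + decide_s + act_s


def loop_keeps_up(duration_s, change_interval_s):
    """True if a full pass completes before the system is expected to change again."""
    return duration_s < change_interval_s


# Hypothetical timings: 5 s to collect, 60 s to interpret, 30 s to decide, 10 s to act.
duration = loop_duration(5, 60, 30, 10)

print(loop_keeps_up(duration, change_interval_s=300))  # system shifts every 5 minutes
print(loop_keeps_up(duration, change_interval_s=60))   # system shifts every minute
```

With these made-up numbers a pass takes 105 seconds, which is fast enough for a
system that shifts every five minutes but too slow for one that shifts every
minute. In the second case, decisions would be made against stale data.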

For instance, if IT takes action after traffic
has already peaked for the day, the problems may naturally resolve regardless
of the intervention. Worse, the action
may deepen the problem, and without high-resolution data the regression may fly
under the radar until traffic once again starts to rise.

The convergent trends of cloud computing and
open source software are principally about enabling agility when developing and
deploying code in production. This newfound agility breaks the assumptions of
existing monitoring solutions, which grew up in an era of static computing.

The future of monitoring in the cloud is high-resolution,
real-time, and optimized for rapid integration of data into a
narrative model. Part and parcel of the technology is organizing people around the
cause. In resolving multifaceted performance issues, there’s rarely just a
single person involved.

Enabling streamlined collaboration between
different parties, preferably in the DevOps fashion, is imperative to
supporting the model of OODA and fostering business agility. Companies want to
realize the benefits of cloud computing: cost savings, flexibility, and
innovation, to name a few. Making the best decisions rapidly can be a life-or-death
matter for cloud computing in the modern IT operations environment.

About the author: Cliff Moon is
CTO and Co-founder of Boundary.