Monitoring + Automation: An Elusive Goal

By: Jonah Kowall

Today, monitoring investments are tied to automation more often than to any other technology. One of the principal objectives of DevOps is to reduce toil, i.e., repetitive manual work, through automation. This helps keep engineers happy and engaged, allowing for better scale in building and operating applications. Automation typically spans infrastructure and application technologies. The challenge is that many organizations simply have too many automation tools. In a 2018 survey of 3,400 enterprises, EMA Research found that 56% of them were in that situation.

So many automation and monitoring tools exist because of silos within teams and domain-specific needs. The network team likely has a few tools, the server team has different automation needs, and the DevOps team runs both legacy and modern tools. There are also likely layers of automation, including current favorite tools and those likely to be deprecated.

The ground is already shifting beneath teams’ feet. While we have modern applications and components that deploy on Kubernetes, only a few applications requiring scale are automated and orchestrated.

We’ll set aside platform-specific automation systems, since tools built for special requirements are domain-specific. Examples include vRealize Orchestrator (vRO) for managing the VMware lifecycle and System Center Orchestrator for managing the Windows lifecycle. Likewise, there are countless automation tools that tie into specific platforms and applications.

The network, storage, and server automation markets are full of vendor- or platform-specific tools for managing their automation requirements. The bulk of applications still run on now-traditional virtual infrastructure, and many of these virtualization platforms include their own automation platforms. The automation markets of most interest here are Robotic Process Automation (RPA) and Application Release Orchestration (ARO).

Robotic Process Automation (RPA)

Robotic Process Automation comprises a set of technologies of acute interest. This might seem odd, considering that they automate manual processes users would otherwise perform through a user interface. These tools use the keyboard, mouse, and other techniques to interact with systems as a user would. In effect, RPA technologies can provide programmatic interfaces for interacting with legacy systems.

They can save a lot of time by avoiding the need to modernize systems. Ultimately, however, they allow legacy systems to persist, which builds technical debt and complexity. This debt makes operations costlier and keeps legacy vendors in business, many of whom offer limited business benefit.

Many of these RPA systems resemble the IT Process Automation (ITPA) systems of the past. Those tools made it easier to build automations through graphical interfaces and often hooked into monitoring systems. ITPA solutions are still in use today because they remain important for keeping legacy applications up and running. An excellent example is ServiceNow Orchestration, which allows for low-code automation and for interfacing external systems with ServiceNow. ServiceNow also provides hooks for orchestration into its own platform, including alerting, ticketing, and other workflow tools.

Application Release Orchestration (ARO)

Gartner defines Application Release Orchestration as technologies that: “enable DevOps teams to automate application deployment, manage continuous integration/continuous delivery pipelines and orchestrate release workflows.”

These tools are aimed more at DevOps teams, covering software deployments and releases, coordination of systems or containers, and other workflows.

Most folks use a combination of tools that fall into the ARO category. Today, the need for automation in DevOps teams starts with continuous integration (CI) systems. The most popular are the open source Jenkins and Travis CI, since they have been around for a while and knowledge of them is ingrained across the field.

Apollo: Open Source CI/CD

At Logz.io, we use Jenkins to run our CI pipeline. Some teams adapt these systems to do continuous deployment (CD), but at Logz.io we created Apollo, our own CD system, which we have open sourced for anyone to use and modify.

We built Apollo to give engineers a self-service way to manage automated Kubernetes deployments. Some organizations use Helm while building packages with CI systems to automate releases. More advanced users who want to automate rollouts often look at open source tools like Spinnaker, which Netflix created and open sourced. There is also Jenkins X, a newer project from the Jenkins ecosystem designed for Kubernetes and CD, but it is still unproven. We focused on simplicity with Apollo, as these systems are all highly complex and require a lot of work to implement and maintain.

Ansible, Chef & Puppet

Another alternative is using traditional ARO systems in the pipeline. Popular open source projects include Ansible, Chef, Puppet, and Terraform, alongside many commercial vendors. While these tools provide the ultimate flexibility via glue code and scripting, they don’t have native integrations with other systems, including CI/CD pipelines. Since most integrations rely on APIs or command-line calls, the scripts and code behind them are tough to maintain as they age.
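To illustrate why this glue code ages badly, here is a minimal Python sketch of a pipeline step that shells out to Ansible. The playbook path, inventory file, and `app_version` variable are hypothetical; only the standard `ansible-playbook` options `-i`, `--extra-vars`, and `--check` are assumed. Everything the pipeline learns comes back as an exit code and free-form text, which is exactly the kind of contract that tends to drift as tools are upgraded.

```python
import json
import subprocess
from typing import Dict, List


def build_playbook_command(playbook: str, inventory: str,
                           extra_vars: Dict[str, str],
                           check: bool = False) -> List[str]:
    """Assemble an ansible-playbook invocation as an argument list."""
    cmd = ["ansible-playbook", "-i", inventory, playbook,
           # Ansible accepts a JSON object as the value of --extra-vars.
           "--extra-vars", json.dumps(extra_vars, sort_keys=True)]
    if check:
        cmd.append("--check")  # dry run: report changes without applying them
    return cmd


def run_deploy_step(playbook: str, inventory: str, version: str,
                    dry_run: bool = True) -> int:
    """Hypothetical pipeline glue: run one playbook and return its exit code."""
    cmd = build_playbook_command(playbook, inventory,
                                 {"app_version": version}, check=dry_run)
    completed = subprocess.run(cmd, capture_output=True, text=True)
    if completed.returncode != 0:
        # The CI job only sees an exit code plus raw text; parsing it is the brittle part.
        print(completed.stderr)
    return completed.returncode
```

Keeping command construction in one function at least localizes the breakage when a flag or variable name changes, but the integration is still only as stable as the CLI it wraps.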

Continuous Verification

Yet another approach to solving automation challenges is to buy sophisticated commercial offerings that integrate natively with observability or monitoring tools, creating the closed-loop feedback necessary to validate changes. Aside from cost, the difference between the open source and commercial options is how much sophistication is built into the product, which means less coding is required.

One such option is Harness, which integrates natively with monitoring tools to provide Continuous Verification of changes, including automated canary deployments and measurement of release quality.

These advanced capabilities are easier to integrate, but you’ll have to pay for them. Harness highlights how its system uses input from various other technologies to drive automated decision-making within a CD pipeline with minimal custom code.
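The core idea behind Continuous Verification can be sketched as a comparison between a canary and the current baseline, using telemetry pulled from a monitoring tool. The following is a minimal Python illustration under stated assumptions, not how Harness or any other product implements it: the `ReleaseMetrics` fields, the `verify_canary` function, and all thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReleaseMetrics:
    """Hypothetical aggregated telemetry for one deployment group."""
    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        # Guard against an empty window so we never divide by zero.
        return self.errors / self.requests if self.requests else 0.0


def verify_canary(baseline: ReleaseMetrics, canary: ReleaseMetrics,
                  max_error_ratio: float = 1.5,
                  max_latency_ratio: float = 1.2,
                  min_requests: int = 500) -> str:
    """Return 'promote', 'rollback', or 'wait' by comparing canary to baseline.

    All thresholds are illustrative defaults, not values from any product.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough traffic yet to judge the canary
    # An absolute floor keeps a near-zero baseline error rate from making
    # a single canary error look like a regression.
    error_budget = max(baseline.error_rate * max_error_ratio, 0.001)
    if canary.error_rate > error_budget:
        return "rollback"
    if canary.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio:
        return "rollback"
    return "promote"


baseline = ReleaseMetrics(requests=10_000, errors=20, p95_latency_ms=180.0)
canary = ReleaseMetrics(requests=1_000, errors=15, p95_latency_ms=185.0)
print(verify_canary(baseline, canary))  # rollback
```

Commercial tools typically replace these fixed ratios with statistical or ML-based comparisons across many metrics, which is much of what you pay for.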

Similarly, Amazon has created its CodePipeline service, which also ties into many external systems, including monitoring tools, although these integrations need to be built manually.

Google has built a more advanced system, similar to Harness, using Spinnaker and Prometheus. It requires more manual configuration, however, because Prometheus is less advanced than the commercial APM tools that better monitor application performance and quality. You can check out the tutorial on Google Cloud Platform’s site.
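With Prometheus as the data source, much of that manual configuration means writing and maintaining the queries yourself. A minimal Python sketch of that work, assuming services export a counter named `http_requests_total` with `job`, `track`, and `code` labels (all hypothetical naming), and using Prometheus’ HTTP instant-query endpoint `/api/v1/query`:

```python
import json
import urllib.parse
import urllib.request


def error_rate_query(job: str, track: str, window: str = "5m") -> str:
    """Build a PromQL expression for the 5xx ratio of one deployment track.

    The metric name and the `track` label are hypothetical; adjust them to
    whatever your services actually export.
    """
    selector = f'job="{job}",track="{track}"'
    errors = f'sum(rate(http_requests_total{{{selector},code=~"5.."}}[{window}]))'
    total = f'sum(rate(http_requests_total{{{selector}}}[{window}]))'
    return f"{errors} / {total}"


def parse_scalar_result(payload: dict) -> float:
    """Extract the single value from a Prometheus instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        raise LookupError("query returned no series")
    # Each instant-vector sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def query_prometheus(base_url: str, promql: str, timeout: float = 10.0) -> float:
    """Run an instant query against a Prometheus server and return one value."""
    params = urllib.parse.urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_scalar_result(json.load(resp))
```

The returned ratio could then feed a promote-or-rollback decision in the pipeline. Treat this only as an illustration of the query plumbing, not of how Spinnaker itself is configured.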

Keptn

Another alternative is just emerging. Dynatrace, like Logz.io, has created an open source project, Keptn, which it uses internally to manage its services. Keptn is an event-driven control plane; one could connect it to monitoring tools and other systems in order to drive automation. Many monitoring tools have AI and ML capabilities, which can act as the intelligence in release automation. Keptn is new, but it is close to CNCF acceptance, which would allow a community to be built around this interesting technology.
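To make “event-driven control plane” more concrete, here is a tiny in-process Python sketch of the general pattern: handlers subscribe to event types and may emit follow-up events, so a deployment event can trigger an evaluation that in turn triggers promotion or rollback. This is not Keptn’s API; the event names, the `ControlPlane` class, and the 1% error-rate threshold are hypothetical, and a real control plane would route events between separate services over a message bus.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], List[Event]]


class ControlPlane:
    """Minimal event router: handlers subscribe to event types and may
    return follow-up events, which are dispatched in FIFO order."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event: Event) -> None:
        queue = deque([event])
        while queue:
            current = queue.popleft()
            self.log.append(str(current["type"]))
            for handler in self._handlers.get(str(current["type"]), []):
                queue.extend(handler(current))


def evaluate(event: Event) -> List[Event]:
    # Stand-in for asking a monitoring tool to judge the release (hypothetical threshold).
    passed = float(event["error_rate"]) < 0.01
    return [{"type": "evaluation.finished", "service": event["service"], "passed": passed}]


def remediate(event: Event) -> List[Event]:
    # Decide the next step from the evaluation result.
    next_type = "release.promoted" if event["passed"] else "rollback.triggered"
    return [{"type": next_type, "service": event["service"]}]


plane = ControlPlane()
plane.on("deployment.finished", evaluate)
plane.on("evaluation.finished", remediate)
plane.emit({"type": "deployment.finished", "service": "checkout", "error_rate": 0.03})
print(plane.log)  # ['deployment.finished', 'evaluation.finished', 'rollback.triggered']
```

The appeal of this design is that the evaluation step can be swapped for any monitoring tool’s AI or ML analysis without changing the rest of the flow.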

Keep on Going

There is still much to be done to standardize automation, CI, and CD. None of these systems, aside from Harness, apply sophisticated AI or ML in an integrated manner to detect and remediate issues using telemetry from observability systems. Keptn shows that the opposite approach, drawing that intelligence from connected monitoring tools, is also feasible. Of course, one could hook up additional tools to various solutions, but this requires a lot of customization and upkeep.

If you have feedback or comments, or would like to suggest additional data points, please message or tweet me @jkowall. Thanks for reading!