The Growing Role of Machine Learning in Monitoring

When does a buzzword stop being a buzzword?

In the world of IT and software development, we are all too used to having terms and concepts thrown around left, right, and center. At some stage, though, widespread adoption of the technology and platforms behind these buzzwords turns them into best practices and realities in the field. (Did I hear someone say “Docker”?)

What about “machine learning”?

While the adoption of machine learning in DevOps is relatively slow compared to other industries, the potential is huge. To start understanding what DevOps has to gain from this rapidly developing field, one needs only to look at the world of monitoring and log analysis, where machine learning can be used to alleviate some of the main pain points experienced by DevOps teams: analyzing vast volumes of data and extracting actionable insights from it.

Based on the monitoring solutions on show at Monitorama this year, I can safely claim that in this space at least, the machine learning revolution is well underway.

Moogsoft

This company offers what it calls “AIOps” (Algorithmic IT Operations) — another term (perhaps a new buzzword?) for the application of machine learning algorithms in the world of DevOps and IT operations.

Using a combination of supervised and unsupervised machine learning algorithms, Moogsoft promises to reduce alert noise (that is, to improve the signal-to-noise ratio) and to correlate alerts across your toolsets in real time. Alert noise is a known pain point with traditional anomaly detection systems, and Moogsoft promises to deliver highly actionable “Situations” (clusters of related alerts).
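To make the idea of clustering related alerts more concrete, here is a minimal Python sketch. It is not Moogsoft's algorithm, which is not described here. It assumes a much simpler rule: alerts for the same service that arrive within a fixed time window of one another belong to the same cluster. The `Alert` class, the `group_alerts` function, and the 60-second window are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; real tools carry many more fields."""
    timestamp: float  # seconds since some reference point
    service: str
    message: str


def group_alerts(alerts, window_seconds=60.0):
    """Group alerts into clusters of related alerts.

    Assumption for this sketch: two alerts are "related" when they come
    from the same service and the gap between them is at most
    window_seconds. Production systems use far richer signals.
    """
    clusters = []
    latest_by_service = {}  # service name -> most recent cluster for it
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        cluster = latest_by_service.get(alert.service)
        if cluster and alert.timestamp - cluster[-1].timestamp <= window_seconds:
            # Close enough in time to the previous alert: same cluster.
            cluster.append(alert)
        else:
            # Too far apart (or first alert for this service): new cluster.
            cluster = [alert]
            clusters.append(cluster)
            latest_by_service[alert.service] = cluster
    return clusters
```

Even with a rule this crude, a burst of repeated alerts from one failing service collapses into a single cluster that an engineer can triage as one item.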


In addition, Moogsoft uses algorithms to capture remediation behavior within the Situation Room to recommend remediation steps when future incidents occur that the algorithms identify as similar.

Netuitive

Netuitive calls itself a “full-stack monitoring solution built on a machine learning platform, designed with DevOps teams and modern infrastructure application environments in mind.”

In addition to standard monitoring features (e.g., metrics, dashboards, and alerts), Netuitive offers an anomaly detection system based on advanced machine learning algorithms. This technology allows Netuitive to identify correlations between sets of metrics sent from the different data sources in your infrastructure and applications. For AWS users, Netuitive offers unique cost optimization reports comparing instance utilization with AWS spend.
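As a rough illustration of what identifying correlations between sets of metrics can mean, the sketch below computes the Pearson correlation coefficient for every pair of metric series and reports the pairs that move together. This is a textbook technique chosen for illustration, not a description of Netuitive's algorithms. The function names and the 0.9 threshold are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series has no defined correlation; treat it as none.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(metrics, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold.

    metrics maps a metric name to a list of samples taken at the same
    timestamps. The threshold is an arbitrary value for this sketch.
    """
    names = sorted(metrics)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(metrics[a], metrics[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs
```

Comparing every pair scales quadratically with the number of metrics, which is one reason commercial tools rely on more sophisticated approaches than this.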


Anodot

Anodot defines itself as a “real-time analytics and automated anomaly detection system that discovers outliers in vast amounts of time series data and turns them into valuable business insights.” In other words, machine learning-based anomaly detection for business intelligence.

Anodot applies its machine learning logic to metrics shipped from multiple sources in your environment and performs automated anomaly detection. The data is processed to define a normal range for each metric. Based on this range, anomalies are flagged and scored to indicate how significant each one actually is.
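The general pattern of defining a normal range and then scoring deviations from it can be sketched in a few lines of Python. This is not Anodot's method. It assumes the simplest possible baseline, the mean and standard deviation of recent history, and scores each new value by how many standard deviations it sits from the mean (a z-score). The function name and the threshold of 3.0 are hypothetical.

```python
import math
import statistics


def score_anomalies(history, new_points, threshold=3.0):
    """Score new values against a baseline built from history.

    Returns a list of (value, score, is_anomaly) tuples. The score is
    the absolute z-score, an assumption made for this sketch; real
    products also model seasonality, trends, and more.
    """
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)  # needs at least two data points
    results = []
    for value in new_points:
        if stdev > 0:
            score = abs(value - mean) / stdev
        else:
            # Perfectly flat history: any change at all is maximally unusual.
            score = 0.0 if value == mean else math.inf
        results.append((value, score, score >= threshold))
    return results
```

The score matters as much as the flag: ranking anomalies by score lets a team look at the largest deviations first rather than treating every flagged point equally.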


Event correlation, alerts, and dashboards complement this system.

Perspica

Perspica is another solution — and a new one at that — that has developed machine learning engines to tackle the challenge of monitoring modern infrastructures. Five concurrent engines, to be exact!

These machine learning engines, called “Fab Five Analytics,” analyze all of a company’s time-series data, cross-correlate behavior, and create a normalized baseline. If an anomaly is detected, Perspica’s monitoring tools will notify you. They also identify early indicators of issues that could develop into real problems.

This AI implementation also promises to mitigate “alert fatigue” by generating actionable alarms for only the metrics that actually represent problems.


Dexda

Dexda applies natural language processing, semantics, clustering, and topology algorithms to “distill millions of events down to a manageable set of Insights.” Similar to the other solutions described here, Dexda promises to reduce the stream of events and resulting alerts being triggered to allow for the more efficient use of resources and faster issue resolution.
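To give a feel for how a large stream of events can be distilled into a much smaller set, here is a deliberately simple Python sketch. It does not reproduce Dexda's natural language processing or topology algorithms. It assumes only that events differing in their numeric parts (durations, IP addresses, IDs) are the same kind of event, masks those parts, and counts what remains. The names `template_of` and `summarize` and the `<NUM>` placeholder are hypothetical.

```python
import re
from collections import Counter

# Matches integers and dotted numbers such as 30, 4.5, or 10.0.0.1.
_VARIABLE = re.compile(r"\b\d+(?:\.\d+)*\b")


def template_of(message):
    """Reduce a message to a template by masking its numeric parts."""
    return _VARIABLE.sub("<NUM>", message.lower())


def summarize(messages):
    """Return (template, count) pairs, most frequent template first."""
    return Counter(template_of(m) for m in messages).most_common()
```

For example, "Timeout after 30 ms on host 10.0.0.1" and "Timeout after 45 ms on host 10.0.0.2" reduce to the same template, so thousands of such lines would appear as one entry with a count.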


Dexda’s offering also includes a proactive element, using pattern sequencing to provide a list of early warnings. You can even tune its algorithms based on the needs of your specific business domain.

The Logz.io Approach

While most of the solutions mentioned above apply machine learning-driven anomaly detection to ingested metrics, Logz.io uses a somewhat different approach.

Based on the world’s most popular log analysis platform — the ELK Stack — Logz.io combines machine learning and crowdsourcing to sift through vast amounts of log data and identify events. Instead of focusing on a mathematical analysis of the data, we focus on how humans are actually interacting with the data.

The result of this approach is flagged log messages within Kibana — what we call Cognitive Insights — that signify that critical events may be taking place that you need to take a look at. Read more about Cognitive Insights here.


The Future

Walking the halls and listening to the talks at Monitorama this year, I heard one message repeated again and again: engineers today face two main monitoring challenges, information overload and the resulting “alert fatigue.”

While applying machine learning is not straightforward, to say the least, the potential is obvious and already apparent in some of the tools outlined here. Looking forward, there is little doubt that this is the direction in which the industry is headed.

So – is machine learning a buzzword or not?
