The State of Machine Learning in DevOps

By: Amir Kalron

January 18, 2018

DevOps methodologies are increasingly generating large and diverse data sets across the entire application lifecycle, from development to deployment to application performance management. Only a robust monitoring and analysis layer can truly harness this data for the ultimate DevOps goal of end-to-end automation.

The recent rise of machine learning — and related capabilities such as predictive analytics and artificial intelligence — has started to push organizations to explore the implementation of a new data analysis model that relies on mathematical algorithms.

Despite the promised benefit of helping teams optimize operations and gain more visibility into their data, adoption of machine learning into the DevOps toolbox is limited.

Let’s try and examine why.

How Machine Learning Can Help

Let’s start with understanding how machine learning can fit into and benefit the DevOps methodology.

There are two key interrelated benefits of implementing machine learning in DevOps: reducing the noise-to-signal ratio, and replacing the reactive mode with a proactive approach based on accurate predictions.

Because of the way systems have been monitored for decades now, and because a better approach has not been introduced yet, most teams today use the threshold approach in monitoring. Thresholds are defined based upon conventional wisdom, gut feelings, and habit.

If you have defined 50% CPU as the threshold for your EC2 instances, and this metric goes up to 70%, your auto-scaling group will provision more instances until utilization across the group drops back to 50%.
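To make the threshold approach concrete, here is a minimal sketch of a fixed-threshold check in Python. It is not the AWS Auto Scaling API. The `breaches` function and the CPU readings are hypothetical and exist only for illustration; the 50% value comes from the example above.

```python
CPU_THRESHOLD = 50.0  # percent; the fixed value from the example above


def breaches(samples, threshold=CPU_THRESHOLD):
    """Return the indices of samples that exceed the fixed threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


# Hypothetical per-minute CPU readings for a workload that normally runs
# close to 50%. Only the 70.0 reading is a genuine spike.
cpu_samples = [48.0, 52.0, 49.5, 51.0, 70.0, 50.5, 47.0]

alerts = breaches(cpu_samples)
print(alerts)  # [1, 3, 4, 5]
```

With these made-up readings the rule fires four times, although only one reading is a real spike. A workload that normally hovers near the threshold trips it constantly.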

This is of course a huge improvement compared with the previous method in the traditional datacenter environment, which was to constantly hook up new servers. Yet the truth of the matter is that the traditional threshold approach still results in a low signal-to-noise ratio and alert fatigue, ultimately causing DevOps engineers to chase fires that are not fires to start with.

The machine learning approach is grounded in mathematics, defining thresholds based upon what is statistically significant and logically sound. Machine learning uses various methodologies and models, such as linear and logistic regression, classification, and deep learning, to scan large sets of data, identify trends and correlations, and make predictions.
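As a rough illustration of a threshold derived from the data rather than from habit, the sketch below flags only readings that sit well outside the historical distribution. The mean-plus-three-standard-deviations rule is an assumption chosen for simplicity, not a method the tools discussed here are known to use, and the function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def statistical_threshold(history, k=3.0):
    """Derive an alert limit from history: mean plus k sample standard deviations.

    The choice of k=3.0 is an illustrative assumption, not a recommendation.
    """
    return mean(history) + k * stdev(history)


def anomalies(history, new_samples, k=3.0):
    """Return the new samples that exceed the statistically derived limit."""
    limit = statistical_threshold(history, k)
    return [value for value in new_samples if value > limit]


# Hypothetical CPU history for a workload that normally runs close to 50%.
history = [48.0, 52.0, 49.5, 51.0, 50.5, 47.0, 50.0, 52.0]
new_samples = [51.5, 53.0, 70.0]

print(round(statistical_threshold(history), 2))
print(anomalies(history, new_samples))  # [70.0]
```

Readings of 51.5 and 53.0 would trip a fixed 50% rule, but here they fall within normal variation and only the 70.0 spike is reported. A production system would also need to handle seasonality, trends, and non-normal distributions, which is where the models named above come in.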

The Current State of Machine Learning in DevOps

More and more next-generation tools in the DevOps stack support machine learning to some extent or other, but these tools are often black boxes operating as isolated data silos.

With DevOps teams still too busy putting out fires, and with a lack of DevOps practitioners who truly understand machine learning, predictive analytics and AI, the overall impact of these tools on comprehensive and data-driven automation is still limited.

Monitoring or deployment products that do feature machine learning typically do not provide visibility into how the underlying algorithms work, leaving data scientists skeptical as to whether or not their conclusions are correct. The black box approach also runs counter to normal machine learning practice, in which the analyst adjusts the algorithm iteratively until it becomes sufficiently accurate.

Furthermore, and perhaps more importantly, even when the vendor does provide visibility, adjusting the machine learning to the business’s needs requires knowledge that ordinary programmers lack.

DevOps engineers today are required to know how the infrastructure works, how to code, and how to utilize DBaaS in the cloud. Adding machine learning to this skill set is a huge if not impossible challenge since most DevOps engineers are simply not mathematicians.

Obstacle #1 – The Machine Learning Skills Gap

Machine learning is applied mathematics. To understand it, the developer needs a solid understanding of logarithms, calculus, infinite series and sequences, linear algebra, statistics, linear programming, regression analysis, and trigonometry. Many engineers studied these subjects in college, but have likely forgotten them because they do not use them every day.

Big Data programmers understand sets and how to run MapReduce functions. But having extracted and transformed data, they might not know where to go from there to draw conclusions and make projections.

To do so, they need to know which statistical function and algorithm they should use. Should they use logistic or linear regression, k-means clustering, support vector machines, naive Bayes, stochastic gradient descent, or a neural network? All of that lingo is indecipherable to most. Understanding it, though, is what the data scientist does. A recent article in The New York Times says that because they are in such short supply, data scientists can earn up to $300,000 per year.

Obstacle #2 – Organizational Challenges

As mentioned earlier, the number one obstacle to incorporating machine learning in a meaningful way into DevOps is that regular computer engineers do not understand applied mathematics and statistics — and machine learning is all about data science.

As a result, a DevOps machine learning project will often have to be divided among different skill sets and titles. Putting together a multidisciplinary team of Big Data engineers, Big Data programmers, and data scientists is in itself a significant organizational obstacle.

Even more daunting is managing such a complex project so that it meets its objectives on time and within budget. It’s not surprising, therefore, that incorporating machine learning deeply into DevOps is not an easy decision for management, because it requires hiring new people and asking the current team to learn and master new skills.

Looking Into the Future

Despite the challenges and obstacles, machine learning adoption is only going to grow as high salaries push more IT engineers into this space.

The main reason for future growth, though, is that algorithms will become easier to understand and implement due to the proliferation of frameworks. Google, Facebook, and other companies continue to develop and give away frameworks that allow data scientists and Big Data programmers to do more easily what only a PhD-level researcher could do before.

Also, the knowledge base is growing as developers commit what they have learned to new open-source frameworks and make improvements to existing ones. In addition, more people are becoming trained in these technologies as their use increases, meaning there are more use cases that people can refer to.