Here’s a short recap of the topics discussed and a link to relevant resources in case you missed it.
The Challenge of Log Analysis
In the webinar, we reviewed the various challenges that DevOps and IT operations teams face when they attempt to analyze logs in a modern environment.
To sum it up, log analysis remains a highly complicated and resource-consuming task for even the most skilled team, despite the advanced monitoring and logging platforms that are available today.
As explained in the webinar, this is because behind these tools sits a human being who needs to collect the data from the various data sources, connect the dots, and ultimately extract actionable insights in order to make informed and timely decisions.
Facing an ever-growing amount and diversity of log data, companies are struggling with this “Big Data” challenge.
Why Current Methodologies Are Not Enough
So, how do companies attempt to overcome this “finding a needle in a haystack” challenge?
Most solutions today use anomaly detection via mathematical analysis of the raw data. The problem with this method is that it is highly susceptible to false alerts and a low signal-to-noise ratio. In the webinar, we went through the three main reasons, with examples, why anomaly detection cannot work effectively in modern log analysis:
- First, not every anomaly is an error. Seasonal peaks are a good example: take weekends or Christmas Eve. Any monitoring tool would alert you to an anomaly as traffic picks up, but does this mean your app is failing? Not necessarily.
- Second, not every error manifests as an anomaly. Some errors are extremely gradual and build up over time, such as creeping resource consumption or memory usage. Chances are, you’ll learn that something has been going wrong only after your server crashes.
- And third, anomaly detection, as it is performed today, cannot accurately predict or identify errors in applications that are constructed on step functions and exception paths. Because of how most of our code is written today, it’s impossible for a mathematical formula to understand how it’s supposed to execute and when an event is really taking place.
Of course, anomaly detection can work great for very specific metrics — such as monitoring server response time when analyzing user behavior — but it is not a viable solution for monitoring modern applications and IT infrastructure.
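The first two reasons above can be illustrated with a small sketch. It assumes a very simple rolling z-score detector, which is a stand-in chosen for illustration and not the method of any particular monitoring product. The function name `zscore_alerts`, the window size, the threshold, and both data series are made up for this example.

```python
from statistics import mean, stdev

def zscore_alerts(series, window=24, threshold=3.0):
    """Return the indices whose value deviates from the mean of the
    preceding `window` points by more than `threshold` standard deviations."""
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts

# Scenario 1 (synthetic): a healthy app with steady traffic, then a holiday peak.
# Nothing is broken, yet the detector fires as soon as traffic picks up.
traffic = [100 + (i % 2) for i in range(48)] + [400, 400, 400]
traffic_alerts = zscore_alerts(traffic)

# Scenario 2 (synthetic): a memory leak that grows by 5 MB per sample.
# Usage nearly triples, but each point looks normal relative to its recent past.
memory_mb = [500 + 5 * i for i in range(200)]
memory_alerts = zscore_alerts(memory_mb)

print("traffic alerts at indices:", traffic_alerts)
print("memory alerts at indices:", memory_alerts)
```

In this toy setup the harmless peak raises alerts while the steady leak raises none, because a constant upward drift always sits at about the same modest distance from its own trailing window.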
What Is Cognitive Insights™ and How Does It Work?
In the webinar, we described Cognitive Insights™ as a new AI platform that combines machine learning and human interaction with data. It transforms manual DevOps and IT operations log analysis tasks into automated scientific processes that uncover otherwise overlooked events and enrich them with actionable data about context, severity, relevance, and next steps.
So, how does it work?
Since most log data is meaningless, and neither a human nor a machine can collect and analyze all of it alone, we’ve unified human and machine in one patent-pending technology called UMI™ (Unified Machine Intelligence), which is at the core of Cognitive Insights™:
UMI™ identifies human interactions with log data, including discussions on Stack Overflow or Server Fault, Google searches for relevant information, and issues that are posted on GitHub; correlates these interactions with your log data; and then displays them as events or insights in the Logz.io UI.
In other words, Cognitive Insights™ focuses on human interactions with the data instead of just the data itself. Imagine troubleshooting issues in production together with hundreds or even thousands of other engineers in the room. That’s precisely what this feature is all about: helping you to harness the rich and ever-growing amount of knowledge that is available on the Web to discover and resolve issues quickly and easily.
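To make the idea of correlating community knowledge with log data more concrete, here is a toy sketch. It is not how UMI™ works internally, which the webinar recap does not describe. It assumes a tiny hand-written knowledge base of error signatures; the `KnownIssue` class, the `find_insights` function, the entries, and the sample log lines are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class KnownIssue:
    pattern: str   # regex that identifies the error in a log line
    source: str    # where engineers discussed it (placeholder text, not a real link)
    severity: str
    advice: str

# Hypothetical entries standing in for knowledge gathered from public discussions.
KNOWN_ISSUES = [
    KnownIssue(r"java\.lang\.OutOfMemoryError", "forum discussion (placeholder)",
               "high", "Review heap settings and look for a memory leak."),
    KnownIssue(r"Too many open files", "issue tracker thread (placeholder)",
               "medium", "Check file descriptor limits and unclosed handles."),
]

def find_insights(log_lines):
    """Match each log line against the known signatures and return enriched events."""
    insights = []
    for line_no, line in enumerate(log_lines, start=1):
        for issue in KNOWN_ISSUES:
            if re.search(issue.pattern, line):
                insights.append({
                    "line": line_no,
                    "message": line,
                    "source": issue.source,
                    "severity": issue.severity,
                    "advice": issue.advice,
                })
    return insights

# Made-up log lines for illustration only.
sample_logs = [
    "INFO  request served in 12 ms",
    "ERROR java.lang.OutOfMemoryError: Java heap space",
    "INFO  cache refreshed",
    "WARN  accept failed: Too many open files",
]
insights = find_insights(sample_logs)
for event in insights:
    print(event["line"], event["severity"], event["advice"])
```

The point of the sketch is the shape of the output: most lines are ignored, and the few that match something engineers have already discussed come back with context, severity, and a suggested next step.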
To help understand Cognitive Insights™, the webinar includes a brief demonstration of the technology, the recording of which is available below, together with additional resources: