Logs contain extremely valuable information. They can tell you what happened and when. The challenging question is why. In log analysis, the problem is that no log contains a message that states the cause outright, such as: “Your website crashed due to an awfully written database query.”
That’s where centralized logging and log correlation come into the picture. Being able to collect logs from all the various layers in your infrastructure and identify correlations between the different data sources is one of the reasons that the ELK Stack (Elasticsearch, Logstash and Kibana) is so popular. The stack allows you to ingest and store logs from multiple sources easily and then build visualizations and dashboards on top of the data.
If you’re using the Logz.io AI-powered ELK Stack for centralized log analysis and Datadog for metric monitoring, you can now use both for log correlation. This post will describe how to use the new integration between the two platforms. Specifically, we will be creating an alert in Logz.io for a specific Elasticsearch query, ingesting this alert using the Datadog API, and then building a comprehensive dashboard.
Retrieving the Datadog API Key
This first step is easy — retrieving a Datadog API key to ingest Logz.io alerts. This can be done via the Integrations → APIs tab:
Make note of the API key. We’ll need it when we configure our Datadog endpoint in Logz.io (to keep things organized, I recommend creating a new key called “Logz.io”).
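Before pasting the key into Logz.io, you may want to confirm that Datadog accepts it. The following is a minimal sketch that calls Datadog's key validation endpoint using only the Python standard library. It assumes your account lives on the default `api.datadoghq.com` site (other Datadog sites use a different hostname), and the function names are made up for this example.

```python
import json
import urllib.error
import urllib.request

# Assumption: the default Datadog site. Other sites use a different hostname.
DATADOG_VALIDATE_URL = "https://api.datadoghq.com/api/v1/validate"


def build_validate_request(api_key: str) -> urllib.request.Request:
    """Build a GET request that asks Datadog whether the API key is valid."""
    return urllib.request.Request(
        DATADOG_VALIDATE_URL,
        headers={"DD-API-KEY": api_key},
        method="GET",
    )


def is_key_valid(api_key: str) -> bool:
    """Return True if Datadog reports the key as valid, False if it is rejected."""
    try:
        with urllib.request.urlopen(build_validate_request(api_key), timeout=10) as resp:
            return bool(json.load(resp).get("valid", False))
    except urllib.error.HTTPError:
        # Datadog answers with an HTTP error status when the key is not accepted.
        return False
```

Calling `is_key_valid("<your key>")` performs the network request. `build_validate_request` is kept separate so that you can inspect the request without sending it.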
Creating an Alert
Next up, let’s create a new alert in Logz.io and a new endpoint with which to send the alerts to Datadog.
In this case, I will be analyzing Apache and application logs. As a result of some forensics work, I’ve nailed down a specific log message that I’d like to monitor and be alerted on. It is a log that reports an invalid webapp transaction:
Clicking Create Alert starts the process of alert creation and displays a three-step wizard. You’ll see that the filter that we used when querying Elasticsearch (return_message: “Invalid Transaction”) is already loaded in the “Query” field. You can modify this field any way you like — don’t worry about messing the query up because there is a verification process in place that will validate it.
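For readers who want to see what this filter looks like outside the wizard, here is a rough sketch of an equivalent Elasticsearch request body built in Python. It wraps the same Lucene expression in a `query_string` query and adds a time range filter. The `@timestamp` field name and the `build_alert_query` helper are assumptions for illustration, not something Logz.io exposes.

```python
import json


def build_alert_query(lucene_query: str, lookback_minutes: int) -> dict:
    """Build an Elasticsearch query body for a Lucene expression over a recent time range."""
    return {
        "query": {
            "bool": {
                # The same expression that appears in the alert's "Query" field.
                "must": [{"query_string": {"query": lucene_query}}],
                # Assumption: documents carry their event time in "@timestamp".
                "filter": [
                    {"range": {"@timestamp": {"gte": f"now-{lookback_minutes}m"}}}
                ],
            }
        }
    }


body = build_alert_query('return_message:"Invalid Transaction"', 60)
print(json.dumps(body, indent=2))
```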
We’re going to configure the trigger conditions to fire an alert if the queried log message is logged more than once within one hour (the Logz.io alerting engine will search for a match every sixty seconds):
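The trigger condition itself is simple to reason about: count the matches in the last hour and compare the count against a threshold. The sketch below illustrates that logic only. It is not the Logz.io alerting engine, and `should_trigger` is a hypothetical name.

```python
from datetime import datetime, timedelta


def should_trigger(match_times, now, window=timedelta(hours=1), threshold=1):
    """Return True if more than `threshold` matches fall inside the trailing window."""
    cutoff = now - window
    recent = [t for t in match_times if cutoff < t <= now]
    return len(recent) > threshold


now = datetime(2024, 1, 1, 12, 0, 0)
matches = [now - timedelta(minutes=50), now - timedelta(minutes=5)]
print(should_trigger(matches, now))  # two matches within the hour
```

An engine that evaluates every sixty seconds would run a check like this once per minute with the current time as `now`.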
After entering a name and description for the alert in the second step of the wizard, it’s now time to define how we want to receive the new alert.
You can, of course, get notified via e-mail, but Logz.io also allows you to integrate with various messaging and alerting apps by creating custom endpoints.
Creating a custom endpoint is pretty straightforward. In the “Notifications” endpoints field, open the drop-down menu and select “Custom endpoint.” In the “Create Endpoint” dialog, configure the endpoint by entering a webhook URL (which you will need to retrieve from the third-party application) and selecting an HTTP method to use (such as GET, POST, or PUT).
In our case, though, we’re going to use the new built-in support for Datadog. To do this, just select “Datadog” from the “Type” drop-down menu, name the endpoint, enter a description, and then paste the API key that you retrieved from Datadog:
Save the endpoint, and create the alert. It is now added to your defined alerts, and from this point onwards, each time it triggers you should receive it as an event in Datadog.
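To get a feel for what arrives on the Datadog side, it can help to post a test event yourself through Datadog's Events API. The sketch below builds such a request with the standard library. The title, text, and tags are invented for illustration and are not the exact payload that Logz.io sends, and the hostname again assumes the default Datadog site.

```python
import json
import urllib.request

# Assumption: the default Datadog site. Other sites use a different hostname.
DATADOG_EVENTS_URL = "https://api.datadoghq.com/api/v1/events"


def build_event_request(api_key, title, text, tags):
    """Build a POST request that creates an event in the Datadog event stream."""
    payload = {
        "title": title,
        "text": text,
        "tags": list(tags),
        "alert_type": "warning",
    }
    return urllib.request.Request(
        DATADOG_EVENTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"DD-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_event(api_key, title, text, tags):
    """Send the event and return Datadog's parsed JSON response."""
    request = build_event_request(api_key, title, text, tags)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```

For example, `send_event(key, "Invalid Transaction alert (test)", "Manual test event", ["source:test"])` should produce an entry that you can then find in the “Events” tab.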
Creating the Dashboard
When you open the “Events” tab in Datadog, you should be able to see the Logz.io alert. Note that the alert displays a maximum of five of the events that took place. In our example, five were displayed out of a total of twelve events that actually occurred:
The event gives you the parsed log message in JSON format, as it was indexed in Elasticsearch.
Now, to get a more holistic view of the environment and be able to identify whether there are any correlations in the data, our next natural step is to add this event into a comprehensive monitoring dashboard.
In Datadog, this is done from the “Dashboards” tab — in which there are a number of default dashboards. You can, of course, build your own customized dashboard, but for the purpose of this demonstration, we’re going to open the host dashboard for our server:
This dashboard contains a rich variety of metrics on resource utilization — memory usage, load averages, disk usage, and so forth.
To add the Logz.io events, simply type “logz” in the search box at the top of the page and hit enter. A new window containing the alert that we defined in Logz.io will be added to the dashboard:
Now, to identify whether there are any correlations between the events and the metrics collected from our server, all you have to do is hover over the columns in the graphs that represent incoming events from Logz.io:
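Hovering gives you a visual check. If you export the data, the same idea can be approximated in a few lines: compare a metric's average near the event timestamps with its average elsewhere. The sketch below assumes that you already have a metric series as `(epoch_seconds, value)` pairs and a list of event timestamps. Both inputs and the function name are hypothetical.

```python
from statistics import mean


def metric_around_events(series, event_times, window_s=300):
    """Return (mean near events, mean elsewhere) for a metric series.

    series: list of (epoch_seconds, value) pairs.
    event_times: list of epoch_seconds at which alert events occurred.
    A point counts as "near" if it lies within window_s seconds of any event.
    """
    def is_near(t):
        return any(abs(t - e) <= window_s for e in event_times)

    near = [v for t, v in series if is_near(t)]
    far = [v for t, v in series if not is_near(t)]
    return (mean(near) if near else None, mean(far) if far else None)


# Made-up load-average samples with one event at t=630.
samples = [(0, 1.0), (60, 1.0), (600, 5.0), (660, 7.0), (2000, 1.0)]
print(metric_around_events(samples, [630], window_s=60))
```

A large gap between the two averages suggests that the metric moves when the alert fires, which is a cue to investigate further and not proof of cause.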
Summing up, the combination of Datadog’s monitoring capabilities and Logz.io’s ELK-based alerting mechanism can be a powerful toolset to have on your side, both for after-the-fact forensics and predictive analytics.