A Drupal Log Analysis Tutorial


While most developers and DevOps teams will admit that logging is important, many will still insist on avoiding the task if possible. Although log files contain a wealth of valuable information and should therefore be the first place to look when troubleshooting errors and events, they are often opened only as a last resort.

The reason for this is simple: Log files are not easy. They’re not easy to access, they’re not easy to collect, and they’re not easy to read. Often, they can’t even be found to start with. These problems have only intensified over the past few years, with applications being built on top of distributed infrastructures and containerized architectures.

Drupal applications add another layer of complexity to this: they are complex creatures to begin with, and they offer developers only basic logging features. Drupal developers can define the message type and the severity level (for example, “emergency” or “debug”) for logs and have the messages saved to the database. Drupal 8 also provides a logging service (which replaces Watchdog) for writing custom logs to the database.
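For reference, here is a minimal sketch of what writing a custom log entry looks like with the Drupal 8 logger service (the ‘my_module’ channel name and the messages are placeholders of my own, not something defined elsewhere in this tutorial):

// Write an error-level and a debug-level entry to the 'my_module' channel.
\Drupal::logger('my_module')->error('Could not load node @nid.', ['@nid' => 42]);
\Drupal::logger('my_module')->debug('Cache rebuilt in @ms ms.', ['@ms' => 87]);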

But for modern apps, querying the database for error messages and analyzing Drupal and PHP logs is not enough. There are web server and database logs to sift through as well, and in a normally sized production environment, this means a ton of data. A more solid solution is required, one that allows you to centralize all of the streams of log data being generated by the app, query this data to identify correlations and anomalies, and monitor the environment for events.

Enter the ELK Stack (Elasticsearch, Logstash and Kibana). The most popular and fastest-growing open source log analytics platform, ELK allows you to build a centralized logging system that can pull logs from as many sources as you define and then analyze and visualize the data.

Logstash is the stack’s log shipper, pulling logs from various data sources before forwarding them to a defined output. Elasticsearch saves and stores the data, and Kibana is the user interface through which logs can be queried, searched for and visualized.

To show an example of using ELK, this article will go through the steps of establishing a pipeline of logs from your Drupal application into the Logz.io ELK Stack. Logz.io provides the ELK Stack as an end-to-end service in the cloud, so there’s no need to install the stack yourself. You can, however, use any instance of the stack to perform the exact same procedure (see these instructions for installing ELK).

My environment

A few words on the environment I’m using for this tutorial. I’m using an AWS Ubuntu 14.04 instance and have installed Drupal 8 on top of the standard LAMP stack. For instructions on how to get this set up, I recommend reading this Cloud Academy post.

Note: You will need to install the PHP GD extension, which is a minimum requirement for Drupal 8.
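On an Ubuntu 14.04 LAMP setup such as this one, installing GD typically comes down to the following (this assumes the stock PHP 5 packages; adjust the package name to your PHP version):

$ sudo apt-get install php5-gd

$ sudo service apache2 restart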

Preparing the log files

My first step is to prepare the log files that we want to track and analyze. In the case of a standard LAMP stack, this usually means web server logs, PHP error logs (which include Drupal errors as well), and MySQL logs.

PHP errors, such as undefined variables and unknown functions, are logged by default into the Apache error log file (/var/log/apache2/error.log), which is convenient in some cases. But to make our analysis work easier, it’s better to separate the two log streams.

To do this, I’m going to access my ‘php.ini’ file and define a new path for PHP errors:

error_log=/var/log/php_errors.log

Next, I’m going to restart Apache and verify the change using phpinfo().
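On this Ubuntu 14.04 setup, that sequence looks roughly as follows (the php.ini path assumes the stock PHP 5 Apache module, and creating the log file with web-server ownership up front is my own addition to avoid permission issues):

$ sudo nano /etc/php5/apache2/php.ini    # set error_log = /var/log/php_errors.log

$ sudo touch /var/log/php_errors.log

$ sudo chown www-data:www-data /var/log/php_errors.log

$ sudo service apache2 restart

Then load a page that calls phpinfo() and confirm that error_log points at the new file.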

Installing Filebeat

While there are numerous ways to forward data into ELK, I’m going to ship my log files using Filebeat — which is a log shipper created by Elastic that tails defined log files and sends the traced data to Logstash or Elasticsearch.

To install Filebeat from the repository, I’m going to first download and install the Public Signing Key:

$ curl https://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -

Next, I’m going to save the repository definition to /etc/apt/sources.list.d/beats.list:

$ echo "deb https://packages.elastic.co/beats/apt stable main" |  sudo tee -a /etc/apt/sources.list.d/beats.list

Finally, I’m going to run apt-get update and install Filebeat:

$ sudo apt-get update && sudo apt-get install filebeat

Now, since Logz.io uses TLS as an added security layer, my next step before configuring the data pipeline is to download a certificate and move it to the correct location:

$ wget https://raw.githubusercontent.com/logzio/public-certificates/master/COMODORSADomainValidationSecureServerCA.crt

$ sudo mkdir -p /etc/pki/tls/certs

$ sudo cp COMODORSADomainValidationSecureServerCA.crt /etc/pki/tls/certs/

Configuring Filebeat

My next step is to configure Filebeat to track my log files and forward them to the Logz.io ELK Stack. To demonstrate this configuration, I’m going to show how to define tracking for my PHP and Apache log files. (The process is similar for MySQL logs as well.)

In the Filebeat configuration file at /etc/filebeat/filebeat.yml, I’m going to define a prospector for each log type. I’m also going to add some Logz.io-specific fields (codec and user token) to each prospector.

The configuration is as follows:

################### Filebeat Configuration Example ############################

############################# Filebeat #####################################

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # This is a text lines files harvesting definition
    -
      paths:
        - /var/log/php_errors.log
      fields:
        logzio_codec: plain
        token: tWMKrePSAcfaBSTPKLZeEXGCeiVMpuHb
      fields_under_root: true
      ignore_older: 24h
      document_type: php
    -
      paths:
        - /var/log/apache2/*.log
      fields:
        logzio_codec: plain
        token: tWMKrePSAcfaBSTPKLZeEXGCeiVMpuHb
      fields_under_root: true
      ignore_older: 24h
      document_type: apache
  registry_file: /var/lib/filebeat/registry

In the output section, I’m going to define the Logz.io Logstash host (listener.logz.io:5015) as the destination for our logs, together with the location of the certificate used to verify the server.

############################# Output ########################################

# Configure what outputs to use when sending the data collected by the beat.

output:
  logstash:
    # The Logstash hosts
    hosts: ["listener.logz.io:5015"]
    tls:
      # List of root certificates for HTTPS server verifications
      certificate_authorities: ['/etc/pki/tls/certs/COMODORSADomainValidationSecureServerCA.crt']

Now, if I were using the open source ELK stack, I could ship directly to Elasticsearch or use my own Logstash instance. The configuration for either of these outputs, in this case, is straightforward:

output:
  logstash:
    hosts: ["localhost:5044"]

  elasticsearch:
    hosts: ["localhost:9200"]

Save your Filebeat configuration.
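Before moving on, it’s also worth running a quick syntax check on the file. The -configtest flag shown here applies to the Filebeat 1.x releases current at the time of writing; newer versions provide a filebeat test config command instead, so treat this as a version-dependent step:

$ sudo filebeat -c /etc/filebeat/filebeat.yml -configtest

Any YAML or option errors will be reported here, before you start shipping data.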

Beautifying the PHP logs

Logstash, the component of the ELK Stack that is in charge of parsing the logs before forwarding them to Elasticsearch, can be configured to manipulate the data to make the logs more readable and easier to analyze (a.k.a., log “beautification” or “enhancement”).

In this case, I’m going to use the grok plugin to parse the PHP logs. If you’re using Logz.io, grokking is done by us. But if you’re using the open source ELK Stack, you can simply apply the following configuration within the filter section of your Logstash configuration file (/etc/logstash/conf.d/xxxx.conf):

if [type] == "php" {
  grok {
    match => [
      "message", "\[%{MONTHDAY:day}-%{MONTH:month}-%{YEAR:year} %{TIME:time} %{WORD:zone}\] PHP %{DATA:level}\:  %{GREEDYDATA:error}"
    ]
  }
  mutate {
    add_field => [ "timestamp", "%{year}-%{month}-%{day} %{time}" ]
    remove_field => [ "zone", "month", "day", "time", "year" ]
  }
  date {
    match => [ "timestamp", "yyyy-MMM-dd HH:mm:ss" ]
    remove_field => [ "timestamp" ]
  }
}
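To make the pattern concrete, here is the kind of line it is written to match, reconstructed from the sample event we will look at in Kibana below:

[05-Jul-2016 11:09:39 UTC] PHP Notice:  Undefined variable: kernel in /srv/bindings/710eee3fb41644e5b806b270be851601/code/index.php on line 19

The grok filter splits this into day, month, year, time, zone, level (“Notice”) and the error text, and the mutate and date filters then reassemble the date parts into the event’s @timestamp field.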

Verifying the pipeline

It’s time to make sure the log pipeline into ELK is working as expected.

First, make sure Filebeat is running:

$ cd /etc/init.d

$ ./filebeat status

And if not, enter:

$ sudo ./filebeat start

Next, open up Kibana (integrated into the Logz.io user interface). Apache logs and PHP errors will begin to show up in the main display area.

In this case, we’re getting an undefined variable error that I have simulated by editing the ‘index.php’ file. Note that since I have other logs coming into my system from other data sources, I’m using the following Kibana query to search for the two log types we have defined in Filebeat:

type:php OR type:apache

[Screenshot: searching for PHP or Apache Drupal logs in Kibana]

Analyzing the logs

To start making sense of the data being ingested and indexed by Elasticsearch, I’m going to select one of the messages in the main display area — this will give me an idea of what information is available.

Now, remember the different document types that we defined for the Filebeat prospectors? To make the list of log messages more understandable, select the ‘type’, ‘response’, and ‘level’ fields from the list of mapped fields on the left. The ‘type’ field comes from those prospector definitions, ‘level’ is extracted by the grok pattern we applied to the Logstash configuration, and ‘response’ is parsed from the Apache logs.

[Screenshot: Drupal log messages in Kibana]

Open one of the messages and view the information that has been shipped into the system:

{
   "_index": "logz-dkdhmyttiiymjdammbltqliwlylpzwqb-160705_v1",
   "_type": "php",
   "_id": "AVW6v83dflTeqWTS7YdZ",
   "_score": null,
   "_source": {
      "level": "Notice",
      "@metadata": {
         "beat": "filebeat",
         "type": "php"
      },
      "source": "/var/log/php_errors.log",
      "message": "Undefined variable: kernel in /srv/bindings/710eee3fb41644e5b806b270be851601/code/index.php on line 19",
      "type": "php",
      "tags": [
         "beats-5015"
      ],
      "@timestamp": "2016-07-05T11:09:39.000Z",
      "zone": "UTC",
      "beat": {
         "hostname": "ip-172-31-37-159",
         "name": "ip-172-31-37-159"
      },
      "logzio_code": "plain"
   },
   "fields": {
      "@timestamp": [
         1467716979000
      ]
   },
   "highlight": {
      "type": [
         "@kibana-highlighted-field@php@/kibana-highlighted-field@"
      ]
   },
   "sort": [
      1467716979000
   ]
}

Visualizing the logs

One of the advantages of using the ELK Stack is its ability to create visualizations on top of the data stored in Elasticsearch. This allows you to create monitoring dashboards that can be used to efficiently keep tabs on your environment.

As an example, I’m going to create a line chart that shows the different PHP and Drupal errors being logged over time.

Selecting the Visualize tab in Kibana, I’m going to pick the line chart visualization type from the selection of available visualizations. Then, I’m going to choose to base the visualization on a new search, using this query to show only PHP and Drupal events: ‘type:php’.

All that’s left now is to configure the visualization. Easier said than done, right? The truth is that creating visualizations in Kibana can be complicated at times, and it usually takes some trial and error before you fine-tune one to get the best results.

We’re going to keep it simple: a count aggregation for the Y-axis and a date histogram for the X-axis, with the lines split by the ‘level’ field.

The configuration for our line chart visualization looks as follows:

[Screenshot: line chart visualization configuration]

Hit the green Play button to see a preview of the visualization:

[Screenshot: preview of the Drupal log analysis line chart]

A common visualization for web application environments is a map of web server requests. This gives you a general picture of where requests are coming from (and in this case, from where yours truly is writing this post).

Selecting the TileMap visualization this time, I’m going to change my Kibana query to:

type:apache

Then, the configuration is simple:

[Screenshot: TileMap visualization configuration]

Of course, these are merely basic demonstrations of how to visualize your log data in Kibana and how ELK can be used to analyze and monitor Drupal applications. The sky’s the limit. You can build much more complex visualizations and even create your own custom Kibana visualization type if you like.

Once you have a series of visualizations for monitoring your Drupal app, you can collect them into a dashboard that gives you a general overview of your environment.
