Apache Log Analysis with Logz.io

Due to its ease of use, open source nature, and inherent flexibility, Apache is the most popular web server today. Apache log analysis, however, is nowhere near as popular as the web server itself — despite being very important.

In production environments, huge numbers of Apache logs are generated every second, making it a challenge for even the most experienced DevOps teams to track and analyze all of this data. That’s where the ELK Stack (Elasticsearch, Logstash, and Kibana) comes in.

The world’s most popular log analysis platform, ELK provides the tools for easily ingesting and monitoring Apache logs — a super-powerful and fast indexing engine, a flexible log shipper and parser, and a rich interface for visualization and querying.

This guide will show you how to ingest Apache logs into the Logz.io ELK Stack using Filebeat and then analyze and visualize that data. Note: You can use any open source ELK installation to follow almost all of the steps provided here.

Installing Apache

If you’ve already got Apache up and running, great! You can skip to the next step.

If you’re not sure (yes, this happens!), use the following command to see a list of all your Apache packages:

dpkg --get-selections | grep apache
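
If Apache is installed, the command prints the installed packages. On Ubuntu 14.04 the output typically looks something like this (exact package names vary by version):

apache2                    install
apache2-bin                install
apache2-data               install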

If Apache is not installed, enter the following commands:

$ sudo apt-get update
$ sudo apt-get install apache2

This may take a few seconds as Apache and its required packages are downloaded and installed. Once apt-get exits, Apache is ready to use.

By default, Apache listens on port 80, so to test if it’s installed correctly, simply point your browser to: http://localhost:80.
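
If you’d rather verify from the terminal, a quick curl against the same port should return a 200 response. (The Server header below assumes Apache 2.4.7, which is what Ubuntu 14.04 ships; yours may differ.)

$ curl -I http://localhost:80

HTTP/1.1 200 OK
Server: Apache/2.4.7 (Ubuntu)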

Installing Filebeat

I will assume that you are running Ubuntu 14.04 and are going to install Filebeat from the repository. If you’re using a different OS, additional installation instructions are available here.

First, download and install the Public Signing Key:

$ curl https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
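
To confirm the key was added, you can list the trusted keys and look for the Elasticsearch signing key (the exact uid string depends on the key version):

$ sudo apt-key list | grep -i elastic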

Next, save the repository definition to /etc/apt/sources.list.d/beats.list:

$ echo "deb https://packages.elastic.co/beats/apt stable main" |  sudo tee -a /etc/apt/sources.list.d/beats.list

Now, run apt-get update and install Filebeat:

$ sudo apt-get update

$ sudo apt-get install filebeat
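
To confirm the package installed correctly, ask dpkg for its version (the 1.x line was current at the time of writing):

$ dpkg -s filebeat | grep Version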

Logz.io uses TLS as an added security layer, so the next step before configuring the data pipeline is to download a certificate and move it to the correct location (you can skip this step if you’re using the open source ELK):

$ wget https://raw.githubusercontent.com/cloudflare/cfssl_trust/master/intermediate_ca/COMODORSADomainValidationSecureServerCA.crt

$ sudo mkdir -p /etc/pki/tls/certs

$ sudo cp COMODORSADomainValidationSecureServerCA.crt /etc/pki/tls/certs/
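
To make sure the certificate downloaded intact (and is not, say, an HTML error page), you can have openssl print its subject and validity dates:

$ openssl x509 -in /etc/pki/tls/certs/COMODORSADomainValidationSecureServerCA.crt -noout -subject -dates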

Configuring the pipeline

The next step is to configure Filebeat to track your Apache log files and forward them to the ELK Stack.

In the Filebeat configuration file at /etc/filebeat/filebeat.yml, you will define a prospector for each type of log and add some Logz.io-specific fields (codec and user token) to each prospector. If you wanted to track other log files, you would need to add a prospector definition for them as well.

So, first:

$ sudo vim /etc/filebeat/filebeat.yml 

Then, the configuration is as follows:

################### Filebeat Configuration Example ############################

############################# Filebeat #####################################

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    # This is a text lines files harvesting definition
    -
      paths:
        - /var/log/apache2/*.log
      # Logz.io-specific fields: the codec and your account token
      fields:
        logzio_codec: plain
        token: tWMKrePSAcfaBSTPKLZeEXGCeiVMpuHb
      # Promote the custom fields to top-level fields in each event
      fields_under_root: true
      # Skip files not modified in the last 24 hours
      ignore_older: 24h
      document_type: apache
  # File used to persist the read state of each harvested log
  registry_file: /var/lib/filebeat/registry

In the Output section, define the Logz.io Logstash host (listener.logz.io:5015) as the output destination for your logs, along with the location of the certificate used for authentication:

############################# Output ########################################

# Configure what outputs to use when sending the data collected by the beat.

output:
  logstash:
    # The Logstash hosts
    hosts: ["listener.logz.io:5015"]
    tls:
      # List of root certificates for HTTPS server verifications
      certificate_authorities: ['/etc/pki/tls/certs/COMODORSADomainValidationSecureServerCA.crt']

Now, if you were using the open source ELK Stack, you could ship directly to Elasticsearch or use your own Logstash instance. The configuration for either of these outputs is straightforward:

output:
  # Either ship to your own Logstash instance...
  logstash:
    hosts: ["localhost:5044"]

  # ...or directly to Elasticsearch
  elasticsearch:
    hosts: ["localhost:9200"]

Save your Filebeat configuration.
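
Before starting Filebeat, it’s worth validating the file, since YAML is sensitive to indentation. Filebeat 1.x ships with a -configtest flag that parses the configuration and exits without shipping anything:

$ sudo filebeat -c /etc/filebeat/filebeat.yml -configtest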

Verifying the pipeline

That’s it. You’ve successfully installed Filebeat and configured it to ship logs to ELK! To verify the pipeline is working as expected, make sure that Filebeat is running:

$ cd /etc/init.d
$ ./filebeat status

If not, enter:

$ sudo ./filebeat start
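
If Filebeat starts but no events show up downstream, Filebeat’s own log is the first place to look; the Debian package typically writes it under /var/log/filebeat:

$ sudo tail -f /var/log/filebeat/filebeat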

Wait a minute or two, and then open the Discover tab in your Kibana dashboard. You should see your Apache logs in the main log messages area. If you’re already shipping other types of logs, it’s best to query the Apache logs using:

type:apache

[Screenshot: querying Apache logs in Kibana]

To make things a bit more interesting and play around with more complex data, download some sample access logs.

If you’re using Logz.io, use the following cURL command. Be sure to replace the placeholders in the command with your info — the full path to the file and a Logz.io token (which can be found in the Logz.io user settings):

curl -T <Full path to file> http://listener.logz.io:8021/file_upload/<Token>/apache_access

If you’re using the open source ELK, you can simply copy the contents of the downloaded file into your Apache access log file:

$ wget https://logz.io/sample-data
$ sudo -i
$ cat /home/ubuntu/sample-data >> /var/log/apache2/access.log
$ exit

[Screenshot: Apache log analysis in Kibana]

Analyzing and visualizing Apache logs

There are various ways to query Elasticsearch for your Apache logs.

One way is to enter a field-level search on the server response code. For example, you can find every Apache log with an error response (HTTP 400 and above) using this search query:

type:apache AND response:[400 TO *]
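
A few variations on the same idea, assuming your parsed access logs use the standard Apache field names (response, request, bytes):

type:apache AND response:[500 TO 599]

type:apache AND request:"wp-login.php"

type:apache AND bytes:>10000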

You can use Kibana to search for specific data strings. You can search for specific fields, use logical statements, or perform proximity searches — Kibana’s search options are varied and are covered more extensively in our Kibana tutorial.

One of the reasons that the ELK Stack is great for analyzing Apache logs is the ability to visualize the data, and Kibana allows you to create visualizations from your search results — meaning that the specific data in which you’re interested can be reflected in easy-to-use, easy-to-create, and shareable graphical dashboards.

To create a new visualization from a custom search, first save a search by clicking the “Save Search” icon in the top-right corner of the Kibana “Discover” tab.

Once saved, select the Visualize tab:

[Screenshot: the Kibana Visualize tab]

You have a variety of visualization types from which to select, including pie charts, line charts, and gauge graphs.

You then need to select a data source to use for the visualization. You can choose a new or saved search to serve as the data source. Go for the “From a saved search” option and select the search you saved just a minute ago.

Please note that the search you selected is now bound to this specific visualization, so the visualization will update automatically when you make changes to this search. (Though you can unlink the two, if you like.)

You can now use the visualization editor to customize your dashboard — more information on this will be published soon — and save the visualization. If you wish, you can also add it to your Kibana dashboard or even share it by embedding it in HTML or by sharing a public link.

ELK Apps

Logz.io provides its users with a free library of pre-made Kibana dashboards, visualizations, and alerts called ELK Apps. These apps have been fine-tuned by Logz.io to suit specific types of log data.

For Apache logs, there are plenty of ELK Apps available, including an “Apache Average Bytes” app that monitors the average number of bytes sent from your Apache web server, as well as the extremely useful and popular “Apache Access” app that shows a map of your users, response times and codes, and more.

Installing these visualizations is easy. Simply select the ELK Apps tab and search for “Apache”:

[Screenshot: Apache ELK Apps]

To use a specific visualization, simply click the Install button and then the Open button.

The ELK App will then load in the visualization editor, where you can fine-tune it to suit your needs and then load it in the Dashboard tab.

What next?

Once you’ve set up your Kibana dashboard for monitoring and analyzing Apache logs, you can set up alerts to notify you (via e-mail, Slack, or other channels) whenever something in your environment deviates from how Apache and the apps it serves are meant to perform. Logz.io’s alerting feature allows you to do just that, and you can learn how to create alerts in this video:

[Video: Creating alerts in Logz.io (YouTube ID: PjjVzrJrjD0)]

