Server Log Analysis with the ELK Stack


Let’s start with a simple definition.

Server log analysis, also known as web server log analysis, is the process of collecting, parsing, and analyzing the log files generated by web servers. The goal is to extract insights about the requests being made to the server and any issues that might be occurring.

There are various use cases for analyzing server logs, ranging from security and compliance to SEO and BI, but most organizations ship access and error logs for monitoring and troubleshooting.

Server access logs

Access logs contain information on when requests to the server were made, where they came from, which pages were requested, the response codes returned, and more. Here is an example of an nginx server access log:

66.249.65.159 - - [06/Nov/2014:19:10:38 +0600] "GET 
/news/53f8d72920ba2744fe873ebc.html HTTP/1.1" 404 177 "-" "Mozilla/5.0 
(iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, 
like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; 
Googlebot/2.1; +https://www.google.com/bot.html)"
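
For reference, this single (wrapped) log line follows the "combined" log format, which nginx predefines as:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';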

Server error logs

Error logs, as their name implies, are records of errors encountered by the server while processing requests, and can also contain extra diagnostic data. Here is an example of an Apache error log:

[Wed Mar 29 11:18:33 2017] [error] [client 167.81.12.11] client denied 
by server configuration: /home/apache2/htdocs/demoapp

Understanding the challenge

As useful as they are, analyzing these log files used to require a considerable amount of time and resources.

In the good old days, these logs would be accessed and analyzed manually. Sysadmins or DevOps teams would securely access the hosting server and search the log files for specific strings.

While relatively structured, these log messages are still not easy for humans to read through, and today, with environments becoming larger and more distributed, the manual strategy simply does not scale. Accessing tens or hundreds of servers, manually tailing files, or scanning them for strings is far too time-consuming.

Centralized logging to the rescue

Centralized logging helps organizations tap into server logs efficiently. A centralized logging architecture adds a degree of automation to the process by aggregating all the logs from the different data sources in a central data store, applying parsing and processing so the logs are easier to read and analyze, and providing analysis and visualization tools.

The ELK Stack is the most popular solution, particularly in the open source world, for centralized logging. Elasticsearch is a full-text search and analysis engine where your data resides. Logstash aggregates your logs and processes them before sending them on to Elasticsearch for indexing. Kibana provides you with an easy way to query the data and build beautiful dashboards.

Analyzing server logs with the ELK Stack

Server logs usually reside under the /var/log directory (on Linux-based systems). To track these logs in real time you could, of course, run a simple tail -f command, but for more effective log analysis you will want to ship the logs into your centralized logging platform. In the case of the ELK Stack, there are a few steps involved in building the logging pipeline.
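
For example, on a Debian/Ubuntu host running Apache (log paths vary by distribution and web server), the manual approach looks something like this:

# Follow new entries as they are written to the access log
tail -f /var/log/apache2/access.log

# Or search the file for a specific string, e.g. a rough match for 404 responses
grep " 404 " /var/log/apache2/access.log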

Collecting and forwarding server logs

The best way to track your server log files and send them on into the ELK Stack is to use Filebeat. Filebeat is a lightweight log shipper belonging to the Beats family that can be easily configured to track specific log files on a host and send them down the pipeline for processing and indexing.

To install and configure Filebeat, you can refer to our Filebeat tutorial. Below is an example of what a Filebeat configuration for shipping Apache server logs looks like:

filebeat.prospectors:
- paths:
    - /var/log/apache2/access.log
  fields:
    type: apache_access
  fields_under_root: true
- paths:
    - /var/log/apache2/error.log
  fields:
    type: apache_error
  fields_under_root: true

output.logstash:
  hosts: ["localhost:5044"]

Each prospector contains the path to the server log and adds a type field identifying the log for easier analysis later in Kibana (fields_under_root places the field at the top level of the event). The output is defined as a local Logstash instance.
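
With the configuration saved (typically to /etc/filebeat/filebeat.yml), start Filebeat so it begins tracking the files. On a systemd-based host with a package installation, for example:

sudo systemctl start filebeat    # start shipping the configured logs
sudo systemctl enable filebeat   # keep Filebeat running across reboots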

Processing server logs

Configuring Logstash is a critical step in your server logging pipeline, since this is where you apply advanced parsing and data enhancements to your log data.

For example, if we want to add geographical information to our logs so we can build a nice map visualization showing where the requests to our server are coming from, we need to use the geoip filter plugin.

Using and configuring Logstash is a topic for an article of its own. Below is an example of a Logstash configuration file for processing Apache access logs:

input {
  beats {
    port => 5044
  }
}
 
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
  }
}
 
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}

The input section defines where we are pulling the data from; in this case, it's Filebeat. The filter section includes three filter plugins: grok for mapping the different fields in the message, date for defining the timestamp field, and geoip for enriching the clientip field with geographical data. The output section defines a local Elasticsearch instance as the destination for the data.
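
To try the pipeline out, save the configuration to a file (apache.conf, for example) and point Logstash at it. The paths below assume a standard package installation of Logstash; adjust them to match your setup:

# Verify the configuration syntax without starting the pipeline
sudo /usr/share/logstash/bin/logstash -f apache.conf --config.test_and_exit

# Run Logstash with the configuration
sudo /usr/share/logstash/bin/logstash -f apache.conf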

Analyzing server logs

Once a pipeline of server logs is up and running, you can begin to query and analyze them in Kibana.  

Using Lucene syntax, you can query the data indexed in Elasticsearch from within Kibana. For example, you could use a field-level search to look up all the requests in the Apache access logs that returned an error response (a status code of 400 or above):

type:apache_access AND response:[400 TO *]
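
Similarly, once the geoip filter has enriched the events, you could narrow the search down to requests coming from a specific country (geoip.country_name is one of the fields the Logstash geoip plugin adds by default; exact field names can vary with your Logstash version and GeoIP database):

type:apache_access AND geoip.country_name:"United States"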

You can read about other search methods in this Kibana tutorial.

Of course, one of the benefits of using the ELK Stack and centralized logging for server log analysis is the ability to visualize the data. As a simple example, you could create a map of the requests being made to your server using the geo enhancement applied in the previous step (the geoip filter adds a geoip.location field which, with the default Logstash index template, is mapped as a geo_point that Kibana map visualizations can use).

[Map visualization of request origins]

Or a bar chart depicting the most frequently served URLs:

[Bar chart of the most frequently served URLs]

The sky’s the limit – you can slice and dice the data any way you want. Once your visualizations are finished, combine them into a comprehensive dashboard that gives you a nice overview of server activity.

Summing it up

As with any software-related topic, nothing is simple. While the ELK Stack is a great tool for server log analysis, there are some challenges involved. Building complex pipelines with multiple inputs and advanced processing, as well as creating visualizations and dashboards, can take time. Be patient.

Having all your server logs stored in one location and being able to efficiently analyze them is one thing, but being able to proactively monitor them is another. Logz.io offers users a built-in alerting mechanism that will notify you when your server is misbehaving or processing abnormal request activity. Logz.io's ELK Apps also provides built-in dashboards and visualizations for the various web servers.

If you want to learn more about analyzing specific web server logs, I recommend taking a look at our Apache, IIS and nginx log analysis tutorials.
