Logstash Tutorial

A great use for the ELK Stack is the storage, visualization, and analysis of logs and other time-series data. Logstash is an integral part of the data workflow from the source to Elasticsearch and beyond. Not only does it allow you to pull data from a wide variety of sources, it also gives you the tools to filter, massage, and shape that data so that it's easier to work with. This Logstash tutorial gives you a crash course in getting started with Logstash.

How to Install Logstash

The only requirement for installing Logstash is Java 7 or higher. Everything else you need, including JRuby, the Ruby runtime that Logstash runs on, is included in the Logstash bundle. The easiest way to confirm whether you have the correct version of Java installed is to run the following in your CLI:

java -version

It should print out something like the following:

java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

The important part is:

java version "1.7.0_80"

As long as the number after the "1." is 7 or higher, you're good to go.

Once you’ve established that you have a supported Java version, you have two choices when it comes to installing Logstash: You can either download the Logstash bundle and use that, or you can install Logstash using your OS’s package manager. The package manager is the recommended route because it makes upgrading and patching Logstash so much easier.

The following steps are specific to Ubuntu and other Debian-based OSes. Check out Elastic's Package Repositories page for information on other OSes.

Firstly, you need to add Elastic’s signing key so that the downloaded package can be verified. This can be skipped if you’ve installed packages from Elastic before:

wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

The next step is to add the Logstash repository definition to your system. It’s best to keep the definitions in different files, making them easier to manage. In this case, we’re adding it to:

/etc/apt/sources.list.d/logstash-2.x.list

Here is the definition:

echo "deb http://packages.elastic.co/logstash/2.1/debian stable main" | sudo tee -a /etc/apt/sources.list.d/logstash-2.x.list

All that’s left to do is to update your repositories and install Logstash:

sudo apt-get update
sudo apt-get install logstash

Since we added the Logstash 2.1 repository definition, this installs Logstash 2.1 and gives us access to all of the updates for that version.
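
If you want a quick sanity check that the install worked, you can print the version and run a throwaway pipeline that echoes stdin back to stdout. On Debian-based systems the package lands under /opt/logstash; adjust the path if your layout differs:

# Print the installed Logstash version
/opt/logstash/bin/logstash --version

# Type a line and press Enter; Logstash should echo it back as an event.
# Press Ctrl-D to exit.
/opt/logstash/bin/logstash -e 'input { stdin { } } output { stdout { } }'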

How to Configure Logstash

Logstash Inputs

One of the things that makes Logstash great is its ability to source logs and events from a wide variety of sources. As of version 2.1, the Logstash documentation page lists 48 different inputs. That's 48 different technologies, locations, and services from which you can pull events and manipulate them. These include monitoring systems like collectd, data stores like Redis, services like Twitter, and various others such as File and RabbitMQ. By using these inputs, you can import data from multiple sources and manipulate it however you want, and eventually send it to other systems for storage or processing.

Inputs are the starting point of any configuration. If you do not define an input, Logstash will automatically create a stdin input. Since you can create multiple inputs, it’s important to type and tag them so that you can properly manipulate them in filters and outputs.
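
As a rough sketch of what that can look like, here are two inputs living side by side, each given its own type so that later filters and outputs can tell the events apart (the paths, port, and type names are placeholders rather than anything from this article):

input {
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx-access"
  }
  tcp {
    port => 5000
    type => "syslog"
    tags => ["network"]
  }
}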

Logstash Outputs

As with the inputs, Logstash comes with a number of outputs that enable you to push your events to various locations, services, and technologies. You can store events using outputs such as File, CSV, and S3, convert them into messages with RabbitMQ and SQS, or send them to various services like HipChat, PagerDuty, or IRC. The number of combinations of inputs and outputs in Logstash makes it a really versatile event transformer.

Logstash events can come from multiple sources, so as with filters, it’s important to do checks on whether or not an event should be processed by a particular output. If you define no output, Logstash will automatically create a stdout output.
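
Continuing the placeholder sketch from the inputs section, a pair of conditionals can route each type of event to its own destination:

output {
  if [type] == "nginx-access" {
    elasticsearch { }
  }
  if [type] == "syslog" {
    file {
      path => "/var/log/archive/syslog-%{+YYYY-MM-dd}.log"
    }
  }
}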

Logstash Filters

If Logstash were just a dumb pipe between a number of inputs and outputs, you could easily replace it with a service like IFTTT or Zapier. Luckily, it isn’t. It also comes with a number of very powerful filters with which you can manipulate, measure, and create events. It’s the power of these filters that makes Logstash a very versatile and valuable tool.

Logstash events can come from multiple sources, so as with outputs, it’s important to do checks on whether or not an event should be processed by a particular filter.
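
To make that concrete, here is a minimal sketch (the syslog type and patterns below are illustrative placeholders, not taken from this article) that parses and re-dates only the events that carry the syslog type:

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGLINE}" }
    }
    date {
      match => [ "timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}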


A Logstash Configuration Example

Logstash has a simple configuration DSL that enables you to specify inputs, outputs, and filters along with their specific options. Order matters, specifically around filters and outputs, as the configuration is basically converted into code and then executed. Keep this in mind when you're writing and debugging your configs.

Structure

Your configurations will generally have three sections: inputs, filters, and outputs. You can have multiple instances of each of these sections, which means that you can group related plugins together in a config file instead of grouping them by type. My Logstash configs are generally structured as follows:

#/etc/logstash/conf.d/
- apache_to_elasticsearch.conf
- haproxy_to_elasticsearch.conf
- syslog_to_s3.conf

You’ll see that I have a configuration file for each of the functions or integrations that I’d like Logstash to perform. Each of those files will contain the necessary inputs, filters, and outputs to perform that function. Let’s look at the apache_to_elasticsearch.conf file, as it’s typical of what you’d see in a Logstash config file:

input {
  file {
    path => "/var/log/apache/access.log"
    type => "apache-access"
  }
}

filter {
  if [type] == "apache-access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}

output {
  if [type] == "apache-access" {
    if "_grokparsefailure" in [tags] {
      null {}
    }

    elasticsearch { 
    }
  }
}

The input section tells Logstash to pull logs from the Apache access log and sets the type of those events to apache-access. Setting the type is important, as it will be used to selectively apply filters and outputs later on in the event's lifetime. It's also used to organize the events when they're eventually pushed to Elasticsearch.

In the filter section, we specifically apply a grok filter to events that have the apache-access type. This conditional ensures that only the apache-access events get filtered. If it is not there, Logstash will attempt to apply the grok filter to events from other inputs as well. This filter parses the log string and populates the event with the relevant information from the Apache logs.

Lastly, we see the output section. The first conditional ensures, once again, that we only operate on the apache-access events. The nested conditional sends any event that didn't match our grok pattern to the null output; since those lines didn't conform to the specified pattern, we assume they contain information we're not interested in and discard them. The else branch ensures that only events that were successfully parsed make it to the Elasticsearch output.

Each of the configuration files can contain these three sections. Logstash will combine all of your configuration files and treat them as one large config. Since you can have multiple inputs, it's recommended that you tag your events or assign types to them so that it's easy to identify them at a later stage. Also ensure that you wrap any filters and outputs that are specific to a category or type of event in a conditional, otherwise you might get some surprising results.
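
Before restarting Logstash after a change, it's worth asking it to validate the combined configuration. In the 2.x series the --configtest flag performs a syntax check without starting the pipeline (the path below assumes the Debian package layout):

# Validate every config file in the directory without starting the pipeline
/opt/logstash/bin/logstash --configtest -f /etc/logstash/conf.d/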


Working with Logstash Plugins

Since version 1.5, Logstash has relied on a plugin infrastructure to give it access to various inputs, filters, codecs, and outputs. Plugins are essentially Ruby gems and can be managed through Logstash’s plugin utility:

# List all the installed plugins
bin/plugin list
# List all the installed output plugins
bin/plugin list --group output

All the plugins that originally resided in the logstash-core codebase are installed by default on Logstash 1.5 and up. Plugins that were part of logstash-contrib or that come from outside the Logstash ecosystem have to be installed manually:

bin/plugin install logstash-input-cloudwatch

This adds the plugin's gem to Logstash's Gemfile and makes it available to you. Updating and removing a plugin is just as easy:

bin/plugin update logstash-input-cloudwatch

bin/plugin uninstall logstash-input-cloudwatch

Start Stashing!

The only thing left to do now is to get your hands dirty. This post guided you through installing Logstash, configuring it, and making sure that you have access to all the functionality that you need through the plugin ecosystem. Since Logstash is the first link in the ELK Stack's data pipeline, you should now have a solid grounding in how to use it for log and time-series data analysis.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack and can be used for log analysis, application monitoring, business intelligence, and more. Start your free trial today!

Jurgens tries to write good code for a living. He even succeeds at it sometimes. When he isn’t writing code, he’s wrangling data as a hobby. Sometimes the data wins, but we don’t talk about that. Ruby and Elasticsearch are his weapons of choice, but his ADD always allows for new interests. He’s also the community maintainer for a number of Logstash inputs.