Logstash Tutorial

Logstash is the “L” in the ELK Stack, the world’s most popular log analysis platform. It is responsible for aggregating data from different sources, processing it, and sending it down the pipeline, usually to be indexed directly in Elasticsearch.

The role Logstash plays in the stack is critical: it allows users to filter, massage, and shape the data so that it’s easier to work with. This tutorial gives you a crash course in getting started with Logstash, with instructions for installing and configuring it.

Installing Logstash

Depending on your operating system and your environment, there are various ways of installing Logstash. We will be installing Logstash on an Ubuntu 16.04 machine running on AWS EC2 using apt. Check out other installation options here.

Before you install Logstash, make sure you have Java 8 installed.

To install Java 8, use:
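On Ubuntu 16.04, one way to do this is with apt. The sketch below assumes the `openjdk-8-jre-headless` package from the standard Ubuntu repositories; any Java 8 runtime should work just as well.

```shell
# Refresh the package index, then install a Java 8 runtime.
# The package name is an assumption for Ubuntu 16.04.
sudo apt-get update
sudo apt-get install -y openjdk-8-jre-headless

# Confirm which Java version is now on the PATH.
java -version
```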

You can now begin the installation process for Logstash.

First, you need to add Elastic’s signing key so that the downloaded package can be verified (skip this step if you’ve already installed packages from Elastic):
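A minimal sketch of this step, assuming `wget` is installed and the key is still published at the URL Elastic has used for its package signing key:

```shell
# Download Elastic's public signing key and register it with apt.
# apt-key is available on Ubuntu 16.04 but deprecated on newer releases.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
```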

The next step is to add the repository definition to your system:
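The sketch below assumes the 6.x package line. The major version in the URL and in the file name is an assumption, so substitute the release you intend to run.

```shell
# apt needs HTTPS transport support to read Elastic's repository.
sudo apt-get install -y apt-transport-https

# Register the repository (6.x is an assumed version; adjust as needed).
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" \
  | sudo tee /etc/apt/sources.list.d/elastic-6.x.list
```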

All that’s left to do is to update your repositories and install Logstash:
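Assuming the signing key and the repository definition are already in place, these two steps can be run as:

```shell
# Re-read the package lists so apt sees the Elastic repository,
# then install the logstash package.
sudo apt-get update
sudo apt-get install -y logstash
```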

Configuring Logstash

Logstash configuration is one of the biggest obstacles users face when working with Logstash. While improvements have been made recently to managing and configuring pipelines, this can still be a challenge for beginners.

We’ll start by reviewing the three main configuration sections in a Logstash configuration file, each responsible for different functions and using different Logstash plugins.

Logstash Inputs

One of the things that makes Logstash so powerful is its ability to aggregate logs and events from various sources. Using more than 50 input plugins for different platforms, databases, and applications, Logstash can be configured to collect and process data from these sources and send it to other systems for storage and analysis.

The most common inputs are file, beats, syslog, http, tcp, udp, and stdin, but you can ingest data from plenty of other sources.

Inputs are the starting point of any configuration. If you do not define an input, Logstash will automatically create a stdin input. Since you can create multiple inputs, it’s important to type and tag them so that you can properly manipulate them in filters and outputs.
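As an illustration of typing and tagging, the sketch below writes an input section with two sources to `inputs.conf` in the current directory. The file name, log path, port, type names, and tags are all hypothetical; `type` and `tags` are options that input plugins share.

```shell
# Write a sample input section with two inputs, each typed and tagged
# so that filters and outputs can tell the events apart later.
cat > inputs.conf <<'EOF'
input {
  file {
    path => "/var/log/apache2/access.log"   # assumed log location
    type => "apache_access"                 # hypothetical type name
    tags => ["web"]                         # hypothetical tag
  }
  tcp {
    port => 5000                            # hypothetical port
    type => "app_events"                    # hypothetical type name
    tags => ["app"]                         # hypothetical tag
  }
}
EOF
```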

Logstash Filters

If Logstash were just a simple pipe between a number of inputs and outputs, you could easily replace it with a service like IFTTT or Zapier. Luckily for us, it isn’t. Logstash supports a number of extremely powerful filter plugins that enable you to manipulate, measure, and create events. It’s the power of these filters that makes Logstash a very versatile and valuable tool.

Logstash Outputs

As with the inputs, Logstash supports a number of output plugins that enable you to push your data to various locations, services, and technologies. You can store events using outputs such as File, CSV, and S3, convert them into messages with RabbitMQ and SQS, or send them to various services like HipChat, PagerDuty, or IRC. The number of combinations of inputs and outputs in Logstash makes it a really versatile event transformer.

Logstash events can come from multiple sources, so it’s important to check whether or not an event should be processed by a particular output. If you do not define an output, Logstash will automatically create a stdout output.
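One way to perform that check is a conditional inside the output section. The sketch below writes an example to `outputs.conf` in the current directory; the file name, the `apache_access` type, and the Elasticsearch address are assumptions made for illustration.

```shell
# Write a sample output section that routes events by type.
cat > outputs.conf <<'EOF'
output {
  if [type] == "apache_access" {
    # Only events with this (hypothetical) type are indexed.
    elasticsearch { hosts => ["localhost:9200"] }
  } else {
    # Everything else is printed to the console for inspection.
    stdout { codec => rubydebug }
  }
}
EOF
```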

A Logstash Configuration Example

Logstash has a simple configuration DSL that enables you to specify the inputs, outputs, and filters described above, along with their specific options. Order matters, specifically around filters and outputs, as the configuration is basically converted into code and then executed. Keep this in mind when you write and debug your configs.

Structure

Your configurations will generally have three sections: inputs, filters, and outputs. You can have multiple instances of each of these sections, which means that you can group related plugins together in a config file instead of grouping them by type. Logstash configs are generally structured as follows:
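As a rough sketch of that layout, the commands below create a scratch `conf.d` directory with one skeleton file per integration. The file names are hypothetical, and on a package install the real directory would typically be `/etc/logstash/conf.d`.

```shell
# Create a scratch config directory with one file per integration.
mkdir -p conf.d
for name in apache_to_elasticsearch syslog_to_s3; do
  # Each file carries its own input, filter, and output sections.
  cat > "conf.d/${name}.conf" <<'EOF'
input {
  # input plugins for this integration
}

filter {
  # filter plugins for this integration
}

output {
  # output plugins for this integration
}
EOF
done

# List the files that were created.
ls conf.d
```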

So you can have a configuration file for each of the functions or integrations that you would like Logstash to perform. Each of those files will contain the necessary inputs, filters, and outputs to perform that function.

Here’s an example of what a Logstash configuration file looks like:
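The sketch below writes one such pipeline for Apache access logs to `apache.conf` in the current directory. The file name, the log path, and the Elasticsearch address are assumptions; the plugins match the ones discussed in this section.

```shell
# Write an example pipeline to apache.conf in the current directory.
# Paths, host, and port are assumptions; adjust them for your system.
cat > apache.conf <<'EOF'
input {
  file {
    path => "/var/log/apache2/access.log"   # assumed Apache access log path
    start_position => "beginning"           # read the existing file from the top
  }
}

filter {
  grok {
    # Parse the raw line into fields such as clientip, verb, and response.
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # Use the timestamp parsed from the log line as the event timestamp.
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    # Enrich the event with location data for the client IP.
    source => "clientip"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]             # assumed local Elasticsearch instance
  }
}
EOF
```

On a package install, copying the file into `/etc/logstash/conf.d/` is the usual way to have Logstash pick it up.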

The input section is using the file input plugin to tell Logstash to pull logs from the Apache access log.

In the filter section, we are applying: a) a grok filter that parses the log string and populates the event with the relevant information from the Apache logs, b) a date filter to define the timestamp field, and c) a geoip filter to enrich the clientip field with geographical data.

Tip! The grok filter is not easy to configure. We recommend testing your filters before starting Logstash using the grok debugger. A rich list of the most commonly used grok patterns is available here.

Lastly, the output section is defined, in this case, to send data to a local Elasticsearch instance.

Each of the configuration files can contain these three sections. Logstash will typically combine all of your configuration files and treat them as one large config. Since you can have multiple inputs, it’s recommended that you tag your events or assign types to them so that it’s easy to identify them at a later stage. Also ensure that you wrap filters and outputs that are specific to a category or type of event in a conditional; otherwise you might get some surprising results.

Working with Logstash Plugins

You will find that the most common use cases are covered by the plugins shipped and enabled by default. To see the list of loaded plugins, access the Logstash installation directory and execute the list command:
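A minimal sketch, assuming the apt package placed Logstash under `/usr/share/logstash`:

```shell
# Change to the Logstash installation directory (assumed package location).
cd /usr/share/logstash

# Print the names of all installed plugins.
bin/logstash-plugin list
```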

Installing other plugins is easily accomplished with:
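For example, assuming the same installation directory. The plugin name below is only an illustration, so substitute the one you need:

```shell
cd /usr/share/logstash

# Install a plugin by name; the name here is illustrative.
sudo bin/logstash-plugin install logstash-output-kafka
```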

Updating and removing plugins is just as easy, as is installing a plugin built locally.
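A sketch of those three operations, again assuming `/usr/share/logstash` as the installation directory. The plugin name and the `.gem` path are hypothetical placeholders.

```shell
cd /usr/share/logstash

# Update one plugin to its latest available version.
sudo bin/logstash-plugin update logstash-output-kafka

# Remove a plugin that is no longer needed.
sudo bin/logstash-plugin remove logstash-output-kafka

# Install a plugin from a locally built gem (hypothetical path).
sudo bin/logstash-plugin install /path/to/logstash-output-kafka-1.0.0.gem
```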

Start Stashing!

The only thing that’s left to do is get your hands dirty – start Logstash!
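Assuming Logstash was installed from the apt package, which registers it as a system service, it can be started like this:

```shell
# Start the Logstash service, then check that it is running.
sudo service logstash start
sudo service logstash status
```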

Configuration errors are a frequent occurrence, so check the Logstash logs to find out what went wrong.
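A minimal sketch, assuming the default paths of a package install (`/usr/share/logstash`, `/etc/logstash`, and `/var/log/logstash`). The first command validates the configuration without starting a pipeline, and the second shows the latest log entries.

```shell
# Validate the configuration files and exit (assumed default paths).
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash \
  -f /etc/logstash/conf.d/ --config.test_and_exit

# Show the most recent entries from the main Logstash log.
sudo tail -n 50 /var/log/logstash/logstash-plain.log
```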

This post guided you through the steps for installing Logstash, configuring it, and making sure that you have access to all the functionality that you need through the plugin ecosystem. Since Logstash is the first element in an ELK-based data pipeline, you should now have a solid base on which to build a log analysis pipeline. Check out this Kibana tutorial to understand how.


Jurgens du Toit

Jurgens tries to write good code for a living. He even succeeds at it sometimes. When he isn't writing code, he's wrangling data as a hobby. Sometimes the data wins, but we don't talk about that. Ruby and Elasticsearch are his weapons of choice, but his ADD always allows for new interests. He's also the community maintainer for a number of Logstash inputs.