5 Logstash Filter Plugins You Need to Know About


In the ELK Stack, Logstash handles the resource-heavy task of aggregating and processing logs. This processing ensures that our log messages are parsed and structured correctly, and it is this structure that lets you analyze and visualize the data more easily once it has been indexed in Elasticsearch. Exactly what processing is performed on the data is determined by you in the filter section of your Logstash configuration files, where you can choose from a large number of both officially supported and community filter plugins to transform the logs. The most commonly used filter plugin is grok, but there are a number of other extremely useful plugins at your disposal.

Which plugin you use will of course depend on the logs themselves, but this article lists five plugins you are likely to find useful in any logging pipeline that involves Logstash.
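For context, these filters live in the filter block of a Logstash pipeline configuration, between an input and an output. Here is a minimal skeleton, just a sketch with placeholder values (the beats port and the Elasticsearch host are illustrative):

input {
  beats {
    port => 5044
  }
}

filter {
  # grok, mutate, date, json, kv and other filter plugins go here
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}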

1. Grok

As mentioned above, grok is by far the most commonly used filter plugin in Logstash. It is not the easiest plugin to work with, but it is popular because it lets you give structure to unstructured logs.

Take this random log message for example:

2016-07-11T23:56:42.000+00:00 INFO
[MySecretApp.com.Transaction.Manager]:Starting transaction for session
-464410bf-37bf-475a-afc0-498e0199f008

The grok pattern we will use looks like this:

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:class}\]:%{GREEDYDATA:message}" }
  }
}

After processing, the log message will be parsed as follows:

{
  "timestamp" => "2016-07-11T23:56:42.000+00:00",
  "log-level" => "INFO",
  "class" => "MySecretApp.com.Transaction.Manager",
  "message" => "Starting transaction for session -464410bf-37bf-475a-afc0-498e0199f008"
}

This is how the log message will be indexed in Elasticsearch. Structured this way, the log message has been broken up into logically named fields that can then be queried, analyzed, and visualized more easily.
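One thing to watch: the pattern above writes the parsed remainder back into a field named message, which already exists on the event. Grok's overwrite option lets you replace the original field rather than keep both values; here is a minimal sketch of the same filter with that option added:

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:class}\]:%{GREEDYDATA:message}" }
    # replace the original message field with the parsed remainder
    overwrite => [ "message" ]
  }
}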

More information about how grok works and how to use it can be found in this article.

2. Mutate 

Another common Logstash filter plugin is mutate. As its name implies, this filter allows you to really massage your log messages by “mutating” the various fields. You can, for example, use the filter to change fields, join them together, rename them, and more. 

Using the log above as an example, we can use the lowercase configuration option of the mutate plugin to transform the ‘log-level’ field into lowercase:

filter {
  grok {...}
  mutate {
    lowercase => [ "log-level" ]
  }
}
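If you also wanted to rename the field, mutate’s rename option handles that. Here is a minimal sketch building on the example above (the new field name ‘severity’ is purely illustrative):

filter {
  grok {...}
  mutate {
    lowercase => [ "log-level" ]
  }
  mutate {
    # a second mutate block runs after the first, so the already-lowercased field is renamed
    rename => { "log-level" => "severity" }
  }
}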

The mutate plugin is a great way to change the format of your logs. A full list of the different configuration options for the plugin is listed here.

3. Date 

How can you analyze logs and events if they are not accurately sorted in chronological order?  

The Logstash date filter plugin can be used to pull the time and date from a log message and define it as the event's timestamp field (@timestamp). Once defined, this timestamp field ensures your logs are sorted in the correct chronological order and helps you analyze them more effectively.

There are tens, if not hundreds, of different ways time and date can be formatted in logs.  

Here is an example of an Apache access log:  

200.183.100.141 - - [25/Nov/2016:16:17:10 +0000] "GET 
/wp-content/force-download.php?file=../wp-config.php HTTP/1.0" 200 
3842 "https://hack3r.com/top_online_shops" "Mozilla/4.0 (compatible; 
MSIE 8.0; Windows NT 5.1; Trident/4.0; YTB720; GTB7.2; .NET CLR 
1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 
3.5.30729)"

Using the date filter as follows, we can extract the date and time pattern and define it as the @timestamp field by which all our logs will be sorted:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
  }
}

It’s important to note that if you do not use the date filter, Logstash will automatically set @timestamp to the time the event was received by the input.
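Since the same field can arrive in more than one layout, match also accepts a list of format patterns that are tried in order. A quick sketch, assuming you might also see plain ISO8601 timestamps (the second pattern is illustrative):

filter {
  date {
    # patterns are tried in order until one matches
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z", "ISO8601" ]
    target => "@timestamp"
  }
}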

Read about additional configuration options here. 

4. JSON 

JSON is an extremely popular format for logs because it allows users to write structured and standardized messages that can be easily read and analyzed.  

The Logstash json filter plugin enables you to parse JSON, whether it makes up the entire message or just a specific field, and preserve that data structure within the event.

The example below is an Apache access log formatted as JSON:

{ "time":"[30/Jul/2017:17:21:45 +0000]", 
"remoteIP":"192.168.2.1", "host":"my.host.local",
 
"request":"/index.html", "query":"", "method":"GET", 
"status":"200",
 
"userAgent":"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; 
Trident/4.0; YTB720; GTB7.2; .NET CLR 1.1.4322; .NET CLR 
2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)", 
"referer":"-" }

Instead of having the log flattened into one line, we can use the json filter to retain the data structure:

filter {
  json {
    source => "message"
    target => "log"
  }
}

The source configuration option defines which field in the log contains the JSON you wish to parse. In this example, the entire message field is JSON. I’m also using the target option to expand the parsed JSON into a data structure within a field called log.
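If the source field turns out not to contain valid JSON, the json filter tags the event (with _jsonparsefailure by default), which you can then act on in a conditional. A minimal sketch that simply drops such events (dropping is just one possible choice):

filter {
  json {
    source => "message"
    target => "log"
  }
  # discard events whose message could not be parsed as JSON
  if "_jsonparsefailure" in [tags] {
    drop {}
  }
}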

Read about additional configuration options here. 

5. KV 

Key-value pairs, or KVPs, are another commonly used logging format. Like JSON, this format is popular mainly because it is readable, and the Logstash kv filter plugin allows you to automatically parse messages, or specific fields, formatted this way.

Take this log as an example:

2017-07-25 17:02:12 level=error message="connection refused" 
service="listener" thread=125 customerid=776622 ip=34.124.233.12 
queryid=45

I can use the following kv filter to instruct Logstash how to process it:

filter {
  kv {
    source => "metadata"
    trim => "\""
    include_keys => [ "level", "service", "customerid", "queryid" ]
    target => "kv"
  }
}

Note the use of configuration options here. I’m using source to define the field to perform the key=value search on, trim to strip specific characters from the values, include_keys to specify which parsed keys should be added to the log, and target to define the container into which all the key-value pairs are placed.
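If your pairs are delimited differently, say comma-separated with a colon between key and value, the field_split and value_split options control how the string is tokenized. A minimal sketch with illustrative delimiters:

filter {
  kv {
    source => "message"
    # split pairs on commas and keys from values on colons
    field_split => ","
    value_split => ":"
  }
}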

Read about additional configuration options here.

Summary 

As I said at the beginning of the article, there is a huge number of Logstash filter plugins at your disposal. Which ones you use will of course depend on the specific log messages you want to process.

Other extremely useful filter plugins that are worth mentioning are the geoip (for adding geographical data for IP fields) and csv (for parsing CSV logs) plugins.
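For example, pointing a geoip filter at the client IP field extracted by the COMBINEDAPACHELOG grok pattern adds location fields to the event; a minimal sketch (the clientip field name comes from that pattern):

filter {
  geoip {
    # enrich the event with geographical data for the client IP
    source => "clientip"
  }
}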

While each and every one of these plugins is useful in its own right, their full power is unleashed when used together to parse logs. Indeed, in most cases you will be using a combination of grok and at least one or two additional plugins. This combined usage helps ensure your logs come out on the other end of Logstash perfectly formatted!
