A Beginner’s Guide to Logstash Grok


The ability to efficiently analyze and query the data being shipped into the ELK Stack depends on the information being readable. This means that as unstructured data is being ingested into the system, it must be translated into structured message lines.

This thankless but critical task is usually left to Logstash (though there are other log shippers available; see our comparison of Fluentd vs. Logstash as one example). Regardless of the data source you define, pulling the logs and performing some magic to beautify them is necessary to ensure that they are parsed correctly before being output to Elasticsearch.

Data manipulation in Logstash is performed using filter plugins. This article focuses on one of the most popular and useful filter plugins — the Logstash grok filter, which is used to parse unstructured data into structured data.

How does it work?

Put simply, grok is a way to match a line against a regular expression, map specific parts of the line into dedicated fields, and perform actions based on this mapping.

There are many built-in patterns that are supported out-of-the-box by Logstash for filtering items such as words, numbers, and dates (the full list of supported patterns is maintained in the logstash-patterns-core repository). If you cannot find the pattern you need, you can write your own custom pattern.

Here is the basic syntax format for a Logstash grok filter:

%{PATTERN:FieldName}

This will match the predefined pattern and map it to a specific identifying field. Since grok is essentially based upon a combination of regular expressions, you can also create your own regex-based grok filter. For example:

(?<field_name>\d\d-\d\d-\d\d)

This will match a string such as 22-22-22 (or any other digits in that format) and map it to the field named field_name.
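
As a quick sketch (the field names short_date and rest here are my own, hypothetical choices), such an inline custom pattern can be combined with built-in patterns inside a grok filter:

grok {
   match => { "message" => "(?<short_date>\d\d-\d\d-\d\d) %{GREEDYDATA:rest}" }
}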

A Logstash grok example

To demonstrate how to get started with grokking, I’m going to use the following application log:

2016-07-11T23:56:42.000+00:00 INFO [MySecretApp.com.Transaction.Manager]:Starting transaction for session -464410bf-37bf-475a-afc0-498e0199f008

The goal I want to accomplish with a grok filter is to break down the logline into the following fields: timestamp, log level, class, and then the rest of the message.

The following grok pattern will do the job:

grok {
   match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:class}\]:%{GREEDYDATA:message}" }
}

This will try to match the incoming log to the given pattern. In case of a match, the log will be broken down into the specified fields, according to the defined patterns in the filter. In case of a mismatch, Logstash will add a tag called _grokparsefailure.
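
One common way to handle these failures, for example, is to drop unparsable events with a Logstash conditional (a minimal sketch):

if "_grokparsefailure" in [tags] {
   drop { }
}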

In our case, the filter will match and result in the following output:

{
     "message" => "Starting transaction for session -464410bf-37bf-475a-afc0-498e0199f008",
     "timestamp" => "2016-07-11T23:56:42.000+00:00",
     "log-level" => "INFO",
     "class" => "MySecretApp.com.Transaction.Manager"
}
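
For context, here is a minimal, hypothetical pipeline configuration showing where the grok filter sits; the file path and Elasticsearch host below are placeholders:

input {
   file {
      path => "/var/log/mysecretapp/app.log" # placeholder path
   }
}

filter {
   grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:class}\]:%{GREEDYDATA:message}" }
   }
}

output {
   elasticsearch {
      hosts => ["localhost:9200"] # placeholder host
   }
}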

Manipulating the data

When a match is found, you can define additional Logstash grok configuration options to manipulate the data. For example, you can make Logstash add fields, override fields, or remove fields.

grok {
   match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:class}\]:%{GREEDYDATA:message}" }
   overwrite => [ "message" ]
   add_tag => [ "My_Secret_Tag" ]
}

In this case, we are using the ‘overwrite’ option to overwrite the ‘message’ field. This way, the parsed message replaces the original raw line, so the full, unparsed ‘message’ does not appear alongside the other fields we defined (‘timestamp’, ‘log-level’ and ‘class’). We are also using the ‘add_tag’ option to add a custom tag to the log.
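
Similarly, you could drop fields you no longer need. Here is a brief sketch using the ‘remove_field’ option, assuming (hypothetically) that we do not want to keep the ‘class’ field:

grok {
   match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:class}\]:%{GREEDYDATA:message}" }
   overwrite => [ "message" ]
   remove_field => [ "class" ]
}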

A full list of available options you can use to manipulate your logs is available in the grok filter plugin documentation, together with their input type and default value.

The grok debugger

A great way to get started with building your grok filters is this grok debug tool: https://grokdebug.herokuapp.com/

This tool allows you to paste your log message and gradually build the grok pattern while continuously testing it against your sample. As a rule, I recommend starting with the %{GREEDYDATA:message} pattern and slowly adding more and more patterns as you proceed.

In the case of the example above, I would start with:

%{GREEDYDATA:message}

Then, to verify that the first part is working, proceed with:

%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:message}
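
Continuing in the same way, you would next add the log level, and so on, until the full pattern from the example above is assembled:

%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} %{GREEDYDATA:message}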

Common examples

Here are some examples that will help you to familiarize yourself with how to construct a grok filter.

Syslog

grok {
   match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
}
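
For reference, here is a made-up syslog line that this pattern would break into syslog_timestamp, syslog_hostname, syslog_program, syslog_pid, and syslog_message:

Dec 23 14:30:01 webserver01 sshd[2242]: Accepted password for backup from 10.0.0.12 port 22 ssh2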

Apache access logs

grok {
   match => { "message" => "%{COMBINEDAPACHELOG}" }
}
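
The COMBINEDAPACHELOG pattern expands into fields such as clientip, verb, request, response, referrer, and agent, so a single pattern parses a standard combined-format access line like this made-up one:

203.0.113.5 - frank [11/Jul/2016:23:56:42 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://www.example.com/start.html" "Mozilla/5.0"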

Elasticsearch

grok {
   match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:loglevel}%{SPACE}\]\[%{DATA:source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}\[%{DATA:index}\] %{NOTSPACE} \[%{DATA:updated-type}\]",
              "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:loglevel}%{SPACE}\]\[%{DATA:source}%{SPACE}\]%{SPACE}\[%{DATA:node}\] (\[%{NOTSPACE:Index}\]\[%{NUMBER:shards}\])?%{GREEDYDATA}" ]
}
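
The first of the two patterns would match, for instance, an update_mapping entry like this made-up line from an Elasticsearch cluster log:

[2016-07-11 23:56:42,000][INFO ][cluster.metadata ] [node-1] [my_index] update_mapping [my_type]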

Summing it up

Logstash grok is just one type of filter that can be applied to your logs before they are forwarded to Elasticsearch. Because it plays such a crucial part in the logging pipeline, grok is also one of the most commonly used filters.


Happy grokking!
