How to debug your Logstash configuration file
Logstash plays an extremely important role in any ELK-based data pipeline but is still considered one of the main pain points in the stack. Like any piece of software, Logstash has a lot of nooks and crannies that need to be mastered before you can log with confidence.
One super-important nook and cranny is the Logstash configuration file (not the software's settings file at /etc/logstash/logstash.yml, but the .conf file responsible for your data pipeline). How successful you are at running Logstash is directly determined by how well versed you are at working with this file and how skilled you are at debugging the issues that occur when you misconfigure it.
To all the Logstash newbies out there: before you consider alternatives, do not despair. Logstash is a great log aggregator, and in this article you'll find some tips for properly working with your pipeline configuration files and debugging them.
Understanding the structure of the config file
Before we take a look at some debugging tactics, you might want to take a deep breath and understand how a Logstash configuration file is built. This might help you avoid unnecessary and really basic mistakes.
Each Logstash configuration file contains three sections — input, filter and output.
Each section specifies which plugins to use along with their plugin-specific settings. You can specify multiple plugins per section; they are executed in the order in which they appear.
Let’s take a look at this simple example for Apache access logs:
## Input section
input {
  file {
    path => "/var/log/apache/access.log"
  }
}

## Filter section
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
  }
}

## Output section
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
In this case, we are instructing Logstash to use the file input plugin to collect our Apache access logs from /var/log/apache/access.log, the grok, date and geoip filter plugins to process each log line, and the Elasticsearch output plugin to ship the data to a local Elasticsearch instance.
Tips
- Use a text editor to verify that every opening curly bracket has a matching closing bracket and that no lines are broken unintentionally.
- Each plugin has different settings. Verify the syntax for each plugin by referring to the plugin’s documentation.
- Only use the plugins you need. Overloading your configuration with unnecessary plugins adds more points of failure and degrades performance.
Building your groks
The grok filter plugin is one of the most popular plugins used by Logstash users. Its task is simple: to parse logs into beautiful, easy-to-analyze data constructs. Handling grok, on the other hand, is the opposite of simple.
Grok is essentially based upon a combination of regular expressions, so if you're a regex genius, using this plugin in Logstash might be a bit easier compared to other users. Still, if you need some tips on grokking, take a look at this article.
The grokdebugger is a free online tool that will help you test your grok patterns on log messages. This tool makes life much easier (there is even a version of it available within Kibana), but please note that even if your grok passes the grokdebugger's test, you still might encounter a Logstash configuration error or even a failed grok (_grokparsefailure).
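When a pattern fails to match, Logstash does not drop the event; it tags it with _grokparsefailure. One handy trick (a sketch, not something covered above) is to use a conditional in the output section to route tagged events somewhere you can inspect them, for example to a local file via the file output plugin (the path below is hypothetical):

output {
  if "_grokparsefailure" in [tags] {
    # Hypothetical path: collect unparsed events for later inspection
    file {
      path => "/var/log/logstash/grok_failures.log"
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
    }
  }
}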
Tips
- Use the Logstash-supported patterns in your groks. A full list of these patterns is available here.
- As you begin configuring your grok, I recommend starting with the %{GREEDYDATA:message} pattern and slowly adding more and more patterns as you proceed (see the sketch after this list).
- There are a bunch of online tools that will help you build regexes. I like using regex101.
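To illustrate the GREEDYDATA approach, here is a rough sketch of how a pattern for a line like 127.0.0.1 GET /index.html might evolve (the field names my_data, client_ip, method and request are illustrative, not prescribed):

# Step 1: capture the whole line
filter {
  grok {
    match => { "message" => "%{GREEDYDATA:my_data}" }
  }
}

# Step 2: peel off the first field, keep the rest greedy
filter {
  grok {
    match => { "message" => "%{IP:client_ip} %{GREEDYDATA:my_data}" }
  }
}

# Step 3: keep going until every field is named
filter {
  grok {
    match => { "message" => "%{IP:client_ip} %{WORD:method} %{URIPATHPARAM:request}" }
  }
}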
Testing your configuration
There’s no rush. Before you start Logstash in production, test your configuration file. If you run Logstash from the command line, you can specify parameters that will verify your configuration for you.
In the Logstash installation directory (Linux: /usr/share/logstash), enter:
sudo bin/logstash --config.test_and_exit -f <path_to_config_file>
This will run through your configuration, verify the configuration syntax and then exit. In case an error is detected, you will get a detailed message pointing you to the problem.
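A concrete invocation might look like this (the path /etc/logstash/conf.d/apache.conf is hypothetical; substitute your own):

cd /usr/share/logstash
sudo bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/apache.conf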
For example, in the error below we can see we had a configuration error on line 34, column 7:
[FATAL] 2019-03-09 17:37:27.334 [LogStash::Runner] runner - The given configuration is invalid. Reason: Expected one of #, => at line 34, column 7 (byte 1173) after filter
In case your configuration passes the configtest, you will see the following message:
Configuration OK
[INFO ] 2019-03-06 19:01:46.286 [LogStash::Runner] runner - Using config.test_and_exit mode. Config Validation Result: OK. Exiting Logstash
Logstash logging
In most cases, if you've passed the configtest and verified your grok patterns separately using the grokdebugger, you've already greatly improved your chances of starting your Logstash pipeline successfully.
However, Logstash has the uncanny ability to surprise you with an error just when you’re feeling confident about your configuration. In this case, the first place you need to check is the Logstash logs (Linux: /var/log/logstash/logstash-plain.log). Here you might find the root cause of your error.
Another common way of debugging Logstash is by printing events to stdout.
output { stdout { codec => rubydebug } }
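For quick experiments, you can also skip the config file entirely and pass a throwaway pipeline on the command line with the -e flag, typing test events straight into stdin. A minimal sketch:

sudo bin/logstash -e 'input { stdin { } } filter { grok { match => { "message" => "%{COMBINEDAPACHELOG}" } } } output { stdout { codec => rubydebug } }'

Each line you type is run through the pipeline and printed back as a structured event, which makes grok failures immediately visible.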
Tips
- You cannot see the stdout output in your console if you start Logstash as a service.
- You can use the stdout output plugin in conjunction with other output plugins (see the sketch after this list).
- I have a habit of opening another terminal each time I start Logstash and tail Logstash logs with:
sudo tail -f /var/log/logstash/logstash-plain.log
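For instance, a combined output section that ships events to Elasticsearch while also printing them to the console might look like this (the Elasticsearch host mirrors the example above):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  # Duplicate every event to stdout for debugging
  stdout {
    codec => rubydebug
  }
}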
Endnotes
Working with Logstash definitely requires experience. The examples above were super-basic and covered only pipeline configuration, not performance tuning. Things can get even more complicated when you're working with multiple pipelines and more complex configuration files.
As a rule of thumb, before you start with Logstash, make sure you actually need it. Some use cases might be able to rely on Beats alone. Filebeat now supports some basic filtering and processing, which might mean you don't need to complicate matters with Logstash (see the sketch below).
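As an illustration (a minimal sketch, not a full reference; the paths and the drop condition are hypothetical), Filebeat's built-in processors can handle simple cases like dropping noisy events before they are shipped:

# filebeat.yml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/apache/access.log

processors:
  # Drop any event whose message contains "healthcheck"
  - drop_event:
      when:
        contains:
          message: "healthcheck"

output.elasticsearch:
  hosts: ["localhost:9200"]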
Again, Logstash is a great log aggregator. The improvements added in recent versions, such as the monitoring API and performance improvements, have made it much easier to build resilient and reliable logging pipelines. If you do indeed require Logstash, have started to work with it and have begun to encounter issues — be patient, it’s worth your while!