5 Logstash Pitfalls You Need to Avoid

Tomer Levy

Logstash (part of the ELK Stack) is very easy to start using out-of-the-box. You simply download it, run it and start working. While you don’t need to be an expert from the get-go, when you delve deeper into configurations, certain complexities may surface.

At Logz.io, our users use Logstash extensively. As a result of the great deal of time we’ve spent configuring and running Logstash, we wanted to explore and share the top five pitfalls that we’ve experienced, as well as some corresponding solutions and tips.

A Bit about Logstash

If you already know and use Logstash, you might want to skip ahead to the next section 🙂

Logstash is a system that receives, processes and outputs logs in a structured format. You send it a string of information and receive structured, enriched JSON in return. One of Logstash’s main uses is to index documents in data stores that require structured information, most commonly Elasticsearch. For example, if you send the string “Hello world” to Logstash, you will receive a JSON output. By default, this structured output will include the message, “Hello world”, a timestamp of when the message was received, the hostname of the source of the message, and a version.
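To make that transformation concrete, here is a minimal Python sketch of the kind of event described above. It only illustrates the default fields (message, timestamp, host and version) and is not Logstash's own code; the function name to_event and the host value "my-laptop" are made up for this example.

```python
import json
import socket
from datetime import datetime, timezone


def to_event(line, host=None):
    """Wrap a raw log line in a Logstash-style event (illustrative only)."""
    now = datetime.now(timezone.utc)
    return {
        "message": line,
        "@version": "1",
        # ISO 8601 timestamp in UTC with millisecond precision
        "@timestamp": now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
        # Fall back to the local hostname when the caller gives none
        "host": host or socket.gethostname(),
    }


event = to_event("Hello world", host="my-laptop")
print(json.dumps(event, indent=2))
```

Every filter you add afterwards simply adds, changes or removes keys in this dictionary-like structure.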

Five Logstash Pitfalls, Tips, and Possible Solutions

Although Logstash is great, no product is flawless. Below are the top five pitfalls that we’ve encountered in our journey working with Logstash users.

1. Key-Value Filter (KV Plugin)

Key-value (kv) is a filter plug-in that extracts keys and values from a single log line and uses them to create new fields in the structured output. For example, let’s say a log line contains “x=5”. If you pass that through a key-value filter, it will create a new field in the output JSON where the key is “x” and the value is “5”.

"hello x=5"  =>  {"message": "hello x=5", "x": "5"}

By default, the key-value filter will extract every key=value pattern in the source field. The downside is that, with the default configuration, you have no control over which keys and values are created. The filter may create many fields with an undesired structure, and even malformed keys that make the output unpredictable. If this happens, Elasticsearch may fail to index the resulting document, or may end up indexing irrelevant information.

Our Solution:

In order to get the most out of this plug-in, it is important to specify which keys should be extracted. This can be done by adding the “include_keys” parameter to the configuration. As you can see below, we’ve added “name”, “type” and “count”. Therefore, the plug-in will only extract the name, type and count keys as long as they are in the right format (e.g. name=x).

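kv {
  source => "message"
  include_keys => ["name", "type", "count"]
  # Escape the square brackets; unescaped, they produce a malformed character class
  # Newer releases of the plug-in rename these options to trim_value and trim_key
  trim => "<>\[\],"
  trimkey => "<>\[\],"
}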
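The effect of whitelisting keys can be sketched in a few lines of Python. This is a simplified model of the behaviour described above, not the plug-in's implementation: the regular expression, the function name extract_kv and the sample log line are all assumptions made for illustration.

```python
import re

# Matches key=value pairs where neither side contains whitespace or "="
PAIR = re.compile(r"([^\s=]+)=([^\s=]+)")


def extract_kv(message, include_keys=None, trim="<>[],"):
    """Return the key/value fields found in message (illustrative only)."""
    fields = {}
    for key, value in PAIR.findall(message):
        # Strip the unwanted characters from both ends of key and value
        key = key.strip(trim)
        value = value.strip(trim)
        if include_keys is not None and key not in include_keys:
            continue  # drop every key that is not whitelisted
        fields[key] = value
    return fields


line = "hello name=web type=<nginx>, count=5 a[0]=oops"
everything = extract_kv(line)
whitelisted = extract_kv(line, include_keys=["name", "type", "count"])
print(everything)
print(whitelisted)
```

Without a whitelist, the sample line also yields the malformed key "a[0", which is exactly the kind of field you do not want to reach Elasticsearch. With include_keys, only name, type and count survive.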

2. Memory Consumption

Logstash runs on the JVM and consumes a hefty amount of resources to do so. Logstash’s significant memory consumption has been the subject of many discussions. This can be a real challenge when you want to send logs from a small machine (such as an AWS micro instance) without harming application performance.

Our Tip:

In order to save resources, you can use the Logstash Forwarder (previously known as Lumberjack), a much lighter shipper that includes only a minimal set of features. The forwarder uses the Lumberjack protocol, which lets you ship compressed logs securely, reducing resource consumption and bandwidth. Its only input is files, while the output can be directed to multiple destinations.

Other options for shipping logs exist as well. You can use rsyslog on Linux machines, and there are other agents for Windows machines, such as nxlog and syslog-ng.

3. Multiple Configuration Files

When you begin working with Logstash, you tend to start with a small configuration file that grows over time. As a result, the file becomes difficult to maintain, read and understand.

Our Tip:

Did you know that you can separate your large configuration file into several different smaller files? Instead of supplying a path to a configuration file, you can set the path to the configuration folder that contains multiple configuration files. For example, you can have one file that contains the output/input transport plug-ins and have other files that contain filters. The files are merged by name, alphabetically, so it is important to name them according to how you’d like them to be ordered.
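The ordering rule is easy to check with a small Python sketch. It mimics the merge described above by concatenating files in sorted file-name order; it is not how Logstash itself loads configuration, and the file names and their contents are hypothetical.

```python
import tempfile
from pathlib import Path


def merged_config(conf_dir):
    """Concatenate every *.conf file in conf_dir in sorted file-name order."""
    parts = []
    for path in sorted(Path(conf_dir).glob("*.conf"), key=lambda p: p.name):
        parts.append(f"# --- {path.name} ---\n{path.read_text()}")
    return "\n".join(parts)


with tempfile.TemporaryDirectory() as conf_dir:
    # Hypothetical file names; numeric prefixes make the order explicit
    files = {
        "30-output.conf": "output { stdout { } }\n",
        "10-input.conf": "input { stdin { } }\n",
        "20-filter.conf": 'filter { kv { source => "message" } }\n',
    }
    for name, body in files.items():
        (Path(conf_dir) / name).write_text(body)
    combined = merged_config(conf_dir)

print(combined)
```

A numeric prefix such as 10-, 20-, 30- is one simple way to control the order. It matters most for filters, which are applied in the order in which they appear in the merged configuration.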

4. The Multi-Line Plug-In

Sometimes, an event message is spread across several log lines. For example, let’s say that a Java exception takes up 10 lines in a log file. When looking at the event via Elasticsearch, it’s better to be able to view all 10 lines as a single event. The Multi-Line plug-in can join multiple log lines together. Simply specify the desired pattern, and the plug-in will identify which lines should be joined together.
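The joining logic can be sketched in Python as follows. This is a simplified model, not the plug-in's code: it assumes that every new event starts with a date such as 2015-08-03 and that any other line belongs to the previous event. The function name join_multiline and the sample log lines are made up for illustration.

```python
import re


def join_multiline(lines, starts_event=r"^\d{4}-\d{2}-\d{2} "):
    """Join lines that do not start a new event onto the previous event."""
    first_line = re.compile(starts_event)
    events = []
    for line in lines:
        if first_line.match(line) or not events:
            events.append(line)  # a new event begins here
        else:
            events[-1] += "\n" + line  # continuation of the previous event
    return events


log_lines = [
    "2015-08-03 09:06:09 ERROR Request failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Handler.run(Handler.java:42)",
    "    at com.example.Main.main(Main.java:7)",
    "2015-08-03 09:06:10 INFO Recovered",
]
events = join_multiline(log_lines)
print(len(events))
```

Note that the function has to remember the previous event while it reads the next line. The lines therefore have to arrive in order and be handled in one place, which is worth keeping in mind for the pitfalls below.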

Pitfall #1

In general, Logstash is multi-threaded based on the plug-ins you use. Surprisingly, not all of Logstash’s plug-ins are built to run in parallel. For example, the Multi-Line plug-in is not thread-safe. If you configure Logstash to run multiple filter threads, there is a good chance that the Multi-Line filter will break and may cause Logstash to crash.

Pitfall #2

When you send multiple logs over TCP, each log will often arrive in its own packet, one after the other. However, TCP is a byte stream and gives no such guarantee: it might place two logs in the same packet, or split one log across two packets. Multi-Line doesn’t know how to handle this, since it expects each message to arrive separately.

There is no single tip for dealing with this correctly. Usually when you use plug-ins in Logstash, you don’t need to think about whether or not they are thread-safe or how they behave over TCP. With Multi-Line, however, everything may appear to work correctly at first and only fail later. Before relying on it, test it with the same number of filter threads and the same transport that you run in production.
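The TCP issue comes down to framing: the receiver has to rebuild log lines from whatever chunks of bytes arrive. The Python sketch below shows the general technique of buffering and splitting on newlines; it is an illustration under that assumption, not what Logstash does internally, and the function name split_lines and the sample chunks are hypothetical.

```python
def split_lines(chunks):
    """Yield complete newline-terminated lines from arbitrary byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently sitting in the buffer
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")
    if buffer:
        yield buffer.decode("utf-8")  # trailing data without a final newline


# Simulated reads: two logs share one chunk, and a third is split in two
chunks = [b"log one\nlog two\n", b"log th", b"ree\n"]
lines = list(split_lines(chunks))
print(lines)
```

Once the stream has been split back into individual lines, any line-joining logic sees one message at a time regardless of how TCP packed them. A line-oriented codec on the tcp input, as one reader suggests in the comments, aims at the same result.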

5. Varying Syntax between Plug-Ins

There are a few common things you want to do with Logstash. For example, since it creates a structured event with fields and values, it is common to add and remove fields and tags. Most plug-ins allow you to perform these kinds of general operations. However, this can be problematic because the syntax differs between plug-ins. The configuration that you use to add a field in one plug-in may not work in another.

For example, adding tags to an event in the tcp input or the file input is done with:

tags => ["tag1", "tag2"]

but in the mutate filter, adding tags is configured with:

add_tag => ["tag1", "tag2"]

Our Tip:

Since you don’t know if one plug-in’s configuration will work on another plug-in, be sure to test the configuration before you run it. You can test the configuration by running Logstash with the --configtest command line parameter (renamed --config.test_and_exit in Logstash 5.0 and later). This doesn’t actually run Logstash, but it does validate the configuration.

Bonus Tip: Ruby Debug Performance

It is very useful to print incoming and outgoing messages, because it makes the system easier to debug. However, it is easy to forget that these printouts are there. Left in place, they can result in excessive resource consumption and increased latency.

stdout {
  codec => rubydebug
}

Our Tip:

When you move to production, it is important to remove the stdout output blocks from your configuration. Debug printing should be off in production, or else you run the risk of slowing down the environment.

Summary

Logstash is a great tool that has made centralizing logs much easier for DevOps teams. The fact that it is open source is an added benefit. With over 100 plug-ins, there is a lot more to Logstash than meets the eye. However, while open source has its advantages, it also has its disadvantages. These are seen in the incomplete plug-ins that break the structured message that Logstash generates. I hope that we’ve been able to help you extend the options that Logstash already offers, as well as protect your system from crashes and performance degradation.


16 responses to “5 Logstash Pitfalls You Need to Avoid”

  1. Concerning Multi-Line Plug-In, pitfall #2, there is a simple solution that works fine to me.
    Add this into tcp input config :
    codec => line

  2. Paul Crook says:

    Regarding the multi-line plug-in pitfall #1, you say to not “configure Logstash to run multiple filter threads”. Can you give some examples? I don’t know how to tell if I am running multiple filter threads or not. Are you saying that if I use multi-line, I shouldn’t configure any other filters?

    Thanks in advance.

  3. @Paul Crook

    By default, Logstash runs with only one filter thread.
    This can be customized using the flag “--filterworkers” at logstash startup.
    It represents “how many events can be processed simultaneously by one filter”

    In all cases, you can use as many filters as you want. Multiline filter doesn’t limit you.
    But if you use multiline plugin, you are strongly encouraged to not customize “filterworkers” flag, else you will have problems…

  4. Eric says:

    For separating the configuration files, would you just have a:

    File-a.conf
    filter {
    }

    File-b.conf
    filter {
    }

    In each individual separate file?

  5. Eric, this is just as simple as you tell.

  6. raghav says:

    Thanks for detailed info.

    I am facing one issue here. I have used the KV pattern to parse my logs. My log is getting parsed, but the result always resides under the “_source” tag. What I need is to display all key=value pairs outside of the source tag, which will help to generate a lot of dashboards.

    would be great help if you can provide some inputs?

    sample log which is parsed using KV pattern:

    @timestamp August 3rd 2015, 14:36:09.422
    @version 1
    _id auqPHInlQKimneCagEtW1A
    _index logstash-2015.08.03
    _source {"message":"abbc=1234,efghi=3456634","@version":"1","@timestamp":"2015-08-03T09:06:09.422Z","host":"DSK-046-0BE1","path":"/home/tester3/kibana-4.0.1-linux-x64/bin/samplelog","abbc":"1234","efghi":"3456634"}
    _type logs
    abbc 1234
    efghi 3456634
    host DSK-046-0BE1
    message abbc=1234,efghi=3456634
    path /home/tester3/kibana-4.0.1-linux-x64/bin/samplelog

  7. Peter L says:

    Have you ever considered doing a blog post on structuring logstash config files? I.E split up into input/filter/output files or one file for each source, tags over type etc? Its something I’m currently researching and hard to find good info that covers more than 1 source at a time!
    Great posts thanks!

  8. Sundar Rajan says:

    Great tips. One caveat: trim and trimkey should escape [ and ]. So it should be trim => "\[\]," otherwise your logstash might crash with a premature end of class error. I experienced this problem.

  9. Oddgeir Gitlestad says:

    Great tips. Our solution to #4 is to use filebeat and its multiline plug-in on the clients. That way you can circumvent the need for the multiline on the server and the thread and TCP issue goes away.

  10. asif soomro says:

    I removed the logstash-output-stdout but logstash is not started !! following logs appear when run the logstash start

    {:timestamp=>"2016-05-03T07:24:45.948000-0400", :message=>"The error reported is: \n Couldn't find any output plugin named 'stdout'. Are you sure this is correct? Trying to load the stdout output plugin resulted in this error: no such file to load -- logstash/outputs/stdout"}

    I really need that tip cause we want display just warn, error or exception entries in Kibana GUI

  11. asif soomro says:

    how we off debug mode ? can you elaborate the process step please, how we setup configuration file?

  12. Sudarshan T says:

    Thanks for the Ruby debug tip. It did slow down things considerably.
