
Logstash (part of the ELK Stack) is very easy to start using out of the box: you simply download it, run it and start working. You don’t need to be an expert from the get-go, but as you delve deeper into configuration, certain complexities surface.

At Logz.io, our users use Logstash extensively. Having spent a great deal of time configuring and running Logstash ourselves, we wanted to share the top five pitfalls we’ve experienced, along with some corresponding solutions and tips.

A Bit about Logstash

If you already know and use Logstash, you might want to jump to the next section 🙂


Logstash is a system that receives, processes and outputs logs in a structured format. You send it a string of information, and it returns a structured, enriched JSON representation of that data. One of Logstash’s main uses is to index documents in data stores that require structured information, most commonly Elasticsearch. For example, if you send the string “Hello world” to Logstash, you receive a JSON document. By default, its key-value pairs include the message (“Hello world”), a timestamp of when the message was received, the host name of the message’s source, and a version.
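To make that default structure concrete, here is a minimal Python sketch that wraps a raw line in a dictionary shaped like a Logstash event. The helper name to_event is hypothetical, and the field names (message, @timestamp, @version, host) assume Logstash’s defaults for a plain-text input; real events may carry additional fields depending on the input plug-in.

```python
import json
import socket
from datetime import datetime, timezone
from typing import Optional


def to_event(line: str, host: Optional[str] = None) -> dict:
    """Wrap a raw log line in a Logstash-style event (hypothetical helper)."""
    return {
        "message": line,
        # When the line was received, as an ISO 8601 UTC timestamp.
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        # Event schema version; assumed to be "1" as in Logstash's default.
        "@version": "1",
        # Source host; falls back to this machine's name.
        "host": host or socket.gethostname(),
    }


if __name__ == "__main__":
    print(json.dumps(to_event("Hello world", host="web-01"), indent=2))
```

The exact timestamp formatting Logstash uses may differ from Python’s isoformat(); the point is the shape of the event, not its byte-for-byte output.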

Five Logstash Pitfalls, Tips, and Possible Solutions

Although Logstash is great, no product is flawless. Below are the top five pitfalls that we’ve encountered in our journey working with Logstash users.

1. Key-Value Filter (KV Plugin)

The key-value (kv) filter plug-in extracts key=value pairs from a single log line and uses them to create new fields in the structured output. For example, let’s say a log line contains “x=5”. If you pass it through the kv filter, it creates a new field in the output JSON whose key is “x” and whose value is “5”.

By default, the kv filter extracts every key=value pattern in the source field. The downside is that, with the default configuration, you have no control over which keys and values are created. It may create many fields with an undesired structure, and even malformed keys, which makes the output unpredictable. When this happens, Elasticsearch may fail to index the resulting document, or may end up indexing irrelevant information.

Our Solution:

To get the most out of this plug-in, specify which keys should be extracted by adding the “include_keys” parameter to its configuration. For example, with include_keys => ["name", "type", "count"] inside the kv block, the plug-in only extracts the name, type and count keys, as long as they appear in the right format (e.g. name=x).
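The effect of “include_keys” can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the kv filter’s actual implementation: the hypothetical kv_extract function treats space-separated key=value tokens as pairs (roughly the filter’s default separators) and, like “include_keys”, drops any key that is not on the allow-list.

```python
import re
from typing import Dict, Iterable, Optional

# Assumption: pairs are separated by whitespace and keys/values by "=",
# roughly like the kv filter's default separators.
PAIR = re.compile(r"(\w+)=(\S+)")


def kv_extract(message: str, include_keys: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Extract key=value pairs, keeping only include_keys when given."""
    allowed = set(include_keys) if include_keys is not None else None
    fields = {}
    for key, value in PAIR.findall(message):
        # Without an allow-list, every pair found becomes a field.
        if allowed is None or key in allowed:
            fields[key] = value
    return fields


if __name__ == "__main__":
    line = "name=disk type=alert count=3 session=a1b2"
    print(kv_extract(line))
    print(kv_extract(line, include_keys=["name", "type", "count"]))
```

Without the allow-list, the session pair becomes a field too; with it, only name, type and count survive, which keeps the set of fields sent to Elasticsearch predictable.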

2. Memory Consumption

Logstash runs on the JVM and consumes a hefty amount of resources to do so. Its significant memory consumption has been widely discussed. This can be a real challenge when you want to send logs from a small machine (such as an AWS micro instance) without harming application performance.

Our Tip:

To save resources, you can use the Logstash Forwarder (previously known as Lumberjack), a lightweight shipper that includes only the minimum set of plug-ins. The forwarder uses the Lumberjack protocol, which lets you securely ship compressed logs, reducing both resource consumption and bandwidth. Its only input is files, while its output can be directed to multiple destinations.

There are other ways to ship logs, too. You can use rsyslog on Linux machines, and agents such as nxlog and syslog-ng on Windows machines.

3. Multiple Configuration Files

When you begin working with Logstash, you tend to start with a small configuration file that grows over time. As a result, the file becomes difficult to maintain, read and understand.

Our Tip:

Did you know that you can split a large configuration file into several smaller files? Instead of supplying the path to a single configuration file, you can supply the path to a folder that contains multiple configuration files. For example, you can have one file that contains the input and output plug-ins, and other files that contain filters. The files are merged alphabetically by file name, so it is important to name them according to the order in which you want them combined.
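A rough Python sketch of that merge order follows. The hypothetical merge_config_dir helper only illustrates alphabetical concatenation by file name and assumes the files end in .conf; it is not how Logstash itself loads its configuration.

```python
import tempfile
from pathlib import Path


def merge_config_dir(directory: str, pattern: str = "*.conf") -> str:
    """Concatenate matching files in alphabetical order of file name."""
    # Sort by file name only, mimicking an alphabetical merge.
    files = sorted(Path(directory).glob(pattern), key=lambda p: p.name)
    return "\n".join(p.read_text() for p in files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names; numeric prefixes fix the order.
        Path(tmp, "30-output.conf").write_text("output { stdout {} }")
        Path(tmp, "10-input.conf").write_text("input { stdin {} }")
        Path(tmp, "20-filter.conf").write_text("filter { }")
        print(merge_config_dir(tmp))
```

Because the sort is purely lexicographic, a file named 2-filter.conf sorts after 10-input.conf; zero-padded prefixes (01-, 02-, …) are one simple way to keep the order explicit.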

4. The Multi-Line Plug-In

Sometimes an event message spans several log lines. For example, a Java exception may take up 10 lines in a log file. When looking at the event in Elasticsearch, it’s better to be able to view all 10 lines as a single event. The Multi-Line plug-in can join multiple log lines together: specify the desired pattern, and the plug-in uses it to identify which lines should be joined.
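The joining logic can be sketched in Python. The sketch assumes, hypothetically, that any line starting with whitespace (such as the indented “at …” frames of a Java stack trace) continues the previous event; in Logstash you would express a rule like this through the plug-in’s pattern and what settings rather than in code.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: a line that starts with whitespace continues the previous event.
CONTINUATION = re.compile(r"^\s")


def join_multiline(lines: Iterable[str]) -> Iterator[str]:
    """Group continuation lines with the line that precedes them."""
    buffer: List[str] = []
    for line in lines:
        if buffer and CONTINUATION.match(line):
            buffer.append(line)  # still the same event
        else:
            if buffer:
                yield "\n".join(buffer)  # previous event is complete
            buffer = [line]
    if buffer:
        yield "\n".join(buffer)  # flush the final event


if __name__ == "__main__":
    log = [
        "INFO service started",
        "ERROR request failed",
        "\tat com.example.Handler.run(Handler.java:42)",
        "\tat java.lang.Thread.run(Thread.java:745)",
        "INFO retrying",
    ]
    for event in join_multiline(log):
        print(repr(event))
```

Note that the function has to remember the lines it has buffered so far; that per-stream state is what makes this kind of joining sensitive to how lines are distributed and delivered.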

Pitfall #1

Logstash can run parts of its pipeline in multiple threads, depending on the plug-ins you use and how you configure it. Surprisingly, though, not all of Logstash’s plug-ins are built to run in parallel. For example, the Multi-Line filter is not thread-safe. If you configure Logstash to run multiple filter threads, there is a good chance that the Multi-Line filter will break, and it may even cause Logstash to crash.

Pitfall #2

When you send multiple logs over TCP, each log often ends up in its own packet, one after the other in the stream. However, TCP is a byte stream with no notion of message boundaries, so it may place two logs in the same packet (or split one log across two). Multi-Line doesn’t know how to handle this, because it expects each message to arrive in a separate packet.
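A common way to cope with TCP’s lack of message boundaries is to frame messages with a delimiter and buffer partial data until the delimiter arrives. The Python sketch below shows the idea for newline-delimited logs; the function name frame_lines is hypothetical, and the chunk boundaries in the example are chosen to mimic one packet holding parts of two logs.

```python
from typing import Iterable, Iterator


def frame_lines(chunks: Iterable[bytes]) -> Iterator[str]:
    """Reassemble newline-delimited log lines from arbitrary TCP chunks."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        # One chunk may contain several complete logs plus the start of
        # the next one; keep the unfinished tail for the next chunk.
        *complete, pending = pending.split(b"\n")
        for raw in complete:
            yield raw.decode("utf-8")
    # Emit whatever is left when the stream closes without a final newline.
    if pending:
        yield pending.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical packets: the first holds one full log and half of the next.
    packets = [b"log one\nlog t", b"wo\nlog three\n"]
    for line in frame_lines(packets):
        print(line)
```

Decoding only complete lines also avoids splitting a multi-byte UTF-8 character that happens to straddle two packets.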

There is no single tip for handling this correctly. Usually, when you use Logstash plug-ins, you don’t need to think about whether they are thread-safe or how they behave over TCP. With Multi-Line, however, everything may appear to work correctly at first, only for you to discover later that it doesn’t. Make sure you account for these limitations (for example, by not running it with multiple filter threads) before relying on it.

5. Varying Syntax between Plug-Ins

There are a few common things you want to do with Logstash. For example, since it creates structured events with fields and values, it is common to add and remove fields and tags. Most plug-ins let you perform these common operations. However, this can be problematic because different plug-ins use different syntax for them, so the configuration you use to add a field in one plug-in may not work in another.

An example:

Adding tags to an event in the tcp or file input is done with the “tags” setting, e.g. tags => ["mytag"], but in the mutate filter, adding tags is configured with “add_tag”, e.g. add_tag => ["mytag"].

Our Tip:

Since you can’t be sure that one plug-in’s configuration will work in another plug-in, be sure to test the configuration before you run it. You can do this by running Logstash with the --configtest command-line flag (--config.test_and_exit in Logstash 5.x and later). This doesn’t actually run Logstash’s pipeline, but it does validate the configuration.

Bonus Tip: Ruby Debug Performance

It is very useful to print incoming and outgoing messages (for example, with the stdout output and the rubydebug codec), because it makes the system easier to debug. However, it’s easy to forget to remove these print-outs, and leaving them in can result in excessive resource consumption and increased latency.

Our Tip:

When you move to production, it is important to remove the stdout output plug-ins. Debug output should be off in production, or you risk slowing down the environment.

Summary

Logstash is a great tool that has made centralizing logs much easier for DevOps teams, and the fact that it is open source is an added benefit. With over 100 plug-ins, there is a lot more to Logstash than meets the eye. However, while open source has its advantages, it also has its disadvantages, such as incomplete plug-ins that can break the structured messages Logstash generates. I hope that we’ve helped you extend the options that Logstash already offers, as well as protect your system from crashes and performance degradation.

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. We parse your logs for you so that you will not have to worry about these pitfalls at all.

Tomer Levy is co-founder and CEO of Logz.io. Before founding Logz.io, Tomer was the co-founder and CTO of Intigua, which developed innovative, Docker-like containers designed for large enterprises. Prior to Intigua, Tomer spent six years at CheckPoint, where he managed its Intrusion Prevention System (IPS) Software Blade from concept to market, generating $100M in revenue in the second year. Tomer has an M.B.A. from Tel Aviv University and a B.S. in computer science and is an enthusiastic kite surfer.