Logstash vs. Fluentd

The unsung heroes of log analysis are the log collectors. They are the hard-working daemons that run on servers to pull server metrics, parse log files, and transport them to backend systems such as Elasticsearch and PostgreSQL. While visualization tools such as Kibana and re:dash bask in the glory, the log collectors ensure that all logs are routed to the correct locations in the first place.

In the open source world, the two most popular data collectors are Logstash and Fluentd. Logstash is best known as part of the ELK Stack, while Fluentd is increasingly used by the communities around software such as Docker, GCP, and Elasticsearch.

In this article, we aim to give a no-frills comparison of Logstash, which is developed by Elastic, and Fluentd, which was created by Treasure Data. The goal is to collect all of the facts about these excellent software platforms in one place so that readers can make informed decisions for their next projects.

We at Logz.io support both Logstash and Fluentd, and we see a growing number of customers leveraging Fluentd to ship logs to us. As a result, it was important for us to make this comparison. Here, we have compiled a summary chart of the differences between Logstash and Fluentd, and then we go into more detail below.

Comparison overview

Platform Comparison

For a long time, one of the advantages of Logstash was that it is written in JRuby and therefore runs on Windows. Fluentd, on the other hand, did not support Windows until recently because it depended on a *NIX-centric event library. Not anymore: as of this pull request, Fluentd supports Windows.

Logstash: Linux and Windows
Fluentd: Linux and Windows

Event Routing Comparison

One of the key features of log collectors is event routing. Both log collectors support routing, but their approaches are different.

Logstash routes all data into a single stream and then uses conditional (if-then) statements to send each event to the right destination. Here is an example that sends error events in production to PagerDuty:
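The if-then routing idea can also be sketched outside Logstash's own configuration language. The Python below is only an illustration, not Logstash syntax: the field names `loglevel` and `environment`, the default `elasticsearch` sink, and the `route` function are all assumptions made for this sketch.

```python
# Illustrative only: mimics if-then routing over a single event stream.
# Field names ("loglevel", "environment") and destination names are assumptions.

def route(event):
    """Return the destinations for one event, decided by if-then checks."""
    destinations = ["elasticsearch"]  # assumed default sink for every event
    if event.get("loglevel") == "ERROR" and event.get("environment") == "production":
        destinations.append("pagerduty")
    return destinations


events = [
    {"loglevel": "ERROR", "environment": "production", "message": "disk full"},
    {"loglevel": "ERROR", "environment": "development", "message": "test failure"},
    {"loglevel": "INFO", "environment": "production", "message": "started"},
]

for event in events:
    print(event["message"], "->", route(event))
```

Every event passes through the same function, and the conditions decide where it goes. Adding a new destination means adding another branch.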

Fluentd relies on tags to route events. Each Fluentd event has a tag that tells Fluentd where it wants to be routed. For example, if you are sending error events in production to PagerDuty, the configuration would look something like this:

Fluentd’s approach is more declarative, whereas Logstash’s method is procedural. For programmers trained in procedural programming, Logstash’s configuration can be easier to get started with. On the other hand, Fluentd’s tag-based routing allows complex routing to be expressed clearly. For example, the following configuration applies different logic to all production and development events based on tag prefixes.
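To make the contrast concrete, tag-based routing can be modeled as an ordered table of tag patterns in which the first matching pattern wins. This is a rough Python sketch, not Fluentd configuration: the tags, patterns, and destination names are hypothetical, and the `fnmatch`-style wildcards only approximate Fluentd's own tag-matching rules.

```python
from fnmatch import fnmatchcase

# Ordered routing table: the first matching pattern wins.
# Patterns and destination names are hypothetical examples.
ROUTES = [
    ("production.error", "pagerduty"),
    ("production.*", "elasticsearch"),
    ("development.*", "local_file"),
]


def route_by_tag(tag):
    """Return the destination for a tag, or None if no pattern matches."""
    for pattern, destination in ROUTES:
        if fnmatchcase(tag, pattern):
            return destination
    return None


for tag in ["production.error", "production.access", "development.debug", "staging.access"]:
    print(tag, "->", route_by_tag(tag))
```

The routing rules live in data rather than in control flow, so handling a new environment amounts to adding one row to the table.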

Logstash: Uses conditional statements to route events and is good for procedural programmers
Fluentd: Uses tags to route events and is better at complex routing

Plugin Ecosystem Comparison

Both Logstash and Fluentd have rich plugin ecosystems covering many input systems (file and TCP/UDP), filters (mutating data and filtering by fields), and output destinations (Elasticsearch, AWS, GCP, and Treasure Data).

One key difference is how plugins are managed. Logstash manages all its plugins under a single GitHub repo. While users may write and use their own, there seems to be a concerted effort to collect them in one place. As of this writing, there are 199 plugins under the logstash-plugins GitHub repo.

Fluentd, on the other hand, adopts a more decentralized approach. Although there are 516 plugins, the official repository only hosts 10 of them. In fact, among the top 5 most popular plugins (fluent-plugin-record-transformer, fluent-plugin-forest, fluent-plugin-secure-forward, fluent-plugin-elasticsearch, and fluent-plugin-s3), only one is in the official repository!

Logstash: Centralized plugin repository
Fluentd: Decentralized plugin repository

Transport Comparison

Logstash lacks a persistent internal message queue. Currently, Logstash has an in-memory queue that holds 20 events (fixed size) and relies on an external queue such as Redis for persistence across restarts. This is a known limitation of Logstash, and it is being actively worked on in this issue, which aims to persist the queue on disk.

Fluentd, on the other hand, has a highly configurable buffering system. It can be either in-memory or on-disk, with more parameters than you would ever care to know.

The upside of Logstash’s approach is simplicity: the mental model for its fixed-size queue is very simple. However, you must deploy Redis alongside Logstash for improved reliability in production. Fluentd has built-in reliability, but its configuration parameters take some getting used to.
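The trade-off between a small fixed-size memory queue and an on-disk buffer can be illustrated with a short Python sketch. It is a conceptual model only and reflects neither tool's actual implementation: the `FileBuffer` class, the JSON-lines file format, and the file name are assumptions, and the queue size of 20 simply mirrors the figure quoted above.

```python
import json
import os
import queue
import tempfile

# Fixed-size in-memory queue: once it is full, new events cannot be accepted.
memory_queue = queue.Queue(maxsize=20)
accepted = 0
for i in range(25):
    try:
        memory_queue.put_nowait({"id": i})
        accepted += 1
    except queue.Full:
        break  # a real collector would block or apply back-pressure here

print("accepted in memory:", accepted)


class FileBuffer:
    """Append-only JSON-lines buffer; a hypothetical stand-in for on-disk buffering."""

    def __init__(self, path):
        self.path = path

    def append(self, event):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def drain(self):
        """Read all buffered events and delete the buffer file."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            events = [json.loads(line) for line in f if line.strip()]
        os.remove(self.path)
        return events


buffer_path = os.path.join(tempfile.mkdtemp(), "events.buf")
FileBuffer(buffer_path).append({"id": 1})
FileBuffer(buffer_path).append({"id": 2})

# Simulated restart: a fresh object pointing at the same file still sees both events.
recovered = FileBuffer(buffer_path).drain()
print("recovered after restart:", recovered)
```

Anything held only in the memory queue is lost if the process dies, which is why an external queue is paired with it. The file-backed buffer keeps events across a restart at the cost of more moving parts to configure.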

Logstash: Needs to be deployed with Redis to ensure reliability
Fluentd: Built-in reliability, but its configuration is more complicated

Performance Comparison

This is a nebulous topic. As discussed in this talk at OpenStack Summit 2015, both perform well in most use cases and consistently grok through 10,000+ events per second.

That said, Logstash is known to consume more memory, at around 120MB compared to Fluentd’s 40MB. For modern machines, this is hardly a meaningful difference between the two aggregators. For leaf machines, it’s a different story: spread across 1,000 servers, this can mean 80GB of additional memory use, which is significant. (This hypothetical number comes from the 80MB difference between Logstash and Fluentd on a single machine multiplied by 1,000 machines.)
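The same back-of-the-envelope estimate can be parameterized for other fleet sizes. The per-process figures below are the approximate numbers quoted above, not measurements, and the `extra_memory_gb` helper is a hypothetical name introduced for this sketch.

```python
# Approximate per-process figures quoted in the text; real usage varies
# with version, plugins, and configuration.
LOGSTASH_MB = 120
FLUENTD_MB = 40


def extra_memory_gb(servers, heavy_mb=LOGSTASH_MB, light_mb=FLUENTD_MB):
    """Extra fleet-wide memory in decimal GB (1 GB = 1000 MB)."""
    return (heavy_mb - light_mb) * servers / 1000


for servers in (10, 100, 1_000):
    print(f"{servers} servers: about {extra_memory_gb(servers):g} GB extra")
```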

Don’t worry: Logstash has a solution. Instead of running the fully featured Logstash on leaf nodes, Elastic recommends that you run Elastic Beats, which are resource-efficient, purpose-built log shippers. Each Beat focuses on one data source only and does that well. On Fluentd’s end, there is Fluent Bit, an embeddable low-footprint version of Fluentd written in C, as well as Fluentd Forwarder, a stripped-down version of Fluentd written in Go.

Logstash: Slightly more memory use. Use Elastic Beats for leaf machines.
Fluentd: Slightly less memory use. Use Fluent Bit and Fluentd Forwarder for leaf machines.

So Much Information! What’s Next?

While there are several differences, the similarities between Logstash and Fluentd are greater than their differences. Users of either Logstash or Fluentd are miles ahead of the curve when it comes to log management.
