Fluentd is an open source data collector developed by Treasure Data that acts as a unifying logging layer between input sources and output services. Fluentd is easy to install and has a light footprint along with a fully pluggable architecture.
In the world of the ELK Stack, Fluentd acts as a log collector — aggregating logs, parsing them, and forwarding them on to Elasticsearch. As such, Fluentd is often compared to Logstash, which has similar traits and functions (see a detailed comparison between the two here).
Both Logstash and Fluentd are supported by us at Logz.io, and we see quite a large number of customers using the latter to ship logs to us. This Fluentd tutorial describes how to establish the log shipping pipeline — from the source (Apache in this case), via Fluentd, to Logz.io.
To complete the steps below, you’ll need the following:
- HTTPS traffic allowed to port 8071
- cURL and the Apache web server installed
- An active Logz.io account. If you don’t have one yet, create a free account here.
- 5 minutes of free time!
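Before installing anything, it can be worth confirming that the first prerequisite holds. The following is a minimal sketch, assuming the listener host used later in this tutorial (listener.logz.io) answers plain HTTPS requests on port 8071. The variable names are ours. A "no response" result may also mean the listener declined the bare request, not only that a firewall blocked it.

```shell
# Assumption: the Logz.io listener answers HTTPS requests on port 8071.
LISTENER_URL="https://listener.logz.io:8071"

# curl exits non-zero when it cannot connect or gets no HTTP response at all.
if curl -s -o /dev/null --connect-timeout 5 --max-time 10 "$LISTENER_URL" 2>/dev/null; then
  PORT_CHECK="responded"
else
  PORT_CHECK="no response"
fi
echo "Listener check for $LISTENER_URL: $PORT_CHECK"
```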
Step 1: Installing Fluentd
Treasure Data distributes the stable release of Fluentd as a package called td-agent. To install it, use the cURL command below. This command is for Ubuntu 12.04 (Precise); if you’re using a different Linux distribution, see the Fluentd installation documentation:
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-precise-td-agent2.sh | sh
The command will automatically install Fluentd and start the daemon. To make sure all is running as expected, run:
$ sudo /etc/init.d/td-agent status
If you’re on Mac OS X, use these instructions.
Step 2: Installing the Logz.io plugin
Our next step is to install the Logz.io plugin for Fluentd. To do this, use the gem command that ships with td-agent:
$ sudo /opt/td-agent/usr/sbin/td-agent-gem install fluent-plugin-logzio
Step 3: Configuring Fluentd
We now have to configure the input and output sources for Fluentd. In this tutorial, we’ll be using Apache as the input and Logz.io as the output.
Open the Fluentd configuration file:
$ sudo vi /etc/td-agent/td-agent.conf
Define Apache as the input source for Fluentd:
<source>
  @type tail
  format none
  path /var/log/apache2/access.log
  pos_file /tmp/access_log.pos
  tag apache
</source>
Note: Make sure the user that runs Fluentd has read access to the Apache log files. If it does not, Fluentd cannot read the logs and nothing will be sent on to Logz.io.
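One way to check this before restarting Fluentd is to test read access as the user the daemon runs under. The sketch below assumes td-agent runs as a system user named td-agent (the packaged default, as far as we know) and uses the log path from the configuration above; adjust both if your setup differs.

```shell
# Assumption: the daemon runs as the "td-agent" user; adjust if yours differs.
AGENT_USER="td-agent"
LOG_FILE="/var/log/apache2/access.log"

# Run a read test as the agent user. "-n" stops sudo from prompting, so run
# another sudo command first if your password has not been cached yet.
# Any failure (no such user, no sudo rights, missing file, no read
# permission) lands in the "else" branch.
if sudo -n -u "$AGENT_USER" test -r "$LOG_FILE" 2>/dev/null; then
  READ_CHECK="ok"
else
  READ_CHECK="failed"
fi
echo "Read check for $LOG_FILE as $AGENT_USER: $READ_CHECK"
```

If the check fails on Ubuntu, a likely cause is that the Apache log directory is readable only by root and members of the adm group. Granting the agent user read access, for example through group membership, is one way to resolve it.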
Next, we’re going to define Logz.io as a “match” (the Fluentd term for an output destination):
<match **.**>
  @type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=<token>&type=<logtype>
  output_include_time true
  output_include_tags true
  buffer_type file
  buffer_path <pathtobuffer>
  flush_interval 10s
  # Logz.io has a bulk limit of 10M. We recommend setting this to 1M to avoid oversized bulks.
  buffer_chunk_limit 1m
</match>
Fine-tune this configuration as follows:
- <token>: Replace this placeholder with your account token, which can be found in the Logz.io Settings section.
- <logtype>: Replace this placeholder with the log type (e.g. ‘apache-access’). This helps Logz.io to parse and grok your data. A complete list of known types is available here. If your type is not listed, please let us know.
- <pathtobuffer>: Enter a path to a folder in your file system for which you have full permissions (e.g. /tmp/buffer). The buffer file helps to aggregate logs and ship them in bulk.
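To see how the three placeholders fit together, the sketch below fills in the match block from shell variables and prints the result, which can then be pasted into td-agent.conf. The token shown is a made-up stand-in, not a real credential, and the variable names are ours.

```shell
# Hypothetical values for illustration only; substitute your own.
LOGZIO_TOKEN="YOUR-ACCOUNT-TOKEN"
LOG_TYPE="apache-access"
BUFFER_PATH="/tmp/buffer"

# Build the match block with the placeholders filled in.
MATCH_BLOCK=$(cat <<EOF
<match **.**>
  @type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=${LOGZIO_TOKEN}&type=${LOG_TYPE}
  output_include_time true
  output_include_tags true
  buffer_type file
  buffer_path ${BUFFER_PATH}
  flush_interval 10s
  buffer_chunk_limit 1m
</match>
EOF
)
echo "$MATCH_BLOCK"
```

Keeping the token in a variable also makes it easier to avoid committing it to version control if you template your configuration.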
Last but not least, restart Fluentd:
$ sudo /etc/init.d/td-agent restart
That’s it. After a minute or two, your Apache logs should show up in the Logz.io user interface. To generate some log entries, use the ab (ApacheBench) command to simulate traffic. The command requests a file from your web server, so place one there first:
$ sudo ab -k -c 350 -n 1000 localhost/<file.html>
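If nothing shows up in Logz.io after a few minutes, the agent's own log is usually the first place to look. The sketch below assumes the packaged default log location, /var/log/td-agent/td-agent.log; if your installation logs elsewhere, change the path.

```shell
# Assumption: td-agent writes its own log to the packaged default location.
AGENT_LOG="/var/log/td-agent/td-agent.log"

if [ -r "$AGENT_LOG" ]; then
  # Show the most recent entries; plugin or permission errors would appear here.
  tail -n 20 "$AGENT_LOG"
else
  echo "Cannot read $AGENT_LOG (try with sudo, or check your log path)"
fi
```

Errors about the tailed file being unreadable point back to the permissions note in Step 3, while errors mentioning logzio_buffered usually suggest a problem with the plugin installation or the match block.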