In a previous post, I explained the role Apache Kafka plays in production-grade ELK deployments, as a message broker and a transport layer deployed in front of Logstash. As I mentioned in that piece, Redis is another common option. I recently found out that it is even more popular than Kafka!

Known for its flexibility, performance and wide language support, Redis is used not only as a database and cache but also as a message broker. In ELK-based data pipelines, Redis can be placed between Beats and Logstash as a buffering layer, giving downstream components a better chance of processing and indexing the data successfully.

In this article, I’ll show how to deploy all the components required to set up a data pipeline using the ELK Stack and Redis:

  • Filebeat – to collect logs and forward them to Redis
  • Redis – to broker the data flow and queue it
  • Logstash – to subscribe to Redis, process the data and ship it to Elasticsearch
  • Elasticsearch – to index and store the data
  • Kibana – to analyze the data.

(Diagram: Beats to Redis)

My setup

I installed all the pipeline components on a single Ubuntu 18.04 machine on Amazon EC2 using local storage. Of course, in real-life scenarios, you will probably have some or all of these components installed on separate machines.

I started the instance in the public subnet of a VPC and then set up a security group to enable access from anywhere using SSH and TCP 5601 (for Kibana). Finally, I added a new elastic IP address and associated it with the running instance.
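
If you prefer to script this part, the same setup can be done with the AWS CLI. This is just a sketch; the security group, instance and allocation IDs below are placeholders you would replace with your own:

# Open SSH (22) and Kibana (5601) to the world on the instance's security group
aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id <sg-id> --protocol tcp --port 5601 --cidr 0.0.0.0/0

# Allocate an Elastic IP and associate it with the running instance
aws ec2 allocate-address --domain vpc
aws ec2 associate-address --instance-id <instance-id> --allocation-id <eipalloc-id>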

The example logs used for the tutorial are Apache access logs.

Step 1: Installing Elasticsearch

Let’s start with installing the main component in the ELK Stack — Elasticsearch. Since version 7.x, Elasticsearch comes bundled with Java, so we can jump right in and add Elastic’s signing key:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

To install Elasticsearch on Debian-based systems like Ubuntu, we also need to install the apt-transport-https package:

sudo apt-get update
sudo apt-get install apt-transport-https

Our next step is to add the repository definition to our system:

echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

All that’s left to do is to update your repositories and install Elasticsearch:

sudo apt-get update && sudo apt-get install elasticsearch

Before we bootstrap Elasticsearch, we need to apply some basic configurations using the Elasticsearch configuration file at: /etc/elasticsearch/elasticsearch.yml:

sudo su
vim /etc/elasticsearch/elasticsearch.yml

Since we are installing Elasticsearch on AWS, we will bind Elasticsearch to localhost. Also, we need to define the private IP of our EC2 instance as a master-eligible node:

network.host: "localhost"
http.port: 9200
cluster.initial_master_nodes: ["<AWSInstancePrivateIP>"]

Save the file and run Elasticsearch with:

sudo service elasticsearch start

To confirm that everything is working as expected, point curl to: http://localhost:9200, and you should see something like the following output (allow a minute or two for Elasticsearch to start):

{
  "name" : "ip-172-31-26-146",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "Oz1na_L6RaWk4euSp1GTgQ",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
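
Another quick check worth running is the cluster health API; on a fresh single-node install it should report a green status (it will turn yellow later, once indices with unassigned replicas are created):

curl -X GET "localhost:9200/_cluster/health?pretty"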

Step 2: Installing Logstash

Next up, the “L” in ELK — Logstash. Logstash requires Java (Logstash 7.x supports Java 8 and Java 11), so let’s install a JRE first:

sudo apt-get install default-jre

Verify Java is installed:

java -version

openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

Since we already defined the Elastic repository in the system, installing Logstash is as simple as:

sudo apt-get install logstash

Next, we will configure a Logstash pipeline that pulls our logs from a Redis list, processes them and ships them on to Elasticsearch for indexing.

Let’s create a new config file:

sudo vim /etc/logstash/conf.d/apache.conf

Paste the following configurations:

input {
  redis {
    host => "localhost"
    key => "apache"
    data_type => "list"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}

As you can see — we’re using the Logstash Redis input plugin to define the Redis host and the key of the Redis list we want Logstash to pull from. The data_type setting is set to list, which means Logstash will use the BLPOP operation to pull events off that list.

Save the file. We will start Logstash later, when we have all the other pieces of the puzzle ready.
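
Optionally, you can verify the configuration syntax before moving on. The command below is a quick sanity check assuming the default paths used by the Debian package; if the file parses cleanly, you should see a “Configuration OK” message:

sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit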

Step 3: Installing Kibana

Let’s move on to the next component in the ELK Stack — Kibana. As before, we will use a simple apt command to install Kibana:

sudo apt-get install kibana

We will then open up the Kibana configuration file at: /etc/kibana/kibana.yml, and make sure we have the correct configurations defined:

server.port: 5601
elasticsearch.hosts: ["http://localhost:9200"]

These specific configurations tell Kibana which Elasticsearch to connect to and which port to use.

Now, we can start Kibana with:

sudo service kibana start

Open up Kibana in your browser at http://localhost:5601 (when accessing a remote instance, use its public IP and make sure server.host in kibana.yml allows external connections). You will be presented with the Kibana home page.
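
If you’re working in a terminal-only session, you can also confirm Kibana is up by querying its status API:

curl -X GET "localhost:5601/api/status"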

Of course, we have no data to analyze yet, but we’re getting there. Bear with me!

Step 4: Installing Filebeat

To collect our Apache access logs, we will be using Filebeat.

To install Filebeat, we will use:

sudo apt-get install filebeat

Let’s open the Filebeat configuration file at: /etc/filebeat/filebeat.yml

sudo vim /etc/filebeat/filebeat.yml

Enter the following configurations:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/apache2/access.log

output.redis:
  hosts: ["localhost"]
  key: "apache"
  db: 0
  timeout: 5
  data_type: "list"

In the input section, we are telling Filebeat which logs to collect — Apache access logs. In the output section, we are telling Filebeat to forward the data to our local Redis server and which Redis key to push the events to, “apache”.

The data_type setting is set to list, which in this case means that Filebeat will use RPUSH to push the logs onto the Redis list.

Save the file but don’t start Filebeat yet.
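
As a sanity check, Filebeat ships with built-in test commands you can run before starting the service (the output test will only pass once Redis is up, which we handle in the next step):

# Validate the configuration file
sudo filebeat test config

# Verify Filebeat can reach the Redis output defined above
sudo filebeat test output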

Step 5: Installing Redis

Last but not least, our final installation step — Redis.

Install Redis with:

sudo apt install redis-server

And start it using:

sudo service redis-server start

To make sure all is running as expected, open a second terminal to access the Redis CLI with:

redis-cli

127.0.0.1:6379>
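
From the prompt, a quick PING confirms the server is responding:

127.0.0.1:6379> PING
PONG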

Step 6: Starting the data pipeline

Finally, now that we have all the components we need in place, it’s time to start our data pipeline.

Before we do that, in our second terminal, let’s access the Redis-CLI monitor mode to be able to see all the Redis operations taking place. This is done by simply entering the following command:

monitor

For now, all you’ll see is an OK message:

OK

Now, let’s switch terminals and start Filebeat:

sudo service filebeat start

As soon as new Apache access log entries are collected by Filebeat, the Redis monitor will report that they have been pushed using RPUSH onto the “apache” list:

1562667208.214860 [0 127.0.0.1:34254] "PING"
1562667208.215050 [0 127.0.0.1:34254] "INFO"
1562667208.215416 [0 127.0.0.1:34254] "RPUSH" "apache"
"{\"@timestamp\":\"2019-07-09T10:12:30.742Z\",\"@metadata\":{\"beat\
":\"filebeat\",\"type\":\"_doc\",\"version\":\"7.2.0\"},\"agent\":{\"id\"
:\"736b2ac9-9062-4705-9405-f2233250a82e\",\"version\":\"7.2.0\",\"type\":
\"filebeat\",\"ephemeral_id\":\"9df401b8-38ed-4c57-8119-88f72caea021\",
\"hostname\":\"ip-172-31-26-146\"},\"ecs\":{\"version\":\"1.0.0\"},
\"host\":{\"name\":\"ip-172-31-26-146\"},\"log\":{\"file\":{\"path\":
\"/var/log/apache2/access.log\"},\"offset\":691053},\"message\":
\"110.249.212.46 - - [09/Jul/2019:10:12:28 +0000] \\\"GET http:/
/110.249.212.46/testget?q=23333&port=80 HTTP/1.1\\\" 400 0 \\\"-\\\"
\\\"-\\\"\",\"input\":{\"type\":\"log\"}}" "{\"@timestamp\"
:\"2019-07-09T10:12:30.742Z\",\"@metadata\":{\"beat\":\"filebeat\",
\"type\":\"_doc\",\"version\":\"7.2.0\"},\"log\":{\"offset\":691176,\
"file\":{\"path\":\"/var/log/apache2/access.log\"}},\"message\":\
"110.249.212.46 - - [09/Jul/2019:10:12:28 +0000] \\\"
GET http://110.249.212.46/testget?q=23333&port=80 HTTP/1.1\\\"
400 0 \\\"-\\\" \\\"-\\\"\",\"input\":{\"type\":\"log\"},\"ecs\":
{\"version\":\"1.0.0\"},\"host\":{\"name\":\"ip-172-31-26-146\"},
\"agent\":{\"version\":\"7.2.0\",\"type\":\"filebeat\",\"ephemeral_id\":
\"9df401b8-38ed-4c57-8119-88f72caea021\",\"hostname\":\"ip-172-31-26-146\"
,\"id\":\"736b2ac9-9062-4705-9405-f2233250a82e\"}}"

So we know Filebeat is collecting our logs and pushing them into the Redis list. It’s now time to start Logstash:

sudo service logstash start

After a few seconds, Logstash is started and the Redis monitor will report…

1562695696.555882 [0 127.0.0.1:34464] "script" "load" "local batchsize
= tonumber(ARGV[1])\n local result = redis.call('lrange', KEYS[1], 0,
 batchsize)\n redis.call('ltrim', KEYS[1], batchsize + 1, -1)\n
return result\n"
1562695696.645514 [0 127.0.0.1:34464] "evalsha"
"3236c446d3b876265fe40ac665cb6dc17e6242b0" "1" "apache" "124"
1562695696.645578 [0 lua] "lrange" "apache" "0" "124"
1562695696.645630 [0 lua] "ltrim" "apache" "125" "-1"

It looks like our pipeline is working, but to make sure Logstash is indeed aggregating the data and shipping it into Elasticsearch, use:

curl -X GET "localhost:9200/_cat/indices?v"

If all is working as expected, you should see a logstash-* index listed:

health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_task_manager       EBqPqbkDS4eRBN8F7kQYrw   1   0          2            0     45.5kb         45.5kb
yellow open   logstash-2019.07.09-000001 53zuzPvJQGeVy43qw7gLnA   1   1       3488            0    945.4kb        945.4kb
green  open   .kibana_1                  -jmBDdBVS9SiIhvuaIOj_A   1   0          4            0     15.4kb         15.4kb
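
To watch the document count grow as new log lines arrive, you can also query the index directly (the index name pattern below matches the listing above):

curl -X GET "localhost:9200/logstash-*/_count?pretty"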

All we have to do now is define the index pattern in Kibana to begin analysis. This is done under Management → Kibana Index Patterns.
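
If you prefer to script this step, the same index pattern can be created through Kibana’s saved objects API. This is a sketch assuming Kibana 7.x; the index pattern ID (“logstash-apache”) is an arbitrary name chosen for the example:

curl -X POST "localhost:5601/api/saved_objects/index-pattern/logstash-apache" \
  -H 'kbn-xsrf: true' -H 'Content-Type: application/json' \
  -d '{"attributes": {"title": "logstash-*", "timeFieldName": "@timestamp"}}'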

Kibana will identify the index, so simply define it in the relevant field and continue on to the next step of selecting the timestamp field.

Once you create the index pattern, you’ll see a list of all the parsed and mapped fields.

Open the Discover page to begin analyzing your data!

Summing it up

The last thing you need when troubleshooting an issue in production is your logging pipelines crashing. Unfortunately, when issues occur is precisely the time when all the components in the ELK Stack come under pressure.

Message brokers like Redis and Kafka help absorb sudden data bursts and relieve the pressure on downstream components. There are of course some differences between these two tools, and I recommend taking a look at this article to help you choose between them.

Happy message brokering!
