Deploying Redis with the ELK Stack
In a previous post, I explained the role Apache Kafka plays in production-grade ELK deployments, as a message broker and a transport layer deployed in front of Logstash. As I mentioned in that piece, Redis is another common option, and I recently found out that it is even more popular than Kafka for this use case.
Known for its flexibility, performance and wide language support, Redis is used not only as a database and cache but also as a message broker. In ELK-based data pipelines, Redis can be placed between Beats and Logstash as a buffering layer, giving downstream components a better chance of processing and indexing the data successfully.
In this article, I’ll show how to deploy all the components required to set up a data pipeline using the ELK Stack and Redis:
- Filebeat – to collect logs and forward them to Redis
- Redis – to broker the data flow and queue it
- Logstash – to subscribe to Redis, process the data and ship it to Elasticsearch
- Elasticsearch – to index and store the data
- Kibana – to analyze the data
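Putting it all together, the data will flow through the pipeline like this (the “apache” Redis key is the name we will configure in both Filebeat and Logstash later on):

Apache access logs → Filebeat → Redis (list “apache”) → Logstash → Elasticsearch → Kibana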
My setup
I installed all the pipeline components on a single Ubuntu 18.04 machine on Amazon EC2 using local storage. Of course, in real-life scenarios, you will probably have some or all of these components installed on separate machines.
I started the instance in the public subnet of a VPC and then set up a security group to enable access from anywhere using SSH and TCP 5601 (for Kibana). Finally, I added a new elastic IP address and associated it with the running instance.
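If you prefer the AWS CLI to the console, the equivalent security group rules look something like the sketch below (<YourSecurityGroupID> is a placeholder for your own group ID; opening ports to 0.0.0.0/0 is fine for a demo but not for production):

aws ec2 authorize-security-group-ingress --group-id <YourSecurityGroupID> --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id <YourSecurityGroupID> --protocol tcp --port 5601 --cidr 0.0.0.0/0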
The example logs used for the tutorial are Apache access logs.
Step 1: Installing Elasticsearch
Let’s start with installing the main component in the ELK Stack — Elasticsearch. Since version 7.x, Elasticsearch comes bundled with Java, so we can jump right ahead and add Elastic’s signing key:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Since we are installing Elasticsearch on a Debian-based system, we also need to install the apt-transport-https package:
sudo apt-get update
sudo apt-get install apt-transport-https
Our next step is to add the repository definition to our system:
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
All that’s left to do is to update your repositories and install Elasticsearch:
sudo apt-get update && sudo apt-get install elasticsearch
Before we bootstrap Elasticsearch, we need to apply some basic configurations using the Elasticsearch configuration file at: /etc/elasticsearch/elasticsearch.yml:
sudo vim /etc/elasticsearch/elasticsearch.yml
Since we are installing Elasticsearch on AWS, we will bind Elasticsearch to localhost. Also, we need to define the private IP of our EC2 instance as a master-eligible node:
network.host: "localhost"
http.port: 9200
cluster.initial_master_nodes: ["<AWSInstancePrivateIP>"]
Save the file and run Elasticsearch with:
sudo service elasticsearch start
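If you also want Elasticsearch to come back up automatically after a reboot (Ubuntu 18.04 uses systemd), enable the service as well:

sudo systemctl enable elasticsearch.service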
To confirm that everything is working as expected, point curl to: http://localhost:9200, and you should see something like the following output (allow a minute or two for Elasticsearch to start):
{ "name" : "ip-172-31-26-146", "cluster_name" : "elasticsearch", "cluster_uuid" : "Oz1na_L6RaWk4euSp1GTgQ", "version" : { "number" : "7.2.0", "build_flavor" : "default", "build_type" : "deb", "build_hash" : "508c38a", "build_date" : "2019-06-20T15:54:18.811730Z", "build_snapshot" : false, "lucene_version" : "8.0.0", "minimum_wire_compatibility_version" : "6.8.0", "minimum_index_compatibility_version" : "6.0.0-beta1" }, "tagline" : "You Know, for Search" }
Step 2: Installing Logstash
Next up, the “L” in ELK — Logstash. Logstash requires Java (Java 8 or Java 11 for this version), so let’s install a JRE first:
sudo apt-get install default-jre
Verify Java is installed:
java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
Since we already defined the Elastic repository on the system, installing Logstash is as simple as:
sudo apt-get install logstash
Next, we will configure a Logstash pipeline that pulls our logs from a Redis channel, processes these logs and ships them on to Elasticsearch for indexing.
Let’s create a new config file:
sudo vim /etc/logstash/conf.d/apache.conf
Paste the following configurations:
input {
  redis {
    host => "localhost"
    key => "apache"
    data_type => "list"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
As you can see — we’re using the Logstash Redis input plugin to define the Redis host and the specific Redis key we want Logstash to pull from. The data_type setting is set to list, which means Logstash will use the BLPOP operation to pop events off the Redis list.
Save the file. We will start Logstash later, when we have all the other pieces of the puzzle ready.
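If you want to validate the pipeline syntax before then, Logstash can test a configuration and exit without running it (the paths below assume the standard deb package layout):

sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t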
Step 3: Installing Kibana
Let’s move on to the next component in the ELK Stack — Kibana. As before, we will use a simple apt command to install Kibana:
sudo apt-get install kibana
We will then open up the Kibana configuration file at: /etc/kibana/kibana.yml, and make sure we have the correct configurations defined:
server.port: 5601
elasticsearch.hosts: ["http://localhost:9200"]
These specific configurations tell Kibana which port to listen on and which Elasticsearch instance to connect to.
Now, we can start Kibana with:
sudo service kibana start
Open up Kibana in your browser with: http://localhost:5601. You will be presented with the Kibana home page. (If you are accessing the instance remotely, set server.host in kibana.yml to a non-loopback address and browse to the instance’s public IP instead.)
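If you’d rather verify from the terminal first, Kibana also exposes a status endpoint (a quick check, assuming the default port):

curl -s http://localhost:5601/api/status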
Of course, we have no data to analyze yet, but we’re getting there. Bear with me!
Step 4: Installing Filebeat
To collect our Apache access logs, we will be using Filebeat.
To install Filebeat, we will use:
sudo apt-get install filebeat
Let’s open the Filebeat configuration file at: /etc/filebeat/filebeat.yml
sudo vim /etc/filebeat/filebeat.yml
Enter the following configurations:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/apache2/access.log

output.redis:
  hosts: ["localhost"]
  key: "apache"
  db: 0
  timeout: 5
  data_type: "list"
In the input section, we are telling Filebeat what logs to collect — Apache access logs. In the output section, we are telling Filebeat to forward the data to our local Redis server and which Redis key to write to, “apache”.
The data_type setting is set to list, which in this case means that Filebeat will use RPUSH to push the logs onto the Redis list.
Save the file but don’t start Filebeat yet.
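Before moving on, you can optionally check that the file parses cleanly using Filebeat’s built-in test command:

sudo filebeat test config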
Step 5: Installing Redis
Last but not least, the final installation step — Redis.
Install Redis with:
sudo apt install redis-server
And start it using:
sudo service redis-server start
To make sure all is running as expected, open a second terminal to access the Redis CLI with:
redis-cli
127.0.0.1:6379>
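A quick PING from the prompt confirms the server is responsive:

127.0.0.1:6379> ping
PONG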
Step 6: Starting the data pipeline
Finally, now that we have all the components we need in place, it’s time to start our data pipeline.
Before we do that, let’s switch to our second terminal and put redis-cli into monitor mode so we can see all the Redis operations taking place. This is done by simply entering the following command:
monitor
For now, all you’ll see is an OK message:
OK
Now, let’s switch terminals and start Filebeat:
sudo service filebeat start
As soon as a new Apache access log is collected by Filebeat, the Redis monitor will report that it has been pushed onto the “apache” list using RPUSH:
1562667208.214860 [0 127.0.0.1:34254] "PING"
1562667208.215050 [0 127.0.0.1:34254] "INFO"
1562667208.215416 [0 127.0.0.1:34254] "RPUSH" "apache" "{\"@timestamp\":\"2019-07-09T10:12:30.742Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"7.2.0\"},\"agent\":{\"id\":\"736b2ac9-9062-4705-9405-f2233250a82e\",\"version\":\"7.2.0\",\"type\":\"filebeat\",\"ephemeral_id\":\"9df401b8-38ed-4c57-8119-88f72caea021\",\"hostname\":\"ip-172-31-26-146\"},\"ecs\":{\"version\":\"1.0.0\"},\"host\":{\"name\":\"ip-172-31-26-146\"},\"log\":{\"file\":{\"path\":\"/var/log/apache2/access.log\"},\"offset\":691053},\"message\":\"110.249.212.46 - - [09/Jul/2019:10:12:28 +0000] \\\"GET http://110.249.212.46/testget?q=23333&port=80 HTTP/1.1\\\" 400 0 \\\"-\\\" \\\"-\\\"\",\"input\":{\"type\":\"log\"}}" "{\"@timestamp\":\"2019-07-09T10:12:30.742Z\",\"@metadata\":{\"beat\":\"filebeat\",\"type\":\"_doc\",\"version\":\"7.2.0\"},\"log\":{\"offset\":691176,\"file\":{\"path\":\"/var/log/apache2/access.log\"}},\"message\":\"110.249.212.46 - - [09/Jul/2019:10:12:28 +0000] \\\"GET http://110.249.212.46/testget?q=23333&port=80 HTTP/1.1\\\" 400 0 \\\"-\\\" \\\"-\\\"\",\"input\":{\"type\":\"log\"},\"ecs\":{\"version\":\"1.0.0\"},\"host\":{\"name\":\"ip-172-31-26-146\"},\"agent\":{\"version\":\"7.2.0\",\"type\":\"filebeat\",\"ephemeral_id\":\"9df401b8-38ed-4c57-8119-88f72caea021\",\"hostname\":\"ip-172-31-26-146\",\"id\":\"736b2ac9-9062-4705-9405-f2233250a82e\"}}"
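Until Logstash starts, events simply accumulate in the list, so you can inspect the backlog from a separate redis-cli session with LLEN (the count shown here is just illustrative):

127.0.0.1:6379> llen apache
(integer) 124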
So we know Filebeat is collecting our logs and pushing them into Redis. It’s now time to start Logstash:
sudo service logstash start
After a few seconds, Logstash starts up and the Redis monitor reports it draining the list:
1562695696.555882 [0 127.0.0.1:34464] "script" "load" "local batchsize = tonumber(ARGV[1])\n local result = redis.call('lrange', KEYS[1], 0, batchsize)\n redis.call('ltrim', KEYS[1], batchsize + 1, -1)\n return result\n"
1562695696.645514 [0 127.0.0.1:34464] "evalsha" "3236c446d3b876265fe40ac665cb6dc17e6242b0" "1" "apache" "124"
1562695696.645578 [0 lua] "lrange" "apache" "0" "124"
1562695696.645630 [0 lua] "ltrim" "apache" "125" "-1"
As the monitor output shows, rather than popping events one at a time, Logstash loads a small Lua script that reads a batch of events with LRANGE and then trims them off the list with LTRIM. It looks like our pipeline is working, but to make sure Logstash is indeed aggregating the data and shipping it into Elasticsearch, use:
curl -X GET "localhost:9200/_cat/indices?v"
If all is working as expected, you should see a logstash-* index listed:
health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_task_manager       EBqPqbkDS4eRBN8F7kQYrw   1   0          2            0     45.5kb         45.5kb
yellow open   logstash-2019.07.09-000001 53zuzPvJQGeVy43qw7gLnA   1   1       3488            0    945.4kb        945.4kb
green  open   .kibana_1                  -jmBDdBVS9SiIhvuaIOj_A   1   0          4            0     15.4kb         15.4kb
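To peek at one of the actual indexed documents, you can pull a single hit from the new index:

curl -X GET "localhost:9200/logstash-*/_search?size=1&pretty"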
All we have to do now is define the index pattern in Kibana to begin analysis. This is done under Management → Kibana Index Patterns.
Kibana will identify the index, so simply define it in the relevant field and continue on to the next step of selecting the timestamp field.
Once you create the index pattern, you’ll see a list of all the parsed and mapped fields.
Open the Discover page to begin analyzing your data!
Summing it up
The last thing you need when troubleshooting an issue in production is your logging pipelines crashing. Unfortunately, when issues occur is precisely the time when all the components in the ELK Stack come under pressure.
Message brokers like Redis and Kafka help absorb sudden data bursts and relieve the pressure on downstream components. There are of course differences between these two tools, and I recommend taking a close look at how they compare before choosing between them.
Happy message brokering!