Deploying Redis with the ELK Stack

July 16, 2019

    In a previous post, I explained the role Apache Kafka plays in production-grade ELK deployments, as a message broker and a transport layer deployed in front of Logstash. As I mentioned in that piece, Redis is another common option. I recently found out that it is even more popular than Kafka!

    Known for its flexibility, performance and wide language support, Redis is used as a database and cache, and also as a message broker. For ELK-based data pipelines, Redis can be placed between Beats and Logstash as a buffering layer, giving downstream components a better chance of processing and indexing the data successfully.

    In this article, I’ll show how to deploy all the components required to set up a data pipeline using the ELK Stack and Redis:

    • Filebeat – to collect logs and forward them to Redis
    • Redis – to broker the data flow and queue it
    • Logstash – to subscribe to Redis, process the data and ship it to Elasticsearch
    • Elasticsearch – to index and store the data
    • Kibana – to analyze the data.

    [Figure: Beats to Redis]

    My setup

    I installed all the pipeline components on a single Ubuntu 18.04 machine on Amazon EC2 using local storage. Of course, in real-life scenarios, you will probably have some or all of these components installed on separate machines.

    I started the instance in the public subnet of a VPC and then set up a security group to enable access from anywhere using SSH and TCP 5601 (for Kibana). Finally, I added a new elastic IP address and associated it with the running instance.

    The example logs used for the tutorial are Apache access logs.

    Step 1: Installing Elasticsearch

    Let’s start with installing the main component in the ELK Stack — Elasticsearch. Since version 7.x, Elasticsearch is bundled with Java so we can jump right ahead with adding Elastic’s signing key:

    wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
    

    For installing Elasticsearch on Debian, we also need to install the apt-transport-https package:

    sudo apt-get update
    sudo apt-get install apt-transport-https

    Our next step is to add the repository definition to our system:

    echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

    All that’s left to do is to update your repositories and install Elasticsearch:

    sudo apt-get update && sudo apt-get install elasticsearch
    

    Before we bootstrap Elasticsearch, we need to apply some basic configurations in the Elasticsearch configuration file at /etc/elasticsearch/elasticsearch.yml:

    sudo su
    vim /etc/elasticsearch/elasticsearch.yml

    Since we are installing Elasticsearch on AWS, we will bind Elasticsearch to localhost. Also, we need to define the private IP of our EC2 instance as a master-eligible node:

    network.host: "localhost"
    http.port: 9200
    cluster.initial_master_nodes: ["<AWSInstancePrivateIP>"]

    Save the file and run Elasticsearch with:

    sudo service elasticsearch start

    To confirm that everything is working as expected, point curl to: http://localhost:9200, and you should see something like the following output (allow a minute or two for Elasticsearch to start):

    {
      "name" : "ip-172-31-26-146",
      "cluster_name" : "elasticsearch",
      "cluster_uuid" : "Oz1na_L6RaWk4euSp1GTgQ",
      "version" : {
        "number" : "7.2.0",
        "build_flavor" : "default",
        "build_type" : "deb",
        "build_hash" : "508c38a",
        "build_date" : "2019-06-20T15:54:18.811730Z",
        "build_snapshot" : false,
        "lucene_version" : "8.0.0",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }
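
    The check can also be scripted. The following is a minimal Python sketch (standard library only), assuming Elasticsearch answers on http://localhost:9200 as configured above. The function names parse_es_info and fetch_es_info are made up for this illustration and are not part of any Elastic tooling.

```python
import json
import urllib.request


def parse_es_info(body):
    """Return (node name, version number) from the JSON served at the root URL."""
    info = json.loads(body)
    return info["name"], info["version"]["number"]


def fetch_es_info(url="http://localhost:9200", timeout=5):
    """Request the root endpoint and parse the reply.

    The URL is an assumption taken from the configuration in this post.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_es_info(resp.read().decode("utf-8"))
```

    Calling fetch_es_info() on the instance should return the node name and version once Elasticsearch has finished starting. While it is still starting, the call typically fails with a connection error, so it may be worth retrying after a short pause.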

    Step 2: Installing Logstash

    Next up, the “L” in ELK — Logstash. Logstash will require us to install Java 8:

    sudo apt-get install default-jre

    Verify Java is installed:

    java -version
    
    openjdk version "1.8.0_191"
    OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
    OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
    

    To install Logstash, and since we already defined the repository in the system, simply run:

    sudo apt-get install logstash

    Next, we will configure a Logstash pipeline that pulls our logs from a Redis list, processes these logs and ships them on to Elasticsearch for indexing.

    Let’s create a new config file:

    sudo vim /etc/logstash/conf.d/apache.conf

    Paste the following configurations:

    input {
      redis {
        host => "localhost"
        key => "apache"
        data_type => "list"
      }
    }

    filter {
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
      date {
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
      geoip {
        source => "clientip"
      }
    }

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
      }
    }

    As you can see, we’re using the Logstash Redis input plugin to define the Redis host and the specific Redis key we want Logstash to pull from. The data_type setting is set to list, which means Logstash treats that key as a Redis list and uses the BLPOP operation to pull events from it.
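
    To make the filter section more concrete, here is a rough Python sketch of what the grok and date filters do to a single access log line. The regular expression is a simplified stand-in written for this example, not the actual COMBINEDAPACHELOG pattern, and parse_access_line is a hypothetical helper. The geoip lookup is left out because it needs a GeoIP database.

```python
import re
from datetime import datetime

# Simplified stand-in for COMBINEDAPACHELOG (an assumption for illustration;
# the real grok pattern is more permissive).
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_line(line):
    """Split one access log line into named fields and parse its timestamp."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None  # Logstash would tag such an event with _grokparsefailure
    event = match.groupdict()
    # strptime equivalent of the date filter pattern "dd/MMM/yyyy:HH:mm:ss Z"
    event["@timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return event


sample = ('110.249.212.46 - - [09/Jul/2019:10:12:28 +0000] '
          '"GET http://110.249.212.46/testget?q=23333&port=80 HTTP/1.1" '
          '400 0 "-" "-"')
event = parse_access_line(sample)
print(event["clientip"], event["response"], event["@timestamp"].isoformat())
# prints: 110.249.212.46 400 2019-07-09T10:12:28+00:00
```

    The point of the date filter is visible in the last field: the event is stamped with the time the request was logged by Apache, not the time Logstash happened to process it.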

    Save the file. We will start Logstash later, when we have all the other pieces of the puzzle ready.

    Step 3: Installing Kibana

    Let’s move on to the next component in the ELK Stack — Kibana. As before, we will use a simple apt command to install Kibana:

    sudo apt-get install kibana

    We will then open up the Kibana configuration file at: /etc/kibana/kibana.yml, and make sure we have the correct configurations defined:

    server.port: 5601
    elasticsearch.hosts: ["http://localhost:9200"]

    These specific configurations tell Kibana which Elasticsearch to connect to and which port to use.

    Now, we can start Kibana with:

    sudo service kibana start

    Open up Kibana in your browser with: http://localhost:5601. You will be presented with the Kibana home page.

    Of course, we have no data to analyze yet, but we’re getting there. Bear with me!

    Step 4: Installing Filebeat

    To collect our Apache access logs, we will be using Filebeat.

    To install Filebeat, we will use:

    sudo apt-get install filebeat

    Let’s open the Filebeat configuration file at: /etc/filebeat/filebeat.yml

    sudo vim /etc/filebeat/filebeat.yml

    Enter the following configurations:

    filebeat.inputs:
    - type: log
      enabled: true
      paths:
        - /var/log/apache2/access.log
    
    output.redis:
      hosts: ["localhost"]
      key: "apache"
      db: 0
      timeout: 5
      data_type: "list"

    In the input section, we are telling Filebeat what logs to collect: Apache access logs. In the output section, we are telling Filebeat to forward the data to our local Redis server and which key to push it to, “apache”.

    The data_type setting is set to list, which in this case means that Filebeat will use RPUSH to push the logs into the Redis list stored under that key.
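
    Why a list works as a queue can be shown with a small in-memory model. The sketch below is plain Python with no Redis server involved; the ListQueue class is a hypothetical stand-in for a single Redis list key, written only to illustrate that pushing on the right and popping from the left gives first-in, first-out ordering.

```python
from collections import deque


class ListQueue:
    """In-memory stand-in for one Redis list key (illustration only)."""

    def __init__(self):
        self._items = deque()

    def rpush(self, *values):
        """Append values to the tail, as RPUSH does, and return the new length."""
        self._items.extend(values)
        return len(self._items)

    def lpop(self):
        """Remove and return the head, or None when the list is empty."""
        return self._items.popleft() if self._items else None

    def llen(self):
        """Return the number of queued items, as LLEN does."""
        return len(self._items)


queue = ListQueue()
queue.rpush("log line 1", "log line 2")  # producer side (Filebeat)
queue.rpush("log line 3")
first = queue.lpop()                     # consumer side (Logstash)
print(first, queue.llen())
# prints: log line 1 2
```

    One difference from the real thing: BLPOP blocks until an item arrives, while this lpop simply returns None on an empty queue.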

    Save the file but don’t start Filebeat yet.

    Step 5: Installing Redis

    Last but not least, our final installation step: Redis.

    Install Redis with:

    sudo apt install redis-server

    And start it using:

    sudo service redis start

    To make sure all is running as expected, open a second terminal to access the Redis CLI with:

    redis-cli
    
    127.0.0.1:6379>
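
    If you prefer a scripted check over the interactive prompt, a PING can be sent over a plain socket. This is a minimal sketch using only the Python standard library, assuming Redis listens on localhost:6379 without a password; parse_simple_reply and redis_ping are names invented for this example.

```python
import socket


def parse_simple_reply(reply):
    """Decode a RESP simple string such as +PONG; raise on an error reply."""
    text = reply.decode("utf-8").rstrip("\r\n")
    if text.startswith("+"):
        return text[1:]
    if text.startswith("-"):
        raise RuntimeError(text[1:])
    raise ValueError(f"unexpected reply: {text!r}")


def redis_ping(host="localhost", port=6379, timeout=2.0):
    """Send PING as an inline command and return the server's answer.

    Host and port are assumptions matching the default local install.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"PING\r\n")
        return parse_simple_reply(conn.recv(64))
```

    With the server running, redis_ping() should return "PONG". A refused connection means Redis is not listening on that port.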

    Step 6: Starting the data pipeline

    Finally, now that we have all the components we need in place, it’s time to start our data pipeline.

    Before we do that, in our second terminal, let’s access the Redis-CLI monitor mode to be able to see all the Redis operations taking place. This is done by simply entering the following command:

    monitor

    For now, all you’ll see is an OK message:

    OK

    Now, let’s switch terminals and start Filebeat:

    sudo service filebeat start

    As soon as a new Apache access log is collected by Filebeat, the Redis monitor will report that it has been pushed using RPUSH into the “apache” list:

    1562667208.214860 [0 127.0.0.1:34254] "PING"
    1562667208.215050 [0 127.0.0.1:34254] "INFO"
    1562667208.215416 [0 127.0.0.1:34254] "RPUSH" "apache"
    "{\"@timestamp\":\"2019-07-09T10:12:30.742Z\",\"@metadata\":{\"beat\
    ":\"filebeat\",\"type\":\"_doc\",\"version\":\"7.2.0\"},\"agent\":{\"id\"
    :\"736b2ac9-9062-4705-9405-f2233250a82e\",\"version\":\"7.2.0\",\"type\":
    \"filebeat\",\"ephemeral_id\":\"9df401b8-38ed-4c57-8119-88f72caea021\",
    \"hostname\":\"ip-172-31-26-146\"},\"ecs\":{\"version\":\"1.0.0\"},
    \"host\":{\"name\":\"ip-172-31-26-146\"},\"log\":{\"file\":{\"path\":
    \"/var/log/apache2/access.log\"},\"offset\":691053},\"message\":
    \"110.249.212.46 - - [09/Jul/2019:10:12:28 +0000] \\\"GET http:/
    /110.249.212.46/testget?q=23333&port=80 HTTP/1.1\\\" 400 0 \\\"-\\\"
    \\\"-\\\"\",\"input\":{\"type\":\"log\"}}" "{\"@timestamp\"
    :\"2019-07-09T10:12:30.742Z\",\"@metadata\":{\"beat\":\"filebeat\",
    \"type\":\"_doc\",\"version\":\"7.2.0\"},\"log\":{\"offset\":691176,\
    "file\":{\"path\":\"/var/log/apache2/access.log\"}},\"message\":\
    "110.249.212.46 - - [09/Jul/2019:10:12:28 +0000] \\\"
    GET http://110.249.212.46/testget?q=23333&port=80 HTTP/1.1\\\"
    400 0 \\\"-\\\" \\\"-\\\"\",\"input\":{\"type\":\"log\"},\"ecs\":
    {\"version\":\"1.0.0\"},\"host\":{\"name\":\"ip-172-31-26-146\"},
    \"agent\":{\"version\":\"7.2.0\",\"type\":\"filebeat\",\"ephemeral_id\":
    \"9df401b8-38ed-4c57-8119-88f72caea021\",\"hostname\":\"ip-172-31-26-146\"
    ,\"id\":\"736b2ac9-9062-4705-9405-f2233250a82e\"}}"
    

    So we know Filebeat is collecting our logs and pushing them to a Redis list. It’s now time to start Logstash:

    sudo service logstash start

    After a few seconds, Logstash is started and the Redis monitor will report…

    1562695696.555882 [0 127.0.0.1:34464] "script" "load" "local batchsize
    = tonumber(ARGV[1])\n local result = redis.call('lrange', KEYS[1], 0,
     batchsize)\n redis.call('ltrim', KEYS[1], batchsize + 1, -1)\n
    return result\n"
    1562695696.645514 [0 127.0.0.1:34464] "evalsha"
    "3236c446d3b876265fe40ac665cb6dc17e6242b0" "1" "apache" "124"
    1562695696.645578 [0 lua] "lrange" "apache" "0" "124"
    1562695696.645630 [0 lua] "ltrim" "apache" "125" "-1"
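
    The monitor output shows Logstash loading a small Lua script and calling it with the argument 124: the script reads list indexes 0 through 124 with LRANGE and then removes them with LTRIM, which appears to correspond to a batch of 125 events per call. The Python sketch below mimics that logic on an ordinary list. The pop_batch function is a hypothetical helper, and unlike a Redis script it makes no claim to atomicity.

```python
def pop_batch(items, batchsize):
    """Mimic the Lua script from the monitor output on a plain Python list.

    LRANGE key 0 batchsize returns indexes 0..batchsize inclusive, and
    LTRIM key batchsize+1 -1 keeps only the entries after them.
    """
    batch = items[: batchsize + 1]      # lrange KEYS[1] 0 batchsize
    remaining = items[batchsize + 1:]   # ltrim KEYS[1] batchsize+1 -1
    return batch, remaining


pending = [f"event-{i}" for i in range(300)]
batch, pending = pop_batch(pending, 124)
print(len(batch), len(pending), pending[0])
# prints: 125 175 event-125
```

    Reading and trimming in a single script call means a whole batch leaves the queue in one round trip instead of one pop per event.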

    It looks like our pipeline is working but to make sure Logstash is indeed aggregating the data and shipping it into Elasticsearch, use:

    curl -X GET "localhost:9200/_cat/indices?v"

    If all is working as expected, you should see a logstash-* index listed:

    health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    green  open   .kibana_task_manager       EBqPqbkDS4eRBN8F7kQYrw   1   0          2            0     45.5kb         45.5kb
    yellow open   logstash-2019.07.09-000001 53zuzPvJQGeVy43qw7gLnA   1   1       3488            0    945.4kb        945.4kb
    green  open   .kibana_1                  -jmBDdBVS9SiIhvuaIOj_A   1   0          4            0     15.4kb         15.4kb
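
    If you want to check for the index from a script rather than by eye, the tabular output is easy to parse. The sketch below is a minimal Python example that works on the text returned by the _cat/indices?v call; logstash_doc_counts is a hypothetical helper, and it assumes every listed index is open so that each row has a value in every column.

```python
def logstash_doc_counts(cat_output):
    """Map each logstash-* index in `_cat/indices?v` output to its docs.count."""
    lines = [line for line in cat_output.splitlines() if line.strip()]
    header = lines[0].split()
    counts = {}
    for line in lines[1:]:
        # Pair each column name with the value in the same position.
        row = dict(zip(header, line.split()))
        if row["index"].startswith("logstash-"):
            counts[row["index"]] = int(row["docs.count"])
    return counts


sample = """\
health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana_task_manager       EBqPqbkDS4eRBN8F7kQYrw   1   0          2            0     45.5kb         45.5kb
yellow open   logstash-2019.07.09-000001 53zuzPvJQGeVy43qw7gLnA   1   1       3488            0    945.4kb        945.4kb
green  open   .kibana_1                  -jmBDdBVS9SiIhvuaIOj_A   1   0          4            0     15.4kb         15.4kb
"""
print(logstash_doc_counts(sample))
# prints: {'logstash-2019.07.09-000001': 3488}
```

    A document count that keeps growing between two calls is a good sign that events are flowing all the way from Filebeat to Elasticsearch.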

    All we have to do now is define the index pattern in Kibana to begin analysis. This is done under Management → Kibana Index Patterns.

    Kibana will identify the index, so simply define it in the relevant field and continue on to the next step of selecting the timestamp field.

    Once you create the index pattern, you’ll see a list of all the parsed and mapped fields.

    Open the Discover page to begin analyzing your data!

    [Figure: Kibana Discover]

    Summing it up

    The last thing you need when troubleshooting an issue in production is your logging pipeline crashing. Unfortunately, the moments when issues occur are precisely when all the components in the ELK Stack come under pressure.

    Message brokers like Redis and Kafka help absorb sudden data bursts and relieve the pressure on downstream components. There are of course some differences between these two tools, and I recommend taking a look at this article to help you choose between them.

    Happy message brokering!
