The current popularity of Redis is well deserved; it's one of the best caching engines available, and it addresses numerous use cases, including distributed locking, geospatial indexing, rate limiting, and more. Redis is so widely used today that many major cloud providers, including the Big 3, offer it as a managed service. In this article, we'll look at how to monitor Redis performance using Prometheus, the similarly popular open-source monitoring system.
Redis produces a wealth of operational data, and data in such quantities requires effective management and monitoring. This is where Prometheus proves extremely helpful: it can collect, store, query, and alert on metrics from the targets defined in its configuration file.
With Prometheus, we can write queries using PromQL (Prometheus Query Language) and then monitor those metrics, including via related alerts, with the help of Alertmanager. To get started monitoring Redis with Prometheus, we need an exporter that can extract data from Redis and expose it as Prometheus metrics. The open-source redis_exporter project does the job. Next, let's set up Prometheus, starting with installing the Redis exporter.
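As a quick taste of PromQL, here are a couple of example queries against metric names exposed by redis_exporter (names can vary slightly between exporter versions, so treat these as a sketch):

```promql
# Is the Redis instance reachable? (1 = up, 0 = down)
redis_up

# Current number of clients connected to Redis
redis_connected_clients
```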
Installing the exporter on virtual machines is a two-step process:
1. Build and install the Redis exporter.
```shell
git clone https://github.com/oliver006/redis_exporter.git
cd redis_exporter
go build .
./redis_exporter --version
```
This will run the exporter on our virtual machine. We can then check port 9121 for the metrics endpoint.
2. Configure Prometheus to start scraping this target, using the following configuration:
```yaml
scrape_configs:
  - job_name: redis_exporter
    static_configs:
      - targets: ['<<REDIS-EXPORTER-HOSTNAME>>:9121']
```
Prometheus will now begin scraping the metrics. We can then plot them in an open-source Grafana dashboard built for the Redis exporter.
For the widely used Kubernetes open-source container orchestration system, we have to run Redis in pods. To do so, we'll attach a disk to the Redis pod for persistent data and expose Redis port 6379. The Redis exporter needs to talk to Redis in order to run queries and pull metrics. There are two ways to run the exporter:
1. Deploy a sidecar exporter in the same Redis pod. This way, the Redis exporter can talk to the Redis service on the localhost port.
2. Run a separate set of pods and point it at the Redis instance. With this option, we have to supply the Redis endpoint to the Redis exporter ourselves, which generally makes it more involved than the first method.
At the same time, the first method is usually the better choice, because the Redis exporter can simply reach Redis on localhost port 6379. The second option can prove useful if we are concerned that the Redis sidecar might cause issues with the main container. It's also a good choice if we are running Redis outside of Kubernetes but want to monitor it using an exporter running on Kubernetes.
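For the second option, a minimal sketch of a standalone exporter Deployment might look like the following. The `redis.default.svc` hostname is an assumption for illustration; `REDIS_ADDR` is the environment variable redis_exporter reads for the target address:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter
  template:
    metadata:
      labels:
        app: redis-exporter
    spec:
      containers:
        - name: redis-exporter
          image: oliver006/redis_exporter
          env:
            # Hypothetical endpoint; replace with your actual Redis address
            - name: REDIS_ADDR
              value: "redis://redis.default.svc:6379"
          ports:
            - containerPort: 9121
              name: exporter-port
```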
Here are the configurations we need in order to run the Redis exporter pod as a sidecar container.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "redis"
  replicas: 2
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:latest
          ports:
            - containerPort: 6379
              name: redis-port
          volumeMounts:
            - name: data
              mountPath: /data
        - name: redis-exporter
          image: oliver006/redis_exporter
          ports:
            - containerPort: 9121
              name: exporter-port
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
```
With the above YAML configuration, we can now deploy Redis as a stateful application on Kubernetes running two containers: one serves Redis traffic on port 6379, and the other exports Redis metrics on port 9121.
After this, there are two ways we can scrape metrics with Prometheus, depending on whether we're running a plain Prometheus server on Kubernetes or running it via the Prometheus Operator. In the former case, we can rely on scrape annotations or write the scrape configuration in the Prometheus configuration file. In the latter case, we need to use the ServiceMonitor concept for our metrics. Let's explore both methods.
This one’s easy. We simply need to put the following annotation on our pod and Prometheus will start scraping the metrics from that pod.
```yaml
annotations:
  prometheus.io/path: /metrics
  prometheus.io/scrape: "true"
```
When we're running the Prometheus Operator, we first need to deploy ServiceMonitor objects. Once the below YAML is deployed, Prometheus will start scraping the metrics from the Redis exporters.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: prometheus-redis-exporter
  name: redis-exporter-prometheus-redis-exporter
  namespace: monitor
spec:
  endpoints:
    - interval: 15s
      port: redis-exporter
  selector:
    matchLabels:
      app: redis
```
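Note that a ServiceMonitor selects a Service, and its `port` field refers to a named Service port. A hedged sketch of a matching Service, with names assumed to line up with the StatefulSet above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: redis
spec:
  selector:
    app: redis
  ports:
    # This name must match the ServiceMonitor's endpoint port
    - name: redis-exporter
      port: 9121
      targetPort: 9121
```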
Now that we’re familiar with the two common modes of monitoring Redis with help of Prometheus on Kubernetes, let’s look at which metrics we need to monitor closely to ensure we’re implementing related best practices.
This is the number of clients connected to the Redis server. Any abnormal increase in this value can indicate an increase in application server instances or connections not being closed properly. The default maximum of connected clients is 10,000. It is important to monitor this metric because once this limit is reached, no further clients can connect to Redis, and errors may appear on the client side.
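As an illustration, a hedged Prometheus alerting rule on this metric might look like the following; the 8,000 threshold is an arbitrary example chosen to fire before the 10,000 default limit is hit:

```yaml
groups:
  - name: redis
    rules:
      - alert: RedisTooManyConnections
        # redis_connected_clients is exposed by redis_exporter
        expr: redis_connected_clients > 8000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis is approaching its connected-clients limit"
```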
This metric is relevant only when we are running Redis in cluster mode. It provides an overview of the health of the entire cluster, such as whether the cluster is in an ok or failed state.
Network utilization indicates the amount of data being transferred in and out of the Redis server. It's important to make sure that this value remains below the maximum bandwidth of the machines in our cluster or master/slave setup.
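A hedged example of tracking network throughput using the byte counters exposed by redis_exporter:

```promql
# Bytes received by Redis per second, averaged over five minutes
rate(redis_net_input_bytes_total[5m])

# Bytes sent by Redis per second
rate(redis_net_output_bytes_total[5m])
```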
Redis keeps all of its data in memory. The benefit is that the data can be served quickly, but the potential pitfall is that we need to be careful not to exceed the maximum memory that is allocated to the involved machine. Exceeding the memory limit would slow down reads, and writes may even cease altogether, depending upon our eviction configuration.
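One hedged way to watch this is the ratio of used to maximum memory. Note that `redis_memory_max_bytes` reports 0 when no `maxmemory` is configured, so this ratio only makes sense with a limit set:

```promql
# Fraction of configured maxmemory currently in use
redis_memory_used_bytes / redis_memory_max_bytes
```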
Redis continuously persists the data that is in the memory to the disk, so that it can recover that data when needed. During a BGSAVE, which results in a background database backup, Redis requires extra memory. This can cause problems if the throughput increases while BGSAVE is being performed, since increased throughput also draws on memory.
When monitoring Redis memory consumption, we need to monitor BGSAVE memory demands as part of our calculations. Our combined memory usage should be less than Redis’ maximum memory.
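Alongside memory, it can be worth watching how recently persistence succeeded. As a sketch, using the RDB timestamp metric exposed by recent redis_exporter versions:

```promql
# Seconds since the last successful RDB save
time() - redis_rdb_last_save_timestamp_seconds
```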
Evicted keys are keys that Redis removes when memory utilization hits the configured maximum, according to the eviction policy. Keeping an eye on them ensures that even if memory utilization maxes out, only these evicted keys will be lost and, importantly, our writes will continue on our clusters or master/slave setups.
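A hedged query for spotting eviction activity:

```promql
# Keys evicted per second; a sustained non-zero rate signals memory pressure
rate(redis_evicted_keys_total[5m])
```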
These are the numbers of GET and SET operations that we are performing on our cluster, which can also tell us about the data access pattern. Redis is typically used as a cache, which means there should be more GET operations than SET operations. If we see more SET operations, it's time to fundamentally rethink the way we're using Redis.
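redis_exporter breaks command counts out by a `cmd` label, which also lets us sketch a cache hit ratio; metric names here are as exposed by redis_exporter and may vary by version:

```promql
# GET and SET operations per second
rate(redis_commands_total{cmd="get"}[5m])
rate(redis_commands_total{cmd="set"}[5m])

# Cache hit ratio over the last five minutes
rate(redis_keyspace_hits_total[5m])
  / (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m]))
```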
These are the numbers of queries that produced slow results. It's important to identify whether latency is a result of bad queries or a bad data pattern in Redis. Redis itself doesn't offer a direct latency metric, but its slow log can indicate where latency may be found. We also need to instrument latency from the application side, as slow logs do not include the network latency that the application sees.
Throughput indicates the number of commands that are issued to Redis. This helps us to identify how much data Redis is processing over a specific time period.
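A hedged example of measuring throughput with the command counter exposed by redis_exporter:

```promql
# Total commands processed per second
rate(redis_commands_processed_total[1m])
```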
As we’ve seen, connected clients, cluster status, network I/O, memory utilization (memory for Redis data and BGSAVE), eviction keys, GET/SET operations, slow logs/latency, and throughput are the most important metrics to review. Any anomaly in these metrics indicates a reduction in Redis’ performance, meaning that a change in one of these key metrics requires proper investigation and resolution.
Prometheus is easy to install and run for a small-scale operation. However, as we scale to larger volumes, Prometheus becomes considerably more complex to manage. We need to think about caching, high availability, horizontal scaling, and so on, and may need to employ additional tools, such as Thanos and Trickster, to cover the broader use case.
Additionally, Prometheus separates metric data from log and trace data, which are also needed to gain full observability into Redis health and performance. Monitoring Redis is far easier and much more meaningful when logs, metrics, and traces are unified in a single location.
To make scaling manageable, we can send our metrics to Logz.io, a full-stack observability platform designed with scaling in mind. This way we can manage metrics alongside related logs and traces to enable rapid data correlation.
Sending metrics to Logz.io is simple; it just requires us to leverage Prometheus' own remote write feature. Here's how:
1. Create a Logz.io account.
2. Select the correct region and listener.
3. Get a metrics account token to push metrics to Logz.io. This token can be found under Settings > Manage tokens > Data shipping tokens > Metrics.
4. Configure Prometheus to start sending the metrics to Logz.io servers using the following configs:
```yaml
global:
  external_labels:
    p8s_logzio_name: <labelvalue>
remote_write:
  - url: https://<<LISTENER-HOST>>:8053
    bearer_token: <<PROMETHEUS-METRICS-SHIPPING-TOKEN>>
    remote_timeout: 30s
    queue_config:
      batch_send_deadline: 5s    # default = 5s
      max_shards: 10             # default = 1000
      min_shards: 1
      max_samples_per_send: 500  # default = 100
      capacity: 10000            # default = 500
```
5. Restart the Prometheus server. Metrics will now flow into Logz.io, where dashboards can be built using Metrics Explore.
Now that we’re up and running, we can build alerts on top of these metrics for real-time notification of anomalies, enabling fast remediation and performance improvements.
Below is a Redis monitoring dashboard we built in Logz.io.
Because Redis is a critical infrastructure component, any issues with its performance can have a knock-on effect, leading to problems such as increased latency and higher database call volumes. Prometheus is a useful time-series database for storing metrics, but it has some limitations when applied at scale.
Offering easy integration, rapid time to value, and Prometheus-as-a-Service as part of our Open 360™ platform, Logz.io helps dramatically in managing Prometheus metrics and greatly simplifies monitoring at scale. Logz.io also offers Redis logging using the ELK stack. Check out a demo or get started with a Logz.io free trial today!