Monitoring Kubernetes with Grafana and InfluxDB
Most software architects and developers know that they need to monitor their systems. What often prevents them from implementing an effective monitoring solution is the plethora of choices they face. To set up a monitoring solution, you first have to decide what you want to monitor and how you want to monitor it. Then you have to settle on a collection and storage method. Next, you have to instrument your system with the metrics you care about and start collecting them. Finally, you have to figure out a way to visualize the collected metrics. The solution to the last problem can often be straightforward: just use Grafana. With its pluggable backends, Grafana can visualize data from multiple monitoring solutions. It presents nice dashboards and comes with built-in alerting. Plug in your favorite data source, and you’re ready to go.
InfluxDB is a time series database optimized for high-availability storage and rapid retrieval of time series data. It can work as a stand-alone solution, or it can be used to process data from Graphite. In addition to monitoring, InfluxDB is used for the Internet of things, sensor data, and home automation solutions. Alternatively, you can check our article on monitoring with Prometheus and our comparison of Metricbeat and Telegraf. This Grafana tutorial will describe the setup and configuration of an InfluxDB monitoring system used in conjunction with Grafana. Both will run inside a Kubernetes cluster.
Deploying InfluxDB and Grafana to Kubernetes
If you already have a cluster running, make sure your kubectl is configured to use it. If you don’t, you can use a local solution such as minikube, microk8s, or k3s. We will use Helm charts to install both InfluxDB and Grafana, so we first need the Helm client itself. On macOS or Linux, you can install it with Homebrew:
brew install helm
or for Windows:
choco install kubernetes-helm
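The next step is to add the stable chart repository and use Helm to install the charts for InfluxDB and Grafana. The stable repository has since been archived and moved to https://charts.helm.sh/stable, so substitute that URL if the one below is unreachable: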
helm repo add stable https://kubernetes-charts.storage.googleapis.com/
helm install stable/influxdb --name-template influxdb
helm install stable/grafana --name-template grafana
After a few minutes of installation, you should be able to forward the Grafana port and open the browser. The kubectl command for port-forwarding is:
kubectl port-forward $(kubectl get pods -l "app=grafana,release=grafana" -o jsonpath="{.items[0].metadata.name}") 3000:3000
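After opening the browser and navigating to http://localhost:3000, you should be welcomed by the Grafana login page. To access the panel, log in with the credentials that the chart stores in the grafana Secret under the keys admin-user and admin-password. The user name defaults to admin, and the password is generated at install time unless you set one explicitly.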
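The login credentials can be read straight from the cluster. The following is a minimal sketch, assuming the release is named grafana (as in the install command above) and lives in the current namespace, so that the chart's Secret is also called grafana. The helper name grafana_secret_value is made up for this example.

```shell
# Assumption: the Helm release is named "grafana", so the Secret created by the
# chart is also called "grafana" and sits in the current namespace.
# Secret values are base64-encoded, hence the decode step.
grafana_secret_value() {
  # $1 is the Secret key to read: "admin-user" or "admin-password"
  kubectl get secret grafana -o jsonpath="{.data.$1}" | base64 --decode
}

# Usage (requires access to the cluster):
#   grafana_secret_value admin-user
#   grafana_secret_value admin-password
```

If the Secret is not found, kubectl get secrets shows which name the chart actually used.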
Example Heapster Configuration
You can use Heapster to gather metrics from the Kubernetes cluster. If your cluster doesn’t have Heapster running, you can use Helm to install it manually using this command:
helm install --name-template heapster stable/heapster
You want Heapster to feed the data into InfluxDB. This requires a slight change in Heapster’s configuration. First, open the Heapster deployment in an editor:
kubectl edit deployment heapster
Inside the editor, search for the container's command section (spec.template.spec.containers[].command), and add the following as the last entry of the command list:
--sink=influxdb:http://influxdb-influxdb.default:8086
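If you prefer not to edit the deployment interactively, the same change can probably be applied with kubectl patch. This is a sketch under a few assumptions: the deployment is named heapster, Heapster is the first container in the pod template, and its flags live in the command list rather than in args. The function name add_heapster_sink is hypothetical.

```shell
# Assumptions: Deployment "heapster", Heapster is container index 0,
# and the InfluxDB service answers at influxdb-influxdb.default:8086.
SINK_FLAG="--sink=influxdb:http://influxdb-influxdb.default:8086"

# JSON Patch document: the trailing "/-" appends to the end of the command array.
PATCH=$(printf '[{"op":"add","path":"/spec/template/spec/containers/0/command/-","value":"%s"}]' "$SINK_FLAG")

add_heapster_sink() {
  kubectl patch deployment heapster --type=json -p "$PATCH"
}

# Usage (requires access to the cluster):
#   add_heapster_sink
```

Either way, changing the pod template makes Kubernetes roll out a new Heapster pod that starts with the extra flag.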
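Then, go to the Grafana dashboard, add an InfluxDB data source pointing to http://influxdb-influxdb.default:8086, and select k8s as the database, which is the name Heapster's InfluxDB sink writes to by default. If the host name does not resolve, check the actual name of the InfluxDB service with kubectl get svc. Grafana should now be ready to present dashboards based on InfluxDB values.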
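The data source can also be created through Grafana's HTTP API instead of the UI. The sketch below assumes the port-forward to localhost:3000 from earlier is still running. The data source name influxdb-k8s and the function name create_datasource are made up for this example.

```shell
GRAFANA_URL="http://localhost:3000"   # assumes the earlier port-forward is active

# "access":"proxy" makes the Grafana server issue the queries, which matters here
# because the InfluxDB host name only resolves inside the cluster.
DATASOURCE_JSON='{"name":"influxdb-k8s","type":"influxdb","access":"proxy","url":"http://influxdb-influxdb.default:8086","database":"k8s"}'

create_datasource() {
  # $1 = Grafana admin user, $2 = Grafana admin password
  curl -s -X POST "$GRAFANA_URL/api/datasources" \
    -u "$1:$2" \
    -H "Content-Type: application/json" \
    -d "$DATASOURCE_JSON"
}

# Usage:
#   create_datasource admin 'your-admin-password'
```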
Adding Grafana Dashboards
Now, you’ll want to make use of the beautiful Grafana dashboard to make monitoring clean and visually pleasing.
InfluxDB uses a SQL-like query language called InfluxQL. To visualize a single query result, open a query editor and try the example:
SELECT sum("value") FROM "cpu/usage_rate" WHERE ("type" = 'node') AND $timeFilter GROUP BY time($__interval) fill(null)
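Because it filters on "type" = 'node' and does not group by any tag, the above query graphs the CPU usage rate summed across all nodes in the cluster. To narrow it to a single pod, such as the heapster pod in the kube-system namespace, filter on "type" = 'pod' together with the pod's name and namespace tags instead. To get the memory usage, you can try another expression: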
SELECT sum("value") FROM "memory/usage" WHERE ("type" = 'node') AND $timeFilter GROUP BY time($__interval) fill(null)
This will allow you to see what the graphs look like. You can now go to the Create New Dashboard dialog box and add both graphs there. The query editor features a value explorer which makes it easier to select values that are interesting to you.
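If a panel stays empty, it can help to run the query against InfluxDB directly. The sketch below uses the InfluxDB 1.x /query endpoint and assumes InfluxDB has been port-forwarded to localhost:8086. Note that $timeFilter and $__interval are Grafana macros, so outside Grafana they have to be written out; the one-hour window and one-minute buckets are arbitrary choices for illustration.

```shell
# Assumption: InfluxDB is reachable locally, for example after
#   kubectl port-forward svc/influxdb-influxdb 8086:8086
# (the service name is assumed; check it with "kubectl get svc").
INFLUX_URL="http://localhost:8086"

# Same CPU query as above, with the Grafana macros replaced by explicit values.
QUERY="SELECT sum(\"value\") FROM \"cpu/usage_rate\" WHERE \"type\" = 'node' AND time > now() - 1h GROUP BY time(1m) fill(null)"

run_query() {
  # -G turns the url-encoded fields into a GET query string
  curl -s -G "$INFLUX_URL/query" \
    --data-urlencode "db=k8s" \
    --data-urlencode "q=$QUERY"
}

# Usage:
#   run_query
```

An empty series list in the JSON response suggests that Heapster is not writing to the k8s database yet.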
In addition to creating your own dashboards, you can use one of the many user-generated options available on the Grafana website. In the dashboard gallery, you will find a lot of InfluxDB examples. Some are suitable for monitoring, while others are aimed at IoT and sensors. If you want more inspiration for configuring monitoring dashboards, check out our article about Grafana templates.
Grafana Alerts
One of the reasons we implement monitoring solutions is to know when our systems and applications behave in unexpected ways. We anticipate that things will eventually go wrong, and we want to be ready when they do. That’s why monitoring often goes hand-in-hand with alerting.
Grafana has built-in alerting support. Each time you add a graph to a dashboard, you can also configure alerting. This can be done manually by setting the desired conditions, or you can use the graph to set the warning and alert thresholds in a visual way.
The supported notification integrations include email, Slack, PagerDuty, and Webhooks, among others.
There’s also a single place where you can see all of the configured alerts and their states. In the Alerting tab, green alerts are the ones that haven’t been triggered. When the alert condition is met, they turn red. Grey indicates that the state of the alert is yet to be determined.
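The same overview is available from the command line. The sketch below assumes the dashboard-based (legacy) alerting described here, whose HTTP API exposes the alert list at /api/alerts, and the port-forward to localhost:3000 from earlier. The function name list_alerts is made up for this example.

```shell
GRAFANA_URL="http://localhost:3000"   # assumes the earlier port-forward is active

list_alerts() {
  # $1 = Grafana admin user, $2 = Grafana admin password
  # $3 = state filter, for example "alerting", "ok", or "no_data"
  curl -s -u "$1:$2" "$GRAFANA_URL/api/alerts?state=$3"
}

# Usage: show only the alerts that are currently firing
#   list_alerts admin 'your-admin-password' alerting
```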
Future Improvements
InfluxDB can be used to effectively monitor your systems, and, when used with Grafana, it can also present easy-to-read graphs and manage alerting. In this blog post, we used a single input from Heapster to feed InfluxDB.
However, there are other related technologies that make better use of InfluxDB capabilities:
1) Telegraf, a collection and reporting agent;
2) Chronograf, a dashboard not unlike Grafana; and
3) Kapacitor, a data streaming and processing engine.
Together with InfluxDB, these three form the TICK stack.
If you like the features InfluxDB provides and want to base your monitoring solution around it, take a look at Telegraf. For those interested in anomaly detection or building machine learning models for monitoring, Kapacitor may be useful as well. Finally, Chronograf provides a complete UI for your entire stack—which might enable you to replace Grafana altogether.