With over 58K stars on GitHub and over 2,200 contributors across the globe, Kubernetes is the de facto standard for container orchestration. While solving some of the key challenges involved in running distributed microservices, it has also introduced some new ones.
Not surprisingly, when asked, engineers list monitoring as one of the main obstacles to adopting Kubernetes. After all, monitoring distributed environments has never been easy, and Kubernetes adds further complexity. Equally unsurprising is the emergence of various open-source monitoring solutions to help overcome the challenge.
These tools tackle different aspects of the challenge. Some help with logs, others with metrics. Some are data collectors, while others provide an interface for operating Kubernetes from a bird’s-eye view. Some are Kubernetes-native, others are more agnostic in nature. This variety and depth attest to the strength of Kubernetes as an ecosystem and community, and in this article we’ll take a look at some of the more popular open-source tools available.
There is a long list of open-source time-series databases on the market today (Graphite, InfluxDB and Cassandra, for example), but none is as popular among Kubernetes users as Prometheus. Initially a SoundCloud project and now part of the CNCF (Cloud Native Computing Foundation), Prometheus has emerged as the de facto open-source standard for monitoring Kubernetes.
In a nutshell, what makes Prometheus stand out among other time-series databases is its multi-dimensional data model, PromQL (the Prometheus query language), built-in alerting mechanisms, a pull-based rather than push-based collection model, and of course, the ever-growing community. These differentiators make Prometheus a great solution for Kubernetes users, and the two projects are now closely integrated: users can easily run Prometheus on top of Kubernetes using the Prometheus Operator.
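To make the multi-dimensional data model more concrete, here is a minimal Python sketch. It is an illustration only, not Prometheus code: each sample is identified by a metric name plus a set of key/value labels, and a query selects and aggregates series by those labels. The sample data and the `select` and `sum_by` helpers are hypothetical, chosen to mimic conceptually what a PromQL label selector and a `sum by (...)` aggregation do.

```python
from collections import defaultdict

# Each sample: (metric name, labels, value). Names and values are made up.
SAMPLES = [
    ("http_requests_total", {"namespace": "shop", "pod": "web-1", "code": "200"}, 120.0),
    ("http_requests_total", {"namespace": "shop", "pod": "web-1", "code": "500"}, 3.0),
    ("http_requests_total", {"namespace": "shop", "pod": "web-2", "code": "200"}, 95.0),
    ("http_requests_total", {"namespace": "blog", "pod": "api-1", "code": "200"}, 40.0),
]


def select(samples, name, **matchers):
    """Return samples with the given metric name whose labels equal all matchers."""
    return [
        (n, labels, value)
        for n, labels, value in samples
        if n == name and all(labels.get(k) == v for k, v in matchers.items())
    ]


def sum_by(samples, label):
    """Sum sample values grouped by one label, similar in spirit to `sum by (label)`."""
    totals = defaultdict(float)
    for _, labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


if __name__ == "__main__":
    shop = select(SAMPLES, "http_requests_total", namespace="shop")
    print(sum_by(shop, "pod"))   # per-pod totals in the "shop" namespace
    print(sum_by(shop, "code"))  # per-status-code totals in the same namespace
```

The point of the sketch is that the same stored samples can be sliced along any label (pod, namespace, status code) at query time, without deciding on a hierarchy up front.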
Pros: Kubernetes-native, simple to use, huge community
Cons: Challenges at scale, storage
For slicing and dicing Kubernetes metrics and constructing beautiful monitoring dashboards, Grafana is second to none. When used to monitor Kubernetes, Grafana will usually sit on top of Prometheus, although pairing Grafana with InfluxDB or Graphite is also common.
Grafana is popular for a number of reasons, one being its ability to integrate with a long list of data sources. It is also feature-rich, with capabilities such as alerts, annotations, filtering, data source-specific querying, visualization and dashboarding, authentication/authorization, cross-organizational collaboration, and plenty more.
Grafana is also super easy to set up on Kubernetes — there are numerous deployment specifications that include a Grafana container by default and there are plenty of Kubernetes monitoring dashboards for Grafana available for use.
Pros: Large ecosystem, rich visualization capabilities, alerting
Cons: Not optimized for Kubernetes log management
ELK (aka Elastic Stack)
For logging Kubernetes, the most popular open-source solution is, of course, the ELK Stack. An acronym for Elasticsearch, Logstash and Kibana, ELK also includes a fourth component — Beats, which are lightweight data shippers. Each component in the stack takes care of a different step in the logging pipeline, and together, they all provide a comprehensive and powerful logging solution for Kubernetes.
Logstash is capable of aggregating and processing logs before sending them on for storage. Elasticsearch was designed to be scalable, and will perform well even when storing and searching across millions of documents. Kibana does a great job of providing users with the analysis interface needed to make sense of the data.
All the different components of the stack can be deployed easily into a Kubernetes environment. You can run the components as pods using various deployment configurations or using Helm charts. Both Metricbeat and Filebeat can be deployed as DaemonSets and will append Kubernetes metadata to the documents.
Pros: Huge community, easy to deploy and use in Kubernetes, rich analysis capabilities
Cons: Difficult to maintain at scale
For log aggregation and processing, another popular solution used by Kubernetes users is Fluentd. Written in Ruby, Fluentd was created to act as a unified logging layer: a one-stop component that can aggregate data from multiple sources, unify the differently formatted data into JSON objects, and route it to different output destinations. Fluentd is so widely used that the ELK acronym is often replaced by a new one, the EFK Stack.
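The "unify, then route" idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Fluentd's implementation or configuration syntax: the access-log pattern, the tags, the output names, and the `unify` and `route` helpers are all hypothetical.

```python
import json
import re

# Hypothetical plain-text access-log format: host "METHOD path" status
ACCESS_RE = re.compile(r'^(?P<host>\S+) "(?P<method>\S+) (?P<path>\S+)" (?P<status>\d{3})$')


def unify(tag, raw):
    """Turn a raw line into a dict (the unified record), whatever its source format."""
    try:
        record = json.loads(raw)
        if not isinstance(record, dict):
            record = {"message": raw}
    except json.JSONDecodeError:
        match = ACCESS_RE.match(raw)
        record = match.groupdict() if match else {"message": raw}
    record["tag"] = tag
    return record


def route(record, routes, default="stdout"):
    """Pick the output from the first route whose prefix matches the record's tag."""
    for prefix, output in routes:
        if record["tag"].startswith(prefix):
            return output
    return default


if __name__ == "__main__":
    routes = [("app.", "elasticsearch"), ("nginx.", "s3")]
    lines = [
        ("app.checkout", '{"level": "error", "message": "payment failed"}'),
        ("nginx.access", '10.0.0.7 "GET /cart" 200'),
        ("kube.system", "plain text with no known structure"),
    ]
    for tag, raw in lines:
        record = unify(tag, raw)
        print(route(record, routes), json.dumps(record))
```

Whatever the input looked like, every record leaves `unify` as a JSON-serializable object carrying a tag, which is what makes tag-based routing to different destinations straightforward.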
Fluentd owes its popularity among Kubernetes users to Logstash’s shortcomings, especially the performance-related ones. By design, performance, scalability and reliability are some of Fluentd’s more outstanding features. Adding new inputs or outputs is relatively simple and has little effect on performance. Fluentd uses disk or memory for buffering and queuing to handle transmission failures or data overload and supports multiple configuration options to ensure a more resilient data pipeline.
A more recent spin-off project is Fluent Bit. Similar to ELK’s Beats, Fluent Bit is an extremely lightweight data shipper that excels at acting as an agent on edge hosts, collecting and pushing data down the pipeline. In a Kubernetes cluster, Fluent Bit can be an excellent alternative to Fluentd if you’re limited in CPU and RAM capacity.
Both Fluentd and Fluent Bit are also CNCF projects and Kubernetes-native. They are designed to seamlessly integrate with Kubernetes and enrich data with relevant pod and container metadata and, as mentioned, to do all this with a low resource footprint.
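As a rough illustration of why buffering makes a pipeline more resilient, the following Python sketch queues chunks in memory and retries delivery instead of dropping data when the destination is unavailable. It is a toy model, not Fluentd's buffer plugin: the `BufferedOutput` class, its `max_chunks` limit, and the choice to reject new chunks when full are assumptions made for the example.

```python
from collections import deque


class BufferedOutput:
    """Queue chunks in memory and retry delivery instead of dropping on failure."""

    def __init__(self, send, max_chunks=3):
        self.send = send              # callable that delivers one chunk; may raise
        self.queue = deque()          # pending chunks, oldest first
        self.max_chunks = max_chunks  # cap on queued chunks

    def emit(self, chunk):
        """Accept a chunk; return False if the buffer is full so the caller can back off."""
        if len(self.queue) >= self.max_chunks:
            return False
        self.queue.append(chunk)
        return True

    def flush(self):
        """Try to deliver queued chunks in order; stop at the first failure."""
        delivered = 0
        while self.queue:
            try:
                self.send(self.queue[0])
            except ConnectionError:
                break  # keep the chunk queued; a later flush will retry it
            self.queue.popleft()
            delivered += 1
        return delivered


if __name__ == "__main__":
    destination_up = {"value": False}
    received = []

    def send(chunk):
        if not destination_up["value"]:
            raise ConnectionError("destination unavailable")
        received.append(chunk)

    out = BufferedOutput(send, max_chunks=2)
    out.emit(["log line 1"])
    out.emit(["log line 2"])
    print(out.flush(), len(out.queue))  # nothing delivered yet, chunks stay queued
    destination_up["value"] = True
    print(out.flush(), received)        # queued chunks are delivered in order
```

A disk-backed buffer follows the same pattern but survives process restarts, at the cost of extra I/O.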
Pros: Huge plugin ecosystem, performance, reliability
Cons: Difficult to configure
cAdvisor is an open-source agent designed for collecting, processing, and exporting resource usage and performance information about running containers. It’s also built into Kubernetes and integrated into the Kubelet binary.
Unlike other agents, cAdvisor is not deployed per pod but at the node level. It auto-discovers all the containers running on a machine and collects system metrics such as memory, CPU and network usage.
cAdvisor is one of the more basic open-source, Kubernetes-native monitoring tools out there. It’s easy to use (it exposes Prometheus metrics out-of-the-box) but definitely not robust enough to be considered an all-around monitoring solution.
Pros: Built into Kubernetes, easy to use
Cons: Basic, lacks analytical depth, limited functionality
As the name implies, kubewatch watches for specific Kubernetes events and pushes notifications on these events to various endpoints such as Slack and PagerDuty. More specifically, kubewatch will look for changes made to the Kubernetes resources that you ask it to watch: daemon sets, deployments, pods, replica sets, replication controllers, services, secrets, and configuration maps. kubewatch is easy to configure and can be deployed using either Helm or a custom deployment.
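The "watch for changes, then notify" pattern can be sketched in Python as follows. This is a conceptual illustration only: a real watcher would subscribe to change events from the Kubernetes API rather than compare snapshots, and the snapshot layout plus the `diff_resources` and `format_notification` helpers are hypothetical.

```python
def diff_resources(before, after):
    """Compare two {(kind, name): version} snapshots and list change events."""
    events = []
    for key in sorted(after.keys() - before.keys()):
        events.append(("created", *key))
    for key in sorted(before.keys() - after.keys()):
        events.append(("deleted", *key))
    for key in sorted(before.keys() & after.keys()):
        if before[key] != after[key]:
            events.append(("updated", *key))
    return events


def format_notification(event):
    """Render one event as the kind of short message a chat webhook might receive."""
    action, kind, name = event
    return f"{kind} '{name}' was {action}"


if __name__ == "__main__":
    # Hypothetical snapshots of watched resources, keyed by (kind, name).
    before = {("pod", "web-1"): "100", ("service", "web"): "7", ("secret", "db"): "3"}
    after = {("pod", "web-1"): "101", ("service", "web"): "7", ("pod", "web-2"): "1"}
    for event in diff_resources(before, after):
        print(format_notification(event))
```

In a real deployment the formatted message would be posted to the configured endpoint instead of printed.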
Pros: Supports multiple endpoints, easy to deploy
Cons: Just a watcher
Official documentation for this project clearly states that kube-ops-view is NOT a monitoring tool, so why is it listed here? Well, while it can’t be used to monitor and alert on production issues, it can give you a nice operational picture of your Kubernetes clusters: the different nodes deployed and their status, as well as the different pods running on the nodes. That’s what it was built for, and only that.
Pros: Simple to use, easy to deploy
Cons: Read-only tool, not for managing Kubernetes resources
kube-state-metrics is a Kubernetes-native metrics service designed to listen to the Kubernetes API and generate metrics on the state of various objects such as pods, services, deployments and nodes. A full list of the metrics it generates can be found in the project’s documentation.
Extremely easy to use, kube-state-metrics is only a metrics service and as such requires a few more bits and pieces to become part of a complete monitoring solution for Kubernetes. It exports the metrics on the HTTP endpoint /metrics in plaintext format. Those using Prometheus will be happy to learn that the metrics were designed to be easily scraped.
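To give a feel for what that plaintext format looks like to a consumer, here is a small Python sketch that parses a hand-written sample payload. It is deliberately simplified (no escaped quotes in label values, no timestamps) and is not a replacement for a real Prometheus client library. The sample values and the `parse_metrics` helper are assumptions made for the example.

```python
import re

# Simplified line pattern: metric name, optional {labels}, then a numeric value.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Hand-written sample in the style of a /metrics response; values are made up.
SAMPLE = """\
# HELP kube_pod_status_phase The pods current phase.
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="shop",pod="web-1",phase="Running"} 1
kube_pod_status_phase{namespace="shop",pod="web-2",phase="Pending"} 1
kube_deployment_status_replicas{namespace="shop",deployment="web"} 2
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified pattern does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        if name == "kube_pod_status_phase" and labels.get("phase") == "Pending" and value == 1:
            print("pending pod:", labels["namespace"], labels["pod"])
```

In practice Prometheus does this scraping and parsing for you; the sketch only shows that the endpoint's output is plain, line-oriented text that is easy to consume.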
Pros: Simple to use, Kubernetes-native, integrates seamlessly with Prometheus
Cons: Only an agent for generating metrics
Distributed tracing is gradually becoming a monitoring and troubleshooting best practice for Kubernetes environments. Among the various open-source tracing tools available, Jaeger seems to be leading the pack.
Developed by Uber and open sourced in 2016, Jaeger was inspired by two existing tracing systems, Zipkin and Dapper. It enables users to perform root cause analysis, performance optimization and distributed transaction monitoring.
Jaeger features OpenTracing-based instrumentation for Go, Java, Node, Python and C++ apps, uses consistent upfront sampling with individual per service/endpoint probabilities, and supports multiple storage backends — Cassandra, Elasticsearch, Kafka and memory.
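The idea behind consistent upfront sampling with per service/endpoint probabilities can be sketched in Python as follows. This is an illustrative model, not Jaeger's sampler implementation: the probability table, the default rate, and the `trace_fraction` and `should_sample` helpers are hypothetical. The decision is derived deterministically from the trace ID, so the same trace always gets the same answer, and it is made once at the start of the trace.

```python
import hashlib

# Hypothetical sampling probabilities per (service, endpoint).
PROBABILITIES = {
    ("checkout", "POST /pay"): 1.0,    # always trace payments
    ("checkout", "GET /health"): 0.0,  # never trace health checks
    ("catalog", "GET /items"): 0.1,    # trace roughly 10% of listings
}
DEFAULT_PROBABILITY = 0.01


def trace_fraction(trace_id):
    """Map a trace ID deterministically to a float in [0, 1)."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the result stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def should_sample(trace_id, service, endpoint):
    """Decide once, at the root span, whether to record the whole trace."""
    probability = PROBABILITIES.get((service, endpoint), DEFAULT_PROBABILITY)
    return trace_fraction(trace_id) < probability


if __name__ == "__main__":
    kept = sum(should_sample(f"trace-{i}", "catalog", "GET /items") for i in range(10_000))
    print(f"kept {kept} of 10000 catalog traces")
    print(should_sample("trace-42", "checkout", "POST /pay"))
```

Because the decision is made upfront and travels with the trace context, downstream services do not need to re-decide, which avoids partially recorded traces.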
There are multiple ways of getting started with Jaeger on Kubernetes. Users can either use the new Jaeger Operator or, if they prefer, a DaemonSet configuration. There is also an all-in-one deployment available for testing and demoing purposes.
Pros: User interface, various instrumentation options, easy to deploy
Cons: Limited backend integration
Last but not least, Weave Scope is a monitoring tool developed by the folks at Weaveworks that allows you to gain operational insights into your Kubernetes cluster.
This might sound a bit like kube-ops-view, but Weave Scope takes it up a few notches by providing a much nicer user interface and, more importantly, by allowing the user to manage containers and run diagnostic commands on them from within this interface.
It’s an effective tool for gaining context on your deployment. You’ll be able to see the application, the infrastructure it’s deployed on, and the different connections between the different components.
Pros: User interface, zero-configuration
Cons: Lacks analytical depth
This was of course just a partial list of the open-source tools available for monitoring Kubernetes, but if you’re just beginning to design your observability stack for Kubernetes, it’s a good place to start.
With the exception of Jaeger, these tools should begin providing value without extra instrumentation or too much configuration. All of them are easy to test and deploy: set up a small sandbox environment, start small, and try to understand whether these tools are what you need.
Kubernetes is extremely community-driven. The super-active community contributing to the project continues to add and improve built-in and add-on monitoring capabilities, and I have little doubt the near future will see some additional developments. We’ll cover these as they are introduced.