Grafana & Prometheus: A Match Made in Heaven?

By: Daniel Berman

December 24, 2019

Prometheus and Grafana are two monitoring tools that, in combination, provide the information DevOps and development teams need to build and maintain applications. Prometheus collects many types of metrics from services written in almost any programming language, and open-source Grafana queries, visualizes, and processes these metrics.

Together, these two tools serve the needs of most R&D groups supporting on-premises or cloud applications, from organizations without demanding service-level objectives (SLOs) to businesses with mission-critical production environments and high-traffic workloads. Time-series databases, document databases, SQL databases, cloud providers’ monitoring services, and other applications can all be data sources for Grafana’s visualizations.

Grafana has become an industry standard because it can query many different data sources and display their data in one place, in a unified way. As a result, its usage has grown, and many DevOps teams prefer it to the separate user interfaces of individual monitoring and logging tools.

Grafana’s ability to help teams track and reduce mean time to repair (MTTR) is another reason for its popularity. MTTR is a key metric that production teams use to measure DevOps efficiency and productivity. To maintain strict SLAs and meet high service standards, a team needs to continuously monitor, identify, and immediately act on incidents, so that the recovery process can start right away. The right Grafana graphs let teams keep track of each dedicated service as well as the system as a whole. That said, the team still needs to know what to track, and this is where Prometheus comes in.
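Because MTTR is simply the total repair time divided by the number of incidents, it is easy to compute once incident start and end times are recorded. Here is a minimal sketch, assuming a hypothetical record format of (detected_at, resolved_at) pairs; neither Grafana nor Prometheus defines this structure.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def mean_time_to_repair(incidents: Sequence[Tuple[datetime, datetime]]) -> timedelta:
    """Return MTTR: the total repair time divided by the number of incidents.

    Each incident is a hypothetical (detected_at, resolved_at) pair.
    """
    if not incidents:
        raise ValueError("MTTR is undefined when there are no incidents")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


incidents = [
    (datetime(2019, 12, 1, 10, 0), datetime(2019, 12, 1, 10, 30)),   # 30 minutes
    (datetime(2019, 12, 5, 22, 15), datetime(2019, 12, 5, 23, 45)),  # 90 minutes
]
print(mean_time_to_repair(incidents))  # 1:00:00
```

Two incidents lasting 30 and 90 minutes give an MTTR of one hour; a dashboard that tracks this number over time shows whether the team is recovering faster.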

What Is Prometheus?

Prometheus is a tool that every DevOps professional should be familiar with. It is an open-source monitoring and alerting system built on a time-series data model. Prometheus collects metrics from different services and stores them as time series: each series is identified by its metric name and a set of key-value pairs called labels, and every sample in it carries a timestamp. This storage model lets Prometheus query metrics quickly and return data sets that are easy to manipulate for visualization. Labels are what make the data model dimensional: combining a metric name with specific label values selects one dimension of a metric (for example, only the GET requests to one endpoint), which makes querying more precise and efficient.
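To make the data model concrete, here is a toy in-memory sketch. It is not how the Prometheus storage engine works; it only shows that a series is keyed by its metric name plus its full label set, and that selecting by label values behaves roughly like a PromQL selector such as http_requests_total{method="GET"}. The class name, metric name, and labels are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A series is identified by its metric name plus its complete label set.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]
Sample = Tuple[float, float]  # (timestamp, value)


class TinySeriesStore:
    """Toy model of Prometheus-style time series (illustration only)."""

    def __init__(self) -> None:
        self._series: Dict[SeriesKey, List[Sample]] = {}

    def append(self, name: str, labels: Dict[str, str], ts: float, value: float) -> None:
        # Same name and same labels -> same series; any label difference -> new series.
        key = (name, frozenset(labels.items()))
        self._series.setdefault(key, []).append((ts, value))

    def select(self, name: str, **matchers: str) -> Dict[SeriesKey, List[Sample]]:
        """Return series with this name whose labels equal every given matcher."""
        return {
            key: samples
            for key, samples in self._series.items()
            if key[0] == name
            and all(dict(key[1]).get(label) == value for label, value in matchers.items())
        }


store = TinySeriesStore()
store.append("http_requests_total", {"method": "GET", "path": "/api"}, 1.0, 10)
store.append("http_requests_total", {"method": "POST", "path": "/api"}, 1.0, 3)
store.append("http_requests_total", {"method": "GET", "path": "/api"}, 16.0, 12)
gets = store.select("http_requests_total", method="GET")
print(len(gets))  # 1
```

Selecting on method="GET" returns a single series with two samples, while selecting on path="/api" alone would return both the GET and POST series.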

Unlike monitoring tools that communicate with an agent deployed on the monitored service’s host, Prometheus pulls (scrapes) metrics over HTTP. To use it, you either instrument your code with Prometheus’s metric types, typically through a client library, or, if the code cannot be changed, run an exporter for the service. The exporter translates the service’s existing metrics into the Prometheus format and exposes them on an HTTP endpoint, which the Prometheus server then scrapes.
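Whether the metrics come from an instrumented application or from an exporter, what Prometheus scrapes is plain text in its exposition format. As a minimal sketch, the hypothetical function below renders one counter in that format by hand; real services would normally rely on an official client library instead, and the metric and label names are made up.

```python
from typing import Dict, List, Tuple


def _escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str,
                   samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one counter metric family in the Prometheus text exposition format.

    help_text is written as-is, so it should not contain newlines or backslashes.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        pairs = ",".join(f'{k}="{_escape_label_value(v)}"' for k, v in sorted(labels.items()))
        # Emit name{label="value",...} value, or just "name value" without labels.
        lines.append(f"{name}{{{pairs}}} {value}" if pairs else f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_counter(
    "myapp_jobs_processed_total",  # hypothetical metric name
    "Jobs processed by the worker.",
    [({"queue": "email"}, 42), ({"queue": "sms"}, 7)],
), end="")
```

The output starts with the HELP and TYPE lines, followed by one line per labeled sample, such as myapp_jobs_processed_total{queue="email"} 42. A Prometheus server would scrape this text from an HTTP endpoint, conventionally /metrics.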

Prometheus’s query language, PromQL, lets users select and aggregate metrics, and other tools, such as Grafana, use it to retrieve data through Prometheus’s HTTP API.
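As a sketch of how a tool like Grafana hands PromQL to Prometheus, the functions below build request URLs for the HTTP API’s instant-query (/api/v1/query) and range-query (/api/v1/query_range) endpoints using only the standard library. The server address and the metric name are placeholders, and actually sending the request is left out.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """URL that evaluates a PromQL expression at a single point in time."""
    base = base_url.rstrip("/")
    query = urlencode({"query": promql})
    return f"{base}/api/v1/query?{query}"


def range_query_url(base_url: str, promql: str,
                    start: float, end: float, step: str) -> str:
    """URL that evaluates a PromQL expression over [start, end] every `step`,
    the kind of query a dashboard graph needs."""
    base = base_url.rstrip("/")
    query = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base}/api/v1/query_range?{query}"


# Per-job request rate over the last five minutes (metric name is hypothetical).
print(instant_query_url("http://localhost:9090",
                        "sum by (job) (rate(http_requests_total[5m]))"))
```

urlencode percent-encodes the parentheses and brackets in the PromQL expression, so any HTTP client can issue the request as-is.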

Prometheus, like Kubernetes, has been adopted by the Cloud Native Computing Foundation (CNCF), which now hosts the project. Prometheus has a long list of exporters, making it possible to collect metrics from almost any software. Databases, HTTP servers, other monitoring systems, and even issue trackers or continuous integration tools can all be monitored using Prometheus. Kubernetes itself can also be a data source for Prometheus, for example when Prometheus is deployed with the Prometheus Operator.

According to the CoreOS documentation, the Prometheus Operator makes “running Prometheus on top of Kubernetes as easy as possible...”

Every day, new exporters are created. It seems the dev community is uniting around Prometheus and is determined to keep investing in it.

A Winning Integration

Every DevOps professional wants the following features from a monitoring system:

Deployment simplicity,
Minimal code intrusion,
High-value ROI, and
Low maintenance effort.

Used together, with Prometheus as the monitoring backend and Grafana as the UI, the two tools provide all of these capabilities.

Deploying either tool is simple. Both have Docker images, Helm charts, and other easy deployment options. The configuration steps are quick to execute, and the two tools work together out of the box. Prometheus queries are easy to define, and you can use template variables to change values in your dashboards dynamically.
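Grafana substitutes template variables such as $job into a panel’s query before sending it to Prometheus. The sketch below is a simplified model of that step, not Grafana’s actual code: the variable names are hypothetical, joining several selected values into a (a|b) regex alternation is an assumption about the multi-value format, and the regex escaping Grafana applies to special characters is omitted.

```python
import re
from typing import Dict, List

# Matches ${name} or $name (names are letters, digits, and underscores).
_VARIABLE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def interpolate(query: str, variables: Dict[str, List[str]]) -> str:
    """Replace template variables in a PromQL query with their selected values.

    One selected value is inserted as-is; several are joined into a
    (a|b) regex alternation for use with the =~ matcher. Regex escaping
    of special characters is omitted in this sketch.
    """
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1) or match.group(2)
        values = variables.get(name)
        if not values:
            return match.group(0)  # unknown or empty variable: leave it untouched
        return values[0] if len(values) == 1 else "(" + "|".join(values) + ")"

    return _VARIABLE.sub(substitute, query)


print(interpolate('sum(rate(http_requests_total{job=~"$job"}[5m]))',
                  {"job": ["api", "worker"]}))
# sum(rate(http_requests_total{job=~"(api|worker)"}[5m]))
```

Changing the selection in the dashboard’s drop-down only changes the values passed in, so the same panel query serves every job.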

[Figure: Simple Grafana configuration for querying Prometheus]
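Besides the UI, a Prometheus data source can be added through Grafana’s HTTP API (POST /api/datasources). A minimal sketch, assuming you supply the Grafana URL and an API token with admin rights; the data source name and both URLs are placeholders. The function only builds the request.

```python
import json
from urllib.request import Request


def prometheus_datasource_request(grafana_url: str, api_token: str,
                                  prometheus_url: str) -> Request:
    """Build (but do not send) a request that adds Prometheus as a Grafana data source."""
    payload = {
        "name": "Prometheus",   # placeholder display name
        "type": "prometheus",   # Grafana's built-in Prometheus data source type
        "url": prometheus_url,  # where the Grafana server reaches Prometheus
        "access": "proxy",      # queries are proxied through the Grafana server
        "isDefault": True,
    }
    return Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_token}"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; provisioning files are an alternative when Grafana itself is deployed from configuration.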

There are also hosted options on the market, including the recently released Infrastructure Monitoring by Logz.io, which makes it easy to correlate metrics and logs within the same user interface.

Grafana is built around time-series data, the same data model Prometheus is based on, which makes it well suited to visualizing the metrics Prometheus provides. Prometheus is designed for modern technologies like Kubernetes, serverless architectures, and microservices. As a result, it can provide the kind of data DevOps staff need to maintain a high-availability production environment.

Grafana lets you import ready-made dashboards for each of these modern technologies. Additionally, the wider user community has developed dashboards with many visualizations for a variety of related use cases. These preconfigured dashboards work with Prometheus servers and give DevOps teams valuable information from the moment their services enter production.

Metamonitoring: Monitor the Monitoring Systems

Another way to employ the Grafana-Prometheus combination is to monitor the monitoring tools themselves. Prometheus is an excellent metrics collector, but when you use it to monitor an application, you must also keep track of the other systems that monitor the application, track its users’ experience, and constantly ensure that all monitoring services are working.

For log management, most DevOps teams use the ELK Stack, Splunk, or one of the many other logging systems that can help with root cause analysis. For continual validation of the application’s availability, tools like Pingdom and Uptime Robot are very popular.

You have to validate the availability and performance of these tools and services as well, because they are part of the production environment’s mission-critical stack. Fortunately, the Prometheus and Grafana ecosystems already cover many of them: Prometheus has exporters that collect the relevant metrics, and Grafana visualizes those metrics in a variety of dashboards.

[Figure: Uptime Robot metrics visualization with Grafana and Prometheus]

Grafana, through its template variables feature, allows you to work with different Prometheus servers by simply switching between them in the dashboard view.

Here, one centralized Grafana instance pulls data from several Prometheus servers and displays the data from each of them.
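As a rough sketch of that setup, the function below builds a dashboard model whose panels follow a data source variable, so one set of panels can switch between Prometheus servers. The field names follow the general shape of Grafana’s dashboard JSON (a variable of type datasource, and panels that reference $datasource), but the exact schema varies between Grafana versions, so treat it as an assumption to check against yours. The title, panel, and query are hypothetical.

```python
import json
from typing import Any, Dict


def dashboard_with_datasource_variable(title: str) -> Dict[str, Any]:
    """Sketch of a dashboard model whose panels follow a $datasource variable.

    Schema details are assumptions and differ between Grafana versions.
    """
    return {
        "title": title,
        "templating": {
            "list": [{
                "name": "datasource",
                "type": "datasource",   # a variable whose options are data sources
                "query": "prometheus",  # offer only Prometheus-type data sources
            }]
        },
        "panels": [{
            "type": "graph",
            "title": "Scrape targets up",
            "datasource": "$datasource",  # resolved from the variable when viewed
            "targets": [{"expr": "up", "refId": "A"}],
        }],
    }


print(json.dumps(dashboard_with_datasource_variable("Prometheus fleet"), indent=2))
```

Such a model could be pasted into Grafana’s dashboard import screen, or wrapped as {"dashboard": ..., "overwrite": true} and sent to the /api/dashboards/db endpoint.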

Alerts

Both tools have an alerting module. However, the UI of Prometheus’s Alertmanager doesn’t meet the needs of most DevOps and production teams. Grafana provides a more straightforward solution: if you create a table panel that queries the ALERTS metric, which Prometheus generates for its pending and firing alerts, all of Prometheus’s alerts can be displayed in a single pane.
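Prometheus exposes active alerts as a synthetic ALERTS series whose labels include alertname and alertstate (pending or firing) along with the alert’s own labels. As a sketch of what a table panel does with the result of an ALERTS query, the hypothetical function below flattens an instant-query response from the HTTP API into rows. The severity label is an assumption, since alert labels are user-defined, and the sample response is made up.

```python
from typing import Any, Dict, List


def alerts_table(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten an /api/v1/query response for ALERTS into table rows."""
    if response.get("status") != "success":
        raise ValueError("query failed")
    rows = []
    for series in response["data"]["result"]:
        labels = series["metric"]
        rows.append({
            "alertname": labels.get("alertname", ""),
            "state": labels.get("alertstate", ""),
            "severity": labels.get("severity", ""),  # assumed, user-defined label
        })
    # Firing alerts first, then alphabetically by alert name.
    return sorted(rows, key=lambda r: (r["state"] != "firing", r["alertname"]))


sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "ALERTS", "alertname": "HighLatency",
                        "alertstate": "pending", "severity": "warning"},
             "value": [1577145600, "1"]},
            {"metric": {"__name__": "ALERTS", "alertname": "InstanceDown",
                        "alertstate": "firing", "severity": "critical"},
             "value": [1577145600, "1"]},
        ],
    },
}
for row in alerts_table(sample):
    print(row["alertname"], row["state"], row["severity"])
```

In Grafana itself, the same flattening is done by the table panel’s transformations; the sketch only makes the shape of the data visible.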

Grafana’s own alerts can be displayed alongside Prometheus alerts and help close gaps in the Prometheus alerting module.

Grafana also integrates with many collaboration tools. A recent version added a feature that sends an image of the graph behind the specific alert that was triggered.

Grafana renders the panel associated with the alert rule as a PNG image and includes it in the notification. This way, all of the collaboration systems can easily display the image.

Monitoring the Monitoring Tools

Just like other applications and systems, Grafana and Prometheus are not fail-safe. The monitoring state of mind requires DevOps to keep all production services available. Since both Grafana and Prometheus are production-supporting systems, they also need monitoring.
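One simple starting point is the up series that Prometheus records for every scrape target: 1 if the last scrape succeeded, 0 if it failed. Since Prometheus can scrape its own metrics endpoint and Grafana also exposes metrics in the Prometheus format, both can appear as targets. The sketch below lists down targets from an instant-query response for up; the job names, instances, and sample response are hypothetical.

```python
from typing import Any, Dict, List


def down_targets(response: Dict[str, Any]) -> List[str]:
    """Return 'job/instance' for every target whose latest `up` sample is 0."""
    down = []
    for series in response["data"]["result"]:
        _timestamp, value = series["value"]  # the API returns the value as a string
        if float(value) == 0.0:
            labels = series["metric"]
            down.append(f"{labels.get('job', '?')}/{labels.get('instance', '?')}")
    return sorted(down)


sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "prometheus", "instance": "localhost:9090"},
             "value": [1577145600, "1"]},
            {"metric": {"__name__": "up", "job": "grafana", "instance": "grafana:3000"},
             "value": [1577145600, "0"]},
        ],
    },
}
print(down_targets(sample))  # ['grafana/grafana:3000']
```

A Prometheus server cannot report its own outage, so in practice a second Prometheus server or an external check should watch the first one.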

In the case of large-scale applications, DevOps teams have already invested significant effort in production maintenance. Requiring them to also maintain a large-scale monitoring stack is rarely worth the cost and effort, especially since Grafana can easily be outsourced to a managed solution.

Doing so creates confidence and peace of mind for DevOps teams while providing the same visibility and functionality as a self-managed system.

Conclusion

Grafana and Prometheus work so well together that tools like RabbitMQ now have built-in support for both. Both continue to develop new features and capabilities. Likewise, the community extends both, making the future of this partnership very bright.

Every DevOps and SRE team that chooses to implement Prometheus and Grafana together will gain from their integration. New metrics for newly supported tools, along with new measurements, visualizations, and dashboards, promise to make this pairing even more effective.