How to “Translate” Grafana Dashboards from Prometheus to Elasticsearch

Migration from Prometheus to Elasticsearch

In the field of open-source metrics and time series monitoring, it is quite clear today that Grafana is the most popular tool of choice. One of Grafana's main advantages is its storage backend flexibility. It supports almost all of the major time series datastores (Prometheus, InfluxDB, Elasticsearch, Graphite, etc.), each with its own query language syntax and, as a result, slight differences in the Grafana UI and capabilities.

Today, Prometheus is the most popular time series OSS Grafana datastore in the community. Some of its main advantages are its easy setup, its pull-based metrics system, and of course a large community that contributes out-of-the-box content for other users. This isn't to say that Prometheus doesn't have its drawbacks, though. The first is that Prometheus doesn't scale well enough for large organizations. The second is that Prometheus doesn't provide durable long-term storage, and requires a separate datastore for that.

However, there’s another option. An interesting choice, perhaps a counterintuitive one, for a powerful Grafana datastore is Elasticsearch. Elasticsearch offers exactly what Prometheus can’t offer. It is highly scalable and offers great long-term storage. Elasticsearch certainly does have its own disadvantages, but it’s up to you to decide which datastore is most suitable for your use case.

If you have just started using Elasticsearch as the monitoring datastore in your Grafana, and you're wondering if there's a way to "translate" Prometheus-based Grafana dashboards you once had, or ones you saw on the web – you've come to the right place. As long as the service you want to monitor is covered by a supported Metricbeat module, you can absolutely do it. In this post, I will give you the main principles for doing so.

Migrating Dashboard Content

No matter which datastore an OSS Grafana dashboard is built on, all dashboards share the same structure, aside from the query languages' syntactical differences. Prometheus uses its specialized PromQL language; Elasticsearch uses its own Lucene-based query syntax.

This means that a simple dive into a dashboard’s JSON file will uncover all its features and content. In our case, the ability to read and understand a PromQL dashboard’s JSON, and then to “translate” it to a Lucene query language-based Elasticsearch dashboard, can unlock a large pool of community-based dashboards.

It is worth mentioning that some query languages have inherent capabilities that others lack. For example, Prometheus' ability to perform a function over another function (max on sum, for example) doesn't exist in Elasticsearch's Lucene syntax. On the other hand, I have yet to encounter a case in which such differences kept me from translating specific content from one to the other. Furthermore, in many cases Elasticsearch's metrics are more refined and polished: they may already arrive as percentages, core units, etc., which makes life much easier.

Here are some basic tips for how to “translate” Prometheus dashboards into Elasticsearch dashboards:

Prerequisites

  • First, for the most convenient viewing experience, open the Prometheus dashboard's JSON in a code editor (Sublime Text, Atom or any other editor of your choice).
  • Second, I suggest having the Metricbeat modules' metrics list ready, so we can find the Elasticsearch equivalent for any given Prometheus metric we encounter. You can find the list of Metricbeat modules' metrics in the Metricbeat documentation; the metrics are grouped by parent module.

Variables

Now we’re ready to dive into the dashboard’s JSON itself. In order to have the best understanding of the queries behind the visualizations in the dashboard, it’s best we start with understanding dashboard variables. A simple search for the word ‘templating’ will take us to the JSON section that holds all the dashboard’s variables.

In this section, if we look under the ‘query’ field, we can see what variables are in the dashboard, and to which data they refer (each query defines a single variable).

Let’s look at an example:

[Image: the 'query' field in the dashboard JSON's templating section]

In this Kubernetes dashboard JSON's templating section, under the 'query' field, we can see a 'Node' variable that derives its data from the Prometheus label 'kubernetes_io_hostname'. This is clearly a variable that filters Kubernetes nodes, so we know that we, too, should create a 'Node' variable in our dashboard.
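
For reference, such a templating entry might look roughly like this (a simplified, hypothetical sketch; real dashboards carry additional fields such as 'datasource' and 'refresh', and the exact query may differ):

    {
      "templating": {
        "list": [
          {
            "name": "Node",
            "type": "query",
            "query": "label_values(kubernetes_io_hostname)"
          }
        ]
      }
    }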

A quick look at our Metricbeat Kubernetes list of exported fields shows that the equivalent field is 'kubernetes.node.name'. It should look like this:

[Image: a 'Node' variable based on kubernetes.node.name]
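
With the Elasticsearch datasource, a terms lookup on that field does the same job. A minimal sketch of the variable's query definition:

    { "find": "terms", "field": "kubernetes.node.name" }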

Follow the same steps for all other PromQL variables in your source dashboard’s JSON.

Query Editor

After figuring out the basics of PromQL variable syntax, we can safely look into where our monitoring actually happens. In an OSS Grafana dashboard's JSON, we should look for the 'targets' section. When using the Elasticsearch datasource, you will find its equivalent in OSS Grafana's Query Editor, which includes all the queries of a specific visualization.

In a Prometheus dashboard's JSON, each query has its own "mini section."

This section includes the function performed on the metric (e.g., sum or count) and/or the metric name. Additionally, it might include the visualization's query, group by, interval, and mathematical operations using the metric. This "mini section" is found under the phrase 'expr'. Let's look at an example that will help us understand the structure of this phrase:

[Image: an 'expr' entry in the Prometheus dashboard JSON]

Here we can see a query block that presents the sum of the metric 'kube_node_status_allocatable_memory_bytes'. The query in this block presents data according to the selected node variable value (that's the phrase {node=~"^$Node$"}). At the end, we can see that the visualization is also grouped by the node name. We can even see below that this query block has an alias in the 'legendFormat' phrase. This alias adds the prefix 'avlbl' before each node name (the full phrase is avlbl: {{ node }}).
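
Put together, such a 'targets' entry typically looks along these lines (a sketch based on the description above, not the exact JSON from the screenshot):

    "targets": [
      {
        "expr": "sum(kube_node_status_allocatable_memory_bytes{node=~\"^$Node$\"}) by (node)",
        "legendFormat": "avlbl: {{ node }}"
      }
    ]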

Looking at our list of Metricbeat exported fields shows that the equivalent metric we should use in this case is 'kubernetes.node.memory.allocatable.bytes'. This means that the equivalent Elasticsearch query editor settings should look like this:

[Image: the Elasticsearch query editor using 'kubernetes.node.memory.allocatable.bytes']
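
In dashboard JSON terms, the equivalent Elasticsearch target can be sketched like this (field names follow Grafana's Elasticsearch datasource; treat it as an illustration rather than exact editor output):

    "targets": [
      {
        "query": "kubernetes.node.name:$Node",
        "metrics": [
          { "type": "sum", "field": "kubernetes.node.memory.allocatable.bytes" }
        ],
        "bucketAggs": [
          { "type": "terms", "field": "kubernetes.node.name" },
          { "type": "date_histogram", "field": "@timestamp" }
        ],
        "alias": "avlbl: {{term kubernetes.node.name}}"
      }
    ]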

Now let's look at a more complex example:

[Image: a complex 'expr' query in PromQL]

PromQL Query Calculations

This time we can see that there are calculations in the query. These calculations create a "new metric" (as we can see in the 'title' field, which states "Cluster CPU Usage") that Kubernetes does not provide in Prometheus. In this case, the new metric is the cluster's CPU usage, expressed as a percentage.

To produce it, the total number of CPU cores requested by the pods is divided by the total number of allocatable node CPU cores, and the result is multiplied by 100 (to present it as a percentage on a scale of 0-100).
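
In PromQL, such a calculation is typically expressed along these lines (an illustrative reconstruction, not the screenshot's exact query; the metric names here are the common kube-state-metrics ones and may differ in your setup):

    "expr": "sum(kube_pod_container_resource_requests_cpu_cores) / sum(kube_node_status_allocatable_cpu_cores) * 100"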

Again, the first thing we should do when we want to understand how to "translate" this query into Elasticsearch-based Grafana is to look at the list of exported fields. Often we'll find there's no need for such calculations in Elasticsearch, because many metrics already arrive as percentages of the total, percentages of the limit, derivatives, etc.

Exported Fields

In this particular case we will need to perform a calculation of our own. A quick look at the exported fields gives us good candidates for that: the 'kubernetes.node.cpu.usage.nanocores' and 'kubernetes.node.cpu.capacity.cores' metrics. In order to recreate this "new metric" ourselves, we'll need to get familiar with the 'Bucket Script' function. The Bucket Script function is a powerful tool that lets us perform calculations on the metrics. In our case, we'll need to follow these steps (a JSON sketch of the result appears after the steps below):

  1. Sum each metric separately
  2. Add the 'Bucket Script' function and set a usage parameter for the metric 'kubernetes.node.cpu.usage.nanocores', and a capacity parameter for the metric 'kubernetes.node.cpu.capacity.cores'
  3. Divide parameter usage by a billion (to convert its values from nanocores to cores, so they are on the same scale as 'kubernetes.node.cpu.capacity.cores'), then divide it by parameter capacity, and then multiply the result by 100 (to present it on a scale of 0-100)
  4. Hide the original Sum metrics, so that only the bucket script result will be shown. This is done by clicking the eye icon on the left-hand side of the metric row, just right of "Metric"
  5. Voilà!
[Image: Cluster CPU Usage from Prometheus in Grafana]
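
For reference, here's a rough JSON sketch of the resulting metrics section in the Elasticsearch target (the structure follows Grafana's Elasticsearch datasource; the ids and exact layout may differ in your Grafana version):

    "metrics": [
      { "id": "1", "type": "sum", "field": "kubernetes.node.cpu.usage.nanocores", "hide": true },
      { "id": "2", "type": "sum", "field": "kubernetes.node.cpu.capacity.cores", "hide": true },
      {
        "id": "3",
        "type": "bucket_script",
        "pipelineVariables": [
          { "name": "usage", "pipelineAgg": "1" },
          { "name": "capacity", "pipelineAgg": "2" }
        ],
        "settings": { "script": "params.usage / 1000000000 / params.capacity * 100" }
      }
    ]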

Conclusion

Migrating Prometheus-based Grafana content to Elasticsearch-based Grafana is definitely a doable task. If you know where to look, and can follow some basic steps and principles, you'll do just fine. These are the main steps you should follow:

  • Open the JSON file of the Prometheus-based Grafana dashboard you want to migrate
  • Understand which variables it uses (under the 'templating' section)
  • See which queries the dashboard includes (under the 'targets' section) and create their equivalents in your Elasticsearch-based Grafana
  • Rely on Metricbeat's documentation to help you translate field and metric names

If you wish to start experimenting with monitoring dashboards based on open source Grafana, with Elasticsearch backend storage and without the hassle of setting it up, check out Logz.io's Infrastructure Monitoring managed service.
