Elasticsearch is increasingly being used for analyzing health metrics for a variety of systems. While it was not designed for this purpose, new capabilities added to Kibana are converting users from other time series analyzers such as Grafana.
But who watches the watcher?
Elasticsearch needs to be monitored just like any other tool in your stack, and we have previously discussed using a combination of Graphite and Grafana for this purpose. In this article, though, I’d like to introduce a simple Docker container that we developed for Logz.io users who have their own Elasticsearch deployment and wish to monitor it.
This lightweight container exposes metrics that can be used for basic health monitoring of your cluster (if you’re a Nagios fan, the container also provides a Nagios plugin script) and ships them to Logz.io.
So, what data is shipped?
Here’s the complete list of the metrics and fields this container reports to Logz.io:
- Number of initializing shards
- Number of pending tasks
- Description of pending tasks (name, time in queue, urgency, etc.)
- Number of relocating shards
- Number of unassigned shards
- Cluster status (red, yellow, green)
- Mapping size – entire cluster (Python length of the cluster state)
- Mapping size – per index (Python length of a specific index from the cluster state)
- Cluster state version
- JVM heap per node
- Thread pool queue size and rejected count per node
- Number of docs between readings (to calculate index rate)
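To make a couple of these metrics concrete, here is a minimal Python sketch of how the shard counts and status could be picked out of an Elasticsearch `_cluster/health` response, and how an index rate could be derived from two document-count readings. The field names follow the cluster health API; whether the container reads exactly these fields, and the helper names `summarize_health` and `index_rate`, are assumptions made for illustration and not the container's actual code.

```python
from typing import Any, Dict

# Field names as returned by Elasticsearch's _cluster/health API.
# Assumption: the container reports something equivalent to these.
HEALTH_FIELDS = (
    "status",
    "initializing_shards",
    "relocating_shards",
    "unassigned_shards",
    "number_of_pending_tasks",
)


def summarize_health(health: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields of interest from a _cluster/health response."""
    # Missing fields come back as None rather than raising.
    return {field: health.get(field) for field in HEALTH_FIELDS}


def index_rate(prev_docs: int, curr_docs: int, interval_seconds: float) -> float:
    """Documents indexed per second between two doc-count readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Deletions can shrink the count; clamping at zero is an illustrative choice.
    return max(curr_docs - prev_docs, 0) / interval_seconds


# Hypothetical sample, shaped like a (truncated) _cluster/health response.
sample_health = {
    "cluster_name": "demo",
    "status": "yellow",
    "initializing_shards": 0,
    "relocating_shards": 1,
    "unassigned_shards": 3,
    "number_of_pending_tasks": 0,
}
print(summarize_health(sample_health))
print(index_rate(prev_docs=1000, curr_docs=1600, interval_seconds=60))
```

The rate is simply the difference in document count divided by the time between the two readings, which is why the container only needs to ship the number of docs between readings.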
Let’s take a look at how to use the container and some visualization examples.
Running the Health Monitor
For the purpose of this article, I set up an Elasticsearch cluster consisting of 1 master node and 2 data nodes. To simulate some load, I installed and started filebeat and metricbeat on the data nodes.
To run the Health Monitor, all you have to do is run the command below.
There are some running options available as well, such as defining a different Elasticsearch port, or setting the interval between each health check. Be sure to check these options out before running the command.
sudo docker run -d \
  -e LOGZ_TOKEN="<yourLogzioToken>" \
  -e ELASTICSEARCH_ADDR="<ElasticsearchIP>" \
  --restart=always \
  logzio/logzio-es-health
Be sure to replace the relevant fields in the command above. The Logz.io token can be retrieved from the Settings page in Logz.io.
Once run, you should begin to see data in the Discover tab in Kibana:
Monitoring performance in Kibana
Using the shipped data, we can now slice and dice the data in any way we want using Kibana’s visualization capabilities. What visualizations you choose to build is up to you of course, but here are a few basic examples.
Simple metric visualizations are useful for displaying single metrics on specific fields. For example, this metric visualization shows the number of nodes in the cluster using the ‘number_of_fields’ field:
A pie chart visualization using the ‘status’ field gives us a breakdown of the different statuses of the nodes in our cluster:
We can use a bar chart to view a breakdown, over time, of the heap usage of the different nodes. To do this, we will use the ‘heap_used_percent’ field, and an x-axis composed of two aggregations, a date histogram and a terms aggregation:
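For readers who prefer to see the query behind such a chart, the following Python sketch builds a roughly equivalent Elasticsearch aggregation body: a date histogram, split by a terms aggregation, with the average of ‘heap_used_percent’ in each bucket. Only ‘heap_used_percent’ comes from the shipped data described here; the node field name `node_name`, the timestamp field `@timestamp`, and the function name are assumptions, and Kibana may generate a somewhat different request.

```python
from typing import Any, Dict


def heap_by_node_query(
    node_field: str = "node_name",   # hypothetical field identifying the node
    time_field: str = "@timestamp",  # assumed timestamp field
    interval: str = "5m",
    max_nodes: int = 10,
) -> Dict[str, Any]:
    """Build a search body: average heap usage per node, bucketed over time."""
    return {
        "size": 0,  # only aggregation results are needed, not the hits
        "aggs": {
            "over_time": {
                # "fixed_interval" applies to Elasticsearch 7.2+;
                # older versions use "interval" instead.
                "date_histogram": {"field": time_field, "fixed_interval": interval},
                "aggs": {
                    "per_node": {
                        "terms": {"field": node_field, "size": max_nodes},
                        "aggs": {
                            "avg_heap": {"avg": {"field": "heap_used_percent"}}
                        },
                    }
                },
            }
        },
    }


query = heap_by_node_query(interval="10m")
print(query["aggs"]["over_time"]["date_histogram"])
```

Nesting the terms aggregation inside the date histogram is what gives one bar segment per node in every time bucket.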
A data table visualization gives us a list of the largest mappings in our Elasticsearch cluster:
One dashboard to rule them all!
Of course, once you’ve got your visualizations lined up, add ‘em all to one comprehensive dashboard:
To help you guys save time and hit the ground running, this dashboard is available in ELK Apps, Logz.io’s library of pre-made dashboards and visualizations. Just go to ELK Apps and search for ‘Elasticsearch’.
Getting notified on abnormal behavior
As mentioned above, if you’re using Nagios this container supports a Nagios integration as well.
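Nagios plugins report their result through the process exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) plus a one-line message. As a rough illustration of how a cluster status could map onto that convention, here is a minimal Python sketch. It is not the plugin script bundled with the container; the function names and the green/yellow/red mapping are assumptions.

```python
from typing import List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping from Elasticsearch cluster status to Nagios state.
STATUS_TO_EXIT = {"green": OK, "yellow": WARNING, "red": CRITICAL}
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def nagios_check(status: str) -> Tuple[int, str]:
    """Translate a cluster status into a Nagios exit code and message."""
    code = STATUS_TO_EXIT.get(status.strip().lower(), UNKNOWN)
    return code, f"{STATE_NAMES[code]} - cluster status is {status}"


def main(argv: List[str]) -> int:
    """Print the check message and return the exit code for the given status."""
    if len(argv) != 1:
        print("UNKNOWN - usage: check_es_status <green|yellow|red>")
        return UNKNOWN
    code, message = nagios_check(argv[0])
    print(message)
    return code


# Example: a red cluster maps to CRITICAL (exit code 2).
print(nagios_check("red"))
```

A real plugin would fetch the status from the cluster and finish with something like `raise SystemExit(main(sys.argv[1:]))` so that Nagios sees the exit code.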
If you’re already shipping the data into Logz.io, however, you might as well make use of the built-in alerting mechanism to get notified in real time when your cluster is misbehaving.
Examples abound, but say you want to be notified if your cluster enters a red state.
Using a basic Kibana query, “status:red”, you can get a notification via email, Slack, or any other messaging application you’re using, in real time.
Another example is getting notified should the average heap size exceed a specific value. Note, in this case, I’m using a grouping option to see the results per Elasticsearch node.
The resulting alert looks as follows:
Summing it up
Elasticsearch monitoring is an art unto itself, and there are plenty of tools out there, open source and proprietary, that do the job quite well. Some will require additional configuration, and most will come at additional cost.
If you’re using Logz.io, this Docker container is an easy-to-implement, lightweight solution that might suffice for your needs. The supplied dashboard makes it a tempting one as well, since it saves you the time of building Kibana dashboards yourself.
Got any feedback? Monitor Elasticsearch using a different method? We’d love to hear! Just ping us: firstname.lastname@example.org.