elasticsearch cheat sheet

If you have worked with Elasticsearch (whether as part of the ELK Stack or not), I’m sure you know how awesome of a product it is. But the problem is that it can go from awesome to frustrating in less than a minute. Really, really frustrating.

That’s why we at log analytics platform Logz.io have compiled a cheat sheet of API calls to solve the problems that we have frequently encountered among our customers. We’re sure that they’re not the only ones!

Instead of going through Elasticsearch’s documentation yet another time or trying to find the solutions in random Stack Overflow answers, just save this in your favorites and visit in a time of need.

What is a time of need? Oh, you’ll know it when it comes.

A brief introduction to Elasticsearch APIs

Elasticsearch has an extensive set of APIs that you can query or change at runtime. Each API call has a context, which is usually “cluster,” “node,” or “index.” That means that some APIs change things cluster-wide, some are only for a specific node, and some are for a specific index.

When changing cluster settings, you have two options:

  • Transient – Changes that will not persist after a full cluster restart
  • Persistent – Changes that will be saved after a full cluster restart

Most of the changes below will be transient. If you them to be persistent, change them in your Elasticsearch configuration (and make them transient in the meantime).

Important note: Transient changes will be persistent when a node restarts or a new node joins a cluster. The master node will sync the changes for you automatically.

Another important note: Transient changes have precedence over persistent changes, which have precedence over the configuration file. Just keep that in mind.

Whenever you change transient settings, make sure that you revert them back to your prior configurations. Some changes can exhaust clusters and should be used only to help with recoveries.

The cheat sheet

Move shards from one node to another

When to do it: When too many hot shards reside in one data node and you want to spread them out manually. Elasticsearch does not take these things into consideration when placing shards across the cluster, so sometimes it is necessary to move them manually.

The cURL command:

Force the allocation of an unassigned shard with a reason

When to do it: Sometimes you have unassigned shards in a cluster, but you just cannot figure out why. It can have plenty of causes such as lack of space or shard awareness. This will output a lot of verbose data. If you look at the end of the output, you will see the reason for the non-allocation. (Note: The “?explain” part can be also applied to the previous curl to get a reason if you cannot move a shard around.)

The cURL command:

Remove nodes from clusters gracefully

When to do it: When you want to decommission a node or perform any type of maintenance without the cluster turning yellow or red (depending on your replicas settings). Note: If you drain a node and want to return it to the cluster afterward, you need to call that endpoint again with the IP field blank.

The cURL command:

Force a synced flush

When to do it: Before you restart a node that you are not gracefully removing from the cluster. This will place a sync ID on all indices, and as long as you are not writing to them, the recovery time of those shards will be significantly faster.

The cURL command:

Change the number of moving shards to balance the cluster

When to do it: Setting the cURL command (see below) to 0 will be useful if you have a planned maintenance and do not want the cluster to start to move shards under your feet. Setting a higher value will help to rebalance the cluster when a new node joins it.

The cURL command:

Change the number of shards being recovered simultaneously per node

When to do it: If a node has been disconnected from the cluster, all of its shards will be unassigned. After a certain delay, the shards will be allocated somewhere else. The number of concurrent shards per node that will be recovered is determined by that setting.

The cURL command:

Change the recovery speed

When to do it: To avoid overloading the cluster, Elasticsearch limits the speed that is allocated to recovery. You can carefully change that setting to make it recover more quickly.

The cURL command:

Change the number of concurrent streams for a recovery on a single node

When to do it: If a node has failed and you want to speed up recovery, you can increase this setting. Make sure to monitor the cluster so that you will not end uploading it too much.

The cURL command:

Change the size of the search queue

When to do it: If your cluster is loaded and takes too much time to answer search queries, you can carefully increase that setting so that you will not drop searches. (If you see an increase in the “rejected” metric for any queue, this recommendation is applicable.)

The cURL command:

Clear the cache on a node

When to do it: If a node reaches a high JVM value, you can call that API as an immediate action on a node level to make Elasticsearch drop caches. It will hurt performance, but it can save you from OOM (Out Of Memory).

The cURL command:

Adjust the circuit breakers

When to do it: To avoid not getting to OOM in Elasticsearch, you can tweak the settings on the circuit breakers. This will limit the search memory and drop all searches that are estimated to consume more memory than that desired level. You can read more about that here. Note: This is a really delicate setting that you need to calibrate carefully.

The cURL command:

These are solutions to only a few of the problems that we have frequently seen among our customers. What are your favorite Elasticsearch quick tips in times of trouble? Feel free to add more in the comments below.

 

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack -- Elasticsearch, Logstash, and Kibana