Have you ever had trouble working with Elasticsearch clusters? You’re not alone. In this post, I will discuss a problem I’ve encountered working with large Elasticsearch clusters and how I solved it. I will share a lot of knowhow on major technical Elasticsearch concepts, some diagrams for illustration, and of course a cool solution! In particular, I will go into Elasticsearch nodes, indices, and shards.
Elasticsearch Clusters: What to Know
You already understand that Elasticsearch is a distributed search and analytics engine. Before we begin, we should review how Elasticsearch organizes its data—indices, nodes and shards.
- Nodes: Elasticsearch nodes consist of three types: 1) master nodes, 2) client nodes, and 3) data nodes. All the data is in the data nodes, so I will be referring to them exclusively .
- Indices: The logical data unit in Elasticsearch is called an index. You can think of it as the ‘database’ in a relational database.
- Shards: The data of each index is stored in shards. You can think of it as the ‘table’ in a relational database. Every index has at least one primary shard, and it should also include replica shards.
Our Use Case
We will walk through what is a typical use case, at this point. Even though it’s the norm, let’s just illustrate the details so we can keep the scenario straight in our heads:
We have customers. They just send data and we take care of all the other details. Each customer has its own index.
The number of shards per index is set according to the index’s expected size. The actual index size depends on how much data the customer sends us.
We create a whole new set of indices everyday, and we are also versioning the indices. Index examples include:
We always store data in today’s index with the latest version, so these indices are active while the old ones are inactive. Expired indices are deleted daily.
Balancing Elasticsearch Clusters
This is how a typical cluster will look:
Now, let’s say that our data nodes are almost full, so we want to add an additional one. What will happen?
First, the new data nodes will be empty:
Then, Elasticsearch will try to spread the shards evenly between the data nodes.
Since moving a shard full of data takes time, their distribution will happen gradually.
So great, Elasticsearch will move some shards around until the cluster is balanced, right?
Elasticsearch takes into account two factors before shards and nodes could be said to be in balance within a cluster:
- A primary shard and a replica shard will never be on the same data node, and
- The number of shards on all the data nodes should be equal.
Elasticsearch does not take into account two other important factors:
- The size of the shards—they are not equal! And,
- Which shards belong to active indices. These shards are open to read and write operations, while the shards of inactive indices are only open to read operations. This means that the more active shards a data node has, the harder it works.
So, the cluster actually looks more like this:
Now, how the heck do we balance that?!
Well, I will tell you a secret: we already have a very cool algorithm that balances the indices’ newly created shards on the cluster.So, if I just wait until midnight, all the active shards will be magically spread all over the cluster, and if I just wait until all the existing indices expire and are replaced by new, magically spread ones, my cluster will be perfectly balanced.
So, why am I writing this?
Because production waits for no one!
Usually, when we add a data node, it’s because production calls. At that point, we have no time to waste—we need to take the load off the old data nodes immediately.
A bonus challenge comes from the fact that just one overloaded data node is enough for Elasticsearch to start choking and slow down our entire data ingestion pipeline.
So, this cluster better be in balance, and it better get there fast.
Our existing algorithm, which balances future indices, can only move empty shards, but moving non-empty shards takes forever.
So, our existing algorithm can’t move the shards that are already on the cluster.
Our solution was creating the following cheat:
First, for every active index in the cluster, we create a new index version without activating the new index.
For example, if we have the active index
customer1_today’sDate_v2, we will create
customer1_today’sDate_v2 will remain the active one.
Secondly, we use our existing algorithm to balance all the newly created indices—since they are new and not active, their shards are still empty.
Thirdly and lastly—only after the algorithm is done—activate the new indices versions so they will start collecting incoming data.
How does it work?
Now, all the active shards are balanced over the data nodes. In other words, the new node is fully participating in write operations.
Additionally, since the write operations are a lot heavier than the read operations, the new node is actually taking on a lot of the load from the other nodes.
Finally, this is enough to stabilize the cluster. We can relax and let the old shards be balanced by Elasticsearch and replaced over time.
Elasticsearch is not easy to operate at scale. At Logz.io, we manage dozens of clusters with enormous amounts of data. We encounter new issues everyday, and we design and implement new solutions to every new problem.