Elasticsearch Tutorial: Creating an Elasticsearch cluster

By: Daniel Berman

Creating an Elasticsearch Cluster: Getting Started

Unless you are using Elasticsearch for development and testing, creating and maintaining an Elasticsearch cluster will be a task that will occupy quite a lot of your time. Elasticsearch is an extremely powerful search and analysis engine, and part of this power lies in the ability to scale it for better performance and stability.

This tutorial will provide some information on how to set up an Elasticsearch cluster, and will add some operational tips and best practices to help you get started. It should be stressed though that each Elasticsearch setup will likely differ from one another depending on multiple factors, including the workload on the servers, the amount of indexed data, hardware specifications, and even the experience of the operators.

As you’ll see in this article, there is a lot to consider as you’re configuring your Elasticsearch clusters. And as data volumes grow, it can take considerable effort to manage, monitor, and maintain these clusters at scale.

For this reason, Logz.io Log Management delivers OpenSearch (the new open source version of Elasticsearch) as a service – so you can use the leading open source logging platform, without configuring or managing the data infrastructure yourself. Logz.io also builds capabilities like metrics monitoring, alerting, and RBAC around OpenSearch. Learn about the way Logz.io unifies and enhances the leading open source observability technologies here.

All that said, if you prefer to build your own Elasticsearch clusters, let’s go through the steps.

What is an Elasticsearch cluster?

As the name implies, an Elasticsearch cluster is a group of one or more Elasticsearch nodes instances that are connected together. The power of an Elasticsearch cluster lies in the distribution of tasks, searching and indexing, across all the nodes in the cluster.

The nodes in the Elasticsearch cluster can be assigned different jobs or responsibilities:

Data nodes — stores data and executes data-related operations such as search and aggregation
Master nodes — in charge of cluster-wide management and configuration actions such as adding and removing nodes
Client nodes — forwards cluster requests to the master node and data-related requests to data nodes
Ingest nodes — for pre-processing documents before indexing
*Note: Tribe nodes, which were similar to cross-cluster or federated nodes, were deprecated with Elasticsearch 5.4

By default, each node is automatically assigned a unique identifier, or name, that is used for management purposes and becomes even more important in a multi-node, or clustered, environment.

When installed, a single Elasticsearch node will form a new single-node cluster entitled “elasticsearch,” but as we shall see later on in this article it can also be configured to join an existing cluster using the cluster name. Needless to say, these nodes need to be able to identify each other to be able to connect.

Installing an Elasticsearch Cluster

As always, there are multiple ways of setting up an Elasticsearch cluster. You can use a configuration management tool such as Puppet or Ansible to automate the process. In this case, though, we will be showing you how to manually set up a cluster consisting of one master node and two data nodes, all on Ubuntu 16.04 instances on AWS EC2 running in the same VPC. The security group was configured to enable access from anywhere using SSH and TCP 5601 (Kibana).

Installing Java

Elasticsearch is built on Java and requires at least Java 8 (1.8.0_131 or later) to run. Our first step, therefore, is to install Java 8 on all the nodes in the cluster. Please note that the same version should be installed on all Elasticsearch nodes in the cluster.

Repeat the following steps on all the servers designated for your cluster.

First, update your system:

sudo apt-get update

Then, install Java with:

sudo apt-get install default-jre

Checking your Java version now should give you the following output or similar:

openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

Installing Elasticsearch nodes

Our next step is to install Elasticsearch. As before, repeat the steps in this section on all your servers.

First, you need to add Elastic’s signing key so that the downloaded package can be verified (skip this step if you’ve already installed packages from Elastic):

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

For Debian, we need to then install the apt-transport-https package:

sudo apt-get install apt-transport-https

The next step is to add the repository definition to your system:

echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

All that’s left to do is to update your repositories and install Elasticsearch:

sudo apt-get update 
sudo apt-get install elasticsearch

Configuring the Elasticsearch cluster

Our next step is to set up the cluster so that the nodes can connect and communicate with each other.

For each node, open the Elasticsearch configuration file:

sudo vim /etc/elasticsearch/elasticsearch.yml

This file is quite long, and contains multiple settings for different sections. Browse through the file, and enter the following configurations (replace the IPs with your node IPs):

#give your cluster a name.
cluster.name: my-cluster

#give your nodes a name (change node number from node to node).
node.name: "es-node-1"

#define node 1 as master-eligible:
node.master: true

#define nodes 2 and 3 as data nodes:
node.data: true

#enter the private IP and port of your node:
network.host: 172.11.61.27
http.port: 9200

#detail the private IPs of your nodes:
discovery.zen.ping.unicast.hosts: ["172.11.61.27", "172.31.22.131","172.31.32.221"]

Save and exit.

Running your Elasticsearch cluster

You are now ready to start your Elasticsearch nodes and verify they are communicating with each other as a cluster.

For each instance, run the following command:

sudo service elasticsearch start

If everything was configured correctly, your Elasticsearch cluster should be up and running. To verify everything is working as expected, query Elasticsearch from any of the cluster nodes:

curl -XGET 'http://localhost:9200/_cluster/state?pretty'

The response should detail the cluster and its nodes:

{
  "cluster_name" : "my-cluster",
  "compressed_size_in_bytes" : 351,
  "version" : 4,
  "state_uuid" : "3LSnpinFQbCDHnsFv-Z8nw",
  "master_node" : "IwEK2o1-Ss6mtx50MripkA",
  "blocks" : { },
  "nodes" : {
    "IwEK2o1-Ss6mtx50MripkA" : {
      "name" : "es-node-2",
      "ephemeral_id" : "x9kUrr0yRh--3G0ckESsEA",
      "transport_address" : "172.31.50.123:9300",
      "attributes" : { }
    },
    "txM57a42Q0Ggayo4g7-pSg" : {
      "name" : "es-node-1",
      "ephemeral_id" : "Q370o4FLQ4yKPX4_rOIlYQ",
      "transport_address" : "172.31.62.172:9300",
      "attributes" : { }
    },
    "6YNZvQW6QYO-DX31uIvaBg" : {
      "name" : "es-node-3",
      "ephemeral_id" : "mH034-P0Sku6Vr1DXBOQ5A",
      "transport_address" : "172.31.52.220:9300",
      "attributes" : { }
    }
  },
 …

Elasticsearch cluster configurations for production

We already defined the different roles for the nodes in our cluster, but there are some additional recommended settings for a cluster running in a production environment.

Avoiding “Split Brain”

A “split-brain” situation is when communication between nodes in the cluster fails due to either a network failure or an internal failure with one of the nodes. In this kind of scenario, more than one node might believe it is the master node, leading to a state of data inconsistency.

For avoiding this situation, we can make changes to the discovery.zen.minimum_master_nodes directive in the Elasticsearch configuration file which determines how many nodes need to be in communication (quorum) to elect a master.

A best practice to determine this number is to use the following formula to decide this number: N/2 + 1. N is the number of master eligible nodes in the cluster. You then round down the result to the nearest integer.

In the case of a cluster with three nodes, then:

discovery.zen.minimum_master_nodes: 2

Adjusting JVM heap size

To ensure Elasticsearch has enough operational leeway, the default JVM heap size (min/max 1 GB) should be adjusted.

As a rule of the thumb, the maximum heap size should be set up to 50% of your RAM, but no more than 32GB (due to Java pointer inefficiency in larger heaps). Elastic also recommends that the value for maximum and minimum heap size be identical.

These value can be configured using the Xmx and Xms settings in the jvm.options file.

On DEB:

sudo vim /etc/elasticsearch/jvm.options

-Xms2g
-Xmx2g

Disabling swapping

Swapping out unused memory is a known behavior but in the context of Elasticsearch can result in disconnects, bad performance and in general — an unstable cluster.

To avoid swapping you can either disable all swapping (recommended if Elasticsearch is the only service running on the server), or you can use mlockall to lock the Elasticsearch process to RAM.

To do this, open the Elasticsearch configuration file on all nodes in the cluster:

sudo vim /etc/elasticsearch/elasticsearch.yml

Uncomment the following line:

bootstrap.mlockall: true

Next, open the /etc/default/elasticsearch file:

sudo vim /etc/default/elasticsearch

Make the following configurations:

MAX_LOCKED_MEMORY=unlimited

Restart Elasticsearch when you’re done.

Adjusting virtual memory

To avoid running out of virtual memory, increase the amount of limits on mmap counts:

sudo vim /etc/sysctl.conf

Update the relevant setting accordingly:

vm.max_map_count=262144

On DEB/RPM, this setting is configured automatically.

Increasing open file descriptor limit

Another important configuration is the limit of open file descriptors. Since Elasticsearch makes use of a large amount of file descriptors, you must ensure the defined limit is enough otherwise you might end up losing data.

The common recommendation for this setting is 65,536 and higher. On DEB/RPM the default settings are already configured to suit this requirement but you can of course fine tune it.

sudo vim  /etc/security/limits.conf

Set the limit:

  - nofile 65536

Elasticsearch Cluster APIs

Elasticsearch supports a large number of cluster-specific API operations that allow you to manage and monitor your Elasticsearch cluster. Most of the APIs allow you to define which Elasticsearch node to call using either the internal node ID, its name or its address.

Below is a list of a few of the more basic API operations you can use. For advanced usage of cluster APIs, read this blog post.

Cluster Health

This API can be used to see general info on the cluster and gauge its health:

curl -XGET 'localhost:9200/_cluster/health?pretty'

Response:

{
  "cluster_name" : "my-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Cluster State

This API can be sued to see a detailed status report on your entire cluster. You can filter results by specifying parameters in the call URL.

curl -XGET 'localhost:9200/_cluster/state?pretty'

Response:

{
  "cluster_name" : "my-cluster",
  "compressed_size_in_bytes" : 347,
  "version" : 4,
  "state_uuid" : "uMi5OBtAS8SSRJ9hw1-gUg",
  "master_node" : "sqT_y5ENQ9SdjHiE0oco_g",
  "blocks" : { },
  "nodes" : {
    "sqT_y5ENQ9SdjHiE0oco_g" : {
      "name" : "node-1",
      "ephemeral_id" : "-HDzovR0S0e-Nn8XJ-GWPA",
      "transport_address" : "172.31.56.131:9300",
      "attributes" : { }
    },
    "mO0d0hYiS1uB--NoWuWyHg" : {
      "name" : "node-3",
      "ephemeral_id" : "LXjx86Q5TrmefDoq06MY1A",
      "transport_address" : "172.31.58.61:9300",
      "attributes" : { }
    },
    "it1V-5bGT9yQh19d8aAO0g" : {
      "name" : "node-2",
      "ephemeral_id" : "lCJja_QtTYauP3xEWg5NBQ",
      "transport_address" : "172.31.62.65:9300",
      "attributes" : { }
    }
  },
  "metadata" : {
    "cluster_uuid" : "8AqSmmKdQgmRVPsVxyxKrw",
    "templates" : { },
    "indices" : { },
    "index-graveyard" : {
      "tombstones" : [ ]
    }
  },
  "routing_table" : {
    "indices" : { }
  },
  "routing_nodes" : {
    "unassigned" : [ ],
    "nodes" : {
      "it1V-5bGT9yQh19d8aAO0g" : [ ],
      "sqT_y5ENQ9SdjHiE0oco_g" : [ ],
      "mO0d0hYiS1uB--NoWuWyHg" : [ ]
    }
  },
  "snapshots" : {
    "snapshots" : [ ]
  },
  "restore" : {
    "snapshots" : [ ]
  },
  "snapshot_deletions" : {
    "snapshot_deletions" : [ ]
  }
}

Cluster Stats

Extremely useful for monitoring performance metrics on your entire cluster:

curl -XGET 'localhost:9200/_cluster/stats?human&pretty'

Response:

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "my-cluster",
  "timestamp" : 1517224098451,
  "status" : "green",
  "indices" : {
    "count" : 0,
    "shards" : { },
    "docs" : {
      "count" : 0,
      "deleted" : 0
    },
    "store" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "total_count" : 0,
      "hit_count" : 0,
      "miss_count" : 0,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 0,
      "memory" : "0b",
      "memory_in_bytes" : 0,
      "terms_memory" : "0b",
      "terms_memory_in_bytes" : 0,
      "stored_fields_memory" : "0b",
      "stored_fields_memory_in_bytes" : 0,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "0b",
      "norms_memory_in_bytes" : 0,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "0b",
      "doc_values_memory_in_bytes" : 0,
      "index_writer_memory" : "0b",
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory" : "0b",
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set" : "0b",
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -9223372036854775808,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 3,
      "data" : 3,
      "coordinating_only" : 0,
      "master" : 3,
      "ingest" : 3
    },
    "versions" : [
      "6.1.2"
    ],
    "os" : {
      "available_processors" : 3,
      "allocated_processors" : 3,
      "names" : [
        {
          "name" : "Linux",
          "count" : 3
        }
      ],
      "mem" : {
        "total" : "10.4gb",
        "total_in_bytes" : 11247157248,
        "free" : "4.5gb",
        "free_in_bytes" : 4915200000,
        "used" : "5.8gb",
        "used_in_bytes" : 6331957248,
        "free_percent" : 44,
        "used_percent" : 56
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 10
      },
      "open_file_descriptors" : {
        "min" : 177,
        "max" : 178,
        "avg" : 177
      }
    },
    "jvm" : {
      "max_uptime" : "6m",
      "max_uptime_in_millis" : 361766,
      "versions" : [
        {
          "version" : "1.8.0_151",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "25.151-b12",
          "vm_vendor" : "Oracle Corporation",
          "count" : 3
        }
      ],
      "mem" : {
        "heap_used" : "252.1mb",
        "heap_used_in_bytes" : 264450008,
        "heap_max" : "2.9gb",
        "heap_max_in_bytes" : 3195076608
      },
      "threads" : 63
    },
    "fs" : {
      "total" : "23.2gb",
      "total_in_bytes" : 24962703360,
      "free" : "19.4gb",
      "free_in_bytes" : 20908818432,
      "available" : "18.2gb",
      "available_in_bytes" : 19570003968
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "netty4" : 3
      },
      "http_types" : {
        "netty4" : 3
      }
    }
  }
}

You can also target specific groups of nodes with node filters.

Nodes Stats

If you want to inspect metrics for specific nodes in the cluster, use this API. You can see info for all nodes, a specific node, or ask to see only index or OS/process specific stats.

All nodes:

curl -XGET 'localhost:9200/_nodes/stats?pretty'

A specific node:

curl -XGET 'localhost:9200/_nodes/node-1/stats?pretty'

Index-only stats:

curl -XGET 'localhost:9200/_nodes/stats/indices?pretty'

You can get any of the specific metrics for any single node with the following structure:

curl -XGET 'localhost:9200/_nodes/stats/ingest?pretty'

Or multiple nodes with the following structure:

curl -XGET 'localhost:9200/_nodes/stats/ingest,fs?pretty'

Or all metrics with either of these two formats:

curl -XGET 'localhost:9200/_nodes/stats/_all?pretty'

curl -XGET 'localhost:9200/_nodes/stats?metric=_all?pretty'

Nodes Info

If you want to collect information on any or all of your cluster nodes, use this API.

Retrieve for a single node:

curl -XGET ‘localhost:9200/_nodes/?pretty’

Or multiple nodes:

curl -XGET ‘localhost:9200/_nodes/node1,node2?pretty’

Retrieve data on plugins or ingest:

curl -XGET ‘localhost:9200/_nodes/plugins

curl -XGET ‘localhost:9200/_nodes/ingest

Information about ingest processors should appear like this (with many more than the three types shown in the example):

{
  "_nodes": …
  "cluster_name": "elasticsearch",
  "nodes": {
   "toTaLLyran60m5amp13": {
	"ingest": {
	   "processors": [
	     {
	       "type": "uppercase"
	     },
	     {
	       "type": "lowercase"
	     },
	     {
	       "type": "append"
	     }
	   ]
	 }
    }
  }
}

Pending Cluster Tasks

This API tracks changes at the cluster level, including but not limited to updated mapping, failed sharding, and index creation.

The following GET should return a list of tasks:

curl -XGET ‘localhost:9200/_cluster/pending_tasks?pretty’

Task Management

Similar to the Pending Cluster Tasks API, the Task Management API will get data on currently running tasks on respective nodes.

To get info on all currently executing tasks, enter:

curl -XGET "localhost:9200/_tasks

To get current tasks by specific nodes, AND additionally cluster-related tasks, enter the node names as such and then append &actions to the GET:

curl -XGET ‘localhost:9200/_tasks?nodes=node1,node2&actions=cluster:*&pretty’

Retrieve info about a specific task (or its child tasks) by entering _tasks/ and then the task’s individual ID:

curl -XGET ‘localhost:9200/_tasks/43r315an3xamp13’

And for child tasks:

curl -XGET ‘localhost:9200/_tasks?parent_task_id=43r315an3xamp13’

This API also supports reindexing, search, task grouping and task canceling.

Remote Cluster Info

Get remote cluster info with:

curl -XGET 'localhost:9200/_remote/info?pretty'

Voting Configuration Exclusions

This will remove master-eligible nodes.
Remove all exclusions by:

curl -X DELETE ‘localhost:9200/_cluster/voting_config_exclusions?pretty’

Or add a node to the exclusions list:

curl -X POST ‘localhost:9200/_cluster/voting_config_exclusions/node1?pretty’

What next?

“When all else fails, read the fuc%^&* manual” goes the famous saying. Thing is, the manual in question, and the technology it documents, are not straightforward to say the least.