Note: Elastic recently announced it would implement closed-source licensing for new versions of Elasticsearch and Kibana beyond Version 7.9. For more details, read our CEO Tomer Levy’s comments on Truly Doubling Down on Open Source.

Elasticsearch is the living heart of what is today the most popular log analytics platform — the ELK Stack (Elasticsearch, Logstash and Kibana). Elasticsearch’s role is so central that it has become synonymous with the name of the stack itself. Used primarily for search and log analysis, Elasticsearch is one of the most popular database systems available today. This Elasticsearch tutorial provides new users with the prerequisite knowledge and tools to start using Elasticsearch, covering installation, initial indexing, and basic data handling.

It’s worth noting that Elasticsearch is no longer the open source component it used to be. In January 2021, Elastic announced that Elasticsearch and Kibana (as of the 7.11 release) would move away from the open source Apache 2.0 license to a proprietary dual license under the SSPL and the Elastic License.

This prompted AWS to fork Elasticsearch and Kibana into OpenSearch and OpenSearch Dashboards, which fulfills the same use cases of the ELK Stack under the open source Apache 2.0 license.

Elasticsearch: a Brief Introduction

Initially released in 2010, Elasticsearch (sometimes dubbed ES) is a modern search and analytics engine based on Apache Lucene. Built with Java, Elasticsearch is a NoSQL database, which means it stores data in an unstructured way and that you cannot use SQL to query it.

This Elasticsearch tutorial could also be considered a NoSQL tutorial. However, unlike most NoSQL databases, Elasticsearch has a strong focus on search capabilities and features — so much so, in fact, that the easiest way to get data from ES is to search for it using the extensive Elasticsearch API.

In the context of data analysis, Elasticsearch is used together with the other components in the ELK Stack, Logstash and Kibana, and plays the role of data indexing and storage. These days, Logstash is often replaced with smaller, more lightweight components like Fluentd or Fluent Bit, which can accomplish most of what Logstash does without the heavy computing footprint and its common challenges.

As you’ll see in this tutorial, getting started with Elasticsearch isn’t rocket science. Especially when you’re setting up a small cluster, implementing an ELK logging pipeline is straightforward.

However, once you start sending more data, ELK management requires more work. You’ll need to manage and scale larger clusters, implement more data parsing, install and manage a data queuing system like Kafka to buffer your logs, possibly upgrade your ELK components, and monitor and tune your stack for performance issues.

For those that don’t want to manage these tasks themselves and need additional features like RBAC, Logz.io built a log management tool that delivers ELK-as-a-service (which is now OpenSearch-as-a-service!), so you can embrace the world’s leading logging platform without having to run it yourself. Logz.io also supports metric and trace analytics – learn about the way Logz.io unifies and enhances the leading open source observability technologies here.

All that said, with small clusters, running Elasticsearch yourself is a great choice. Let’s see how to get started.

Installing Elasticsearch

The requirements for Elasticsearch are simple: Java 8 (specific version recommended: Oracle JDK version 1.8.0_131). Take a look at this Logstash tutorial to ensure that you are set. Also, you will want to make sure your operating system is on the Elastic support matrix, otherwise you might run up against strange and unpredictable issues. Once that is done, you can start by installing Elasticsearch.
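
Before installing, it is worth verifying that a suitable Java runtime is already present on the machine (the exact output will vary by distribution and JDK build):

java -version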

You can download Elasticsearch as a standalone distribution or install it using the apt and yum repositories. We will install Elasticsearch on an Ubuntu 16.04 machine running on AWS EC2 using apt.

First, you need to add Elastic’s signing key so you can verify the downloaded package (skip this step if you’ve already installed packages from Elastic):

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

For Debian, we then need to install the apt-transport-https package:

sudo apt-get install apt-transport-https

The next step is to add the repository definition to your system:

echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

All that’s left to do is to update your repositories and install Elasticsearch:

sudo apt-get update
sudo apt-get install elasticsearch

Configuring Elasticsearch

Elasticsearch configurations are done using a configuration file whose location depends on your operating system. In this file, you can configure general settings (e.g. node name), as well as network settings (e.g. host and port), where data is stored, memory, log files, and more.

For development and testing purposes, the default settings will suffice yet it is recommended you do some research into what settings you should manually define before going into production.

For example, and especially if installing Elasticsearch on the cloud, it is a best practice to bind Elasticsearch to either a private IP or localhost:

sudo vim /etc/elasticsearch/elasticsearch.yml
network.host: "localhost"
http.port: 9200

Running Elasticsearch

Elasticsearch will not run automatically after installation and you will need to manually start it. How you run Elasticsearch will depend on your specific system. On most Linux and Unix-based systems you can use this command:

sudo service elasticsearch start
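
On distributions that use systemd, the equivalent is the following; enabling the service is optional, but ensures Elasticsearch starts automatically after a reboot:

sudo systemctl start elasticsearch.service
sudo systemctl enable elasticsearch.service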

And that’s it! To confirm that everything is working fine, simply point curl or your browser to http://localhost:9200, and you should see something like the following output:

{
  "name" : "33QdmXw",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "mTkBe_AlSZGbX-vDIe_vZQ",
  "version" : {
    "number" : "6.1.2",
    "build_hash" : "5b1fea5",
    "build_date" : "2018-01-10T02:35:59.208Z",
    "build_snapshot" : false,
    "lucene_version" : "7.1.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

To debug the process of running Elasticsearch, use the Elasticsearch log files located (on DEB installations) in /var/log/elasticsearch/.
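
For example, assuming the default cluster name (elasticsearch), you can tail the main log file to follow the startup process:

sudo tail -f /var/log/elasticsearch/elasticsearch.log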

Creating an Elasticsearch Index

Indexing is the process of adding data to Elasticsearch. It is called indexing because when you feed data into Elasticsearch, the data is placed into Apache Lucene indexes, which Elasticsearch uses to store and retrieve its data. Although you do not need to know a lot about Lucene, it does help to know how it works when you start getting serious with Elasticsearch.

Elasticsearch exposes a REST API, so you can use either the POST or the PUT method to add data to it. You use PUT when you know, or want to specify, the id of the data item, and POST if you want Elasticsearch to generate an id for the data item:

curl -XPOST 'localhost:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
	"timestamp": "2018-01-24 12:34:56",
	"message": "User logged in",
	"user_id": 4,
	"admin": false
}
'
curl -X PUT 'localhost:9200/app/users/4' -H 'Content-Type: application/json' -d '
{
  "id": 4,
  "username": "john",
  "last_login": "2018-01-25 12:34:56"
}
'

And the response:

{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
{"_index":"app","_type":"users","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

The data for the document is sent as a JSON object. You might be wondering how we can index data without defining the structure of the data. Well, with Elasticsearch, like with any other NoSQL database, there is no need to define the structure of the data beforehand. To ensure optimal performance, though, you can define Elasticsearch mappings according to data types. More on this later.
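
For illustration, here is a minimal sketch of creating an index with an explicit mapping using the Elasticsearch 6.x syntax (the index name logs_v2 and the field types chosen here are assumptions for this example):

curl -XPUT 'localhost:9200/logs_v2' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "my_app": {
      "properties": {
        "timestamp": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" },
        "message": { "type": "text" },
        "user_id": { "type": "integer" },
        "admin": { "type": "boolean" }
      }
    }
  }
}
'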

If you are using any of the Beats shippers (e.g. Filebeat or Metricbeat), or Logstash, those parts of the ELK Stack will automatically create the indices.

To see a list of your Elasticsearch indices, use:

curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2018.01.23 y_-PguqyQ02qOqKiO6mkfA   5   1      17279            0      9.9mb          9.9mb
yellow open   app                 GhzBirb-TKSUFLCZTCy-xg   5   1          1            0      5.2kb          5.2kb
yellow open   .kibana             Vne6TTWgTVeAHCSgSboa7Q   1   1          2            0      8.8kb          8.8kb
yellow open   logs                T9E6EdbMSxa8S_B7SDabTA   5   1          1            0      5.7kb          5.7kb

The list in this case includes the indices we created above, a Kibana index and an index created by a Logstash pipeline.

Elasticsearch Querying

Once you index your data into Elasticsearch, you can start searching and analyzing it. The simplest query you can do is to fetch a single item. Read our article focused exclusively on Elasticsearch queries.

Once again, via the Elasticsearch REST API, we use GET:

curl -XGET 'localhost:9200/app/users/4?pretty'

And the response:

{
  "_index" : "app",
  "_type" : "users",
  "_id" : "4",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "id" : 4,
    "username" : "john",
    "last_login" : "2018-01-25 12:34:56"
  }
}

The fields starting with an underscore are all meta fields of the result. The _source object is the original document that was indexed.

We also use GET to do searches by calling the _search endpoint:

curl -XGET 'localhost:9200/_search?q=logged&pretty'
{
  "took" : 173,
  "timed_out" : false,
  "_shards" : {
    "total" : 16,
    "successful" : 16,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "logs",
        "_type" : "my_app",
        "_id" : "ZsWdJ2EBir6MIbMWSMyF",
        "_score" : 0.2876821,
        "_source" : {
          "timestamp" : "2018-01-24 12:34:56",
          "message" : "User logged in",
          "user_id" : 4,
          "admin" : false
        }
      }
    ]
  }
}

The result contains a number of extra fields that describe both the search and the result. Here’s a quick rundown:

  • took: The time in milliseconds the search took
  • timed_out: If the search timed out
  • _shards: The number of Lucene shards searched, and their success and failure rates
  • hits: The actual results, along with meta information for the results

The search we did above is known as a URI Search, and is the simplest way to query Elasticsearch. By providing only a word, ES will search all of the fields of all the documents for that word. You can build more specific searches by using Lucene queries:

  • username:johnb – Looks for documents where the username field is equal to “johnb”
  • john* – Looks for documents that contain terms that start with john followed by zero or more characters, such as “john,” “johnb,” and “johnson”
  • john? – Looks for documents that contain terms that start with john followed by only one character. Matches “johnb” and “johns” but not “john.”

There are many other ways to search including the use of boolean logic, the boosting of terms, the use of fuzzy and proximity searches, and the use of regular expressions.
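
To run one of these searches as an actual HTTP request, pass the Lucene query in the q parameter. For instance, the following sketch (reusing the username field and wildcard pattern from the list above) should return the user document we indexed earlier:

curl -XGET 'localhost:9200/app/_search?q=username:john*&pretty'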

Elasticsearch Query DSL

URI searches are just the beginning. Elasticsearch also provides a request body search with a Query DSL for more advanced searches. There is a wide array of options available in these kinds of searches, and you can mix and match different options to get the results that you require.

The Query DSL contains two kinds of clauses: 1) leaf query clauses that look for a value in a specific field, and 2) compound query clauses, which can contain one or several leaf query clauses or other compound clauses.
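
As a minimal sketch of a compound clause (reusing the log document indexed earlier), a bool query can combine a full-text leaf clause in query context with a term-level leaf clause in filter context:

curl -XGET 'localhost:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match": { "message": "logged" } },
      "filter": { "term": { "user_id": 4 } }
    }
  }
}
'

Here the match clause contributes to the relevance score, while the term clause in the filter context only includes or excludes documents (more on filter and query contexts below).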

Elasticsearch Query Types

Query types include:

  1. Geo queries
  2. “More like this” queries
  3. Scripted queries
  4. Full text queries
  5. Shape queries
  6. Span queries
  7. Term-level queries
  8. Specialized queries

Elasticsearch long ago merged queries and filters into a single Query DSL, but it still differentiates between them by context: the DSL distinguishes between a filter context and a query context for query clauses. Clauses in a filter context test documents in a boolean fashion: does the document match the filter, yes or no? Filters do not calculate a relevance score, which generally makes them faster than queries. Clauses in a query context, on the other hand, calculate a relevance score according to how closely each document matches the query, and that score determines the ordering of the returned documents. For example, here is a match_phrase query run in query context:

curl -XGET 'localhost:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": {
      "message": "User logged in"
    }
  }
}
'

And the result:

{
  "took" : 28,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.8630463,
    "hits" : [
      {
        "_index" : "logs",
        "_type" : "my_app",
        "_id" : "ZsWdJ2EBir6MIbMWSMyF",
        "_score" : 0.8630463,
        "_source" : {
          "timestamp" : "2018-01-24 12:34:56",
          "message" : "User logged in",
          "user_id" : 4,
          "admin" : false
        }
      }
    ]
  }
}

Creating an Elasticsearch Cluster

Maintaining an Elasticsearch cluster can be time-consuming, especially if you are doing DIY ELK. But, given Elasticsearch’s powerful search and analytic capabilities, such clusters are indispensable. We have a deeper dive on the subject with our Elasticsearch cluster tutorial, so we will use this as a springboard for that more thorough walk-through.

What is an Elasticsearch cluster, precisely? Elasticsearch clusters group multiple Elasticsearch nodes and/or instances together. Of course, you can always choose to maintain a single Elasticsearch instance or node inside a given cluster. The main point of such a grouping lies in the cluster’s distribution of tasks, searching, and indexing across its nodes. Node options include data nodes, master nodes, client nodes, and ingest nodes.
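
For reference, a node’s roles are controlled in its elasticsearch.yml. As a sketch using the Elasticsearch 6.x settings, a dedicated master-eligible node might be configured like this:

# Master-eligible, but holds no data and does no ingest processing
node.master: true
node.data: false
node.ingest: false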

Installing nodes can involve a lot of configuration, which our aforementioned tutorial covers. But here’s the basic Elasticsearch cluster node installation:

First and foremost, install Java:

sudo apt-get install default-jre

Next, add Elastic’s signing key:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Next, install Elasticsearch (this assumes you have already added the Elastic repository definition to this node, as shown earlier):

sudo apt-get update && sudo apt-get install elasticsearch

You will have to create and/or set up each Elasticsearch node’s own elasticsearch.yml config file (sudo vim /etc/elasticsearch/elasticsearch.yml); a minimal per-node sketch is shown below.
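
As an illustration only (the cluster name, node name and host IPs below are placeholders, and the discovery setting uses the Elasticsearch 6.x syntax), a per-node configuration might look like this:

# Name shared by every node in the cluster
cluster.name: elasticsearch-cluster-demo
# Unique name for this particular node
node.name: es-node-1
# Bind to an address the other nodes can reach (a private IP is recommended)
network.host: 172.31.50.123
# Seed list of the other cluster members (Elasticsearch 6.x discovery setting)
discovery.zen.ping.unicast.hosts: ["172.31.50.123", "172.31.50.124", "172.31.50.125"]

From there, start Elasticsearch on each node and then check your Elasticsearch cluster status. Responses will look something like this: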

{
  "cluster_name" : "elasticsearch-cluster-demo",
  "compressed_size_in_bytes" : 255,
  "version" : 7,
  "state_uuid" : "50m3ranD0m54a531D",
  "master_node" : "IwEK2o1-Ss6mtx50MripkA",
  "blocks" : { },
  "nodes" : {
    "m4-aw350m3-n0D3" : {
      "name" : "es-node-1",
      "ephemeral_id" : "x50m33F3mr--A11DnuM83r",
      "transport_address" : "172.31.50.123:9200",
      "attributes" : { }
    }
  }
}

Elasticsearch cluster health will be next on your list. Periodically check your cluster’s health with the following API call:

curl -X GET "localhost:9200/_cluster/health?wait_for_status=yellow&local=false&level=shards&pretty"

This example sets the local parameter to false (which is actually the default), so the call reports the cluster health as seen by the master node. To restrict the check to the local node, change it to true.

The level parameter will, by default, show you cluster-level health, but levels beyond that include indices and shards (as in the above example).

There are additional optional parameters for timeouts:

  • timeout
  • master_timeout

You can also tell the API call to wait for certain events to occur:

  • wait_for_active_shards
  • wait_for_events
  • wait_for_no_initializing_shards
  • wait_for_no_relocating_shards
  • wait_for_nodes
  • wait_for_status
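
For example, the following call (the parameter values here are just an illustration) blocks until the cluster reaches green status or a 30-second timeout expires:

curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&timeout=30s&pretty"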

Of course, with Logz.io, creating an Elasticsearch (now OpenSearch) cluster is as easy as starting a free trial. And scaling up your Elasticsearch cluster requires nothing from the user.

Removing Elasticsearch Data

Deleting items from Elasticsearch is just as easy as entering data into Elasticsearch. The HTTP method to use this time is—surprise, surprise—DELETE:

$ curl -XDELETE 'localhost:9200/app/users/4?pretty'
{
  "_index" : "app",
  "_type" : "users",
  "_id" : "4",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

To delete an index, use:

$ curl -XDELETE 'localhost:9200/logs?pretty'

To delete all indices (use with extreme caution) use:

$ curl -XDELETE 'localhost:9200/_all?pretty'

The response in both cases should be:

{
 "acknowledged" : true
}

To delete a single document:

$ curl -XDELETE 'localhost:9200/index/type/document'

What’s Next?

This tutorial helps beginners get started with Elasticsearch and as such covers just the basics of CRUD operations. Elasticsearch is a search engine at heart, and it offers immense depth in its search capabilities.

Of course, if you don’t want to do any of this yourself, you can start a Logz.io free trial to get started logging with OpenSearch – which at this point is very similar to Elasticsearch – without having to manually install anything or run anything on your own infrastructure.

To embrace an open source alternative to ELK, check out our guide on OpenSearch and OpenSearch Dashboards or AWS’s OpenSearch documentation.

For next steps with Elasticsearch, consider exploring the official Elasticsearch documentation as well as our Logstash tutorial and Kibana tutorial.
