Elasticsearch Tutorial

Elasticsearch is the living heart of what is today the most popular log analytics platform — the ELK Stack (Elasticsearch, Logstash and Kibana). The role played by Elasticsearch is so central that it has become synonymous with the name of the stack itself. Used primarily for search and log analysis, Elasticsearch is one of the most popular database systems available today.

This guide provides new users with the knowledge and tools needed to get started with Elasticsearch, covering installation, initial indexing, and basic data handling.

What is Elasticsearch?

Initially released in 2010, Elasticsearch is a modern search and analytics engine based on Apache Lucene. Completely open source and built with Java, Elasticsearch is categorized as a NoSQL database, which means it stores data in an unstructured way and cannot be queried with SQL.

Unlike most NoSQL databases, though, Elasticsearch has a strong focus on search capabilities and features — so much so, in fact, that the easiest way to get data from Elasticsearch is to search for it using its extensive REST API.

In the context of data analysis, Elasticsearch is used together with the other components in the ELK Stack, Logstash and Kibana, and plays the role of data indexing and storage.

Installing Elasticsearch

The requirements for Elasticsearch are simple: Java 8 (specific version recommended: Oracle JDK version 1.8.0_131). Take a look at this Logstash tutorial to ensure that you are set. Also, you will want to make sure your operating system is on the Elastic support matrix, otherwise you might run up against strange and unpredictable issues. Once that is done, you can start by installing Elasticsearch.

Elasticsearch can be downloaded as a standalone distribution or installed using the apt and yum repositories. We will be installing Elasticsearch on an Ubuntu 16.04 machine running on AWS EC2 using apt.

First, you need to add Elastic’s signing key so that the downloaded package can be verified (skip this step if you’ve already installed packages from Elastic):
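
  wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -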

For Debian, we need to then install the apt-transport-https package:
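
  sudo apt-get install apt-transport-https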

The next step is to add the repository definition to your system:
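
  # 6.x is used here as an example; substitute the major version you want to install
  echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list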

All that’s left to do is to update your repositories and install Elasticsearch:
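
  sudo apt-get update && sudo apt-get install elasticsearch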

Configuring Elasticsearch

Elasticsearch configurations are done using a configuration file (on Debian/Ubuntu, /etc/elasticsearch/elasticsearch.yml) whose location depends on your operating system. In this file, you can configure general settings (e.g. node name), as well as network settings (e.g. host and port), where data is stored, memory, log files, and more.

For development and testing purposes, the default settings will suffice, but it is recommended that you do some research into what settings can be manually defined before going into production.

For example, and especially if installing Elasticsearch on the cloud, it is good practice to bind Elasticsearch to either a private IP or localhost:
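
  # in /etc/elasticsearch/elasticsearch.yml
  network.host: "localhost"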

Running Elasticsearch

Elasticsearch will not run automatically after installation and you will need to manually start it. How you run Elasticsearch will depend on your specific system. On most Linux and Unix-based systems you can use this command:
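
  sudo service elasticsearch start

On systemd-based distributions, you can use systemctl instead:

  sudo systemctl start elasticsearch.service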

And that’s it! To confirm that everything is working fine, simply point curl or your browser to http://localhost:9200, and you should see something like the following output:
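
  curl http://localhost:9200

The values below (node name, version numbers) are illustrative and will vary with your installation:

  {
    "name" : "node-1",
    "cluster_name" : "elasticsearch",
    "version" : {
      "number" : "6.1.2",
      ...
    },
    "tagline" : "You Know, for Search"
  }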

To debug the process of running Elasticsearch, use the Elasticsearch log files located (on Debian/Ubuntu) in /var/log/elasticsearch/.

Creating an Index in Elasticsearch

The process of adding data to Elasticsearch is called “indexing.” This is because when you feed data into Elasticsearch, the data is placed into Apache Lucene indexes. This makes sense because Elasticsearch uses the Lucene indexes to store and retrieve its data. Although you do not need to know a lot about Lucene, it does help to know how it works when you start getting serious with Elasticsearch.

Elasticsearch behaves like a REST API, so you can use either the POST or the PUT method to add data to it. You use PUT when you know, or want to specify, the ID of the data item, or POST if you want Elasticsearch to generate an ID for the data item:
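
  # PUT with an explicit ID; the "app" index, "users" type, and fields are illustrative
  curl -X PUT "localhost:9200/app/users/4" -H 'Content-Type: application/json' -d'
  {
    "id": 4,
    "username": "john",
    "last_login": "2018-01-25 12:34:56"
  }'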

And the response should look something like this:
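
  {
    "_index" : "app",
    "_type" : "users",
    "_id" : "4",
    "_version" : 1,
    "result" : "created",
    "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
    }
  }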

The data for the document is sent as a JSON object. You might be wondering how we can index data without defining the structure of the data. Well, with Elasticsearch, like with most other NoSQL databases, there is no need to define the structure of the data beforehand. To ensure optimal performance, though, you can define mappings for data types. More on this later.

If you are using any of the Beats shippers (e.g. Filebeat or Metricbeat), or Logstash — the indices are automatically created.

To see a list of your Elasticsearch indices, use:
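
  curl -X GET "localhost:9200/_cat/indices?v"

Trimmed, illustrative output:

  health status index               pri rep docs.count ...
  yellow open   app                 5   1   1          ...
  yellow open   .kibana             1   1   1          ...
  yellow open   logstash-2018.01.23 5   1   4          ...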

The list in this case includes the indices we created above, a Kibana index and an index created by a Logstash pipeline.

Elasticsearch Querying

Once you have your data indexed into Elasticsearch, you can start searching and analyzing it. The simplest query you can do is to fetch a single item.

Once again, because Elasticsearch is a REST API, we use GET:
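
  # fetch the illustrative document indexed earlier
  curl -X GET "localhost:9200/app/users/4?pretty"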

And the response, echoing the document we indexed earlier:
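
  {
    "_index" : "app",
    "_type" : "users",
    "_id" : "4",
    "_version" : 1,
    "found" : true,
    "_source" : {
      "id" : 4,
      "username" : "john",
      "last_login" : "2018-01-25 12:34:56"
    }
  }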

The fields starting with an underscore are all meta fields of the result. The _source object is the original document that was indexed.

We also use GET to do searches by calling the _search endpoint:
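
  curl -X GET "localhost:9200/_search?q=john"

A trimmed response might look like this (timings, shard counts, and hits will vary):

  {
    "took" : 5,
    "timed_out" : false,
    "_shards" : {
      "total" : 5,
      "successful" : 5,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : 1,
      "max_score" : 0.2876821,
      "hits" : [ ... ]
    }
  }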


The result contains a number of extra fields that describe both the search and the result. Here’s a quick rundown:

  • took: The time in milliseconds the search took
  • timed_out: If the search timed out
  • _shards: The number of Lucene shards searched, and how many of them succeeded or failed
  • hits: The actual results, along with meta information for the results

The search we did above is known as a URI Search, and is the simplest way to query Elasticsearch. By providing only a word, all of the fields of all the documents are searched for that word. You can build more specific searches by using Lucene queries:

  • username:johnb – Looks for documents where the username field is equal to “johnb”
  • john* – Looks for documents that contain terms that start with john and are followed by zero or more characters, such as “john,” “johnb,” and “johnson”
  • john? – Looks for documents that contain terms that start with john followed by only one character. Matches “johnb” and “johns” but not “john.”

There are many other ways to search including the use of boolean logic, the boosting of terms, the use of fuzzy and proximity searches, and the use of regular expressions.
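
For example, a URI search that combines two conditions with boolean logic (the username and type fields are illustrative):

  curl -X GET "localhost:9200/_search?q=username:john*+AND+type:admin"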

Elasticsearch Query DSL

URI searches are just the beginning. Elasticsearch also provides a request body search with a Query DSL for more advanced searches. There is a wide array of options available in these kinds of searches, and you can mix and match different options to get the results that you require. Some of the options include geo queries, “more like this” queries, and scripted queries.

The DSL also makes a distinction between a filter and a query context for query clauses. Clauses used as filters test documents in a boolean fashion: does the document match the filter, “yes” or “no”? Filters are also generally faster than queries, but queries can also calculate a score based on how closely a document matches the query. This is used to determine the ordering and inclusion of documents:
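
  # the "username" and "type" fields are illustrative
  curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
  {
    "query": {
      "bool": {
        "must": { "match": { "username": "john" } },
        "filter": { "term": { "type": "admin" } }
      }
    }
  }'

Here, the match clause runs in query context and contributes to the relevance score, while the term clause runs in filter context and simply includes or excludes documents.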

And the result (trimmed; scores and timings will vary):
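
  {
    "took" : 9,
    "timed_out" : false,
    ...
    "hits" : {
      "total" : 1,
      "max_score" : 0.2876821,
      "hits" : [
        {
          "_index" : "app",
          "_type" : "users",
          "_id" : "4",
          "_score" : 0.2876821,
          "_source" : { ... }
        }
      ]
    }
  }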

Removing Elasticsearch Data

Deleting items from Elasticsearch is just as easy as entering data. The HTTP method to use this time is — surprise, surprise! — DELETE:
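
  # delete the illustrative document indexed earlier
  curl -X DELETE "localhost:9200/app/users/4"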

To delete an index, use:
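
  curl -X DELETE "localhost:9200/app"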

To delete all indices (use with extreme caution) use:
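
  curl -X DELETE "localhost:9200/_all"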

The response in both cases should be:
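
  {
    "acknowledged" : true
  }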

What’s Next?

This tutorial is designed to help beginners get started with Elasticsearch, and as such covers just the basic CRUD operations. Elasticsearch is first and foremost a search engine, and there is an immense depth to its search features.

Recent versions have introduced some incredible new Elasticsearch features and also some significant changes to the underlying data structure.

It’s always a good idea to explore the official Elasticsearch documentation as well as our Logstash tutorial and Kibana tutorial.

Looking for an auto-scaling Elasticsearch service? Logz.io has you covered.