As you may have already heard, the recent release of Elasticsearch 6 signaled the beginning of the end to the concept of mapping types. Defined as a breaking change, and relating to the core functionality and usage of Elasticsearch, this change was somewhat controversial. Some users even went as far as viewing this change as a “casus belli” of sorts.

Tweet

 

While there is little doubt that the removal of types has huge implications for Elasticsearch users, it is important to chill the atmosphere by understanding the reasons for this move and its meaning as well as ways to adapt to it.

What are Elasticsearch mapping types?

Let’s begin with trying to understand what mapping types are and how they were used up until now.

Within Elasticsearch, mapping defines how a document is indexed and how its fields are indexed and stored. Each Elasticsearch index had one or more mapping types that were used to divide documents into logical groups. In other words, a type in Elasticsearch represented a class of similar documents.

Comparing to the world of relational databases (a problematic comparison that we will get to later in the article), types can be compared with tables.

Let’s say we are indexing different types of students (e.g. honor students, failing students, etc.)

The mapping type for such an index would consist of a combination of fields as seen in this example:

The index would then be built with a  _type field which was combined with the document’s _id field to generate a _uid field, allowing documents of different types to co-exist in the same index.  

So to index a new student, we would use:

And when querying Elasticsearch for a student, we would use the mapping type by including it in the URL:

This kind of search would return:

You can read more about Elasticsearch mapping types here.

So, why are mapping types being removed?

Looking back in time to an issue opened in 2015 on the Elasticsearch GitHub page gives us insight into the main reason mapping types are being removed — to simplify the understanding and usage of the underlying data structure in Elasticsearch and to optimize performance.

Remove Support Types

When trying to understand or explain the data structure Elasticsearch is based upon, users often referred to a comparison with relational databases, where: index = database, type = table.

While there is a certain logic to this analogy, it is simply incorrect from a technical standpoint.  

Unlike SQL databases, where tables are totally independent from each other, and where columns in different tables do not affect one another, Elasticsearch fields and mapping types in the same index are interrelated.

Fields that share the same name but are used in different mapping types are backed by the same Lucene field. This means, in turn, that fields with the same name, and which are used in different types in the same index must have the same mapping definition. A constraint that can be extremely limiting in some scenarios.

If we wanted to use the performance field name in another type within the same index, it would have to have the same mapping definition as defined in the students type:

The second issue is related to how Lucene handles documents with empty fields (field with no value), and issue otherwise known as data sparsity. Multiple types in the same index results in most cases in a large amount of empty fields, which because of the way Lucene stores data, results in suboptimal resource utilization.

Using types in Elasticsearch 6

We can start with the good news, which is that types have not been totally removed. Yet.

While indices created in version 6 are only allowed one mapping type, indices created in version 5 containing multiple types can continue to work as before. You can also use the Elasticsearch ReIndex API to convert these indices to single-type indices.

To achieve the same functionality in Elasticsearch 6, what alternative methods are there?

Other than the obvious method of placing all your properties under one single type, there are two additional methods for accomplishing the same goal achieved with multiple mapping types in previous Elasticsearch versions.

The first is to have an index per document type. In our example, you could put honor students in one index, failing students in another index, and so forth for each type of student.

Another solution is to create a custom type field which works pretty similar to how the good old _type meta-field worked:

Continuing with our previous example, we can populate this field with some data:

So to search for our honor students, we would use:

Summing it up

The removal of mapping types is a process that started already in version 5, has been taken a serious step further in step 6, and is planned to continue in the next versions until the complete removal in version 9 (stay updated, and take a look at the planned changes here).

Breaking changes are always a pain, especially fundamental changes that require rethinking the way we operate, but at the end of the day, the end game is all that matters. While indeed a breaking change in any Elasticsearch index with multiple types, it also seems to be an important change with obvious benefits in terms of ease-of-use and performance. Ultimately, forcing us to use indices in a way that is more suited to the underlying data structure should speed up searches.  

It seems that most users tend to agree with this assertion:

As we begin the migration process to Elasticsearch 6, we will document and report any issues we come across in respect to this change, so stay tuned for news.

Need help managing Elasticsearch? Logz.io can do the heavy lifting for you.