Elasticsearch Mapping: The Basics, Updates & Examples
Within a search engine, mapping defines how a document and its fields are indexed and stored. We can compare mapping to a database schema in how it describes the fields and properties that documents hold, the datatype of each field (e.g., string, integer, or date), and how those fields should be indexed and stored by Lucene. It is very important to define the mapping after we create an index: an inappropriate preliminary definition and mapping may produce the wrong search results.
In an earlier article, we ran an elaborate comparison of the two search engine market leaders, Elasticsearch and Apache Solr. Here, we will delve deep into the Elasticsearch mappings using a stable Elasticsearch v2.4 configuration. We will discuss the basics, the different field types, and then give examples for both static and dynamic mapping.
About Mapping
Mapping is intended to define the structure and field types as required based on the answers to certain questions. For example:
- Which string fields should be full text and which should be numbers or dates (and in which formats)?
- When should you use the _all field, which concatenates multiple fields to a single string and helps with analyzing and indexing?
- What custom rules should be set to update new field types automatically as they are added (e.g., the dynamic mapping type, which we will discuss further later on)?
Elasticsearch GET Mapping Requests
The basic request is rather simple, with an optional parameter focusing the GET _mapping request at a specific index:
GET /_mapping
GET /<index>/_mapping
You can also retrieve Elasticsearch mapping for multiple indices at once rather easily:
GET /<index1>,<index2>/_mapping
As a more concrete example, say you’re looking for particular parts of speech in your NLP indices:
GET /verbs,nouns/_mapping
An example of the JSON result returned by an Elasticsearch GET _mapping request would be:
{ "verbs" : { "mappings" : { "properties" : { "go" : { "type" : "text" }, "walk" : { "type" : "text" } } } } }
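The request paths and the response shape can be explored with a small Python sketch. It makes no network calls. The helper names mapping_path and field_types are made up for this illustration, and the sample response simply mirrors the verbs example above.

```python
import json


def mapping_path(*indices):
    """Build the path for a GET _mapping request.

    With no index names the request covers every index; several
    names are joined with commas.
    """
    if not indices:
        return "/_mapping"
    return "/" + ",".join(indices) + "/_mapping"


def field_types(response, index):
    """Return {field name: type} for one index in a _mapping response."""
    properties = response[index]["mappings"].get("properties", {})
    # A field without an explicit "type" is an object field.
    return {name: spec.get("type", "object") for name, spec in properties.items()}


# Sample response shaped like the verbs example above.
sample = json.loads(
    '{ "verbs": { "mappings": { "properties": {'
    ' "go": { "type": "text" }, "walk": { "type": "text" } } } } }'
)

print(mapping_path("verbs", "nouns"))
print(field_types(sample, "verbs"))
```

In a real client, the path would be appended to the cluster URL and the parsed response body passed to field_types.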
Deprecation of Mapping Types
With Elasticsearch 7.0.0, mapping types were deprecated (Elasticsearch 6.0.0 already limited new indices to a single type). However, knowing how they worked can help you understand current versions of Elasticsearch and deal with earlier ones. Each index had one or more mapping types that were used to divide documents into logical groups. Basically, a type in Elasticsearch represented a class of similar documents and had a name such as customer or item. Lucene has no concept of document data types, so Elasticsearch would store the type name of each document in a metadata field called _type. When searching for documents within a particular type, Elasticsearch simply used a filter on the _type field to restrict the search.
In addition, mappings are the layer that Elasticsearch still uses to map complex JSON documents into the simple flat documents that Lucene expects to receive. Each mapping type had fields, or properties, defined by meta-fields and various data types.
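The idea of turning a nested JSON document into a flat one can be pictured with a short Python sketch. It only illustrates the concept and is not Elasticsearch's actual code: the function name flatten is made up here, and arrays are left untouched for simplicity. The dotted names it produces (such as name.first) follow the convention Elasticsearch uses to address inner fields.

```python
def flatten(doc, prefix=""):
    """Collapse nested JSON objects into one level of dotted field names."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            # Inner object: recurse and prefix its fields with the parent name.
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


employee = {"name": {"first": "Alice", "last": "John"}, "age": 26}
flat_employee = flatten(employee)
print(flat_employee)
```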
A mapping type would be defined like this:
curl -X PUT 'http://localhost:9200/students' -d '{ "mappings": { "student": { "properties": { "name": { "type": "keyword" }, "degree": { "type": "keyword" }, "age": { "type": "integer" }, "performance": { "type": "keyword" } } } } }'
Combining the _type field with the _id field of each document generated a _uid field, which let documents of several types share a single index without their IDs colliding.
So to index a new student, for example, we would use:
curl -X PUT 'http://localhost:9200/students/student/1' -d ' { "name" : "Isaac Newton", "age": 14, "performance": "honor student" }'
And when querying Elasticsearch for a student, we would use the mapping type by including it in the URL:
curl -X GET 'http://localhost:9200/students/student/_search' -d ' { "query": { "match": { "name": "Isaac Newton" } } }'
Elasticsearch Mapping Types Alternatives
Two main alternatives to mapping types are recommended: 1) to use one index per document type, OR 2) to create a custom type field.
Index per document type
First, let’s look at indexing according to document type. Indices are independent from one another, so the same field name can have a different type in each index without conflict. As per the explanation a few paragraphs above, you lose the _uid field, but retain the _type and _id fields. If you are indexing comments on an e-commerce page, you would index comments and user in separate indices rather than combining them in a single index. This also has the added advantages of 1) more accurate term statistics because of more precise, single-entity documents, AND 2) working better with Lucene’s dense data storage strategy, which applies to sets of between 4,096 and 65,535 documents (65,535 being a block’s capacity).
Custom type field
Implement a custom type field that operates in a similar manner to the deprecated _type field. Of course, there is a limit to how many primary shards can exist in a cluster, so you may not want to waste an entire shard on a collection of only a few thousand documents. In that case, a custom type field lets several kinds of documents share one index, much as the old _type did.
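As a rough sketch of the custom type field approach, the Python snippet below builds the request bodies as plain dictionaries without sending them anywhere. The field names type, user_name, and comment_text and the helpers tag and search_within_type are assumptions made for illustration, and the mapping uses the typeless format of Elasticsearch 7 and later.

```python
import json

# Hypothetical index body: one index holds both users and comments,
# told apart by a custom "type" keyword field.
index_body = {
    "mappings": {
        "properties": {
            "type": {"type": "keyword"},
            "user_name": {"type": "keyword"},
            "comment_text": {"type": "text"},
        }
    }
}


def tag(doc_type, doc):
    """Return a copy of doc carrying the custom type field."""
    return {"type": doc_type, **doc}


def search_within_type(doc_type, query):
    """Wrap a query so that it only matches documents of one custom type."""
    return {
        "query": {
            "bool": {
                "must": query,
                # The filter plays the role the old _type filter used to play.
                "filter": {"term": {"type": doc_type}},
            }
        }
    }


comment = tag("comment", {"user_name": "newton", "comment_text": "Great product"})
search = search_within_type("comment", {"match": {"comment_text": "great"}})
print(json.dumps(search, indent=2))
```

Because the type is an ordinary keyword field, it can also be used in aggregations, for example to count documents per type.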
Data-Type Fields
When we create mapping, each mapping type will be a combination of multiple fields or lists with various types. For example, a “user” type may contain fields for title, first name, last name, and gender whereas an “address” type might contain fields for city, state, and zip code.
Elasticsearch supports a number of different data types for the fields in a document:
Core data types: String, Date, Numeric (long, integer, short, byte, double, and float), Boolean, Binary
Complex data types:
Array: Array support does not require a dedicated type
Object: object for single JSON objects
Nested: nested for arrays of JSON objects
Geo data types:
Geo-point: geo_point for latitude/longitude points
Geo-shape: geo_shape for complex shapes such as polygons
Specialized data types:
IPv4: ip for IPv4 addresses
Completion: completion to provide autocomplete suggestions
Token count: token_count to count the number of tokens in a string
Attachment: the mapper-attachments plugin, which supports indexing attachments in formats such as Microsoft Office, Open Document, ePub, and HTML into an attachment datatype
Note: In versions 2.0 to 2.3, dots were not permitted in field names. Elasticsearch 2.4.0 added a system property called mapper.allow_dots_in_name that disables the check for dots in field names.
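To see how several of the data types listed above fit together, here is a hedged Python sketch of a properties block for a hypothetical user document. Every field name is invented for illustration, and the text type assumes Elasticsearch 5.0 or later (on 2.x the equivalent would be string). The helper types_used is also made up; it just walks the block and collects the type names.

```python
# Hypothetical "user" document structure; every field name is illustrative.
user_properties = {
    "first_name": {"type": "text"},  # "string" on Elasticsearch 2.x
    "signup_date": {"type": "date"},
    "age": {"type": "integer"},
    "active": {"type": "boolean"},
    "last_login_ip": {"type": "ip"},
    "location": {"type": "geo_point"},
    # Object: a single JSON object with its own properties.
    "address": {
        "type": "object",
        "properties": {
            "city": {"type": "text"},
            "zip_code": {"type": "text"},
        },
    },
    # Nested: an array of JSON objects that are queried independently.
    "orders": {
        "type": "nested",
        "properties": {
            "item": {"type": "text"},
            "quantity": {"type": "integer"},
        },
    },
    # Arrays need no dedicated type: a field may simply hold several values.
    "tags": {"type": "text"},
}


def types_used(properties):
    """Collect the type names used in a properties block, recursively."""
    found = set()
    for spec in properties.values():
        found.add(spec.get("type", "object"))
        found |= types_used(spec.get("properties", {}))
    return found


used = sorted(types_used(user_properties))
print(used)
```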
Meta Fields
Meta fields customize how a document’s associated metadata is treated. Each document has associated metadata such as the _index, mapping _type, and _id meta-fields. The behavior of some of these meta-fields could be customized when a mapping type was created.
Identity meta-fields
_index: The index to which the document belongs.
_uid: A composite field consisting of the _type and the _id.
_type: The document’s mapping type.
_id: The document’s ID.
Document source meta-fields
_source: The original JSON representing the body of the document.
_size: The size of the _source field in bytes, provided by the mapper-size plugin.
Indexing meta-fields
_all: A catch-all field that indexes the values of all other fields.
_field_names: All fields in the document that contain non-null values.
_timestamp: A timestamp associated with the document, either specified manually or auto-generated.
_ttl: How long a document should live before it is automatically deleted.
Routing meta-fields
_parent: Used to create a parent-child relationship between two mapping types.
_routing: A custom routing value that routes a document to a particular shard.
Other meta-field
_meta: Application specific metadata.
Example
To create a mapping, you will need the Put Mapping API, which sets a specific mapping definition for a specific type. Alternatively, you can add multiple mappings when you create an index.
An example of mapping creation using the Mapping API:
PUT 'Server_URL/Index_Name/_mapping/Mapping_Name' { "type_1" : { "properties" : { "field1" : {"type" : "string"} } } }
In the above code:
- Index_Name: Provides the name of the index that receives the mapping
- Mapping_Name: Provides the mapping name
- type_1: Defines the mapping type
- Properties: Defines the various properties and document fields
- {“type”}: Defines the data type of the property or field
Below is an example of mapping creation using an index API:
PUT /index_name { "mappings":{ "type_1":{ "_all" : {"enabled" : true}, "properties":{ "field_1":{ "type":"string"}, "field_2":{ "type":"long"} } }, "type_2":{ "properties":{ "field_3":{ "type":"string"}, "field_4":{ "type":"date"} } } } }
In the above code:
- index_name: The name of the index to be created
- type_1: Defines the mapping type
- _all: The configuration metafield parameter. If “true,” it will concatenate all strings and search values
- Properties: Defines the various properties and document fields
- {“type”}: Defines the data type of the property or field
Two Mapping Types
Elasticsearch supports two types of mappings: “Static Mapping” and “Dynamic Mapping.” We use Static Mapping to define the index and data types. However, we still need ongoing flexibility so that documents can store extra attributes. To handle such cases, Elasticsearch comes with the dynamic mapping option that was mentioned at the beginning of this article.
Static Mapping
In a normal scenario, we know well in advance which kind of data the document will store, so we can easily define the fields and their types when creating the index. Below is an example in which we are going to index employee data into an index named “company” under the type “employeeinfo.”
Sample document data:
{ "name" : {"first" :"Alice","last":"John"}, "age" : 26, "joiningDate" : "2015-10-15" }
Example:
PUT /company { "mappings":{ "employeeinfo":{ "_all" : {"enabled" : true}, "properties":{ "name":{ "type":"object", "properties":{ "first":{ "type":"string" }, "last":{ "type":"string" } } }, "age":{ "type":"long" }, "joiningDate":{ "type":"date" } } } } }
In the above API:
- employeeinfo: Defines the mapping type name
- _all: The configuration metafield parameter. If “true,” it will concatenate all strings and search values
- Properties: Defines various properties and document fields
- {“type”}: Defines the data type of the property or field
Dynamic Mapping
Thanks to dynamic mapping, when you just index the document, you do not always need to configure the field names and types. Instead, these will be added automatically by Elasticsearch using any predefined custom rules. New fields can be added both to the top-level mapping type and to inner objects and nested fields. In addition, dynamic mapping rules can be configured to customize the existing mapping.
Custom rules help to identify the right data types for unknown fields, such as mapping true/false in JSON to boolean, while integer in JSON maps to long in Elasticsearch. Rules can be configured using dynamic field mapping or a dynamic template. When Elasticsearch encounters an unknown field in a document, it uses dynamic mapping to determine the data type of the field and automatically adds the new field to the type mapping.
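The default type guesses can be approximated in a few lines of Python. This is a simplified sketch, not Elasticsearch's implementation: the function name dynamic_type is made up, date and numeric detection inside strings is left out, and the results follow the defaults documented for recent versions (Elasticsearch 2.x assigned double to floating-point numbers and string to text values).

```python
def dynamic_type(value):
    """Guess the field type that default dynamic mapping would assign.

    Returns None when no field would be added (null or an array with
    no non-null values). Date and numeric detection in strings is omitted.
    """
    if value is None:
        return None
    # bool must be tested before int: in Python, True is also an int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"  # recent versions also add a keyword sub-field
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null value.
        for item in value:
            if item is not None:
                return dynamic_type(item)
        return None
    raise TypeError(f"not a JSON value: {value!r}")


doc = {
    "active": True,
    "age": 26,
    "score": 4.5,
    "name": {"first": "Alice"},
    "tags": ["a", "b"],
}
inferred = {field: dynamic_type(value) for field, value in doc.items()}
print(inferred)
```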
However, there will be cases when this will not be your preferred option. Perhaps you do not know what fields will be added to your documents later, but you do want them to be indexed automatically. Perhaps you just want to ignore them. Or, especially if you are using Elasticsearch as a primary data store, maybe you want unknown fields to raise an exception that alerts you to the problem. Fortunately, you can control this behavior with the dynamic setting, which accepts the following options:
- true: Add new fields dynamically — this is the default
- false: Ignore new fields
- strict: Throw an exception if it encounters an unknown field
Example:
PUT /my_index { "mappings": { "my_type": { "dynamic": "strict", "properties": { "title": { "type":"string"}, "stash": { "type": "object", "dynamic": true } } } } }
In the above API:
- my_index – creates an index with this name
- my_type – defines the mapping type name
- “dynamic”: “strict” – the “my_type” object will throw an exception if an unknown field is encountered
- “dynamic”: true – the “stash” object will create new fields dynamically
- _all – the configuration metafield parameter. If “true,” it will concatenate all strings and search values
- properties – defines the various properties and document fields
- {“type”} – defines the data type of the property or field
With dynamic mapping, you can add new searchable fields into the stash object:
Example:
PUT /my_index/my_type/1 { "title": "This doc adds a new field", "stash": { "new_field": "Success!" } }
But trying to do the same at the top level will fail:
PUT /my_index/my_type/1 { "title": "This throws a StrictDynamicMappingException", "new_field": "Fail!" }
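The effect of the three dynamic options can be simulated locally with a short Python sketch. It is only a model of the behavior described above, under the assumption that inner objects inherit the setting of their parent unless they override it. The function check_document and the exception class StrictDynamicMappingError are stand-ins invented for this example.

```python
class StrictDynamicMappingError(Exception):
    """Stand-in for Elasticsearch's StrictDynamicMappingException."""


def check_document(mapping, doc, path="", inherited=True):
    """Walk doc against mapping and report how unknown fields are treated.

    Returns (added, ignored): the dotted names of fields that would be added
    to the mapping or left unindexed. Raises for unknown fields in objects
    whose dynamic setting is "strict".
    """
    dynamic = mapping.get("dynamic", inherited)
    properties = mapping.get("properties", {})
    added, ignored = [], []
    for name, value in doc.items():
        full_name = path + name
        if name not in properties:
            if dynamic == "strict":
                raise StrictDynamicMappingError(full_name)
            (added if dynamic in (True, "true") else ignored).append(full_name)
        elif isinstance(value, dict):
            # Known inner object: recurse, passing down the current setting.
            sub_added, sub_ignored = check_document(
                properties[name], value, full_name + ".", dynamic
            )
            added += sub_added
            ignored += sub_ignored
    return added, ignored


my_type = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "string"},
        "stash": {"type": "object", "dynamic": True},
    },
}

accepted = check_document(
    my_type, {"title": "This doc adds a new field", "stash": {"new_field": "Success!"}}
)
print(accepted)

try:
    check_document(my_type, {"title": "This one is rejected", "new_field": "Fail!"})
    rejected = None
except StrictDynamicMappingError as error:
    rejected = str(error)
print(rejected)
```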
What Was New in Elasticsearch 5.0 for Mapping?
Elasticsearch 2.X had a single string data type for both full-text search and keyword identifiers. You use full-text search to discover relevant text in documents, while you would use keyword identifiers for sorting, aggregating, and filtering the documents. Back in Elasticsearch 2.x, there was no dedicated data type for either role: the only way to tell the engine which string fields to use for full-text search and which to use for sorting, aggregating, and filtering was the index option of the field (analyzed or not_analyzed).
Elasticsearch 5.X (see our full post on the full ELK Stack 5.0 as well as our Complete Guide to the ELK Stack) comes with two new data types called text and keyword, replacing the string data type in the earlier version.
- Text: Full-text and relevancy search in documents
- Keyword: Exact-value search for sorting, aggregation and filtering documents
Text fields support the full analysis chain while keyword fields support only limited analysis, just enough to normalize values with lowercasing and similar transformations. Keyword fields support doc values for memory-friendly sorting and aggregations, while text fields have fielddata disabled by default to prevent massive amounts of data from being loaded into memory by mistake.
When to Use Text or Keyword Data Types
Ending this article with a practical tip, here is a rule of thumb for mapping in Elasticsearch:
- Text data types – Use when you require full-text search for particular fields such as the bodies of e-mails or product descriptions
- Keyword data types – Use when you require an exact-value search, particularly when filtering (“Find me all products where status is available”), sorting, or using aggregations. Keyword fields are only searchable by their exact value. Use keyword data types when you have fields like email addresses, hostnames, status codes, zip codes, or tags.
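The rule of thumb above can be made concrete with a Python sketch that builds a mapping and two queries as plain dictionaries. The product index, its field names, and the raw sub-field name are all hypothetical, and the mapping uses the typeless format of Elasticsearch 7 and later. The title field shows one common compromise: a text field with a keyword sub-field, so the same value can be searched as full text and sorted by exact value.

```python
import json

# Hypothetical product index: "description" is analyzed for full-text search,
# "status" is matched by exact value, and "title" is indexed both ways
# through a multi-field.
product_mapping = {
    "mappings": {
        "properties": {
            "description": {"type": "text"},
            "status": {"type": "keyword"},
            "title": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            },
        }
    }
}

# Full-text search: the query string is analyzed and scored for relevance.
full_text_query = {"query": {"match": {"description": "wireless headphones"}}}

# Exact-value filtering, sorting, and aggregating all use keyword fields.
available_query = {
    "query": {"bool": {"filter": {"term": {"status": "available"}}}},
    "sort": [{"title.raw": "asc"}],
    "aggs": {"by_status": {"terms": {"field": "status"}}},
}

print(json.dumps(product_mapping, indent=2))
print(json.dumps(available_query, indent=2))
```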
Preventing Elasticsearch Mapping Explosions
If you create too many fields, you can overload your memory. These settings will help:
- index.mapping.total_fields.limit – The max number of fields in an index, which is set to 1000 by default but might not be enough. If you raise it, you should also increase the indices.query.bool.max_clause_count setting, which limits the number of boolean clauses in a query.
- index.mapping.depth.limit – The max depth for a field, measured as the number of inner objects (default 20).
- index.mapping.nested_fields.limit – The max number of nested mappings in an index (default 50).
- index.mapping.nested_objects.limit – The max number of nested JSON objects within a single document across all nested types (default 10000).
Elastic’s docs also recommend looking at this setting, even though it has little to do with stemming a mapping explosion:
- index.mapping.field_name_length.limit – The max length of a field name. The default value is Long.MAX_VALUE (no limit).
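As a minimal sketch of how these limits might be raised, the Python snippet below builds a settings body that contains only the limits that differ from their documented defaults. The helper name limit_settings is invented for this example and nothing is sent to a cluster. The resulting dictionary is the kind of body that could accompany an index creation or settings update request.

```python
import json

# Documented defaults for the mapping limit settings discussed above.
DEFAULT_LIMITS = {
    "index.mapping.total_fields.limit": 1000,
    "index.mapping.depth.limit": 20,
    "index.mapping.nested_fields.limit": 50,
    "index.mapping.nested_objects.limit": 10000,
}


def limit_settings(overrides):
    """Return a settings body holding only limits that differ from defaults."""
    unknown = set(overrides) - set(DEFAULT_LIMITS)
    if unknown:
        raise KeyError(f"unknown limit setting(s): {sorted(unknown)}")
    changed = {
        name: value
        for name, value in overrides.items()
        if value != DEFAULT_LIMITS[name]
    }
    return {"settings": changed}


body = limit_settings(
    {
        "index.mapping.total_fields.limit": 2000,  # raised above the default
        "index.mapping.depth.limit": 20,  # same as the default, so omitted
    }
)
print(json.dumps(body, indent=2))
```

Rejecting unknown names up front catches typos in setting names before a request is ever built.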
Summary
Mapping in Elasticsearch can seem daunting at times, especially if you’re just starting out with ELK. At Logz.io, this is part of the service we provide our users. But if you’re using your own Elasticsearch deployment, pay careful attention to the details. We hope this article will help you to understand the basics.