elastic stack 6

A review of the GA release can be read here

This is a somewhat presumptuous title, as it is way too early to gauge what the final version of Elastic Stack 6.0 will look like. However, the second alpha was released yesterday, and it already includes some interesting new features across the entire stack.

So no, it’s not time to think about planning an upgrade (despite some promising news on the Elasticsearch upgrade experience), but it is as good a chance as any to take a look at the goodies to expect in the upcoming major release.

Let’s take a closer look.

Elasticsearch 6.0

As befits the heart of the stack, Elasticsearch has the largest number of meaningful changes. Some (sparse doc values and index sorting) are based on the change to Lucene 7, while others involve architectural and performance changes.

Upgrades

Historically, upgrades between major versions of Elasticsearch were an engineering headache that required careful planning and implementation. Backups, testing, and orchestration, for example, were part of our upgrade to Elasticsearch 5.

While upgrading, especially in large deployments, will most likely still be a difficult task, some changes in version 6.0.0 promise to alleviate some of these challenges. Specifically, the new Rolling Restart feature removes the need for a full cluster restart and thus minimizes downtime.

Elasticsearch 6.0.0 will also support cross-cluster search as a replacement for Tribe Nodes, meaning you will be able to search indices created in 5.x (though not 2.x).

Another challenge when upgrading in the past was finding out (usually when it was too late) about deprecated features. Deprecation logs have been reinforced with important info on breaking changes.

Index Sorting

To ensure faster searches, Elasticsearch 6.0.0 will allow sorting to be performed at indexing time instead of during search. Search requests that sort on the same field will be able to terminate early and thus do less work per query. Index Sorting promises enhanced search efficiency but is not suitable for every scenario (e.g., searches with aggregations).
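To make the idea concrete, here is a minimal sketch in Python. The first function builds a create-index body using the `index.sort.field` and `index.sort.order` settings as we understand them from the Elasticsearch documentation; the field name `timestamp` is hypothetical, and a real index would also need a mapping for that field. The other two functions are a toy model, not Elasticsearch code, showing why a pre-sorted index lets a top-k search stop early.

```python
def sorted_index_settings(field, order="desc"):
    """Build a create-index body that asks for segments sorted at index time."""
    return {"settings": {"index": {"sort.field": field, "sort.order": order}}}


def top_k_presorted(docs, k):
    """Toy model: docs are already stored in the requested order."""
    hits = []
    for doc in docs:
        if len(hits) >= k:
            break  # early termination: the rest cannot rank higher
        hits.append(doc)
    return hits


def top_k_unsorted(docs, k):
    """Toy model: without index sorting, every doc must be examined."""
    return sorted(docs, reverse=True)[:k]


body = sorted_index_settings("timestamp")  # "timestamp" is a made-up field
presorted = list(range(100, 0, -1))        # stand-in for docs sorted descending
top_three = top_k_presorted(presorted, 3)
```

Both toy functions return the same hits for the same data; the difference is that the pre-sorted variant looks at only k documents.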

Bye Bye Mapping Types

Another important change in Elasticsearch 6 is the slow demise of the concept of types. This process started in the current version, will be taken one step further in version 6.0, and will end, according to Elastic, in version 7.0.

The main reason for this move is to simplify the understanding and usage of the underlying data structure in Elasticsearch. Comparisons to relational databases have led to the faulty understanding that types are equivalent to tables. That, in turn, has led to the expectation that fields are independent across types, whereas a field with the same name must have the same field type in every type within an index.

Thus, the current plan is to allow indices in version 6.0 to have only one type — version 5.x multi-type indices will continue to work in version 6.x — while a new special field will be introduced to store the relationship between indexed documents.
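The field-sharing constraint described above can be illustrated with a small Python sketch. This is a toy model, not Elasticsearch's implementation: `merge_type_mappings` is a hypothetical helper that flattens per-type field definitions into a single index-level mapping and fails when two types disagree about a field.

```python
def merge_type_mappings(types):
    """Flatten {type_name: {field: field_type}} into one index-wide mapping.

    Raises ValueError when the same field name has different field types
    in two mapping types, which is the conflict the "table" analogy hides.
    """
    merged = {}
    for type_name, fields in types.items():
        for field_name, field_type in fields.items():
            existing = merged.get(field_name)
            if existing is not None and existing != field_type:
                raise ValueError(
                    f"field '{field_name}' is '{existing}' elsewhere "
                    f"but '{field_type}' in type '{type_name}'"
                )
            merged[field_name] = field_type
    return merged


# Two types that agree on the shared field merge cleanly.
compatible = merge_type_mappings({
    "user": {"name": "text"},
    "tweet": {"name": "text", "likes": "integer"},
})
```

With a single type per index, this class of conflict cannot arise in the first place.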

Better Shard Recovery

A new feature called Sequence Numbers promises to guarantee more successful and efficient shard recovery.

Every index, update, and delete operation receives an ID that is logged in the primary shard’s transaction log. A replica can now refer to the operations recorded in this log and use them to update itself without needing to copy all the files, thus making recovery much faster. You’ll be able to configure how long to keep these transaction logs.

Replicas may also hold different sets of unacknowledged operations. With sequence numbers, if a primary shard fails, the replicas will be able to sync with the new primary shard without waiting for the next recovery.
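The recovery idea can be sketched in a few lines of Python. This is a simplified model under our own assumptions (operations arrive in order, one primary, no primary terms or global checkpoint); the class and method names are hypothetical and do not mirror Elasticsearch internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Operation:
    seq_no: int                     # position in the primary's history
    action: str                     # "index" or "delete"
    doc_id: str
    source: Optional[dict] = None


@dataclass
class Shard:
    docs: Dict[str, Optional[dict]] = field(default_factory=dict)
    translog: List[Operation] = field(default_factory=list)
    checkpoint: int = -1            # highest seq_no applied so far

    def apply(self, op: Operation) -> None:
        if op.action == "delete":
            self.docs.pop(op.doc_id, None)
        else:
            self.docs[op.doc_id] = op.source
        self.translog.append(op)
        self.checkpoint = op.seq_no

    def write(self, action: str, doc_id: str, source: Optional[dict] = None) -> Operation:
        """Primary-side write: assign the next sequence number and log it."""
        op = Operation(self.checkpoint + 1, action, doc_id, source)
        self.apply(op)
        return op

    def recover_from(self, primary: "Shard") -> int:
        """Replay only the operations this copy has not applied yet."""
        missing = [op for op in primary.translog if op.seq_no > self.checkpoint]
        for op in missing:
            self.apply(op)
        return len(missing)


primary, replica = Shard(), Shard()
replica.apply(primary.write("index", "1", {"msg": "a"}))  # replicated normally
primary.write("index", "2", {"msg": "b"})                 # replica misses this
primary.write("delete", "1")                              # and this
replayed = replica.recover_from(primary)                  # replays two operations
```

The replica catches up by replaying two logged operations rather than copying the shard's files, which is the saving the feature is after.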

Sparse Doc Values

A sparse values situation (when documents do not have values for each of the fields in our indices) results in the use of a large amount of disk space and file-system cache. Lucene 7 and Elasticsearch 6.0.0 support Sparse Doc Values, a new encoding format for this kind of situation that promises to help us avoid wasting disk resources.
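A rough way to picture the saving is the toy Python comparison below. It is only an illustration of dense versus sparse storage and makes no claim about Lucene's actual on-disk format; both function names are hypothetical.

```python
def encode_dense(values):
    """One slot per document, even when the document has no value."""
    return list(values)


def encode_sparse(values):
    """Store (doc_id, value) pairs only for documents that have a value."""
    return [(doc_id, v) for doc_id, v in enumerate(values) if v is not None]


# 1,000 documents, only two of which carry a value for this field.
column = [None] * 1000
column[10] = 3.5
column[500] = 7.25

dense = encode_dense(column)
sparse = encode_sparse(column)
```

In this model the dense form keeps 1,000 slots while the sparse form keeps two entries, so the cost tracks the number of documents that actually have the field.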

Logstash 6.0

Logstash is a critical component in the stack, but is also not easy to work with. Various issues with Logstash have pushed users to explore alternatives (see this comparison between Fluentd and Logstash), yet Logstash remains one of the best methods to ship data into Elasticsearch due to its rich filtering options and native functionality with the other components in the stack.

Logstash 6.0.0 is still very much a “work-in-progress,” but the information Elastic has released so far talks of a feature called Logstash Intermediate Representation (LIR) that will allow users to “manipulate and introspect” Logstash configurations before and during runtime.

While this description is a bit ambiguous, it seems that LIR will support a graphical depiction of Logstash logging pipelines within Kibana.

[Image: Logstash 6 (image by Elastic)]

The alpha2 release added support for running multiple Logstash pipelines on the same JVM, which will allow users to separate processing and guarantee that one failing pipeline won’t have any impact on other pipelines.

Another change in version 6.0.0 is an improvement to the GeoIP filter that allows users to use commercial GeoIP databases such as MaxMind’s GeoIP2.

Kibana 6.0

While the first alpha release of Elastic Stack 6 did not include major changes in Kibana, things are starting to warm up with the release of alpha2 with new features including CSV export, new gauge and map regions visualizations, and a new “Dashboard-Only” view for X-Pack roles.

There’s little doubt that future Kibana 6 alpha/beta releases will include more major features. Can’t wait!

Beats 6.0

The Beats family of log shippers is one of the fastest growing components in the stack. (Read how to use Metricbeat, Winlogbeat, and Packetbeat.) Every new version includes major changes, whether it’s a new beat or improved performance for existing beats.

The highlights in version 6.0.0 include support for monitoring Kubernetes, new Metricbeat modules, and a better upgrade experience.

Metricbeat supports a new Kubernetes module that provides details on container pods such as CPU and memory usage, and network data, while Filebeat now includes a new processor that adds pod names, namespaces, and labels to shipped data.

New Metricbeat modules include memcached, perfmon, Elasticsearch, Kibana, vSphere and RabbitMQ.

Rolling updates will be easier due to new naming conventions for Beats indices, which now include the Beat version. Thus, different mapping templates can be applied per version.
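As a small illustration, the Python sketch below builds a versioned index name and the matching template pattern. The `<beat>-<version>-<yyyy.MM.dd>` layout is our assumption about the convention, and both helper names are hypothetical.

```python
from datetime import date


def beats_index_name(beat, version, day):
    """Daily index name that embeds the Beat version (assumed layout)."""
    return f"{beat}-{version}-{day:%Y.%m.%d}"


def beats_template_pattern(beat, version):
    """Index pattern a version-specific mapping template could match."""
    return f"{beat}-{version}-*"


index_name = beats_index_name("metricbeat", "6.0.0", date(2017, 6, 7))
pattern = beats_template_pattern("metricbeat", "6.0.0")
```

Because the version is part of the name, indices written by an older Beat and a newer Beat never share a template during a rolling update.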

Endnote

The change to Lucene 7, together with some architectural and performance changes, promises to make Elastic Stack 6.0 another major and feature-rich release. More changes are expected, of course, and new features going into current and future 5.x versions should also be included (e.g., Elasticsearch range fields and the Kibana time series visual builder).

We will update this post as the subsequent alpha and beta versions are released.

REMEMBER — DO NOT USE IN PRODUCTION!
