Elastic is prepping for Elasticsearch 8.0, but in the meantime is rolling out upgrades and features with Elasticsearch 7.7. The new version introduces asynchronous search as well as changes with Elasticsearch clusters, mapping, SQL enhancements, snapshots and machine learning. This post will cover just a few of the highlights of the new release. Besides asynchronous search, ES 7.7 also introduces multi-class classification, reduced heap usage, inference time features, and better password security.
Note: This covers the release of open source Elasticsearch 7.7, not its integration into Logz.io.
Elasticsearch asynchronous search, a.k.a. async, enables background searches that scour large cold stores of data. The design focuses on ending search timeouts and continuing longer-running search tasks while working on current data feeds.
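The submit, poll, and clean-up flow behind async search can be sketched as three request descriptions. This is a minimal illustration rather than an official client: the helper names, the index pattern, and the timeout values are assumptions made for the example, and the functions only build the requests instead of sending them.

```python
from urllib.parse import quote


def submit_async_search(index, query, wait="2s", keep_alive="1d"):
    """Describe a request that starts a search and returns early if it runs long."""
    return {
        "method": "POST",
        "path": f"/{quote(index, safe=',*')}/_async_search",
        "params": {
            # How long to block before handing back a partial response and an id.
            "wait_for_completion_timeout": wait,
            # How long the search and its results stay available in the cluster.
            "keep_alive": keep_alive,
        },
        "body": {"query": query},
    }


def get_async_search(search_id):
    """Describe a request that fetches the current (possibly partial) results."""
    return {"method": "GET", "path": f"/_async_search/{quote(search_id, safe='')}"}


def delete_async_search(search_id):
    """Describe a request that cancels the search and discards stored results."""
    return {"method": "DELETE", "path": f"/_async_search/{quote(search_id, safe='')}"}


# Hypothetical index pattern and search id, for illustration only.
submit = submit_async_search("logs-cold-*", {"match_all": {}})
poll = get_async_search("example-id")
cleanup = delete_async_search("example-id")
```

A client would send the first request, keep the id from the response, then poll with GET until the search reports that it is no longer running.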
Elasticsearch got some important search changes. The first is that range query rounding will now be more consistent. The second is that pipeline aggregation validation errors will return 400 Bad Request responses in place of 500 Internal Server Error responses. Additionally, BoolQueryBuilder’s mustNot field has been deprecated.
More new offerings include a cluster setting that disallows expensive queries, plus new X-Pack endpoints for asynchronous search. Further additions include async_search GET and DELETE request APIs and automatic computation of pre_filter_shard_size whenever you don’t specify it.
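As a rough sketch of the cluster setting mentioned above, the request below turns off expensive queries through the cluster settings API. The setting name search.allow_expensive_queries is the one used in the 7.7 documentation; the helper name and the choice of a transient (rather than persistent) setting are assumptions for the example.

```python
import json


def expensive_queries_setting(allowed):
    """Build a cluster settings update that allows or blocks expensive queries."""
    return {
        "method": "PUT",
        "path": "/_cluster/settings",
        # "transient" is reset by a full cluster restart; "persistent" would survive it.
        "body": {"transient": {"search.allow_expensive_queries": bool(allowed)}},
    }


block_expensive = expensive_queries_setting(False)
payload = json.dumps(block_expensive["body"])
```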
Version 7.7 introduces stricter validation of mapping update times within an Elasticsearch index. This is in preparation for their eventual deprecation with Elasticsearch 8.0. Updates include more consistent range query rounding as well as listing validation errors instead of only returning the first encountered error.
ES 7.7 also introduces a new field type: constant_keyword. It specializes the Elasticsearch keyword field for cases where every document in an index has the same value. The idea behind its development was more efficient performance during a query and during rewrites. The field is auto-configurable.
constant_keyword supports the same aggregations and queries that the original keyword field does. By marking keywords as constant though, it saves processing time and speeds up Elasticsearch filtering.
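To make the field type concrete, here is a small sketch of an index mapping that uses constant_keyword, together with a filter on it. The field names and the "debug" value are hypothetical; only the constant_keyword type and its value parameter come from Elasticsearch itself.

```python
def logs_index_body(level):
    """Build an index-creation body where every document shares one 'level' value."""
    return {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {"type": "text"},
                # The same value applies to every document in this index.
                "level": {"type": "constant_keyword", "value": level},
            }
        }
    }


def level_filter(level):
    """Build a term query; constant_keyword accepts the same queries as keyword."""
    return {"query": {"term": {"level": level}}}


debug_index = logs_index_body("debug")
debug_query = level_filter("debug")
```

Splitting data so that each index holds a single value (for example one index per log level) is the kind of layout where this mapping is useful.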
For machine learning, the field field_mappings is now called field_map. There have also been efforts to fix mapping problems with index templates.
Elasticsearch Clusters and Cluster Coordination
Relevant updates include removing the seeds dependency for remote cluster settings (#52796), a “grant_api_key” cluster privilege (#53527), and introducing a formal role for a remote cluster client (#53924). The STALE_STATE_CONFIG will also be described in ClusterFormationFH.
Elastic Stack SQL enhancements cover JDBC debugging (notably handling ES JDBC driver files), more lenient data parsing, and trace logging for server search responses. The calendar_interval histogram option is now available for one-day and one-month intervals.
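A histogram grouped by one-day or one-month buckets might be requested as below. This is a sketch under assumptions: the table and column names are hypothetical, identifiers are assumed to be trusted (no escaping is done), and it presumes the calendar-interval behavior is what backs HISTOGRAM with INTERVAL 1 DAY or INTERVAL 1 MONTH.

```python
def histogram_sql_body(table, column, unit):
    """Build a body for the SQL search API that buckets rows by one day or one month."""
    if unit not in ("DAY", "MONTH"):
        raise ValueError("this sketch only covers the DAY and MONTH intervals")
    query = (
        f"SELECT HISTOGRAM({column}, INTERVAL 1 {unit}) AS bucket, COUNT(*) AS total "
        f"FROM {table} GROUP BY bucket ORDER BY bucket"
    )
    return {"query": query}


# Hypothetical index and date column, for illustration only.
monthly = histogram_sql_body("orders", "order_date", "MONTH")
```

The body would be posted to the _sql endpoint, for example with format=txt for a readable table.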
Devs fixed a laundry list of bugs, among them being:
- ORDER BY aggregates, ORDER BY YEAR() function, and GROUP BY fields
- Millisecond handling in intervals
- SQL CLI sourcing for X-Pack
- The handling of unsupported data type fields
Updates were in order here for Lucene 8.5.0 Snapshot (within the full Lucene 8.5.0 upgrade). There were also level-ups to AWS SDK 1.11.749, Azure SDK 8.6.2, GCS Dependency and GCS SDK.
Snapshot enhancements include Azure bulk deletes, blob download retries for the GCS repository, parallel snapshot restores and deletes, and better incrementality for snapshots of shards that are unchanged.
Snapshot fixes cover problems with missing empty snapshots, inconsistent shard failure counts in failed snapshots, and an “overly aggressive” request deduplication.
The Elastic Stack’s machine learning capabilities saw major enhancements with the update. Here are a few of the highlights.
There are a number of changes for Data Frames. The release adds improved data frame analytics audits, logging (#53179), stats (#53788), and data counts (#53998). Additional features include parsing and metrics on memory usage.
Updates include the addition of an input field type to the trained model config (#53083), giving users a bird’s-eye view of the changes that the trained model will make. Namely, it will mark what the new input field should be after being dynamically decided by the model itself.
Added features to datafeeds include indices_options in the datafeed config and update APIs, a GET request for _cat/ml/datafeeds, and support for nanosecond time fields.
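The datafeed additions can be illustrated with a config that sets indices_options and a matching cat request. The datafeed id, job id, index pattern, and the particular option values are hypothetical; the sketch only builds the requests.

```python
def datafeed_request(feed_id, job_id, indices):
    """Build a datafeed creation request that spells out how indices are resolved."""
    return {
        "method": "PUT",
        "path": f"/_ml/datafeeds/{feed_id}",
        "body": {
            "job_id": job_id,
            "indices": list(indices),
            "query": {"match_all": {}},
            "indices_options": {
                # Only expand wildcards to open indices and skip missing ones.
                "expand_wildcards": ["open"],
                "ignore_unavailable": True,
                "allow_no_indices": False,
            },
        },
    }


def cat_datafeeds_request():
    """Build the cat request that lists datafeeds in a compact table."""
    return {"method": "GET", "path": "/_cat/ml/datafeeds", "params": {"v": "true"}}


feed = datafeed_request("datafeed-web-traffic", "web-traffic", ["web-logs-*"])
listing = cat_datafeeds_request()
```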
Other add-ons include support for multi-value leaves in tree models, a feature importance option to the inference processor, and parsers to inference configuration classes.
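As an illustration of the feature importance option, the sketch below builds an ingest pipeline with an inference processor. The model id, field names, and pipeline id are hypothetical, and the example assumes a regression model; field_map is the renamed mapping option mentioned earlier in this post.

```python
def inference_pipeline(model_id, field_map, top_features):
    """Build an ingest pipeline whose inference processor reports feature importance."""
    return {
        "description": "Score incoming documents with a trained model",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": "ml.inference",
                    # Maps document field names to the names the model expects.
                    "field_map": dict(field_map),
                    "inference_config": {
                        "regression": {
                            "num_top_feature_importance_values": top_features
                        }
                    },
                }
            }
        ],
    }


pipeline = inference_pipeline(
    "house-price-model",            # hypothetical trained model id
    {"sq_ft": "square_feet"},       # hypothetical document field -> model field
    3,
)
```

The body would be sent with PUT to _ingest/pipeline followed by a pipeline id of your choosing.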
Instrumentation is now also possible with data frame analytics jobs for outlier detection, supervised learning, and peak memory consumption. You will also find support for multi-class classification and feature importance.
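A multi-class classification job with feature importance could be configured roughly as follows. The job id, index names, and dependent variable are hypothetical, and the parameter values are arbitrary choices for the example.

```python
def classification_job(job_id, source_index, dest_index, dependent_variable,
                       top_classes=3, top_features=2):
    """Build a data frame analytics request for a classification analysis."""
    return {
        "method": "PUT",
        "path": f"/_ml/data_frame/analytics/{job_id}",
        "body": {
            "source": {"index": source_index},
            "dest": {"index": dest_index},
            "analysis": {
                "classification": {
                    # The field to predict; with more than two distinct values
                    # this becomes a multi-class problem.
                    "dependent_variable": dependent_variable,
                    "num_top_classes": top_classes,
                    "num_top_feature_importance_values": top_features,
                }
            },
        },
    }


job = classification_job("iris-species", "iris", "iris-predictions", "species")
```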
Elasticsearch 7.7 Security
An API key will be created on behalf of newly authorized users. There is also new support for secondary authentication and exception metadata for disabled features. Password changes will now be disallowed whenever authenticated by a token.
kibana_system will also get the ability to create and invalidate API keys, plus collect APM telemetry in the background. New maintenance and grant_api_key privileges will be on offer in ES 7.7.
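Creating an API key on behalf of another user goes through the grant API key endpoint, which requires the grant_api_key privilege. The sketch below builds such a request using the password grant type; the user name, password, key name, and expiration are placeholders, and nothing is sent over the network.

```python
def grant_api_key_request(username, password, key_name, expiration=None):
    """Build a request that creates an API key on behalf of the given user."""
    api_key = {"name": key_name}
    if expiration is not None:
        api_key["expiration"] = expiration
    return {
        "method": "POST",
        "path": "/_security/api_key/grant",
        "body": {
            # The caller proves the target user's identity with their credentials.
            "grant_type": "password",
            "username": username,
            "password": password,
            "api_key": api_key,
        },
    }


# Placeholder credentials, for illustration only.
grant = grant_api_key_request("report-user", "example-password", "reporting-key", "7d")
```

The caller authenticates as a user holding grant_api_key, while the resulting key carries the identity of the user named in the body.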
Authentication and authorization saw a number of bug fixes. A few of them include:
- Concurrent token refresh
- Token API responses
- Requiring delegate API keys
- Preserving ApiKey credentials for async verifications, and _rollup_search with read privileges