Elasticsearch SQL Support
Elasticsearch 6.3 shipped with some major new features, such as rollups and Java 10 support, but one of the most intriguing additions in this version is SQL support.
According to DB-Engines, Elasticsearch ranks 8th in popularity, trailing after Oracle, MySQL, PostgreSQL and other SQL databases. This explains why the ability to execute SQL queries on data indexed in Elasticsearch has been on the wishlist for many an Elasticsearch user for some time now.
The native query language for searching in Elasticsearch, Query DSL, is a powerful tool, but not everyone finds it easy to learn. The support added in Elasticsearch 6.3 lets those well versed in SQL query the data in a more familiar way while still benefiting from the performance Elasticsearch has to offer.
Here’s a brief review of what this new support includes and the capabilities it offers (still in experimental mode).
Part of X-Pack
There is no need to install an extra plugin or configure anything. SQL support is built into the default Elasticsearch package and is provided alongside the other X-Pack basic features under a special Elastic license.
Executing SQL statements
Before you begin testing your SQL statements, it’s important to understand how data is organized in Elasticsearch and the different terms SQL and Elasticsearch use. SQL columns, for example, map to Elasticsearch fields, and rows map to documents. Check out the different concepts here.
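A quick way to see this mapping in action is the SHOW TABLES statement, which lists the indices Elasticsearch exposes as tables:

GET _xpack/sql
{
  "query": "SHOW TABLES"
}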
Let’s start with executing some simple SELECT statements using the REST API, which accepts JSON:
GET _xpack/sql
{
  "query": "DESCRIBE logstash*"
}
This will give me a breakdown of all my columns (or fields) in my Logstash index, in this case containing Apache access logs.
There is a default limitation of 100 columns for this query, so executing this statement against a large index (a Metricbeat index for example) would fail. You can change this by configuring the index.max_docvalue_fields_search index setting.
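As a minimal sketch (the value 200 here is just an example), the setting can be raised dynamically with the index settings API against the indices in question:

PUT logstash*/_settings
{
  "index.max_docvalue_fields_search": 200
}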
Results can be displayed in nicely structured JSON by adding the format parameter to the request:
GET _xpack/sql?format=json
{
  "query": "DESCRIBE logstash*"
}
Likewise, we can see the data in tabular format by changing the format parameter as follows:
GET _xpack/sql?format=txt
{
  "query": "DESCRIBE logstash*"
}
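Other response formats, such as csv, should also be accepted by the format parameter; for example:

GET _xpack/sql?format=csv
{
  "query": "DESCRIBE logstash*"
}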
Moving beyond DESCRIBE, we can also run aggregations in SQL, such as averaging a Metricbeat memory field:

GET _xpack/sql?format=json
{
  "query": "SELECT avg(system.memory.free) FROM metricbeat*"
}
Results in JSON:
{ "columns": [ { "name": "AVG(system.memory.free)", "type": "long" } ], "rows": [ [ 70962166.58631256 ] ] }
You can have a lot of fun with these queries, combining clauses such as GROUP BY, for example (the fetch_size parameter simply limits the number of rows returned at a time):

GET _xpack/sql?format=txt
{
  "query": "SELECT avg(system.process.memory.size), system.process.name FROM metricbeat* GROUP BY system.process.name",
  "fetch_size": 5
}
Results, this time as displayed in tabular format:
AVG(system.process.memory.size)|system.process.name
-------------------------------+-------------------
8.06273024E8                   |agent
9277440.0                      |docker-container
8.11589632E8                   |dockerd
6.990266368E9                  |dotnet
3.3151664758153844E9           |java
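Filtering and sorting can also be expressed in the SQL statement itself with WHERE and ORDER BY. As a sketch, assuming the Metricbeat field system.process.cpu.total.pct exists in your data (the 0.5 threshold here is arbitrary):

GET _xpack/sql?format=txt
{
  "query": "SELECT system.process.name, system.process.cpu.total.pct FROM metricbeat* WHERE system.process.cpu.total.pct > 0.5 ORDER BY system.process.cpu.total.pct DESC",
  "fetch_size": 5
}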
Multiple querying methods
What I think users will find extremely useful is the various ways in which SQL statements can be executed. In the example above, I used the Console tool within Kibana to call the REST API, and I could just as easily have done the same from the command line:
curl -XGET "http://localhost:9200/_xpack/sql" -H 'Content-Type: application/json' -d'
{
  "query": "SELECT count(message) FROM logstash*"
}'

{"columns":[{"name":"COUNT(message)","type":"long"}],"rows":[[34560]]}
The REST API is one way to go, but Elasticsearch also provides a SQL CLI tool, which can be run as follows from within the Elasticsearch installation directory:
sudo ./bin/elasticsearch-sql-cli
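Once the CLI starts, statements are typed at its prompt and terminated with a semicolon; a quick query against the Logstash data from earlier might look roughly like this (prompt shown is illustrative):

sql> SELECT count(message) FROM logstash*;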
The big news, however, is that there is also a JDBC driver for Elasticsearch that can be installed as a standalone component or using Maven, so you can easily hook up Elasticsearch with your Java applications. More info on using this driver is available here.
Combining with Query DSL
In cases where SQL syntax simply doesn’t cut it, you can combine Elasticsearch Query DSL with your SQL queries by adding additional parameters.
For example, use the filter parameter to narrow down results:
GET _xpack/sql?format=txt
{
  "query": "SELECT avg(system.process.memory.size), system.process.name FROM metricbeat* GROUP BY system.process.name",
  "filter": {
    "term": {
      "system.process.name": "java"
    }
  },
  "fetch_size": 5
}
Resulting in:
AVG(system.process.memory.size)|system.process.name
-------------------------------+-------------------
3.315269631096404E9            |java
Translate API
Another useful capability is the Translate API, which converts an SQL statement into the equivalent Elasticsearch Query DSL:
GET _xpack/sql/translate
{
  "query": "SELECT avg(system.process.memory.rss.bytes) FROM metricbeat* GROUP BY system.process.name"
}
Returns:
{ "size": 0, "_source": false, "stored_fields": "_none_", "aggregations": { "groupby": { "composite": { "size": 1000, "sources": [ { "51711": { "terms": { "field": "system.process.name", "order": "asc" } } } ] }, "aggregations": { "51878": { "avg": { "field": "system.process.memory.rss.bytes" } } } } } }
Summing it up
So no, executing a JOIN statement is still not supported and probably never will be (due to the underlying data model in Elasticsearch), but the SQL support in Elasticsearch 6.3 is a big step toward making Elasticsearch much more accessible to beginners and to users coming from the world of relational databases.
As mentioned above, only a limited number of SQL statements are supported at this point, and there are some limitations on combining certain statements, but hey, this is still an early, experimental release. The multiple execution options, as well as the supported statements, make a strong case for checking this feature out.