TL;DR: How to prevent an Elasticsearch server data breach
As engineers, you and I have a responsibility to protect both our customers’ and our respective companies’ data. After all, how would you feel if you found out your personal email, social media accounts, bank information and other private details had suddenly been leaked online? Speaking for myself, I’d feel shocked, betrayed, and extremely uneasy. And as a result, I’d lose trust in the vendor who had failed to protect my data.
Major Data Leaks In The News
That’s the reality that customers at Decathlon, Microsoft, and others have found themselves in recently.
Six months ago, 1.2 billion users woke up one day to find their private data was exposed via an open Elasticsearch server holding data from People Data Labs (PDL) and OxyData.Io (OXY). In February, Decathlon exposed more than 123 million records. The reason? Unsecured ES servers.
Even large corporations like Microsoft are not immune. Back in December 2019, an unprotected Elasticsearch database exposed customer info such as email addresses, IPs and customer support details.
And what would you say about a Dow Jones Watchlist leak containing 2.4 million records of senior Politically Exposed Persons along with their relatives, close associates, and the companies they are linked to?
So, what can you do to make sure this nightmare doesn’t happen to you? I thought it’d be a good idea to refresh some best practices for protecting our Elasticsearch clusters from common vulnerabilities.
5 Steps For Securing Your Elasticsearch Cluster
1. Don’t Connect Elasticsearch to the Internet
Simply put, the internet is full of malware and malicious actors looking to expose your data. That’s why Elasticsearch’s default settings bind nodes to localhost. Use the “network.host” setting in the Elasticsearch YAML configuration file (elasticsearch.yml) to bind nodes to a private IP or, where external access is truly required, to a secured public IP.
It is good practice to set up separate security groups: one for internal communications (between the master and data nodes) and a separate one for the client nodes that handle external communications. You may also want to consider putting a proxy in front of your client nodes.
Finally, make sure to disable HTTP where it’s not needed.
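To make the “network.host” advice a little more concrete, here is a minimal Python sketch that checks whether a candidate value would keep a node off the public internet. The function name is my own invention, and the rule it applies (loopback or private addresses only, plus the special values `_local_` and `_site_`) is a deliberately conservative assumption rather than anything Elasticsearch enforces for you.

```python
import ipaddress

# Special network.host values that resolve to loopback or site-local addresses.
SAFE_SPECIAL_VALUES = {"_local_", "_site_", "localhost"}


def is_private_bind(host: str) -> bool:
    """Return True if a network.host value stays on loopback or a private range.

    Hypothetical helper for illustration. Anything it cannot parse as an IP
    (a hostname, an interface such as _eth0_, or _global_) is treated as unsafe.
    """
    host = host.strip()
    if host in SAFE_SPECIAL_VALUES:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False
    # 0.0.0.0 and :: mean "listen on every interface", including public ones.
    if addr.is_unspecified:
        return False
    return addr.is_loopback or addr.is_private
```

You could run a check like this in a deployment script before a config file ever reaches a node, failing the build if someone sets “network.host” to 0.0.0.0 or a public address.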
2. Encrypt Your Data at Rest
Hackers are becoming more and more savvy and determined to collect data. As a result, we know that even if we follow all the steps described in this post, it’s still possible for them to breach Elasticsearch. Encryption will safeguard any data that might end up in attackers’ hands. I suggest using utilities such as dm-crypt, and strong encryption (no less than 256-bit keys, guys!) to cover your bases.
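If you go the key-file route with dm-crypt, the key itself has to come from somewhere. Here is a small Python sketch, under my own assumptions, that generates a random key of at least 256 bits and writes it to a file only its owner can read. The function name and file path are hypothetical, and how you then feed the file to your disk encryption setup is up to your tooling.

```python
import os
import secrets

KEY_BITS = 256  # the minimum key size suggested above


def write_key_file(path: str, bits: int = KEY_BITS) -> int:
    """Write a random key to `path` and return its size in bits.

    Hypothetical helper for illustration only.
    """
    if bits < 256 or bits % 8 != 0:
        raise ValueError("use a key of at least 256 bits, in whole bytes")
    key = secrets.token_bytes(bits // 8)
    # O_EXCL refuses to overwrite an existing key file;
    # mode 0o600 requests owner-only read/write permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(key)
    return len(key) * 8
```

Keep the key file off the encrypted volume itself, and back it up somewhere safe: lose the key and the data is gone for you too.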
3. Authenticate Users in Elasticsearch
For obvious reasons, only those working for your organization should be able to access Elasticsearch. So make sure to follow a clear RBAC (role-based access control) policy for roles, permissions and API tokens. In addition, Elasticsearch enables you to authenticate users in a variety of ways, including native user authentication, Active Directory user authentication, file-based user authentication, LDAP user authentication, PKI user authentication, SAML authentication, and Kerberos authentication. There’s also the new open source kid on the block, OPA (Open Policy Agent), a policy engine for access decisions that looks really promising and has big names such as Netflix vouching for it. If none of these options works for you, you can also build your own integration.
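As a rough illustration of what an RBAC role looks like in practice, the sketch below builds (but does not send) a request to the Elasticsearch security API that defines a read-only role for a set of index patterns. It assumes security features are enabled on your cluster; the host, role name, index pattern, and credentials are placeholders I made up.

```python
import base64
import json
import urllib.request


def build_role_request(base_url, role_name, index_patterns, user, password):
    """Build a PUT /_security/role/<name> request granting read-only access.

    Hypothetical helper: it only constructs the request, it does not send it.
    """
    body = {
        "indices": [
            {"names": list(index_patterns), "privileges": ["read"]}
        ]
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_security/role/{role_name}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder values, not a real cluster or real credentials.
request = build_role_request(
    "https://es.example.internal:9200", "logs_reader", ["logs-*"],
    "admin_user", "not-a-real-password",
)
```

Passing the request to `urllib.request.urlopen` would send it. Starting from narrow roles like this one and widening only when someone actually needs more is a lot easier than clawing permissions back later.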
4. Upgrade to the Latest Version of Elasticsearch
Elastic regularly releases new versions of Elasticsearch that fix both bugs and vulnerabilities. So if you fail to upgrade, you may be exposing yourself to vulnerabilities that have already been taken care of. While it may sometimes feel like a bit of a nuisance, take the time to upgrade Elasticsearch (at the time of this writing, 7.5 was the latest stable release) and your system will be less susceptible as a result.
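One way to keep yourself honest about upgrades is to check what your nodes are actually running. A node’s root endpoint (GET /) returns JSON that includes the version number, and the Python sketch below compares that against a minimum you choose. The helper names are mine, and the sample response is trimmed down to the one field the code reads.

```python
import json


def parse_version(number: str) -> tuple:
    """Turn '7.5.0' (or '7.5.0-SNAPSHOT') into a comparable tuple of ints."""
    core = number.split("-", 1)[0]  # drop any suffix after the first hyphen
    return tuple(int(part) for part in core.split("."))


def needs_upgrade(root_response: str, minimum: str) -> bool:
    """Return True if the node's reported version is older than `minimum`.

    `root_response` is the JSON text returned by the node's root endpoint.
    """
    running = json.loads(root_response)["version"]["number"]
    return parse_version(running) < parse_version(minimum)


# Trimmed, made-up sample of a root endpoint response.
sample = '{"version": {"number": "6.8.4"}}'
outdated = needs_upgrade(sample, "7.5.0")
```

Comparing tuples of integers rather than raw strings matters here: as strings, "7.10.0" sorts before "7.9.3", which is exactly the wrong answer.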
5. Back Up Your Data
I gave you lots of tips to ensure your Elasticsearch clusters don’t get compromised. But none of them is foolproof. So make sure you back up your data so you can easily bounce back if breached. I suggest using the Snapshot API to back up your data to Amazon S3 buckets. There are a variety of other ways to back up Elasticsearch data as well.
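Here is a minimal Python sketch of the two Snapshot API calls involved: registering an S3 repository and then taking a dated snapshot into it. It only builds the requests and leaves out authentication for brevity. The host, repository name, and bucket are placeholders, and on 7.x clusters I am assuming the repository-s3 plugin is installed and has access to the bucket.

```python
import json
import urllib.request
from datetime import date


def s3_repository_request(base_url, repo, bucket):
    """Build PUT /_snapshot/<repo> to register an S3-backed repository."""
    body = {"type": "s3", "settings": {"bucket": bucket}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_snapshot/{repo}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def snapshot_request(base_url, repo, day):
    """Build PUT /_snapshot/<repo>/<name> to snapshot the cluster's indices."""
    name = f"snapshot-{day.isoformat()}"  # snapshot names must be lowercase
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_snapshot/{repo}/{name}"
            "?wait_for_completion=true",
        method="PUT",
    )


# Placeholder host, repository, and bucket names.
BASE_URL = "https://es.example.internal:9200"
register = s3_repository_request(BASE_URL, "s3_backups", "my-es-backups")
take = snapshot_request(BASE_URL, "s3_backups", date(2020, 3, 1))
```

Run something like the second request on a schedule, and test a restore now and then. A backup you have never restored from is only a hope.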
Final Thoughts on Elasticsearch Security
Elasticsearch is an extremely valuable and easy-to-use open source search engine. The number of breaches we’re seeing is largely not a result of vulnerabilities inherent to ELK, but rather, due to the widespread popularity of Elasticsearch and the fact that many users fail to follow security best practices.
While following these tips will help you protect your Elasticsearch clusters and ensure a professional security posture for your production environment, many organizations opt to use a fully managed service to offload some of the headache. This way, many of these security best practices are taken care of without wasting your company’s time or resources.
Logz.io provides a fully secure solution built on ELK that is PCI, GDPR, HIPAA, and SOC 2 compliant. This way, you get all the benefits of using open source, with much less hassle.
Have you ever had your Elasticsearch clusters exposed? If so, how did you handle it? I’d love to hear your suggestions for protecting Elasticsearch clusters, so if you have any recommendations to contribute to the community, please share in the comments below.
Thanks and stay safe!