We at Logz.io provide the ELK Stack as an end-to-end service on the cloud, so we are always committed to providing the latest and greatest version of the stack.
As soon as Elasticsearch 5 was released back in October, we moved the upgrade of our existing Elasticsearch 2 clusters to the top of our priority list. In this post, I’ll outline how we performed the upgrade and share some tips that we learned the hard way.
Why Upgrade in the First Place?
There is a long list of new and improved features in Elasticsearch 5 (see our post on the entire ELK Stack 5.0 as well as Kibana 5 in particular), but the main reason we wanted to perform the upgrade as soon as possible was for the new version’s easier management and maintenance of large-scale clusters.
Two examples of the improvements:
- A rollover index. This new feature allows you to define the conditions under which an index is rolled over to a new one, such as the number of documents it holds or its age.
- A way to limit the total number of mapping fields. A new dynamic index setting allows you to restrict the number of mapping fields in an index instead of having to use an internal account management service.
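To make these two features concrete, here is a minimal sketch of the REST requests involved. It only builds the requests and does not send them. The alias `logs_write`, the index name `logs-000001`, and the threshold values are hypothetical; the `_rollover` endpoint with `max_age` and `max_docs` conditions and the `index.mapping.total_fields.limit` setting are the Elasticsearch 5 names. Note that the rollover conditions are evaluated when the request is made, so something still has to call the endpoint periodically.

```python
import json


def rollover_request(alias, max_age="7d", max_docs=1000000):
    """Build a rollover request for a write alias (values are illustrative)."""
    body = {"conditions": {"max_age": max_age, "max_docs": max_docs}}
    return "POST", "/{}/_rollover".format(alias), body


def field_limit_request(index, limit=1000):
    """Build a request that caps the number of mapping fields in one index."""
    body = {"index.mapping.total_fields.limit": limit}
    return "PUT", "/{}/_settings".format(index), body


if __name__ == "__main__":
    # Hypothetical alias and index names, printed instead of sent.
    for method, path, body in (
        rollover_request("logs_write"),
        field_limit_request("logs-000001", limit=2000),
    ):
        print(method, path, json.dumps(body))
```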
Things to Consider Before Upgrading
New and improved features notwithstanding, there are some issues that we recommend fully understanding before you decide to go ahead with the upgrade process (see Elasticsearch’s documentation for a full list of breaking changes).
- There is no rollback. You cannot revert from Elasticsearch 5.x back to version 2.x, so you should back up your data using snapshots or any other solution before you upgrade to Elasticsearch 5. We also recommend that you create a testing cluster to validate that your system continues to work as expected after the upgrade.
- Marvel. If you are using the Marvel plugin to monitor your ELK Stack, you should know that it doesn’t exist anymore. In version 5, Marvel was merged into Elastic’s X-Pack, and most of its features are not included in the free Basic subscription. Marvel’s replacement is within the X-Pack’s Monitoring cluster of features.
- SDKs. Not all of Elasticsearch’s SDKs are compatible with Elasticsearch 5. This is a major disadvantage right now, but it will probably be resolved soon.
- Open-source plugins. Elasticsearch 5 is fairly new, so not all of the open-source plugins are compatible with it yet. You should check each plugin that you use before you decide to upgrade.
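Because there is no rollback, the snapshot is the safety net. As a rough illustration, the sketch below builds the two snapshot API requests: one to register a repository and one to take a snapshot. The repository name, snapshot name, and filesystem path are hypothetical. It also assumes a shared-filesystem (`fs`) repository whose location is whitelisted in `path.repo` on every node; other repository types would need different settings.

```python
def snapshot_requests(repo, snapshot, location):
    """Return the requests to register an fs repository and take a snapshot."""
    register = (
        "PUT",
        "/_snapshot/{}".format(repo),
        {"type": "fs", "settings": {"location": location}},
    )
    create = (
        "PUT",
        "/_snapshot/{}/{}?wait_for_completion=true".format(repo, snapshot),
        # Also capture cluster-wide state such as index templates.
        {"include_global_state": True},
    )
    return [register, create]


if __name__ == "__main__":
    # Hypothetical names and path, printed instead of sent.
    for request in snapshot_requests("pre_upgrade", "es2_final", "/mnt/es_backups"):
        print(request)
```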
How We Upgraded Our Environment with Minimum Downtime
Elastic’s documentation provides general upgrade instructions, but our setup required a more specific and customized approach.
Our infrastructure is based entirely on Amazon Web Services and is managed using a set of tools including Puppet with the ec2tagfacts and the puppet-elasticsearch modules. To orchestrate the upgrade, we wrote an Ansible playbook that modifies the instances’ AWS tags, performs the full cluster restart, and then runs Puppet again.
This process involves a certain downtime, so it goes without saying that we tested it internally and extensively before finally executing the upgrade at a time that would guarantee the least disruption.
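For readers unfamiliar with a full cluster restart, the sketch below lists the API calls that Elastic's documented procedure wraps around the restart itself: disable shard allocation and perform a synced flush beforehand, then wait for the cluster to reach yellow and re-enable allocation afterwards. This is an outline of the generic procedure, not a reproduction of our playbook, and the 60-second timeout is an arbitrary example value.

```python
ALLOCATION_SETTING = "cluster.routing.allocation.enable"


def pre_restart_requests():
    """Requests to issue before shutting the nodes down."""
    return [
        # Stop shards from being reallocated while nodes are offline.
        ("PUT", "/_cluster/settings", {"persistent": {ALLOCATION_SETTING: "none"}}),
        # A synced flush speeds up shard recovery after the restart.
        ("POST", "/_flush/synced", None),
    ]


def post_restart_requests(timeout="60s"):
    """Requests to issue once the upgraded nodes have rejoined the cluster."""
    health = "/_cluster/health?wait_for_status=yellow&timeout={}".format(timeout)
    return [
        ("GET", health, None),
        ("PUT", "/_cluster/settings", {"persistent": {ALLOCATION_SETTING: "all"}}),
    ]


if __name__ == "__main__":
    for request in pre_restart_requests() + post_restart_requests():
        print(request)
```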
It’s worth mentioning that we also developed some capabilities in our infrastructure that would help to mitigate downtime for most uses of our log analytics platform, and we will share them in the future.
The order in which we performed the upgrade was as follows:
- Master nodes. To allow all the other nodes to connect to the cluster, you should start the master nodes first.
- Coordinating nodes. The coordinating nodes coordinate the work for the data nodes, so you should start them before the data nodes.
- Data nodes. The data nodes are those that actually hold the data, so you should leave them for last. If something does not work in the previous steps, you can abandon the upgrade and create new master and coordinating nodes in version 2. The data node upgrade also depends on the previous two steps having succeeded. Another advantage is that most of the time spent on recovering the cluster will be spent on the data nodes.
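The ordering above can be expressed as a simple sort over a node inventory. This is only a sketch: the `name` and `role` fields and the node names are hypothetical, and a real inventory would come from your own tooling (in our case, AWS tags).

```python
# Roles in the order in which they should be upgraded and started.
UPGRADE_ORDER = ("master", "coordinating", "data")


def upgrade_plan(nodes):
    """Sort nodes so masters come first, then coordinating, then data nodes."""
    rank = {role: position for position, role in enumerate(UPGRADE_ORDER)}
    # sorted() is stable, so nodes with the same role keep their input order.
    return sorted(nodes, key=lambda node: rank[node["role"]])


if __name__ == "__main__":
    inventory = [
        {"name": "data-1", "role": "data"},
        {"name": "master-1", "role": "master"},
        {"name": "coord-1", "role": "coordinating"},
        {"name": "data-2", "role": "data"},
    ]
    for node in upgrade_plan(inventory):
        print(node["role"], node["name"])
```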
Learn From Our Mistakes
Experience is everything, so here are two tips that we’re happy to share based on what we have learned:
- Remove older Java versions so that the instance uses the required version (Elasticsearch 5 requires Java 8) without your having to change its default Java.
- One of the crucial changes in Elasticsearch 5 is that index default settings cannot be configured in the elasticsearch.yml configuration file. It’s crucial to apply all index settings in an order 0 template, but because the template is applied only on the indices that will be created in the future, it is important to apply the same settings to all of the old indices (using the /_settings endpoint) as well. And don’t forget about the new index settings that were introduced in version 5!
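As an illustration of the second tip, the sketch below builds both requests from one settings dictionary: an order 0 template that covers future indices, and a `/_settings` update that covers the indices that already exist. The template name and the example settings are hypothetical. It assumes the settings are dynamic ones, since static settings such as the number of shards cannot be changed on an existing open index. In Elasticsearch 5, the index pattern of a template is given in the `template` field.

```python
def settings_requests(settings, template_name="default_settings", pattern="*"):
    """Build the template request and the matching update for old indices."""
    template = (
        "PUT",
        "/_template/{}".format(template_name),
        # Order 0 is the lowest priority, so more specific templates override it.
        {"template": pattern, "order": 0, "settings": settings},
    )
    existing = ("PUT", "/_all/_settings", settings)
    return [template, existing]


if __name__ == "__main__":
    # Example dynamic settings; substitute the defaults you previously
    # kept in elasticsearch.yml.
    defaults = {"index.refresh_interval": "30s", "index.number_of_replicas": 1}
    for request in settings_requests(defaults):
        print(request)
```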
The downside of using the latest technology is that you are more likely to encounter bugs. But with the help of the open source community, of which we are a proud member, we have managed to overcome most of the obstacles.
We expected the Elasticsearch upgrade process for our entire production environment to take much more time than it actually did, and we can attribute that to good preparation and the talented team of people who executed the process.
A final tip: If you do decide to upgrade to Elasticsearch 5, first run it for a while in a non-production environment. This will help you verify that the deprecated features and breaking changes in the new version will not take down your operation after the upgrade.