How to Install the ELK Stack on Azure
The ELK Stack (Elasticsearch, Logstash & Kibana) offers Azure users all the key ingredients required for monitoring their applications: Elasticsearch for scalable and centralized data storage, Logstash for aggregation and processing, Kibana for visualization and analysis, and Beats for collecting different types of data and forwarding them into the stack.
The ELK Stack can be deployed in a variety of ways and in different environments, and we've covered many of these scenarios in previous articles on this blog. This article covers an increasingly popular workflow: installing the ELK Stack on Azure. We'll start by setting up our Azure VM and then go through the steps of installing Elasticsearch, Logstash, Kibana and Metricbeat to set up an initial data pipeline.
A few things to note about ELK
Before we get started, it's important to note two things. First, while the ELK Stack leveraged the open source community to grow into the most popular centralized logging platform in the world, Elastic decided to close-source Elasticsearch and Kibana in early 2021. To fill the gap this left in open source logging, AWS launched OpenSearch and OpenSearch Dashboards.
Second, while getting started with ELK is relatively easy, it can be difficult to manage at scale as your cloud workloads and log data volumes grow – plus your logs will be siloed from your metric and trace data.
To get around this, Logz.io manages and enhances OpenSearch and OpenSearch Dashboards at any scale – providing a zero-maintenance logging experience with added features like alerting, anomaly detection, and RBAC. If you don’t want to manage ELK on your own, check out Logz.io Log Management.
That said, this article is about setting up ELK, so let's get started.
The Azure environment
Our first step is to set up the Azure environment. For this tutorial, that means an Ubuntu 18.04 VM, with a Network Security Group configured to allow incoming traffic to Elasticsearch and Kibana from the outside.
We’ll start by creating a new resource group called ‘elk’:
In this resource group, I'm going to deploy a new Ubuntu 18.04 VM:
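If you prefer working from the command line, the same resource group and VM can be created with the Azure CLI. The sketch below is just one way to do it; the location, VM name, size and admin username are placeholder choices you'll want to adjust:

az group create --name elk --location eastus
az vm create \
  --resource-group elk \
  --name elk-vm \
  --image UbuntuLTS \
  --size Standard_D2s_v3 \
  --admin-username azureuser \
  --generate-ssh-keys

The UbuntuLTS image alias resolves to Ubuntu 18.04 at the time of writing.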
Please note that when setting up the VM for testing or development purposes, you can make do with the default settings provided here. But for handling real production workloads you will want to configure memory and disk size more carefully.
Once your VM is created, open the Network Security Group created with the VM and add inbound rules for allowing ssh access via port 22 and TCP traffic to ports 9200 and 5601 for Elasticsearch and Kibana:
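If you're scripting the setup instead, the equivalent inbound rules can be added with the Azure CLI. This assumes the auto-created NSG is named elk-vmNSG; check your resource group for the actual name:

az network nsg rule create --resource-group elk --nsg-name elk-vmNSG \
  --name AllowElasticsearch --priority 1001 --direction Inbound --access Allow \
  --protocol Tcp --destination-port-ranges 9200
az network nsg rule create --resource-group elk --nsg-name elk-vmNSG \
  --name AllowKibana --priority 1002 --direction Inbound --access Allow \
  --protocol Tcp --destination-port-ranges 5601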
Now that our Azure environment is ready, we can proceed with installing the ELK Stack on our newly created VM.
Installing Elasticsearch
To install Elasticsearch we will be using DEB packages.
First, you need to add Elastic’s signing key so that the downloaded package can be verified (skip this step if you’ve already installed packages from Elastic):
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
For Debian-based systems, we then need to install the apt-transport-https package:
sudo apt-get update
sudo apt-get install apt-transport-https
The next step is to add the repository definition to your system:
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
All that’s left to do is to update your repositories and install Elasticsearch:
sudo apt-get update && sudo apt-get install elasticsearch
Elasticsearch configurations are done using a configuration file (on Linux: /etc/elasticsearch/elasticsearch.yml) that allows you to configure general settings (e.g. node name), as well as network settings (e.g. host and port), where data is stored, memory, log files, and more.
sudo su
vim /etc/elasticsearch/elasticsearch.yml
Since we are installing Elasticsearch on Azure, we will bind Elasticsearch to localhost. We also need to define the private IP of our Azure VM (you can find it on the VM's page in the Azure portal) as a master-eligible node:
network.host: "localhost"
http.port: 9200
cluster.initial_master_nodes: ["<PrivateIP>"]
Save the file and run Elasticsearch with:
sudo service elasticsearch start
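If you want Elasticsearch to come back up automatically after the VM reboots, you can also enable the service (Ubuntu 18.04 uses systemd):

sudo systemctl enable elasticsearch.service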
To confirm that everything is working as expected, point curl or your browser to http://localhost:9200, and you should see something like the following output (give Elasticsearch a minute to run):
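Using curl from a shell on the VM:

curl http://localhost:9200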
{ "name" : "elk", "cluster_name" : "elasticsearch", "cluster_uuid" : "nuDp3c-JSU6tW-kqqPW68A", "version" : { "number" : "7.0.1", "build_flavor" : "default", "build_type" : "deb", "build_hash" : "e4efcb5", "build_date" : "2019-04-29T12:56:03.145736Z", "build_snapshot" : false, "lucene_version" : "8.0.0", "minimum_wire_compatibility_version" : "6.7.0", "minimum_index_compatibility_version" : "6.0.0-beta1" }, "tagline" : "You Know, for Search" }
Installing an Elasticsearch cluster requires a different type of setup. Read our Elasticsearch Cluster tutorial for more information on that.
Installing Logstash
Logstash requires Java 8 or Java 11 to run, so we will start the process of setting up Logstash with:
sudo apt-get install default-jre
Verify Java is installed:
java -version
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
Since we already defined the repository in the system, all we have to do to install Logstash is run:
sudo apt-get install logstash
Before you run Logstash, you will need to configure a data pipeline. We will get back to that once we’ve installed and started Kibana.
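For reference, a minimal Logstash pipeline configuration has an input section and an output section. The sketch below (using a hypothetical file at /etc/logstash/conf.d/beats.conf) would listen for Beats traffic on port 5044 and forward events to the local Elasticsearch; we won't need it for this tutorial, but it shows the general shape:

input {
  # listen for events shipped by Beats agents
  beats {
    port => 5044
  }
}

output {
  # index incoming events into the local Elasticsearch
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}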
Installing Kibana
As before, we will use a simple apt command to install Kibana:
sudo apt-get install kibana
Open up the Kibana configuration file at: /etc/kibana/kibana.yml, and make sure you have the following configurations defined:
server.port: 5601
elasticsearch.hosts: ["http://localhost:9200"]
These specific configurations tell Kibana which Elasticsearch to connect to and which port to use.
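Note that with these settings, Kibana only listens on the VM's loopback interface. If you want to reach it through the port 5601 rule we opened in the Network Security Group, you'll also need to bind it to an externally reachable interface, for example:

server.host: "0.0.0.0"

This is fine for a quick test, but don't leave Kibana exposed like this without securing access to it.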
Now, start Kibana with:
sudo service kibana start
Open up Kibana in your browser at http://localhost:5601 (or, if you're connecting from outside the VM, replace localhost with the VM's public IP). You will be presented with the Kibana home page.
Installing Beats
The various data shippers belonging to the Beats family can be installed in exactly the same way as we installed the other components.
As an example, let’s install Metricbeat to send some host metrics from our Azure VM:
sudo apt-get install metricbeat
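Out of the box, Metricbeat ships with its system module enabled and is configured to send data to Elasticsearch on localhost, so no changes are needed for this tutorial. If your Elasticsearch lives elsewhere, the relevant section of /etc/metricbeat/metricbeat.yml looks like this:

output.elasticsearch:
  # defaults to the local Elasticsearch instance
  hosts: ["localhost:9200"]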
To start Metricbeat, enter:
sudo service metricbeat start
Metricbeat will begin monitoring your server and create an Elasticsearch index, which you can then define as an index pattern in Kibana. Query Elasticsearch with curl to make sure:
curl 'localhost:9200/_cat/indices?v'
health status index                              uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   metricbeat-7.0.1-2019.05.12-000001 ULSQBIigQqK2zCUL3G-4mQ   1   1       6523            0      3.8mb          3.8mb
green  open   .kibana_1                          tlx3dbPySR2CBQM58XTYng   1   0          3            0     12.2kb         12.2kb
To begin analyzing these metrics, open up the Management → Kibana → Index Patterns page in Kibana. You’ll see the newly created ‘metricbeat-*’ index already displayed:
All you have to do now is enter the index pattern, select the @timestamp field and hit the Create index pattern button. Moving over to the Discover page, you’ll see your Azure VM host metrics displayed:
Congratulations! You have set up your first ELK data pipeline! More information on using the different beats is available on our blog: Filebeat, Metricbeat, Winlogbeat, Auditbeat.
ELK on Azure…at scale
Azure offers users the flexibility and scalability they require to install and run the ELK Stack at scale. As seen in the instructions above, setting up ELK on Azure VMs is pretty simple. Of course, things get more complicated in full-blown production deployments of the stack.
Data pipelines built for collecting, processing, storing and analyzing data from production environments require additional architectural components and more advanced configuration. For example, a buffering layer (e.g. Kafka) should be put in place before Logstash to ensure a data flow that is resilient and can withstand data growth and bursts. Replicating across availability zones is recommended to ensure high availability. And the list goes on.
This requires time and resources that not every organization can afford to spend. Logz.io offers Azure users a fully managed ELK solution for monitoring their applications, including seamless integration with Azure and built-in dashboards for various Azure resources such as Active Directory, Application Gateway, Activity Logs and more.