Editor’s note: This authoritative guide to the ELK Stack from Logz.io will be continuously updated whenever new information is available. Be sure to visit periodically to read the latest version. You can also start a free trial of the Logz.io predictive, cloud-based log management platform that is built on top of the open-source ELK Stack and can be used for functions including log analysis, application and infrastructure monitoring, security and compliance, and business intelligence.

Table of Contents

  1. What is the ELK Stack?
  2. An Elasticsearch Tutorial
  3. Elasticsearch Cluster Setup & Upgrading
  4. A Logstash Tutorial
  5. A Kibana Tutorial
  6. Creating Customized Kibana Visualizations
  7. The Lessons to Learn from Elasticsearch Cluster Disconnects
  8. How to Avoid and Fix the Top 5 Elasticsearch Mistakes
  9. 5 Logstash Pitfalls to Avoid
  10. A Comparison of Fluentd and Logstash
  11. A Guide to Logstash Plugins
  12. How to Deploy the ELK Stack in Production
  13. How to Install the ELK Stack on AWS: A Step-By-Step Guide
  14. Troubleshooting 5 Common ELK Glitches
  15. Using the ELK Stack for NGINX or IIS Log Analysis
  16. How to Use the ELK Stack to Monitor Performance
  17. Conclusion: The Logz.io Free ELK Apps Library
  18. Appendix: Our Additional ELK Stack Resources

What is the ELK Stack?

By Asaf Yigal


The ELK Stack is downloaded 500,000 times every month, making it the world’s most popular log management platform. In contrast, Splunk — the historical leader in the space — self-reports 10,000 customers total.

But what exactly is ELK, and why is the software stack seeing such widespread interest and adoption (as seen in this Google Trends report)? Let’s take a deeper dive.

What is the ELK Stack?

The ELK Stack is a collection of three open-source products — Elasticsearch, Logstash, and Kibana — from Elastic. Elasticsearch is a NoSQL database that is based on the Lucene search engine. Logstash is a log pipeline tool that accepts inputs from various sources, executes different transformations, and exports the data to various targets. Kibana is a visualization layer that works on top of Elasticsearch.

Together, these three different open source products are most commonly used in log analysis in IT environments (though there are many more use cases for the ELK Stack, including business intelligence, security and compliance, and web analytics). Logstash collects and parses logs, and then Elasticsearch indexes and stores the information. Kibana then presents the data in visualizations that provide actionable insights into one’s environment.

Why is ELK So Popular?


Google Trends screenshot

The ELK Stack is popular because it fulfills a need in the log analytics space. Splunk’s enterprise software has long been the market leader, but its numerous functionalities are increasingly not worth the expensive price — especially for smaller companies such as SaaS businesses and tech startups.

For that reason, Splunk has the aforementioned small number of customers while ELK is downloaded more times in a single month than Splunk’s total customer count — and many times over at that. ELK might not have all of the features of Splunk, but it does not need those analytical bells and whistles. ELK is a simple but robust log analysis platform that costs a fraction of the price.

The bigger picture: IT organizations are increasingly favoring open source products in general, and this is why newer proprietary log analysis software platforms such as Sumo Logic, which self-reports only 700 customers, might have a hard time gaining traction today.

After all, how do Netflix, Facebook, Microsoft, LinkedIn, and Cisco monitor their logs? With ELK.

Why is Log Analysis Becoming More Important?

As more and more IT infrastructures move to public clouds such as Amazon Web Services and Microsoft Azure, public cloud security tools and log analytics platforms are both becoming more and more critical.

In cloud-based infrastructures, performance isolation is extremely difficult to achieve — particularly whenever systems are heavily loaded. The performance of virtual machines in the cloud can greatly fluctuate based on the specific loads, infrastructure servers, environments, and number of active users. As a result, reliability and node failures can become significant problems.

Log management platforms can monitor all of these infrastructure issues as well as process operating system logs, NGINX and IIS server logs for technical SEO and web traffic analysis, application logs, and ELB and S3 logs on AWS.

In all of these contexts, DevOps engineers, system administrators, site reliability engineers, and developers can all use logs to make better decisions that are data-informed (and not, as Facebook’s Adam Mosseri says, data-driven). After all, what is being called “big data analytics” is increasingly important for a number of reasons — particularly when it comes to the cloud.

“The future state of Big Data will be a hybrid of on-premises and cloud,” Forrester Research analyst Brian Hopkins told Computerworld. Here are just a few examples from that analysis:

  • Hadoop, a framework for processing extremely large data sets, now works in the cloud and not only on physical machines
  • Intuit has moved towards cloud analytics because it needs a “secure, stable, and auditable environment”
  • Cheaper computational power is allowing engineers to create machine-learning algorithms that can perform predictive analytics in the cloud

How to Use the ELK Stack for Log Analysis

As I mentioned above, the ELK Stack is most commonly used in log analysis. However, the implementation and maintenance of a production-grade ELK Stack requires a lot of additional work and many additional products. More information on installing and deploying ELK is provided later in this guide.

What Does Logz.io Provide?

Logz.io takes ELK and provides the stack as a cloud-based service that DevOps engineers, system administrators, site reliability engineers, and developers can use to centralize log data, monitor infrastructure, troubleshoot problems, obtain business intelligence, and improve overall user experience.

We offer a method to use ELK as a service in a way that is highly available, infinitely scalable, and takes only five minutes to set up. Plus, we have added our own enterprise features including alerts, ELK-centric applications, and multi-user support with defined levels of role-based access.


An Elasticsearch Tutorial

By Jurgens du Toit


Elasticsearch is often described as a search server. That might be confusing because we usually think of search as something that we do, not something that needs to be served. However, the reality is that search can be quite complex, and search servers have been developed in response to that fact.

Described in more familiar terms, Elasticsearch is a NoSQL database. That means it stores data in an unstructured way and that you cannot use SQL to query it. Unlike most NoSQL databases, though, Elasticsearch has a strong focus on search capabilities and features — so much so, in fact, that the easiest way to get data from Elasticsearch is to search for it using the REST API.

How to Install Elasticsearch

The requirements for Elasticsearch are simple: Java 7. Take a look at my Logstash tutorial in this guide to ensure that you are set. Also, make sure that your operating system is on the Elastic support matrix, otherwise you might run up against strange and unpredictable issues. Once that is done, you can start with installing Elasticsearch.

Elasticsearch can be downloaded as a standalone distribution or installed using the apt and yum repositories. To keep things simple, let’s just download the distribution because it works for all operating systems. Be sure, though, to rethink this before you go into production:
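For example, a minimal sketch of a 2.x tarball install (the version number and download URL are illustrative; grab the current link from Elastic’s download page):

```bash
# download and unpack the standalone distribution (version shown is an example)
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-2.1.0.tar.gz
tar -xzf elasticsearch-2.1.0.tar.gz
cd elasticsearch-2.1.0
```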

On Linux and other Unix-based systems, you can now run bin/elasticsearch — or on Windows, bin/elasticsearch.bat — to get it up and running. And that’s it! To confirm that everything is working fine, point curl or your browser to http://127.0.0.1:9200, and you should see something like the following output:
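The exact fields vary between versions, but the response is a small JSON document along these lines:

```json
{
  "name" : "my-first-node",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.1.0"
  },
  "tagline" : "You Know, for Search"
}
```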

Creating an Index in Elasticsearch

Adding data to Elasticsearch is called “indexing.” This is because when you feed data into Elasticsearch, the data is placed into Apache Lucene indexes. This makes sense because Elasticsearch uses the Lucene indexes to store and retrieve its data. Although you do not need to know a lot about Lucene, it does help to know how it works when you start getting serious with Elasticsearch.

Elasticsearch behaves like a REST API, so you can use either the POST or the PUT method to add data to it. You use PUT when you know, or want to specify, the ID of the data item, or POST if you want Elasticsearch to generate an ID for the data item:
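A rough sketch using a hypothetical app index and user documents (index, type, and field names are just examples):

```bash
# PUT when you want to control the ID yourself
curl -XPUT 'http://localhost:9200/app/users/4' -d '{
  "id": 4,
  "username": "john",
  "last_login": "2016-01-25 12:34:56"
}'

# POST when Elasticsearch should generate the ID
curl -XPOST 'http://localhost:9200/app/users/' -d '{
  "id": 5,
  "username": "sarah",
  "last_login": "2016-01-25 12:35:02"
}'
```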

The data for the document is sent as a JSON object. You might be wondering how we can index data without defining the structure of the data. Well, with Elasticsearch, like with most other NoSQL databases, there is no need to define the structure of the data beforehand. To ensure optimal performance, though, you can define mappings for data types. More on this later.

If you are not comfortable with curl, look into the unofficial Sense Chrome plugin or the Sense Kibana app. You can also use log shippers like Beats and data pipelines like Logstash to automate the data-ingress process.

Elasticsearch Query: Getting Information Out

Once you have your data indexed into Elasticsearch, you can start searching and analyzing it. The simplest query you can do is to fetch a single item. Once again, because Elasticsearch is a REST API, we use GET:
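Continuing with the hypothetical example above, fetching the document we indexed with ID 4 looks like this, and returns a response similar to the one shown after it:

```bash
curl -XGET 'http://localhost:9200/app/users/4?pretty'
```

```json
{
  "_index" : "app",
  "_type" : "users",
  "_id" : "4",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "id" : 4,
    "username" : "john",
    "last_login" : "2016-01-25 12:34:56"
  }
}
```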

The fields starting with an underscore are all meta fields of the result. The _source object is the original document that was indexed.

We also use GET to do searches by calling the _search endpoint:
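For example, a simple URI search for the term "john" across the whole example index:

```bash
curl -XGET 'http://localhost:9200/app/_search?q=john&pretty'
```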

The result contains a number of extra fields that describe both the search and the result. Here’s a quick rundown:

  • took: The time in milliseconds the search took
  • timed_out: If the search timed out
  • shards: The number of Lucene shards searched, and their success and failure rates
  • hits: The actual results, along with meta information for the results

The search we did above is known as a URI Search, and is the simplest way to query Elasticsearch. By providing only a word, all of the fields of all the documents are searched for that word. You can build more specific searches by using Lucene queries:

  • username:johnb – Looks for documents where the username field is equal to “johnb”
  • john* – Looks for documents that contain terms that start with john and are followed by zero or more characters such as “john,” “johnb,” and “johnson”
  • john? – Looks for documents that contain terms that start with john followed by only one character. Matches “johnb” and “johns” but not “john.”

There are many other ways to search including the use of boolean logic, the boosting of terms, the use of fuzzy and proximity searches, and the use of regular expressions.

What is even more awesome is that URI searches are just the beginning. Elasticsearch also provides a request body search with a Query DSL for more advanced searches. There is a wide array of options available in these kinds of searches, and you can mix and match different options to get the results that you require. Some of the options include geo queries, “more like this” queries, and scripted queries.

The DSL also makes a distinction between a filtering and a query context for query clauses. Clauses used as filters test documents in a boolean fashion: Does the document match the filter, “yes” or “no”? Filters are also generally faster than queries, but queries can also calculate a score based on how closely a document matches the query. This score is used to determine the ordering and inclusion of documents:
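As a sketch, here is a request body search that combines a scoring query with a non-scoring filter (the index and field names are the same hypothetical ones used above):

```bash
curl -XGET 'http://localhost:9200/app/users/_search?pretty' -d '{
  "query": {
    "bool": {
      "must":   { "match": { "username": "john" } },
      "filter": { "range": { "last_login": { "gte": "2016-01-01" } } }
    }
  }
}'
```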

Removing Elasticsearch Data

Deleting items from Elasticsearch is just as easy as entering data into Elasticsearch. The HTTP method to use this time is — surprise, surprise! — DELETE:
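For example, deleting the document we indexed earlier by its ID:

```bash
curl -XDELETE 'http://localhost:9200/app/users/4?pretty'
```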

As with retrieving data, you don’t need to know the ID of the item you’re deleting. You can delete items by specifying a query:
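Depending on your version, this is done either with the delete-by-query plugin (Elasticsearch 2.x) or the built-in _delete_by_query endpoint (5.x and later). A sketch of the latter:

```bash
curl -XPOST 'http://localhost:9200/app/users/_delete_by_query?pretty' -d '{
  "query": { "match": { "username": "john" } }
}'
```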

What’s Next?

We have touched on just the basics of CRUD operations in Elasticsearch. Elasticsearch is a search server, so it is not surprising that there is an immense depth to its search features. Since the release of Elasticsearch 2.0, there is also a wealth of available analytical tools.

What Does Logz.io Provide?

Elasticsearch as a service is included in the Logz.io complete ELK cloud-based service.


Elasticsearch Cluster Setup & Upgrading

By Noni Peri


Setting up and then updating Elasticsearch clusters is a sensitive and error-prone process. As a company that provides ELK (Elasticsearch, Logstash, and Kibana) as a service, we know a thing or two about the process because we invest a lot of engineering time to make sure that our Elasticsearch version and plugins are always up to date.

So, to help those who are maintaining their own Elasticsearch installations, I wanted to provide a quick checklist of tips that DevOps engineers and system administrators need to remember when they upgrade Elasticsearch clusters. In a nutshell, it comes down to a three-pronged process: Read, Plan, and Test.

Reading Before Your Update

It’s crucial to read before you start to plan. First, look at the Elasticsearch documentation relevant for upgrades — it’s pretty straightforward. But remember this rule-of-thumb: Minor version changes (from 2.X to 2.Y) support rolling upgrades (one node at a time), but major version updates (from 1.X to 2.X) require full cluster restarts.

Second, one should always consult the breaking changes pages (see the right-hand sidebar) that are relevant to every version that is being upgraded. Elastic provides the basic steps for rolling upgrades as well as full cluster upgrades in their documentation. Here, I will walk you through some of the finer points to consider when upgrading your cluster and try to help you to avoid some of the mistakes that we have learned from the hard way.

Planning Your Cluster Update

The first thing to do when planning your Elasticsearch cluster upgrade is to prepare a cluster inventory. Use a top-to-bottom approach and answer the following questions:

  • What types of nodes are you using (master, data, or client)?
  • How many of each type are there?
  • What plugins and which versions are installed on each node?
  • Does your master election configuration (minimum master nodes) make sense?
  • Do you have enough disk space on your data nodes in case one of them should have problems?

Answer these questions to acquaint yourself with your cluster. Make sure that you have the best possible knowledge of the starting point of the cluster before the planned upgrade. The information gathered during this inventory is critical in moving forward with the planning.

The next step is to formulate a high-level plan for the upgrade. The first step in this plan is to create data snapshots of your entire cluster (if possible). Depending on your use case, you might be able to take a snapshot of only part of your data, or you might have to take a snapshot of part of the data in advance and then another immediately before the upgrade. Use your best judgement, but be sure to safeguard your data.

Make sure that your target version files or packages are accessible from your cluster, and do not forget to update all of your plugins to their latest versions! You do not want to have a node down and be unable to start it while you scramble to find a version of some plugin at the last minute. (For help, I would refer to Elastic’s upgrade guide.)

Think carefully about the order in which you will upgrade your nodes. We have been bitten in the past by the side-effects of not creating an upgrade-order process carefully. We have upgraded client nodes before data nodes and ended up with very risky “hotspot” data nodes that were being heavily loaded by the client nodes while the upgrade was in progress because there were not enough upgraded data nodes to share the load.

We recommend an upgrade order of master, data, and then client nodes, but use your own judgement and do a test beforehand. (Depending on the version from which you are migrating, you will need to be aware of these specific changes.) Another useful upgrade pattern that we have been using is to create a new cluster and migrate the data to the new cluster instead of upgrading existing nodes.

Testing Your Cluster Upgrade


Staging is where you want to find any problems — not production. It is also where you can streamline your process in terms of risk management, time, and cost.

I recommend setting up a staging cluster for the purpose of testing the upgrade procedure. Any node type that you have in your target cluster should be represented in your testing cluster, but the quantity can be lower. In most cases, there is no need for more than three master nodes, two client nodes, and a few data nodes (depending on your replication scheme). If there are multi-purpose nodes in your target cluster, they should be represented in testing as well.

Use the snapshots created from your target cluster to load data into your testing cluster. If your target cluster has a high load, you should also simulate this load against your testing cluster while practicing your upgrade procedure.

Make sure your testing cluster (and your plugins!) is the same version as your target cluster, and try to use machines with performance characteristics as similar to your target cluster as possible. (Tip: If you are upgrading from any version above 1.6, use the synced-flush feature to place markers on your shards and significantly speed up your whole upgrade process.)
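A synced flush is a single API call against the cluster, for example:

```bash
# place sync markers on idle shards so that recovery after a restart is much faster
curl -XPOST 'http://localhost:9200/_flush/synced?pretty'
```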

While you are testing your upgrade, make sure to monitor your cluster. Use the common measuring and monitoring tools with which you are comfortable, and try to mimic your end-users’ client connections and use-cases to measure the effect of the various upgrade steps on their perceived experience.

Here are some starting points of where to look:

  • Elasticsearch logs. Make sure to follow up on any messages that are out of the ordinary in your Elasticsearch log files as your processes come up after restart with the new version, and as the upgrade progresses
  • Marvel. Make sure you look at CPU usage, load, disk usage, shard relocation, and JVM Heap usage
  • Network usage

You might also want to freshen up on your manual shard relocation technique — and possibly automate it. We have found it to be useful in some edge cases. In addition, be sure to look out for reports of timeouts, excessive load (of any kind), or any stress signals your cluster may be exhibiting because these tend to be precursors to Elasticsearch node disconnects.
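For reference, a manual shard relocation is a single call to the cluster reroute API; the index and node names below are hypothetical:

```bash
curl -XPOST 'http://localhost:9200/_cluster/reroute?pretty' -d '{
  "commands": [
    {
      "move": {
        "index": "logs-2016.01.25",
        "shard": 0,
        "from_node": "data-node-1",
        "to_node": "data-node-2"
      }
    }
  ]
}'
```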

What Does Logz.io Provide?

Logz.io sets up and upgrades our own Elasticsearch clusters so you do not need to worry about it. It’s all included in our complete ELK cloud-based service.


A Logstash Tutorial

By Jurgens du Toit


A great use for the ELK Stack is the storing, visualization, and analysis of logs and other time-series data. Logstash is an integral part of the data workflow from the source to Elasticsearch and further. Not only does it allow you to pull data from a wide variety of sources, it also gives you the tools to filter, massage, and shape the data so that it’s easier to work with. This tutorial gives you a crash course in getting started with Logstash.

How to Install Logstash

The only requirement for installing Logstash is Java 7 or higher. Everything else you need, including JRuby, the language Logstash was written in, is included in the Logstash bundle. The easiest way to confirm if you have the correct version of Java installed is to run the following in your CLI:
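The check itself is a single command:

```bash
java -version
```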

It should print out something like the following:
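The exact build numbers will differ on your machine, but the output should look roughly like this:

```
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
```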

The important part is:
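It is the version string on the first line (your exact update number will differ):

```
java version "1.7.0_79"
```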

As long as the number behind the first 1 is 7 or higher, you’re good to go.

Once you’ve established that you have a supported Java version, you have two choices when it comes to installing Logstash: You can either download the Logstash bundle and use that, or you can install Logstash using your OS’s package manager. The package manager is the recommended route because it makes upgrading and patching Logstash so much easier.

The following steps are specific to Ubuntu and other Debian based OSes. Check out Elastic’s Package Repositories page for information on other OSes.

Firstly, you need to add Elastic’s signing key so that the downloaded package can be verified. This can be skipped if you’ve installed packages from Elastic before:
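At the time of writing, the key was fetched and added like this (double-check the URL against Elastic’s documentation):

```bash
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
```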

The next step is to add the Logstash repository definition to your system. It’s best to keep the definitions in different files, making them easier to manage. In this case, we’re adding it to:
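For example, a dedicated file such as this (the exact file name is up to you):

```
/etc/apt/sources.list.d/logstash-2.1.list
```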

Here is the definition:
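For Logstash 2.1 on Debian-based systems, the repository line should look like this (verify it against Elastic’s Package Repositories page):

```
deb https://packages.elastic.co/logstash/2.1/debian stable main
```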

All that’s left to do is to update your repositories and install Logstash:
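On apt-based systems, that is:

```bash
sudo apt-get update && sudo apt-get install logstash
```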

Since we added the Logstash 2.1 repository definition, we’ll now have installed Logstash 2.1 and will have access to all of the updates for that version.

How to Configure Logstash

Logstash Inputs

One of the things that makes Logstash great is its ability to source logs and events from various sources. As of version 2.1, there are 48 different inputs on the Logstash documentation page. That’s 48 different technologies, locations, and services from where you can pull events and manipulate them. These include monitoring systems like collectd, databases like Redis, services like Twitter, and various others such as File and RabbitMQ. By using these inputs, you can import data from multiple sources and manipulate them however you want — and eventually send them to other systems for storage or processing.

Inputs are the starting point of any configuration. If you do not define an input, Logstash will automatically create a stdin input. Since you can create multiple inputs, it’s important to type and tag them so that you can properly manipulate them in filters and outputs.
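As a small illustration, here is a file input that sets both a type and a tag (the path is just an example):

```
input {
  file {
    path => "/var/log/apache2/access.log"
    type => "apache-access"
    tags => ["webserver"]
  }
}
```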

Logstash Outputs

As with the inputs, Logstash comes with a number of outputs that enable you to push your events to various locations, services, and technologies. You can store events using outputs such as File, CSV, and S3, convert them into messages with RabbitMQ and SQS, or send them to various services like HipChat, PagerDuty, or IRC. The number of combinations of inputs and outputs in Logstash makes it a really versatile event transformer.

Logstash events can come from multiple sources, so as with filters, it’s important to do checks on whether or not an event should be processed by a particular output. If you define no output, Logstash will automatically create a stdout output.
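A minimal sketch of a conditional output section (the host and the type check are assumptions):

```
output {
  if [type] == "apache-access" {
    elasticsearch {
      hosts => ["localhost:9200"]
    }
  } else {
    stdout {
      codec => rubydebug
    }
  }
}
```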

Logstash Filters

If Logstash were just a dumb pipe between a number of inputs and outputs, you could easily replace it with a service like IFTTT or Zapier. Luckily, it isn’t. It also comes with a number of very powerful filters with which you can manipulate, measure, and create events. It’s the power of these filters that makes Logstash a very versatile and valuable tool.

Logstash events can come from multiple sources, so as with outputs, it’s important to do checks on whether or not an event should be processed by a particular filter.


A Logstash Configuration Example

Logstash has a simple configuration DSL that enables you to specify inputs, outputs, and filters along with their specific options. Order matters, specifically around filters and outputs, as the configuration is basically converted into code and then executed. Keep this in mind when you’re writing your configs and when you try to debug them.

Structure

Your configurations will generally have three sections: inputs, outputs, and filters. You can have multiple instances of each of these sections, which means that you can group related plugins together in a config file instead of grouping them by type. My Logstash configs are generally structured as follows:
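Something along these lines, for example (the file names besides apache_to_elasticsearch.conf are hypothetical):

```
/etc/logstash/conf.d/
    apache_to_elasticsearch.conf
    syslog_to_elasticsearch.conf
    cloudtrail_to_s3.conf
```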

You’ll see that I have a configuration file for each of the functions or integrations that I’d like Logstash to perform. Each of those files will contain the necessary inputs, filters, and outputs to perform that function. Let’s look at the apache_to_elasticsearch.conf file, as it’s typical of what you’d see in a Logstash config file:
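The full file isn’t reproduced here, but a sketch that matches the description below would look something like this (paths and hosts are assumptions):

```
input {
  file {
    path => "/var/log/apache2/access.log"
    type => "apache-access"
  }
}

filter {
  if [type] == "apache-access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}

output {
  if [type] == "apache-access" {
    if "_grokparsefailure" in [tags] {
      null {}
    } else {
      elasticsearch {
        hosts => ["localhost:9200"]
      }
    }
  }
}
```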

The input section tells Logstash to pull logs from the Apache access log and specify the type of those events as apache-access. Setting the type is important, as it will be used to selectively apply filters and outputs later on in the event’s lifetime. It’s also used to organize the events when it’s eventually pushed to Elasticsearch.

In the filter section, we specifically apply a grok filter to events that have the apache-access type. This conditional ensures that only the apache-access events get filtered. If it is not there, Logstash will attempt to apply the grok filter to events from other inputs as well. This filter parses the log string and populates the event with the relevant information from the Apache logs.

Lastly, we see the output section. The first conditional ensures, once again, that we only operate on the apache-access events. The next, nested, conditional sends all of the events that didn’t match our grok pattern to the null output. Since they didn’t conform to the specified pattern, we assume that they are log lines that contain information we’re not interested in and discard them. Since order is important in filters and outputs, this will ensure that only events that were successfully parsed will make it to the Elasticsearch output.

Each of the configuration files can contain these three sections. Logstash will typically combine all of our configuration files and treat them as one large config. Since you can have multiple inputs, it’s recommended that you tag your events or assign types to them so that it’s easy to identify them at a later stage. Also ensure that you wrap your filters and outputs that are specific to a category or type of event in a conditional, otherwise you might get some surprising results.

Working with Logstash Plugins

Since version 1.5, Logstash has relied on a plugin infrastructure to give it access to various inputs, filters, codecs, and outputs. Plugins are essentially Ruby gems and can be managed through Logstash’s plugin utility:
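In Logstash 2.x the utility lives at bin/plugin (newer releases renamed it to bin/logstash-plugin). Listing what is installed, for example:

```bash
bin/plugin list
```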

All the plugins that originally resided in the logstash-core codebase are installed by default on Logstash 1.5 and up. Plugins that are part of logstash-contrib or are outside of the logstash ecosystem need to be installed manually:
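For example, installing the Kafka output plugin (chosen purely as an illustration):

```bash
bin/plugin install logstash-output-kafka
```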

This will add the plugin / gem to Logstash’s Gemfile and make it available to you. Updating and removing a plugin is just as easy:
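For example:

```bash
bin/plugin update logstash-output-kafka
bin/plugin uninstall logstash-output-kafka
```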

Start Stashing!

The only thing left to do now is to get your hands dirty. This chapter guided you through installing Logstash, configuring it, and making sure that you have access to all the functionality that you need through the plugin ecosystem. Since Logstash is the first element of the ELK stack, you should now have a solid grounding in how to use it for log and time-series data analysis.

What Does Logz.io Provide?

Don’t want to install and run Logstash yourself? Logstash as a service is part of the Logz.io complete ELK cloud-based log management platform.


A Kibana Tutorial

By Asaf Yigal


Kibana is the visualization layer of the ELK Stack, the world’s most popular log analytics platform, which also includes Elasticsearch and Logstash. This tutorial will cover both the simple and advanced features of Kibana over a few parts. This first part will explain how to run searches in Kibana using the Lucene query syntax.

Loading Data in Kibana

For the purpose of this tutorial, we will use a sample 24-hour period of Apache data (which can be downloaded here) that is being renewed every day. (This means that if you import the data into your ELK Stack, it will present data for today. Then, you can download additional data at that same URL the following day.)

You can use your own ELK Stack to run this tutorial, but for the sake of simplicity, we will use our Logz.io ELK as a service in this example.

To upload your data, take these steps:

  1. If you do not already have a Logz.io account, open one here
  2. Download the sample file from http://logz.io/sample-data
  3. Upload the data using the file upload method found in the Log Shipping tab.* Note that this is a simple cURL command.
  4. It should take about a minute for the file to upload and be visible in the Kibana Discovery tab. If the data is not visible, try refreshing after a minute
  5. Open one of the log lines and click on the Refresh button to refresh the Kibana mapping
  6. That’s it! You’re all done, and you will now have some data in your Kibana

* The token can be found on the settings page and the type of the file is apache_access

Kibana Search Syntax

This section will detail some simple searches that one can perform.

Free-Text Search

Free text search works within all fields — including the _source field, which includes all the other fields. If no specific field is indicated in the search, the search will be done on all of the fields that are being analyzed.

Try to run the following searches in the Discovery search field and see what you get (and set the time parameter on the top right of the dashboard to the prior twelve hours to capture more data):

  • category
  • Category
  • categ
  • cat*
  • categ?ry
  • “category”
  • category\/health
  • “category/health”
  • Chrome
  • chorm*

There are a few things to notice here:

  1. Text searches are not case sensitive. This means that [category] and [CaTeGory] will return the same results. When you put the text within double quotes (“”), you are looking for an exact match, which means that the exact string must match what is inside the double quotes. This is why [category\/health] and [“category/health”] will return different results
  2. You can use the wildcard symbols [*] or [?] in searches. [*] means any number of characters, and [?] means only one character

Field-Level Searches

In Kibana, you can search for data inside specific fields. To do that, you need to use the following format:

<field name>:search

Run the following searches to see what you get (some will return no results):

  • geoip.country_name:Canada
  • name:chrome
  • name:Chrome
  • name:Chr*
  • response:200
  • bytes:65
  • bytes:[65 TO *]
  • bytes:[65 TO 99]
  • bytes:{65 TO 99}
  • _exists_:name

There are a few things to notice here:

  1. Field-level searches depend on the type of field. In the Logz.io Kibana visualization, all fields are not analyzed by default, which means that searches are case-sensitive and cannot use wildcard searches. The reason we save all of the fields as “not analyzed” is to save space in the index because the data is also duplicated in an analyzed field called _source
  2. You can search a range within a field. If you use brackets [], this means that the results are inclusive. If you use {}, this means that the results are exclusive
  3. Using the _exists_ prefix for a field will search the documents to see if the field exists
  4. When using a range, you need to follow a very strict format and use capital letters TO to specify the range

Logical Statements

You can use logical statements in searches in these ways:

  • USA AND Firefox
  • USA OR Firefox
  • (USA AND Firefox) OR Windows
  • -USA
  • !USA
  • +USA
  • NOT USA

There are a few things to understand here:

  1. You need to make sure that you use the proper format such as capital letters to define logical terms like AND or OR
  2. You can use parentheses to define complex, logical statements
  3. You can use -, !, and NOT to define negative terms

Escaping special characters

All special characters need to be properly escaped. The following is a list of all available special characters:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

Advanced Searches

Proximity searches

Proximity searches are an advanced feature of Kibana that takes advantage of the Lucene query language.

Using a proximity search

  • [categovi~2] means a search for all the terms that are within two changes from [categovi]. (This means that “category” will be matched)
  • Proximity searches use a lot of system resources and often trigger internal circuit breakers in Elasticsearch. If you try something such as [catefujt~10], it is likely not to return any results due to the amount of memory that is used to perform that specific search

Bonus! Build a Kibana Dashboard with One Click

In the next part of our Kibana tutorial, we will talk about how to take these searches to the next level and build visualizations. In the meantime, if you go to our ELK Apps library and search for Apache apps, you will find a pre-made dashboard that will give you all of the information that you need to monitor Apache log data. To use that dashboard, just click on the Install button and then the Open button.

What Does Logz.io Provide?

Don’t want to bother with setting up and managing Kibana on your own? Kibana as a service is part of the Logz.io ELK cloud-based log management platform.


Creating Customized Kibana Visualizations

By Gilly Barr


Kibana, being the ‘K’ in ‘ELK’, is the amazing visualization powerhouse of the ELK Stack.

We use the software to create nice dashboards that display metrics including page visits, server JVM performance, messages from our client-side application, and technical SEO data. Kibana is great at creating these visualizations with a useful plugin infrastructure that allows users to extend Kibana’s capabilities to create many different custom visualizations.

In this tutorial, I will demonstrate how we extended Kibana to add a “traffic light” visualization to the software’s features. This will be very similar to the “metric” visualization but designed to work as a traffic light. Green is for a metric that is “good”; red is for a metric that is “bad.” (The background: Our devops team had been looking for a simple and intuitive way to visualize “good versus bad,” so I decided to add a stoplight for our NOC team.)


We have also added a visualization to our ELK Apps library that can leverage Nginx log data to show average response time as a traffic light visualization.

This example and more will reside in our public GitHub directory.

(Note: All code snippets here are for Kibana v.4.1.2. Doing the same for subsequent versions of Kibana is similar, but the project structure will have changed, so you might need to find the updated folders. For more information, you can see our prior guides on upgrading to Kibana 4 and then to 4.1.)

Preparation

To make the code easy to maintain, we are going to place (most of) it in a separate directory for our visualization. Go to Kibana’s ‘/plugins’ directory and add a directory named ‘traffic_light_vis’.

Registering our visualization

Kibana has a module called a “registry,” which is essentially a set of static arrays that hold various lists for different Kibana settings, such as which apps, plugins, and visualizations Kibana has. For visualizations, Kibana holds a registry called ‘vis_types’ which defines which types of visualizations are available. (If you’re interested in how this works, you can see more detail at GitHub.)

So, for our visualization, we’re going to add an ‘index.js’ file in our new folder and our registration to the ‘vis_types’ list. The file should look like this:
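The exact code lives in our GitHub repository; a rough sketch of the registration, following the way Kibana 4.1 registers its own visualizations (module paths may differ in your build), looks like this:

```js
define(function (require) {
  // add our provider to Kibana's vis_types registry so that the new
  // visualization shows up in the "create new visualization" wizard
  require('registry/vis_types').register(function (Private) {
    return Private(require('plugins/traffic_light_vis/traffic_light_vis'));
  });
});
```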

This will look for our traffic light visualization in a file called ‘traffic_light_vis’, so let’s create that file now.

Defining our visualization

We’re going to create another file in the ‘traffic_light_vis’ library named ‘traffic_light_vis.js’ to define the properties of our visualization.

For the definition of the visualization, Kibana expects to get back a ‘VisType’ object. This object represents the visualization, and it’s worth explaining that Kibana works with two main types of visualizations:

  • Template visualizations. This is an object called ‘TemplateVisType’, which inherits directly from the ‘VisType’ object. Template visualizations are for visualizations that do not require special canvas rendering. They are good for visualizations such as metric visualizations or those for a data table. You define them with an Angular-based template file, and angular binds the data queried from Elasticsearch to your template.
    Our traffic light visualization will be drawn with basic HTML and CSS, so we’ll use this type of visualization for now.
  • VisLib visualizations. An object called ‘VisLibVisType’ also directly inherits from ‘VisType’ and uses D3 to render more complex visualizations on an HTML canvas. These are used for all of the other visualizations that Kibana supports such as pie charts, line charts, and histograms.

Here are links to Kibana’s GitHub repository for more information on these objects:

The definition of our visualization should look like this:
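A sketch of that file, modeled on Kibana 4.1’s bundled metric visualization (module paths, parameter names such as redThreshold and greenThreshold, and default values are assumptions; the real code is in our GitHub repository):

```js
define(function (require) {
  // load the Angular controller used by the template
  require('plugins/traffic_light_vis/traffic_light_vis_controller');

  return function TrafficLightVisProvider(Private) {
    var TemplateVisType = Private(require('plugins/vis_types/template/template_vis_type'));
    var Schemas = Private(require('plugins/vis_types/_schemas'));

    return new TemplateVisType({
      name: 'traffic_light',            // internal Kibana name
      title: 'Traffic Light',           // shown in the visualization-creation wizard
      icon: 'fa-car',                   // any Font Awesome icon class
      description: 'A one-glance "good vs. bad" indicator for a metric',
      template: require('text!plugins/traffic_light_vis/traffic_light_vis.html'),
      params: {
        defaults: {
          fontSize: 60,
          redThreshold: 0,
          greenThreshold: 0
        },
        editor: require('text!plugins/traffic_light_vis/traffic_light_vis_params.html')
      },
      schemas: new Schemas([
        {
          group: 'metrics',
          name: 'metric',
          title: 'Metric',
          min: 1,
          defaults: [
            { type: 'count', schema: 'metric' }
          ]
        }
      ])
    });
  };
});
```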

As you can see, the TemplateVisType constructor receives a JSON of the parameters of our visualization:

  • ‘name’ is for internal Kibana use
  • ‘title’, ‘icon’, and ‘description’ are for the visualization-creation wizard
  • ‘template’ is the HTML template that Kibana will use to render the visualization
  • ‘params’ is the list of parameters that can be configured by the user for this visualization
  • ‘schemas’ is a list of metric types that we’re allowing the user to choose for this visualization

Styling our visualization

In the definition of our visualization, we linked to a non-existent template file. We can create this file now. We called it ‘traffic_light_vis.html’, and it should look like this:
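A minimal sketch of the template, iterating over the metrics exposed by the controller (the class names and threshold parameters are assumptions carried over from the definition above):

```html
<div class="traffic-light-vis" ng-controller="TrafficLightVisController">
  <!-- one light per metric returned by the controller -->
  <div class="traffic-light" ng-repeat="metric in metrics">
    <div class="light green" ng-class="{ on: metric.value >= vis.params.greenThreshold }"></div>
    <div class="light red"   ng-class="{ on: metric.value <  vis.params.greenThreshold }"></div>
    <div class="metric-label" ng-style="{ 'font-size': vis.params.fontSize + 'pt' }">
      {{ metric.label }}: {{ metric.value }}
    </div>
  </div>
</div>
```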

You can see that the metrics we receive are in the array ‘metrics’. (This comes from our visualization controller, which we will describe below.)

All of the defined parameters that this visualization can configure are injected to our template under the ‘vis.params’ object. You can see the CSS definitions in this GitHub repository.

The visualization editor

The ‘params’ object that we defined in our visualization object consists of parameters that we would like users to be able to configure. In the params section, we also referenced an HTML file under the ‘editor’ value, which we will explain in detail here.

When you are editing a visualization, you will see an HTML template on the left side of the screen. The parameters are shown in your template as ‘vis.params’, and whatever you bind here to the model will be saved in your Kibana visualization object.

The template file in our parameters editor looks like this:
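A sketch of such an editor template, binding the same hypothetical parameters used above (greenThreshold, redThreshold, and fontSize) to vis.params:

```html
<div class="form-group">
  <label>Green threshold</label>
  <input type="number" class="form-control" ng-model="vis.params.greenThreshold">
</div>
<div class="form-group">
  <label>Red threshold</label>
  <input type="number" class="form-control" ng-model="vis.params.redThreshold">
</div>
<div class="form-group">
  <label>Font size (pt)</label>
  <input type="number" class="form-control" ng-model="vis.params.fontSize">
</div>
```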

The controller

The controller that we referenced earlier in the definition of our visualization is responsible for passing the response from the Elasticsearch query to our template for rendering.

In our case, I copied the controller logic from the metric visualization and removed the field formatting (since we want to deal with a clear number). It should look like this:
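A sketch of that controller, modeled on the Kibana 4.1 metric visualization controller with the field formatting stripped out (module paths may differ in your build):

```js
define(function (require) {
  var module = require('modules').get('kibana/traffic_light_vis', ['kibana']);

  module.controller('TrafficLightVisController', function ($scope, Private) {
    var tabifyAggResponse = Private(require('components/agg_response/tabify/tabify'));

    var metrics = $scope.metrics = [];

    $scope.processTableGroups = function (tableGroups) {
      tableGroups.tables.forEach(function (table) {
        table.columns.forEach(function (column, i) {
          // no field formatting here: we want the raw number so the thresholds work
          metrics.push({
            label: column.title,
            value: table.rows[0][i]
          });
        });
      });
    };

    $scope.$watch('esResponse', function (resp) {
      if (resp) {
        metrics.length = 0;
        $scope.processTableGroups(tabifyAggResponse($scope.vis, resp));
      }
    });
  });
});
```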

After following this process, you should have your custom Kibana visualization!

What Does Logz.io Provide?

If you would like, you can download this visualization and more in our ELK Apps library and add them to your Kibana. If you would also like to test a hosted ELK environment with more features and cooler visualizations, you can create a free trial account.


The Lessons to Learn from Elasticsearch Cluster Disconnects

By Tomer Levy

If you are running large Elasticsearch clusters on Amazon Web Services for log analysis or any other search function, you’re probably suffering from some form of Elasticsearch cluster issue.


I know we used to. When we crossed the multi-billion-records-per-cluster threshold, we started seeing different kinds of scalability issues.

I wanted to discuss one of the first issues we had and then go through the process we went through to find the root cause of the problem. Many of us rely on Elasticsearch to run our businesses, so making sure that it operates smoothly is a top priority for many people.

The Initial Elasticsearch Alert

As a log analytics company, we intensively use our platform to monitor our internal clusters. A while back, we noticed that one of our clusters had started to generate “Node disconnection” events. It started with one node. Then another node, and it started to spread even further. It was not good, to say the least.

The Initial Research

We ran through a number of steps; here are some of them:

  • Increased the level of resources that were being devoted to the machine.
  • Checked to see if the network interface or CPU was overloaded — or if perhaps there was a bottleneck somewhere.
  • We figured it must be a networking issue, so we increased cluster sync timeouts.
  • Looked at all of the available data in our Docker-based environment.
  • Searched online including places such as StackOverflow and Elasticsearch user groups.
  • We even spoke to a few people directly at Elastic but had no luck.
  • At disconnection time, we could not find any anomaly in load or in any other parameter. It felt almost like a random, freak occurrence.

None of these steps really took us any closer to the resolution of the problem.

The Power of Correlation

As part of a different project, we had started to ship various OS-level log files such as kern.log to our log analysis platform.

That was actually where we were able to achieve a breakthrough. Finally, we saw that there was a clear correlation between Elasticsearch disconnections and a cryptic error log from the kern.log file. Here it is:

[Screenshot: Elasticsearch “Node disconnection” events correlated with a cryptic kern.log error]

Obviously, we had no idea what that message meant and if it had anything to do with the Elasticsearch cluster issues.

So, we spent some time researching this issue and found out that this event actually means that there is something wrong with the network interface in the machine itself — a packet is being dropped by the network device. Now, that made more sense, and we were quickly able to come up with a solution to the problem by finding this thread on Launchpad:

[Screenshot: the Launchpad bug report describing the Ubuntu network driver issue]

This is a bug in Ubuntu’s network interfaces (which are the default in Ubuntu’s AWS images). To solve the problem, we disabled the Scatter / Gather ability in the network interface. To do the same, run the following on the AWS instances in your Elasticsearch clusters:
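The fix is a one-liner (the interface name may differ on your instances):

```bash
# disable scatter-gather on the network interface
sudo ethtool -K eth0 sg off
```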

The Big Picture

We were lucky to have solved the problem by correlating these issues together — but companies cannot and should not depend on luck. Bad luck is expensive. We had a difficult time finding the answer because, first, we did not aggregate all logs to a central location and, second, we did not know the questions that we should be asking.

When running a large environment, problems can originate from an issue in one of many different databases, a network disruption causing an application-level issue, a spike in CPU or disk usage, or something else in the operating system or other infrastructure.

It all boils down to a single issue:

How can people make sense of all of their log data?

The answer is fairly simple:

  • Correlate all system events together to gain full visibility and then find the exact log entries that contain the answers
  • Correlation is not enough, though; you also need to understand what questions you should be asking about your data. If we had known in the first place that the “xen_netfront: xennet: skb rides the rocket: 19 slots” log message indicates a severe network interference, we would have saved a lot of time and effort.

What Does Logz.io Provide?

Don’t worry about any Elasticsearch cluster disconnects. Logz.io maintains our own Elasticsearch clusters so you do not have to think about it. It’s part of our complete ELK cloud-based service.


How to Avoid and Fix the Top 5 Elasticsearch Mistakes

By Asaf Yigal


Elasticsearch is open-source software that indexes and stores information in a NoSQL database based on the Lucene search engine — and it also happens to be one of the most popular indexing engines today. Elasticsearch is also part of the ELK Stack.

The software is used by growing startups such as DataDog as well as established enterprises such as The Guardian, StackOverflow, and GitHub, to make their infrastructures, products, and services more scalable.

Despite the increasing popularity of Elasticsearch, there are several common and critical mistakes that users tend to make while using the software. Let’s take a closer look at five of the mistakes and how you can avoid making them.

1. Not Defining Elasticsearch Mappings

Say that you start Elasticsearch, create an index, and feed it with JSON documents without incorporating schemas. Elasticsearch will then iterate over each indexed field of the JSON document, estimate its field type, and create a respective mapping. While this may seem ideal, Elasticsearch mappings are not always accurate. If, for example, the wrong field type is chosen, then indexing errors will pop up.

To fix this issue, you should define mappings, especially in production environments. It’s a best practice to index a few documents, let Elasticsearch guess the field types, and then grab the mapping it creates with GET /index_name/doc_type/_mapping. You can then take matters into your own hands and make any appropriate changes that you see fit without leaving anything up to chance.

For example, if you index your first document like this:
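For instance, a hypothetical document with a single “payload” field whose first value happens to look like a date:

```json
{
  "payload": "2016-01-20"
}
```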

Elasticsearch will mark the “payload” field as “date.”

Now, suppose that your next document looks like this:
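Something like this, for example:

```json
{
  "payload": "user login failed"
}
```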

Here, “payload” isn’t actually a date, so an error message may pop up and the new document will not be indexed because Elasticsearch has already marked the field as “date.”

2. Combinatorial Explosions

Combinatorial explosions are computing problems that can cause an exponential growth in bucket generation for certain aggregations and can lead to uncontrolled memory usage. In some aggregations, there is not enough memory in the world to support their combinatorial explosions.

The Elasticsearch “terms” field builds buckets according to your data, but it cannot predict how many buckets will be created in advance. This can be problematic for parent aggregations that are made up of more than one child aggregation. Combining the unique values in each child aggregation may cause a vast increase in the number of buckets that are created.

Let’s look at an example.

Say that you have a data set that represents a sports team. If you want to look at specifically the top 10 players and supporting players on that team, the aggregation will look like this:
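A sketch of such a nested terms aggregation (the players field and its values are hypothetical):

```json
{
  "aggs": {
    "top_players": {
      "terms": { "field": "players", "size": 10 },
      "aggs": {
        "supporting_players": {
          "terms": { "field": "players", "size": 5 }
        }
      }
    }
  }
}
```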

The aggregation will return a list of the top 10 players and a list of the top five supporting players for each top player — so that a total of 50 values will be returned. The created query will be able to consume a large amount of memory with minimal effort.

A terms aggregation can be visualized as a tree that uses buckets for every level. Therefore, a bucket for each top player in the players aggregation will make up the first level and a bucket for every supporting player in the other aggregation will make up the second level. Consequently, a single team will produce n² buckets. Imagine what would happen if you would have a dataset of 500 million documents.

Collection modes are used to help to control how child aggregations perform. The default collection mode of an aggregation is called depth-first and entails first the building of an entire tree and then trimming the edges. While depth-first is an appropriate collection mode for most aggregations, it would not work in the players aggregation example above. Therefore, Elasticsearch allows you to change collection modes in specific aggregations to something more appropriate.

Anomalies, such as the example above, should use the breadth-first collection mode, which builds and trims the tree one level at a time to control combinatorial explosions. This collection mode drastically helps to reduce the amount of memory that is consumed and keeps nodes stable:
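Switching the example above to breadth-first is a one-line change on the parent terms aggregation:

```json
{
  "aggs": {
    "top_players": {
      "terms": {
        "field": "players",
        "size": 10,
        "collect_mode": "breadth_first"
      },
      "aggs": {
        "supporting_players": {
          "terms": { "field": "players", "size": 5 }
        }
      }
    }
  }
}
```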

3. Production Flags

By default, the first cluster that Elasticsearch starts is called elasticsearch. If you are unsure about how to change a configuration, it’s best to stick to the default configuration. However, it is a good practice to rename your production cluster to prevent unwanted nodes from joining your cluster.

Below is an example of how you might want to rename your cluster and nodes:
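In elasticsearch.yml, for example (the names themselves are placeholders):

```yaml
cluster.name: logging-prod
node.name: logging-prod-data-01
```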

Recovery settings affect how nodes recover when clusters restart. Elasticsearch allows nodes that belong to the same cluster to join that cluster automatically whenever a recovery occurs. While some nodes within a cluster boot up quickly after recovery, however, others may take a bit longer at times (due to nodes receiving a restart command at different times, for example).

This difference in startup times can cause inconsistencies within the data that is meant to be evenly distributed among the nodes in the cluster. In particular, when large amounts of data are involved, rebalancing nodes after a restart can take quite a while — from several hours to a few days — and take a lot out of your budget.

Additionally, it is important to configure the number of nodes that will be in each cluster as well as the amount of time that it will take for them to boot up in Elasticsearch:
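A sketch of these recovery settings, assuming a 10-node cluster (tune the numbers to your own topology):

```yaml
gateway.recover_after_nodes: 8
gateway.expected_nodes: 10
gateway.recover_after_time: 5m
```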

With the right configurations in place, a recovery that would have taken hours or days to complete can be finished in a matter of seconds. Additionally, minimum_master_nodes is very important for cluster stability. It helps prevent split brain, which is the existence of two master nodes in a single cluster and can result in data loss.

The recommended value for this setting is (N/2) + 1 — where N is the number of master-eligible nodes. With that, if you have 10 regular nodes that can hold data and become masters, the value would be six. If you have three dedicated master nodes and 1,000 data nodes, the value would be two (only counting the potential masters):
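The setting itself is a single line in elasticsearch.yml; for the three-dedicated-masters example:

```yaml
discovery.zen.minimum_master_nodes: 2
```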

4. Capacity Provisioning

Provisioning can help to equip and optimize Elasticsearch for operational performance. It requires that Elasticsearch be designed in such a way that will keep nodes up, stop memory from growing out of control, and prevent unexpected actions from shutting down nodes.

“How much space do I need?” is a question that users often ask themselves. Unfortunately, there is no set formula, but certain steps can be taken to assist with the planning of resources.

First, simulate your actual use-case. Boot up your nodes, fill them with real documents, and push them until the shard breaks. Booting up and testing nodes can be quite easy with Amazon Web Services’ Elasticsearch offering (but it needs additional features to become a fully-functioning ELK Stack).

Still, be sure to keep in mind that the concept of “start big and scale down” can save you time and money when compared to the alternative of adding and configuring new nodes when your current amount is no longer enough. Once you define a shard’s capacity, you can easily apply it throughout your entire index. It is very important to understand resource utilization during the testing process because it allows you to reserve the proper amount of RAM for nodes, configure your JVM heap space, and optimize your overall testing process.

5. Oversized Template

Large templates are directly related to large mappings. In other words, if you create a large mapping for Elasticsearch, you will have issues with syncing it across your nodes, even if you apply it as an index template. The issues with big index templates are mainly practical — you might need to do a lot of manual work, with the developer as the single point of failure — but they can also relate to Elasticsearch itself. Remember: You will always need to update your template when you make changes to your data model.

Is there a better solution? Yes, dynamic templates.

Dynamic templates automatically add field mappings based on your predefined mappings for specific types and names. However, you should always try to keep your templates small in size.
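As an illustration, here is a dynamic template that maps every new string field as not_analyzed (Elasticsearch 2.x syntax; the mapping type name is hypothetical):

```json
{
  "mappings": {
    "logs": {
      "dynamic_templates": [
        {
          "strings_as_not_analyzed": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
```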

In Conclusion

Elasticsearch is a distributed full-text search and analytics engine that enables multiple tenants to search through their entire data sets, regardless of size, at unprecedented speeds. In addition to its full-text search capabilities, Elasticsearch doubles as an analytics system and distributed database. While these three capabilities are impressive on their own, Elasticsearch combines all of them to form a real-time search and analytics application that can keep up with customer needs.

What Does Logz.io Provide?

Elasticsearch is difficult to set up and maintain. Let Logz.io do it for you as part of our complete ELK cloud-based service.


How to Use AWS Elasticsearch for Log Management

By Tomer Levy


One of the latest Elasticsearch-based offerings is AWS-hosted Elasticsearch. The news is undoubtedly a reflection of the fact that the ELK software stack — of which Elasticsearch is a part — is increasingly being used by many organizations around the world.

So, for those who are looking to build an Elasticsearch or ELK cluster, I wanted to suggest ways to improve your Elasticsearch use based on your specific use-case:

  • The search engine use-case: Using Elasticsearch as part of an application stack as a powerful search engine. This is used most often in website searches on e-commerce sites or in similar situations.
  • The log analytics use-case: Using the full ELK Stack for log aggregation and search.

In the first use-case, AWS-hosted Elasticsearch makes a lot of sense. It provides you with direct Elasticsearch access that allows for specific configurations and customizations to fit your application’s exact needs. (Still, the one drawback is the legacy Elasticsearch version that Amazon Web Services currently supports — I hope that this will change in the future.)

Elasticsearch on its own, however, is not a log management solution. It is one of several components that are all needed to set up a log management solution.

At Logz.io, we have spent a lot of time working with customers who have tried to use AWS-hosted Elasticsearch. Together, we’ve compiled a list of tips and add-ons that will improve the Elasticsearch service for log analysis.


For those who are looking for a log analytics solution and plan to use AWS-hosted Elasticsearch service, here are our fourteen recommendations:

1. Queuing. Install a queuing system such as Redis, RabbitMQ, or Kafka. This is imperative to include in any ELK reference architecture because Logstash might overutilize Elasticsearch, which will then slow down Logstash until its small internal queue fills up and data is lost. In addition, without a queuing system it becomes almost impossible to upgrade the Elasticsearch cluster because there is no way to store data during critical cluster upgrades.

2. Logstash. The ELK Stack includes Logstash (the L in ELK), which reads data from the queuing system, parses the logs, and then creates JSON messages for Elasticsearch to index. Running Logstash in a consistent and scalable manner is not a small challenge. You’ll need to hook up at least one Logstash server to read data from the queuing system and push logs to AWS Elasticsearch. Also, you need to make sure that you correctly configure memory consumption for Logstash and verify GROK patterns to correctly parse the data. (See more in our Logstash tutorial and more on the pitfalls to avoid.) Logstash also plays a role as a log shipper; see the point on log shipping below for more details.
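A bare-bones sketch of such a pipeline, reading from Redis and writing to an AWS Elasticsearch endpoint (the host names and endpoint URL are placeholders, and depending on your access policy you may also need IP whitelisting or request signing):

```
input {
  redis {
    host => "redis.internal.example.com"
    data_type => "list"
    key => "logstash"
  }
}

output {
  elasticsearch {
    hosts => ["https://search-my-domain.us-east-1.es.amazonaws.com:443"]
  }
}
```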

3. Elasticsearch Scalability. This is easier said than done. The AWS offering is static and does not scale out automatically. Naive scalability (with scaling groups) would not work here because when the load on Elasticsearch increases, the last thing you want to do is add another node — it will increase the load significantly as Elasticsearch moves shards to the new nodes, often resulting in cluster failure. Unfortunately, there is no easy answer here. At Logz.io, we’ve invested years of developers’ time to solve this problem in our environment. What you’ll probably have to do is allocate additional resources, monitor clusters carefully, and manually increase capacity whenever you think “winter is coming.”

4. High Availability. If you’re running on top of AWS, you probably know that EC2 instances sometimes just stop working. It happens to us almost every day. If you want a system on which you can rely in production, you need to make sure the ELK implementation is highly available by setting up:

  • A highly-available queuing system, one that runs on two AZs with full replication
  • Logstash servers that read from the queuing system
  • An Elasticsearch cluster that will run across AZs and have three master nodes

5. Data Curation. You’ll need to implement a cron job that uses the Curator application to delete old indices (otherwise, you’re running against the clock until Elasticsearch crashes). It is also recommended to optimize older indices to improve the performance of Elasticsearch — just be careful not to run these processes during peak load times because they are resource heavy!
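
As a sketch of what that cron job might look like (the flags follow the Curator 3.x command line and can differ between Curator versions; the index prefix and retention period are placeholders):

# Delete logstash-* indices older than 30 days, every night at 02:00
0 2 * * * /usr/local/bin/curator --host localhost delete indices --prefix logstash- --older-than 30 --time-unit days --timestring '%Y.%m.%d'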

6. Mapping Conflicts. This is probably one of the more challenging items in this list. Mapping is like a “database schema” in Elasticsearch lingo. When Elasticsearch is used for log management, it usually relies on dynamic mapping, which builds the schema “on the fly” as new log types appear. However, this mechanism is highly susceptible to any deviation from the initially-created mapping. When you have a deviation — and trust me, you’ll have plenty of them! — you’ll need to go back to your log data, grok parsing, or application code itself and modify it to avoid future conflicts. If you don’t, these conflicting events will put load on Elasticsearch and then be dropped.

7. Security With Multi-User & Role-Based Access. As with any cloud system, you’ll want to collaborate with your team by assigning various users different access roles. That’s not possible with the open-source ELK stack. You can put an Nginx reverse proxy in front of a Kibana server, but that will give you only very simple access. Elastic offers Shield, a proprietary module that supports role-based access for Elasticsearch, but it is not supported in AWS-hosted Elasticsearch.

8. Dashboards and Visualizations. If you’re already a pro with ELK and have all of your data parsed and correctly visualized, you can skip this section. If you’re in the process of visualizing your data to understand it, you can use our ELK Apps library. You can download relevant dashboards and import them to your Kibana.

9. Log Shipping. This part actually happens within your datacenter or VPC. You’ll need to securely and effectively ship data to the queuing system. Logstash and Fluentd are good options (see our comparison of the two), and using rsyslog is also very common. For Windows, NxLog has a reasonably good reputation. Fine tuning each of these agents and correctly configuring them to ship data is imperative. You can have a look here for more information on the available shippers. If you’ll want to read logs from AWS Cloudtrail, ELB, S3, or other AWS repositories, you’ll need to implement a pull module (Logstash offers some) that can periodically go to S3 and pull data. AWS-hosted Elasticsearch does not offer out-of-the-box integration with these agents but you can read online and set them up independently.

10. Log Parsing. Parsing and correctly structuring logs is fundamental to the ability to search and visualize the data effectively. Even a small environment can include 20 or 30 different log types. Parsing is done with Logstash, which transforms text lines into structured JSON objects, but manually building grok expressions to parse log data is a tedious and error-prone process, and it is not offered by the AWS Elasticsearch service out of the box. Many grok expressions can be found online (GitHub hosts repositories of patterns that other people have already developed), but careful compatibility testing against your specific log formats and versions is still required.
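
For example, a minimal grok filter for NGINX or Apache access logs in the default combined format can lean on one of Logstash's built-in patterns (a sketch, assuming the raw line arrives in the "message" field):

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}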

11. Alerts. The open-source ELK Stack does not provide an alerting capability. Instead, you can develop a cron job that automatically runs queries and generates e-mails based on the search results. A lot of bits and pieces need to be added to make this effective, such as a sliding window and statefulness, but it still beats hard-refreshing dashboards and re-running searches manually. Elastic offers a premium service called Watcher, but it is not available on AWS-hosted Elasticsearch.

12. S3 Archiving. The ability to retain full access to logs for long periods of time such as one year — or even seven years — requires you to build a system that can archive all log events and then ingest them back when needed. You can either extend the cluster to support one year of log data or archive the events from Elasticsearch into a static repository such as S3 or Glacier. Data retention has a critical effect on the memory, CPU, and disk used by Elasticsearch, so it is highly recommended that such mechanisms be developed to allow for longer retention periods.

13. ELK Monitoring. Monitoring each component of a stack is important in any critical system. Nagios has some plugins to monitor Elasticsearch, and you’ll need to make sure that you correctly monitor the queue size of your queuing system and the health of your Logstash and Kibana components.

14. Cost. Since auto-scaling is not supported in AWS-hosted Elasticsearch, what many people need to do is over-allocate resources — sometimes by as much as ten times the day-to-day normal usage — in order to create a sustainable system. Elasticsearch can get very expensive very quickly, especially as a cluster grows.

The AWS-hosted Elasticsearch is a great sign of the dominance of Elasticsearch and the ELK Stack, but as was noted in a post on The New Stack, it is far from being an out-of-the-box solution for log analysis.

What Does Logz.io Provide?

Don’t want to deal with the mess of turning AWS Elasticsearch into a log analytics platform? Use the Logz.io complete ELK cloud-based service.


5 Logstash Pitfalls That You Need to Avoid

By Tomer Levy

Logstash is very easy to start using out-of-the-box. You simply download it, run it and start working. While you don’t need to be an expert from the get-go, when you delve deeper into configurations, certain complexities may surface.

At Logz.io, our users use Logstash extensively. As a result of the great deal of time we’ve spent configuring and running Logstash, we wanted to explore and share the top five pitfalls that we’ve experienced, as well as some corresponding solutions and tips.

A Bit about Logstash

Logstash is a system that receives, processes, and outputs logs in a structured format. By sending it a string of information, you receive a structured and enriched JSON representation of the data. One of Logstash’s main uses is to index documents in data stores that require structured information, most commonly Elasticsearch. For example, if you send the string “Hello world” to Logstash, you will receive a JSON output. By default, this structured output of key-value pairs will include the message, “Hello world”, a timestamp of when the message was received, the host name of the message’s source, and a version.
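
To illustrate, an event printed with the rubydebug codec would look roughly like this (the timestamp and host values are made up):

{
       "message" => "Hello world",
      "@version" => "1",
    "@timestamp" => "2016-02-01T10:15:00.000Z",
          "host" => "my-server"
}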

Five Logstash Pitfalls, Tips, and Possible Solutions

Although Logstash is great, no product is flawless. Below are the top five pitfalls that we’ve encountered in our journey working with Logstash users.

1. Key-Value Filter (KV Plugin)

Key-value is a filter plug-in that extracts keys and values from a single log line and uses them to create new fields in the structured data format. For example, let’s say a log line contains “x=5”. If you pass that through a key-value filter, it will create a new field in the output JSON where the key is “x” and the value is “5”.

By default, the key-value filter will extract every key=value pattern in the source field. However, the downside is that you don’t have control over the keys and values that are created when you let it work automatically, out-of-the-box with the default configuration. It may create many keys and values with an undesired structure, and even malformed keys that make the output unpredictable. If this happens, Elasticsearch may fail to index the resulting document, or you may end up indexing irrelevant information.

Our Solution:

In order to get the most out of this plug-in, it is important to specify which keys should be extracted. This can be done by adding the “include_keys” parameter to the configuration. In the sketch below, we’ve added “name”, “type”, and “count”. The plug-in will therefore only extract the name, type, and count keys, as long as they appear in the right format (e.g., name=x).
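
filter {
  kv {
    # Only these keys are extracted; everything else in the message is ignored
    include_keys => ["name", "type", "count"]
  }
}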

2. Memory Consumption

Logstash runs on JVM and consumes a hefty amount of resources to do so. Many discussions have been floating around regarding Logstash’s significant memory consumption. Obviously this can be a great challenge when you want to send logs from a small machine (such as AWS micro instances) without harming application performance.

Our Tip:

In order to save resources, you can use the Logstash Forwarder (previously known as Lumberjack), which is a lighter version of Logstash that includes the minimum number of plug-ins. The forwarder uses the Lumberjack protocol, enabling you to securely ship compressed logs while reducing resource consumption and bandwidth. The sole input is a file (or files), while the output can be directed to multiple destinations.

Other options for sending logs exist as well. You can use rsyslog on Linux machines, and there are other agents for Windows machines, such as nxlog and syslog-ng.

3. Multiple Configuration Files

When you begin working with Logstash, you tend to start with a small configuration file that grows over time. As a result, the file becomes difficult to maintain, read and understand.

Our Tip:

Did you know that you can separate your large configuration file into several different smaller files? Instead of supplying a path to a configuration file, you can set the path to the configuration folder that contains multiple configuration files. For example, you can have one file that contains the output/input transport plug-ins and have other files that contain filters. The files are merged by name, alphabetically, so it is important to name them according to how you’d like them to be ordered.
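
For example, the layout might look like this (file names and paths are illustrative), with Logstash pointed at the folder rather than a single file:

# Files in the folder are read in alphabetical order
/etc/logstash/conf.d/10-inputs.conf    # input plug-ins
/etc/logstash/conf.d/20-filters.conf   # grok, mutate, and other filters
/etc/logstash/conf.d/30-outputs.conf   # output plug-ins

# Point Logstash at the folder instead of a single file
bin/logstash -f /etc/logstash/conf.d/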

4. The Multi-Line Plug-In

Sometimes, an event message is spread across a few log lines. For example, let’s say that Java exception takes up 10 lines in a log file. When looking at the event via Elasticsearch, it’s better to be able to view all 10 lines as a single event. The Multi-Line plug-in can join multiple log lines together. Simply specify the desired pattern, and the plug-in will be able to identify which lines should be joined together accordingly.
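
A sketch of such a configuration for Java stack traces (the pattern is illustrative and depends on your log format):

filter {
  multiline {
    # Any line that does not start with a timestamp is glued to the previous line
    pattern => "^%{TIMESTAMP_ISO8601}"
    negate => true
    what => "previous"
  }
}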

Pitfall #1

In general, Logstash is multi-threaded based on the plug-ins you use. Surprisingly, not all of Logstash’s plug-ins are built to run in parallel. For example, the Multi-Line plug-in is not thread-safe. If you configure Logstash to run multiple filter threads, there is a good chance that the Multi-Line filter will break and may cause Logstash to crash.

Pitfall #2

When sending multiple logs with TCP, generally speaking, TCP will break them up log by log, sending one after the other in separate packets in a stream. However, TCP might place two logs in the same packet in a stream. Multi-Line doesn’t know how to handle this since it expects each message to come in a separate packet.

There is no single tip for dealing with this correctly. Usually when you use plug-ins in Logstash, you don’t need to think about whether or not they are thread safe or work in TCP. However, while you may think everything is working correctly with Multi-Line, you may find out later that it’s not. Be sure to use it correctly.

5. Varying Syntax between Plug-Ins

There are a few common things you want to do with Logstash. For example, since it creates a structured document with fields and values, it is common to add and remove fields and tags. Most of the plug-ins allow you to perform these types of global operations. However, this can be problematic because plug-ins have different syntax. Therefore, the configuration that you use to add a field in one plug-in may not work in another.


For example, adding tags to an event in the tcp or file input is done with the “tags” setting (the sketch below assumes a tcp input on an arbitrary port):
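
input {
  tcp {
    port => 5000                # illustrative port
    tags => ["my-application"]  # inputs use the "tags" setting
  }
}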

But in the mutate filter, adding tags is configured with “add_tag”:
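
filter {
  mutate {
    add_tag => ["my-application"]   # mutate uses "add_tag" instead
  }
}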

Our Tip

Since you don’t know whether one plug-in’s configuration will work in another plug-in, be sure to test the configuration before you run it. You can test the configuration by running Logstash with the --configtest command line parameter. This doesn’t actually run Logstash, but it does validate the configuration.
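
For example (the paths below depend on how and where you installed Logstash):

/opt/logstash/bin/logstash --configtest -f /etc/logstash/conf.d/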

Bonus Tip: Ruby Debug Performance

It is very useful to print incoming and outgoing messages because doing so makes the system easier to debug. The issue is that it’s easy to forget about these printouts, and leaving them in place can result in excessive resource consumption and increased latency.

Our Tip

When you move to production, it is obviously important to remove the stdout plug-ins. Debug output should be off in production, or else you run the risk of slowing down the environment.

What Does Logz.io Provide?

Logstash is a great tool that has created a lot of ease in centralizing logs for DevOps operations. However, while open source has its advantages, it also has its disadvantages. Logz.io offers Logstash as a service as part of our full cloud-based ELK platform.


A Comparison of Fluentd and Logstash

By Noni Peri

logstash and fluentd

The unsung heroes of log analysis are the log collectors. They are the hard-working daemons that run on servers to pull server metrics, parse log files, and transport them to backend systems such as Elasticsearch and PostgreSQL. While visualization tools such as Kibana and re:dash bask in the glory, the log collectors ensure that all logs are routed to the correct locations in the first place.

In the open source world, the two most-popular data collectors are Logstash and Fluentd. Logstash is most known for being part of the ELK Stack while Fluentd has become increasingly used by communities of users of software such as Docker, GCP, and Elasticsearch.

In this article, we aim to give a no-frills comparison of Logstash, which is owned by Elastic, and Fluentd, which is owned by Treasure Data. The goal is to collect all of the facts about these excellent software platforms in one place so that readers can make informed decisions for their next projects.

We at Logz.io support both Logstash and Fluentd, and we see a growing number of customers leveraging Fluentd to ship logs to us. As a result, it was important for us to make this comparison. Here, we have compiled a summary chart of the differences between Logstash and Fluentd, and then we go into more detail below.

fluentd logstash comparison

Platform Comparison

For a long time, one of the advantages of Logstash was that it is written in JRuby, and hence it ran on Windows. Fluentd, on the other hand, did not support Windows until recently due to its dependency on a *NIX platform-centric event library. Not anymore. As of this pull request, Fluentd now supports Windows.

Logstash: Mac and Windows
Fluentd: Mac and Windows

Event Routing Comparison

One of the key features of log collectors is event routing. Both log collectors support routing, but their approaches are different.

Logstash routes all data into a single stream and then uses algorithmic if-then statements to send them to the right destination. Here is an example that sends error events in production to PagerDuty:
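
A sketch of such a configuration (the field names and the PagerDuty service key are placeholders, and the pagerduty output plug-in must be installed):

output {
  if [environment] == "production" and [loglevel] == "ERROR" {
    pagerduty {
      service_key => "YOUR_PAGERDUTY_SERVICE_KEY"   # placeholder
    }
  }
}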

Fluentd relies on tags to route events. Each Fluentd event has a tag that tells Fluentd where it wants to be routed. For example, if you are sending error events in production to PagerDuty, the configuration would look something like this:
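
A rough sketch, assuming the fluent-plugin-pagerduty output plug-in (the exact parameter names depend on the plugin version):

<match production.error.**>
  @type pagerduty
  # placeholder key -- parameter names depend on the plugin version
  service_key YOUR_PAGERDUTY_SERVICE_KEY
</match>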

Fluentd’s approach is more declarative whereas Logstash’s method is procedural. For programmers trained in procedural programming, Logstash’s configuration can be easier to get started with. On the other hand, Fluentd’s tag-based routing allows complex routing to be expressed cleanly. For example, the following configuration applies different logic to all production and development events based on tag prefixes.
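
For instance (the output plug-ins here are placeholders for whatever logic you want to apply):

# Production events go to Elasticsearch
<match production.**>
  @type elasticsearch
</match>

# Development events are just printed for debugging
<match development.**>
  @type stdout
</match>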

Logstash: Uses algorithmic statements to route events and is good for procedural programmers
Fluentd: Uses tags to route events and is better at complex routing

Plugin Ecosystem Comparison

Both Logstash and Fluentd have rich plugin ecosystems covering many input systems (file and TCP/UDP), filters (mutating data and filtering by fields), and output destinations (Elasticsearch, AWS, GCP, and Treasure Data).

One key difference is how plugins are managed. Logstash manages all of its plugins under a single GitHub repo. While users may write and use their own, there seems to be a concerted effort to collect them in one place. As of this writing, there are 199 plugins under the logstash-plugins GitHub repo.

Fluentd, on the other hand, adopts a more decentralized approach. Although there are 516 plugins, the official repository only hosts 10 of them. In fact, among the top 5 most popular plugins (fluent-plugin-record-transformer, fluent-plugin-forest, fluent-plugin-secure-forward, fluent-plugin-elasticsearch, and fluent-plugin-s3), only one is in the official repository!

Logstash: Centralized plugin repository
Fluentd: Decentralized plugin repository

Transport Comparison

Logstash lacks a persistent internal message queue: it currently has an in-memory queue that holds a fixed 20 events and relies on an external queue like Redis for persistence across restarts. This is a known limitation, and the Logstash team is actively working on persisting the queue on disk.

Fluentd, on the other hand, has a highly configurable buffering system. It can be either in-memory or on-disk, with more parameters than you would ever care to know.

The upside of Logstash’s approach is simplicity: the mental model for its sized queue is very simple. However, you must deploy Redis alongside Logstash for improved reliability in production. Fluentd has built-in reliability, but its configuration parameters take some getting used to.

Logstash: Needs to be deployed with Redis to ensure reliability
Fluentd: Built-in reliability, but its configuration is more complicated

Performance Comparison

This is a nebulous topic. As discussed in this talk at OpenStack Summit 2015, both perform well in most use cases and consistently grok through 10,000+ events per second.

That said, Logstash is known to consume more memory, at around 120MB compared to Fluentd’s 40MB. For modern machines, this is hardly a meaningful difference between the two aggregators. For leaf machines, it’s a different story: spread across 1,000 servers, this can mean roughly 80GB of additional memory use, which is significant. (This hypothetical number comes from the 80MB difference between Logstash and Fluentd on a single machine multiplied by 1,000 machines.)

Don’t worry, Logstash has a solution. Instead of running the fully featured Logstash on leaf nodes, Elastic recommends that you run Elastic Beats, resource-efficient, purpose-built log shippers. Each Beat focuses on one data source only and does that well. On Fluentd’s end, there is Fluent Bit, an embeddable low-footprint version of Fluentd written in C, as well as Fluentd Forwarder, a stripped down version of Fluentd written in Go.

Logstash: Slightly more memory use. Use Elastic Beats for leaf machines.
Fluentd: Slightly less memory use. Use Fluent Bit and Fluentd Forwarder for leaf machines.

What Does Logz.io Provide?

While there are several differences, the similarities between Logstash and Fluentd are greater than their differences. Users of either Logstash or Fluentd are miles ahead of the curve when it comes to log management. The Logz.io cloud-based ELK log management platform supports both Fluentd and Logstash.


A Guide to Logstash Plugins

By Ofer Velich

logstash plugins

Log monitoring and management is one of the most important functions in DevOps, and the open-source software Logstash is one of the most common platforms that are used for this purpose.

Often used as part of the ELK Stack, Logstash version 2.1.0 now has shutdown improvements and the ability to install plugins offline. Here are just a few of the reasons why Logstash is so popular:

  • Logstash is able to do complex parsing with a processing pipeline that consists of three stages: inputs, filters, and outputs
  • Each stage in the pipeline has a pluggable architecture that uses a configuration file that can specify what plugins should be used at each stage, in which order, and with what settings
  • Users can reference event fields in a configuration and use conditionals to process events when they meet certain, desired criteria
  • Since it is open source, you can change it, build it, and run it in your own environment

For more information on using Logstash, see this Logstash tutorial, this comparison of Fluentd vs. Logstash, and this blog post that goes through some of the mistakes that we have made in our own environment (and then shows how to avoid them). However, these issues are minimal — Logstash is something that we recommend and use in our environment.

In fact, many Logstash problems can be solved or even prevented with the use of plugins that are available as self-contained packages called gems and hosted on RubyGems. Here are several that you might want to try in your environment.

logstash input

Logstash Input Plugins

Input plugins get events into Logstash and share common configuration options such as:

  • type — adds a “type” field to events, which can be used to filter them further down the pipeline
  • tags — adds any number of arbitrary tags to your event
  • codec — the name of the Logstash codec used to represent the data

File

This plugin streams events from a file by tracking changes to the monitored files and pulling the new content as it’s appended, and it keeps track of the current position in each file by recording it. The input also detects and handles file rotation.

You can configure numerous options, including the path, codec, read start position, and line delimiter. Usually, the more plugins you use, the more resources Logstash may consume.
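
A minimal sketch (the path and type are illustrative):

input {
  file {
    path => ["/var/log/nginx/access.log"]
    type => "nginx-access"
    start_position => "beginning"   # read the file from its start on the first run
  }
}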

Lumberjack

This plugin receives events using the Lumberjack protocol, which is secure while having low latency, low resource usage, and high reliability. It uses the logstash-forwarder client as its data source, so it is very fast and much lighter than Logstash. All events are encrypted because the plugin input and forwarder client use an SSL certificate that needs to be defined in the plugin.

Here is the required configuration (the port and certificate paths below are illustrative):
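
input {
  lumberjack {
    port => 5043                                                     # illustrative port
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"   # illustrative path
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"         # illustrative path
  }
}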

 

Beats

Filebeat is a lightweight, resource-friendly tool that is written in Go, collects logs from files on servers, and forwards them to other machines for processing. The tool uses the Beats protocol to communicate with a centralized Logstash instance. You can also use an optional SSL certificate to send events to Logstash securely.

The required configuration (5044 is the conventional Beats port, but any free port will do):
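
input {
  beats {
    port => 5044
  }
}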

TCP

This plugin reads events over a TCP socket. Each event is assumed to be one line of text. The optional SSL certificate is also available. In the codec, the default value is “line.”

The required configuration (the port is illustrative):
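
input {
  tcp {
    port => 5000
    codec => "line"   # "line" is the default codec, shown here for clarity
  }
}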

Filter Plugins

This is an optional stage in the pipeline during which you can use filter plugins to modify and manipulate events. Within the filter (and output) plugins, you can use:

  • Field references — The syntax to access a field is [fieldname]. To refer to a nested field, use [top-level field][nested field]
  • sprintf format — This format enables you to embed field values inside a string. The syntax is “%{[fieldname]}”

The full power of conditional statements is also available. For example (the type and tag are illustrative):
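
filter {
  if [type] == "nginx-access" {
    mutate {
      add_tag => ["web"]
    }
  }
}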

Grok

This plugin is the “bread and butter” of Logstash filters and is used ubiquitously to derive structure out of unstructured data. It helps you to define a search and extract parts of your log line into structured fields. Roughly 120 integrated patterns are available.

Grok works by combining text patterns into something that matches your logs. The plugin sits on top of regular expressions, so any regular expression is valid in grok. You can define your own custom patterns in this manner (the pattern and field names below are illustrative):
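
# ./patterns/custom -- a file containing one custom pattern per line, for example:
# ORDER_ID ORD-[0-9]{6}

filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "order %{ORDER_ID:order_id} took %{NUMBER:duration}ms" }
  }
}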

Mutate

The mutate filter allows you to perform general mutations on fields. You can rename, remove, replace, and modify fields in your events. For example (the field names below are illustrative):
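
filter {
  mutate {
    rename => { "HOSTORIP" => "client_ip" }
    remove_field => ["password"]
    add_tag => ["sanitized"]
  }
}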

GEOIP

This plugin looks up IP addresses, derives geographic location information from the addresses, and adds that location information to logs.

The configuration options:

  • Source — The field containing the IP address; this is a required setting
  • Target — The field into which Logstash should store the geoip data

If you save the data to a target field other than geoip and want to use the geo_point-related functions in Elasticsearch, you need to alter the template provided with the Elasticsearch output and configure the output to use the new template.
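
The template change itself depends on your Elasticsearch version, but a basic geoip filter that keeps the default target looks roughly like this (the source field name is illustrative):

filter {
  geoip {
    source => "clientip"   # the field that holds the IP address
    target => "geoip"      # the default target field
  }
}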

Multiline

This plugin will collapse multiline messages from a single source into one logstash event.

The configuration options:

  • Pattern — This required setting is a regular expression that matches a pattern indicating that the line is part of an event consisting of multiple lines of log data
  • What — This can take one of two values (previous or next) and indicates to which (multiline) event the current line belongs

For example, the classic whitespace-continuation configuration:
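
filter {
  multiline {
    pattern => "^\s"
    what => "previous"
  }
}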

This means that any line starting with whitespace belongs to the previous line.

Important note: This filter will not work with multiple worker threads.

KV

This plugin helps to parse messages automatically and break them down into key-value pairs. By default, it will try to parse the message field and look for an ‘=’ delimiter. You can configure it to split the data on any arbitrary string and to parse any event field.

The configuration options (the delimiters below are illustrative):
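
filter {
  kv {
    source => "message"   # the field to parse (this is the default)
    field_split => "&"    # split pairs on ampersands, e.g. URL query strings
    value_split => "="    # split keys from values on the equals sign
  }
}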

This powerful parsing mechanism should not be used without a limit because the production of an unlimited number of fields can hurt your efforts to index your data in Elasticsearch later.

Date

The date plugin is used for parsing dates from fields and then using that date as the logstash @timestamp for the event. It is one of the most important filters that you can use — especially if you use Elasticsearch to store and Kibana to visualize your logs — because Elasticsearch will automatically detect and map that field with the listed type of timestamp.

This plugin ensures that your log events will carry the correct timestamp and not a timestamp based on the first time Logstash sees an event.

The configuration options:

  • Match — You can specify an array of a field name followed by date-format patterns, which helps to support fields that have multiple time formats. The allowed date formats are defined by the Java Joda-Time library.

One example, matching the timestamp format used in Apache and NGINX access logs (the source field name is illustrative):
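
filter {
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}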

Logstash Codecs

Codecs can be used in both inputs and outputs. Input codecs provide a convenient way to decode your data before it enters the input. Output codecs provide a convenient way to encode your data before it leaves the output. Some common codecs:

  • The default “plain” codec is for plain text with no delimitation between events
  • The “json” codec decodes JSON events in inputs and encodes events as JSON in outputs — note that it will revert to plain text if the received payload is not in a valid JSON format
  • The “json_lines” codec lets you receive and decode JSON events delimited by \n in inputs or encode events as JSON messages delimited by \n in outputs
  • The “rubydebug” codec, which is very useful in debugging, allows you to output Logstash events as Ruby data objects

Logstash Output Plugins

An output plugin sends event data to a particular destination. Outputs are the final stage in the event pipeline.

Redis

The Redis output plugin is used to send events to Redis using an RPUSH command. Redis is a key-value data store that can serve as a buffer layer in your data pipeline. Usually, you will use Redis as a message queue for Logstash shipping instances, which ingest data and store it in the queue.

The configuration options (the host and key are illustrative):
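
output {
  redis {
    host => "localhost"
    data_type => "list"
    key => "logstash"
  }
}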

Kafka

Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable. We at Logz.io use Kafka as a message queue for all of our incoming message inputs, including those from Logstash.

Usually, you will use Kafka as a message queue for your Logstash shipping instances, which ingest data and store it in the queue. The Kafka output plugin writes events to a Kafka topic and uses the Kafka Producer API to write messages.

The only required configuration is the topic name (the name below is a placeholder, and the option name may differ between plugin versions):
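
output {
  kafka {
    topic_id => "logstash_logs"
  }
}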

Stdout

This is a simple output that prints events to the stdout of the shell running Logstash. This output can be quite convenient when debugging plugin configurations.

A Final Note

Which Logstash plugins do you like to use when you monitor and manage your log data in your own environments? I invite your additions and thoughts in the comments below.

What Does Logz.io Provide?

Sick of dealing with all of the plugins that Logstash needs to function as an effective log management platform? The Logz.io cloud-based ELK platform as a service already has them.


How to Deploy the ELK Stack in Production

By Tomer Levy

Log management has become a must-do activity for every organization that wants to resolve problems and ensure that its applications are running in a healthy manner. As such, log management has become a mission-critical system.

graphs kibana 4

When you’re troubleshooting a production issue or trying to identify a security hazard, the system must be up and running around the clock. Otherwise, you won’t be able to troubleshoot or resolve issues as they arise — potentially resulting in performance degradation, downtime, or a security breach. Log analytics that runs continuously can equip you with the means to track and locate the specific issues that are wreaking havoc on your system.

ELK (a.k.a., Elasticsearch, Logstash, and Kibana) is one of the largest-growing, open-source log management platforms — plus, it comes with a vibrant community! ELK’s rapid growth is demonstrated by its adoption by huge companies such as Netflix and Bloomberg. In this article, I will use our experiences in building Logz.io to introduce the challenges as well as offer some guidelines in building a production-grade ELK deployment.

checklist

Overall, ELK implementation needs to:

  1. Save and index all of the log files that it receives (sounds obvious, right?)
  2. Operate when the production system is overloaded or even failing (because that’s when most issues occur, isn’t it?)
  3. Keep the log data protected from unauthorized access
  4. Have maintainable approaches to data retention policies, upgrades, and more

Don’t Lose Log Data

If you are troubleshooting an issue and go over a set of events, it only takes missing one log line to get incorrect results. Every log event must be captured. For example, you’re viewing a set of events in MySQL that ends with a database exception. If you lose one of the events, it might be impossible to pinpoint the cause of the problem.

Use Redis

redis logo

Place a Redis server in front of your Logstash machine to act as the entry point for all log events that are shipped to your system. It will then buffer the data until the downstream components have enough resources to index it. Elasticsearch is the engine at the heart of ELK. It is very susceptible to load, which means you need to be extremely careful when indexing and increasing your number of documents. When Elasticsearch is busy, Logstash works slower than normal — which is where Redis comes in, accumulating more documents that can then be pushed to Elasticsearch. This is critical for not losing log events.

See the suggested architecture below:

suggested architecture

 

Keep Track of Logstash/Elasticsearch Exceptions

Logstash may fail when it tries to index logs to Elasticsearch that cannot fit into the automatically-generated mapping. For example, let’s say you have a log entry that looks like this (the field and values are illustrative):
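
{ "response_time": 100 }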

And later, the system generates a similar log:
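
{ "response_time": "100ms" }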

In the first case, a number is used. In the second case, a string is used. As a result, Elasticsearch will NOT index the document — it will just return a failure message and the log will be dropped. To make sure that such logs are still indexed, you need to:

  1. Use the “Type” field for each type of log. “Type” is somewhat similar to an SQL table. For a given “type,” a field can have one schema (integer OR string).
  2. Work with developers to make sure they’re keeping log formats consistent. If a log schema change is required, just change the “Type” field according to the type of log.
  3. Ensure that Logstash is consistently fed with information, and monitor Elasticsearch exceptions to ensure that logs are not shipped in the wrong formats. Using a fixed, less dynamic mapping is probably the only solid solution here (one that doesn’t require you to start coding).

At Logz.io, we solve this problem by building a pipeline to handle mapping exceptions that eventually indexes these documents in manners that don’t collide with existing mapping.

Keep Up with Growth and Bursts

data servers

Machines pile up, data grows, and log files follow suit. As you scale out with more products, applications, features, developers, and operations, you also accumulate more logs. This requires a certain amount of compute resource and storage capacity so that your system can process all of them. In general, log management solutions consume large amounts of CPU, memory, and storage. Log systems are bursty by nature, and sporadic bursts are typical. If a file is purged from your database, the frequency of logs that you receive may jump from 100 or 200 per second to 100,000 per second. As a result, you need to allocate up to 10 times more capacity than normal. When there is a real production issue, many systems generally report failures or disconnections, which causes them to generate many more logs. This is actually when log management systems are needed more than ever.

ELK Elasticity

One of the biggest challenges of building an ELK deployment, then, is making it scalable. Let’s say you have an e-commerce site and experience an increasing number of incoming log files during a particular time of year. To ensure that this influx of log data does not become a bottleneck, you need to make sure that your environment can scale with ease. This requires that you scale on all fronts — from Redis to Logstash and Elasticsearch — which is challenging in multiple ways. Regardless of where you’re deploying your ELK platform — be it on Amazon Web Services, in the Google Cloud, or in your own datacenter — we recommend having a cluster of Elasticsearch nodes that run in different availability zones, or in different segments of a datacenter, to ensure high availability.

Let’s discuss it one component at a time.

Redis

You probably want to run more than one Redis server, an action that requires some work. You can either build it on your own or run Redis Labs as a service. The only challenge is dividing inbound log traffic (UDP/TCP) across multiple Redis servers. For this, we use AWS Route 53 and will share more about it in the future.

Logstash

To read and push to Elasticsearch, it’s best to use a Logstash instance for each Redis server. That’s easy. Each one of your Logstash instances should run in a different AZ (on AWS). You should also separate Logstash and Elasticsearch by using different machines for them. This is critical because they both run as JVMs and consume large amounts of memory, which makes them unable to run on the same machine effectively. I also recommend allocating half of the memory on each machine to Elasticsearch or Logstash.

Cluster Elasticsearch

Elasticsearch is composed of a number of different components, two of which are the most important: the master nodes and the data nodes. The master nodes are responsible for cluster management while the data nodes, as the name suggests, are in charge of the data. We recommend clustering Elasticsearch with at least three master nodes because of the common occurrence of split brain, which is essentially a dispute between two nodes regarding which one is actually the master. As a result, using three master nodes prevents split brain from happening. As far as the data nodes go, we recommend having at least two data nodes so that your data is replicated at least once. This results in a minimum of five nodes: the three master nodes can be small machines, and the two data nodes need to be scaled on solid machines with very fast storage and a large capacity for memory.

A quick note about AWS: Since multicast doesn’t work with AWS, Elasticsearch’s cluster discovery protocol will not work. You need to install the Elasticsearch AWS plugin or configure unicast cluster discovery.
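
A minimal sketch of the relevant elasticsearch.yml settings for the three dedicated master nodes described above (the IP addresses are placeholders; the settings apply to the Elasticsearch 1.x/2.x discovery module, with EC2 discovery available via the cloud-aws plugin):

# Dedicated master nodes: eligible as master, holding no data
node.master: true
node.data: false

# Quorum of master-eligible nodes (3 masters / 2 + 1) to help prevent split brain
discovery.zen.minimum_master_nodes: 2

# Multicast does not work on AWS -- use unicast (or the EC2 discovery plugin) instead
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]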

Run in Different AZs (But Not in Different Regions)

We recommend having your Elasticsearch nodes run in different availability zones or in different segments of a datacenter to ensure high availability. This can be done through the Elasticsearch setting that allows you to configure every log to be replicated between different AZs.

While it may seem odd, we don’t recommend running Elasticsearch clusters across AWS regions. Due to the increased latency when synchronizing data between regions, it doesn’t work well, at least according to the tests we have done.

Here is a suggested architecture to run ELK on multiple AZs or multiple separated data centers:

suggested architecture two

 

Protect the Environment

data security

Access Control

Because logs may contain sensitive data, it is crucial to control who can see what. How can you limit access to specific dashboards, visualizations, or data inside your log analytics platform? As far as we know, there is no simple way to do this in the ELK Stack. One option is to put an nginx reverse proxy in front of your Kibana server, which entails a simple nginx configuration that requires anyone who wants to access the dashboard to enter a username and password. This quickly blocks open access to your Kibana console.
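
A minimal sketch of such an nginx configuration (the server name, Kibana port, and htpasswd file path are placeholders):

server {
    listen 80;
    server_name kibana.example.com;

    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        proxy_pass http://localhost:5601;   # Kibana's default port
    }
}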

The challenge here arises if you would like to limit access on a more granular level.  This is currently impossible within open source ELK. The only thing you can do is use Elasticsearch Shield and build the security from Elasticsearch up the stack. At Logz.io, we take a different approach that allows for role-based access.

Last but not least, be careful when exposing Elasticsearch because it is very susceptible to attacks. We recommend blocking all inbound access from the internet to Elasticsearch’s port 9200.

Maintainability

Log Data Consistency

Logstash has a component that parses logs and tokenizes them in accordance with a set of rules. So, if you have an nginx access log, you want the ability to view each field and to build visualizations and dashboards based on specific fields. You need to apply the relevant parsing abilities to Logstash — which has proven to be quite a challenge, particularly when it comes to building groks, debugging them, and actually parsing logs to produce the relevant fields for Elasticsearch and Kibana.

At the end of the day, it is very easy to make mistakes using Logstash, which is why you should carefully test and maintain all of your log configurations by means of version control. While you may get started with just nginx and MySQL logs, as you grow you may incorporate custom applications that result in large and hard-to-manage log configurations. The community has generated a lot of solutions around this topic, but trial and error is extremely important with open source tools before using them in production.

Data Retention

Another aspect of maintainability comes into play with excess indices. Depending on how long you want to retain data, you need to have a process set up that will automatically delete old indices — otherwise, you will be left with too much data and your Elasticsearch will crash, resulting in data loss. To prevent this from happening, you can use Elasticsearch Curator to delete indices. We recommend having a cron job that automatically spawns Curator with the relevant parameters to delete any old indices, ensuring you don’t end up holding too much data. It is commonly required to save logs to S3 in a bucket for compliance, so you want to be sure to have a copy of the logs in their original format. Copying should be done before logs are parsed by Logstash.

Upgrades

Performing Elasticsearch upgrades can be an endeavor on its own. First and foremost, you don’t want to lose any data in the lengthy process. New versions are released every few weeks, so pay careful attention to the upgrade process if you choose to go that route. Make sure you have at least one replication on Elasticsearch and perform warm upgrades (sync data and upgrade one node at a time).

Logstash upgrades are generally easier, but pay close attention to the compatibility between Logstash and Elasticsearch.

Kibana upgrades are hard, but possible. Kibana 3 does not have an upgrade path to Kibana 4. We’ve just met a user who has 600 different Kibana 3 dashboards and now has to rewrite them to use Kibana 4. I guess that’s one of the drawbacks of free open-source solutions.


Summary

Getting started with ELK to process logs from a server or two is easy and fun. Like any other production system, it takes much more work to reach a solid production deployment. We know this because we’ve been working with many users who struggle with making ELK operational in production.

What Does Logz.io Provide?

Don’t want to deploy and manage an entire ELK Stack on your own? The Logz.io cloud-based ELK platform as a service might just be the thing for you.


How to Install the ELK Stack on AWS: A Step-By-Step Guide

By Asaf Yigal

elk stack amazon web services

ELK is a great open-source stack for log aggregation and analytics. It stands for Elasticsearch (a NoSQL database and search server), Logstash (a log shipping and parsing service), and Kibana (a web interface that connects users with the Elasticsearch database and enables visualization and search options for system operation users). With a large open-source community, ELK has become quite popular, and it is a pleasure to work with.

In this article, we will guide you through the simple ELK installation process on Amazon Web Services.

The following instructions will lead you through the steps involved in creating a working sandbox environment. Due to the fact that a production setup is more comprehensive, we decided to elaborate on how each component configuration should be changed to prepare for use in a production environment.

We’ll start by describing the environment, then we’ll walk through how each component is installed, and finish by configuring our sandbox server to send its system logs to Logstash and view them via Kibana.

Note: All of the ELK components need Java to work, so we will have to install a Java Development Kit (JDK) first.

The AWS Environment

We ran this tutorial on a single AWS Ubuntu 14.04 server (ami-d05e75b8 in US-East zone) on an m4.large instance using its local storage. We started an EC2 instance in the public subnet of a VPC, and then we set up the security group (firewall) to enable access from anywhere using SSH and TCP 5601 (Kibana). Finally, we added a new elastic IP address and associated it with our running instance in order to connect to the internet.

Production tip: A production installation needs at least three EC2 instances — one per component, each with an attached EBS SSD volume.

Step-by-Step ELK Installation

To start, connect to the running server via SSH:

ssh ubuntu@YOUR_ELASTIC_IP

Package installations

Prepare the system by running:

sudo apt-get update
sudo apt-get upgrade

Install OpenJDK

All of the packages we are going to install require Java. Both OpenJDK and Oracle Java are supported, but installing OpenJDK is simpler:

sudo apt-get install openjdk-7-jre-headless

Verify that Java is installed:

java -version

If the output of the previous command is similar to this, then you’ll know that you’re heading in the right direction:

java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

You can set up your own ELK stack using this guide or try out our simple ELK as a Service solution.


Elasticsearch Installation

Elasticsearch is a widely used database and search server, and it’s the main component of the ELK setup.

Elasticsearch’s benefits include:

  • Easy installation and use
  • A powerful internal search technology (Lucene)
  • A RESTful web interface
  • The ability to work with data in schema-free JSON documents (noSQL)
  • Open source

To begin the process of installing Elasticsearch, add the following repository key:

wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Add the Elasticsearch repository definition to your APT sources and update the package lists:

echo "deb http://packages.elastic.co/elasticsearch/1.7/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-1.7.list
sudo apt-get update

Install:

sudo apt-get install elasticsearch

Start service:

sudo service elasticsearch restart

Test:

curl localhost:9200

If the output is similar to this, then you will know that Elasticsearch is running properly:

{
"status" : 200,
"name" : "Jigsaw",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "1.7.1",
"build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
"build_timestamp" : "2015-07-29T09:54:16Z",
"build_snapshot" : false,
"lucene_version" : "4.10.4"
},
"tagline" : "You Know, for Search"
}

In order to make the service start on boot, run:

sudo update-rc.d elasticsearch defaults 95 10

Production tip: DO NOT open any other ports, like 9200, to the world! There are many bots that search for 9200 and execute groovy scripts to overtake machines.

Logstash Installation

Screen Shot 2015-10-13 at 13.02.23Logstash is an open-source tool that collects, parses, and stores logs for future use and makes rapid log analysis possible. Logstash is useful for both aggregating logs from multiple sources, like a cluster of Docker instances, and parsing them from text lines into a structured format such as JSON. In the ELK Stack, Logstash uses Elasticsearch to store and index logs.

To begin the process of installing Logstash, add the Logstash repository definition and update the package lists:

echo "deb http://packages.elasticsearch.org/logstash/1.5/debian stable main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update

Then, install the service, have it start on boot, and run:

sudo apt-get install logstash
sudo update-rc.d logstash defaults 97 8
sudo service logstash start

To make sure it runs, execute the following command:

sudo service logstash status

The output should be:

logstash is running

Redirect System Logs to Logstash

Create the following file:

/etc/logstash/conf.d/10-syslog.conf

You will need to use sudo to write to this directory:

input {
  file {
    type => "syslog"
    path => [ "/var/log/messages", "/var/log/*.log" ]
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    host => "localhost" # Use the internal IP of your Elasticsearch server for production
  }
}

This file tells Logstash to store the local system logs (‘/var/log/messages’ and all of the files under ‘/var/log/*.log’) inside the Elasticsearch database in a structured way.

The input section specifies which files to collect (path) and what format to expect (syslog). The output section uses two outputs – stdout and elasticsearch. The stdout output is used to debug Logstash – you should find nicely-formatted log messages under ‘/var/log/logstash/logstash.stdout’. The elasticsearch output is what actually stores the logs in Elasticsearch.

In this example, we are using localhost for the Elasticsearch hostname. In a real production setup, however, the Elasticsearch hostname would be different because Logstash and Elasticsearch should be hosted on different machines.

Production tip: Running Logstash and Elasticsearch on the same machine is a very common pitfall of the ELK Stack and often causes servers to fail in production. You can read some more tips on how to install ELK in production.

Finally, restart Logstash to reread its configuration:

sudo service logstash restart

You can set up your own ELK stack using this guide or try out our simple ELK as a Service solution.


Kibana Installation

kibanaKibana is an open-source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster. Users can create bar, line, and scatter plots; pie charts; and maps on top of large volumes of data.

Among other uses, Kibana makes working with logs easy. Its graphical web interface even lets beginning users execute powerful log searches.

To begin the process of installing Kibana, download the following binary with this command:

wget https://download.elastic.co/kibana/kibana/kibana-4.1.1-linux-x64.tar.gz

Extract it:

tar -xzf kibana-4.1.1-linux-x64.tar.gz

Move the files to ‘/opt’, create a service file, and have it start on boot:

sudo mkdir -p /opt/kibana
sudo mv kibana-4.1.1-linux-x64/* /opt/kibana
cd /etc/init.d && sudo wget https://raw.githubusercontent.com/akabdog/scripts/master/kibana4_init -O kibana4
sudo chmod +x /etc/init.d/kibana4
sudo update-rc.d kibana4 defaults 96 9
sudo service kibana4 start

Test: Point your browser to ‘http://YOUR_ELASTIC_IP:5601’ after Kibana is started.

You should see a page similar to this:

install kibana aws

Before continuing with the Kibana setup, you must configure an Elasticsearch index pattern.

What does an “index pattern” mean, and why do we have to configure it? Logstash creates a new Elasticsearch index (database) every day. The names of the indices look like this: logstash-YYYY.MM.DD — for example, “logstash-2015.09.10” for the index that was created on September 10, 2015.

Kibana works with these Elasticsearch indices, so it needs to know which ones to use. The setup screen provides a default pattern, ‘logstash-*’, that basically means “Show the logs from all of the dates.”

Clicking the “Create” button creates the pattern and enables Kibana to find the logs.

Production tip: In this tutorial, we are accessing Kibana directly through its application server on port 5601, but in a production environment you might want to put a reverse proxy server, like Nginx, in front of it.
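A minimal sketch of such an NGINX server block (the host name is an assumption; TLS, authentication, and other hardening are left out):

server {
  listen 80;
  server_name kibana.example.com;

  location / {
    # Forward all requests to the local Kibana application server
    proxy_pass http://localhost:5601;
    proxy_set_header Host $host;
  }
}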

To configure Kibana to show the logs:

1. Go to the Kibana configuration page
2. Click on “Create”
3. Click on “Discover” in the navigation bar to find your log

The result should look like this:

run kibana on aws

As you can see, creating a whole pipeline of log shipping, storing, and viewing is not such a tough task. In the past, storing and analyzing logs was an arcane art that required the manipulation of huge, unstructured text files. But the future looks much brighter and simpler.

What Does Logz.io Provide?

Don’t want to install and then have to worry about maintaining an entire ELK Stack in your AWS environment? The Logz.io cloud-based ELK platform, delivered as a complete service, might help.


Troubleshooting 5 Common ELK Glitches

By Daniel Berman

Getting started with the ELK Stack is straightforward enough and usually involves just a few commands to get all three services up and running. But — and this is a big “but” — there are some common issues that can cause users some anguish.

The first piece of good news is that these issues are usually easy to resolve. The other piece of good news is that we’ve put together the top five most-common issues and explained how to troubleshoot them.

#1. Kibana is Unable to Connect to Elasticsearch

You’ve installed Elasticsearch, Logstash, and Kibana. You open Kibana in your browser and get the following screen:

kibana does not connect to elasticsearch

All is not lost! This is a pretty common issue, and it can be easily resolved.

As the error message implies, Kibana cannot properly establish a connection with Elasticsearch. The reasons for this vary, but it is usually a matter of defining the Elasticsearch instance correctly in the Kibana configuration file.

Open the file at /opt/kibana/config/kibana.yml and verify that the Elasticsearch instance defined by ‘elasticsearch_url’ is configured correctly (both the URL and the port).

Here is an example for a locally-installed Elasticsearch instance:
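Assuming Elasticsearch is listening on its default port of 9200, the relevant line should read something like this:

elasticsearch_url: "http://localhost:9200"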

Restart Kibana:
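With the kibana4 init script installed earlier in this guide, that is:

sudo service kibana4 restart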

That should do it. If the problem persists, there may be an issue with Elasticsearch. Check out the Elasticsearch troubleshooting sections below.

#2. Kibana is Unable to Fetch Mapping

In this case, Kibana has established a connection with Elasticsearch but cannot fetch mapping for an index:

kibana does not fetch mapping

As the message displayed on the grey button at the bottom of the page indicates, Kibana cannot find any indices stored in Elasticsearch that match the default logstash-* pattern — the default pattern for data being fed into the system by Logstash (which is the method Kibana assumes you are using).

If you’re not using Logstash to forward the data into Elasticsearch or if you’re using a non-standard pattern in your Logstash configuration, enter the index pattern that matches the name of one or more of your Elasticsearch indices. If Kibana finds the index pattern, the grey button will turn into a pretty green one, allowing you to define the index pattern in Kibana.

If you are using the conventional Logstash configuration to ship data, then there is most likely a communication issue. In other words, your logs aren’t making it into Elasticsearch. For some reason, either Logstash or Elasticsearch may not be running. See the sections below for more details on how to make sure that these services are running properly.

#3. Logstash is Not Running

Logstash can be a tricky component to manage and work with. We’ve previously covered a number of pitfalls you should look out for, but Logstash may still refuse to run even after you take care to avoid those landmines.

A common issue causing Logstash to fail is a bad configuration. Logstash configuration files, which are located in the /etc/logstash/conf.d directory, follow strict syntax rules that, if broken, will cause a Logstash error. The best way to validate your configurations is to use the configtest parameter in the service command:
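With the Debian init script used in this guide, that looks something like this:

sudo service logstash configtest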

If there’s a configuration error, it’ll show up in the output. Fix the syntax and try to run Logstash again:
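With the same init script:

sudo service logstash start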

Check the status of the service with:
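As with any service managed by the init system:

sudo service logstash status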

If Logstash is still not running after you fix the issue, take a look at the Logstash logs at: /var/log/logstash/logstash.log.

Read the log message and try to fix the issue as reported in the log. Here’s an example of a log message warning us of a deprecated host configuration:

As the message itself points out, use the Elastic forums to search for an answer to the particular issue reported in the log.

#4. Logstash is Not Shipping Data

You’ve got Logstash purring like a cat, but there is no data being shipped into Elasticsearch.

The prime suspect in this case is Elasticsearch, which may not be running for some reason or other. You can verify this by running the following cURL:
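Assuming a default local installation listening on port 9200:

curl http://localhost:9200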

You should see the following output in your terminal:
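The exact fields vary by Elasticsearch version, but a healthy instance responds with a JSON summary roughly like this:

{
  "status" : 200,
  "name" : "Some Node Name",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.1",
    ...
  },
  "tagline" : "You Know, for Search"
}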

If Elasticsearch is not responding, skip over to the Elasticsearch troubleshooting section below for more reasons why it might not be running properly.

Another common issue that may be causing this error is a bad output configuration in the Logstash configuration file. Open the configuration file at: /etc/logstash/conf.d/xxx.conf and verify that the Elasticsearch host is configured correctly:
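It should match the output section used earlier in this guide, for example:

output {
  elasticsearch {
    host => "localhost" # Use the internal IP of your Elasticsearch server for production
  }
}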

Restart Logstash:
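As before:

sudo service logstash restart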

#5. Elasticsearch is Not Running

How do you know Elasticsearch is not running? There are a number of indicators, and the most obvious one is that no logs are appearing in Kibana. As specified above, the most reliable way to ping the Elasticsearch service is by cURLing it:
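As in the previous section, assuming a default local installation:

curl http://localhost:9200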

If all is well, you should get back the same JSON cluster summary shown in the previous section.

If not, the output will look like this:
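The exact wording depends on your curl version, but it will be a connection error along these lines:

curl: (7) Failed to connect to localhost port 9200: Connection refused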

Now, there are a number of possible reasons Elasticsearch is not running.

First, if you just installed Elasticsearch, you need to manually start the service because it is not started automatically upon installation:
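On a Debian or Ubuntu install, that is:

sudo service elasticsearch start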

If you still get a message that Elasticsearch is not running, you will have to dig in deeper. As with Logstash, the best place to try and debug the service is the log file: /var/log/elasticsearch/elasticsearch.log.

A common cause for a failing Elasticsearch is a bad host definition in the configuration file. Live tailing of the log file while starting the service is a good method for identifying the specific error.
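Assuming the default log location, you can run the tail in one terminal while restarting the service in another:

sudo tail -f /var/log/elasticsearch/elasticsearch.log
sudo service elasticsearch restart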

The host configuration is located in the Network section of the Elasticsearch configuration file, and it should look like this:
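In /etc/elasticsearch/elasticsearch.yml (the default location on Debian and Ubuntu installs), a minimal example is:

network.host: localhost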

Verify the configuration, and restart the service:
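For example:

sudo service elasticsearch restart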

If the issue is not the host definition, the log will give you an indication as to the cause of the error and will help you resolve it. Search the Elastic forums — the chances are that someone else has encountered the issue before.

And one last tip (on Ubuntu only): If you had Elasticsearch working properly and it suddenly stopped, this might be due to a restart of your server, as Elasticsearch is not configured to start on boot. To change this, you can use:
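On a classic init (pre-systemd) Ubuntu system, a command along these lines registers the service to start on boot (the priorities follow the convention used for the other services in this guide):

sudo update-rc.d elasticsearch defaults 95 10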

A Final Note

Here at Logz.io, we’ve had a lot of experience with troubleshooting the various quirks in the ELK Stack. This article covered some common and basic setup issues that newcomers to the system might encounter. More advanced tips can be found in these Elasticsearch and Logstash cheatsheets.

Happy indexing!

Logz.io is a predictive, cloud-based log management platform that is built on top of the open-source ELK Stack. Start your free trial today!


Using the ELK Stack for NGINX or IIS Log Analysis

By Asaf Yigal

According to Netcraft’s latest web server survey in October 2015, NGINX and IIS servers are two of the most widely-used ones (after Apache) among the one million busiest sites worldwide.

NGINX is popular because of its focus on concurrency, high performance, and low memory usage. It serves dynamic HTTP content and is used to handle requests, caching, and load balancing. Although IIS’s popularity is declining, it’s still the most popular commercial web server and it is understandably popular among Microsoft developers.

But despite the popularity of NGINX and IIS, it is still challenging to obtain relevant and useful information from the thousands of log entries that NGINX and IIS web servers generate every second. In this article, I will take a deeper look at NGINX logs and give three use cases on how users can leverage ELK to store, parse, and analyze their NGINX and IIS web server logs.

With Elasticsearch, Logstash, and Kibana, this vast amount of log data can be collected, parsed, and stored. The digested data can then be transformed into insights that can be presented in a way so that users can receive immediate notifications and quickly find and fix the root causes of problems.

How to Parse Server Logs Using Logstash

One of the first things that usually needs to be done is to apply some filtering and enrichment to the logs with Logstash. Here is an example of a log line and the Logstash configuration that we at Logz.io use to parse such logs in our own environment.

A sample log entry:
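For illustration, a hypothetical line in NGINX’s default combined log format (the IP, URL, and user agent are made up):

109.65.122.142 - - [21/Oct/2015:07:28:45 +0000] "GET /index.html HTTP/1.1" 200 3700 "http://example.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64)"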

The Logstash configuration to parse that log entry:
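A minimal sketch of such a filter, assuming the default combined format; a full configuration typically extracts more fields, as noted below:

filter {
  grok {
    # NGINX's default access log format is compatible with the stock COMBINEDAPACHELOG pattern
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # Use the request time from the log line as the event timestamp
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}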

A sample error log:
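Again for illustration, a hypothetical NGINX error log line:

2015/10/21 07:30:02 [error] 1234#0: *57 open() "/usr/share/nginx/html/missing.html" failed (2: No such file or directory), client: 109.65.122.142, server: example.com, request: "GET /missing.html HTTP/1.1", host: "example.com"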

The Logstash configuration to parse that error log:
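A sketch of a filter that extracts the basic fields from that format (a production configuration would likely break the trailing message down further):

filter {
  grok {
    match => { "message" => "(?<timestamp>%{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:tid}: %{GREEDYDATA:errormessage}" }
  }
}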

The two configurations that we are currently using ourselves follow this general shape — of course, there are more fields that can be added to the NGINX and IIS log files and then can be parsed and analyzed accordingly.

The following use cases exemplify the benefits of using ELK with NGINX and IIS logs.

Log Analysis Use Cases

Use Case #1: Operational Analysis

nginx operational analysis

This is one of the most common use cases. DevOps engineers and site reliability engineers can get notifications of events such as traffic that is significantly higher than usual or an error rate that exceeds a certain level. When such issues occur, page response times can slow to undesirable levels and create a poor user experience.

By using ELK log management to analyze error logs, users can quickly see, for example, that there is a significant decrease in the number of users who are accessing the servers or an unprecedented peak in traffic that overloaded the server and caused it to crash. Unusual traffic patterns surfaced in a single dashboard can also indicate a DDoS attack. In response, users can quickly drill down to find the suspicious source IP addresses generating the traffic and block them.

One of the most helpful visualizations and ELK Stack alerts we have tracks the number of log lines for which the NGINX cache responds from disk. You can read more here about this configuration and how to track it.

This visualization and more can be found in our ELK Apps library by searching for NGINX.

Use Case #2: Technical SEO

nginx technical seo analysis

Quality content creation is now extremely important for SEO purposes, although it’s basically useless if Google has not crawled, parsed, and indexed the content. As shown in the dashboard above, tracking and monitoring your NGINX or IIS access log with ELK can provide you with the last Google crawl date to validate that your site is constantly being crawled by Googlebot.

By capturing and analyzing web server access logs with ELK, you can also find out if you have hit your Google crawl limits, how Google crawlers prioritize your web pages, and which URLs get the most and least attention. Learn how to use server log analysis for technical SEO.

Use Case #3: Business Intelligence

Access logs contain all the information needed in order to run a thorough analysis of your application users, from their geographic location to the pages they visit to the experience they are receiving. The benefit of using ELK to monitor NGINX and IIS logs is that you can also correlate them with infrastructure-level logs and better understand your audience’s experience as it is affected by your underlying infrastructure. For example, you can analyze response times and correlate them with the CPU and memory loads on the machines to see if stronger machines might provide a better UX.

Many of these visualizations can be found in our free ELK Apps library by searching for “NGINX” or “IIS.” Here are two examples: one is the response time that we’re getting per response code, and the other is a heat map of all of our visitors.

nginx business intelligence

nginx user heat map

What Does Logz.io Provide?

These examples are just a few of many operational reasons that users need to track their NGINX and IIS logs. From business intelligence to technical SEO, we have dashboards for these use cases and more in our free ELK Apps library.

Interested in monitoring NGINX or IIS log files with the Logz.io cloud-based ELK Stack as a service? Learn how we can help.


How to Use the ELK Stack to Monitor Performance

By Noni Peri

kibana example logz.io
Very often, when I was troubleshooting performance issues, I would see a service or a couple of machines slow down and reach high CPU utilization. This might mean that the service lacks resources because of high load, but very often it means that there is a bug in the code, an exception, or an error flow that over-utilizes resources. To find out which, I had to jump between NewRelic/Nagios and ELK.

So, I decided that I wanted to have one pane-of-glass to view performance metrics combined with all the events generated by the apps, operating systems, and network devices.

In order to use ELK to monitor your platform’s performance, a couple of tools and integrations are needed. Probes are required to run on each host to collect various system performance metrics. Then, the data needs to be shipped to Logstash, stored and aggregated in Elasticsearch, and then turned into Kibana graphs. Ultimately, software service operations teams use these graphs to present their results. In this article, I will share how we built our ELK stack to monitor our own service performance.

1. Collecting and Shipping

Collection

In the first stage of collecting and shipping data to Logstash, we used a tool called Collectl. This cool open-source project comes with a great number of options that allow operations teams to measure various metrics from many different IT systems and save the data for later analysis. We used it to generate, track, and save metrics such as network throughput, CPU disk I/O wait percentage, free memory, and idle CPU (indicating overuse or underuse of compute resources). It can also be used to monitor other system resources such as inode usage and open sockets.

Collectl command example:

collectl command example
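The exact flags depend on the metrics you want, but a representative invocation (an assumption, not our exact command) might look like this:

collectl -scdmn -i30 -F1 -P -f /var/log/collectl

Here, -scdmn selects the CPU, disk, memory, and network subsystems, -i30 samples every 30 seconds, -F1 flushes to disk every second, and -P writes plot-format output under the path given by -f.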

Finally, Collectl outputs its metrics to a log file in plot format. This actively maintained open-source project knows how to gather information, but it does not automatically ship that information to the ELK Stack.

Using a Docker Container

We encapsulated Collectl in a Docker container in order to have a Docker image that basically covered all of our data collecting and shipping needs. We used Collectl version 4.0.0 and made the following configurations to avoid a couple of issues:

— In order to avoid data overflow within the container, we only keep data collected from the current day. Longer data retention periods are maintained by the ELK Stack itself, so you don’t need to worry about keeping all of the data in the container’s log file.

— Collectl collects samples at a specified interval but dumps its output to disk at a different interval, called the flush interval. We flush data every second, which is the closest to real time that you can achieve. A 30-second collection interval, for example, is quite an aggressive sampling interval and may not be necessary for every use case. An output formatter is used to produce plot format, which puts the various values on a single line with a space delimiter.

The Collectl configuration file should look something like this:
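A minimal sketch, assuming the stock /etc/collectl.conf layout in which the daemon’s command-line flags are set on the DaemonCommands line (the flags mirror the example invocation above and are an assumption):

DaemonCommands = -f /var/log/collectl -P -F1 -i30 -scdmn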

Using RSYSLOG

RSYSLOG is another component of the container. It picks up the data from the Collectl log file and ships it to the ELK Stack. So that Logstash only has to grok the necessary fields instead of everything, it is advisable to add a bit more metadata to the log lines that RSYSLOG picks up. This can be done by taking the metrics just before shipping and adding information such as the instance name and host IP. Along with the proper time stamp, this information is then sent to Logstash.
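A minimal sketch of such an RSYSLOG configuration using the imfile module and the legacy directive syntax (the file paths, tag, template, and Logstash endpoint are all assumptions):

# Follow the Collectl output via a constant file name (see the link-rotation script below)
$ModLoad imfile
$InputFileName /var/log/collectl/current.log
$InputFileTag collectl:
$InputFileStateFile stat-collectl
$InputRunFileMonitor

# Prepend a time stamp and the host name, then forward to Logstash over TCP
$template CollectlFormat,"%timegenerated% %hostname% %msg%\n"
if $programname == 'collectl' then @@LOGSTASH_HOST:5514;CollectlFormat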

Small Gotchas

At this stage, there are two issues that require some attention:

1 – Time stamp: First of all, Collectl doesn’t include the time zone in the time stamps of the collected data. Therefore, if you are running hosts in various time zones, they won’t be aligned properly in your ELK. To work around this problem, we query the time zone in which the container is running and set the time stamp accordingly.

2 – Follow the Collectl log filename: The other complication is that Collectl outputs the data into a file whose name doesn’t remain constant. Only the filename prefix is customizable; Collectl automatically appends the current date. The issue is that RSYSLOG is unable to follow this file if the name changes from day to day. One way around this is to use the latest version of RSYSLOG (version 8), which I assume most users are not yet running. Since we use an older version, we created a short script that runs in cron inside the container and links a constant name to the ever-changing data collection file. RSYSLOG can then follow that constant name even though it is a link to a target that changes daily; it acts like a pointer that always points to whatever the Collectl log filename is at the moment.
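A sketch of such a script (the Collectl file-name pattern is an assumption; adjust it to whatever prefix you pass to -f):

#!/bin/sh
# Point a constant file name at whatever file Collectl is writing today,
# so that RSYSLOG can keep following the same path.
ln -sf "/var/log/collectl/$(hostname)-$(date +%Y%m%d).tab" /var/log/collectl/current.log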

Container Checklist

— Collectl
— RSYSLOG
— Collectl output file link rotation script
— crontab config for rotation script

The Docker image can be pulled from DockerHub at this link: https://registry.hub.docker.com/u/logzio/logzio-perfagent/

2. Parsing the Data

After the collection and shipping stages comes data parsing. Collectl returns unstructured log data, which is basically a series of numbers. These are fed into a Logstash grok expression in order to extract each field name and its value.

Collectl’s configuration parameters explicitly set a specific output pattern, and the RSYSLOG configuration adds the time zone in a specific place in each shipped message. If you are using these two configurations together, the grok pattern that you need is:
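A simplified sketch (the real pattern must list every Collectl column in the order you configured, so treat the field names here as placeholders):

filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} %{NUMBER:cpu_user:float} %{NUMBER:cpu_sys:float} %{NUMBER:cpu_wait:float} %{NUMBER:cpu_idle:float} %{NUMBER:mem_free:float} %{NUMBER:net_kb_in:float} %{NUMBER:net_kb_out:float}" }
  }
}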

3. Visualization

If you have a fast ELK stack, you will get the information almost immediately. Obviously, this depends on the performance of your ELK deployment, but you can expect results in half a minute or less, giving you a very up-to-date stream of information.

What Does Logz.io Provide?

Need help to monitor your environment’s performance? Our cloud-based ELK as a service can do that.


Conclusion: The Logz.io Free ELK Apps Library

By Asaf Yigal

elk apps library nginx

To assist the open source ELK community, we at Logz.io introduced ELK Apps, the largest collection of applications for the ELK Stack — and it’s all open to everyone free of charge. We already have more than one hundred apps for the community — and you can find them all here!

Why ELK Apps?

While working with our user community, we realized that everyone has much in common. Many people, for example, use similar tools and have similar goals when it comes to visualizing and getting alerts on their machine-generated data. Time and time again, we saw different users creating the same dashboards for MongoDB or the same alert parameters for Apache or Nginx logs. And a lot more!

It really makes no sense for every single user to do the same thing that others have already done — especially when their goals are similar.

So, we created ELK Apps because our goal as a company is to make the ELK Stack easily accessible and simple to use. (As popular as the stack is, it is difficult to host and maintain on-premises.) Our objective is for users to be able to do the following in five minutes or less:

  • Get access to enterprise-grade ELK as a service with unlimited scalability and high availability
  • Ship logs and have Logz.io parse them automatically
  • Gain visibility into the data via dashboards and alerts

What Are ELK Apps?

ELK Apps today include saved searches, visualizations, dashboards, and alert definitions that anyone can add to his or her environment with one click.

Apps are generated by our large community of users as well as Logz.io’s proprietary machine-learning algorithms that automatically build certain applications by looking at log data and determining in what format the data should be visualized for analysis.

Here are just a few of the ELK Apps:

How Can I Use ELK Apps?

Let’s say that you have shipped IIS server logs to Logz.io. You can go to the ELK Apps page and search for “iis.” Then, you will immediately see a collection of apps for IIS server monitoring:

elk apps iis

You can go through the different apps to see which specific ones would be suitable for you and then add them to your environment with one click of a button. Then, you can open them and see the data immediately.

What If I Run My Own ELK Stack?

Even if you run your own ELK Stack, you can still take advantage of ELK Apps by opening a free account with Logz.io, downloading the apps that you need, and then going to the Kibana settings page to export the relevant objects.

Warning: Although it’s possible to use ELK Apps in your own stack, only experts should attempt it. The Kibana export and import requires a deep understanding of the platform, and we at Logz.io cannot guarantee that ELK Apps would function properly outside the Logz.io environment.

How Can I Contribute Apps of My Own?

We could not have done this without the contributions from our users.

We’ve added a “Contribute” button to every object in Kibana. All you need to do is save the search, visualization, or dashboard that you want to contribute and click on the “Contribute” button. You can select an image, write a short description, and name the app.

All apps are reviewed by us, and it usually takes one to two business days to approve an app.

We’re really excited to release ELK Apps, and we hope the community will like it — check it out and let us know what you think!


Appendix: Our Additional ELK Stack Resources

Looking to use the ELK Stack for a very specific use case? Here is our ever-growing list of tutorials: