Java is a well-established object-oriented programming language that epitomizes cross-platform software development and helped to popularize the “write once, run anywhere” (WORA) concept. Java runs on billions of devices worldwide and powers a huge range of important software, such as the popular Android operating system and Elasticsearch. In this tutorial, we will go over how to manage Java logs with the ELK Stack and Logz.io.

Java applications, like applications written in any other language, need to provide visibility into their operations so that the people who manage them can identify and troubleshoot problems. The simplest way to do this is to log diagnostic information, but as an application’s scope and scale grow, so do its observability requirements, and finding the right information in the logs can become tedious and time-consuming.

Fortunately, Elasticsearch (which is written in Java) is an excellent tool to store and search through lots of unstructured data. In this blog post, I’ll explain how to write logs from Java applications, how to get Java logs into Elasticsearch, and how to use Kibana to find the information you want.

Overview of Logging in Java

There are several different libraries used for writing Java logs, all of which have similar capabilities. I will be using the popular Log4j2 library in this example. In fact, Elasticsearch itself uses Log4j2 as its logging framework, so you may have encountered it during your Elasticsearch logging configuration.

Getting Started with Log4j2

To use Log4j2, you first need to add the relevant dependencies to your build tool’s configuration file. For instance, if you are using Maven, you’ll need to include the following in your pom.xml file:

<dependencies>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-api</artifactId>
    <version>2.13.3</version>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.13.3</version>
  </dependency>
</dependencies>

With that in place, a simple program like the one below is enough to see some logging output:

import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.LogManager;

public class Main {
    
    private static final Logger logger = LogManager.getLogger(Main.class);
    
    public static void main(String[] args) {
        
        logger.error("Application is running!");
    }
}

This produces output such as the following in the console:

11:57:44.146 [main] ERROR com.mycompany.javalogging.Main - Application is running!

Log4j2 configuration is normally defined in a log4j2.xml file. Since we have not configured anything yet, the logging library is using a default configuration, which has:

  • A ConsoleAppender that writes the output to the console.
    Appenders are used to send log data to different local or remote destinations, such as files, databases, sockets, and message brokers.
  • A PatternLayout that structures the output as shown above.
    Layouts format the log data as strings in JSON, HTML, XML, and other formats, so you can choose whichever format best fits the log consumer.
  • A minimum log level of error, defined as the root logger’s level in Log4j2. This means that any logs with a lower level (such as info) will not be written at all. That’s why, for illustration only, we used a level of error for something that should really be an info message; the short sketch after this list makes the filtering behavior concrete.
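
To make the level filtering concrete, here is a small illustrative class (not part of the original example) that logs at several levels; under the default configuration, only the error call produces any console output:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class LevelDemo {

    private static final Logger logger = LogManager.getLogger(LevelDemo.class);

    public static void main(String[] args) {
        // With the default root level of error, the first three calls are filtered out.
        logger.debug("Suppressed under the default configuration");
        logger.info("Suppressed under the default configuration");
        logger.warn("Suppressed under the default configuration");
        logger.error("This is the only message written to the console");
    }
}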

Note that the Java code is independent of the above configuration details, so changing those details does not require a code change. We’ll need to change these settings to reach our goal of centralizing logs in Elasticsearch and to make sure that the logs can be used by a wider range of Java applications.

Configuring Log4j2

Create a file called log4j2.xml, and put it somewhere within reach of your classpath. Add the following inside it:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <File name="FileAppender" fileName="/path_to_logs/myapp.log">
      <JSONLayout compact="true" eventEol="true">
          <KeyValuePair key="@timestamp" value="${date:yyyy-MM-dd'T'HH:mm:ss.SSSZ}" />
      </JSONLayout>
    </File>
  </Appenders>
  <Loggers>
    <Root level="trace">
      <AppenderRef ref="FileAppender"/>
    </Root>
  </Loggers>
</Configuration>

This configuration uses a FileAppender with a JSONLayout to write JSON-formatted output to a file for logs with level trace and above. It also includes an @timestamp field, which helps Elasticsearch determine the ordering of the time series data.

You will also need to add the Jackson Databind package as a dependency using your build tool, since it is needed at runtime. Using Maven, this means adding the following to pom.xml:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.11.1</version>
</dependency>

Running the application with this configuration (and with the illustrative logger.error call switched to logger.info, now that the root level allows it) writes output such as the following to the specified file:
{"instant":{"epochSecond":1593862784,"nanoOfSecond":979948000},"thread":"main","level":"INFO","loggerName":"com.mycompany.javalogging.Main","message":"Application is running!","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","threadId":1,"threadPriority":5,"@timestamp":"2020-07-04T13:39:44.979+0200"}

The JSON structure makes it very easy to ship these logs to Elasticsearch, as I’ll explain in the next section.

Shipping Java Logs to Elasticsearch

Java logs can be sent to Elasticsearch for later retrieval and analysis. They can be sent directly from the application, or they can be written to files and later shipped by a data shipper such as Elastic’s Filebeat. The latter approach is more robust: because logs are offloaded to disk, they neither impact the application’s performance directly nor risk being lost if the application crashes.

If you’re able to set up and maintain your own ELK Stack, follow the instructions in our Complete Guide to the ELK Stack. Also look at our tutorials to install and configure Elasticsearch, Kibana, or Filebeat.

Shipping Java Logs to Elasticsearch Directly

One way to avoid running the lightweight Filebeat is to send logs straight to Logstash, but there are other options too. Officially supported Elasticsearch clients for specific languages are another: official clients exist for Java (naturally), as well as Python, Ruby, Go, JavaScript, Perl, PHP, and .NET (C#).

More important still, an application can write its logs directly to Elasticsearch. If it uses a logging library that supports Elasticsearch as a target, this can be as simple as configuring an Elasticsearch appender. If your library doesn’t support Elasticsearch, you can ship Java logs as an HTTP payload to the Elasticsearch REST API. The Log4j2 library, for example, offers an official HttpAppender, and you can find unofficial Elasticsearch appenders on GitHub.
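
As a rough illustration of the HTTP approach, the sketch below uses the JDK’s built-in HttpClient (Java 11+) to post a single JSON log document to Elasticsearch’s document API. The endpoint, index name, and field names are assumptions made for this example; a real appender would also batch, buffer, and handle failures:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Instant;

public class DirectShipper {

    // Assumed endpoint: a local Elasticsearch node and an index named "java-logs".
    private static final String ES_URL = "http://localhost:9200/java-logs/_doc";

    private static final HttpClient client = HttpClient.newHttpClient();

    public static void main(String[] args) throws Exception {
        // Build a minimal JSON document by hand; a real implementation would use a
        // JSON library and escape the message properly.
        String json = String.format(
                "{\"@timestamp\":\"%s\",\"level\":\"INFO\",\"message\":\"Application is running!\"}",
                Instant.now());

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ES_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        // Synchronous send for simplicity; an appender would buffer and send asynchronously.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Elasticsearch responded with HTTP " + response.statusCode());
    }
}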

Because the logs never touch the disk before reaching Elasticsearch, direct shipping is a quick process. However, it requires that the application itself do extra work to buffer the logs in memory and eventually send them out over the network, which can impact your application’s performance. Another concern is that if the application is non-gracefully terminated before that buffer is flushed, there is a very real risk that logs may be lost.

While shipping logs with Filebeat as described in the following section is largely considered more robust and performant, shipping directly to Elasticsearch is a viable option in environments such as cloud-hosted Docker containers where a durable and persistent disk is not available.

Shipping JSON Logs with Filebeat

Filebeat is a lightweight, open-source log shipper that sends logs from files to Elasticsearch. Since it already supports JSON-structured logs, all we need to do is set up the configuration in /etc/filebeat/filebeat.yml as follows:

filebeat.inputs:

- type: log
  enabled: true
  paths:
    - /path_to_logs/*.log
  json:
    keys_under_root: true
    overwrite_keys: true
    message_key: 'message'

output.elasticsearch:
  hosts: [elasticsearch_endpoint]

processors:
  - decode_json_fields:
      fields: ['message']
      target: json

Now, all that remains is to start Filebeat using the following command:

sudo service filebeat start

Filebeat should then pick up the logs and send them to Elasticsearch. Figure 1 shows what the logs look like when a subset of their fields is organized into a Kibana table:

Figure 1: Log data shown in Kibana for a self-managed Elasticsearch setup

Shipping Raw Text Logs with Filebeat

It’s not always possible to structure logs in JSON format. It might be too time-consuming to change the logging configuration for hundreds of microservices that already use a consistent text format. Or you might be relying on third-party tools and frameworks, such as the Apache HTTP server, whose built-in logging supports text format only. In these cases, text logs with a defined format can still be shipped into Elasticsearch, but the process is more complex. These logs need to go through either Logstash or an ingest pipeline, and Grok expressions must be configured to parse the log strings, extract the relevant fields, and turn them into JSON documents. These steps are required for Elasticsearch to index and store the logs correctly.
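
As an illustration, a Grok expression along the following lines could parse the default PatternLayout output shown earlier; the field names are arbitrary choices, and real patterns usually need further tuning (for multi-line stack traces, for example):

%{TIME:timestamp} \[%{DATA:thread}\] %{LOGLEVEL:level}\s+%{JAVACLASS:logger} - %{GREEDYDATA:message}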

Shipping Java Logs to Logz.io

Running and managing the ELK Stack yourself can become difficult, especially as deployments grow and developers and DevOps engineers find themselves investing more and more time in handling scaling, configuring ingestion queues, managing shards and indices, and performing upgrades. That’s where Logz.io can help: teams use the same ELK Stack they already know, with the same logging libraries and log shippers, without having to manage and maintain ELK themselves at scale.

In the following section, I’ll describe the different ways to get your logs into Logz.io.

Using the Java Appenders for Logz.io

You can send your logs directly to Logz.io from your application using the provided appenders for either the Log4j2 or Logback logging libraries.

Assuming you’re using Log4j2 as the logging library and Maven as the build tool, your first step is to make sure you have the following dependencies in your pom.xml file:

<dependencies>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.26</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.13.3</version>
    </dependency>
    <dependency>
        <groupId>io.logz.log4j2</groupId>
        <artifactId>logzio-log4j2-appender</artifactId>
        <version>1.0.12</version>
    </dependency>
</dependencies>

Then, add the following configuration to a log4j2.xml file that is within reach of your classpath:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
    <Appenders>
      <LogzioAppender name="Logzio">
          <logzioToken>your_logzio_token</logzioToken>
          <logzioUrl>https://listener.logz.io:8071</logzioUrl>
          <logzioType>java</logzioType>
      </LogzioAppender>
    </Appenders>
    <Loggers>
        <Root level="info">
          <AppenderRef ref="Logzio"/>
        </Root>
    </Loggers>
</Configuration>

There are a couple of things you’ll need to update in this configuration. The first is your Logz.io token, which you can find in your Logz.io account. Go to the cogwheel in the top right corner of the Logz.io interface, then click Settings -> General. Also, you might need to update the listener URL, depending on the region in which your account is hosted.

At this point, writing logs to Logz.io is no different from writing them to a file or any other standard destination as far as the code is concerned. The destination of the logs is purely a matter of configuration. Let’s use the following code to write some logs:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class NewMain {

    private static final Logger logger = LogManager.getLogger(NewMain.class);
    
    public static void main(String[] args) {
        
        logger.info("Application is running!");
        
        try {
            // Deliberately read past the end of a two-element array to trigger an exception.
            var array = new int[2];
            var item = array[5];
        }
        catch (Exception ex) {
            logger.error("A problem occurred!", ex);
        }
        
        logger.warn("Disk is filling fast!.");
        
        logger.info("Application is stopping.");
    }
    
}

After a few seconds, the logs should appear in Logz.io’s Kibana interface:

Figure 2: Logs shipped directly to Logz.io show up in Kibana

Shipping JSON Logs with Filebeat to Logz.io

While it’s easy to send logs directly from an application to Logz.io, this approach is not always ideal because it makes the application directly responsible for sending the logs. This means that accumulating logs can affect the application’s memory footprint, and logs may be lost if the application crashes before the log buffer has been flushed.

A more robust solution is to log to files on disk and offload to Filebeat the responsibility of shipping the files to Elasticsearch. You’ve already seen how to do this with a self-managed ELK Stack, and this same approach works with Logz.io. Follow the instructions in our post Shipping Logs to Logz.io with Filebeat, and set up your inputs in /etc/filebeat/filebeat.yml as follows:

filebeat.inputs:

- type: log
  paths:
    - path_to_logs/*.log
  fields:
    logzio_codec: json
    token: your_logzio_token
    type: java
  fields_under_root: true
  encoding: utf-8
  ignore_older: 3h

You will need to set your Logz.io token (as explained earlier) and the path(s) to your logs. You should have an output section that looks like this:

output:
  logstash:
    hosts: ["listener-nl.logz.io:5015"]  
    ssl:
      certificate_authorities: ['/etc/pki/tls/certs/COMODORSADomainValidationSecureServerCA.crt']

This is all you need for Filebeat to ship your JSON-structured logs to Logz.io.

Shipping Raw Text Logs with Filebeat to Logz.io

Logs that aren’t in JSON format can also be shipped. The setup for this process requires a little more effort, however, because specific parsing rules need to be applied depending on the format of the logs. Logz.io provides pre-built log parsing pipelines for a variety of common tools and frameworks, such as Apache Access logs, Nginx, MySQL, and various AWS services. You can read more about the built-in log types here. If you need a new pipeline created or want a customized version of an existing pipeline, contact Logz.io Support to leverage the Parsing-as-a-Service included in your package.

Analyzing the Data

You can use Kibana, Elasticsearch’s handy user interface, to search and view your log data; its Discover component gives access to this functionality. Kibana shows a time series representation of the log data, which you can search, filter, and organize in different ways. You can also use the Visualize and Dashboard components to present aggregations of your log data, such as a statistical breakdown of the different log levels or a list of the most frequent errors. Elasticsearch can store all of the arbitrary fields you need to enrich your data, making troubleshooting efficient and painless, even for distributed applications.
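
For example, using the field names from the JSON output we produced earlier, a Lucene-style search such as the following in Kibana’s Discover view would narrow the results to error-level logs from a single class (the class name here is just the one from our sample application):

level:ERROR AND loggerName:"com.mycompany.javalogging.Main"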

Figure 3: A dashboard in Kibana showing a comparison of different log levels alongside a list of the most common errors

Conclusion

Java applications of all sizes can benefit from the enhanced visibility provided by a good logging mechanism. There are several logging libraries available for Java. This post looked at the commonly used Log4j2 library. Logs are only useful when the process of extracting information from them, such as when troubleshooting a production issue, is smooth and efficient.

Centralizing logs is becoming increasingly important as systems become more distributed and the adoption of microservices architecture grows.

Elasticsearch and its companion, Kibana, are among the most widely used tools for storing, visualizing, and analyzing log data at any scale. Logs can be sent to Elasticsearch in different ways, including directly from the application and by using a data shipper such as Filebeat. Different options are also available for running the Elasticsearch cluster, ranging from managing everything yourself to using a managed ELK Stack, such as Logz.io, which saves you from having to worry about infrastructure and provides pre-built parsing for common log types.

Whatever setup you choose, centralizing and indexing logs from various sources and being able to search through them easily will allow you to quickly access the diagnostic information you need when you need it most.
