Logging is a feature that virtually every application must have. No matter what technology you choose to build on, you need to monitor the health and operation of your applications. This gets more and more difficult as applications scale and you need to look across different files, folders, and even servers to locate the information you need. While you can use built-in features to write Python logs from the application itself, you should centralize these logs in a tool like the ELK stack.

Thanks to Elasticsearch’s efficiency in sifting through large volumes of data, application developers can quickly narrow down the most important logs. Dashboards can be shared with operations teams, enabling them to react quickly when they detect anomalous behavior.

This article will focus on building a robust application logging infrastructure for one specific technology: Python applications. Python is an extremely popular and easy-to-use general purpose programming language. It’s a great choice for a wide range of activities, from learning to program to implementing complex machine learning solutions.

Overview of Logging in Python

Python comes with a logging module that is very flexible and easy to use. Like many logging libraries, it can log at multiple levels (e.g., INFO or ERROR), format the log output in various ways, and write to different destinations (e.g., console or file). In fact, logging something to a file is as simple as this:

import logging

logging.basicConfig(filename="app.log", level=logging.DEBUG)

logging.info('Application running!')

The app.log file is automatically generated if it does not already exist, and, after running the above code, it should contain the following:

INFO:root:Application running!

As you add more information to your logs, it makes sense to write them in a format that is both human-readable and machine-parsable. This is called structured logging. JSON-structured logs are particularly easy to ship into an ELK stack.
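As a quick illustration of the idea, the following sketch uses only the standard library to emit one JSON object per log record (the JsonLineFormatter class and its field names here are purely for demonstration):

```python
import io
import json
import logging

# Minimal structured-logging sketch: each record becomes one JSON object
# per line, so the output is both human-readable and machine-parsable.
class JsonLineFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

stream = io.StringIO()  # stand-in for a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonLineFormatter())

logger = logging.getLogger("structured")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("User %s logged in", "alice")
print(stream.getvalue().strip())
```

Each output line is valid JSON, so a log shipper can map its keys directly to Elasticsearch fields.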

Structuring logs as JSON integrates easily with Python’s logging module, which provides handlers and formatters to separate the concerns of output destinations and log formatting. This separation allows you to customize any part of a log’s journey from the application code to its destination(s). python-json-logger is one freely available JSON logger for Python. To set it up, first install it via pip:

pip install python-json-logger

Next, you can set up JSON logging using a configuration file logging.conf with the following structure:

[loggers]
keys = root

[logger_root]
level = INFO
handlers = root

[handlers]
keys = root

[handler_root]
class = FileHandler
level = INFO
formatter = json
args = ('application.log',)

[formatters]
keys = json

[formatter_json]
class = __main__.ElkJsonFormatter

Finally, the following code allows you to write JSON logs:

import logging
import logging.config
from pythonjsonlogger import jsonlogger
from datetime import datetime

class ElkJsonFormatter(jsonlogger.JsonFormatter):
    def add_fields(self, log_record, record, message_dict):
        super(ElkJsonFormatter, self).add_fields(log_record, record, message_dict)
        log_record['@timestamp'] = datetime.now().isoformat()
        log_record['level'] = record.levelname
        log_record['logger'] = record.name

logging.config.fileConfig('logging.conf')
logger = logging.getLogger("MainLogger")

logger.info('Application running!')

This code loads the earlier-defined configuration file, which uses the ElkJsonFormatter class defined in the code above. We could have used the JsonFormatter class (from python-json-logger) directly to produce JSON logs. However, in this case, we are setting specific fields (particularly @timestamp) that make it easier to ship logs to Elasticsearch, producing the following structure:

{"message": "Application running!", "@timestamp": "2020-02-22T16:02:58.874694", "level": "INFO", "logger": "MainLogger"}

Shipping Python Logs to the Elastic Stack

Once our logs are in a structure that we can reason about, they can be shipped to Elasticsearch for processing and to produce the insights we need. This is easiest to set up when the logs are in JSON format, but it is also possible to work with other non-JSON logs, as long as they have a sufficiently clear structure to be parsed.

Setting Up the ELK Stack

If you want to set up your own ELK stack, follow the Installing Elasticsearch official documentation and choose your preferred installation method. Installing and maintaining your own ELK stack involves more work than using a managed one, but it is a perfectly viable solution if you have the knowledge to manage it and would like a higher level of control.

Alongside Elasticsearch, you will also need to install Kibana (to search through and visualize the logs) and Filebeat (to ship the logs).

Shipping JSON Logs with Filebeat

Filebeat already has the ability to ship JSON logs directly into Elasticsearch, thanks to its JSON processor. We can use the following configuration in /etc/filebeat/filebeat.yml to achieve this:

filebeat.inputs:

- type: log
  enabled: true
  paths:
    - /path_to_logs/*.log
  json:
    keys_under_root: true
    overwrite_keys: true
    message_key: 'message'

output.elasticsearch:
  hosts: ["localhost:9200"]

processors:
  - decode_json_fields:
      fields: ['message']
      target: json

Once this is configured, we can start Filebeat using the following command:

sudo service filebeat start
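Before (or after) starting the service, Filebeat’s built-in checks can be used to verify that the configuration parses and that the configured output is reachable:

```shell
# Validate the configuration file syntax
sudo filebeat test config

# Test the connection to the configured output (here, Elasticsearch)
sudo filebeat test output
```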

When Filebeat ships the logs, you will see that they appear in Elasticsearch with fields corresponding to the JSON structure we set up earlier. We can filter based on these fields or use them in visualizations and dashboards, as will be shown later.

Figure 1: JSON logs shipped with Filebeat appear in Elasticsearch with structured fields

Shipping Raw Text Logs with Filebeat

Logs that are not in JSON can still be shipped to Elasticsearch, as long as their structure is consistent enough to be parsed. However, since they are not directly compatible with Elasticsearch, another component between Filebeat and Elasticsearch needs to parse the logs. This is typically done with either Logstash or an ingest pipeline in Elasticsearch itself, using Grok expressions to define the parsing rules.
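For example, if the logs use Python’s default LEVEL:logger:message layout (as in the INFO:root:Application running! line shown earlier), a Logstash filter along these lines could parse them — a sketch only, to be adapted to your actual log format:

```
filter {
  grok {
    # Split "INFO:root:Application running!" into separate fields
    match => { "message" => "%{LOGLEVEL:level}:%{DATA:logger}:%{GREEDYDATA:log_message}" }
  }
}
```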

Shipping Python Logs to Logz.io

If you prefer to focus on developing and deploying applications instead of setting up and maintaining Elasticsearch clusters, then you should consider using a managed ELK stack, such as Logz.io. To get your logs into Logz.io, you can use either the provided Python Handler or Filebeat.

Using the Python Handler for Logz.io

You can easily send your Python logs directly to Logz.io using the provided Python Handler. To use it, you’ll first need to install it via pip:

pip install logzio-python-handler

Next, create a configuration file (e.g., logging.conf) and set up the Python Handler as follows:

[handlers]
keys=LogzioHandler

[handler_LogzioHandler]
class=logzio.handler.LogzioHandler
formatter=logzioFormat
args=('your_logzio_token', 'python', 3, 'https://listener.logz.io:8071')

[formatters]
keys=logzioFormat

[loggers]
keys=root

[logger_root]
handlers=LogzioHandler
level=INFO

[formatter_logzioFormat]
format={"additional_field": "value"}

It’s important to note that you need to set your Logz.io token, which you can find by clicking the cogwheel in the top right portion of the screen and going to Settings -> General. If you’re set up in a different region, you’ll also need to change the listener URL accordingly.

We can test this out using the following code:

import logging
import logging.config

logging.config.fileConfig('logging.conf')
logger = logging.getLogger('LogzioLogger')

logger.info('Application is running! (Python Handler)')

try:
    1/0
except:
    logger.exception("Don't divide by zero!")

After running this, logs are visible in Logz.io within a few seconds:

Figure 2: Logs shipped using the Python Handler appear in Logz.io

Shipping JSON Logs with Filebeat to Logz.io

The Python Handler is typically a great solution for shipping Python logs to Logz.io. However, it might not be the best choice when working with legacy applications that are too expensive and/or too risky to change. Additionally, a potential problem with the Python Handler is the loss of logs buffered in memory, which can happen if an application crashes suddenly before those logs have been sent to Logz.io.
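The buffering risk can be illustrated with the standard library’s MemoryHandler, used here purely as a stand-in for any handler that buffers records before sending them on: records accumulate in memory and only reach their target on a flush, so a hard crash before that point loses them.

```python
import io
import logging
import logging.handlers

target_stream = io.StringIO()      # stand-in for a remote listener
target = logging.StreamHandler(target_stream)

# Buffer up to 100 records in memory; INFO records below the flush level
# (ERROR by default) stay buffered until an explicit flush.
buffered = logging.handlers.MemoryHandler(capacity=100, target=target)

logger = logging.getLogger("buffered-demo")
logger.setLevel(logging.INFO)
logger.addHandler(buffered)

logger.info("Buffered message")
before = target_stream.getvalue()  # still empty: the record is only in memory
buffered.flush()                   # without this flush, a crash would lose the record
after = target_stream.getvalue()   # now contains the record
```

Writing logs to a file first, as described below, sidesteps this problem because the records are already on disk when the shipper picks them up.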

To get around these issues, you can write logs to a file and ship them to Logz.io using Filebeat, exactly as we did earlier for the self-managed ELK stack. Follow the instructions in Shipping Logs to Logz.io with Filebeat, and set up your Filebeat inputs in /etc/filebeat/filebeat.yml as shown below:

filebeat.inputs:

- type: log
  paths:
    - /path_to_logs/*.log
  fields:
    logzio_codec: json
    token: your_logzio_token
    type: python
  fields_under_root: true
  encoding: utf-8
  ignore_older: 3h

Once again, you’ll need to set your Logz.io token using the process explained in the previous section. If you followed the setup instructions correctly, you should also have an output section that looks something like this:

output:
  logstash:
    hosts: ["listener.logz.io:5015"]
    ssl:
      certificate_authorities: ['/etc/pki/tls/certs/COMODORSADomainValidationSecureServerCA.crt']

This should be enough to get your JSON-structured logs into Logz.io with Filebeat.

Shipping Raw Text Logs with Filebeat to Logz.io

If your logs are not in JSON structure, then you’ll need to set up data parsing rules in order for Logz.io to be able to accept them. Our support engineers can help you to do this.

Analyzing the Data

Once application logs are in the ELK stack with the right structure, they’re easy to search and filter. For instance, you can look specifically for errors, and, once you find them, probe deeper to identify what the application was doing when the error occurred. It’s also possible to gain additional application-specific insights by adding custom fields to the logs (e.g., requests from a particular user).
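For example, with the fields we defined earlier, a Kibana query such as the following (in KQL) narrows the view down to errors from a specific logger — adjust the field names to match your own mapping:

```
level : "ERROR" and logger : "MainLogger"
```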

Kibana can also be used to create visualizations and dashboards that offer quick snapshots of the overall system health, such as the one below:

Figure 3: An example of a visualization showing the proportion of log levels (INFO, WARNING and ERROR) in a system

The screenshot above shows a simple example of a log level ratio visualization. You can make visualizations for other fields as well, and later combine them within custom, adjustable dashboards.

Conclusion

Python logging is flexible and easy to set up, and it offers a variety of ways to format and output application logs. However, it can be very tedious to manually sift through logs stored in different places. Shipping these logs to an ELK stack solves this problem, allowing developers and other staff to leverage the power of ELK to search and analyze them.

It is easiest to ship logs to Elasticsearch if they are formatted in JSON. Otherwise, you will need some additional setup to ensure that the various fields can be parsed and indexed correctly. Log shipping is typically done via Filebeat, although, if you are shipping logs to Logz.io, you also have the option of using the purpose-built Python Handler. As a managed ELK service, Logz.io takes away the complexity of managing the ELK stack, allowing you to focus on application monitoring.