
Docker Swarm is a native clustering management tool for Docker. Essentially, it turns a pool of Docker hosts into one single, virtual Docker host. Swarm serves the standard Docker API, so any tool that already communicates with a Docker daemon can use Swarm to scale to multiple hosts transparently.

Monitoring the health of a distributed system is crucial for identifying and troubleshooting issues in time, but it is also a challenge.

This guide describes how to establish a centralized logging architecture for a Swarm cluster by collecting event data (such as container status per node and container actions) and shipping it to the Logz.io ELK Stack (Elasticsearch, Logstash, and Kibana).

These steps will explain how to create the Swarm cluster, prepare the nodes for logging, and track events using the Docker Swarm API.

Creating the Swarm cluster

Our first step is to create a Swarm cluster in a sandbox environment so that we can safely test the logging architecture. We will create a local cluster consisting of three virtual machines: one for the Swarm manager and two for additional cluster nodes.

To follow the next steps, make sure that you have Docker Toolbox, Docker Machine, and VirtualBox installed.

Before we begin, we have to stop any virtual machine that is running to avoid a conflict when creating and connecting the Swarm manager with the nodes. Use the docker-machine ls command to see if there are any machines running, and you should get this output:

To stop a running virtual machine, run docker-machine stop followed by the name of the machine.
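The check-and-stop step can also be scripted. The following is a minimal sketch, assuming that docker-machine is on the PATH and that its ls subcommand accepts the --filter state=Running and -q (quiet) options; the function names are hypothetical and are not part of any Docker tooling.

```python
import subprocess


def running_machines_command():
    # Assumption: `docker-machine ls` supports --filter and -q (names only).
    return ["docker-machine", "ls", "--filter", "state=Running", "-q"]


def parse_machine_names(output):
    """Turn quiet-mode output (one machine name per line) into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def stop_command(name):
    """Build the command that stops a single machine."""
    return ["docker-machine", "stop", name]


def stop_all_running():
    """List the running machines and stop each one in turn."""
    listing = subprocess.run(
        running_machines_command(),
        capture_output=True, text=True, check=True,
    ).stdout
    for name in parse_machine_names(listing):
        subprocess.run(stop_command(name), check=True)
```

Calling stop_all_running() on the workstation would stop every machine that docker-machine reports as running, so it is only appropriate in a sandbox like the one used here.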

Creating the cluster nodes

Once you’ve stopped the virtual machines, use the following command to create the Swarm manager that will be responsible for the entire cluster and in charge of managing the resources of all of the containers in the cluster:

Our next step is to deploy the two additional cluster nodes (node-01 and node-02) by using a similar command to the one used for creating the Swarm manager.

For node-01:

For node-02:

Creating a discovery token

Our next step is to use the hosted discovery service on the Docker Hub to create a unique discovery token for the cluster, which we will then use to form the nodes into one cohesive cluster.

First, connect the Docker client to the manager:

Then, create a token for the Swarm cluster:

Docker will now retrieve the latest Swarm image and run it as a container. The create argument instructs the Swarm container to connect to the Docker Hub discovery service and obtain a unique Swarm ID (also known as a “discovery token”). The token appears in the output, and it’s a good idea to save it for later use because, for security reasons, it is not saved to a file.

Enter the docker-machine ls command, and you will see a list of locally-running virtual machines. The output should look as follows:

Forming the cluster

We will now join the two nodes to the cluster, with the manager responsible for coordinating the entire cluster.

First, connect to the manager using this command:

Next, we will enter the following command to run a Swarm container as the primary cluster manager (where <PORT> is to be replaced by the desired port and <TOKEN> will be replaced with the actual discovery token):

To join node-01 to the cluster, use:

To join node-02 to the cluster, use:

Let’s review our cluster using this command:

And then:

The output should look something like this:

Collecting the event logs

Now, it starts to get interesting. Here, we will describe how to collect the machine logs generated by Swarm for subsequent forwarding into the ELK Stack for analysis.

The --engine-env flag that we used above when creating the cluster nodes already specified which manager and agent nodes can use the UNIX socket for logging. We also used DOCKER_OPTS to enable logging on each node.

Using these parameters enables us to get event information using the following command:

It’s important to note that we’re using a UNIX socket to retrieve the log data. You can, if you like, use the Docker remote API as well. Check this guide to learn how to change and use the Remote API.
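For readers who would rather query the socket from a script, here is a minimal sketch using only the Python standard library. It assumes the daemon listens on the default socket path /var/run/docker.sock and that the Docker API's /events endpoint is called with since and until parameters (Unix timestamps) so that the response is bounded. The class and function names are hypothetical.

```python
import http.client
import socket
from urllib.parse import urlencode

DOCKER_SOCKET = "/var/run/docker.sock"  # assumed default socket path


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks to a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def events_path(since, until):
    """Build the request path for events in a bounded time window."""
    return "/events?" + urlencode({"since": since, "until": until})


def fetch_raw_events(since, until, socket_path=DOCKER_SOCKET):
    """Return the raw response body (a stream of JSON objects) as text."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", events_path(since, until))
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()
```

Passing until matters here: without it, the endpoint keeps the connection open and streams new events, so read() would not return.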

To see the logs of both of our cluster nodes, we need to use the following commands:

Next, open a new terminal window and connect to the manager node with:

Then, run:

You will get the following event data displayed:
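Event data of this kind arrives as a stream of JSON objects, so it is straightforward to process in a script. The sketch below parses such a stream and pulls out a few fields. The sample events are made up for illustration, and the field names (Type, Action, Actor, time) are an assumption based on the fields referenced later in this guide and on the Docker events format; older daemons may emit a different shape.

```python
import json


def parse_event_stream(text):
    """Parse JSON objects that are concatenated or newline-separated."""
    decoder = json.JSONDecoder()
    events, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        event, pos = decoder.raw_decode(text, pos)
        events.append(event)
    return events


def summarize(event):
    """Reduce one event to (time, type, action, actor id)."""
    actor = event.get("Actor", {})
    return (event.get("time"), event.get("Type"),
            event.get("Action"), actor.get("ID"))


# Hypothetical sample data, not captured from a real cluster.
SAMPLE = (
    '{"Type":"container","Action":"create","Actor":{"ID":"abc123"},"time":1}\n'
    '{"Type":"network","Action":"connect","Actor":{"ID":"def456"},"time":2}'
)

for event in parse_event_stream(SAMPLE):
    print(summarize(event))
```

The parser accepts objects with or without newlines between them, which avoids depending on how a particular daemon version delimits the stream.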

Additional machine data that can be useful for logging containers can be retrieved using this command:

In this case, the output returned will be much more extensive and will look as follows:

For a complete list of Docker Swarm events that can be logged using the API, check out the Docker docs.

Using the commands above, we can retrieve event logs for any of the nodes in our Swarm cluster.

We want this data to be shipped into ELK, so our next step is to output these events into files using the following commands:

Of course, retrieving logs from the nodes can also be automated by using cron jobs or customized schedulers.
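A scheduled job along these lines could append each batch of retrieved events to a per-node file, one JSON object per line, which is a layout that log shippers handle easily. This is only a sketch: the file naming scheme (node name plus -events.log) and the function name are assumptions for illustration, not the layout used by the commands in this guide.

```python
import json
import os


def append_events(events, directory, node):
    """Append events to <directory>/<node>-events.log as JSON lines."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{node}-events.log")
    with open(path, "a", encoding="utf-8") as handle:
        for event in events:
            # sort_keys keeps the output stable between runs.
            handle.write(json.dumps(event, sort_keys=True) + "\n")
    return path
```

Because the file is opened in append mode, a cron job can call this repeatedly with the events from each polling window without overwriting earlier data.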

Shipping the event logs into ELK

There are a number of methods for shipping the Swarm event logs into ELK. This section will outline two of them: AWS S3 buckets and Logstash. Please note: The configurations here are optimized for shipping to the ELK Stack that is hosted by Logz.io. If you are using your own ELK instance, you should use the Logstash method and adjust the configuration file accordingly.

Using AWS S3

Using the AWS CLI sync command, you can sync your local storage easily:

We have set the date here to group and store the logs on S3 based on their timestamps.
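To illustrate the idea of date-based grouping, the sketch below builds an aws s3 sync command whose destination includes the date. The bucket name, the local directory, and the year/month/day prefix layout are all hypothetical; adapt them to whatever layout your own sync command uses.

```python
import subprocess
from datetime import date


def s3_sync_command(local_dir, bucket, day):
    """Build an `aws s3 sync` command with a date-based destination prefix."""
    prefix = day.strftime("%Y/%m/%d")  # assumed layout: year/month/day
    return ["aws", "s3", "sync", local_dir, f"s3://{bucket}/{prefix}/"]


def sync_today(local_dir, bucket):
    """Run the sync for today's date (requires a configured AWS CLI)."""
    subprocess.run(s3_sync_command(local_dir, bucket, date.today()), check=True)


# Hypothetical values, shown only to make the resulting command visible.
print(" ".join(s3_sync_command("/var/log/swarm", "example-swarm-logs",
                               date(2016, 5, 4))))
```

Keeping the command construction separate from its execution makes it easy to check the destination path before anything is uploaded.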

Once the logs are synced with S3, we need to configure shipping from the S3 bucket in the Logz.io Log Shipping section.

As shown below, after clicking on the Log Shipping item and selecting “S3 Bucket” from the “AWS” dropdown box, you will see the fields that you need to fill out to run the service properly. The required fields are the bucket name (in this case, the bucket that holds the Swarm logs), an S3 access key, and a secret key that you should have received from AWS.

S3 shipping in Logz.io

Using Logstash

As explained above, this shipping option is the better choice if you’re using your own ELK Stack (the guide below presumes that you already have the stack installed).

Open the Logstash configuration file and configure Logstash to track the Swarm event files that we have created. The filter section, in this case, includes the user token for shipping to Logz.io, so you can remove this section if you’re using your own Logstash. In that case, also enter your Elasticsearch host IP in the output section instead of the Logz.io listener:

Last but not least, start Logstash while passing the configuration file above as an argument:

Building a Kibana dashboard

Our final step is to begin analyzing the logs. As an example of how to use Kibana to visualize Swarm event logs, we will describe how to create a dashboard to monitor the containers in the cluster.

Number of containers over time

The first chart that we will create is an area chart, plotted over a date histogram, that shows the number of containers over time.

To create the chart, click the “Visualize” tab in Kibana and select the area chart visualization type from the menu. Then, in the left-hand box under the “Data” tab, use the X axis as a date histogram and under “Split Area,” select “Terms” as a sub-aggregation. As a field, select Actor.Attributes.container.

You can also specify how many terms the query should return. After hitting the green button above the settings box, you will see the resulting chart (note the configuration on the left):

Number of containers over time
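To make the aggregation behind this chart concrete, the sketch below reproduces it locally: events are grouped into fixed time buckets (the date histogram) and then counted per container within each bucket (the terms sub-aggregation). It assumes each event carries a time field in Unix seconds and a container name under Actor.Attributes.container; the sample events and the function name are made up.

```python
from collections import Counter, defaultdict


def containers_over_time(events, bucket_seconds=60):
    """Count events per container within fixed-width time buckets."""
    buckets = defaultdict(Counter)
    for event in events:
        attributes = event.get("Actor", {}).get("Attributes", {})
        container = attributes.get("container")
        if container is None:
            continue  # events without this attribute are not charted
        start = event["time"] - event["time"] % bucket_seconds
        buckets[start][container] += 1
    return {start: dict(counts) for start, counts in sorted(buckets.items())}


# Hypothetical events for illustration only.
EVENTS = [
    {"time": 120, "Actor": {"Attributes": {"container": "a"}}},
    {"time": 130, "Actor": {"Attributes": {"container": "a"}}},
    {"time": 185, "Actor": {"Attributes": {"container": "b"}}},
    {"time": 190, "Actor": {"Attributes": {}}},
]

print(containers_over_time(EVENTS))
```

Kibana performs the equivalent grouping inside Elasticsearch; the local version is only useful for checking that the shipped data has the expected shape.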

Distribution of actions

In this visualization, we can view all of the various Swarm actions (for example, pull, commit, create, connect, and disconnect).

Again, click the “Visualize” tab in Kibana and this time, select the pie chart visualization type. On the left-hand side, you will have to select the “Action” field from the aggregation drop-down menu.

After hitting the green button above the settings box, you will see this resulting chart:

Distribution of events
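The same distribution can be computed from the event files directly, which is a handy sanity check before building the chart. This sketch assumes each event is a dictionary with an Action field, as selected in the visualization; the sample data and the function name are invented for illustration.

```python
from collections import Counter


def action_distribution(events):
    """Count how many times each action appears in a list of events."""
    return Counter(event["Action"] for event in events if "Action" in event)


# Hypothetical events for illustration only.
EVENTS = [
    {"Action": "pull"},
    {"Action": "create"},
    {"Action": "connect"},
    {"Action": "create"},
    {"Type": "container"},  # no Action field, so it is ignored
]

for action, count in action_distribution(EVENTS).most_common():
    print(action, count)
```

Each entry in the resulting counter corresponds to one slice of the pie chart.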

Swarm event logs over time

Another way to visualize Docker Swarm event logs is to create a line chart that displays logs over time. Here is the configuration and an example of the resulting visualization:

Swarm event logs over time

Creating a dashboard

Once you’ve created a series of visualizations, you can put them all together in a comprehensive dashboard. This allows you to monitor your Swarm cluster from a single place.

Creating a dashboard is simple: select the “Dashboard” tab in Kibana and manually add each visualization to create something like this:

Swarm Kibana dashboard

If you’re using Logz.io, this dashboard is available in our ELK Apps library of pre-made dashboards and visualizations, so you can easily install it here with one click.

Summary

Docker Swarm is a great tool for building and managing a Docker clustered environment, but it is critical to know what is going on inside the cluster to be able to make sure that everything is functioning as expected.

Being able to monitor the cluster will enable you to identify whenever something is going wrong with your services by providing you with a clear picture of the events taking place within Swarm in real time.

Of course, this guide outlined our recommended method for logging Swarm with ELK, but you can create and deploy your own ELK Stack and configure the shipping however you wish.

We’d love to hear how you’re handling logging for Docker Swarm — leave a comment below.


Asaf Yigal is co-founder and VP Product at Logz.io. Prior to Logz.io, Asaf co-founded Currensee, a social-trading platform, which was later acquired by OANDA in 2013. Prior to Currensee, Asaf played executive roles at Akorri in developing an end-to-end performance monitoring platform and at Onaro in developing a storage resource management platform. Both Akorri and Onaro were acquired by NetApp. Prior to Onaro, Asaf headed a research team in the Israeli Navy, taking an artificial intelligence system to military deployment. Asaf holds a B.S. from the Technion and is an Instrument-rated private pilot.