One of the great things about Logz.io Log Management is that it’s based on the most popular open source logging technology out there: the ELK Stack (click here to view our thoughts and plans on the recent Elastic license). This means Logz.io users get to leverage log shipping and collector options within the rich ELK ecosystem.
So how do you know which log shipping technology to use? We’ve assembled the most popular log shipping technologies our customers use, so you can decide what’s best for you.
Open Source Log Shippers & Collectors
Filebeat
In an ELK-based logging pipeline, Filebeat plays the role of the logging agent—installed on the machine generating the log files, tailing them, and forwarding the data to either Logstash for more advanced processing or directly into Elasticsearch for indexing.
Written in Go, Filebeat was designed to have a low memory footprint, handle large bulks of data, support encryption, and deal efficiently with back pressure.
Should I use it?
If you know where your logs are being written (to a file or directory), then Filebeat is probably your best and easiest option. A Filebeat configuration pairs inputs with an output: if you can define an input for your data source, Filebeat can forward your logs to Logz.io or any other relevant endpoint.
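As a rough sketch, a minimal filebeat.yml that tails a log directory and ships to Logz.io looks like the following. The listener host, port, and token field follow Logz.io's published Filebeat setup, but treat the exact values (and the example log path) as placeholders to confirm against your account and the current docs:

```yaml
filebeat.inputs:
  # Tail every log file in the application's log directory
  - type: filestream
    paths:
      - /var/log/myapp/*.log
    fields:
      # Logz.io routes and parses based on these fields
      logzio_codec: plain
      token: YOUR-LOGZIO-ACCOUNT-TOKEN
    fields_under_root: true

output.logstash:
  # Logz.io's Beats listener (host/port per the Logz.io setup guide)
  hosts: ["listener.logz.io:5015"]
  ssl:
    certificate_authorities: ["/etc/pki/tls/certs/logzio.crt"]
```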
If you’re already using Filebeat, then it’s especially easy to get started with Logz.io. Simply add a new configuration with the Logz.io Filebeat Configuration Wizard, restart Filebeat, and you should see your logs stream into Logz.io.
Fluentd
Fluentd is an open source log collector, processor, and aggregator that was created back in 2011 by the folks at Treasure Data. Fluentd is one of the most popular log aggregators used in ELK-based logging pipelines. In fact, it’s so popular that the “EFK Stack” (Elasticsearch, Fluentd, Kibana) has become an actual thing.
It’s especially popular for Kubernetes log shipping and collection. Deployed as a DaemonSet, Fluentd runs a pod on each node in the Kubernetes cluster, configured to forward data to Logz.io. There is a dedicated Fluentd DaemonSet for Kubernetes, which you can clone here:
$ git clone https://github.com/fluent/fluentd-kubernetes-daemonset
Design-wise, performance, scalability, and reliability are some of Fluentd’s outstanding features. A vanilla Fluentd deployment runs on roughly 40MB of memory and can process more than 10,000 events per second.
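A minimal Fluentd configuration wires a source to a match. The sketch below tails an application log and forwards it to Logz.io; the output plugin and its parameters follow the open source fluent-plugin-logzio project, so verify the names against its current documentation, and treat the paths and token as placeholders:

```
# Tail an application log file (built-in tail input plugin)
<source>
  @type tail
  path /var/log/myapp/app.log
  pos_file /var/log/fluentd/app.log.pos
  tag myapp.logs
  <parse>
    @type none
  </parse>
</source>

# Forward matching events to Logz.io
# (requires the fluent-plugin-logzio gem; parameter names per that project)
<match myapp.**>
  @type logzio_buffered
  endpoint_url https://listener.logz.io:8071?token=YOUR-LOGZIO-ACCOUNT-TOKEN&type=myapp
</match>
```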
Should I use it?
If you’re using Kubernetes, Fluentd is generally the industry standard for collecting and shipping your logs. Because it’s deployed as a DaemonSet, log data from every node is automatically and natively transmitted to Logz.io or many other logging tools.
It’s similar in many ways to its sister project Fluent Bit (more detail on Fluent Bit below), so we wrote a comparison of the two technologies here. To summarize: Fluentd has a much richer ecosystem, with some 700 plugins that extend its functionality. Fluent Bit’s ecosystem is much more limited, but it’s the right choice for scenarios where limited computing resources are a major consideration.
Filebeat Autodiscover
Microservices are constantly moving — whether started, killed, duplicated, or rebooted. This can make it difficult to ensure that you’re collecting all of the log data from short-lived containers.
Filebeat Autodiscover tracks pods and adapts settings as changes happen in your environment. By configuring the relevant provider for your nodes or pods, the autodiscover subsystem can recognize and start monitoring services as they spin up within the cluster.
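As a sketch, a Filebeat Autodiscover section using the Kubernetes provider might look like this; the namespace and log path are illustrative placeholders:

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      # Let pod annotations (hints) drive per-container collection settings
      hints.enabled: true
      templates:
        # Explicit template: collect container logs from one namespace
        - condition:
            equals:
              kubernetes.namespace: production
          config:
            - type: container
              paths:
                - /var/log/containers/*-${data.kubernetes.container.id}.log
```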
Click here to learn more about use cases and configuring Filebeat Autodiscover.
Should I use it?
Filebeat Autodiscover is another option for collecting and shipping logs from Kubernetes environments. It’s not as popular as Fluentd, but it works just as well.
If you’re a Filebeat enthusiast, it can serve as a reliable alternative to Fluentd or Fluent Bit.
Fluent Bit
Fluent Bit is an open source log collector and processor also created by the folks at Treasure Data, in 2015. Written in C, Fluent Bit was created with a specific use case in mind — highly distributed environments where limited capacity and reduced overhead (memory and CPU) are a huge consideration.
For Kubernetes deployments, a dedicated filter plugin will add metadata to log data, such as the pod’s name and namespace, and the container’s name/ID.
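As a sketch, the relevant sections of a fluent-bit.conf that tail container logs and enrich them with Kubernetes metadata look like this (the tag pattern and log path are the conventional ones for container logs, but confirm against the current Fluent Bit docs):

```
[INPUT]
    Name               tail
    Path               /var/log/containers/*.log
    Tag                kube.*

[FILTER]
    # Enriches each record with pod name, namespace, and container name/ID
    Name               kubernetes
    Match              kube.*
    Merge_Log          On
    K8S-Logging.Parser On
```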
Should I use it?
Fluent Bit was originally designed for IoT devices, but it also works well for collecting and shipping Kubernetes logs. It’s deployed as a DaemonSet agent that runs on each node of your Kubernetes cluster.
While Fluent Bit consumes fewer resources than Fluentd, it has less adoption among the community and has far fewer plugins.
Still trying to decide between Fluentd and Fluent Bit? Check out our comparison of the two technologies here.
Shipping Logs Directly from Code
This section is much more straightforward than the last: if you want to ship application logs, you’ll need to instrument your code to expose the relevant data and send it to the ELK Stack or Logz.io. There are many libraries out there that make this relatively easy.
But don’t be fooled. Application logging requires careful planning to structure your log data so it’s actionable. Check out our blog on what to consider when logging your apps.
Configure Log4j 2 or Logback to ship Java logs to Logz.io. Both libraries send logs using non-blocking threading, bulks, and HTTPS encryption to port 8071.
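For Logback, shipping typically means adding the Logz.io appender to logback.xml. The sketch below follows the open source logzio-logback-appender project; the class name and parameters come from that project, so verify them (and replace the token placeholder) against its current README:

```xml
<configuration>
  <!-- Logz.io appender from the logzio-logback-appender library -->
  <appender name="Logzio" class="io.logz.logback.LogzioLogbackAppender">
    <token>YOUR-LOGZIO-ACCOUNT-TOKEN</token>
    <!-- HTTPS listener on port 8071, as noted above -->
    <logzioUrl>https://listener.logz.io:8071</logzioUrl>
    <logzioType>java</logzioType>
  </appender>

  <root level="INFO">
    <appender-ref ref="Logzio"/>
  </root>
</configuration>
```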
Logz.io Python Handler sends logs in bulk over HTTPS to Logz.io. Logs are grouped into bulks based on their size. Get more detail in this blog on Python logging.
logzio-nodejs collects log messages in an array, which is sent asynchronously when it reaches its size limit or time limit (100 messages or 10 seconds), whichever comes first.
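The size-or-time flush logic described above can be sketched in a few lines. This is an illustrative Python sketch of the general batching pattern, not the library’s actual code:

```python
import time

class LogBatcher:
    """Illustrative size-or-time batcher (not the logzio-nodejs source)."""

    def __init__(self, max_messages=100, max_seconds=10.0, send=print):
        self.max_messages = max_messages
        self.max_seconds = max_seconds
        self.send = send          # callback that ships a batch
        self.batch = []
        self.first_at = None      # time the current batch was started

    def add(self, message):
        if self.first_at is None:
            self.first_at = time.monotonic()
        self.batch.append(message)
        # Flush when either limit is hit -- whichever comes first
        if (len(self.batch) >= self.max_messages
                or time.monotonic() - self.first_at >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.batch:
            self.send(self.batch)
            self.batch = []
            self.first_at = None
```

In the real shipper the time limit also fires on a timer rather than only on the next `add`, and the batch is sent asynchronously over HTTPS.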
logzio-go uses goleveldb and goqueue to implement a persistent queue, so the shipper backs up your logs to the local file system before sending them. Find more detail in this blog about shipping Go logs.
There are other libraries you can use for other languages, which you can find here.
Log Shipping with Plug-and-play Cloud Integrations (Logz.io only)
AWS S3 Integration
One easy way to get your logs generated on AWS workloads into Logz.io is via S3. Logz.io can periodically read your S3 buckets and ingest the new logs collected in the buckets.
First, you’ll need to be sure that your logs are being written to an S3 bucket, and then you can pull directly from S3 by defining your S3 bucket and IAM policy from within Logz.io.
Should I use it?
If you don’t want to install an agent like Filebeat, this is a great option. Logz.io has a native integration that pulls directly from your S3 bucket every X seconds.
Note: Logz.io pulls your logs in ascending alphanumeric order of the file names in your S3 folders. This is important because the S3 fetcher’s offset is determined by the name of the last file fetched. We recommend using standard AWS naming conventions so that file names sort in the order the logs are written, which avoids log duplication.
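A quick illustration of why naming matters: zero-padded, timestamp-style names sort chronologically, while unpadded ones don’t, so a name-based offset can skip or re-fetch files. The key names below are hypothetical:

```python
# Hypothetical S3 keys. The fetcher reads them in ascending alphanumeric
# order, so its "last file fetched" offset only works if names sort
# chronologically.
good = [
    "logs/2024-01-09-05.log",
    "logs/2024-01-09-12.log",
    "logs/2024-01-10-03.log",
]
bad = [
    "logs/2024-1-9-5.log",
    "logs/2024-1-9-12.log",
    "logs/2024-1-10-3.log",
]

print(sorted(good) == good)  # -> True: zero-padded names sort chronologically
print(sorted(bad) == bad)    # -> False: "10" sorts before "9", "12" before "5"
```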
AWS CloudWatch Integration
Another way to get your logs from AWS to Logz.io is through Logz.io’s CloudWatch integration, which uses a Lambda shipper to automatically forward logs from CloudWatch to Logz.io. Since it’s easy to consolidate your AWS logs with CloudWatch, this is a popular integration among our customers.
Should I use it?
This is a good option for those who are already collecting their logs in CloudWatch. You can simply add Logz.io’s Lambda integration function to automatically forward your logs (and metrics) to Logz.io.
The downside, of course, is that you’ll need to pay for CloudWatch.
Check out this AWS S3 and CloudWatch logging tutorial for more.
Azure EventHub Integration
Logz.io’s integration with Azure EventHub is a fast and easy way to get your logs on Azure workloads into Logz.io.
Our Azure Deployment template automatically deploys a namespace and an Event Hub to collect log data from an Azure region, and uses a Function to forward that data to Logz.io. All you’ll need to do is add some parameters to the template. Learn about the details for deploying the template here.
The architecture is relatively similar to the AWS CloudWatch integration, and it shares the same obvious downside: you’ll need to pay for EventHub.
Should I use it?
This is a great option for those already using Azure EventHub who want to forward that data to Logz.io. It’s also a good choice if you don’t want to install an agent.
Check out our full Azure Monitoring Guide.
GCP Stackdriver Integration
Stackdriver is a GCP cloud service that collects logs for analysis. Our integration with Stackdriver also makes it easy to forward those logs to Logz.io.
You can use Google Cloud Pub/Sub to forward your logs from Stackdriver to Logz.io. Learn how to configure your Pub/Sub forwarder here. Get more info at this blog on Stackdriver logging.
Should I use it?
Like the EventHub and CloudWatch integrations, the Stackdriver integration is a great fit if you’re already using Stackdriver or don’t want to install an agent on your GCP workloads. But since you have to pay for it, many of Logz.io’s GCP customers prefer other methods.
Wrapping it up
The log shipping collectors, code sources, and cloud sources that we covered here are the most popular ways Logz.io customers typically get their log data to Logz.io. But you can find more options in our log shipping docs.
If you didn’t get the answer you were looking for here, and you have a Logz.io account, your best resource is our highly responsive 24/7 Customer Support team. They’ll help you get your logs shipped (and parsed!) in no time.