Originally published at The New Stack.
Logs, metrics and traces are the three pillars of the Observability world. The distributed tracing world, in particular, has seen a lot of innovation in recent months, with OpenTelemetry standardization and with Jaeger open source project graduating from the CNCF incubation. According to the recent DevOps Pulse report, Jaeger is used by over 30% of those practicing distributed tracing. Many companies realize the need for distributed tracing to gain better observability into their systems and troubleshoot performance issues, especially when dealing with elaborate microservices architectures.
Starting with Jaeger, the first step is to instrument your code to send traces to Jaeger. The second part is setting up the Jaeger backend to collect, process and visualize your traces. In this post I’ll go over what it takes to deploy and manage Jaeger backend in production. I’ll cover:
- Jaeger components for installation
- Non-Jaeger components used by Jaeger, such as backend storage
- Jaeger Deployment strategies, in particular around production systems
- Agent vs. Agentless
- Agent installation methods: sidecar vs. DaemonSet
- Installation tools: Manual, Operator, Helm chart
When deploying Jaeger Tracing, you’ll need to address the following components:
- Agent is the component co-located with your application to gather the Jaeger trace data locally. It handles the connection and traffic control to the Collector (see below) as well as data enrichment.
- Collector is a centralized hub collecting traces from the various agents in the environment and sends for backend storage. The collector can run validations and enrichment on the spans.
- Query retrieves the traces and serves them over a UI.
There are obviously many more details on each component and other optional Jaeger components, but I’ll keep it simple for the sake of this discussion. Let’s see how to deploy the agent, collector and query components in various setups and strategies.
External Components Used By Jaeger
Depending on your deployment strategy (see below), Jaeger may make use of other (non-Jaeger) components, primarily a persistent backend storage (Elasticsearch, Cassandra or others) and a streaming ingestion queue (Kafka). These services are typically deployed independently and you’ll just need to point Jaeger to the relevant endpoints, though you can also have Jaeger self-provision them. This is a broad topic, which I plan to address on a separate post, so stay tuned.
You may want to deploy Jaeger on many different systems ranging from from your own laptop for development purposes to large scale and high load production environments. Here are some useful deployment strategies you can use:
- jaegertracing All-in-One: This is an easy setup to deploy, good for trying out the product, development and demo usage. You can run it as a prepackaged binary or a Docker image. It packages and deploys all the services together with in-memory storage in a single replica.
- Production: focused on production environment needs for high availability and scalability. It deploys each backend service independently, and supports multiple replicas and scaling options. It also uses persistent backend storage to keep the tracing data resilient. It currently supports Elasticsearch and Cassandra storage solutions, with Elasticsearch as the recommended solution for production environments.
- Streaming: For high load environments, this setup adds Kafka to the Production deployment strategy to take pressure off the backend storage. If you need to run post-processing logic on the traces, it makes it easier to execute before writing to the storage.
The all-in-one setup is easy to start with, and comes with an executable bundle to launch. If you want to start experimenting with it, check out this tutorial on how to do it in conjunction with Elasticsearch backend as well as Kibana which you can use for extra visualization.
For the rest of this post I’ll focus on the deployment for production and the considerations and options involved in this process.
Can Jaeger Run Agentless?
The agent needs to reside with any instance of your application. If you run an elaborate microservices architecture, with multiple agents needed, you may find yourself wondering if you can avoid the agent. The short answer to that would be – don’t go agentless.
The longer answer is that technically you can make your Jaeger client libraries send the span data directly to the Collector, but you would need to handle various aspects yourself such as the lookup of the Collector, traffic control and tagging the spans with additional metadata based on local system information.
While using Jaeger Agent is the recommended deployment, there are scenarios in which you cannot deploy an agent. For example, if your application runs as AWS lambda function or similar serverless frameworks where you cannot control pod deployment and Agent co-location. Agent will also be ineffective if you use Zipkin instrumentation. In such cases, the spans should be submitted directly to the Jaeger Collector.
Jaeger Agent Installation methods
The agent needs to reside together with your application, so the Jaeger client libraries can access it on localhost and send it the data over UDP without the risk of data loss due to network hiccups (unlike TCP, UDP transmission protocol doesn’t include data loss protection, but is therefore faster and more economical).
The ways to achieve co-location for Jaeger in Kubernetes environments are either as a sidecar or as a daemonset. Let’s look at the options:
Jaeger Agent as a DaemonSet
Installing the agent as a deamonset is the simplest and most economical option. This will provide one Agent instance on the node, serving all the pods on that node.
This strategy may, however, prove too simple for production environments that involve multi-tenancy, security segregation requirements or multiple instances of Jaeger for different applications. If this is your case, consider deploying as a sidecar (below).
Jaeger Agent as a Sidecar
The sidecar option means the agent will run as an additional container with each pod. This setup can support a multi-tenant environment, where each tenant has its respective Jaeger Collector, and each agent can be configured to ship to its relevant Collector. You can also get more control over memory allocation, which can prevent cannibalization by specific tenants. Security configuration is simpler as well when running in the same pod. The sidecar approach naturally comes with the overhead of the additional containers. Some installation tools can auto-inject the agent sidecar and simplify management.
Installation tools for Jaeger
Now that we know which components we should deploy, as well as the strategy, let’s see which tools can help us put this plan into action:
- Manually using kubectl: If you need a quick start and don’t want to bother with automation, then this may be suitable for you. For production deployments, this would be less recommended. It was the official way supported by the Jaeger community, with curated YAML templates provided in this repo. However it has been deprecated recently (May 2020). Another option for manual execution is to use Jaeger Operator to generate a static manifest file: run jaeger-operator generate to generate the YAML, then kubectl apply to manually apply it to your environment. This is currently an experimental feature of Jaeger Operator, so use with caution.
- Kubernetes Operator: Jaeger Operator implements the popular Operator pattern for Kubernetes, so you can have a designated Controller manage your Jaeger backend as a custom resource. It will deploy the Jaeger Agent as a sidecar by default. If you run the controller in your cluster as a Deployment, then the Jaeger Operator can also auto-inject Jaeger Agent sidecars, saving you the need to manually define it in your specification. You can also set the agent strategy to DaemonSet. One thing to note is that the Jaeger Operator seems to fall short when using external persistent storage based on gRPC plugins. If that’s your case, you may prefer to use Helm. Check out the Jaeger Operator repo for full details.
- Helm Chart: This option has the advantages of a full package manager, and if you use Helm to manage the other applications in your production environment (such as the persistent storage used by Jaeger) then it would be your natural choice. You can find the official Jaeger charts in this repo, but note that it is still stated as beta. The chart will install a Jaeger Agent as a DaemonSet by default. Note that you can also use Helm to install Jaeger Operator (see the chart here).
Ingesting Zipkin Traces With Jaeger Tracing
Up till now we’ve talked about Jaeger spans ingestion. But there are quite a few systems with Zipkin instrumentation out there, so it’s worth noting that Jaeger can accept spans in Zipkin formats as well, namely Thrift, JSON v1/v2 and Protobuf.
If your Jaeger backend deployment is meant to ingest Zipkin protocols:
- Jaeger Agent is not relevant for gathering Zipkin spans.
- Your Zipkin instrumentation should ship the Zipkin spans directly to the Jaeger Collector. Zipkin spans can be submitting via POST requests to the following RESTful endpoints:
- /api/v1/spans for Zipkin JSON v1 or Zipkin Thrift format
- /api/v2/spans for Zipkin JSON v2
- Jaeger Collector should be configured to ingest Zipkin spans on a designated HTTP port, with the flag
--collector.zipkin.http-port=9411(port 9411 is used by Zipkin collectors).
Jaeger is a fairly young project, born in the Kubernetes sphere, with a strong community providing Kubernetes deployment best practices and automation. However, as a young project, the best practices for managing it in production are still shaping up, and it takes some careful consideration to run it in production in a way that suits your organization, while catching up on community updates. We at Logz.io offer distributed tracing as a service based on Jaeger, as we do with log management based on open source ELK Stack and with metrics based on Prometheus, so you can adopt the leading open source Observability projects without having to operate them yourself. Join our beta program and try it out yourself.